The biggest controversy in AI training data last year wasn't a shortage of material but the inability to prove where it came from: models were hit with "data infringement" accusations the moment they went live. The Seal module recently launched by the Walrus ecosystem offers a new approach. On upload, a file is first split into hundreds of fragments via erasure coding, and access permissions are then written directly into objects on the Sui chain, gated by threshold keys. The inference service can touch only the authorized fragments, inside an isolated environment; the original file is never accessible in full. That way the provenance of training data stands up legally, the community can verify it, and there's no blame left to shift.
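To make the flow concrete, here is a minimal TypeScript sketch of that upload path. Everything here is a hypothetical stand-in: `ErasureCoder`, `storeFragment`, `publishPolicy`, and the fragment/policy shapes are illustrative assumptions, not the actual Walrus or Seal SDK API, and the 134-of-200 parameters are example numbers only.

```typescript
// Conceptual sketch of the upload flow described above. All names are
// hypothetical stand-ins, NOT the real Walrus/Seal SDK; they only show the
// shape of the erasure-code-then-gate pattern.

// An erasure-coded fragment: index plus opaque bytes.
interface Fragment {
  index: number;
  data: Uint8Array;
}

// Hypothetical erasure coder: splits `data` into `total` fragments,
// any `threshold` of which suffice to reconstruct the original.
interface ErasureCoder {
  encode(data: Uint8Array, threshold: number, total: number): Fragment[];
  decode(fragments: Fragment[]): Uint8Array;
}

// Hypothetical handle to an access policy stored as a Sui object.
interface AccessPolicy {
  objectId: string;         // on-chain object ID
  authorizedKeys: string[]; // public keys allowed to request key shares
  threshold: number;        // key shares needed to decrypt a fragment
}

// Upload flow: erasure-code the file, ship the fragments to storage, then
// record who may read them in an on-chain policy object. The original
// bytes never leave this function intact.
async function sealUpload(
  coder: ErasureCoder,
  file: Uint8Array,
  authorizedKeys: string[],
): Promise<AccessPolicy> {
  // Example parameters: 200 fragments, any 134 reconstruct the file.
  const fragments = coder.encode(file, 134, 200);
  for (const f of fragments) {
    await storeFragment(f); // hypothetical: push fragment to storage nodes
  }
  // Hypothetical: publish the policy as a Sui object; threshold of 2 key
  // shares per decryption is an arbitrary example value.
  return publishPolicy({ authorizedKeys, threshold: 2 });
}

declare function storeFragment(f: Fragment): Promise<void>;
declare function publishPolicy(p: {
  authorizedKeys: string[];
  threshold: number;
}): Promise<AccessPolicy>;
```

The point of the pattern is that no single storage node ever holds enough fragments to rebuild the file, and the decryption keys themselves are gated by the on-chain policy rather than by the storage layer.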
Compared with other approaches, such as a major storage platform that only provides content hashes, or another that dumps everything into permanent public storage in a one-shot write, Seal combines privacy protection, controllable deletion, and tamper resistance, three requirements that usually conflict.
The downside is just as obvious: key rotation isn't cheap. Rotating the keys for 200 GB of data in one go means first unlocking the old keys and only then authorizing the new ones, a process considerably more involved than conventional setups, and a real headache for a team racing a project deadline. Community script templates are already circulating, though (a sketch of the idea follows below), and once this is folded into CI/CD pipelines it should improve significantly.
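Those community scripts presumably automate something like the following two-pass rotation, reusing the stand-in `AccessPolicy` type from the earlier sketch. `unlockPolicy` and `updatePolicy` are hypothetical names, not real Seal calls; this only illustrates why the process has the ordering the post describes.

```typescript
// Hypothetical rotation sketch built on the stand-in interfaces above;
// the real community scripts and the Seal API will differ.
async function rotateKeys(
  policy: AccessPolicy,
  oldKey: string,
  newKeys: string[],
): Promise<AccessPolicy> {
  // Step 1: unlock the policy under the old key set, as the post describes.
  await unlockPolicy(policy.objectId, oldKey);
  // Step 2: authorize the new keys alongside the old ones, so a failure
  // mid-run never leaves the data with no valid key at all.
  let updated = await updatePolicy(policy.objectId, [
    ...policy.authorizedKeys,
    ...newKeys,
  ]);
  // Step 3: retire the old keys once the new ones are confirmed on-chain.
  updated = await updatePolicy(updated.objectId, newKeys);
  return updated;
}

declare function unlockPolicy(objectId: string, key: string): Promise<void>;
declare function updatePolicy(
  objectId: string,
  authorizedKeys: string[],
): Promise<AccessPolicy>;
```

The add-then-retire ordering is the safe default for any rotation: at every point in the run there is at least one key set that can still decrypt, which matters when you're rotating 200 GB and a mid-run failure is a real possibility.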
FlashLoanLarry
· 01-09 11:54
This Seal module really is something. That erasure-coding fragment trick leaves data sources nowhere to hide.
BloodInStreets
· 01-09 11:52
Hah, finally someone has sorted out this mess. The earlier solutions were honestly a joke.
SlowLearnerWang
· 01-09 11:36
Oops, yet another thing I should have looked at earlier and am only getting now... The erasure-coding fragment trick really is clever. Someone has finally made "I didn't infringe" actually provable.
WhaleShadow
· 01-09 11:34
Now the legal team really has no excuse to pass the buck, but if key rotation drags past the deadline, somebody's head is going to roll.
OldLeekConfession
· 01-09 11:34
Wow, now this is real on-chain proof of existence. Someone finally figured out how to trace data sources.
MetaverseVagabond
· 01-09 11:27
Wow, someone finally sorted out this mess. The combination of erasure coding and threshold keys is indeed powerful.