How to Pin a Document to a Blockchain

Harris Wilson
Web3 Specialist
July 8, 2024
Update: Since this post was written, Hyperledger FireFly has reached 1.0.

In the world of blockchain-backed applications, managing data storage effectively is a common business challenge. Blockchains, by design, are not optimized for storing large volumes of data. They excel in providing decentralization and immutability, but their limited storage capacity poses a problem for businesses handling extensive documents and sensitive information.

To address this, a combination of on-chain and off-chain storage is often employed. On-chain storage provides data integrity and transparency. Meanwhile, off-chain storage offers the scalability and efficiency needed to manage vast amounts of data while maintaining control over its privacy and visibility, which matters for compliance with regulations like GDPR.

We maintain a connection between off-chain data and the on-chain state machine with a concept called ‘pinning’. Here, an application computes a cryptographic proof of the data object (typically a hash) and submits it in an on-chain transaction, alongside contextual metadata that tells the relevant parties what the data is.

Illustration of data storage on a blockchain

A Step-by-Step Guide to Pinning Data to Your Blockchain

To learn how to pin a document to a blockchain, follow the steps outlined below.

1. Store a data object in a secure off-chain storage location to be subsequently referenced on a blockchain.

Some options for off-chain storage can be found below:

  • IPFS is frequently paired with blockchain networks, and is a good choice for storing data that should be visible to the world, or to all of the network operators in a particular permissioned network. IPFS nodes form a peer-to-peer network. Each piece of data uploaded to an IPFS node receives a unique content identifier (CID) based on a hash of the data, and can then be retrieved by other IPFS peers using that CID.
  • Hyperledger FireFly provides flexible options for private data storage. Simple text/JSON values are stored in a database such as PostgreSQL, while blob/binary data is stored via the Data Exchange service in cloud storage such as Amazon S3. Both are configurable via the plugin architecture, and can be extended with features for selectively sharing and disclosing data to other members of a network.
  • Other locations, such as existing databases and cloud storage accounts, can also be viable for off-chain storage.

In any case, the data storage implementation needs to meet only two requirements:

  • Each document or piece of data should be stored alongside a unique, verifiable identifier derived from its contents (such as a cryptographic hash, detailed below). It should be possible to efficiently look up any item using such an identifier.
  • It should define a naming scheme for describing a stored item, usually in the form of a URI, which distinguishes it from data stored elsewhere. For instance, IPFS typically uses the scheme ipfs://<cid>, and Hyperledger FireFly often uses firefly://data/<hash>.
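The two requirements above can be sketched with a small in-memory store. This is only an illustration, not how IPFS or Hyperledger FireFly are implemented: the `ContentStore` class and the `offchain://data/` scheme are hypothetical names chosen for the example, and SHA-256 is assumed as the identifier.

```python
import hashlib

# Hypothetical URI scheme for this sketch; real systems define their own
# (for example ipfs://<cid>).
SCHEME = "offchain://data/"


class ContentStore:
    """In-memory stand-in for an off-chain store keyed by content hash."""

    def __init__(self):
        self._items = {}  # hex digest -> raw bytes

    def put(self, data):
        """Store data under its SHA-256 digest and return its URI."""
        digest = hashlib.sha256(data).hexdigest()
        self._items[digest] = data
        return SCHEME + digest

    def get(self, uri):
        """Look up an item by URI; raises KeyError if it is not stored here."""
        if not uri.startswith(SCHEME):
            raise ValueError(f"unsupported URI scheme: {uri}")
        return self._items[uri[len(SCHEME):]]


store = ContentStore()
uri = store.put(b"purchase order #1")
print(uri)             # the scheme followed by the hex digest
print(store.get(uri))  # the original bytes
```

Because the identifier is derived from the contents, storing the same bytes twice yields the same URI, and any change to the bytes yields a different one.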

2. Compute a cryptographic hash of the document contents and record it on-chain.

Pinning a document to the blockchain is almost always done by computing a one-way cryptographic hash of the document contents (such as SHA-256) and then storing that hash via a blockchain transaction (often alongside some other identifying information).
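As a rough sketch of the hashing half of this step, the snippet below computes a SHA-256 digest over a document stream in chunks and assembles the values a pinning transaction might carry. The function names, the field names in the record, and the example URI are assumptions made for illustration; the actual transaction format depends on the smart contract being used.

```python
import hashlib
import io


def sha256_stream(stream, chunk_size=65536):
    """Hash a binary stream in chunks so a large document is never fully in memory."""
    hasher = hashlib.sha256()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        hasher.update(chunk)
    return hasher.digest()  # 32 raw bytes


def build_pin_record(stream, uri, description):
    """Assemble the values to submit on-chain (field names are hypothetical)."""
    digest = sha256_stream(stream)
    return {
        "hash": "0x" + digest.hex(),
        "uri": uri,
        "description": description,
    }


document = io.BytesIO(b"Master services agreement, v3")
record = build_pin_record(document, "offchain://data/example", "MSA v3")
print(record)
```

A SHA-256 digest is always 32 bytes regardless of document size, which is why only the hash, rather than the document, needs to go on-chain.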

Here are some common examples of token standards that use this pattern:

  • ERC-721 includes an optional (but very frequently implemented) "metadata extension" which allows each NFT to be tied to a token URI. This token URI generally contains a reference to a JSON file (stored in IPFS or similar off-chain storage) containing additional metadata about the NFT and what it represents. The metadata JSON file may in turn contain additional references to other off-chain pieces of data.
  • ERC-1155 includes an optional ERC1155Metadata_URI extension which has nearly identical semantics to ERC-721 URIs.
  • ERC-1400 (and precursor ERC-1643) includes a setDocument method for recording the hash and URI of one or more off-chain documents related to the token.

Note that document pinning is not restricted to usage with tokens. Any smart contract can leverage this pattern by storing a hash and/or URI that references a document stored elsewhere.
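To make the pattern concrete without committing to a particular contract language, here is a plain-Python model of the state such a contract might keep. It is a simulation only: `DocumentRegistry`, `set_document`, and the `DocumentPinned` event name are hypothetical and loosely inspired by the hash-plus-URI idea described above, not taken from any standard's interface.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class DocumentRecord:
    uri: str
    document_hash: bytes
    sender: str


class DocumentRegistry:
    """Plain-Python model of contract state; not a real smart contract."""

    def __init__(self):
        self._documents = {}  # name -> DocumentRecord
        self.events = []      # append-only log standing in for emitted events

    def set_document(self, sender, name, uri, document_hash):
        """Record the hash and URI of an off-chain document under a name."""
        if len(document_hash) != 32:
            raise ValueError("expected a 32-byte hash")
        self._documents[name] = DocumentRecord(uri, document_hash, sender)
        self.events.append({
            "event": "DocumentPinned",  # hypothetical event name
            "name": name,
            "uri": uri,
            "hash": document_hash.hex(),
            "sender": sender,
        })

    def get_document(self, name):
        return self._documents[name]


contents = b"Terms and conditions, 2024 edition"
registry = DocumentRegistry()
registry.set_document(
    sender="org-a",
    name="terms",
    uri="offchain://data/terms",
    document_hash=hashlib.sha256(contents).digest(),
)
print(registry.get_document("terms"))
print(registry.events)
```

On a real chain the sender would come from the transaction signature rather than a function argument, and the event log would be produced by the blockchain node.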

3. Access the off-chain storage location to look up a document via the identifier recorded on-chain.

Each network participant can subscribe to blockchain events emitted by the smart contract(s) that they care about. Through these subscriptions, they are notified of new transactions and their contents, which can include the cryptographic hashes recorded in Step 2.

Once made aware of a new data object, they can use the hash (and URI, if present) from the transaction to retrieve the data from their off-chain storage location.
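A minimal sketch of this lookup, assuming events arrive as dictionaries with a `hash` field and the local off-chain store is a mapping from hex digest to bytes. Both shapes are assumptions for the example; a real event subscription would come from a blockchain client library.

```python
import hashlib


def resolve_events(events, local_store):
    """Split pin events into documents found locally and hashes still missing."""
    found = {}
    missing = []
    for event in events:
        doc_hash = event["hash"]
        data = local_store.get(doc_hash)
        if data is None:
            missing.append(doc_hash)
        else:
            found[doc_hash] = data
    return found, missing


invoice = b"invoice 1001"
invoice_hash = hashlib.sha256(invoice).hexdigest()
unknown_hash = hashlib.sha256(b"document held by another party").hexdigest()

# Local off-chain store: hex digest -> document bytes.
local_store = {invoice_hash: invoice}

# Events as they might be delivered by a subscription (hypothetical shape).
events = [{"hash": invoice_hash}, {"hash": unknown_hash}]

found, missing = resolve_events(events, local_store)
print(sorted(found))  # hashes whose documents are available locally
print(missing)        # hashes that still need to be fetched
```

Hashes that end up in `missing` are known on-chain but not yet held locally, so the document itself has to be obtained from wherever it is stored.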

Beyond the storage of the data, there typically needs to be a mechanism for delivering files off-chain from one party’s storage location to another, so that members of a network can access the data when necessary based on the data’s identifier.

  • IPFS provides this capability inherently due to its distributed nature - anyone with access to the network can retrieve a given file using the CID.
  • In the case of private data storage, a messaging component is commonly required to connect each party’s data store. Hyperledger FireFly includes this capability out of the box: its pluggable architecture lets each member of a network configure their own data storage component, while FireFly keeps track of each member’s identity and provides the messaging rails to deliver data from point to point.

4. Re-compute the document hash and verify it against the blockchain record to prove authorship and validity.

The logic to evaluate these proofs usually lives in an application layer that knows how to reproduce the proof from a data object. A matching hash shows that the document has not changed since it was pinned, and the signer of the pinning transaction identifies who recorded it. Because open cryptographic standards are used, this logic is not difficult to embed across various access points in your apps.
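The verification itself can be very small. The sketch below assumes the on-chain record holds a SHA-256 digest as a hex string, optionally prefixed with `0x`; the function name `verify_document` is hypothetical.

```python
import hashlib
import hmac


def verify_document(data, onchain_hash_hex):
    """Return True if the document's SHA-256 digest matches the recorded hash."""
    hex_value = onchain_hash_hex
    if hex_value.startswith("0x"):
        hex_value = hex_value[2:]
    expected = bytes.fromhex(hex_value)
    actual = hashlib.sha256(data).digest()
    # compare_digest is a constant-time comparison; a plain == would also
    # work here, since the hash is public.
    return hmac.compare_digest(actual, expected)


original = b"Shipping manifest 42"
pinned_hash = "0x" + hashlib.sha256(original).hexdigest()  # value read from the chain

print(verify_document(original, pinned_hash))                   # unchanged document
print(verify_document(b"Shipping manifest 43", pinned_hash))    # altered document
```

Any change to the document, even a single byte, produces a different digest, so the second check fails.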

Final Thoughts

Document pinning is a common practice and can take many forms, but in every form it depends on reliable infrastructure components to maintain the linkage and serve the relevant data objects when called upon. The pattern benefits blockchain applications by enabling more flexible privacy models in multi-party use cases, higher transaction throughput, and leaner on-chain storage.

The Kaleido platform provides both the on- and off-chain data storage services as well as the smart contract management engine to rapidly build and deploy these use cases at scale. The platform’s robust API surface ensures developers can easily connect directly to their internal systems and enterprise applications for a seamless integration requiring no low-level blockchain expertise.
