Merkle trees
A Merkle tree is a binary tree of SHA-256 (or any other collision-resistant hash function) hashes built over an ordered list of data items. The data items can be arbitrarily large. The tree for N data items consists of 2N-1 nodes in total => e.g N=8, Total number of nodes = 8 +4 +2 +1 = 15 = 2N-1. Every node of the tree is assigned a label. The leaves of the Merkle tree are labeled with the N data items. The label of each internal parent node of the Merkle tree is the hash of the two labels on its two child nodes. The root of the tree is labeled with the Merkle root hash, called the Merkle root. Every label is 32 bytes.
Membership witnesses: Users who do not store any of the data items in the Merkle tree can still verify the inclusion of a data item. Verifying inclusion of a data item in the Merkle tree requires checking the correctness of a path of hashes from the data item to the root of the tree. The proof required for verification contains the labels on the sibling nodes of all nodes along a path from the data item leaf to the root of the tree. In total this is a 32 log N byte proof.
Updates: When a new item is appended to the end of the list, the new Merkle root can be updated by recomputing only a logarithmic number of hashes. The Merkle root can be updated in this way while streaming the new items of the list and only storing log N existing labels. These are the labels on the nodes that will be required for the inclusion proof of the new item. Similarly, changing the value of an existing data item at a leaf of the tree requires updating only the labels up the path from leaf to root and can be recomputed given the log N inclusion proof labels for the previous value of the data item.
Deletes: Deletion is a special case of an update. An item is deleted from the tree by updating the value of the data item at the existing position to NULL. The labels on NULL nodes no longer need to be stored. The same applies to labels of nodes that are a root of a subtree of only NULL nodes.
Rebalancing: Deletion of many data items will eventually result in many NULL nodes in the tree and thus an imbalanced tree whose depth is logarithmic in the total number of items ever inserted (i.e. including NULL nodes) rather than the current number of items. An optimization is therefore to use a self-balancing tree data-structure such as a 2-3 tree rather than a plain binary tree. Self-balancing trees retain logarithmic depth in the total number of current (non-NULL) items. A Merkle tree can be constructed over any tree in the same way as it is constructed over a standard binary tree. Deleting an item in a 2-3 tree requires reorganization of only a logarithmic number of sibling relationships in the tree and therefore requires updating only a logarithmic number of hash labels.
In the above diagram, “Tx0” represents a transaction. Its corresponding hash is “Hash0”. After “Tx0” and “Tx1” have been made, “Hash0” and “Hash1” are combined to produce “Hash01”. This process is repeated until the last hash, or the root hash, is produced.