Categories

Bloom filter

You are here:
< All Topics

A Bloom filter uses k hash functions H1,…,Hk. The Bloom filter is stored as a bit vector of length M bits. All bits are initialized to 0. When an item is inserted into the Bloom filter, the item is hashed with each of the k hash functions to derive k hash values, and each hash value points to an index in {1,…,M} of the Bloom filter bit vector that is then set to 1. The Bloom filter indicates that an item is included if the value at each index determined by the k hash functions is set to 1. Otherwise, the item is definitely not included. Checking if an item is in the Bloom filter just requires recomputing each of these k hash values and then checking the bit at the index.

 

False positives are possible but occur with small probability. To maintain a false positive probability of p for N items set k = -log p and M = 1.44 k N. For example, with N inserted items and M = 14N the false positive probability is approximately 1/1000.

 

Bloom filters are a secure commitment to the data items that are not inserted into the filter because they do not have any false negatives. They can be used to practically commit to a subset S* of items from a relatively small superset S (e.g. the UTXOs as a subset of all the TXOs) by inserting all items from S that are not in S* (e.g. the subset of spent TXOs). The size of this commitment is proportional to the set S and is therefore only practical when the superset of possible items is not too large.

 

Bloom filters are not a secure commitment to the data items that inserted into the filter unless the false positive rate p is negligibly small, e.g. log p = –128. In this case, the filter contains approximately 185 bits per inserted item, which is 28% smaller than storing the 32 byte SHA-256 hash of every element in the set.

 

 

Above is a diagram for a basic Bloom filter with k = 3, where A, B, and C are inserted with hash functions h1, h2, and  h3.

Table of Contents
Scroll to Top