# Bloom filter

A Bloom filter uses *k* hash functions *H**1**,…,H**k*. The Bloom filter is stored as a bit vector of length *M* bits. All bits are initialized to 0. When an item is inserted into the Bloom filter, the item is hashed with each of the *k* hash functions to derive *k* hash values, and each hash value points to an index in {1,…,*M*} of the Bloom filter bit vector that is then set to 1. The Bloom filter indicates that an item is included if the value at each index determined by the *k* hash functions is set to 1. Otherwise, the item is definitely not included. Checking if an item is in the Bloom filter just requires recomputing each of these *k* hash values and then checking the bit at the index.

False positives are possible but occur with small probability. To maintain a false positive probability of *p* for *N* items set *k = -log p* and *M = *1.44 *k N*. For example, with *N* inserted items and *M* = 14*N* the false positive probability is approximately 1/1000.

Bloom filters are a secure commitment to the data items that are *not* inserted into the filter because they do not have any false negatives. They can be used to practically commit to a subset *S** of items from a relatively small superset *S* (e.g. the UTXOs as a subset of all the TXOs) by inserting all items from *S* that are *not* in *S** (e.g. the subset of spent TXOs). The size of this commitment is proportional to the set *S* and is therefore only practical when the superset of possible items is not too large.

Bloom filters are not a secure commitment to the data items that inserted into the filter unless the false positive rate *p* is negligibly small, e.g. *log p = –*128. In this case, the filter contains approximately 185 bits per inserted item, which is 28% smaller than storing the 32 byte SHA-256 hash of every element in the set.

Above is a diagram for a basic Bloom filter with *k* = 3, where A, B, and C are inserted with hash functions h1, h2, and h3.