Saturday, December 12, 2009

Bloom Filter

One problem is to determine if a given value x exists in a set S. When the size of the set S is huge, there can be computational cost and storage difficulty.

Bloom filter solves this problem by a way similar to this:
1) preprocess the set: for every value w in the set S, compute a feature vector (v_w1, v_w2, .. v_wn). In the entire vector space, set these as 1, set the rest as 0.
2) compute the feature vector for the given value x (v_x1, v_x2, ..., v_xn). Now if all of v_x1, v_x2, ..., v_xn are 1, then it's possible the value x is in the set S.

The issue is possible false-positive: x is not in the set, but its feature vector are all 1s. To solve the issue, the key is to choose good vector computation function(s). Usually the probability of false positive is 0.0001 or below.

The advantage, on the other hand, is that there is NO false-negative: if computation shows the value x does not exist, then this is 100% correct. Therefore bloom filter is very reliable and efficient in ruling out a value that does not exist.

See this page for a more detailed explanation for bloom filter, and a piece of comment.
More topics of the application of math from this author are here.

No comments:

Blog Archive

Followers