Skip to content

Instantly share code, notes, and snippets.

@leoantony72
Created June 23, 2022 05:38
Show Gist options
  • Save leoantony72/46eeb2d7ca2ddc4ccb0d87181e31a34d to your computer and use it in GitHub Desktop.
Save leoantony72/46eeb2d7ca2ddc4ccb0d87181e31a34d to your computer and use it in GitHub Desktop.
Introduction to Bloom filter with real life use cases🐱‍💻

Suppose you are creating an account on Twitch or any other website, you will immediately think of a cool username like noobmaster69🙃 and when you submit the form you immediately get User Name Already Taken😡 you will try out other cool names or even your own name but you get the same Err Message, Quite Frustrating right😂.

But have ever thought about how this works behind the scenes?🤔

Social media platforms like twitch/youtube have millions of users but still, these companies quickly check the availability of usernames by searching millions of usernames. hearing searching makes you think about linear & binary search Are they using these that we learned in high school? Nop Something far better

Suppose I have an SQL database with 10 million users, with a Unique key on the usernames column. To check if the username is taken or not I will Write the following Query

SELECT username FROM users WHERE username = $1;

It will go through 10million rows and check if the username exists or not, it's Slow right😅 . That's where Bloom Filter comes in.

Bloom Filter

A Bloom filter is a space-efficient probabilistic data structure that is used to test whether an element is present on a set or not. It's space-efficient and fast but it has a drawback it is probabilistic in nature and there might be false-positive(depends on the number of elements, hash function, and size ) results that are ok for storing usernames.

  • False Positive increase as we add elements
  • You can't delete elements from the bloom filter
  • No false-negative results

How does Bloom Filter work ?🤔

bloomfilter!

Bloom Filter is a bit array of m bits, all set to zero

How do we add elements to Bloom Filter?🤔

We pass the element to k no. of the hash function which returns a value and we divide it by the modules of the size of the bit array.

Eg: Suppose we need to add the username noobmaster69

h1("noobmaster69") % 10 = 2
h2("noobmaster69") % 10 = 9
h3("noobmaster69") % 10 = 7

h1 , h2 & h3 - Are 3 different hash function, value 10 is the size of the bit array

For now, we pass the username into 3 different hash functions which return 3 different values. Now we set bits at indices 2,9 & 7 to 1 bloomfilter!

How do we check if the element exists in the bloom filter?🤔

We hash the value with 3 different hash function

h1("noobmaster69") % 10 = 2
h2("noobmaster69") % 10 = 9
h3("noobmaster69") % 10 = 7

We get indices at 2,9,7 and we check if all the indices are set to 1 in the bit array, If not element is certainly not added to the bloom filter. If all the bits are set then we can say "noobmaster69" is probably present in the bloom filter.

Now we add another username "Doctor Strange"🧙‍♂️ to our list

h1("Doctor Strange") % 10 = 1
h2("Doctor Strange") % 10 = 6
h3("Doctor Strange") % 10 = 10

Set bits at indices 1,5 & 10 to 1

bloomfilter!

Now we need to check if the username "wizard" is added or not in the bit array

h1("wizard") % 10 = 7
h2("wizard") % 10 = 10
h3("wizard") % 10 = 2

bloomfilter!

Now if we check the indices at 7,10,2 it's all set to 1🤷‍♂️, why is that happening we never added "wizard" to the bit array this is the drawback of bloom filter we get false-positive results. so how do we prevent false-positive results we can't but we can control the probability of getting a false positive by controlling the size of the bit array and the number of the hash function.

  • Increase In the size of the bit array results in fewer false positives.
  • Increase in the number of the hash functions results in a decrease in the probability of the false-positive results.

Check this website - Online Bloom Filter Calculator.

Is Bloom Filter Space Efficient?🤔

Data structures like hashmaps, arrays, or linked lists store the element itself which is not efficient. in hashmap, We have to store the string itself with key-value pair, what if we have 10million usernames it will take quite some space.

In the bloom filter, we don't store the data itself. we pass the string to a hash function and flip 0 to 1 in the bit array this would reduce the space of the bloom filter. we don't use a bloom filter to test if an element is present, you use it to test whether it's certainly not present, since it guarantees no false negatives. where is the Bloom filter used?🤔

  • Username is taken or not
  • Weak password detection (github.com/leoantony72/weak-password-detector)
  • Blocking Ip
  • Chrome uses a bloom filter to identify malicious URLs

Redis Bloom Filter

Redis is an open-source in-memory database used as a cache mainly but recently Redis is being used as a primary database because it's blazingly fast✈.

Redis has a bloom filter that does most of the work we only have to insert data into Redis. So let's check it out

We will be using Docker to set up Redis🚀

docker run -p 6379:6379 --name redis-redisbloom redislabs/rebloom:latest

Our Redis instance will be running in port 6379, lets access it by Redis-CLI

docker exec -it redis-redisbloom bash -c redis-cli

We will use BF.RESERVE to make a bloom filter, it will take three arguments

BF.RESERVE usernames 0.00001 10000
  • Key(usernames) - Name of the bloom filter
  • err rate(0.00001) - The desired probability for false positives.
  • capacity(10000) - The number of entries intended to be added to the filter.

Add elements to the bloom filter🐱‍💻

BF.ADD keyword for adding elements to the bloom filter which takes two arguments

BF.ADD usernames noobmaster69
  • key - usernames
  • element - "noobmaster69"

This will add the username "noobmaster69" to the bloom filter

Find out whether an item exists in the bloom filter🧙‍♂️

BF.EXISTS to check if the element exists or not, this takes two argument

BF.EXISTS usernames noobmaster69
  • key - usernames
  • element - noobmaster69

If the element exists it will return 1 and if it doesn't it will return 0

Get info about bloom filter

BF. INFO to get info on bloom filter

BF.INFO usernames

It will return these informations bloomfilter!

Weak password detection🕵️‍♀️

I made a weak password detection system with Redis bloom filter, So how does this work ?🤔

I took 1000 most commonly used passwords from weak password github

Passed these passwords to the Redis Bloom filter and when users submit a form with a password like "password123" we will alert the user to use a strong password thus preventing weak passwords.

Check out my GitHub repo for the weak passwords detection: github.com/leoantony72/weak-password-detector

To check if the username is taken or not: github.com/leoantony72/redis-bloom-filter

That concludes this, have a great day😁

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment