@simbo1905
Last active July 10, 2023 02:50
How To Load The HIBP Pwned Passwords Database Into MongoDB

NIST recommends that when users are trying to set a password you should reject those that are commonly used or compromised:

When processing requests to establish and change memorized secrets, 
verifiers SHALL compare the prospective secrets against a list that 
contains values known to be commonly-used, expected, or compromised.

But how do you know which passwords are compromised? Luckily Troy Hunt put a lot of effort into building the "Have I Been Pwned (HIBP)" database with the SHA1 hashes of 501,636,842 passwords that have been compromised on the internet. Sweet.

This means that to prevent a user setting a compromised password like P@ssword you can look it up on a public HIBP service such as this one and reject it.

If you are running a security-sensitive service, it is probably a bad idea to send password hashes to a public lookup service. To get around that, the public Pwned Passwords API at https://haveibeenpwned.com/API/v2#PwnedPasswords has you send only the first 5 characters of the hash and responds with all the hashes that share that prefix. That might be slow, return a lot of data, or be offline. So you might want to load the HIBP database into a private store such as MongoDB and check the SHA1 hash against that authoritative store. You can then put a private, secure API in front of your own MongoDB and do an exact-match SHA1 check, which will be fast, and since it runs on your infrastructure you can ensure that it is highly available.
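As a rough sketch of the prefix lookup described above, the following Python splits a password's SHA1 into the 5-character prefix that gets sent and the 35-character suffix that is matched locally. The function names are made up for illustration, and the `SUFFIX:COUNT` line layout and the `https://api.pwnedpasswords.com/range/` endpoint mentioned in the comments are assumptions about the public API rather than something this gist specifies.

```python
import hashlib
from typing import Tuple


def split_sha1(password: str) -> Tuple[str, str]:
    """Return (prefix, suffix) of the uppercase hex SHA1 of the password.

    Only the 5-character prefix would be sent to the public service.
    """
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def count_in_range_response(suffix: str, body: str) -> int:
    """Look for our suffix in a range response body.

    Assumes one 'SUFFIX:COUNT' entry per line (an assumption about the
    response format). Returns 0 when the suffix is absent.
    """
    for line in body.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix:
            return int(count)
    return 0


# Hypothetical usage (needs network access, so it is left commented out):
#   import urllib.request
#   prefix, suffix = split_sha1("P@ssword")
#   url = "https://api.pwnedpasswords.com/range/" + prefix  # assumed endpoint
#   body = urllib.request.urlopen(url).read().decode("utf-8")
#   print(count_in_range_response(suffix, body))
```

The full hash never leaves your machine, but every check still costs a round trip to a third party, which is the motivation for hosting the data yourself.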

There is another gist on this site for loading into Redis. Redis needs to fit in memory so it would be expensive to run, but that gist suggests how to hold the most used passwords in Redis for fast checks before doing a slower check against all the hashes in MongoDB.

Prerequisites

These instructions assume that you use a Mac but should be just as straightforward on Linux.

  • Over 50Gi of disk (the uncompressed database is 33Gi, plus 8Gi for the compressed download)
  • Homebrew to install command line tools
    • brew install aria2 for the aria2c BitTorrent download client
    • brew install p7zip for the 7za tool to uncompress the .txt.7z file
  • A MongoDB database with sufficient disk space.

Steps

Note that it took an hour to download the 8Gi torrent on my broadband.

The mongoimport command assumes that your mongod server is listening locally on the default port. If not, you can pass command-line arguments to mongoimport below to connect to a remote server.

  1. aria2c https://downloads.pwnedpasswords.com/passwords/pwned-passwords-2.0.txt.7z.torrent
  2. 7za x -so pwned-passwords-2.0.txt.7z | sed 's/:/,/g' | mongoimport --fields "_id.binary(base64),c.int32()" --columnsHaveTypes --db hibp --collection pwndpsswds --type csv
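To make clearer what step 2 stores, here is a small Python sketch of the per-line transformation. It assumes each line of the extracted file has the form `HASH:COUNT` (which is what the `sed` substitution implies), and the helper name `line_to_document` is invented for illustration. Note that `binary(base64)` treats the 40-character uppercase hex hash as if it were base64 text, so each key becomes 30 bytes rather than the 20 raw SHA1 bytes.

```python
import base64


def line_to_document(line: str) -> dict:
    """Mimic, for a single line, what the import pipeline stores.

    'HASH:COUNT' becomes {"_id": <bytes>, "c": <int>}. The hex hash is
    decoded as base64 to mirror the binary(base64) field type used in
    the mongoimport command; this is a sketch, not mongoimport itself.
    """
    hex_hash, _, count = line.strip().partition(":")
    return {"_id": base64.b64decode(hex_hash), "c": int(count)}


doc = line_to_document("5BAA61E4C9B93F3F0682250B6CF8331B7EE68FD8:3303003")
print(len(doc["_id"]), doc["c"])
```

Because hex digits are all valid base64 characters and 40 is a multiple of 4, the decode succeeds without padding, and re-encoding the stored bytes as base64 gives back the original hex string.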

If you log in and query the collection it looks something like:

> db.pwndpsswds.find()
{ "_id" : BinData(0,"5BAA61E4C9B93F3F0682250B6CF8331B7EE68FD8"), "c" : 3303003 }
{ "_id" : BinData(0,"3D4F2BF07DC1BE38B20CD6E46949A1071F9D0E3D"), "c" : 2900049 }
{ "_id" : BinData(0,"7C222FB2927D828AF22F592134E8932480637C0D"), "c" : 2680521 }
{ "_id" : BinData(0,"6367C48DD193D56EA7B0BAAD25B19455E529F5EE"), "c" : 2670319 }
{ "_id" : BinData(0,"E38AD214943DAAD1D64C102FAEC29DE4AFE9DA3D"), "c" : 2310111 }

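Here the primary key _id is stored as binary data, which takes less space than storing a string. The import reads the 40-character uppercase hex SHA1 as if it were base64, which is why the shell prints each key back as the original hex string. That means that to query by the primary key you need to do a little bit of work to convert the SHA1 string into a BinData value in the same way, keeping the hex in uppercase because base64 is case-sensitive. You should test your query solution against known passwords such as P@ssword so that you don't get false negatives.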
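A minimal Python sketch of that conversion might look like the following. The function names are hypothetical, and `collection` is assumed to be any object with a pymongo-style `find_one` method; with pymongo on Python 3, a `bytes` value is sent as BSON binary subtype 0, which should line up with the `BinData(0, ...)` keys shown above.

```python
import base64
import hashlib


def pwned_key(password: str) -> bytes:
    """Build the _id bytes for a password the same way the import did.

    The SHA1 must be uppercase hex before it is decoded as base64,
    otherwise the bytes will not match and the lookup will always miss.
    """
    hex_hash = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return base64.b64decode(hex_hash)


def pwned_count(collection, password: str) -> int:
    """Return the breach count for the password, or 0 if it is not found."""
    doc = collection.find_one({"_id": pwned_key(password)})
    return doc["c"] if doc else 0


# Hypothetical usage against a local mongod (requires pymongo):
#   from pymongo import MongoClient
#   coll = MongoClient()["hibp"]["pwndpsswds"]
#   print(pwned_count(coll, "P@ssword"))
```

A non-zero result means the password appears in the loaded data and should be rejected.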
