Skip to content

Instantly share code, notes, and snippets.

Show Gist options
  • Save jcaxmacher/3773140d69492b622fced754f1118d87 to your computer and use it in GitHub Desktop.
Save jcaxmacher/3773140d69492b622fced754f1118d87 to your computer and use it in GitHub Desktop.
Building a sqlite DB for the Pwned Passwords data

(There are better ways to do this, such as making a small Python program to load in the data as binary instead of hex—but this is what I did in a pinch and it worked well for what I needed!)

Last executed 2019-06-25 with the v4 dump:

  1. Make sure you have 60 GB free disk space and some extra to spare. Alternatively, take a walk on the wild side and delete source files as soon as you've used them.
  2. Download the SHA-1 (ordered by hash) torrent from https://haveibeenpwned.com/Passwords
  3. Unpack and strip off the counts:
    7z x -so pwned-passwords-sha1-ordered-by-hash-v4.7z pwned-passwords-sha1-ordered-by-hash-v4.txt | sed 's/:.*//' > hashes.lst
    
  4. Load into sqlite:
    $ sqlite3 pwned-passwords-by-hash-v4.sqlite
    sqlite> pragma journal_mode=memory;
    sqlite> CREATE TABLE hashes("hash" BLOB PRIMARY KEY) WITHOUT ROWID;
    sqlite> .mode csv
    sqlite> .import hashes.lst hashes
    
  5. Confirm that queries will use the primary key's index:
    sqlite> EXPLAIN QUERY PLAN SELECT * FROM hashes WHERE hash = "D657187D9C9C1AD04FDA5132338D405FDB112FA1" LIMIT 1;
    0|0|0|SEARCH TABLE hashes USING PRIMARY KEY (hash=?)
    
    A sample query should return almost instantly:
    sqlite> SELECT * FROM hashes WHERE hash = "D657187D9C9C1AD04FDA5132338D405FDB112FA1" LIMIT 1;
    D657187D9C9C1AD04FDA5132338D405FDB112FA1
    

A few notes on construction:

  • Journaling is disabled to speed up initial import.
  • Making the hash the primary key tells sqlite that the data is unique and ordered—and the primary key data is stored in a B-tree, not duplicated into an additional index.
  • We can also avoid storing rowids; no idea how much space this saves us, but no harm here: https://www.sqlite.org/withoutrowid.html
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment