You need 7zip installed to grab the NPI database (`brew install p7zip` on OS X).
To create the index, run the `init_*` scripts. You need the doctor graph referral data to use `*_refer.*`, but the NPI database is downloaded automatically. Indexing runs on all cores and takes less than 10 minutes on my 8-core machine.
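Parallel indexing could look roughly like the sketch below. It is not the actual `init_*` code: the token rule (lowercase alphanumeric runs), the in-memory dict of token → line byte offsets, and the helper names `chunk_bounds`, `index_chunk` and `build_index` are all assumptions for illustration. The file is split into byte ranges that start on line boundaries, each range is indexed in its own process, and the partial indexes are merged in file order.

```python
import os
import re
from collections import defaultdict
from multiprocessing import Pool

# Assumption: tokens are lowercase alphanumeric runs (works for quoted CSV fields).
TOKEN = re.compile(r"[a-z0-9]+")


def chunk_bounds(path, n_chunks):
    """Split the file into up to n_chunks byte ranges, each starting on a line boundary."""
    size = os.path.getsize(path)
    bounds = [0]
    with open(path, "rb") as f:
        for i in range(1, n_chunks):
            f.seek(size * i // n_chunks)
            f.readline()  # advance to the start of the next full line
            bounds.append(f.tell())
    bounds.append(size)
    # Drop empty ranges (possible when lines are long relative to the chunk size).
    return [(a, b) for a, b in zip(bounds, bounds[1:]) if a < b]


def index_chunk(task):
    """Map each token in one byte range to the offsets of the lines containing it."""
    path, start, end = task
    index = defaultdict(list)
    with open(path, "rb") as f:
        f.seek(start)
        offset = start
        while offset < end:
            line = f.readline()
            if not line:
                break
            text = line.decode("utf-8", "replace").lower()
            for token in set(TOKEN.findall(text)):  # one entry per line per token
                index[token].append(offset)
            offset += len(line)
    return dict(index)


def build_index(path, n_chunks=None, workers=None):
    """Index the file in parallel; by default one chunk and one worker per core."""
    n_chunks = n_chunks or os.cpu_count() or 1
    tasks = [(path, a, b) for a, b in chunk_bounds(path, n_chunks)]
    if workers == 1:
        partials = list(map(index_chunk, tasks))  # serial path, no subprocesses
    else:
        with Pool(workers) as pool:  # Pool(None) uses os.cpu_count() workers
            partials = pool.map(index_chunk, tasks)
    merged = defaultdict(list)
    for part in partials:  # pool.map keeps chunk order, so offsets stay sorted
        for token, offsets in part.items():
            merged[token].extend(offsets)
    return dict(merged)
```

On a real file you'd call something like `build_index("npidata.csv")` (hypothetical file name) from inside an `if __name__ == "__main__":` guard, which `multiprocessing` needs on platforms that spawn worker processes. Passing `workers=1` skips the pool, which is handy for testing the chunking and merge logic.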
To grab lines matching a search term, run `python search_npi.py term`.
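A lookup against a token → byte-offset index only needs to seek to each stored offset and read one line, so the data file itself is never scanned. A minimal sketch, assuming a hypothetical in-memory `index` dict that maps lowercase tokens to the byte offsets of matching lines; `search_npi.py` may store and query its index differently, and the `search` function here is illustrative only.

```python
def search(path, index, term):
    """Return the lines whose byte offsets are listed under term in the index."""
    offsets = index.get(term.lower(), [])
    lines = []
    with open(path, "rb") as f:
        for off in sorted(offsets):
            f.seek(off)  # jump straight to the start of the matching line
            lines.append(f.readline().decode("utf-8", "replace").rstrip("\r\n"))
    return lines
```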
Note: index lookups are fast if you have a lot of memory, since the index file blocks stay hot in the OS cache. However, the index is loaded every time the program runs, which is very inefficient. An on-disk hash table, where the offset of a key's entry can be calculated directly, would be a better design.
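A rough sketch of the on-disk hash table idea, assuming open addressing with linear probing and fixed-size slots, so a key's slot is found at `header + (hash % n_slots) * slot_size` with a single `seek` instead of loading the whole index. The file layout and the names `write_table`, `lookup` and `key_hash` are hypothetical. Each slot stores a 64-bit key hash plus the location and count of that key's postings (line offsets), which live after the slot array.

```python
import hashlib
import struct

HEADER = struct.Struct("<Q")   # number of slots
SLOT = struct.Struct("<QQI")   # key hash, absolute postings offset, postings count
POSTING = struct.Struct("<Q")  # one line offset in the data file


def key_hash(key):
    """Stable 64-bit hash; 0 is reserved to mark empty slots."""
    h = int.from_bytes(hashlib.blake2b(key.encode(), digest_size=8).digest(), "little")
    return h or 1


def write_table(path, index, load=0.5):
    """Write {token: [offsets]} as a fixed-slot hash table followed by postings."""
    n = max(1, int(len(index) / load) + 1)  # always more slots than keys
    slots = [None] * n
    postings = bytearray()
    for key, offsets in index.items():
        h = key_hash(key)
        i = h % n
        while slots[i] is not None:  # linear probing on collision
            i = (i + 1) % n
        slots[i] = (h, len(postings), len(offsets))
        for off in offsets:
            postings += POSTING.pack(off)
    base = HEADER.size + n * SLOT.size  # postings region starts after the slots
    with open(path, "wb") as f:
        f.write(HEADER.pack(n))
        for s in slots:
            if s is None:
                f.write(SLOT.pack(0, 0, 0))
            else:
                h, rel, cnt = s
                f.write(SLOT.pack(h, base + rel, cnt))
        f.write(postings)


def lookup(path, key):
    """Read only the header, the probed slots and one postings run."""
    h = key_hash(key)
    with open(path, "rb") as f:
        (n,) = HEADER.unpack(f.read(HEADER.size))
        i = h % n
        for _ in range(n):
            f.seek(HEADER.size + i * SLOT.size)  # slot position is computed
            sh, off, cnt = SLOT.unpack(f.read(SLOT.size))
            if sh == 0:
                return []  # empty slot: key is absent
            if sh == h:
                f.seek(off)
                data = f.read(cnt * POSTING.size)
                return [v for (v,) in POSTING.iter_unpack(data)]
            i = (i + 1) % n
    return []
```

`hashlib.blake2b` is used instead of the built-in `hash()` because `str` hashing is randomized per process, and the table has to be readable across runs. Storing only the hash means a 64-bit collision between two keys would return the wrong postings; storing the key with its postings and comparing it on lookup would rule that out.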