@thisismattmiller
Created December 19, 2017 15:12
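import json

from annoy import AnnoyIndex

# Input files: newline-delimited JSON, one object per line with an id under 'i' and a vector under 'v'
files = ['vectors_0.ndjson', 'vectors_1.ndjson', 'vectors_2.ndjson', 'vectors_3.ndjson']

# 100 = length of each item vector; 'angular' is Annoy's default metric, stated explicitly here
t = AnnoyIndex(100, metric='angular')
lookup = {}  # not used by this script
counter = 0

for f in files:
    print(f)
    with open(f, 'r') as file:
        for line in file:
            counter += 1
            data = json.loads(line)
            t.add_item(data['i'], data['v'])
            if counter % 50000 == 0:
                # Overwrite the same console line with the running count
                print(counter, end='\r')

t.build(50)  # 50 trees
t.save('index_50tree.ann')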
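
The script assumes that every line of the `.ndjson` files is a JSON object with an item id under `i` and a vector under `v`, and that each vector has the length of 100 passed to `AnnoyIndex`. Annoy expects item ids to be non-negative integers. The following is a minimal sketch of a pre-flight check for those assumptions, using only the standard library. The helper name `iter_records` and the constant `VECTOR_LENGTH` are hypothetical and not part of the original gist.

```python
import json

VECTOR_LENGTH = 100  # assumed to match the length passed to AnnoyIndex


def iter_records(path, dim=VECTOR_LENGTH):
    """Yield (item_id, vector) pairs from an NDJSON file, validating each record."""
    with open(path, 'r') as fh:
        for line_no, line in enumerate(fh, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            data = json.loads(line)
            item_id, vector = data['i'], data['v']
            # bool is a subclass of int in Python, so exclude it explicitly
            if isinstance(item_id, bool) or not isinstance(item_id, int) or item_id < 0:
                raise ValueError(f'{path}:{line_no}: id must be a non-negative integer')
            if len(vector) != dim:
                raise ValueError(f'{path}:{line_no}: expected {dim} values, got {len(vector)}')
            yield item_id, vector
```

Each yielded pair could then be passed to `t.add_item(item_id, vector)` in place of the direct `json.loads` call, so that a malformed line fails with its file name and line number rather than partway through indexing.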