Data structures:
Hash table mapping tokens -> <document-count, count-min-sketch(docuemnt id -> term count)>
Hash table mapping sketch indexes -> heap(<document id, term count dictionary> sorted by document id)
To search:
- sum sketches for all terms in the query
- find indexes of top k values in result sketch
- look up actual document ids and term counts for those indexes