This demonstrates an implementation of full-text search for documents in Indexed DB.
- Word-breaking and stemming are used to create a list of terms for each document.
- Document records are annotated with the list of terms when added to the database.
- A multi-entry index on the list of terms is populated.
- A query is similarly processed into a list of terms.
- A join over the terms is implemented using multiple cursors on the index.
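The join in the last step can be sketched without a database. In the sketch below, each array stands in for the primary keys that a cursor would yield when opened on the index for a single term; entries that share one index key come back in ascending primary-key order. The name `intersectSorted` and the plain numeric keys are assumptions for illustration, and the join is assumed to be an intersection (documents matching every query term). The demo's own code may differ.

```javascript
// Intersect several ascending key lists, advancing whichever "cursor"
// is behind, the same way multiple index cursors would be stepped.
// Hypothetical helper: numeric keys only, for illustration.
function intersectSorted(lists) {
  if (lists.length === 0) return [];
  const pos = lists.map(() => 0); // one position per simulated cursor
  const matches = [];
  for (;;) {
    // Find the largest key currently under any cursor.
    let max = -Infinity;
    for (let i = 0; i < lists.length; i++) {
      if (pos[i] >= lists[i].length) return matches; // a cursor is exhausted
      max = Math.max(max, lists[i][pos[i]]);
    }
    // Advance every cursor that is behind that key.
    let allEqual = true;
    for (let i = 0; i < lists.length; i++) {
      while (pos[i] < lists[i].length && lists[i][pos[i]] < max) pos[i]++;
      if (pos[i] >= lists[i].length) return matches;
      if (lists[i][pos[i]] !== max) allEqual = false;
    }
    // All cursors agree: the key matches every term.
    if (allEqual) {
      matches.push(max);
      for (let i = 0; i < lists.length; i++) pos[i]++;
    }
  }
}

// Keys for three terms; only documents 2 and 7 contain all of them.
const keys = intersectSorted([[1, 2, 5, 7], [2, 3, 7], [2, 7, 9]]);
console.log(keys); // → [ 2, 7 ]
```

With real cursors the comparison would use `indexedDB.cmp()` rather than `<`, since Indexed DB keys are not limited to numbers.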
The necessity of annotating records with the word list to populate the index is a limitation of the current Indexed DB API. A feature request to support custom indexing is tracked at w3c/IndexedDB#33.
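A minimal sketch of that annotation step follows. The store name `documents`, the key path `id`, the index and property name `terms`, and the helper names are all assumptions for illustration; the `tokenize` argument stands in for something like `FullText.tokenize`.

```javascript
// Call from an "upgradeneeded" handler, where db is the IDBDatabase.
// Store, key path, and index names are hypothetical.
function createSchema(db) {
  const store = db.createObjectStore('documents', {
    keyPath: 'id',
    autoIncrement: true,
  });
  // multiEntry: true adds one index entry per element of record.terms.
  store.createIndex('terms', 'terms', { multiEntry: true });
  return store;
}

// Copy the document and attach its term list before storing it, since
// the index can only be populated from a property of the record.
function addDocument(store, doc, tokenize, locale) {
  const record = Object.assign({}, doc, { terms: tokenize(doc.text, locale) });
  return store.add(record); // returns an IDBRequest
}
```

The stored record carries a redundant `terms` array alongside the original text, which is the cost the custom-indexing feature request aims to remove.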
This is just a demonstration and not production-quality code. The segmenter code is a polyfill tracking an ECMAScript proposal. It may be out of sync with this demo and therefore broken. The stemmer code is unoptimized and definitely too slow for serious use. Sorry about that.
- Polyfill of the Intl.Segmenter proposal by @littledan
- https://github.com/littledan/Segmenter/blob/master/README.md - proposal
- https://gist.github.com/inexorabletash/8c4d869a584bcaa18514729332300356 - polyfill
This uses Intl.v8BreakIterator in Chrome (which in turn uses ICU), and falls back to a terrible English-only implementation elsewhere.
Drop this in as segment.js
- Porter Stemming Algorithm by Martin Porter
- https://tartarus.org/martin/PorterStemmer/ - algorithm
- https://tartarus.org/martin/PorterStemmer/js.txt - JS implementation
Note that the author no longer recommends this stemmer for practical work, but it is used here as it's something everyone has heard of.
Drop this in as porter-stemmer.js
FullText.tokenize(text, locale)
Tokenize a string into word stems, for creating a full-text index.
- text: string to tokenize
- locale: locale for tokenizing (e.g. 'en')
Returns an array of word stems.
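To illustrate the shape of this contract (text and locale in, array of stems out), here is a rough stand-in. It is not the demo's implementation: the name `tokenizeSketch`, the lower-casing, the optional `stem` parameter (identity by default), and the ASCII-only fallback are all assumptions. It uses a native `Intl.Segmenter` where the runtime provides one.

```javascript
// Hypothetical stand-in for a tokenizer: split into words, lower-case
// them, then pass each through a stemming function (identity by default).
function tokenizeSketch(text, locale, stem = (word) => word) {
  let words;
  if (typeof Intl !== 'undefined' && typeof Intl.Segmenter === 'function') {
    // Word segmentation; isWordLike filters out spaces and punctuation.
    const segmenter = new Intl.Segmenter(locale, { granularity: 'word' });
    words = Array.from(segmenter.segment(text))
      .filter((part) => part.isWordLike)
      .map((part) => part.segment);
  } else {
    // Crude ASCII-only fallback when no segmenter is available.
    words = text.match(/[A-Za-z0-9]+/g) || [];
  }
  return words.map((word) => stem(word.toLowerCase()));
}

console.log(tokenizeSketch('Alice met Bob.', 'en')); // → [ 'alice', 'met', 'bob' ]
```

A real stemmer, such as the Porter implementation listed above, would be passed in place of the identity function.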
FullText.search(index, query, locale, callback)
Perform a full-text search.
- index: an IDBIndex mapping word-stems to records
- query: text string, e.g. 'alice bob eve'
- locale: locale for tokenizing query (e.g. 'en')
- callback: called with array of primary keys
Must be called while the index's transaction is active. The callback is invoked while the transaction is still active, so more requests can be made within the transaction.
Throws if the query contains no words.
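One way to use this, sketched under assumptions: the store name `documents`, the index name `terms`, and the wrapper `searchDocuments` are hypothetical, and the `FullText` object is passed in as a parameter so the sketch does not depend on a global. Because the callback runs while the transaction is active, the matching records can be fetched in the same transaction.

```javascript
// Hypothetical wrapper: run a search, then load the matching records.
// done(error, records) is called once the transaction finishes.
function searchDocuments(db, fullText, query, locale, done) {
  const tx = db.transaction('documents', 'readonly');
  const store = tx.objectStore('documents');
  const records = [];

  // Called right after the transaction is created, so it is still active.
  // This call throws synchronously if the query contains no words.
  fullText.search(store.index('terms'), query, locale, (keys) => {
    // Still inside the active transaction: issue follow-up requests.
    keys.forEach((key) => {
      const request = store.get(key);
      request.onsuccess = () => records.push(request.result);
    });
  });

  tx.oncomplete = () => done(null, records);
  tx.onabort = () => done(tx.error);
}
```

In a page this might be called as `searchDocuments(db, FullText, 'alice bob eve', 'en', (err, docs) => { ... })`, with the call wrapped in `try`/`catch` if the query may be empty.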