View set_test.rb
require 'minitest/autorun'
require 'set'

# Wraps a value and compares equal to anything equal to that value.
class Foo
  def initialize(s)
    @s = s
  end

  def ==(other)
    @s == other
  end
end
View idx.json
View output
[[:string_literal, [:string_content, [:@tstring_content, "abc", [1, 1]]]]]]
'a' 'b' 'c'
[:string_literal, [:string_content, [:@tstring_content, "a", [1, 1]]]],
[:string_literal, [:string_content, [:@tstring_content, "b", [1, 5]]]]],

Many of your fields, e.g. sector or market, look like tags or facets, so full text search might not be the best way to search on them.

Instead, you could run full text search on the tags themselves, e.g. over all the possible values for sector or market, and then use the results of that tag lookup to find the companies that have those tags. Faceted search, i.e. lookup by tags (or groups of tags), isn't really lunr's forte, but you could certainly use it for full text search of the tags.

var idx = lunr(function () {
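The two-step lookup described above (search the tag values first, then resolve the matching tags to companies) can be sketched roughly as follows. This is only a sketch under assumptions: the `companies` data, `searchTags` and `findCompanies` are hypothetical names, and `searchTags` is a naive prefix matcher standing in for a lunr index built over the tag values, so that the example runs without any dependency.

```javascript
// Hypothetical data: company records with a facet-like `sector` field.
const companies = [
  { id: 1, name: 'Acme Solar', sector: 'renewable energy' },
  { id: 2, name: 'Bolt Freight', sector: 'logistics' },
  { id: 3, name: 'Sunrise Grid', sector: 'renewable energy' }
]

// Reverse map from each tag value to the ids of the companies carrying it.
const companiesBySector = new Map()
for (const company of companies) {
  const ids = companiesBySector.get(company.sector) || []
  ids.push(company.id)
  companiesBySector.set(company.sector, ids)
}

// Step 1: text search over the tag values only.
// Stand-in for a lunr index over the tags (an assumption, not lunr's API):
// a tag matches when every query word is a prefix of one of the tag's words.
function searchTags(query) {
  const words = query.toLowerCase().split(/\s+/).filter(Boolean)
  return [...companiesBySector.keys()].filter(tag => {
    const tagWords = tag.toLowerCase().split(/\s+/)
    return words.every(w => tagWords.some(t => t.startsWith(w)))
  })
}

// Step 2: resolve the matching tags to the companies that have them.
function findCompanies(query) {
  const ids = new Set()
  for (const tag of searchTags(query)) {
    for (const id of companiesBySector.get(tag)) ids.add(id)
  }
  return companies.filter(c => ids.has(c.id))
}

const matches = findCompanies('renew').map(c => c.name)
console.log(matches)
```

With a real lunr index over the tags, only `searchTags` would need to change; the reverse map and the second step would stay the same.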
View lunr.contraction_filter.js
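// Strip common English contraction suffixes from the end of a token
lunr.contractionTrimmer = function (token) {
  return token.replace(/('ve|n't|'d|'ll|'s|'re)$/, "")
}

// Register the trimmer itself (not the stop word filter) so the index can be serialised
lunr.Pipeline.registerFunction(lunr.contractionTrimmer, 'contractionTrimmer')

var englishContractions = function (idx) {
  idx.pipeline.after(lunr.trimmer, lunr.contractionTrimmer)
}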
View icon_search.js
// Assuming a 'document' structure like the one below to represent a single icon
{
  "id": "some-unique-id",
  "tags": ["tag1", "tag2", "tag3"],
  "title": "My Awesome Icon"
}
// You can then set up an index with the following. I put a boost on the tags, but it depends on your data
View pbcopy.plist
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
View lunr.js
/**
 * lunr - http://lunrjs.com - A bit like Solr, but much smaller and not as bright - 0.5.10-issue162
 * Copyright (C) 2015 Oliver Nightingale
 * MIT Licensed
 * @license
 */

I think response and collection are the two classes where the bulk of the stuff is happening; the importer is just an example of the external API. Actually, looking back at this, I think it could definitely be improved, but I guess it's an example of how to do it.

I recently did something similar. This time it was streaming a file from S3, but yielding the file line by line. The AWS SDK just gives you the S3 object chunk by chunk as it's read from the socket, so I had to buffer and then yield as many lines as possible from each chunk. I returned an enumerator to do this rather than mixing in Enumerable. I can't really share this code though, which is a shame because I thought it was a nice implementation at the time.
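The buffering idea can be sketched like this. It is not the original code (that was Ruby, returned an Enumerator and can't be shared); it is only a JavaScript analogue using a generator, where `eachLine` is a hypothetical name and a plain array of strings stands in for the chunks the SDK would hand over.

```javascript
// Hypothetical helper: turns an iterable of string chunks into an iterator of lines.
function* eachLine(chunks) {
  let buffer = ''
  for (const chunk of chunks) {
    buffer += chunk
    const lines = buffer.split('\n')
    // The last element is an incomplete line (or '' if the chunk ended on a newline),
    // so keep it in the buffer until more data arrives.
    buffer = lines.pop()
    // Yield as many complete lines as this chunk made available.
    yield* lines
  }
  // Flush whatever is left when the stream ends without a trailing newline.
  if (buffer.length > 0) yield buffer
}

// Chunk boundaries deliberately fall in the middle of lines.
const chunks = ['first li', 'ne\nsecond line\nthi', 'rd line\n']
const lines = Array.from(eachLine(chunks))
console.log(lines)
```

Returning an iterator rather than collecting everything up front means the caller decides how much of the file to consume, which is the same benefit the enumerator gave in the Ruby version.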

View gist:9942353
0 info it worked if it ends with ok
1 verbose cli [ '/usr/local/Cellar/node/0.8.20/bin/node',
1 verbose cli '/usr/local/bin/npm',
1 verbose cli 'install',
1 verbose cli 'lunr' ]
2 info using npm@1.2.11
3 info using node@v0.8.20
4 verbose read json /Users/olivernightingale/package.json
5 verbose read json /Users/olivernightingale/node_modules/lunr/package.json
6 verbose read json /Users/olivernightingale/package.json