@poltak
Last active October 9, 2017 13:58
search-index script that simulates searches while indexing
{
"id": "page/Z2l0aHViLmNvbS9ubHAtY29tcHJvbWlzZS9jb21wcm9taXNl",
"content": "Skip content This repository Pull requests Issues Marketplace Explore Import gist organization issue Signed poltak Your profile stars Gists Help Settings Sign Watch Notifications watching notified when",
"url": "github.com/nlp-compromise/compromise",
"visits": [
"1504836098717",
"1505134022188"
],
"latest": "1505134022188",
"bookmarks": []
}
{
"id": "page/Z2l0aHViLmNvbS9Xb3JsZEJyYWluL1dlYk1lbWV4L3B1bGwvbmV3L21hc3Rlcg%3D%3D",
"content": "Skip content This repository Pull requests Issues Marketplace Explore Import gist organization issue Signed poltak Your profile stars Gists Help Settings Sign Unwatch Notifications watching notified w",
"url": "github.com/WorldBrain/WebMemex/pull/new/master",
"visits": [
"1498448586231",
"1499218625365",
"1499219629872",
"1505101643843",
"1505182505378"
],
"latest": "1505182505378",
"bookmarks": []
}
{
"id": "page/Z2l0aHViLmNvbS9qb3NzbWFjL3JlYWN0LWltYWdlcw%3D%3D",
"content": "Skip content This repository Pull requests Issues Marketplace Explore Import gist organization issue Signed poltak Your profile stars Gists Help Settings Sign Watch Notifications watching notified whe",
"url": "github.com/jossmac/react-images",
"visits": [
"1505127870156",
"1505127883028",
"1505127898544",
"1505128033557"
],
"latest": "1505128033557",
"bookmarks": []
}
{
"id": "page/Z2l0aHViLmNvbS9qb3NzbWFjL3JlYWN0L4ltYWdlcw%3D%3D",
"content": "Skip content This repository Pull requests Issues Marketplace Explore Import gist organization issue Signed poltak Your profile stars Gists Help Settings Sign Watch Notifications watching notified whe",
"url": "github.com/jossmac/react-images",
"visits": [
"1505127870156",
"1505127883028",
"1505127898544",
"1505128033557"
],
"latest": "1505128033557",
"bookmarks": []
}{
"id": "page/Z2l0aHViLmNvbS9qb3NzbWFjL3JlYWN0LWlbYWdlcw%3D%3D",
"content": "Skip content This repository Pull requests Issues Marketplace Explore Import gist organization issue Signed poltak Your profile stars Gists Help Settings Sign Watch Notifications watching notified whe",
"url": "github.com/jossmac/react-images",
"visits": [
"1505127870156",
"1505127883028",
"1505127898544",
"1505128033557"
],
"latest": "1505128033557",
"bookmarks": []
}{
"id": "page/Z2l0aHViLmNvbS9qb3NzbWFjL3JlYWN0vWltYWdlcw%3D%3D",
"content": "Skip content This repository Pull requests Issues Marketplace Explore Import gist organization issue Signed poltak Your profile stars Gists Help Settings Sign Watch Notifications watching notified whe",
"url": "github.com/jossmac/react-images",
"visits": [
"1505127870156",
"1505127883028",
"1505127898544",
"1505128033557"
],
"latest": "1505128033557",
"bookmarks": []
}{
"id": "page/Z2l0aHViLmNvbS9qb3NzbWFjx3JlYWN0LWltYWdlcw%3D%3D",
"content": "Skip content This repository Pull requests Issues Marketplace Explore Import gist organization issue Signed poltak Your profile stars Gists Help Settings Sign Watch Notifications watching notified whe",
"url": "github.com/jossmac/react-images",
"visits": [
"1505127870156",
"1505127883028",
"1505127898544",
"1505128033557"
],
"latest": "1505128033557",
"bookmarks": []
}{
"id": "page/Z2l0aHViLmNvbS9qb3zzbWFjL3JlYWN0LWltYWdlcw%3D%3D",
"content": "Skip content This repository Pull requests Issues Marketplace Explore Import gist organization issue Signed poltak Your profile stars Gists Help Settings Sign Watch Notifications watching notified whe",
"url": "github.com/jossmac/react-images",
"visits": [
"1505127870156",
"1505127883028",
"1505127898544",
"1505128033557"
],
"latest": "1505128033557",
"bookmarks": []
}
{
"id": "page/Z2l0aHViLmNsbS9qb3NzbWFjL3JlYWN0LWltYWdlcy9pc3N1ZXMvMTk%3D",
"content": "Skip content This repository Pull requests Issues Marketplace Explore Import gist organization issue Signed poltak Your profile stars Gists Help Settings Sign Watch Notifications watching notified whe",
"url": "github.com/jossmac/react-images/issues/19",
"visits": [
"1505127894708",
"1505127895522"
],
"latest": "1505127895522",
"bookmarks": []
}
{
"id": "page/Z2l0aHViLmNvbS9qb3NzbWFjL3JlYWN0LWltYWdlcy9pc3N1ZXM%3D",
"content": "Skip content This repository Pull requests Issues Marketplace Explore Import gist organization issue Signed poltak Your profile stars Gists Help Settings Sign Watch Notifications watching notified whe",
"url": "github.com/jossmac/react-images/issues",
"visits": [
"1505127883053",
"1505127884131",
"1505127897608"
],
"latest": "1505127892342",
"bookmarks": []
}
{
"id": "page/Z2l0aHViLmNvbS9qb3NzbWFjL3JlYWN0LWltYWdlcy9pc3N1ZXM%2FcT1pczppc3N1ZSB5b3V0dWJlICZ1dGY4PeKckw%3D%3D",
"content": "Skip content This repository Pull requests Issues Marketplace Explore Import gist organization issue Signed poltak Your profile stars Gists Help Settings Sign Watch Notifications watching notified whe",
"url": "github.com/jossmac/react-images/issues?utf8=%E2%9C%93&q=is%3Aissue%20youtube%20",
"visits": [
"1505127891824",
"1505127892342"
],
"latest": "1505127892342",
"bookmarks": []
}
{
"id": "page/Z2l0aHViLmNvbS9qb3NzbWFjL3JlYWN0LWltYWdlcy9pc3N1ZXM%2FcT1pczppc3N1ZSB2aWRlbyZ1dGY4PeKckw%3D%3D",
"content": "Skip content This repository Pull requests Issues Marketplace Explore Import gist organization issue Signed poltak Your profile stars Gists Help Settings Sign Watch Notifications watching notified whe",
"url": "github.com/jossmac/react-images/issues?utf8=%E2%9C%93&q=is%3Aissue%20video",
"visits": [
"1505127887114",
"1505127887641",
"1505127893642",
"1505127897012"
],
"latest": "1505127897012",
"bookmarks": []
}
{
"id": "page/Z2l0aHViLmNvbS9yZWFjdGpzL3JlYWN0LXRyYW5zaXRpb24tZ3JvdXA%3D",
"content": "Skip content This repository Pull requests Issues Marketplace Explore Import gist organization issue Signed poltak Your profile stars Gists Help Settings Sign Watch Notifications watching notified whe",
"url": "github.com/reactjs/react-transition-group",
"visits": [
"1504158672857",
"1505130270443",
"1505133013351"
],
"latest": "1505133013351",
"bookmarks": []
}
{
"id": "page/Z2l0aHViLmNvbS9qb3NzbWFjL3JlYWN0LWltYWdlcy9cc3N1ZXM%2FcT1pczppc3N1ZSB2aWRlbyZ1dGY4PeKckw%3D%3D",
"content": "Skip content This repository Pull requests Issues Marketplace Explore Import gist organization issue Signed poltak Your profile stars Gists Help Settings Sign Watch Notifications watching notified whe",
"url": "github.com/jossmac/react-images/issues?utf8=%E2%9C%93&q=is%3Aissue%20video",
"visits": [
"1505127887114",
"1505127887641",
"1505127893642",
"1505127897011"
],
"latest": "1505127897011",
"bookmarks": []
}
// index.js
const fs = require('fs')
// Constants
const searchIntervalMS = 50 // # ms between each search
const searchTerm = '*' // What to search for

const getQuery = term => ({
  "query": [
    {
      "AND": {
        "content": [
          term,
        ],
        "bookmarks": [],
        "visits": [],
        "url": []
      }
    }
  ],
  "pageSize": 10,
})

// Promisifies search-index.search, resolving to # of results, rejecting any error
const initSearch = index => query => {
  const results = []
  return new Promise((resolve, reject) =>
    index.search(query)
      .on('data', datum => results.push(datum))
      .on('error', reject)
      .on('finish', () => resolve(results.length)))
}

// Main
function buildIndex(err, index) {
  if (err) {
    throw err
  }

  const search = initSearch(index)

  // Start searching every N ms, logging output/errors to stdout/stderr
  const intervalID = setInterval(() =>
    search(getQuery(searchTerm))
      .then(console.log)
      .catch(console.error)
  , searchIntervalMS)

  // Start indexing test data
  console.time('indexing time')
  fs.createReadStream('data.json')
    .pipe(index.feed())
    .on('finish', () => {
      // Stop re-running search now that indexing is complete
      clearInterval(intervalID)
      console.timeEnd('indexing time')
    })
}

// Entrypoint
require('search-index')(require('./indexopts'), buildIndex)
// indexopts.js
module.exports = {
  batchSize: 500,
  appendOnly: false,
  indexPath: 'test',
  logLevel: 'warn',
  preserveCase: false,
  compositeField: false,
  // separator: /[|' .,\-|(\n)]+/,
  nGramLength: 1,
  fieldOptions: {
    visits: {
      fieldedSearch: true,
    },
    bookmarks: {
      fieldedSearch: true,
    },
    latest: {
      sortable: true,
    },
    content: {
      fieldedSearch: true,
      wildcard: true,
    },
    url: {
      weight: 10,
      fieldedSearch: true,
      separator: '/',
    },
  },
}
poltak commented Oct 2, 2017

Instructions:

  1. ensure all files are in the same dir
  2. alter the searchIntervalMS and searchTerm constants in index.js (L4,5) to play with the search interval and the search term (defaults are a 50ms interval and a wildcard '*' search, which is enough to throw the error in most runs on my machine)
  3. run node index to start the main script, which runs the searches alongside the indexing until indexing is complete

Should be reproducible both when an index already exists and when one does not.

Errors

Main error that should be easy to reproduce with a low search interval:

..../test-si/node_modules/search-index-searcher/lib/MergeOrConditions.js:30
      lastDoc.scoringCriteria = lastDoc.scoringCriteria.concat(cur.scoringCriteria)
                                                       ^

TypeError: Cannot read property 'concat' of undefined
    at ..../test-si/node_modules/search-index-searcher/lib/MergeOrConditions.js:30:56
    at Array.reduce (<anonymous>)
    at MergeOrConditions._flush (..../test-si/node_modules/search-index-searcher/lib/MergeOrConditions.js:23:6)
    at MergeOrConditions.prefinish (_stream_transform.js:137:10)
    at emitNone (events.js:105:13)
    at MergeOrConditions.emit (events.js:207:7)
    at prefinish (_stream_writable.js:590:14)
    at finishMaybe (_stream_writable.js:598:5)
    at endWritable (_stream_writable.js:609:3)
    at MergeOrConditions.Writable.end (_stream_writable.js:560:5)

Occasionally this error is thrown instead, under the same conditions:

events.js:182
      throw er; // Unhandled 'error' event
      ^

Error: stream.push() after EOF
    at readableAddChunk (_stream_readable.js:243:30)
    at ScoreTopScoringDocsTFIDF.Readable.push (_stream_readable.js:211:10)
    at ScoreTopScoringDocsTFIDF.Transform.push (_stream_transform.js:147:32)
    at ..../test-si/node_modules/search-index-searcher/lib/ScoreTopScoringDocsTFIDF.js:33:16
    at dispatchError (..../test-si/node_modules/levelup/lib/util.js:22:36)
    at ..../test-si/node_modules/levelup/lib/levelup.js:203:14
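
One possible reason this second error escapes the .catch in index.js: Node's pipe() does not forward 'error' events between stages, so a listener on the final stream never sees an error emitted further up the pipeline. The sketch below only illustrates that Node behaviour. The two PassThrough stages and the errorIsUnhandled helper are hypothetical stand-ins, not search-index-searcher code, and whether this is what actually happens inside the library is an assumption.

```javascript
const { PassThrough } = require('stream')

// Returns true if emitting 'error' on the emitter throws,
// which is what Node does when no 'error' listener is attached.
const errorIsUnhandled = emitter => {
  try {
    emitter.emit('error', new Error('stream.push() after EOF'))
    return false
  } catch (err) {
    return true
  }
}

// Hypothetical two-stage pipeline standing in for the searcher's internals
const innerStage = new PassThrough()
const outerStage = new PassThrough()
innerStage.pipe(outerStage)

// Listener on the last stage only, like .on('error', reject) in initSearch
const seen = []
outerStage.on('error', err => seen.push(err.message))

// The inner stage has no listener of its own, so its error is unhandled
const innerUnhandled = errorIsUnhandled(innerStage)
console.log({ innerUnhandled, forwardedToOuter: seen.length })
```

If that reading is right, the promise returned by initSearch would never get the chance to reject, and the process would crash with the "Unhandled 'error' event" message instead.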

Notes

  1. this only ever happens if searchTerm is set to either the wildcard '*' or a term that appears in the test data (try 'repository'). Setting searchTerm to something like 'elephant' (which doesn't appear in data.json) should never throw any errors, even if searchIntervalMS is set to something crazy like 1
  2. the main error seems to occur when two identical docs appear one after the other in MergeOrConditions' resultSet AND their scoringCriteria is undefined (hence the TypeError)
  3. it never happens when a sort is set in the query, so it could very well be an issue with ScoreTopScoringDocsTFIDF (and so somehow related to SI issue #413)
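
To make note 2 concrete, here is a minimal sketch of the failure pattern. It is not the MergeOrConditions source: mergeAdjacent, mergeAdjacentGuarded and the doc shape are hypothetical, written only to mirror the line in the stack trace (a reduce that concats scoringCriteria onto the previous doc). The guarded variant is just one way the TypeError could be avoided, not necessarily the fix that landed upstream.

```javascript
// Hypothetical stand-in for the merge step: neighbouring docs with the same
// id are folded together by concatenating their scoringCriteria.
const mergeAdjacent = docs =>
  docs.reduce((merged, cur) => {
    const lastDoc = merged[merged.length - 1]
    if (lastDoc && lastDoc.id === cur.id) {
      // Throws a TypeError if lastDoc.scoringCriteria is undefined
      lastDoc.scoringCriteria = lastDoc.scoringCriteria.concat(cur.scoringCriteria)
    } else {
      merged.push(Object.assign({}, cur))
    }
    return merged
  }, [])

// Same merge, but a missing scoringCriteria is treated as an empty list.
const mergeAdjacentGuarded = docs =>
  docs.reduce((merged, cur) => {
    const lastDoc = merged[merged.length - 1]
    if (lastDoc && lastDoc.id === cur.id) {
      lastDoc.scoringCriteria =
        (lastDoc.scoringCriteria || []).concat(cur.scoringCriteria || [])
    } else {
      merged.push(Object.assign({}, cur))
    }
    return merged
  }, [])

// Two identical docs in a row, neither carrying scoringCriteria
const duplicates = [{ id: 'page/a' }, { id: 'page/a' }]

let unguardedError = null
try {
  mergeAdjacent(duplicates)
} catch (err) {
  unguardedError = err
}
console.log(unguardedError instanceof TypeError)
console.log(mergeAdjacentGuarded(duplicates))
```

The unguarded version fails only when both conditions from note 2 hold; duplicates that do carry scoringCriteria, or non-adjacent docs, merge without error.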

@fergiemcdowall

I think this is now fixed in master. I'm not totally sure what the expected output of the program is, but the error is at least gone.
