jaredhirsch/foo.markdown

## foo.markdown

      
What do we need?

Scrape a page
store the result in a relational DB
also store the result in a full text index
expose search API
on search hit, return the full record & its search score

How do we do this quickly / easily?

Scraping: ReaderMode


This is in FF already, and it works decently enough

let result;
let url = 'http://time.com/3983182/inflatable-minion-dublin';
ReaderMode.downloadAndParseDocument(url)
  .then(function(x) { result = x; }, console.error);

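// inspect result only after the promise above has resolved (e.g. step by step in the Browser Console)
result.url; // "http://time.com/3983182/inflatable-minion-dublin"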
result.title; // "Giant Inflatable Minion Causes Chaos on Dublin Road"
result.byline; // "Sarah Begley							@SCBegley"
result.excerpt; // "The balloon got loose from a fairground"
result.length; // 906
result.dir; // undefined (??? todo)
result.content; // the content is a mess, with a lot of HTML still in it,
                // see https://gist.github.com/6a68/d074fcff51cc39aa7e11
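Since result.content still carries markup, it probably needs flattening to plain text before it goes into a full text index. A rough sketch of that step, assuming a regex pass is good enough for indexing purposes; `htmlToText` is a made-up helper name, not something ReaderMode provides:

```javascript
// Naive tag stripper for building index text. This is NOT a sanitizer.
// `htmlToText` is a hypothetical helper, not part of ReaderMode.
function htmlToText(html) {
  return html
    .replace(/<(script|style)[^>]*>[\s\S]*?<\/\1>/gi, ' ') // drop script/style bodies
    .replace(/<[^>]+>/g, ' ')                              // drop remaining tags
    .replace(/&nbsp;/g, ' ')                               // decode a few common entities
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&amp;/g, '&')                                // decode &amp; last
    .replace(/\s+/g, ' ')                                  // collapse whitespace
    .trim();
}

const sample = '<div><p>The balloon got <b>loose</b></p>' +
  '<script>var x = 1;</script> from a fairground &amp; more</div>';
const text = htmlToText(sample);
```

Inside Firefox it would likely be more robust to parse the string with DOMParser and read textContent; the regex version is only here because it runs anywhere.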


Store the result in a relational DB


we could create our own parallel Places-style sqlite DB (visits table, urls table, etc)

this is a lot of work, but we could add our own indexes without touching the real Places DB
MDN docs: https://developer.mozilla.org/en-US/docs/Mozilla/JavaScript_code_modules/Sqlite.jsm


we could use IndexedDB (YDN has a full-text wrapper, too)

Might run into space limitations: FF evicts a whole origin's data at a time (LRU), so we could lose all data without notice


we could just overload the annotation field

this is a lot less work, more of a prototypey hack, but could still actually work
MDN docs: https://developer.mozilla.org/en-US/docs/Mozilla/Tech/Places/Using_the_Places_annotation_service


postgres RDS???


also store the result in a full text index


not provided, sadly, by FF
lunr.js is an option, no idea how it'll perform: http://lunrjs.com/

interesting lunr + dexie + indexedDB demo: http://bl.ocks.org/nolanlawson/6f69f4a573c1da862e92
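Whichever library wins, what we need from it is small: add a document under an id, then search and get back ids with scores. A toy stand-in to make that shape concrete, assuming plain term-frequency scoring; `tokenize` and `TinyIndex` are invented names and this is not lunr's API or its ranking:

```javascript
// Toy inverted index: term -> Map(docId -> term frequency).
// All names here (tokenize, TinyIndex) are hypothetical, not lunr's API.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

class TinyIndex {
  constructor() {
    this.postings = new Map();
  }

  // Index one document's text under the given id.
  add(id, text) {
    for (const term of tokenize(text)) {
      if (!this.postings.has(term)) this.postings.set(term, new Map());
      const docs = this.postings.get(term);
      docs.set(id, (docs.get(id) || 0) + 1);
    }
  }

  // Return [{ ref, score }] sorted by descending score, where score is
  // the summed frequency of the query terms in each document.
  search(query) {
    const scores = new Map();
    for (const term of tokenize(query)) {
      const docs = this.postings.get(term);
      if (!docs) continue;
      for (const [id, tf] of docs) {
        scores.set(id, (scores.get(id) || 0) + tf);
      }
    }
    return [...scores]
      .map(([ref, score]) => ({ ref, score }))
      .sort((a, b) => b.score - a.score);
  }
}

const index = new TinyIndex();
index.add('a', 'Giant Inflatable Minion Causes Chaos on Dublin Road');
index.add('b', 'Dublin road works: road closures expected');
const hits = index.search('dublin road');
```

Here hits comes back as ids plus scores only, so the caller still has to fetch the full record from the other store.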


maybe a bit more fully-featured: YDN

base library wraps indexedDB: https://github.com/yathit/ydn-db
same author supports full-text search on the same DB: https://github.com/yathit/ydn-db-fulltext
this could give us both databases in the same spot, but then we're talking about replacing Places again


AWS CloudSearch???


expose search API
on search hit, fetch the corresponding record & search score
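A sketch of how the two stores could meet in the search API, assuming the index returns `{ ref, score }` pairs keyed by URL. Everything here is a placeholder: `records` stands in for the relational DB, `searchIndex` for the full text index, and the substring scoring is only there to make the example run:

```javascript
// Hypothetical in-memory stand-ins: `records` plays the relational store,
// `searchIndex` plays the full text index and returns [{ ref, score }].
const records = new Map([
  ['http://time.com/3983182/inflatable-minion-dublin', {
    url: 'http://time.com/3983182/inflatable-minion-dublin',
    title: 'Giant Inflatable Minion Causes Chaos on Dublin Road',
    excerpt: 'The balloon got loose from a fairground',
  }],
]);

// Placeholder scoring: count how many query terms appear in the record text.
function searchIndex(query) {
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  const out = [];
  for (const [ref, rec] of records) {
    const text = (rec.title + ' ' + rec.excerpt).toLowerCase();
    const score = terms.filter((t) => text.includes(t)).length;
    if (score > 0) out.push({ ref, score });
  }
  return out.sort((a, b) => b.score - a.score);
}

// The search API itself: look up each hit's record and attach the score.
// Hits whose record is missing from the store are skipped.
function search(query) {
  return searchIndex(query)
    .filter((hit) => records.has(hit.ref))
    .map((hit) => ({ record: records.get(hit.ref), score: hit.score }));
}

const results = search('minion balloon');
```

Skipping hits with no matching record matters if the two stores can drift apart, for example if one of them gets evicted.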