@bwallace
Created April 27, 2014 13:53
We need to deal with the "TextNodes" returned by PDF.js, which look like this:
{
"str": "procedure was performed at 10 mmHg pressure. At 6 and 24 h postoperatively, a short-form McGill Questionnaire (MPQ) was",
"dir": "ltr",
"width": 477.8928,
"height": 9,
"transform": [
9,
0,
0,
9,
58.5012,
516.0192
],
"fontName": "g_font_2"
},
{
"str": "obtained from all patients. Patients were then asked to complete a 10-cm visual analogue scale (VAS) for abdominal pain.",
"dir": "ltr",
"width": 439.425,
"height": 9,
"transform": [
9,
0,
0,
9,
58.5012,
505.0194
],
"fontName": "g_font_2"
},
(See also: https://gist.github.com/joelkuiper/11346143.) These need to be mapped to the desired SpaDoc structure:
{
rawText: "sfagegqegqehqeheqhqwh",
wordTokens: { uuid: 1212, word: "word" },
sentenceTokens: { uuid: 12121, sentence: "the brown fox …" },
annotations: [{document: {}, textAnnotations: [{uuid: 13131, label: "whatever"}]}],
__metaWords: [{uuid: 121441, pageIndex: 1, textNodes: [121212,31411145,…], range: [[1313,1414],[1441,11414]]}],
__metaSentence: []
}
@joelkuiper

{
  "text": "Lorem ipsum dolor sit amet. consectetur adipisicing elit, sed do eiusmod tempor",
  "marginalia": [
    {"uuid": "0x131313", 
     "id": "selective_reporting",
     "title": "Selective reporting",
     "description": "A *markdown* string",
     "annotations": [
       {"uuid": "0x121141ac4", "label": 1, "__mappings": {"field": "words", "index": 14}},...]}
       ,...
  ],
  "__mappings":  {
    "pages": [{"textNodes": [1,...], "range": [[50,54],...]},...],
    "words": [{"textNodes": [1,...], "range": [[50,54],...]},...],
    "sentences": [{"textNodes": [1,4,...], "range": [[50,54],[54,354],...]},...]
  },
  "__textNodes": [{"pageIndex": 1, "interval": [12,14]}]
}
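
One way the `text` and `__textNodes` fields could be derived from PDF.js output is sketched below. The function name `buildTextNodes` and the input shape (an array of pages, each an array of items with a `str` field) are assumptions made for illustration, not part of the proposal above. The sketch uses a zero-based `pageIndex` and half-open `[start, end)` intervals, and it inserts no separator between items, so real input may need a space or newline added between lines.

```javascript
// Hypothetical helper: concatenate text items into one string and record,
// for each item, the page it came from and its [start, end) interval.
// Only the `str` field of each item is read.
function buildTextNodes(pages) {
  let text = "";
  const textNodes = [];
  pages.forEach((items, pageIndex) => {
    for (const item of items) {
      const start = text.length;
      text += item.str; // assumption: no separator between items
      textNodes.push({ pageIndex, interval: [start, text.length] });
    }
  });
  return { text, __textNodes: textNodes };
}

// Usage with a single page holding two items.
const built = buildTextNodes([
  [{ str: "Some new sentence " }, { str: "that ends here. Other sentence." }],
]);
console.log(JSON.stringify(built, null, 2));
```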

@joelkuiper

“[Some new sentence ][that ends here. Other sentence.]”

{ "text": "Some new sentence that ends here. Other sentence.",
  "__textNodes": [{"page": 1, "interval": [0,18]}, {"page": 1, "interval": [18,49]}],
  "__mappings": { 
     "sentences": [
         {"textNodes": [0,1], "ranges": [[0,18],[18,33]]}, 
         {"textNodes": [1], "ranges": [[33,49]]}]
  }
}
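
The sentence mappings in this example can be computed by intersecting each sentence span with the node intervals. The sketch below assumes half-open `[start, end)` intervals and assumes the sentence spans come from a separate sentence splitter; `mapSpanToNodes` is a hypothetical name, not something defined in this thread.

```javascript
// Hypothetical helper: intersect one sentence span [start, end) with the
// node intervals and report which nodes it touches and the overlapping ranges.
function mapSpanToNodes([start, end], textNodes) {
  const mapping = { textNodes: [], ranges: [] };
  textNodes.forEach((node, nodeIndex) => {
    const [nodeStart, nodeEnd] = node.interval;
    const from = Math.max(start, nodeStart);
    const to = Math.min(end, nodeEnd);
    if (from < to) { // keep only non-empty overlaps
      mapping.textNodes.push(nodeIndex);
      mapping.ranges.push([from, to]);
    }
  });
  return mapping;
}

const exampleNodes = [
  { page: 1, interval: [0, 18] },
  { page: 1, interval: [18, 49] },
];
// Assumed sentence spans for "Some new sentence that ends here. Other sentence."
const sentenceMappings = [[0, 33], [33, 49]].map((span) =>
  mapSpanToNodes(span, exampleNodes)
);
console.log(JSON.stringify(sentenceMappings));
```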

@joelkuiper

“[Some new sentence ][that ends here. Other sentence.]”

{ "text": "Some new sentence that ends here. Other sentence.",
  "marginalia": [
    {"id": "selective_reporting",
     "title": "Selective reporting",
     "description": "A *markdown* string",
     "annotations": [
       {"uuid": "0x121141ac4", "label": 1, "__mappings": {"field": "sentences", "index": 1}}]}
  ],
  "__textNodes": {
      "0":  {"page": 1, "interval": [0,18]}, 
      "1":  {"page": 1, "interval": [18,49]}
 },
  "__mappings": { 
     "sentences": [
         [{"nodeId": 0, "range": [0,18]},{"nodeId": 1, "range": [18,33]}],
         [{"nodeId": 1, "range": [33,49]}]
     ]
  }
}
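
With this layout, an annotation can be resolved back to its text by following its `__mappings` (`field`, `index`) into the document-level `__mappings` and slicing `text` by each `range`. A minimal sketch follows, assuming `index` is zero-based and ranges are half-open; `annotationText` is a hypothetical helper name.

```javascript
// Hypothetical helper: return the text an annotation points at.
function annotationText(doc, annotation) {
  const { field, index } = annotation.__mappings;
  const parts = doc.__mappings[field][index];
  return parts
    .map(({ range: [start, end] }) => doc.text.slice(start, end))
    .join("");
}

// Trimmed copy of the document above, keeping only the fields that are read.
const annotatedDoc = {
  text: "Some new sentence that ends here. Other sentence.",
  marginalia: [
    {
      id: "selective_reporting",
      annotations: [
        { uuid: "0x121141ac4", label: 1, __mappings: { field: "sentences", index: 1 } },
      ],
    },
  ],
  __mappings: {
    sentences: [
      [{ nodeId: 0, range: [0, 18] }, { nodeId: 1, range: [18, 33] }],
      [{ nodeId: 1, range: [33, 49] }],
    ],
  },
};

const [firstAnnotation] = annotatedDoc.marginalia[0].annotations;
const resolved = annotationText(annotatedDoc, firstAnnotation);
console.log(JSON.stringify(resolved));
```

Because the lookup only needs `range`, the `nodeId` values matter only when the page or position of the highlight is required.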
