Sebastian Bassi sbassi

@sbassi
sbassi / gist:6691391
Created September 24, 2013 21:17
sample text
<div class="pd w0 h0"><div id="pf5" class="pf" data-page-no="5"><div class="pc pc5"><img class="bi" alt="" src="bg5.png"/><div class="t m0 x8b h12 y12a ff3 fsd fc0 sc0 ls0 ws1b">10 in 449 sib-pairs<span class="_ _8"> </span>with multiple sclerosis. <span class="ff4">J<span class="_ _8"> </span>Neuroimmunol</span></div><div class="t m0 x8b h12 y63 ff3 fsd fc0 sc0 ls0 wsc">2003; <span class="fff ws42">143</span>: 31–38.</div><div class="t m0 x0 h12 y12b ff3 fsd fc0 sc0 ls0 ws1b">16 Matesanz<span class="_ _7"> </span>F<span class="_ _0"></span>,<span class="_ _a"> </span>Caro-Maldonado<span class="_ _7"> </span>A,<span class="_ _7"> </span>Fedetz<span class="_ _a"> </span>M,<span class="_ _7"> </span>Fernandez<span class="_ _7"> </span>O,</div><div class="t m0 x8b h12 y12c ff3 fsd fc0 sc0 ls0 ws20">Milne RL, Guerr<span class="_ _2"></span>ero M<span class="_ _10"> </span><span class="ff4">et al. </span>IL2RA/CD25<span class="_ _10"> </span>polymorphisms</div><div class="t m0 x8b h12 y9d ff3 fsd fc0 sc0 ls0 wsf">
(cata)sbassi@sbassi-Latitude-E6330:~/projects/catalytic/cata/main-catalytic/src$ ./manage.py pdfhtml2model --checkdois xx 527ab0a0589a337f0f636ac7
pdfhtml2model starting: 2013-11-07 14:11:51.586189
dict_file is: xx
id is: 527ab0a0589a337f0f636ac7
51088b372ed3b20d8ac4fa0e 10.1371/journal.pone.0016964
5108931c2ed3b20d8ac4fa16 10.1093/brain/awm206
5108943e2ed3b20d8ac4fa19 10.1371/journal.pone.0029931
5108947a2ed3b20d8ac4fa1a 10.1371/journal.pone.0023634
510895772ed3b20d8ac4fa1b 10.1371/journal.pone.0046730
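The `--checkdois` run above prints one id/DOI pair per line. A minimal sketch of parsing that output, assuming the first column is a 24-character hexadecimal id and the second a DOI; `parse_checkdois` and the regular expression are illustrative names, not part of `pdfhtml2model`:

```python
import re

# Assumed line shape: 24 hex characters, whitespace, then a DOI ("10.<registrant>/<suffix>").
LINE_RE = re.compile(r"^(?P<id>[0-9a-f]{24})\s+(?P<doi>10\.\d{4,9}/\S+)$")


def parse_checkdois(output):
    """Return {id: doi} for every line that looks like an id/DOI pair."""
    pairs = {}
    for line in output.splitlines():
        match = LINE_RE.match(line.strip())
        if match:  # banner lines such as "dict_file is: xx" are skipped
            pairs[match.group("id")] = match.group("doi")
    return pairs


sample = """\
dict_file is: xx
51088b372ed3b20d8ac4fa0e 10.1371/journal.pone.0016964
5108931c2ed3b20d8ac4fa16 10.1093/brain/awm206
"""
print(parse_checkdois(sample))
```

Lines that do not match the assumed shape are ignored rather than reported, which is convenient for mixed log output but would hide malformed DOIs.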
process_pdf_v2 - G296.full.pdf : starting : 2013-12-04 14:29:04.883371
process_pdf_v2 - G296.full.pdf : search publications related to file: 'G296.full.pdf'
process_pdf_v2 - G296.full.pdf : existing publication found: 529e6ba2589a3354c079dbf4
process_pdf_v2 - G296.full.pdf : creation date from pdftk is 'D:20120112143838Z'
process_pdf_v2 - G296.full.pdf : creation date from pdfinfo is 'Thu Jan 12 14:38:38 2012'
process_pdf_v2 - G296.full.pdf : calling pdfhtml2model with file: G296.full.txt
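The log above reports the same creation date in two notations, one from pdftk and one from pdfinfo. A minimal sketch of normalizing both with the standard library, assuming the pdftk value always ends in `Z` and the pdfinfo value uses English day and month names; the function names are illustrative and time zones are ignored:

```python
from datetime import datetime


def parse_pdftk_date(value):
    """Parse a PDF date string such as 'D:20120112143838Z' (UTC form only)."""
    return datetime.strptime(value, "D:%Y%m%d%H%M%SZ")


def parse_pdfinfo_date(value):
    """Parse a pdfinfo-style date such as 'Thu Jan 12 14:38:38 2012'.

    %a and %b depend on the locale; this assumes English names.
    """
    return datetime.strptime(value, "%a %b %d %H:%M:%S %Y")


pdftk_date = parse_pdftk_date("D:20120112143838Z")
pdfinfo_date = parse_pdfinfo_date("Thu Jan 12 14:38:38 2012")
print(pdftk_date.isoformat(), pdfinfo_date.isoformat())
```

PDF date strings can also carry an explicit offset instead of `Z`; that form is not handled here.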
sbassi / gist:7796130
Created December 4, 2013 21:43
Announcement from NCBI; we should see whether we can use any of this.
News story: PMCID - PMID - Manuscript ID - DOI Converter Upgraded <http://www.ncbi.nlm.nih.gov/news/12-04-2013-id-converter-updated>
We have upgraded the PMCID - PMID - Manuscript ID - DOI Converter <http://www.ncbi.nlm.nih.gov/pmc/pmctopmid/>. It now allows you to convert IDs for publications referenced in PubMed and PMC. You can also cross-reference Open Access NIH Manuscript Submission IDs (NIHMS) and Digital Object Identifiers (DOIs) often used by publishers.
The API allows you to programmatically convert between IDs, and provides versioned IDs for each article and information about which article version is live (for embargoed articles), as well as output in various formats including HTML and CSV.
For more information:
* News story: PMCID - PMID - Manuscript ID - DOI Converter Upgraded http://www.ncbi.nlm.nih.gov/news/12-04-2013-id-converter-updated
* PMCID - PMID - Manuscript ID - DOI Converter http://www.ncbi.nlm.nih.gov/pmc/pmctopmid/
* ID Converter API documentation http://www
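Since the announcement says the API converts IDs programmatically and can return CSV, a request would presumably be a plain GET with the IDs in the query string. A minimal sketch of building such a URL; the base URL is a hypothetical placeholder (the documentation link above is truncated), and the parameter names `ids` and `format` are assumptions to be checked against the API documentation:

```python
from urllib.parse import urlencode

# Hypothetical placeholder; the real endpoint is given in the API documentation.
BASE_URL = "https://example.org/idconv"


def build_query(ids, fmt="csv", base_url=BASE_URL):
    """Build a GET URL asking the converter for several IDs at once.

    The parameter names `ids` and `format` are assumptions.
    """
    params = {"ids": ",".join(ids), "format": fmt}
    return base_url + "?" + urlencode(params)


url = build_query(["10.1371/journal.pone.0016964", "10.1093/brain/awm206"])
print(url)
```

Batching several IDs into one request keeps the number of calls low when checking a whole collection of DOIs.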
5296439c589a33124150ab1d NONE
529643c7589a33124150ab1e NONE
5296442f589a33124150ab1f NONE
52964482589a33124150ab20 NONE
529666f9589a3319422d043a NONE
52966a3a589a331b3b8f079e NONE
52966b2b589a331be763dd94 NONE
52967211589a331fd8125588 NONE
52967dae589a33233c0a6f90 31258
52968b42589a332518422ebc 21788
urlpatterns = patterns('',
    # authentication
    url(r'^login/(?P<backend>[^/]+)/$', auth,
        name='socialauth_begin'),
    url(r'^complete/(?P<backend>[^/]+)/$', complete,
        name='socialauth_complete'),
    # XXX: Deprecated, these URLs are deprecated, instead use the login and
    # complete ones directly, they will differentiate the user intention
    # by checking its authenticated status association.
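The two patterns in the fragment above use a named group, `(?P<backend>[^/]+)`, to capture the backend name from the path. A minimal sketch of what they match, using only the standard `re` module rather than Django's resolver; `backend_for` and the backend name `twitter` are illustrative:

```python
import re

# The two patterns from the URLconf fragment, compiled with the standard library.
LOGIN_RE = re.compile(r'^login/(?P<backend>[^/]+)/$')
COMPLETE_RE = re.compile(r'^complete/(?P<backend>[^/]+)/$')


def backend_for(path):
    """Return (url_name, backend) for a matching path, else None."""
    for name, pattern in (("socialauth_begin", LOGIN_RE),
                          ("socialauth_complete", COMPLETE_RE)):
        match = pattern.match(path)
        if match:
            return name, match.group("backend")
    return None


print(backend_for("login/twitter/"))
print(backend_for("login/twitter/extra/"))
```

Because `[^/]+` excludes slashes, a path with an extra segment does not match either pattern.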
{u'document_ids': [u'6375166394', u'6375247604', u'6375247614', u'6375247624', u'6375243124', u'6375243764', u'6375247634', u'6375243214', u'6375243204', u'6375162234', u'6375243774', u'6375243194', u'6224587934'], u'documents': [{u'version': 1388154478, u'id': u'6375166394'}, {u'version': 1388156033, u'id': u'6375247604'}, {u'version': 1388156033, u'id': u'6375247614'}, {u'version': 1388156033, u'id': u'6375247624'}, {u'version': 1388155930, u'id': u'6375243124'}, {u'version': 1388155951, u'id': u'6375243764'}, {u'version': 1388156033, u'id': u'6375247634'}, {u'version': 1388155931, u'id': u'6375243214'}, {u'version': 1388155931, u'id': u'6375243204'}, {u'version': 1388154435, u'id': u'6375162234'}, {u'version': 1388155951, u'id': u'6375243774'}, {u'version': 1388155931, u'id': u'6375243194'}, {u'version': 1383164664, u'id': u'6224587934'}], u'total_results': 13, u'current_page': 0, u'total_pages': 1, u'items_per_page': 20}
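The response above is one page of a paginated listing: `document_ids` repeats the ids found in `documents`, and the paging fields describe the page. A minimal sketch of checking that consistency and ordering the documents by `version`, assuming `version` is a Unix timestamp of the last change; `check_listing` is a hypothetical helper and the sample is trimmed to three entries with the counts adjusted to match:

```python
def check_listing(response):
    """Sanity-check one page of the listing and return ids sorted newest first."""
    ids = response["document_ids"]
    docs = response["documents"]
    # Every id in `document_ids` should have a matching entry in `documents`.
    if ids != [d["id"] for d in docs]:
        raise ValueError("document_ids and documents disagree")
    # With a single page, the page should hold every result.
    if response["total_pages"] == 1 and len(ids) != response["total_results"]:
        raise ValueError("single page does not contain all results")
    # Assumption: a larger `version` means a more recent change.
    newest_first = sorted(docs, key=lambda d: d["version"], reverse=True)
    return [d["id"] for d in newest_first]


# Trimmed sample: three of the thirteen entries, with total_results adjusted.
sample = {
    "document_ids": ["6375166394", "6375247604", "6224587934"],
    "documents": [
        {"version": 1388154478, "id": "6375166394"},
        {"version": 1388156033, "id": "6375247604"},
        {"version": 1383164664, "id": "6224587934"},
    ],
    "total_results": 3,
    "current_page": 0,
    "total_pages": 1,
    "items_per_page": 20,
}
print(check_listing(sample))
```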
sbassi / gist:8611240
Created January 25, 2014 03:09
ingestion error
[2014-01-25 02:44:40,710: WARNING/Worker-6] dict_file is: journal.pcbi.0030199.txt
[2014-01-25 02:44:40,710: WARNING/Worker-6] id is: 52e3250f589a334865feb6d5
[2014-01-25 02:44:42,379: WARNING/Worker-6] Using parser: <catalytic.website.parser.UnknownJournalParser instance at 0x4f38dd0>
[2014-01-25 02:44:42,582: WARNING/Worker-6] parser - DOI:
[2014-01-25 02:44:42,582: WARNING/Worker-6] No DOI found
[2014-01-25 02:44:43,168: ERROR/MainProcess] Task catalytic.webservice.tasks.process_pdf[fe09063a-e0c1-4edf-825b-4485cc0784ae] raised unexpected: IndexError('no such item for Cursor instance',)
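`IndexError('no such item for Cursor instance')` is the message PyMongo raises when a cursor is indexed (for example `cursor[0]`) and the query matched nothing. Whether that is what happened inside `process_pdf` is a guess; the log shows an empty DOI just before the failure, so a lookup that returned no documents would fit. A minimal sketch of a guard, using plain iterators in place of a real cursor; `first_or_none` is a hypothetical helper:

```python
def first_or_none(cursor):
    """Return the first item of an iterable, or None when it is empty."""
    return next(iter(cursor), None)


# Stand-ins for query results; a real PyMongo cursor is also iterable.
empty_result = iter([])
one_result = iter([{"doi": "10.1371/journal.pone.0016964"}])

publication = first_or_none(empty_result)
if publication is None:
    print("No publication found; skipping instead of raising IndexError")

found = first_or_none(one_result)
print(found)
```

Returning `None` moves the decision to the caller, which can then log the missing DOI and continue with the next file.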