Last active December 27, 2015 23:59
renaud/7410543: finding misextracted subscripts in PDFs, using PDFTextStream
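```java
import java.io.File;
import java.util.regex.Pattern;

import org.junit.Test;

// PDFTextStream classes; package name as in PDFTextStream 2.x, adjust for your version.
import com.snowtide.pdf.OutputTarget;
import com.snowtide.pdf.PDFTextStream;

public class SubscriptsTest {

    // A line of 10 to 1000 characters made only of digits and spaces is most likely
    // a row of subscripts that the extractor pushed onto its own line.
    private static final Pattern SUBSCRIPTS = Pattern.compile("^[ \\d]{10,1000}$");

    @Test
    public void testSubscripts() throws Exception {
        File root = new File("/Volumes/scratch/richarde/pdfs/201307/");
        File[] files = root.listFiles(); // null if root is not a readable directory
        if (files == null) {
            throw new IllegalStateException("not a readable directory: " + root);
        }
        for (File pdf : files) {
            if (!pdf.getName().endsWith(".pdf")) {
                continue;
            }
            try {
                // Extract the full text of the PDF, closing the stream even on failure.
                StringBuilder text = new StringBuilder(1024);
                PDFTextStream pdfts = new PDFTextStream(pdf);
                try {
                    pdfts.pipe(new OutputTarget(text));
                } finally {
                    pdfts.close();
                }
                // Print each subscript-only line together with the line above it,
                // which holds the formula the subscripts belong to.
                String previous = "";
                for (String line : text.toString().split("\\r?\\n")) {
                    if (SUBSCRIPTS.matcher(line).matches()) {
                        System.out.println(pdf.getName()
                                + "--------------------------------------\n"
                                + previous + "\n" + line + "\n");
                    }
                    previous = line;
                }
            } catch (Exception e) {
                // Skip unreadable PDFs, but report which file failed.
                System.err.println(pdf.getName() + ": " + e);
            }
        }
    }
}
```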
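The detection heuristic itself does not depend on PDFTextStream, so it can be exercised on a plain string. A minimal sketch under that assumption: the class and method names (`SubscriptLines`, `findSubscriptLines`) are hypothetical, and the sample text is an invented layout loosely modeled on the first hit below, with the subscript `2` padded by spaces the way a layout-preserving extractor might emit it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class SubscriptLines {

    // Same heuristic as the test: 10 to 1000 characters, digits and spaces only.
    private static final Pattern SUBSCRIPTS = Pattern.compile("^[ \\d]{10,1000}$");

    /** Returns {previousLine, subscriptLine} pairs for every suspected subscript line. */
    static List<String[]> findSubscriptLines(String text) {
        List<String[]> hits = new ArrayList<>();
        String previous = "";
        for (String line : text.split("\\r?\\n")) {
            if (SUBSCRIPTS.matcher(line).matches()) {
                hits.add(new String[] {previous, line});
            }
            previous = line;
        }
        return hits;
    }

    public static void main(String[] args) {
        // Hypothetical extracted text: the subscript "2" of CaCl2 on its own padded line.
        String text = "fluid 119.0 mM NaCl, 3.3 mM KCl, 1.3 mM CaCl , 1.2\n"
                + " ".repeat(44) + "2\n"
                + "mM MgCl , 0.5 mM\n";
        for (String[] hit : findSubscriptLines(text)) {
            System.out.println(hit[0] + "\n" + hit[1]);
        }
    }
}
```

Keeping the matcher in a plain method like this makes it easy to tune the length bounds of the pattern against known hits before running it over a whole directory of PDFs (`String.repeat` needs Java 11 or later).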
```text
http://www.ncbi.nlm.nih.gov/pubmed/?term=10720617
fluid 119.0 mM NaCl, 3.3 mM KCl, 1.3 mM CaCl , 1.2
2
http://www.ncbi.nlm.nih.gov/pubmed/?term=10720617
mM MgCl , 0.5 mM Na HPO , 21.0 mM NaHCO , 3.4
2 24 3
http://www.ncbi.nlm.nih.gov/pubmed/?term=10720617
blockade experiments was carried out using CoCl Co2q.
2
http://www.ncbi.nlm.nih.gov/pubmed/?term=11166682
by briefly immersing the sections in 0.05% OsO . Pro-
4
http://www.ncbi.nlm.nih.gov/pubmed/?term=9570713
mM KH PO , 136 mM NaCl, 8 mM Na HPO ). Slides were
24 24
http://www.ncbi.nlm.nih.gov/pubmed/?term=9570713
aline activates vasopressin neurons via a -receptor-mediated Ca21
1
http://www.ncbi.nlm.nih.gov/pubmed/?term=10708688
pCO hypercapnia. which is metabolically compensated
2
http://www.ncbi.nlm.nih.gov/pubmed/?term=10708688
artery ligation did not affect the arterial pCO w10x. The
2
http://www.ncbi.nlm.nih.gov/pubmed/?term=12900283
electron-donating group, p-Me N-, -OMe or -OH, which
2
http://www.ncbi.nlm.nih.gov/pubmed/?term=12900283
residual protons in CDCl unless otherwise mentioned.
3
http://www.ncbi.nlm.nih.gov/pubmed/?term=12900283
NaCNBH (300 mg, 4.8 mmol). The resulting mixture was
3
http://www.ncbi.nlm.nih.gov/pubmed/?term=12900283
o)fluorene (94%). 1H NMR (200 MHz, CDCl ): 3.09 (s,
3
http://www.ncbi.nlm.nih.gov/pubmed/?term=12900283
MHz, CDCl ): 45.43, 114.04, 119.74, 120.60, 120.98,
3
http://www.ncbi.nlm.nih.gov/pubmed/?term=12900283
ofluorene. 1H NMR (200 MHz, CDCl ): 3.05 (s, 6H), 3.89
3
http://www.ncbi.nlm.nih.gov/pubmed/?term=12900283
ofluorene. 1H NMR (200 MHz, CDCl ): 3.01 (s, 6H), 4.01
3
http://www.ncbi.nlm.nih.gov/pubmed/?term=23015449
taining the following (in mM): 139 NaCl, 12 D-glucose, 17 NaHCO , 3
3
http://www.ncbi.nlm.nih.gov/pubmed/?term=23015449
mPSCs contained the following (in mM): 139 NaCl, 5 KCl, 17 NaHCO ,
3
http://www.ncbi.nlm.nih.gov/pubmed/?term=23015449
Na ATP, and 0.1 MgGTP; pH was adjusted to 7.3 with KOH. The junc-
2
http://www.ncbi.nlm.nih.gov/pubmed/?term=23015449
when motoneurons were held at 70 mV and with 1 mM MgCl in the
2
http://www.ncbi.nlm.nih.gov/pubmed/?term=218204
Na+ Na+ K+ Na+ Na+ K+ Na+ Na+ K'
15 30 45 60
http://www.ncbi.nlm.nih.gov/pubmed/?term=18199766
1.25 KH PO , 1 MgSO , 2 CaCl , 16 NaHCO ,
24 4 2 3
http://www.ncbi.nlm.nih.gov/pubmed/?term=23161880
iological conditions (using HCO -containing buffers), little or
3
```
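In the hits above, each subscript line sits under a formula with a gap where the subscript belongs (`CaCl ,`, `NaHCO ,`). One possible repair, sketched here only as a hypothetical: assuming the extractor kept horizontal layout with spaces, so that a subscript digit lands in the same column as the blank it should fill, copy each digit run up into the line above wherever those columns are blank. `SubscriptMerge` and `merge` are illustrative names, not part of PDFTextStream, and the column-alignment assumption is unverified for these PDFs.

```java
public class SubscriptMerge {

    /**
     * Hypothetical repair: copies each run of digits in the subscript line into the
     * base line at the same columns, but only where the base line has spaces there.
     * Assumes the extractor preserved horizontal layout with spaces.
     */
    static String merge(String base, String subscripts) {
        StringBuilder out = new StringBuilder(base);
        int i = 0;
        while (i < subscripts.length()) {
            if (!Character.isDigit(subscripts.charAt(i))) {
                i++;
                continue;
            }
            // Find the end of this run of digits: columns [start, i).
            int start = i;
            while (i < subscripts.length() && Character.isDigit(subscripts.charAt(i))) {
                i++;
            }
            // Only overwrite columns that exist and are blank in the base line.
            if (i <= out.length() && isBlank(out, start, i)) {
                for (int c = start; c < i; c++) {
                    out.setCharAt(c, subscripts.charAt(c));
                }
            }
        }
        return out.toString();
    }

    /** True if every character of s in [from, to) is a space. */
    private static boolean isBlank(CharSequence s, int from, int to) {
        for (int c = from; c < to; c++) {
            if (s.charAt(c) != ' ') {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String base = "1.3 mM CaCl , 1.2";
        String sub = " ".repeat(11) + "2"; // "2" in column 11, under the gap after "CaCl"
        System.out.println(merge(base, sub)); // prints "1.3 mM CaCl2, 1.2"
    }
}
```

Digit runs wider than the gap they sit under, or that fall past the end of the base line, are left alone rather than overwriting text, so a mismatch in layout degrades to "no change" instead of corrupting the line.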