@renaud
Last active December 27, 2015 23:59
Finding mis-extracted subscripts in PDFs, using PDFTextStream
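import java.io.File;
import java.util.regex.Pattern;

import org.junit.Test; // assuming JUnit 4; adjust for your test framework

import com.snowtide.pdf.OutputTarget;
import com.snowtide.pdf.PDFTextStream;

@Test
public void testSubscripts() throws Exception {
    // A line made up only of spaces and digits (10 to 1000 characters) is
    // most likely a row of subscripts that was split off from the line above.
    final Pattern SUBSCRIPTS = Pattern.compile("^[ \\d]{10,1000}$");
    File ROOT = new File(
            "/Volumes/scratch/richarde/pdfs/201307/");
    File[] pdfs = ROOT.listFiles();
    if (pdfs == null) {
        return; // ROOT does not exist or is not a directory
    }
    for (File pdf : pdfs) {
        if (!pdf.getName().endsWith(".pdf")) {
            continue;
        }
        try {
            // Extract the full text of the PDF into memory.
            PDFTextStream pdfts = new PDFTextStream(pdf);
            StringBuilder text = new StringBuilder(1024);
            try {
                pdfts.pipe(new OutputTarget(text));
            } finally {
                pdfts.close(); // release the file even if extraction fails
            }
            String previous = "";
            for (String line : text.toString().split("\n")) {
                // The pattern is anchored at both ends, so one check per line is enough.
                if (SUBSCRIPTS.matcher(line).find()) {
                    // Print the suspect line together with the line it belongs to.
                    System.out.println(pdf.getName()
                            + "--------------------------------------\n"
                            + previous + "\n" + line + "\n");
                }
                previous = line;
            }
        } catch (Exception e) {
            // Report the failure and move on to the next PDF.
            System.err.println(pdf.getName() + ": " + e);
        }
    }
}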
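The detection rule in the test above can be tried without PDFTextStream or any PDF files. The following is a minimal sketch under stated assumptions: the class name `SubscriptLineCheck`, the helper `isSubscriptLine`, and the sample strings are hypothetical, and the leading spaces in the sample subscript line are a guess at how the extractor lays such lines out.

```java
import java.util.regex.Pattern;

public class SubscriptLineCheck {
    // Same pattern as in the test above: only spaces and digits, 10 to 1000 characters.
    static final Pattern SUBSCRIPTS = Pattern.compile("^[ \\d]{10,1000}$");

    // Hypothetical helper: true if the line looks like a detached row of subscripts.
    static boolean isSubscriptLine(String line) {
        return SUBSCRIPTS.matcher(line).find();
    }

    public static void main(String[] args) {
        // Hypothetical sample lines; the spacing is an assumption, not real extractor output.
        String textLine = "mM MgCl , 0.5 mM Na HPO , 21.0 mM NaHCO , 3.4";
        String subscriptLine = "       2          24              3";
        String shortLine = "2";

        System.out.println(isSubscriptLine(textLine));      // false: contains letters
        System.out.println(isSubscriptLine(subscriptLine)); // true: spaces and digits only
        System.out.println(isSubscriptLine(shortLine));     // false: shorter than 10 characters
    }
}
```

Because of the 10-character minimum, a subscript line is only flagged when it still carries its positioning spaces; a bare "2" on its own line is not.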
http://www.ncbi.nlm.nih.gov/pubmed/?term=10720617
fluid 119.0 mM NaCl, 3.3 mM KCl, 1.3 mM CaCl , 1.2
2
http://www.ncbi.nlm.nih.gov/pubmed/?term=10720617
mM MgCl , 0.5 mM Na HPO , 21.0 mM NaHCO , 3.4
2 24 3
http://www.ncbi.nlm.nih.gov/pubmed/?term=10720617
blockade experiments was carried out using CoCl Co2q.
2
http://www.ncbi.nlm.nih.gov/pubmed/?term=11166682
by briefly immersing the sections in 0.05% OsO . Pro-
4
http://www.ncbi.nlm.nih.gov/pubmed/?term=9570713
mM KH PO , 136 mM NaCl, 8 mM Na HPO ). Slides were
24 24
http://www.ncbi.nlm.nih.gov/pubmed/?term=9570713
aline activates vasopressin neurons via a -receptor-mediated Ca21
1
http://www.ncbi.nlm.nih.gov/pubmed/?term=10708688
pCO hypercapnia. which is metabolically compensated
2
http://www.ncbi.nlm.nih.gov/pubmed/?term=10708688
artery ligation did not affect the arterial pCO w10x. The
2
http://www.ncbi.nlm.nih.gov/pubmed/?term=12900283
electron-donating group, p-Me N-, -OMe or -OH, which
2
http://www.ncbi.nlm.nih.gov/pubmed/?term=12900283
residual protons in CDCl unless otherwise mentioned.
3
http://www.ncbi.nlm.nih.gov/pubmed/?term=12900283
NaCNBH (300 mg, 4.8 mmol). The resulting mixture was
3
http://www.ncbi.nlm.nih.gov/pubmed/?term=12900283
o)fluorene (94%). 1H NMR (200 MHz, CDCl ): 3.09 (s,
3
http://www.ncbi.nlm.nih.gov/pubmed/?term=12900283
MHz, CDCl ): 45.43, 114.04, 119.74, 120.60, 120.98,
3
http://www.ncbi.nlm.nih.gov/pubmed/?term=12900283
ofluorene. 1H NMR (200 MHz, CDCl ): 3.05 (s, 6H), 3.89
3
http://www.ncbi.nlm.nih.gov/pubmed/?term=12900283
ofluorene. 1H NMR (200 MHz, CDCl ): 3.01 (s, 6H), 4.01
3
http://www.ncbi.nlm.nih.gov/pubmed/?term=23015449
taining the following (in mM): 139 NaCl, 12 D-glucose, 17 NaHCO , 3
3
http://www.ncbi.nlm.nih.gov/pubmed/?term=23015449
mPSCs contained the following (in mM): 139 NaCl, 5 KCl, 17 NaHCO ,
3
http://www.ncbi.nlm.nih.gov/pubmed/?term=23015449
Na ATP, and 0.1 MgGTP; pH was adjusted to 7.3 with KOH. The junc-
2
http://www.ncbi.nlm.nih.gov/pubmed/?term=23015449
when motoneurons were held at 70 mV and with 1 mM MgCl in the
2
http://www.ncbi.nlm.nih.gov/pubmed/?term=218204
Na+ Na+ K+ Na+ Na+ K+ Na+ Na+ K'
15 30 45 60
http://www.ncbi.nlm.nih.gov/pubmed/?term=18199766
1.25 KH PO , 1 MgSO , 2 CaCl , 16 NaHCO ,
24 4 2 3
http://www.ncbi.nlm.nih.gov/pubmed/?term=23161880
iological conditions (using HCO -containing buffers), little or
3