@unot
Created April 4, 2011 02:22
PDF to Text in Cygwin
#!/usr/bin/bash
# for Cygwin env
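# PDF to text. With Shift_JIS, alphanumerics and symbols come out as
# 2-byte characters, so extract as EUC-JP instead.
for i in *.pdf;
do
    # Quote the name so PDFs with spaces in their filenames still work
    pdftotext -enc EUC-JP "$i"
done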
# Convert to Windows text (Shift_JIS output, CRLF line endings)
for i in *.txt;
do
    nkf -s -Lw "$i" >"${i%.txt}.tmp"
done
# Remove blank lines and whitespace. Cygwin's sed does not handle this
# well, so use Perl.
for i in *.tmp;
do
    perl -pe 's/^\r\n//; s/ +//g' "$i" >"${i%.tmp}.txt"
done
rm *.tmp