An example of using pdfminer (via its pdf2txt.py command-line tool) to extract text from a PDF and then filter out lines that aren't "texty" enough:
pdf2txt.py something.pdf | ./textiness.py > temp.txt
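This README doesn't describe the rule textiness.py actually uses. The following is only a minimal sketch of what a "textiness" filter could look like. It assumes, hypothetically, that a line counts as texty when it has at least a few words and most of its non-space characters are letters. The names textiness, is_texty, filter_texty, MIN_LETTER_RATIO and MIN_WORDS, and both threshold values, are illustrative assumptions, not the script's real interface.

```python
import sys
from typing import Iterable, Iterator

# Hypothetical thresholds, chosen for illustration only.
MIN_LETTER_RATIO = 0.7  # share of non-space characters that must be letters
MIN_WORDS = 3           # minimum number of whitespace-separated tokens


def textiness(line: str) -> float:
    """Return the fraction of non-whitespace characters that are letters."""
    chars = [c for c in line if not c.isspace()]
    if not chars:
        return 0.0
    return sum(c.isalpha() for c in chars) / len(chars)


def is_texty(line: str) -> bool:
    """A line is 'texty' if it has enough words and enough letters."""
    return len(line.split()) >= MIN_WORDS and textiness(line) >= MIN_LETTER_RATIO


def filter_texty(lines: Iterable[str]) -> Iterator[str]:
    """Yield only the lines that pass is_texty, unchanged."""
    return (line for line in lines if is_texty(line))


if __name__ == "__main__":
    # Read from stdin and write to stdout, so it can sit in a shell pipe.
    sys.stdout.writelines(filter_texty(sys.stdin))
```

Because it reads stdin and writes stdout, a sketch like this could take the place of ./textiness.py in the pipe above. Runs of numbers, short axis labels and similar PDF debris fail either the word-count or the letter-ratio test, while ordinary sentences pass both.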
For example, I'm personally using this to extract the text of the IPCC report on 1.5°C:
pdf2txt.py SR15_Chapter4_Low_Res.pdf | ./unwrap.py -e ".}" | ./textiness.py > SR15_Chapter4_Low_Res.txt
unwrap.py takes care of reformatting lines of text into coherent paragraphs; textiness.py then filters out lines that don't appear to be texty.
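The README doesn't say how unwrap.py decides where a paragraph ends, or what its -e ".}" argument means. The following is therefore a minimal sketch of line unwrapping under a stated assumption: a paragraph closes at a blank line or at a line whose last character is in a hypothetical END_CHARS set. END_CHARS and the unwrap function are illustrative names, not unwrap.py's actual interface, and hyphenated line breaks are deliberately not handled.

```python
import sys
from typing import Iterable, Iterator, List

# Hypothetical: characters assumed to end a paragraph when they are the last
# character on a line. Not necessarily what unwrap.py's -e option means.
END_CHARS = ".!?:"


def unwrap(lines: Iterable[str], end_chars: str = END_CHARS) -> Iterator[str]:
    """Join hard-wrapped lines into paragraphs, yielding one paragraph at a time.

    A blank line, or a line ending in one of end_chars, closes the current
    paragraph. Any leftover lines at the end form a final paragraph.
    """
    buffer: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            # Blank line: flush whatever has accumulated so far.
            if buffer:
                yield " ".join(buffer)
                buffer = []
            continue
        buffer.append(line)
        if line[-1] in end_chars:
            yield " ".join(buffer)
            buffer = []
    if buffer:
        yield " ".join(buffer)


if __name__ == "__main__":
    # One paragraph per output line keeps the result line-oriented.
    for paragraph in unwrap(sys.stdin):
        print(paragraph)
```

In this sketch, printing one paragraph per line is what lets a line-based filter such as textiness.py judge whole paragraphs instead of wrapped fragments.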