
@cx0
Created October 31, 2023 07:03
Extract table from paper using Nougat
# install nougat
pip install "nougat-ocr[api, dataset]"
# crop the table out of the paper beforehand (keep it as a PDF)
# run Nougat with the default 0.1.0-small model
nougat /tmp/2304.08485.table3.only.pdf -o /tmp --markdown
\begin{table}
\begin{tabular}{l|l l l l}
 & Conversation & Detail description & Complex reasoning & **All** \\ \hline
Full data & 83.1 & 75.3 & 96.5 & 85.1 \\
Detail + Complex & 81.5 (-1.6) & 73.3 (-2.0) & 90.8 (-5.7) & 81.9 (-3.2) \\
Conv + 5\% Detail + 10\% Complex & 81.0 (-2.1) & 68.4 (-7.1) & 91.5 (-5.0) & 80.5 (-4.4) \\
Conversation & 76.5 (-6.6) & 59.8 (-16.2) & 84.9 (-12.4) & 73.8 (-11.3) \\
No Instruction Tuning & 22.0 (-61.1) & 24.0 (-51.3) & 18.5 (-78.0) & 21.5 (-63.6) \\ \hline
\end{tabular}
\end{table}
Table 3: Relative scores for different settings _w.r.t._ GPT-4 (text-only) on 30 randomly sampled images from COCO Val 2014. Each image is associated with one short question, one detailed question, and one complex reasoning question, resulting in a total of 90 questions. Following the same setting as our data generation pipeline, GPT-4 uses ground truth image captions and bounding boxes as visual input. We prompt GPT-4 with the answers from our model outputs and the answers by GPT-4 (text-only), and let it compare both responses and give a rating with an explanation.
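Nougat emits Markdown-with-LaTeX text like the block above. If you want to post-process that output programmatically rather than copy it by hand, a minimal stdlib-only sketch (the `extract_tables` helper and the inline sample string are illustrative, not part of Nougat's API) could pull out the `table` environments:

```python
import re

def extract_tables(mmd_text: str) -> list[str]:
    """Return every \\begin{table}...\\end{table} block in Nougat output."""
    return re.findall(r"\\begin\{table\}.*?\\end\{table\}", mmd_text, re.DOTALL)

# Illustrative sample standing in for the contents of Nougat's output file
sample = (
    "Some surrounding prose.\n"
    "\\begin{table}\n"
    "\\begin{tabular}{l|l} A & B \\\\ \\hline 1 & 2 \\\\ \\end{tabular}\n"
    "\\end{table}\n"
    "Table 3: caption text."
)

tables = extract_tables(sample)
print(len(tables))  # one table environment found
```

In practice you would read the `.mmd` file Nougat writes to the output directory and feed its contents to the helper instead of the inline sample.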