@gyurisc
Last active July 26, 2020 20:12
Extract the attachments from a PDF file and write them out to files
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using iTextSharp.text.pdf;

namespace PDFExtract
{
    public class PDFExtractor
    {
        public void ExtractAttachments()
        {
        }

        // Origin of the code: http://stackoverflow.com/questions/3007780/itextsharp-how-to-open-read-extract-a-file-attachment
        // Writes every file embedded in the PDF at file_name into folderName.
        internal void ExtractAttachments(string file_name, string folderName)
        {
            PdfDictionary documentNames = null;
            PdfDictionary embeddedFiles = null;
            PdfDictionary fileArray = null;
            PdfDictionary file = null;
            PRStream stream = null;

            PdfReader reader = new PdfReader(file_name);
            try
            {
                PdfDictionary catalog = reader.Catalog;
                documentNames = (PdfDictionary)PdfReader.GetPdfObject(catalog.Get(PdfName.NAMES));
                if (documentNames != null)
                {
                    embeddedFiles = (PdfDictionary)PdfReader.GetPdfObject(documentNames.Get(PdfName.EMBEDDEDFILES));
                    if (embeddedFiles != null)
                    {
                        // Only a flat name tree is handled here; a tree that is split
                        // into /Kids nodes has no Names array at its root.
                        PdfArray filespecs = embeddedFiles.GetAsArray(PdfName.NAMES);
                        if (filespecs != null)
                        {
                            // The Names array alternates [name, filespec, name, filespec, ...],
                            // so the file specification dictionaries sit at the odd indexes.
                            for (int i = 1; i < filespecs.Size; i += 2)
                            {
                                fileArray = filespecs.GetAsDict(i);
                                if (fileArray == null) continue;
                                file = fileArray.GetAsDict(PdfName.EF);
                                if (file == null) continue;
                                foreach (PdfName key in file.Keys)
                                {
                                    stream = (PRStream)PdfReader.GetPdfObject(file.GetAsIndirectObject(key));
                                    string attachedFileName = fileArray.GetAsString(key).ToString();
                                    byte[] attachedFileBytes = PdfReader.GetStreamBytes(stream);
                                    // Keep only the file name so an embedded path cannot escape folderName.
                                    string targetPath = Path.Combine(folderName, Path.GetFileName(attachedFileName));
                                    File.WriteAllBytes(targetPath, attachedFileBytes);
                                }
                            }
                        }
                    }
                }
            }
            finally
            {
                // Release the handle on the source PDF.
                reader.Close();
            }
        }
    }
}
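
A note on the loop counter: in a PDF name tree, the Names array stores keys and values in alternating positions (`[key1 value1 key2 value2 ...]`), so a loop over it has to advance two entries at a time to land on the file specifications. The sketch below shows only that pairing pattern on a plain list of strings. It does not use iTextSharp, and `NameTreePairs` and `ReadPairs` are hypothetical names made up for this illustration.

```csharp
using System;
using System.Collections.Generic;

public static class NameTreePairs
{
    // Hypothetical helper: walks a flat [key0, value0, key1, value1, ...] list
    // and returns the (key, value) pairs it contains.
    public static List<KeyValuePair<string, string>> ReadPairs(IList<string> flat)
    {
        var pairs = new List<KeyValuePair<string, string>>();

        // Advance two entries per iteration; the "i + 1" bound skips a
        // trailing key that has no value after it.
        for (int i = 0; i + 1 < flat.Count; i += 2)
        {
            pairs.Add(new KeyValuePair<string, string>(flat[i], flat[i + 1]));
        }

        return pairs;
    }
}
```

Calling `NameTreePairs.ReadPairs` on a list such as `"a.txt", "specA", "b.csv", "specB"` yields two pairs. With the real PdfArray, the value half of each pair is the dictionary that `GetAsDict` returns at the odd index.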

chy600 commented Jul 21, 2020

Thanks for the solution! Curious, how come the counter is added again here https://gist.github.com/gyurisc/1026807#file-pdfextractor-cs-L38 ?

gyurisc (Author) commented Jul 26, 2020

That is a mistake. I will fix it right away. Thanks for pointing it out.
