
Gist mogsdad/e6795e438615d252584f, last active April 10, 2024 01:38
Written for a question about getting PDF attachments in Gmail as text. I got a little carried away; this does much more than asked.

Google Apps Script pdfToText Utility

This helper function converts a PDF file blob into text, with options to save the original PDF, the intermediate Google Doc, and/or the final plain-text file. The language used for Optical Character Recognition (OCR) can also be specified; it defaults to 'en' (English).
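The defaults documented for pdfToText() can be pictured as a single object merge. This is a minimal sketch in plain JavaScript (it runs in the Apps Script V8 runtime or Node.js); PDF_TO_TEXT_DEFAULTS and resolvePdfToTextOptions are hypothetical names, and the real pdfToText() only fills in keepTextfile explicitly, relying on undefined (falsy) values for the other flags and on a fallback of 'en' for ocrLanguage.

```javascript
// Hypothetical helper (not part of pdfToText): applies the documented defaults.
const PDF_TO_TEXT_DEFAULTS = Object.freeze({
  keepPdf: false,      // don't keep a copy of the original PDF
  keepGdoc: false,     // delete the intermediate OCR Google Doc
  keepTextfile: true,  // save the extracted text as a .txt file
  path: "",            // blank path: files go to the Drive root folder
  ocrLanguage: "en",   // ISO 639-1 code used for OCR
  textResult: false    // with keepTextfile, return the file ID, not the text
});

function resolvePdfToTextOptions(options) {
  // Later sources win, so caller-supplied keys override the defaults.
  return Object.assign({}, PDF_TO_TEXT_DEFAULTS, options || {});
}

const resolved = resolvePdfToTextOptions({ keepTextfile: false, ocrLanguage: "de" });
console.log(resolved.keepTextfile, resolved.ocrLanguage, resolved.keepGdoc); // false de false
```

Freezing the defaults object means a caller cannot accidentally change the defaults for every later call.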

Note: Updated 12 May 2015 due to deprecation of DocsList. Thanks to Bruce McPherson for the getDriveFolderFromPath() utility.

    // Start with a Blob object (here, gmailAttachment is a GmailAttachment)
    var blob = gmailAttachment.getAs(MimeType.PDF);
    // fileId will be the ID of a saved text file (default behavior):
    var fileId = pdfToText( blob );

    // filetext will contain text from pdf file, no residual files are saved:
    var filetext = pdfToText( blob, {keepTextfile: false} );

    // we can save other converted file types, too:
    var options = {
       keepPdf : true,              // Keep a copy of the original PDF file.
       keepGdoc : true,             // Keep a copy of the OCR Google Doc file.
       keepTextfile : true,         // Keep a copy of the text file. (default)
       path : "attachments/today",  // Folder path to store file(s) in.
       textResult : true            // Return the text rather than the text file ID.
    };
    filetext = pdfToText( blob, options );
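What pdfToText() returns depends only on keepTextfile and textResult, which is easy to misread in calls like the ones above. A minimal sketch of that rule as a pure function; pdfToTextResult and the TEXT_FILE_ID placeholder are hypothetical, and the real function performs the Drive and OCR work before reaching this step.

```javascript
// Hypothetical stand-in for the final step of pdfToText(): choose the return value
// from the options, the OCR text, and the ID of the saved text file (if any).
function pdfToTextResult(options, text, textFileId) {
  const keepTextfile = Object.prototype.hasOwnProperty.call(options, "keepTextfile")
    ? options.keepTextfile
    : true;                     // keepTextfile defaults to true
  if (!keepTextfile || options.textResult) {
    return text;                // no text file kept, or text explicitly requested
  }
  return textFileId;            // default: the Drive ID of the .txt file
}

console.log(pdfToTextResult({}, "Hello", "TEXT_FILE_ID"));                   // TEXT_FILE_ID
console.log(pdfToTextResult({ keepTextfile: false }, "Hello", undefined));   // Hello
console.log(pdfToTextResult({ textResult: true }, "Hello", "TEXT_FILE_ID")); // Hello
```

In particular, passing keepTextfile: true without textResult: true returns a file ID, not text.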
/**
 * Convert pdf file (blob) to a text file on Drive, using built-in OCR.
 * By default, the text file will be placed in the root folder, with the same
 * name as source pdf (but extension 'txt'). Options:
 *   keepPdf (boolean, default false)     Keep a copy of the original PDF file.
 *   keepGdoc (boolean, default false)    Keep a copy of the OCR Google Doc file.
 *   keepTextfile (boolean, default true) Keep a copy of the text file.
 *   path (string, default blank)         Folder path to store file(s) in.
 *   ocrLanguage (ISO 639-1 code)         Default 'en'.
 *   textResult (boolean, default false)  If true and keepTextfile true, return
 *                                        string of text content. If keepTextfile
 *                                        is false, text content is returned without
 *                                        regard to this option. Otherwise, return
 *                                        id of textfile.
 *
 * @param {blob}   pdfFile    Blob containing pdf file
 * @param {object} options    (Optional) Object specifying handling details
 *
 * @returns {string}          id of text file (default) or text content
 */
function pdfToText ( pdfFile, options ) {
  // Ensure Advanced Drive Service is enabled. This code uses Drive API v2
  // methods (Drive.Files.insert, Drive.Files.remove), so select v2 when enabling it.
  try {
    Drive.Files.list();
  }
  catch (e) {
    throw new Error( "To use pdfToText(), first enable 'Drive API' in Resources > Advanced Google Services." );
  }

  // Set default options
  options = options || {};
  options.keepTextfile = options.hasOwnProperty("keepTextfile") ? options.keepTextfile : true;

  // Prepare resource object for file creation.
  // Drive API v2 expects parents as ParentReference objects ({id: ...}), not Folder objects.
  var parents = [];
  if (options.path) {
    var folder = getDriveFolderFromPath (options.path);
    if (folder) {
      parents.push( {id: folder.getId()} );
    }
  }
  var pdfName = pdfFile.getName();
  var resource = {
    title: pdfName,
    mimeType: pdfFile.getContentType(),
    parents: parents
  };

  // Save PDF to Drive, if requested
  if (options.keepPdf) {
    var file = Drive.Files.insert(resource, pdfFile);
  }

  // Save PDF as GDOC
  resource.title = pdfName.replace(/pdf$/, 'gdoc');
  var insertOpts = {
    ocr: true,
    ocrLanguage: options.ocrLanguage || 'en'
  };
  var gdocFile = Drive.Files.insert(resource, pdfFile, insertOpts);

  // Get text from GDOC
  var gdocDoc = DocumentApp.openById(;
  var text = gdocDoc.getBody().getText();

  // We're done using the Gdoc. Unless requested to keepGdoc, delete it.
  if (!options.keepGdoc) {
    Drive.Files.remove(;
  }

  // Save text file, if requested
  if (options.keepTextfile) {
    resource.title = pdfName.replace(/pdf$/, 'txt');
    resource.mimeType = MimeType.PLAIN_TEXT;
    var textBlob = Utilities.newBlob(text, MimeType.PLAIN_TEXT, resource.title);
    var textFile = Drive.Files.insert(resource, textBlob);
  }

  // Return result of conversion
  if (!options.keepTextfile || options.textResult) {
    return text;
  }
  else {
    return;
  }
}
// Helper utility by Bruce McPherson
function getDriveFolderFromPath (path) {
  return (path || "/").split("/").reduce ( function(prev,current) {
    if (prev && current) {
      var fldrs = prev.getFoldersByName(current);
      return fldrs.hasNext() ? : null;
    }
    else {
      return current ? null : prev;
    }
  }, DriveApp.getRootFolder());
}
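The path walk in getDriveFolderFromPath() is an Array.prototype.reduce over the path segments, starting from the root folder. The sketch below reproduces that reduce outside Apps Script against a hypothetical in-memory FakeFolder class (an assumption that mimics only getFoldersByName() and its hasNext()/next() iterator), which makes the edge cases easy to see: empty segments are skipped, and any missing folder makes the whole result null.

```javascript
// Hypothetical in-memory stand-in for a Drive folder. It mimics only what the
// helper uses: getFoldersByName(name) returning an iterator with hasNext()/next().
class FakeFolder {
  constructor(name, children = []) {
    this.name = name;
    this.children = children;
  }

  getFoldersByName(name) {
    const matches = this.children.filter(child => child.name === name);
    let i = 0;
    return {
      hasNext: () => i < matches.length,
      next: () => matches[i++]
    };
  }
}

// Same reduce as getDriveFolderFromPath(), but the root folder is a parameter
// instead of DriveApp.getRootFolder().
function folderFromPath(root, path) {
  return (path || "/").split("/").reduce(function (prev, current) {
    if (prev && current) {
      const fldrs = prev.getFoldersByName(current);
      return fldrs.hasNext() ? fldrs.next() : null;  // missing folder -> null
    }
    // Empty segment (leading, trailing or doubled "/"): keep the result so far.
    // Non-empty segment after a failed lookup: stay null.
    return current ? null : prev;
  }, root);
}

const today = new FakeFolder("today");
const root = new FakeFolder("My Drive", [new FakeFolder("attachments", [today])]);
console.log(folderFromPath(root, "/attachments/today/").name); // today
console.log(folderFromPath(root, "attachments/missing"));      // null
```

Because a missing folder yields null rather than an error, callers should check the result before using it as a parent folder.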

Hi, my files are being saved in "My Drive" rather than in my desired folder, even though I set the correct path, path : "Attachments/Test", and that folder exists. Can anyone help me solve this issue?


Hi, it actually refers to the main drive (Drive.Files); you need to replace it with "DriveApp.getFolderById('string_id_of_my_folder');"


It gives an error: "TypeError: DriveApp.getFolderById(...).insert is not a function".


thokoe commented Oct 20, 2022

Hi, is it possible to run the script without having to save the Google Doc to Drive and then delete it?


appscriptexpert commented Oct 20, 2022 via email


dahse89 commented Apr 13, 2023

The script wasn't working for me, but I found this:

/*
 * Convert PDF file to text
 * @param {string} fileId - The Google Drive ID of the PDF
 * @param {string} language - The language of the PDF text to use for OCR
 * @return {string} - The extracted text of the PDF file
 */

const convertPDFToText = (fileId, language) => {
  fileId = fileId || '18FaqtRcgCozTi0IyQFQbIvdgqaO_UpjW'; // Sample PDF file
  language = language || 'en'; // English

  // Read the PDF file in Google Drive
  const pdfDocument = DriveApp.getFileById(fileId);

  // Use OCR to convert PDF to a temporary Google Document
  // Restrict the response to include file Id and Title fields only
  const { id, title } = Drive.Files.insert(
    {
      title: pdfDocument.getName().replace(/\.pdf$/, ''),
      mimeType: pdfDocument.getMimeType() || 'application/pdf',
    },
    pdfDocument.getBlob(),
    {
      ocr: true,
      ocrLanguage: language,
      fields: 'id,title',
    }
  );

  // Use the Document API to extract text from the Google Document
  const textContent = DocumentApp.openById(id).getBody().getText();

  // Delete the temporary Google Document since it is no longer needed
  DriveApp.getFileById(id).setTrashed(true);

  // (optional) Save the text content to another text file in Google Drive
  const textFile = DriveApp.createFile(`${title}.txt`, textContent, 'text/plain');
  return textContent;
};
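Both scripts derive output file names with a regular expression, and the two rules disagree on names that don't end in a lowercase ".pdf". The functions below wrap the exact replace() calls from pdfToText() and convertPDFToText(); the function names, and the case-insensitive textTitle() variant, are hypothetical additions for comparison.

```javascript
// Rule in pdfToText(): replace a trailing "pdf" (case-sensitive) with the new extension.
function pdfToTextTitle(pdfName, ext) {
  return pdfName.replace(/pdf$/, ext);
}

// Rule in convertPDFToText(): strip a trailing ".pdf" (case-sensitive), then add ".txt".
function convertPdfTitle(pdfName) {
  return `${pdfName.replace(/\.pdf$/, "")}.txt`;
}

// Hypothetical variant (not in either script): case-insensitive ".pdf" match.
function textTitle(pdfName) {
  return `${pdfName.replace(/\.pdf$/i, "")}.txt`;
}

for (const name of ["report.pdf", "REPORT.PDF"]) {
  console.log(pdfToTextTitle(name, "txt"), convertPdfTitle(name), textTitle(name));
}
// report.txt report.txt report.txt
// REPORT.PDF REPORT.PDF.txt REPORT.txt
```

With an upper-case extension, the pdfToText() rule leaves the name unchanged, so the text file ends up named like the original PDF.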


How about getting images? When I add this to the script:

const ImgContent = DocumentApp.openById(id).getBody().getImages();

I cannot get everything from the PDF file: there are 3 images, but only 2 are detected.


The Drive.Files.insert call is outdated: the parents field now needs a ParentReference, and as written the request always uploads to the root folder.

The if (options.path) block must be replaced with:

  if (options.path) {
    const folder = getDriveFolderFromPath (options.path);
    if (folder) {
      const parentReference = Drive.newParentReference();
       = folder.getId();
      parents.push(parentReference);
    }
  }
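The point of this fix is the shape of the parents array. A minimal plain-JavaScript sketch of a v2-style insert resource; buildInsertResource and FOLDER_ID are hypothetical, and it assumes a plain { id } object is accepted where a ParentReference created with Drive.newParentReference() would be used.

```javascript
// Hypothetical helper: builds a Drive API v2-style insert resource.
// Assumption: in v2, each "parents" entry is a ParentReference, i.e. an object
// with an "id" property; a plain { id } literal stands in for
// Drive.newParentReference(), which only exists inside Apps Script.
function buildInsertResource(title, mimeType, folderId) {
  const resource = { title: title, mimeType: mimeType, parents: [] };
  if (folderId) {
    resource.parents.push({ id: folderId });  // reference the folder by its ID
  }
  return resource;  // no parent given: the file lands in the root folder
}

console.log(JSON.stringify(buildInsertResource("report.gdoc", "application/pdf", "FOLDER_ID")));
// {"title":"report.gdoc","mimeType":"application/pdf","parents":[{"id":"FOLDER_ID"}]}
```

Pushing a DriveApp Folder object instead of an { id } reference is what the earlier replies describe as files landing in "My Drive".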


I'm using this function that works well:

/**
 * @param {string} fileId
 * @param {string} parentFolderId
 * @returns {string} pdfContent
 */
function extractTextFromPDF(fileId, parentFolderId) {

  const destFolder = Drive.Files.get(parentFolderId, { "supportsAllDrives": true });
  const newFile = {
    "fileId": fileId,
    "parents": [
      destFolder.id
    ]
  };
  const args = {
    "resource": {
      "parents": [
        destFolder.id
      ],
      "name": "temp",
      "mimeType": "application/",
    },
    "supportsAllDrives": true
  };

  const newTargetDoc = Drive.Files.copy(newFile, fileId, args);
  const newTargetFile = DocumentApp.openById(newTargetDoc.getId());
  const pdfContent = newTargetFile.getBody().getText();

  return pdfContent;
}
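This reply uses Drive API v3 field names ("name", and parents given as folder ID strings), whereas pdfToText() and the earlier replies use v2 names ("title", and parents given as ParentReference objects). When porting code between the two versions, those resource fields need translating. A minimal sketch of that mapping with the hypothetical helper v2ResourceToV3(); it covers only these two fields and is not a full migration (method names also differ, e.g. v2 Drive.Files.insert versus v3 Drive.Files.create).

```javascript
// Hypothetical helper: maps a Drive API v2-style file resource to the v3 shape.
//   v2 "title"                       -> v3 "name"
//   v2 parents: [{ id: "..." }, ...] -> v3 parents: ["...", ...]
// Other fields (such as mimeType) are copied unchanged.
function v2ResourceToV3(resource) {
  const { title, parents, ...rest } = resource;
  const v3 = { ...rest };
  if (title !== undefined) {
    v3.name = title;
  }
  if (parents !== undefined) {
    // Accept either ParentReference-like objects or bare ID strings.
    v3.parents = parents.map(p => (typeof p === "string" ? p : p.id));
  }
  return v3;
}

console.log(JSON.stringify(v2ResourceToV3({
  title: "temp",
  mimeType: "application/",
  parents: [{ id: "FOLDER_ID" }]
})));
// {"mimeType":"application/","name":"temp","parents":["FOLDER_ID"]}
```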
