@tanaikech
Created December 12, 2024 03:07
Workaround: Export web-published Google Docs as PDFs using Google Apps Script


Abstract

This report outlines a Google Apps Script solution for directly exporting web-published Google Docs to PDF. By circumventing limitations in published URLs, the script enables convenient PDF generation without manual intervention.

Introduction

Google Sheets and Google Docs offer the convenient feature of web publishing, providing readily accessible URLs for sharing. Ref

  • Google Sheets: https://docs.google.com/spreadsheets/d/e/2PACX-###/pubhtml
  • Google Docs: https://docs.google.com/document/d/e/2PACX-###/pub

These URLs utilize a unique ID (###) for each document. While Google Sheets supports direct PDF export by modifying the URL to https://docs.google.com/spreadsheets/d/e/2PACX-###/pub?output=pdf, a similar direct method for Google Docs is currently unavailable.

This limitation arises because the unique ID within the published URL (2PACX-###) does not correspond to the actual Google Docs file ID, which is necessary for direct PDF generation. Furthermore, retrieving the file ID from the published URL is not feasible.

To overcome this obstacle, this report presents a practical workaround using Google Apps Script. Google Apps Script seamlessly integrates with Google Docs and Google Drive, providing the necessary tools to achieve the desired PDF export functionality.
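The URL patterns described above can be sketched as a small helper. This is only an illustration: `getDirectPdfUrl` is a hypothetical name that does not appear in the original script, and the regular expression assumes the published URLs follow exactly the two formats listed in this section.

```javascript
// Hypothetical helper (not part of the original script).
// Returns the direct PDF export URL for a web-published Google Sheets URL,
// or null for a web-published Google Docs URL, which has no direct PDF export.
function getDirectPdfUrl(publishedUrl) {
  const pattern =
    /^https:\/\/docs\.google\.com\/(spreadsheets|document)\/d\/e\/(2PACX-[^\/]+)\/(pubhtml|pub)(?:\?.*)?$/;
  const match = publishedUrl.match(pattern);
  if (!match) {
    throw new Error("Not a web-published Google Sheets or Google Docs URL: " + publishedUrl);
  }
  const [, kind, publishedId] = match;
  if (kind === "spreadsheets") {
    // Google Sheets: modifying the published URL is enough.
    return `https://docs.google.com/spreadsheets/d/e/${publishedId}/pub?output=pdf`;
  }
  // Google Docs: the published ID is not the file ID, so a workaround is needed.
  return null;
}
```

A `null` result signals that the Apps Script workaround in this report is required.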

Flow

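The flow of this workaround is as follows: the web-published page is downloaded as HTML, the HTML is converted to a Google Document with the Drive API, the header paragraphs of the web-published page are removed, and the resulting document is exported as a PDF file.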

Usage

1. Prepare Google Apps Script Project

Create a standalone Google Apps Script project. Ref Alternatively, a container-bound script can also be used.

2. Enable API

Enable the Drive API v3 in Advanced Google services. Ref The Drive API is required to convert the downloaded HTML into a Google Document.
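If the script is shared with others, a guard like the following might help catch a missing or mismatched Drive service early. It is a sketch under assumptions: `assertDriveV3` is a hypothetical name, and the check assumes that only v3 of the advanced service exposes `Files.create` (v2 is assumed to expose `Files.insert` instead) and that the method is reported as a function.

```javascript
// Hypothetical guard (not part of the original script).
// `service` is the value of the global `Drive` identifier, or undefined
// when the advanced service has not been enabled.
function assertDriveV3(service) {
  if (!service || !service.Files) {
    throw new Error("Enable the Drive API in Advanced Google services.");
  }
  // Assumption: Files.create exists only in v3 of the advanced service.
  if (typeof service.Files.create !== "function") {
    throw new Error("Drive API v3 is required: Files.create is not available.");
  }
  return service;
}

// Usage inside Apps Script (typeof avoids a ReferenceError when Drive is undefined):
// const drive = assertDriveV3(typeof Drive === "undefined" ? undefined : Drive);
```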

3. Script

Copy and paste the following script into the script editor of your Google Apps Script project:

function myFunction() {
  // Please set your Web Published URL of the Google Document.
  const url = "https://docs.google.com/document/d/e/2PACX-###/pub";

  // 1. Download as HTML.
  const htmlBlob = UrlFetchApp.fetch(url)
    .getBlob()
    .setContentType(MimeType.HTML);

  // 2. Convert HTML to Google Docs.
  const { id } = Drive.Files.create(
    { name: "ConvertedDocument", mimeType: MimeType.GOOGLE_DOCS },
    htmlBlob
  );
  const doc = DocumentApp.openById(id);

  // Remove header paragraphs of Web Published Document.
  // Adjust the count (5) if a different number of leading paragraphs must be removed.
  doc
    .getBody()
    .getParagraphs()
    .slice(0, 5)
    .forEach((p) => p.removeFromParent());
  doc.saveAndClose();

  // 3. Convert Google Docs to PDF.
  // getBlob() on a Google Docs file returns its content converted to PDF.
  DriveApp.createFile(DriveApp.getFileById(id).getBlob());
}

When this script is executed, the web-published Google Document is exported as a PDF file to the root folder of your Google Drive. The intermediate converted Google Document is also created in the same folder. If you need to edit the document before it is exported, use the doc object (const doc = DocumentApp.openById(id)) within the script before doc.saveAndClose() is called.
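The script above can also be written as a parameterized function. The following is a minimal sketch, not the author's method: `exportPublishedDocAsPdf`, the `services` argument, and the options `headerParagraphs`, `pdfName`, and `trashConverted` are all hypothetical names introduced for illustration. Passing the Apps Script globals in as an argument is an assumption made so the function can be exercised with stand-ins outside Apps Script.

```javascript
// Sketch of a parameterized variant (hypothetical, not part of the original script).
// `services` bundles the Apps Script globals used by the original script.
function exportPublishedDocAsPdf(url, services, options = {}) {
  const { UrlFetchApp, Drive, DocumentApp, DriveApp, MimeType } = services;
  const {
    headerParagraphs = 5, // leading paragraphs to remove, as in the original script
    pdfName = "ConvertedDocument.pdf",
    trashConverted = false, // move the intermediate Google Document to the trash
  } = options;

  // 1. Download the published page as HTML.
  const htmlBlob = UrlFetchApp.fetch(url).getBlob().setContentType(MimeType.HTML);

  // 2. Convert the HTML to a Google Document.
  const { id } = Drive.Files.create(
    { name: "ConvertedDocument", mimeType: MimeType.GOOGLE_DOCS },
    htmlBlob
  );

  // Remove the header paragraphs of the web-published page.
  const doc = DocumentApp.openById(id);
  doc
    .getBody()
    .getParagraphs()
    .slice(0, headerParagraphs)
    .forEach((p) => p.removeFromParent());
  doc.saveAndClose();

  // 3. Export the Google Document as a PDF file.
  const converted = DriveApp.getFileById(id);
  const pdf = DriveApp.createFile(converted.getBlob().setName(pdfName));
  if (trashConverted) converted.setTrashed(true);
  return pdf;
}
```

Inside Apps Script, it could be called as `exportPublishedDocAsPdf(url, { UrlFetchApp, Drive, DocumentApp, DriveApp, MimeType }, { trashConverted: true })`, which leaves only the PDF in the root folder.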

Note

  • This script is a workaround that rebuilds the document from its web-published HTML. Therefore, the exported PDF might not perfectly replicate the original document.
  • Currently, it appears that web-published Google Docs cannot be parsed on a per-tab basis. When multiple tabs are included in the Google Docs, all tabs are displayed in the web-published version sequentially.
  • This approach can also be adapted for use with other languages besides Google Apps Script. For example, a Python implementation can be found here.