@tanaikech
Last active July 24, 2023 05:59
Management of PDF Metadata using Google Apps Script

This is a sample script for managing the metadata of PDF data using Google Apps Script.

You may need to retrieve or update the metadata of PDF data using Google Apps Script. In this post, I would like to introduce a way to achieve this.

Class ManagePdfMetadata

This is the Class ManagePdfMetadata, which manages the metadata of PDF files using Google Apps Script. It uses the JavaScript library pdf-lib to read and write the PDF metadata, and the library is loaded by the Class itself when an instance is created.

/**
 * ### Description
 * This is a Class for managing the metadata of PDF data using Google Apps Script.
 *
 * Author: Tanaike ( https://tanaikech.github.io/ )
 */
class ManagePdfMetadata {
  /**
   * ### Description
   * Constructor of this class.
   *
   * @return {void}
   */
  constructor() {
    this.cdnjs = "https://cdn.jsdelivr.net/npm/pdf-lib/dist/pdf-lib.min.js"; // or "https://cdnjs.cloudflare.com/ajax/libs/pdf-lib/1.17.1/pdf-lib.min.js"

    // Load pdf-lib. This also sets this.PDFLib and this.setTimeout.
    this.loadPdfLib_();
    this.keys = ["title", "subject", "author", "creator", "creationDate", "modificationDate", "keywords", "producer"];
  }

  /**
   * ### Description
   * Get metadata of the inputted PDF data using pdf-lib.
   *
   * @param {Object} blob Blob of PDF data by retrieving with Google Apps Script.
   * @return {Promise<Object>} Promise resolving to an object including the metadata of the inputted PDF data.
   */
  async getPdfMetadata(blob) {
    // Errors from loading or reading the PDF data reject the returned Promise.
    const d = await this.getPdfDocFromBlob_(blob);
    // For each key, call the matching pdf-lib getter, e.g. "title" -> d.getTitle().
    return this.keys.reduce((o, k) => (o[k] = d[`get${k.charAt(0).toUpperCase() + k.slice(1)}`](), o), {});
  }

  /**
   * ### Description
   * Update metadata of the inputted PDF data using pdf-lib.
   *
   * @param {Object} blob Blob of PDF data by retrieving with Google Apps Script.
   * @param {Object} object Object for updating PDF metadata.
   * @return {Promise<Object>} Promise resolving to the Blob of the updated PDF data.
   */
  async udpatePdfMetadata(blob, object) {
    // An invalid object rejects the returned Promise.
    if (object === null || typeof object != "object" || Object.keys(object).length == 0) {
      throw new Error("Please set valid object.");
    }
    const d = await this.getPdfDocFromBlob_(blob);
    // Only the keys included in "object" are set here, so a partial object also works.
    this.keys.forEach(k => {
      if (object.hasOwnProperty(k)) {
        const f = `set${k.charAt(0).toUpperCase() + k.slice(1)}`;
        if (k == "title" && Array.isArray(object[k])) {
          // "title" is given as [title, options], which is spread into setTitle(title, options).
          d[f](...object[k]);
        } else {
          d[f](object[k]);
        }
      }
    });
    const bytes = await d.save();
    // Utilities.newBlob expects signed bytes, so the Uint8Array is converted with Int8Array.
    return Utilities.newBlob([...new Int8Array(bytes)], MimeType.PDF, blob.getName());
  }

  /**
   * ### Description
   * Load pdf-lib. https://pdf-lib.js.org/docs/api/classes/pdfdocument
   *
   * @return {void}
   */
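  loadPdfLib_() {
    // Google Apps Script has no setTimeout, so a synchronous stand-in is defined
    // before eval. The code evaluated below resolves "setTimeout" to this local function.
    const setTimeout = function (f, t) {
      Utilities.sleep(t || 0);
      return f();
    };
    this.setTimeout = setTimeout;
    // pdf-lib attaches itself to "this", so it becomes available as this.PDFLib.
    eval(UrlFetchApp.fetch(this.cdnjs).getContentText());
  }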

  /**
   * ### Description
   * Get PDF document object using pdf-lib.
   *
   * @param {Object} blob Blob of PDF data by retrieving with Google Apps Script.
   * @return {Object} PDF document object using pdf-lib.
   */
  async getPdfDocFromBlob_(blob) {
    if (blob.toString() != "Blob") {
      throw new Error("Please set PDF blob.");
    }
    return await this.PDFLib.PDFDocument.load(new Uint8Array(blob.getBytes()), { updateMetadata: true });
  }
}
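
The Class derives the pdf-lib method names from the entries of this.keys by capitalizing the first letter of each key and adding a "get" or "set" prefix. The following is a minimal sketch of that naming rule that runs on its own. The helper name toAccessorName is hypothetical and is not part of the Class.

```javascript
// Hypothetical helper: builds an accessor name such as "getTitle" or
// "setCreationDate" from a prefix and a metadata key.
function toAccessorName(prefix, key) {
  return `${prefix}${key.charAt(0).toUpperCase()}${key.slice(1)}`;
}

// Same keys as this.keys in the Class.
const keys = ["title", "subject", "author", "creator", "creationDate", "modificationDate", "keywords", "producer"];
const getters = keys.map((k) => toAccessorName("get", k));
const setters = keys.map((k) => toAccessorName("set", k));

console.log(getters[0], setters[4]); // getTitle setCreationDate
```

Because the names are built from the keys, adding a key to this.keys only works when pdf-lib has both the matching getter and setter.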

Sample script 1

This sample script retrieves the metadata from a PDF file on Google Drive using Google Apps Script.

function getPdfMetadata() {
  const fileId = "###"; // Please set the file ID of the PDF file on Google Drive.

  const blob = DriveApp.getFileById(fileId).getBlob();

  const MM = new ManagePdfMetadata();
  MM.getPdfMetadata(blob).then((res) => console.log(res));
}

When this script is run, the following result is shown in the log.

{
  "title": "###",
  "subject": "###",
  "author": "###",
  "creator": "###",
  "creationDate": "### Date object ###",
  "modificationDate": "### Date object ###",
  "keywords": "### key words ###",
  "producer": "###"
}

When I tested this script, the values of modificationDate and producer were not the values stored in the PDF file: modificationDate returned the execution time, and producer always returned pdf-lib (https://github.com/Hopding/pdf-lib). I first guessed that this might be a bug in pdf-lib. However, it is more likely the effect of the updateMetadata: true option that this Class passes to PDFDocument.load, which makes pdf-lib refresh these two values when the PDF data is loaded.
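
If the refreshed values come from the updateMetadata: true option passed to PDFDocument.load in this Class, loading the PDF data with updateMetadata: false should return the values stored in the file. The following is a minimal sketch under that assumption. readMetadataAsStored is a hypothetical helper that is not part of the Class, and the pdf-lib PDFDocument class is passed in as an argument so that it can be taken from this.PDFLib.

```javascript
// Hypothetical helper, not part of the original Class.
// PDFDocument: the pdf-lib PDFDocument class (for example, this.PDFLib.PDFDocument).
// bytes: Uint8Array of the PDF data (for example, new Uint8Array(blob.getBytes())).
async function readMetadataAsStored(PDFDocument, bytes) {
  // Assumption: updateMetadata: false stops pdf-lib from overwriting
  // producer and modificationDate while the PDF data is loaded.
  const doc = await PDFDocument.load(bytes, { updateMetadata: false });
  return {
    modificationDate: doc.getModificationDate(),
    producer: doc.getProducer(),
  };
}
```

Inside the Class, this could be tried by changing only the option object in getPdfDocFromBlob_. I have not confirmed how this affects the update method, so it is safer to test it on a copy of the PDF file first.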

Sample script 2

In this sample script, the metadata of a PDF file is updated using Google Apps Script.

function udpatePdfMetadata() {
  const fileId = "###"; // Please set the file ID of the PDF file on Google Drive.
  const blob = DriveApp.getFileById(fileId).getBlob();

  const object = {
    title: ["sample title", { showInWindowTitleBar: true }], // This property is an array.
    subject: "sample subject",
    author: "sample author",
    creator: "sample creator",
    creationDate: new Date("2023-07-01T00:00:00"), // This value is date object.
    modificationDate: new Date("2023-07-01T10:00:00"), // This value is date object.
    keywords: ["sample keyword 1", "sample keyword 2", "sample keyword 3"], // This property is an array.
    producer: "sample producer",
  };
  const MM = new ManagePdfMetadata();
  MM.udpatePdfMetadata(blob, object).then((newBlob) =>
    DriveApp.createFile(newBlob)
  );
}
  • When this script is run, the metadata of "title", "subject", "author", "creator", "creationDate", "modificationDate", "keywords", and "producer" is updated. After the update is finished, a new PDF file is created in the root folder.

  • When you want to overwrite the original PDF file with the updated PDF data, please modify the above script as follows. In this case, please enable Drive API at Advanced Google services.

    • From

      MM.udpatePdfMetadata(blob, object).then((newBlob) =>
        DriveApp.createFile(newBlob)
      );
    • To

      MM.udpatePdfMetadata(blob, object).then((newBlob) =>
        Drive.Files.update({}, fileId, newBlob)
      );
  • When the metadata is retrieved with getPdfMetadata() from a PDF file updated with this method, the values of modificationDate and producer are the execution time and pdf-lib (https://github.com/Hopding/pdf-lib), respectively. But when I opened the output PDF file in my browser, I confirmed that the metadata was correctly updated with the given values.

  • The official document of setTitle is in the PDFDocument reference of pdf-lib: https://pdf-lib.js.org/docs/api/classes/pdfdocument
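
The value types noted in Sample script 2 (an array for "title" and "keywords", Date objects for the two dates, and strings for the rest) can be checked before calling udpatePdfMetadata. The following is a minimal sketch. findInvalidMetadataKeys is a hypothetical helper, and its rules only mirror the comments in Sample script 2.

```javascript
// Hypothetical helper: returns the keys whose values do not match the types
// used in Sample script 2. Keys that the Class does not handle are reported too.
// An empty array means the object looks usable.
function findInvalidMetadataKeys(object) {
  const isString = (v) => typeof v === "string";
  const isDate = (v) => v instanceof Date && !isNaN(v.getTime());
  const rules = {
    // [title, options]; only the title string is checked here.
    title: (v) => Array.isArray(v) && isString(v[0]),
    subject: isString,
    author: isString,
    creator: isString,
    creationDate: isDate,
    modificationDate: isDate,
    keywords: (v) => Array.isArray(v) && v.every(isString),
    producer: isString,
  };
  const hasRule = (k) => Object.prototype.hasOwnProperty.call(rules, k);
  return Object.keys(object).filter((k) => !hasRule(k) || !rules[k](object[k]));
}

const invalid = findInvalidMetadataKeys({
  title: ["sample title", { showInWindowTitleBar: true }],
  keywords: "sample keyword 1", // Not an array, so this key is reported.
  creationDate: new Date("2023-07-01T00:00:00"),
});
console.log(invalid); // Only "keywords" is reported.
```

Checking the object first gives a clearer message than an error raised inside pdf-lib when a value has the wrong type.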
