Last active Jun 26, 2022
Parsing HTML using Google Apps Script

This is a sample script for parsing HTML using Google Apps Script. When HTML data is converted to a Google Document, the HTML is parsed during the conversion, and the paragraphs, lists and tables are included in the resulting Document. I thought that this behavior could be used for parsing HTML with Google Apps Script, so I came up with this method.

In the Sheets API, HTML data can also be put into a Spreadsheet with PasteDataRequest. Unfortunately, in that case, I couldn't distinguish between the body and the tables.
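For reference, the PasteDataRequest approach mentioned above could look roughly like the following. This is a minimal sketch and not part of the original sample: the helper name buildPasteHtmlRequest and its parameters are hypothetical, and the commented-out call assumes that the Sheets API is also enabled at Advanced Google Services.

```javascript
// Hypothetical helper: builds a Sheets API batchUpdate body that pastes
// HTML data into a sheet, starting at the given cell (0-based indexes).
function buildPasteHtmlRequest(html, sheetId, rowIndex, columnIndex) {
  return {
    requests: [
      {
        pasteData: {
          html: true, // treat "data" as HTML instead of delimited text
          data: html,
          coordinate: {
            sheetId: sheetId,
            rowIndex: rowIndex || 0,
            columnIndex: columnIndex || 0
          }
        }
      }
    ]
  };
}

// Usage inside Apps Script (assumption: Sheets advanced service is enabled):
// var html = UrlFetchApp.fetch(url).getContentText();
// var body = buildPasteHtmlRequest(html, 0, 0, 0);
// Sheets.Spreadsheets.batchUpdate(body, spreadsheetId);
```

With this request, the whole HTML is pasted into one grid of cells, so the body text and the table cells end up in the same sheet. This is the limitation described above.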

The flow of this method is as follows. In this sample script, the tables from HTML are retrieved.


  1. Retrieve HTML data using UrlFetchApp.fetch().
  2. Create a new Google Document by converting the HTML data using the Drive API.
    • This is a temporary file.
  3. Retrieve all tables using the Document service of Google Apps Script.
  4. Delete the temporary file.

Sample script

Before you run this script, please enable Drive API at Advanced Google Services.

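function parseTablesFromHTML(url) {
  var html = UrlFetchApp.fetch(url);
  // Convert the HTML to a temporary Google Document (Drive.Files.insert is a Drive API v2 method).
  var docId = Drive.Files.insert(
    { title: "temporalDocument", mimeType: MimeType.GOOGLE_DOCS },
    html.getBlob()
  ).id;
  // Retrieve all tables from the converted Document.
  var tables = DocumentApp.openById(docId)
    .getBody()
    .getTables();
  var res = tables.map(function(table) {
    var values = [];
    for (var row = 0; row < table.getNumRows(); row++) {
      var temp = [];
      var cols = table.getRow(row);
      for (var col = 0; col < cols.getNumCells(); col++) {
        temp.push(cols.getCell(col).getText());
      }
      values.push(temp);
    }
    return values;
  });
  // Delete the temporary file.
  Drive.Files.remove(docId);
  return res;
}

// Please run this function.
function run() {
  var url = "###"; // <--- Please set URL that you want to retrieve table.
  var res = parseTablesFromHTML(url);
  Logger.log(res);
}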


As a test case, when you set the URL of a test page to url and run the script, the following result can be retrieved.

    [
      [
        ["head1_1", "head1_2", "head1_3\n"],
        ["value1_a1", "value1_b1", "value1_c1"],
        ["value1_a2", "value1_b2", "value1_c2"]
      ],
      [
        ["head2_1", "head2_2", "head2_3\n"],
        ["value2_a1", "value2_b1", "value2_c1"],
        ["value2_a2", "value2_b2", "value2_c2"]
      ]
    ]
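In the result above, the last header cell of each table ends with a line break ("head1_3\n"). If that is unwanted, the values could be cleaned after parsing. The following is a minimal sketch, assuming that the input has the structure returned by parseTablesFromHTML() (tables, then rows, then cells). The helper name trimTableValues is hypothetical.

```javascript
// Hypothetical helper: trims whitespace (including a trailing "\n")
// from every cell of every table. The input arrays are not modified.
function trimTableValues(tables) {
  return tables.map(function (table) {
    return table.map(function (row) {
      return row.map(function (cell) {
        return String(cell).trim();
      });
    });
  });
}

// Example with one table taken from the result above.
var sample = [
  [
    ["head1_1", "head1_2", "head1_3\n"],
    ["value1_a1", "value1_b1", "value1_c1"]
  ]
];
var cleaned = trimTableValues(sample);
// cleaned[0][0][2] is now "head1_3"
```

In the sample script, this could be called as trimTableValues(parseTablesFromHTML(url)).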


  • Using this method, all paragraphs and lists can also be retrieved.
  • This method can also be used with other languages.
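As a rough illustration of the first point, the paragraphs and list items could be read from the same temporary Document with getParagraphs() and getListItems() of the Document service. This is a minimal sketch, not part of the original sample. The helper name parseTextFromBody and the shape of its return value are hypothetical.

```javascript
// Hypothetical helper: collects the text of paragraphs and list items.
// "body" is expected to be the Body of the temporary Google Document,
// e.g. DocumentApp.openById(docId).getBody().
function parseTextFromBody(body) {
  var paragraphs = body.getParagraphs().map(function (p) {
    return p.getText();
  });
  var listItems = body.getListItems().map(function (item) {
    return { level: item.getNestingLevel(), text: item.getText() };
  });
  return { paragraphs: paragraphs, listItems: listItems };
}
```

In the sample script, this would have to be called before the temporary file is deleted. According to the Document service reference, getParagraphs() also includes list items, so the same text may appear in both arrays.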


empenoso commented Nov 30, 2019

Great example! Thanks.
