Skip to content

Instantly share code, notes, and snippets.

@tanaikech
Created October 13, 2020 06:10
Show Gist options
  • Save tanaikech/51628e5e0a2c017329457afdb1936912 to your computer and use it in GitHub Desktop.
Save tanaikech/51628e5e0a2c017329457afdb1936912 to your computer and use it in GitHub Desktop.
Retrieving All Values from All Sheets from URL of 2PACX- of Web Published Google Spreadsheet using Python

Retrieving All Values from All Sheets from URL of 2PACX- of Web Published Google Spreadsheet using Python

This is a sample script for retrieving all values from all sheets from URL of 2PACX- of Web Published Google Spreadsheet using Python.

In this post, it supposes that the Google Spreadsheet has already been published for Web. Ref

Flow

The flow of this method is as follows.

  1. Retrieve XLSX data from the URL of web published Google Spreadsheet as BytesIO data.
    • The URL is like https://docs.google.com/spreadsheets/d/e/2PACX-###/pubhtml.
  2. XLSX data is parsed with openpyxl.
  3. Retrieve all values from all sheets.

Sample script

Please set spreadsheetUrl.

import openpyxl
import re
import requests
from io import BytesIO

spreadsheetUrl = "https://docs.google.com/spreadsheets/d/e/2PACX-###/pubhtml"

regex = re.compile(
    'https:\/\/docs\.google\.com\/spreadsheets\/d\/e\/2PACX-.+\/pub(html)?')
if re.match(regex, spreadsheetUrl):
    temp = spreadsheetUrl.split('/')
    temp.pop()
    url = "/".join(temp) + "/pub?output=xlsx"
    res = requests.get(url)
    wb = openpyxl.load_workbook(
        filename=BytesIO(res.content), data_only=False)
    sheets = wb.sheetnames
    r = []
    for sheetName in sheets:
        sheet = wb[sheetName]
        temp = {'sheetName': sheetName, 'values': []}
        for cells in sheet.rows:
            column = []
            for cell in cells:
                column.append(cell.value)
            temp['values'].append(column)
        r.append(temp)
    print(r)

Result

When above script is run, the following sample value is returned.

[
  {
    "sheetName": "Sheet1",
    "values": [["a1", "b1", "c1"], ["a2", "b2", "c2"], , ,]
  },
  {
    "sheetName": "Sheet2",
    "values": [["a1", "b1", "c1"], ["a2", "b2", "c2"], , ,]
  },
  ,
  ,
]

Reference

openpyxl

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment