Retrieving All Values from All Sheets from URL of 2PACX- of Web Published Google Spreadsheet using Python
This is a sample script for retrieving all values from all sheets from URL of 2PACX- of Web Published Google Spreadsheet using Python.
In this post, it supposes that the Google Spreadsheet has already been published for Web. Ref
The flow of this method is as follows.
- Retrieve XLSX data from the URL of web published Google Spreadsheet as
BytesIO
data.- The URL is like
https://docs.google.com/spreadsheets/d/e/2PACX-###/pubhtml
.
- The URL is like
- XLSX data is parsed with openpyxl.
- Retrieve all values from all sheets.
Please set spreadsheetUrl
.
import openpyxl
import re
import requests
from io import BytesIO
spreadsheetUrl = "https://docs.google.com/spreadsheets/d/e/2PACX-###/pubhtml"
regex = re.compile(
'https:\/\/docs\.google\.com\/spreadsheets\/d\/e\/2PACX-.+\/pub(html)?')
if re.match(regex, spreadsheetUrl):
temp = spreadsheetUrl.split('/')
temp.pop()
url = "/".join(temp) + "/pub?output=xlsx"
res = requests.get(url)
wb = openpyxl.load_workbook(
filename=BytesIO(res.content), data_only=False)
sheets = wb.sheetnames
r = []
for sheetName in sheets:
sheet = wb[sheetName]
temp = {'sheetName': sheetName, 'values': []}
for cells in sheet.rows:
column = []
for cell in cells:
column.append(cell.value)
temp['values'].append(column)
r.append(temp)
print(r)
When above script is run, the following sample value is returned.
[
{
"sheetName": "Sheet1",
"values": [["a1", "b1", "c1"], ["a2", "b2", "c2"], , ,]
},
{
"sheetName": "Sheet2",
"values": [["a1", "b1", "c1"], ["a2", "b2", "c2"], , ,]
},
,
,
]