Skip to content

Instantly share code, notes, and snippets.

@GabrielSGoncalves
Last active September 21, 2023 16:06
Show Gist options
  • Save GabrielSGoncalves/eaa6fb06ed93a2d333771e99bc9ad8df to your computer and use it in GitHub Desktop.
Save GabrielSGoncalves/eaa6fb06ed93a2d333771e99bc9ad8df to your computer and use it in GitHub Desktop.
Read private files from a Google Drive
from typing import Union, Dict
from io import BytesIO, StringIO
import json
import pandas as pd
import requests
from pydrive2.auth import GoogleAuth
from pydrive2.drive import GoogleDrive
def read_private_file_from_gdrive(
file_url: str, file_format: str, google_auth: GoogleAuth, **kwargs
) -> Union[pd.DataFrame, Dict, str]:
"""Read private files from Google Drive.
Parameters
----------
file_url : str
URL adress to file in Google Drive.
file_format : str
File format can be 'csv', 'xlsx', 'parquet', 'json' or 'txt'.
google_auth: GoogleAuth
Google Authentication object with access to target account. For more
information on how to login using Auth2, please check the link below:
https://docs.iterative.ai/PyDrive2/quickstart/#authentication
Returns
-------
Union[pd.DataFrame, Dict, str].
The specified object generate from target file.
"""
drive = GoogleDrive(google_auth)
# Parsing file URL
file_id = file_url.split("/")[-2]
file = drive.CreateFile({"id": file_id})
content_io_buffer = file.GetContentIOBuffer()
if file_format == "csv":
return pd.read_csv(
StringIO(content_io_buffer.read().decode()), **kwargs
)
elif file_format == "xlsx":
return pd.read_excel(content_io_buffer.read(), **kwargs)
elif file_format == "parquet":
byte_stream = content_io_buffer.read()
return pd.read_parquet(BytesIO(byte_stream), **kwargs)
elif file_format == "json":
return json.load(StringIO(content_io_buffer.read().decode()))
elif file_format == "txt":
byte_stream = content_io_buffer.read()
return byte_stream.decode("utf-8", **kwargs)
@derycck
Copy link

derycck commented Sep 21, 2023

thanks for this snipet !
Quick tip: You could replace

from: StringIO(file.GetContentIOBuffer().read().decode())

to: StringIO(file.GetContentString())

no need to follow the route: GetContentIOBuffer > read > decode
because the getcontentstring method of 'GoogleDriveFile object' already delivers the string

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment