Skip to content

Instantly share code, notes, and snippets.

@jitoquinto
Created November 9, 2022 19:59
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save jitoquinto/e5a763c6e4aefae3b40b15b3b9221395 to your computer and use it in GitHub Desktop.
Save jitoquinto/e5a763c6e4aefae3b40b15b3b9221395 to your computer and use it in GitHub Desktop.
Google Workflow to load CSV's from Drive to BigQuery
# This workflow loads data from Google Drive to BQ.
#
# Arguments:
# - base_drive_folder_id (String)
# The root Google Drive Folder ID
# - dest_project_id (String)
# Project ID of where data should be loaded
# - source_dataset_id (String)
# Dataset where data should be loaded
main:
params: [args]
steps:
- init:
assign:
- base_drive_folder_id: ${args.base_drive_folder_id}
- dest_project_id: ${args.dest_project_id}
- dest_dataset_id: ${args.dest_dataset_id}
- get_file_ids_from_drive:
call: http.get
args:
url: https://content.googleapis.com/drive/v3/files
auth:
type: OAuth2
scopes:
- https://www.googleapis.com/auth/drive
query:
includeItemsFromAllDrives: true
supportsAllDrives: true
q: ${"'" + base_drive_folder_id + "' in parents and mimeType = 'text/csv'"}
result: csv_files
- iterate_files:
for:
value: file
in: ${csv_files.body.files}
steps:
- assign_variables_for_bq_load:
assign:
- source_uri: ${"https://drive.google.com/open?id=" + file.id}
- file_name_no_ext: ${text.split(file.name, ".")[0]}
- table_name: ${file_name_no_ext}
- load_file_to_bq:
call: googleapis.bigquery.v2.jobs.insert
args:
projectId: ${dest_project_id}
connector_params:
scopes:
- https://www.googleapis.com/auth/drive
- https://www.googleapis.com/auth/bigquery
body:
configuration:
query:
createDisposition: CREATE_IF_NEEDED
writeDisposition: WRITE_TRUNCATE
destinationTable:
projectId: ${dest_project_id}
datasetId: ${dest_dataset_id}
tableId: ${table_name}
tableDefinitions:
${"temp_" + table_name}:
sourceUris:
- ${source_uri}
sourceFormat: CSV
csvOptions:
skipLeadingRows: 1
autodetect: true
query: ${"SELECT * FROM temp_" + table_name}
result: load_result
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment