-
-
Save jitoquinto/e5a763c6e4aefae3b40b15b3b9221395 to your computer and use it in GitHub Desktop.
Google Workflow to load CSV's from Drive to BigQuery
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
# This workflow loads data from Google Drive to BQ. | |
# | |
# Arguments: | |
# - base_drive_folder_id (String) | |
# The root Google Drive Folder ID | |
# - dest_project_id (String) | |
# Project ID of where data should be loaded | |
# - source_dataset_id (String) | |
# Dataset where data should be loaded | |
main: | |
params: [args] | |
steps: | |
- init: | |
assign: | |
- base_drive_folder_id: ${args.base_drive_folder_id} | |
- dest_project_id: ${args.dest_project_id} | |
- dest_dataset_id: ${args.dest_dataset_id} | |
- get_file_ids_from_drive: | |
call: http.get | |
args: | |
url: https://content.googleapis.com/drive/v3/files | |
auth: | |
type: OAuth2 | |
scopes: | |
- https://www.googleapis.com/auth/drive | |
query: | |
includeItemsFromAllDrives: true | |
supportsAllDrives: true | |
q: ${"'" + base_drive_folder_id + "' in parents and mimeType = 'text/csv'"} | |
result: csv_files | |
- iterate_files: | |
for: | |
value: file | |
in: ${csv_files.body.files} | |
steps: | |
- assign_variables_for_bq_load: | |
assign: | |
- source_uri: ${"https://drive.google.com/open?id=" + file.id} | |
- file_name_no_ext: ${text.split(file.name, ".")[0]} | |
- table_name: ${file_name_no_ext} | |
- load_file_to_bq: | |
call: googleapis.bigquery.v2.jobs.insert | |
args: | |
projectId: ${dest_project_id} | |
connector_params: | |
scopes: | |
- https://www.googleapis.com/auth/drive | |
- https://www.googleapis.com/auth/bigquery | |
body: | |
configuration: | |
query: | |
createDisposition: CREATE_IF_NEEDED | |
writeDisposition: WRITE_TRUNCATE | |
destinationTable: | |
projectId: ${dest_project_id} | |
datasetId: ${dest_dataset_id} | |
tableId: ${table_name} | |
tableDefinitions: | |
${"temp_" + table_name}: | |
sourceUris: | |
- ${source_uri} | |
sourceFormat: CSV | |
csvOptions: | |
skipLeadingRows: 1 | |
autodetect: true | |
query: ${"SELECT * FROM temp_" + table_name} | |
result: load_result |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment