{"id": "1", "first_name": "John"}{"id": "12", "first_name": "Peter"}{"id": "3", "first_name": "Cornelia"}
$ git clone https://github.com/mshakhomirov/bigquery-etl-tutorial.git
$ cd bigquery-etl-tutorial
$ git checkout part2
[
  {
    "name": "id",
    "type": "INT64",
    "mode": "NULLABLE"
  },
  {
    "name": "first_name",
    "type": "STRING",
    "mode": "NULLABLE"
  }
]
$ bq query 'select first_name, last_name, dob from staging.table_1'
Waiting on bqjob_r2c64623d43d6b68d_0000016de49a1248_1 ... (0s) Current status: DONE
+------------+-----------+------------+
| first_name | last_name |    dob     |
+------------+-----------+------------+
| John       | Doe       | 1968-01-22 |
| Peter      | Doe       | 1968-01-22 |
+------------+-----------+------------+
$ gcloud config set project your-project
$ gsutil mb -c regional -l europe-west2 gs://project_staging_files
$ gsutil cp ./test_files/* gs://project_staging_files/
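As a quick sanity check that the copy worked, the Cloud Storage Python client can list what is now in the bucket; a sketch, with the bucket name taken from the gsutil mb command above:

# Sketch: list the objects that were just copied into the staging bucket.
from google.cloud import storage

client = storage.Client()
for blob in client.list_blobs("project_staging_files"):
    print(blob.name, blob.size, blob.time_created)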
'''
A simple Cloud Function responsible for:
- Loading data using schemas
- Loading data from different data file formats
'''
import json
import logging
import os
import traceback
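
Only the docstring and imports are shown here; a minimal sketch of what a GCS-triggered streaming handler along these lines could look like, assuming the staging.table_1 target from the queries above (not necessarily the repo's actual implementation):

# Minimal sketch, not the repo's actual code: load the finalized object
# from Cloud Storage straight into staging.table_1.
import logging
import traceback

from google.cloud import bigquery

BQ = bigquery.Client()


def streaming(data, context):
    """Triggered when an object is finalized in the staging bucket."""
    uri = 'gs://{}/{}'.format(data['bucket'], data['name'])
    table_id = 'staging.table_1'  # assumed target; the real function maps files to tables

    job_config = bigquery.LoadJobConfig()
    job_config.source_format = bigquery.SourceFormat.NEWLINE_DELIMITED_JSON
    job_config.autodetect = True

    try:
        load_job = BQ.load_table_from_uri(uri, table_id, job_config=job_config)
        load_job.result()  # wait for the load job to finish
        logging.info('Loaded %s rows into %s', load_job.output_rows, table_id)
    except Exception:
        logging.error(traceback.format_exc())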
- name: table_1
  size: large
  format: NEWLINE_DELIMITED_JSON
  columns: []
  schema:
    - name: "id"
      type: "INT64"
      mode: "NULLABLE"
    - name: "first_name"
      type: "STRING"
      mode: "NULLABLE"
# Simulate the Cloud Storage trigger locally: print the test event payload,
# then call the function under test with a dummy context.
from event import data
from main import streaming

print(data['bucket'])
print(data['name'])
print(data['timeCreated'])

streaming(data, 'context')
data = {"name": "table-1_data_new_line_delimited_json.json", \
"bucket":"project_staging_files", \
"timeCreated": "2019-09-24 15:54:54"\
}\
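This dict mirrors the fields of the Cloud Storage event payload (bucket, name, timeCreated) and presumably lives in event.py, so that the from event import data in the test snippet above resolves.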
$ bq query 'select first_name, last_name, dob from staging.table_1'
Waiting on bqjob_r5f1305f93091f0a5_0000016de8e9171c_1 ... (0s) Current status: DONE
+------------+-----------+------------+
| first_name | last_name |    dob     |
+------------+-----------+------------+
| John       | Doe       | 1968-01-22 |
| Peter      | Doe       | 1968-01-22 |
| John       | Doe       | 1968-01-22 |
| Peter      | Doe       | 1968-01-22 |
+------------+-----------+------------+
+------------+-----------+------------+
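The rows now appear twice, presumably because the load was triggered again for the same file. If the staging table needs cleaning up, one possible approach is a SELECT DISTINCT rewrite; a sketch using the Python client, with the table name taken from the queries above:

# Sketch: rewrite staging.table_1 without the duplicate rows left by re-running the load.
from google.cloud import bigquery

client = bigquery.Client()
sql = """
CREATE OR REPLACE TABLE staging.table_1 AS
SELECT DISTINCT * FROM staging.table_1
"""
client.query(sql).result()  # wait for the query job to complete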