Skip to content

Instantly share code, notes, and snippets.

@dyerrington
Created September 18, 2019 21:33
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save dyerrington/a3d283306318aaa73711825ca08b4621 to your computer and use it in GitHub Desktop.
Save dyerrington/a3d283306318aaa73711825ca08b4621 to your computer and use it in GitHub Desktop.
Python code that will create, essentially a pivot from a nested big query set. Based on the original method in the google big query documentation.
# fighting == most common event type
def build_udf_prototype(event_types):
null = "null" # default all types to null in the UDF function
PIVOT_FEATURES = str({"col_" + event_name.replace("-", "_"): null for event_name in event_types.tolist()}).replace("'null'", "null")
SQL_RETURN = "STRUCT<"
for event_type in event_types.tolist():
event_type = event_type.replace("-", "_")
SQL_RETURN += f"col_{event_type} INT64, "
SQL_RETURN = SQL_RETURN[:-2] + ">"
tripple_boundary = '"""'
sql_template = """
CREATE TEMP FUNCTION customGreeting(event ARRAY<STRUCT< name STRING, rec_count INT64>>)
RETURNS {SQL_RETURN}
LANGUAGE js AS {tripple_boundary}
var pivot = {PIVOT_FEATURES}
for(i = 0; i < event.length; i++) {
key_name = 'col_' + event[i]['name']
if (pivot.hasOwnProperty(key_name)) {
pivot[key_name] = event[i]['rec_count'];
}
}
return pivot
{tripple_boundary};
SELECT
key,
id,
date,
customGreeting(events)
FROM [your dataset]
LIMIT 100;
"""
sql_template = sql_template.replace("{PIVOT_FEATURES}", PIVOT_FEATURES)
sql_template = sql_template.replace("{tripple_boundary}", tripple_boundary)
sql_template = sql_template.replace("{SQL_RETURN}", SQL_RETURN)
print(sql_template)
events_df.apply(build_udf_prototype)
@dyerrington
Copy link
Author

dyerrington commented Sep 18, 2019

This creates an SQL statement:

CREATE TEMP FUNCTION customGreeting(event ARRAY<STRUCT< name STRING, rec_count INT64>>)
RETURNS STRUCT<col_0_borrow INT64, col_13 INT64, col_14 INT64, col_15 INT64, col_16 INT64, col_17 INT64, col_1_borrow INT64, col_30 INT64, col_4_borrow INT64, col_5_borrow INT64>
LANGUAGE js AS """
 
  var pivot = {'col_0_borrow': null, 'col_13': null, 'col_14': null, 'col_15': null, 'col_16': null, 'col_17': null, 'col_1_borrow': null, 'col_30': null, 'col_4_borrow': null, 'col_5_borrow': null}

  for(i = 0; i < event.length; i++) {
      key_name = 'col_' + event[i]['name']
      if (pivot.hasOwnProperty(key_name)) {
         pivot[key_name] = event[i]['rec_count'];
      }
  }

  return pivot
  """;
SELECT 
key,
id,
date,
customGreeting(events)
FROM [the dataset you have]
LIMIT 100;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment