@ecmonsen
ecmonsen / gist:e1e3e6906def081a5fbf43ad3a653171
Last active December 7, 2023 22:57
Make an Athena-created view available to Glue Jobs
# Making an Athena view queryable in a Glue job
#
# A view created by running an Athena SQL query appears in the Glue catalog, but querying it from a
# Glue job or a Glue Spark context raises errors.
#
# Use this gist to programmatically update the view's metadata in the Glue catalog.
#
# After this you should be able to run `spark.sql("select * from mydb.myview")` without errors.
#
# Assumes you have created the view `mydb.myview` in Athena.
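
The preview above stops before the gist's code, so the following is only a minimal sketch of one way such a metadata update could look, not the gist's own implementation. It assumes that Athena stores the view definition in `ViewOriginalText` as `/* Presto View: <base64 JSON> */` with an `originalSql` key, and that Spark reads the view SQL from `ViewExpandedText`; both points should be checked against your own catalog. The helper names (`decode_presto_view`, `build_table_input`, `update_view`) are hypothetical. Only `get_table` and `update_table` are real boto3 Glue client calls.

```python
import base64
import json

# Assumed wrapper that Athena puts around the encoded view definition.
PRESTO_PREFIX = "/* Presto View: "
PRESTO_SUFFIX = " */"

# get_table returns read-only fields (DatabaseName, CreateTime, ...) that
# update_table rejects, so only these TableInput keys are copied over.
TABLE_INPUT_KEYS = {
    "Name", "Description", "Owner", "Retention", "StorageDescriptor",
    "PartitionKeys", "ViewOriginalText", "ViewExpandedText",
    "TableType", "Parameters",
}


def decode_presto_view(view_original_text):
    """Return the JSON definition embedded in an Athena-created view."""
    text = view_original_text.strip()
    if not (text.startswith(PRESTO_PREFIX) and text.endswith(PRESTO_SUFFIX)):
        raise ValueError("ViewOriginalText is not an Athena/Presto-encoded view")
    payload = text[len(PRESTO_PREFIX):-len(PRESTO_SUFFIX)]
    return json.loads(base64.b64decode(payload))


def build_table_input(table):
    """Build a TableInput dict whose ViewExpandedText holds plain SQL."""
    view = decode_presto_view(table["ViewOriginalText"])
    table_input = {k: v for k, v in table.items() if k in TABLE_INPUT_KEYS}
    # Assumption: Spark resolves the view from ViewExpandedText, while Athena
    # keeps using the untouched ViewOriginalText.
    table_input["ViewExpandedText"] = view["originalSql"]
    return table_input


def update_view(database, name):
    """Fetch the view from the Glue catalog and write the updated metadata."""
    import boto3  # imported lazily so the helpers above work without AWS access

    glue = boto3.client("glue")
    table = glue.get_table(DatabaseName=database, Name=name)["Table"]
    glue.update_table(DatabaseName=database, TableInput=build_table_input(table))
```

Calling `update_view("mydb", "myview")` with suitable AWS credentials would apply the change. Note that the SQL Athena stored is Presto/Trino dialect, so a view that uses functions Spark SQL lacks may still fail to resolve.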
ecmonsen / gist:76759c5ab42a1973ef2dac7668bfe883
Created October 3, 2023 23:39
Pseudo-python in response to a recent interview question.
"""
Build the start of an e2e pipeline designed to be robust, extensible and scalable.
Approach and structure are open-ended.
Use any packages you like but ensure code is as close to executable as possible.
Input
API endpoint = "testurl.com/endpoint"
- JSON response that contains some IDs, a description of each ID, and the most
recent modification date for that ID’s description.
Has three columns: