Olalekan Fuad Elesin oelesinsc24

oelesinsc24 / boto3-create-glue-job.py
Last active April 12, 2022 07:42
Create AWS Glue Job with Boto3
import boto3

glue = boto3.client('glue')
glue_job_name = 'MyDataProcessingETL'
s3_script_path = 's3://my-code-bucket/glue/glue-etl-processing.py'
my_glue_role = 'MyGlueJobRole'  # IAM role created earlier
response = glue.create_job(
    Name=glue_job_name,
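The preview stops after `Name`, so the rest of the call is not visible here. As a rough sketch of what a complete call might look like, the helper below assembles a set of arguments that `create_job` accepts and hands them to a Glue client supplied by the caller. The function names `build_glue_job_args` and `create_glue_job`, and every value other than the job name, role, and script path (the `glueetl` command, Glue version, worker type, worker count, timeout), are illustrative assumptions rather than the gist's actual settings.

```python
def build_glue_job_args(job_name, role, script_path):
    """Assemble keyword arguments for glue.create_job.

    Everything except Name, Role and ScriptLocation is an assumed,
    illustrative value; adjust it to your own job.
    """
    return {
        "Name": job_name,
        "Role": role,
        "Command": {
            "Name": "glueetl",            # Spark ETL job type
            "ScriptLocation": script_path,
            "PythonVersion": "3",
        },
        "GlueVersion": "3.0",             # assumed version
        "WorkerType": "G.1X",             # assumed worker size
        "NumberOfWorkers": 2,             # assumed worker count
        "Timeout": 60,                    # minutes, assumed
    }


def create_glue_job(glue_client, job_name, role, script_path):
    """Create the job with the given client and return the job name."""
    args = build_glue_job_args(job_name, role, script_path)
    response = glue_client.create_job(**args)
    # The create_job response carries the name of the job that was created.
    return response["Name"]
```

With a real client this could be called as `create_glue_job(boto3.client('glue'), glue_job_name, my_glue_role, s3_script_path)`. Passing the client in keeps the helper easy to exercise with a stub.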
oelesinsc24 / glue-etl-processing.py
Created January 28, 2020 20:20
AWS Glue Data Pre-processing Script
input_df = spark.read.option("header", "true").csv(s3_input_data_path)
rearranged_col_names_df = input_df.select(*columns)
# drop null values
cleaned_df = rearranged_col_names_df.dropna()
print("Dropped null values")
# split dataframe into train and validation
splits = cleaned_df.randomSplit([0.7, 0.3], 0)
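`randomSplit([0.7, 0.3], 0)` returns a list of two DataFrames, and because each row is assigned by a random draw, the 70/30 proportions are approximate rather than exact. The plain-Python sketch below imitates that behaviour on an ordinary list to make the idea concrete. It is an analogue for illustration, not Spark's implementation, and the helper name `random_split` is hypothetical.

```python
import random


def random_split(rows, weights, seed):
    """Assign each row to one split by a seeded random draw.

    Weights are normalised to sum to 1, then turned into cumulative
    upper bounds; a row goes to the first split whose bound exceeds
    its draw. Split sizes therefore only approximate the weights.
    """
    total = float(sum(weights))
    bounds = []
    running = 0.0
    for weight in weights:
        running += weight / total
        bounds.append(running)
    bounds[-1] = 1.0  # guard against floating-point drift

    rng = random.Random(seed)  # same seed gives the same assignment
    splits = [[] for _ in weights]
    for row in rows:
        draw = rng.random()  # uniform in [0, 1)
        for index, upper in enumerate(bounds):
            if draw < upper:
                splits[index].append(row)
                break
    return splits


train_rows, validation_rows = random_split(list(range(1000)), [0.7, 0.3], 0)
```

In the Spark script the equivalent unpacking would be `train_df, validation_df = splits`.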

1. Clone your fork:

git clone git@github.com:YOUR-USERNAME/YOUR-FORKED-REPO.git

2. Add the original repository as a remote called upstream in your local clone:

cd into/cloned/fork-repo
git remote add upstream https://github.com/ORIGINAL-DEV-USERNAME/REPO-YOU-FORKED-FROM.git
git fetch upstream
oelesinsc24 / boto_dynamodb_methods.py
Created May 31, 2018 21:30 — forked from martinapugliese/boto_dynamodb_methods.py
Some wrapper methods to deal with DynamoDB databases in Python, using boto3.
# Copyright (C) 2016 Martina Pugliese
from boto3 import resource
from boto3.dynamodb.conditions import Key
# The boto3 DynamoDB resource
dynamodb_resource = resource('dynamodb')
def get_table_metadata(table_name):
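The body of `get_table_metadata` is cut off in this preview. One plausible shape for such a wrapper, sketched under assumptions, is to read a few attributes that the boto3 `Table` resource exposes and return them as a dictionary. The version below takes the resource as an explicit argument (a departure from the module-level `dynamodb_resource` above, made so it can be tried with a stub), and the dictionary keys are illustrative choices, not necessarily those used in the gist.

```python
def get_table_metadata(dynamodb, table_name):
    """Return basic metadata for a DynamoDB table as a dict.

    `dynamodb` is expected to behave like boto3.resource('dynamodb').
    The returned keys are assumed names chosen for this sketch.
    """
    table = dynamodb.Table(table_name)
    return {
        # DynamoDB refreshes item_count and table_size_bytes only
        # periodically, so treat both as approximate.
        "num_items": table.item_count,
        "bytes_size": table.table_size_bytes,
        "key_schema": table.key_schema,
        "status": table.table_status,
        "global_secondary_indexes": table.global_secondary_indexes,
    }
```

With the module-level resource from the snippet above, a call would look like `get_table_metadata(dynamodb_resource, 'my-table')`, where `'my-table'` is a placeholder name.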