Olalekan Fuad Elesin oelesinsc24
@oelesinsc24
oelesinsc24 / boto_dynamodb_methods.py
Created May 31, 2018 21:30 — forked from martinapugliese/boto_dynamodb_methods.py
Some wrapper methods to deal with DynamoDB databases in Python, using boto3.
# Copyright (C) 2016 Martina Pugliese
from boto3 import resource
from boto3.dynamodb.conditions import Key
# The boto3 dynamoDB resource
dynamodb_resource = resource('dynamodb')
def get_table_metadata(table_name):
    # Body truncated in the gist preview; a plausible minimal version returning basic table info:
    table = dynamodb_resource.Table(table_name)
    return {
        'item_count': table.item_count,
        'table_status': table.table_status
    }
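Since Key is imported above, here is a minimal usage sketch of querying a table through the same resource (the table name and key attribute are assumptions, not taken from the gist):

table = dynamodb_resource.Table('my-table')  # hypothetical table name
response = table.query(KeyConditionExpression=Key('id').eq('user-123'))
items = response['Items']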

1. Clone your fork:

git clone git@github.com:YOUR-USERNAME/YOUR-FORKED-REPO.git

2. Add a remote pointing to the original repository in your fork:

cd into/cloned/fork-repo
git remote add upstream https://github.com/ORIGINAL-DEV-USERNAME/REPO-YOU-FORKED-FROM.git
git fetch upstream
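The preview ends here; a typical follow-up (not part of the snippet above) is to merge the fetched upstream changes into your local branch:

3. Update your local branch with the upstream changes:

git checkout master
git merge upstream/master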
@oelesinsc24
oelesinsc24 / glue-etl-processing.py
Created January 28, 2020 20:20
AWS Glue Data Pre-processing Script
input_df = spark.read.option("header", "true").csv(s3_input_data_path)
rearranged_col_names_df = input_df.select(*columns)
# drop null values
cleaned_df = rearranged_col_names_df.dropna()
print("Dropped null values")
# split dataframe into train and validation
splits = cleaned_df.randomSplit([0.7, 0.3], 0)
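The preview does not show where spark, s3_input_data_path, or the split outputs come from or go. Below is a hedged sketch of how the surrounding Glue script might wire this up, assuming the job receives --s3_input_data_path and --s3_processed_data_path as job arguments (matching the Step Functions step further down this page):

import sys
from awsglue.utils import getResolvedOptions
from pyspark.sql import SparkSession

# Resolve the arguments passed to the Glue job run
args = getResolvedOptions(sys.argv, ['s3_input_data_path', 's3_processed_data_path'])
s3_input_data_path = args['s3_input_data_path']
s3_processed_data_path = args['s3_processed_data_path']

spark = SparkSession.builder.getOrCreate()

# ... pre-processing as in the snippet above ...

# Write each split back to S3 for the downstream training and validation steps
train_df, validation_df = splits
train_df.write.mode('overwrite').option('header', 'true').csv(f'{s3_processed_data_path}/train')
validation_df.write.mode('overwrite').option('header', 'true').csv(f'{s3_processed_data_path}/validation')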
@oelesinsc24
oelesinsc24 / boto3-create-glue-job.py
Last active April 12, 2022 07:42
Create AWS Glue Job with Boto3
import boto3

glue = boto3.client('glue')
glue_job_name = 'MyDataProcessingETL'
s3_script_path = 's3://my-code-bucket/glue/glue-etl-processing.py'
my_glue_role = 'MyGlueJobRole' # created earlier
response = glue.create_job(
Name=glue_job_name,
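The create_job call is cut off in the preview; a minimal sketch of how the remaining arguments might look (the Command and GlueVersion values are assumptions, not taken from the gist):

response = glue.create_job(
    Name=glue_job_name,
    Role=my_glue_role,
    Command={
        'Name': 'glueetl',                 # Spark ETL job type
        'ScriptLocation': s3_script_path,
        'PythonVersion': '3'
    },
    GlueVersion='1.0',
    MaxCapacity=2.0
)
print(response['Name'])  # create_job returns the job name on success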
@oelesinsc24
oelesinsc24 / stepfunctions-sdk-glue-job-step.py
Created January 28, 2020 20:41
AWS Glue Job Step with AWS Step Functions SDK
data_processing_step = GlueStartJobRunStep(
state_id='GlueDataProcessingStep',
parameters={
'JobName': glue_job_name,
'Arguments': {
'--s3_input_data_path': execution_input['S3InputDataPath'],
'--s3_processed_data_path': execution_input['S3OutputDataPath']
}
}
)
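The execution_input referenced above is not defined in this preview; here is a sketch of how it could be declared with the Step Functions Data Science SDK (the schema keys are assumed from the snippets on this page):

from stepfunctions.inputs import ExecutionInput

execution_input = ExecutionInput(schema={
    'JobName': str,
    'S3InputDataPath': str,
    'S3OutputDataPath': str
})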
@oelesinsc24
oelesinsc24 / stepfunctions-sdk-training-and-model-step.py
Created January 28, 2020 20:49
AWS Step Functions SDK Training and Model Steps
import sagemaker
from sagemaker.amazon.amazon_estimator import get_image_uri  # resolves the built-in XGBoost image URI

xgb = sagemaker.estimator.Estimator(
get_image_uri(region, 'xgboost'),
sagemaker_execution_role,
train_instance_count = 1,
train_instance_type = 'ml.m4.4xlarge',
train_volume_size = 5,
output_path = f's3://{model_bucket}/{prefix}',
sagemaker_session = session
)
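The gist title refers to training and model steps, but the preview stops after the estimator. A hedged sketch of how the corresponding Step Functions SDK steps might be built from it (the state ids, channel paths, and reuse of the JobName execution input are assumptions):

from stepfunctions.steps import TrainingStep, ModelStep

training_step = TrainingStep(
    'SageMakerTrainingStep',
    estimator=xgb,
    data={
        'train': f's3://{model_bucket}/{prefix}/train',
        'validation': f's3://{model_bucket}/{prefix}/validation'
    },
    job_name=execution_input['JobName']
)

model_step = ModelStep(
    'SaveModelStep',
    model=training_step.get_expected_model()
)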
@oelesinsc24
oelesinsc24 / stepfunctions-sdk-chain-steps.py
Created January 28, 2020 20:52
AWS Step Functions SDK Chain Workflow Steps
workflow_definition = Chain([
data_processing_step,
training_step,
model_step,
transform_step
])
workflow = Workflow(
name='MyTrainTransformDeployWithGlue_v2',
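The Workflow constructor is truncated in the preview; a minimal sketch of the remaining arguments and of creating and running the state machine (the execution role variable and the example input values are assumptions):

from stepfunctions.steps import Chain
from stepfunctions.workflow import Workflow

workflow = Workflow(
    name='MyTrainTransformDeployWithGlue_v2',
    definition=workflow_definition,
    role=workflow_execution_role  # IAM role assumed by Step Functions, defined elsewhere
)
workflow.create()
execution = workflow.execute(inputs={
    'JobName': 'data-processing-2020-01-28',        # example value
    'S3InputDataPath': 's3://my-data-bucket/raw/',
    'S3OutputDataPath': 's3://my-data-bucket/processed/'
})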
@oelesinsc24
oelesinsc24 / create-sagemaker-processing-job.py
Created February 12, 2020 16:25
Create SageMaker Processing Job Lambda Function
import boto3
from sagemaker.processing import ScriptProcessor, ProcessingInput, ProcessingOutput

sm_client = boto3.client('sagemaker')
BASE_PROCESSING_IMAGE = ''
INPUT_DATA_DESTINATION = '/opt/ml/processing/input_data'
PROCESSED_DATA_PATH = '/opt/ml/processing/processed_data'
DEFAULT_VOLUME_SIZE = 100
DEFAULT_INSTANCE_TYPE = 'ml.m5.xlarge'
DEFAULT_INSTANCE_COUNT = 1
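The handler itself is not shown in the preview; a hedged sketch of how the imports and constants above could be used to start a processing job, assuming the incoming event carries the same keys as the data_processing_configuration dict at the bottom of this page:

def lambda_handler(event, context):
    # Build a processor from the container image and resources requested in the event
    processor = ScriptProcessor(
        image_uri=event.get('EcrContainerUri', BASE_PROCESSING_IMAGE),
        command=['python3'],
        role=event['IAMRole'],
        instance_count=event.get('InstanceCount', DEFAULT_INSTANCE_COUNT),
        instance_type=event.get('InstanceType', DEFAULT_INSTANCE_TYPE),
        volume_size_in_gb=event.get('LocalStorageSizeGB', DEFAULT_VOLUME_SIZE)
    )
    # Start the job asynchronously; a separate Lambda polls its status
    processor.run(
        code=event['S3CodePath'],
        inputs=[ProcessingInput(source=event['S3InputDataPath'],
                                destination=INPUT_DATA_DESTINATION)],
        outputs=[ProcessingOutput(source=PROCESSED_DATA_PATH,
                                  destination=event['S3OutputDataPath'])],
        job_name=event['JobName'],
        wait=False
    )
    return {'ProcessingJobName': event['JobName']}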
@oelesinsc24
oelesinsc24 / check-processing-job-status.py
Created February 12, 2020 16:26
Lambda function to check SageMaker Processing Job Status
import boto3
import json
sm_client = boto3.client('sagemaker')
def lambda_handler(event, context):
"""
:param event:
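The docstring and body are cut off in the preview; a minimal sketch of what the status check might look like (the event key name is an assumption):

def lambda_handler(event, context):
    """
    Check the status of a SageMaker Processing Job.
    :param event: expected to carry the ProcessingJobName to look up
    :param context: Lambda context object (unused)
    """
    job_name = event['ProcessingJobName']
    response = sm_client.describe_processing_job(ProcessingJobName=job_name)
    # Status is one of: InProgress | Completed | Failed | Stopping | Stopped
    return {
        'ProcessingJobName': job_name,
        'ProcessingJobStatus': response['ProcessingJobStatus']
    }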
@oelesinsc24
oelesinsc24 / create-processing-job-step.py
Created February 12, 2020 16:29
Create Processing Job with AWS Step Functions Data Science SDK Lambda Step
data_processing_configuration = dict(
JobName=execution_input['JobName'],
IAMRole=execution_input['IAMRole'],
LocalStorageSizeGB=50,
S3CodePath=execution_input['S3CodePath'],
S3InputDataPath=execution_input['S3InputDataPath'],
S3OutputDataPath=execution_input['S3OutputDataPath'],
EcrContainerUri=execution_input['EcrContainerUri']
)
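The Lambda Step named in the description is not included in the preview; a hedged sketch of how the configuration above could be passed to the create-processing-job function with the Step Functions Data Science SDK (the state id and function name are assumptions):

from stepfunctions.steps import LambdaStep

create_processing_job_step = LambdaStep(
    state_id='CreateSageMakerProcessingJob',
    parameters={
        'FunctionName': 'create-sagemaker-processing-job',  # hypothetical Lambda name
        'Payload': data_processing_configuration
    }
)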