Guy (MLGuy) guyernest

@guyernest
guyernest / TagsAnalysis.R
Last active June 13, 2018 03:11
StackOverflow Tags Query
# Read CSV into R
TagsPerMonth <- read.csv(file="../StackOverFlowData.csv", header=TRUE, sep=",")
amazonTags <- c("amazon-web-services", "amazon", "amazon-product-api", "amazon-mws",
"amazon-appstore", "amazon-echo", "amazon-fire-tv", "amazon-payments",
"amazonica", "amazon-silk", "amazon-marketplace", "amazonads", "login-with-amazon",
"amazon-echo-show", "amazon-mobile-ads", "amazon-clouddrive", "amazon-firefly",
"amazon-in", "amazonsellercentral")
#awsPopularTags <- c("amazon-s3", "amazon-ec2")
topTagsPerMonth <- subset(TagsPerMonth, !TagName %in% amazonTags)
@guyernest
guyernest / lambda_function.py
Created September 1, 2018 16:50
AWS Lambda function to start or stop notebook instances
import boto3
import logging
logger = logging.getLogger()
logger.setLevel(logging.INFO)
client = boto3.client('sagemaker')
def lambda_handler(event, context):
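
The preview above stops at the handler signature, so the body is not shown. As a rough sketch only, and not the gist's actual code, a handler of this kind could look like the following. The `event["action"]` key and the `handle` and `list_instances` helpers are assumptions made for illustration; the SageMaker calls (`list_notebook_instances`, `start_notebook_instance`, `stop_notebook_instance`) are standard boto3 client methods.

```python
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)


def list_instances(client, status):
    """Return the names of all notebook instances with the given status."""
    names = []
    kwargs = {"StatusEquals": status}
    while True:
        page = client.list_notebook_instances(**kwargs)
        names += [i["NotebookInstanceName"] for i in page["NotebookInstances"]]
        token = page.get("NextToken")
        if not token:
            return names
        kwargs["NextToken"] = token  # fetch the next page of results


def handle(event, client):
    """Start or stop instances based on event["action"] (a hypothetical key)."""
    action = event.get("action")
    if action == "stop":
        names = list_instances(client, "InService")
        for name in names:
            client.stop_notebook_instance(NotebookInstanceName=name)
    elif action == "start":
        names = list_instances(client, "Stopped")
        for name in names:
            client.start_notebook_instance(NotebookInstanceName=name)
    else:
        raise ValueError(f"unknown action: {action!r}")
    logger.info("%s: %s", action, names)
    return {"action": action, "instances": names}


def lambda_handler(event, context):
    # boto3 is imported lazily so the helpers above can be exercised without AWS.
    import boto3
    return handle(event, boto3.client("sagemaker"))
```

Passing the client into `handle` keeps the start/stop logic testable with a stub; in Lambda only `lambda_handler` touches boto3.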
@guyernest
guyernest / Lambda_Terraform.tf
Last active September 4, 2018 07:02
Writing Terraform configuration file for a Lambda function from a Jupyter notebook
%%writefile lambda.tf
provider "aws" {
region = "eu-west-1"
}
resource "aws_lambda_function" "start_stop_sm" {
function_name = "StopStartSageMakerNotebookInstances"
# The bucket name as created earlier with "aws s3api create-bucket"
s3_bucket = "terraform-serverless-repository"
@guyernest
guyernest / IAM_Terraform.tf
Created September 5, 2018 12:55
Appending the IAM permissions for the Lambda function to the Terraform configuration file
%%writefile -a lambda.tf
# IAM role which dictates what other AWS services the Lambda function
# may access.
## Testing a working example
resource "aws_iam_role" "lambda_exec" {
name = "assume-role"
assume_role_policy = <<EOF
{
"Version": "2012-10-17",
"Statement": [
@guyernest
guyernest / CloudWatch_Terraform.tf
Created September 5, 2018 13:38
Creating two CloudWatch events to trigger the Lambda function to stop and start SageMaker notebook instances
%%writefile cloud_watch.tf
## Based on the location of the instances. This is for Israel, where
## people work Sunday to Thursday and 5 AM GMT is 8 AM local time
resource "aws_cloudwatch_event_rule" "on_duty" {
name = "on_duty"
description = "Fires at the beginning of the working day"
schedule_expression = "cron(0 5 ? * SUN-THU *)"
}
## people work Sunday to Thursday and 4 PM GMT is 7 PM local time
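
CloudWatch schedule expressions are evaluated in UTC, so the local times in the comments hold only for one UTC offset. The sketch below checks the arithmetic with fixed offsets, assuming UTC+3 for Israel Daylight Time and UTC+2 for Israel Standard Time; `local_hour` is a hypothetical helper and the date is arbitrary.

```python
from datetime import datetime, timedelta, timezone


def local_hour(utc_hour, offset_hours):
    """Convert an hour in UTC to the local hour at a fixed UTC offset."""
    utc = datetime(2018, 9, 5, utc_hour, tzinfo=timezone.utc)
    local = utc.astimezone(timezone(timedelta(hours=offset_hours)))
    return local.hour


# 5 and 16 are the UTC hours mentioned in the comments above.
for label, utc_hour in (("start of day", 5), ("end of day", 16)):
    summer = local_hour(utc_hour, 3)  # assumed UTC+3 (daylight time)
    winter = local_hour(utc_hour, 2)  # assumed UTC+2 (standard time)
    print(f"{label}: {utc_hour:02d}:00 UTC -> {summer:02d}:00 summer, {winter:02d}:00 winter")
```

With a fixed cron expression the rules fire one hour earlier in local time during winter, so the schedule may need adjusting when the clocks change.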
@guyernest
guyernest / category_aggregation.sql
Last active May 18, 2019 10:30
Forecast Pipeline 02
/*train_1217_category training data aggregate by location and category*/
CREATE TABLE forecast.train_1217_category
WITH (
format='TEXTFILE',
external_location='s3://forecast-xxxxxx/sagemaker/train_1217_category/',
field_delimiter = ','
) AS
SELECT category as item_id, date_format(calendar_date, '%Y-%m-%d') as timestamp, sum(sales) as demand
FROM forecast.compressed_data
where calendar_date < CAST('2018-01-01' AS DATE)
@guyernest
guyernest / csv_to_parquet.sql
Last active May 18, 2019 10:27
Forecast Pipeline 01
/*Original Data Compressed*/
CREATE TABLE forecast.compressed_data
WITH (
format='PARQUET',
external_location='s3://forecast-xxxxxx/sagemaker/data/',
partitioned_by = ARRAY['year']
) AS
SELECT *, year(calendar_date) as year FROM forecast.data;
@guyernest
guyernest / create_timeseries_dataset.py
Last active May 18, 2019 13:59
Forecast Pipeline 03
import boto3
session = boto3.Session(region_name='us-east-1')
forecast = session.client(service_name='forecast')
forecastquery = session.client(service_name='forecastquery')
# The name of the dataset that we created with Athena
train_dataset_table = 'train_1217_category'
DATASET_FREQUENCY = "D"
TIMESTAMP_FORMAT = "yyyy-MM-dd"
@guyernest
guyernest / predictor_creation.py
Last active May 18, 2019 14:04
Forecast Pipeline 04
predictorName = project + '_ARIMA'
# Create a forecast covering two months (60 days)
forecastHorizon = 60
# Starting the predictor creation job
createPredictorResponse=forecast.create_predictor(
RecipeName=recipe,
DatasetGroupName= datasetGroupName ,
PredictorName=predictorName,