@atharvai
Created November 29, 2018 00:05
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# How to use Amazon Forecast\n",
"\n",
"Helps advanced users start with Amazon Forecast quickly. \n",
"Prerequisites: \n",
"[AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/installing.html)\n",
"\n",
"## Table Of Contents\n",
"* [Setting up](#setup)\n",
"* [Test Setup - Running first API](#hello)\n",
"* [Forecasting Example with Amazon Forecast](#forecastingExample)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setting up Preview SDK<a class=\"anchor\" id=\"setup\"></a>"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"# Configure the AWS CLI to recognize the Amazon Forecast preview service models\n",
"!aws configure add-model --service-model file://forecastquery-2018-06-26.normal.json --service-name forecastquery\n",
"!aws configure add-model --service-model file://forecast-2018-06-26.normal.json --service-name forecast"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"*Checkpoint*: The user running Amazon Forecast needs a minimum set of permissions for the service to run smoothly. See [setup_forecast_permissions](setup_forecast_permissions.py) for an example policy. Have the AWS user credentials ready if the default user is not the one running this notebook. "
]
},
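{
"cell_type": "markdown",
"metadata": {},
"source": [
"For illustration only, a minimal policy might grant Forecast access plus read access to the data bucket. The actions and the bucket name below are assumptions, not the documented policy:\n",
"```json\n",
"{\n",
"  \"Version\": \"2012-10-17\",\n",
"  \"Statement\": [\n",
"    {\"Effect\": \"Allow\", \"Action\": [\"forecast:*\", \"forecastquery:*\"], \"Resource\": \"*\"},\n",
"    {\"Effect\": \"Allow\", \"Action\": [\"s3:GetObject\", \"s3:ListBucket\"],\n",
"     \"Resource\": [\"arn:aws:s3:::YOUR-DATA-BUCKET\", \"arn:aws:s3:::YOUR-DATA-BUCKET/*\"]}\n",
"  ]\n",
"}\n",
"```"
]
},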
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"import boto3\n",
"import json\n",
"from time import sleep"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"session = boto3.Session()\n",
"\n",
"forecast = session.client(service_name='forecast',endpoint_url='https://forecast.us-west-2.amazonaws.com')\n",
"forecastquery = session.client(service_name='forecastquery',endpoint_url='https://forecastquery.us-west-2.amazonaws.com')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Test Setup <a class=\"anchor\" id=\"hello\"></a>\n",
"Let's say hi to Amazon Forecast by calling the simple ListRecipes API. It lists the global recipes that could become part of your forecasting solution. Use recipes whose names start with **forecast_**."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If this ran successfully, kudos! If the following list_recipes call raises any errors, please send us your questions.\n"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'RecipeNames': ['forecast_ARIMA',\n",
" 'forecast_DEEP_AR',\n",
" 'forecast_DEEP_AR_PLUS',\n",
" 'forecast_ETS',\n",
" 'forecast_MDN',\n",
" 'forecast_MQRNN',\n",
" 'forecast_NPTS',\n",
" 'forecast_PROPHET',\n",
" 'forecast_SQF'],\n",
" 'ResponseMetadata': {'RequestId': '6d201e04-5075-4281-be24-a1e6183a3bf9',\n",
" 'HTTPStatusCode': 200,\n",
" 'HTTPHeaders': {'content-type': 'application/x-amz-json-1.1',\n",
" 'date': 'Wed, 28 Nov 2018 23:55:44 GMT',\n",
" 'x-amzn-requestid': '6d201e04-5075-4281-be24-a1e6183a3bf9',\n",
" 'content-length': '174',\n",
" 'connection': 'keep-alive'},\n",
" 'RetryAttempts': 0}}"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"forecast.list_recipes()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Forecasting with Amazon Forecast<a class=\"anchor\" id=\"forecastingExample\"></a>\n",
"### Preparing your Data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In Amazon Forecast, a dataset is a file containing the data relevant to a forecasting task. A dataset must conform to a schema provided by Amazon Forecast. A schema is currently provided for the retail demand forecasting use case; more data types and schemas for other use cases will be added over the next few months. For retail demand forecasting, prepare a CSV file with the fields below. \n",
"Each column should correspond to one of the listed fields. However, the file should *NOT* contain a header row with the names of columns; only the data should be present."
]
},
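{
"cell_type": "markdown",
"metadata": {},
"source": [
"For example, the first few rows of a valid file (timestamp, target value, item id; no header row) look like:\n",
"```\n",
"2014-01-01 01:00:00,38.34991708126038,client_12\n",
"2014-01-01 02:00:00,33.5820895522388,client_12\n",
"2014-01-01 03:00:00,34.41127694859037,client_12\n",
"```"
]
},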
{
"cell_type": "markdown",
"metadata": {},
"source": [
"At this point, let's peek at the example data used in this tutorial."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Example Data"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>2014-01-01 01:00:00</th>\n",
" <th>38.34991708126038</th>\n",
" <th>client_12</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>2014-01-01 02:00:00</td>\n",
" <td>33.5820895522388</td>\n",
" <td>client_12</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2014-01-01 03:00:00</td>\n",
" <td>34.41127694859037</td>\n",
" <td>client_12</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2014-01-01 04:00:00</td>\n",
" <td>39.800995024875625</td>\n",
" <td>client_12</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" 2014-01-01 01:00:00 38.34991708126038 client_12\n",
"0 2014-01-01 02:00:00 33.5820895522388 client_12\n",
"1 2014-01-01 03:00:00 34.41127694859037 client_12\n",
"2 2014-01-01 04:00:00 39.800995024875625 client_12"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"# Note: the CSV has no header row, so pandas reads the first data row as column names here\n",
"df = pd.read_csv(\"data/item-demand-time.csv\", dtype=object)\n",
"df.head(3)"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"s3 = session.client('s3')\n"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'ResponseMetadata': {'RequestId': '19E3415205BFB390',\n",
" 'HostId': 'NNB2bpdaAx8oSZmTk47RfzhAfRys7+7rvSWza6NCKO/tryS8FvBGWorcBSvaieLvd1+db6z99Ws=',\n",
" 'HTTPStatusCode': 200,\n",
" 'HTTPHeaders': {'x-amz-id-2': 'NNB2bpdaAx8oSZmTk47RfzhAfRys7+7rvSWza6NCKO/tryS8FvBGWorcBSvaieLvd1+db6z99Ws=',\n",
" 'x-amz-request-id': '19E3415205BFB390',\n",
" 'date': 'Thu, 29 Nov 2018 00:01:44 GMT',\n",
" 'content-type': 'application/xml',\n",
" 'transfer-encoding': 'chunked',\n",
" 'server': 'AmazonS3'},\n",
" 'RetryAttempts': 0},\n",
" 'Buckets': [{'Name': 'workshop-forecast-939666926172',\n",
" 'CreationDate': datetime.datetime(2018, 11, 28, 3, 15, 45, tzinfo=tzlocal())},\n",
" {'Name': 'workshop-forecast-939666926172-data',\n",
" 'CreationDate': datetime.datetime(2018, 11, 28, 23, 29, 4, tzinfo=tzlocal())}],\n",
" 'Owner': {'DisplayName': 'amazon-forecast-workshop+group-01-member-3664',\n",
" 'ID': 'dc709c911002fa35ff148920376e56bb2551fe6cb62ae84e60d1efe50b7f9f58'}}"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"s3.list_buckets()"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"accountId = boto3.client('sts').get_caller_identity().get('Account')"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"bucketName = 'workshop-forecast-%s-data'%accountId"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"roleArn = 'arn:aws:iam::%s:role/workshopdemorole'%accountId"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Allow Amazon Forecast to access it<a class=\"anchor\" id=\"forecastAccess\"></a>\n",
"Amazon Forecast currently supports reading from S3 buckets (with or without encryption). This is a one-time setup per bucket. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### CreateDataset"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"DATASET_FREQUENCY = \"H\" \n",
"TIMESTAMP_FORMAT = \"yyyy-MM-dd hh:mm:ss\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"project = 'workshop'\n",
"datasetName= project+'_ds'\n",
"datasetGroupName= project +'_gp'\n",
"s3DataPath = \"s3://\"+bucketName+\"/data\""
]
},
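{
"cell_type": "markdown",
"metadata": {},
"source": [
"The data file itself is never uploaded to S3 in this notebook. A sketch of that step, assuming the local path data/item-demand-time.csv and the data/ prefix used by s3DataPath:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Upload the example data so the import job can read it from s3DataPath\n",
"s3.upload_file('data/item-demand-time.csv', bucketName, 'data/item-demand-time.csv')"
]
},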
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"datasetName"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Specify the schema of your dataset here. Make sure the order of columns matches the raw data files.\n",
"schema ={\n",
" \"Attributes\":[\n",
" {\n",
" \"AttributeName\":\"timestamp\",\n",
" \"AttributeType\":\"timestamp\"\n",
" },\n",
" {\n",
" \"AttributeName\":\"target_value\",\n",
" \"AttributeType\":\"float\"\n",
" },\n",
" {\n",
" \"AttributeName\":\"item_id\",\n",
" \"AttributeType\":\"string\"\n",
" }\n",
" ]\n",
"}\n",
"\n",
"response=forecast.create_dataset(\n",
" Domain=\"CUSTOM\",\n",
" DatasetType='TARGET_TIME_SERIES',\n",
" DataFormat='CSV',\n",
" DatasetName=datasetName,\n",
" DataFrequency=DATASET_FREQUENCY, \n",
" TimeStampFormat=TIMESTAMP_FORMAT,\n",
" Schema = schema\n",
" )\n",
"print(response)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"forecast.describe_dataset(DatasetName=datasetName)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"forecast.create_dataset_group(DatasetGroupName=datasetGroupName,RoleArn=roleArn,DatasetNames=[datasetName])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you have an existing dataset group, you can update it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"forecast.describe_dataset_group(DatasetGroupName=datasetGroupName)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create Data Import Job\n",
"This brings the raw data into the Amazon Forecast system, ready for forecasting. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ds_import_job_response=forecast.create_dataset_import_job(DatasetName=datasetName,Delimiter=',', DatasetGroupName =datasetGroupName ,S3Uri= s3DataPath)\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ds_versionId=ds_import_job_response['VersionId']\n",
"print(ds_versionId)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"forecast.update_dataset_group(DatasetGroupName=datasetGroupName, DatasetNames = [datasetName])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Check the status of the dataset import. When the status changes from **CREATING** to **ACTIVE**, we can continue to the next steps. Depending on the data size, it can take around 10 minutes to become **ACTIVE**."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"while True:\n",
" dataImportStatus = forecast.describe_dataset_import_job(DatasetName=datasetName,VersionId=ds_versionId)['Status']\n",
" print(dataImportStatus)\n",
" if dataImportStatus != 'ACTIVE' and dataImportStatus != 'FAILED':\n",
" sleep(30)\n",
" else:\n",
" break"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"forecast.describe_dataset_import_job(DatasetName=datasetName,VersionId=ds_versionId)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Recipe"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"recipesResponse=forecast.list_recipes()\n",
"recipesResponse"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Get details about each recipe."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"forecast.describe_recipe(RecipeName='forecast_MQRNN')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create Solution with a custom forecast horizon"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The forecast horizon is how far into the future the forecast should predict. For weekly data, a value of 12 means 12 weeks. Since our example uses hourly data, we forecast the next hour."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"predictorName= project+'_mqrnn'"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"forecastHorizon = 1"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"createPredictorResponse=forecast.create_predictor(RecipeName='forecast_MQRNN',DatasetGroupName= datasetGroupName ,PredictorName=predictorName, \n",
" ForecastHorizon = forecastHorizon)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"predictorVerionId=createPredictorResponse['VersionId']"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"forecast.list_predictor_versions(PredictorName=predictorName)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"forecast.list_predictors()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Check the status of the solution. When the status changes from **CREATING** to **ACTIVE**, we can continue to the next steps. Depending on the data size, model selection, and hyperparameters, it can take from 10 minutes to more than an hour to become **ACTIVE**."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"while True:\n",
" predictorStatus = forecast.describe_predictor(PredictorName=predictorName,VersionId=predictorVerionId)['Status']\n",
" print(predictorStatus)\n",
" if predictorStatus != 'ACTIVE' and predictorStatus != 'FAILED':\n",
" sleep(30)\n",
" else:\n",
" break"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Get Error Metrics"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"forecastquery.get_accuracy_metrics(PredictorName= predictorName)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Deploy Predictor"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"forecast.deploy_predictor(PredictorName=predictorName)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"deployedPredictorsResponse=forecast.list_deployed_predictors()\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"deployedPredictorsResponse"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"forecast.describe_deployed_predictor(PredictorName=predictorName)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Get Forecast"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When the solution is deployed and forecast results are ready, you can view them. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"forecastResponse = forecastquery.get_forecast(\n",
" PredictorName=predictorName,\n",
" Interval=\"hour\",\n",
" Filters={\"item_id\":\"client_12\"}\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Export Forecast"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can batch-export forecasts to an S3 bucket. To do so, a role with S3 put access is needed. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# forecastId is not defined above; it is assumed to come from an earlier response for this predictor's forecast\n",
"forecast.create_forecast_export_job(ForecastId=forecastId, OutputPath={\"S3Uri\": s3DataPath,\"RoleArn\":roleArn})"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "conda_python3",
"language": "python",
"name": "conda_python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.5"
}
},
"nbformat": 4,
"nbformat_minor": 2
}