@bluekidds
Created December 29, 2018 10:43

CRISP-DM on AWS

The six phases of CRISP-DM

Phase 1: Business Understanding

Highlight the critical features: the people and the resources involved

Analyzing supporting information

Convert

Define the criteria for successful outcome of the project

Create a project plan:

Phase 2: Data Understanding

Data Collection

  • Detail sources and steps to extract data
  • Analyze data for additional requirements
  • Consider other data sources

Data properties

Data Quality

  • Verifying attributes
  • Identifying missing data
  • Reveal inconsistencies
  • Report solutions: list the steps to resolve each issue
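
As a rough illustration of the data quality checks above, the sketch below counts missing values per column and flags entries that do not parse as numbers. The sample rows and the column names (`age`, `job`, `balance`) are assumptions made up for the example, not taken from the demo data.

```python
import csv
import io

# Hypothetical sample standing in for a real extract; column names are assumptions.
SAMPLE = """age,job,balance
34,admin,1200
,technician,300
51,Admin,abc
"""

def quality_report(text, numeric_columns):
    """Count missing values per column and collect non-numeric entries in numeric columns."""
    rows = list(csv.DictReader(io.StringIO(text)))
    missing = {}
    bad_numbers = []
    for line_no, row in enumerate(rows, start=2):  # line 1 is the header
        for column, value in row.items():
            if value is None or value.strip() == "":
                missing[column] = missing.get(column, 0) + 1
            elif column in numeric_columns:
                try:
                    float(value)
                except ValueError:
                    bad_numbers.append((line_no, column, value))
    return {"rows": len(rows), "missing": missing, "bad_numbers": bad_numbers}

report = quality_report(SAMPLE, numeric_columns={"age", "balance"})
print(report)
```

The returned dictionary can feed the "report solutions" step: each missing or inconsistent entry becomes an item to resolve before data preparation.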

AWS Services for Data understanding

  • Amazon Athena
  • Amazon QuickSight
  • AWS Glue

AWS Glue

Amazon Athena

  • Runs SQL on Amazon S3 data
  • Serverless
  • Schema-on-read
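
A minimal sketch of submitting a SQL query to Athena from Python, assuming boto3 and AWS credentials are available. The database name, table name, column name, and S3 output location are hypothetical placeholders. The request is built separately from the call so it can be inspected without an AWS account.

```python
def build_athena_request(sql, database, output_location):
    """Assemble keyword arguments for the Athena StartQueryExecution call."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }

def run_query(request):
    """Submit the query; needs boto3 and AWS credentials, so it is not called here."""
    import boto3  # imported lazily so the sketch loads without boto3 installed
    client = boto3.client("athena")
    response = client.start_query_execution(**request)
    return response["QueryExecutionId"]

# Hypothetical database, table, column, and bucket names.
request = build_athena_request(
    sql="SELECT job, COUNT(*) AS n FROM banking GROUP BY job",
    database="demo_db",
    output_location="s3://example-bucket/athena-results/",
)
print(request)
```

Athena writes query results to the S3 output location, so no cluster or instance has to be provisioned for the query itself.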

Amazon QuickSight

  • A BI service
  • Low cost
  • Use StoryBoard for secure sharing and collaboration

Demo

  1. Data: banking.csv
  2. Use Glue crawlers to infer a table schema (columns and types) from the CSV

e.g.

A table created from the CSV.
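
To give a feel for what turning a CSV into a table involves, here is a conceptual sketch of schema inference in plain Python. It is not how Glue's classifiers are implemented; the sample data, the function names, and the three-type scheme (`bigint`, `double`, `string`) are simplifying assumptions.

```python
import csv
import io

def infer_type(values):
    """Pick the narrowest of bigint, double, string that fits every non-empty value."""
    present = [v for v in values if v != ""]
    for name, cast in (("bigint", int), ("double", float)):
        try:
            for v in present:
                cast(v)
            return name
        except ValueError:
            continue
    return "string"

def infer_schema(text):
    """Return (column, type) pairs inferred from the header and data rows."""
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]
    return [(col, infer_type([r[i] for r in data])) for i, col in enumerate(header)]

# Hypothetical sample; the real banking.csv columns may differ.
SAMPLE = "age,job,balance\n34,admin,1200.5\n51,technician,300\n"
schema = infer_schema(SAMPLE)
print(schema)
```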

View the data in Athena

View the data in QuickSight

Note: No Redshift cluster or other instance is created. Everything runs against the data in S3.

Phase 3: Data Preparation

  • Final data selection
    • Analyze constraints: size, columns, data type
  • Preparing the data
    • Clean
    • Transform
    • Merge into one final dataset
    • Format to work properly with the model

Data cleaning

Transformation

Merging

Use joins and concatenation to merge data. It is recommended to revisit the dataset after merging.
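
A small sketch of the two merge operations on lists of dictionaries, with made-up customer and balance records. Comparing row counts before and after the join is one simple way to revisit the merged dataset: here the inner join silently drops the customer with no balance record.

```python
def concatenate(*tables):
    """Stack row lists that share the same columns."""
    merged = []
    for table in tables:
        merged.extend(table)
    return merged

def inner_join(left, right, key):
    """Combine rows from both tables where the key matches."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    joined = []
    for row in left:
        for match in index.get(row[key], []):
            joined.append({**row, **match})
    return joined

# Hypothetical records for illustration only.
customers_a = [{"id": 1, "job": "admin"}]
customers_b = [{"id": 2, "job": "technician"}, {"id": 3, "job": "retired"}]
balances = [{"id": 1, "balance": 1200}, {"id": 2, "balance": 300}]

customers = concatenate(customers_a, customers_b)
final = inner_join(customers, balances, key="id")
print(len(customers), len(final))
```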

Formatting

  • Rearrange attributes
  • Shuffle data
  • Do proper encoding
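
The three formatting steps can be sketched as follows. The rows, the `subscribed` label column, and the choice of one-hot encoding are assumptions for illustration; which column order and encoding are appropriate depends on the model being trained.

```python
import random

def one_hot(rows, column):
    """Replace a categorical column with one 0/1 column per category."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        for category in categories:
            new_row[f"{column}_{category}"] = 1 if row[column] == category else 0
        encoded.append(new_row)
    return encoded

def label_first(rows, label):
    """Reorder attributes so the label column comes first."""
    return [
        {label: row[label], **{k: v for k, v in row.items() if k != label}}
        for row in rows
    ]

# Hypothetical rows; "subscribed" plays the role of the label.
rows = [
    {"age": 34, "job": "admin", "subscribed": 1},
    {"age": 51, "job": "retired", "subscribed": 0},
    {"age": 28, "job": "admin", "subscribed": 0},
]

prepared = label_first(one_hot(rows, "job"), "subscribed")
random.Random(42).shuffle(prepared)  # fixed seed keeps the shuffle reproducible
print(list(prepared[0].keys()))
```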

Phase 4: Modeling

Work together with the Data Preparation phase

Generate a model testing plan

Another choice to Sagemaker

Phase 5: Evaluation

Review the project

Creating the report is important. In this phase, the team decides whether to launch the model to production or start again.

Phase 6: Deployment

Planning Deployment

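  • Amazon EC2 - virtual machines; setup and management are shared between the user and the cloud
  • Amazon EC2 Container Service (ECS) - runs Docker containers on EC2 virtual machines
  • AWS Lambda - runs code without servers to set up and manage on your own
  • Amazon SageMaker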
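
As one possible shape for the Lambda option, the sketch below wraps a prediction in a handler with the `(event, context)` signature that Lambda's Python runtime calls. The "model" is a hypothetical fixed linear score, and the `statusCode`/`body` response layout assumes the function sits behind an API Gateway proxy integration.

```python
import json

# Hypothetical stand-in for a trained model: a fixed linear score.
WEIGHTS = {"age": 0.01, "balance": 0.0002}
BIAS = -0.5

def predict(features):
    """Return 1 when the linear score is positive, else 0."""
    score = BIAS + sum(WEIGHTS[name] * float(features.get(name, 0)) for name in WEIGHTS)
    return 1 if score > 0 else 0

def lambda_handler(event, context):
    """Entry point in the handler(event, context) shape used by Lambda's Python runtime."""
    features = json.loads(event["body"]) if "body" in event else event
    return {
        "statusCode": 200,
        "body": json.dumps({"prediction": predict(features)}),
    }

# Local invocation with a hand-built event; no AWS account needed.
response = lambda_handler({"body": json.dumps({"age": 40, "balance": 2000})}, None)
print(response)
```

Calling the handler locally like this is a cheap way to test the prediction path before packaging and deploying it.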

Application Deployment

Note: Identify these three services

Infrastructure Deployment

  • CloudFormation: build infrastructure from a template written in JSON or YAML.
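
A minimal sketch of what such a template can look like, generated from Python and serialized to JSON. The single S3 bucket and its logical ID `ModelArtifactsBucket` are hypothetical choices for the example; a real deployment would declare whatever resources the project needs.

```python
import json

def build_template(bucket_logical_id):
    """Return a minimal CloudFormation template declaring one S3 bucket."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Sketch: a bucket for model artifacts",
        "Resources": {
            bucket_logical_id: {"Type": "AWS::S3::Bucket"},
        },
    }

template = build_template("ModelArtifactsBucket")  # hypothetical logical ID
template_json = json.dumps(template, indent=2)
print(template_json)
```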

Code Management

Monitoring

Final Report

The outputs of all the phases above make up the final report.

Project Review

Assess the outcomes of the project.
