CRISP-DM on AWS
Six phases of CRISP-DM
Phase 1: Business Understanding
Highlight the critical features: the people and the resources involved
Analyzing supporting information
Convert
Define the criteria for successful outcome of the project
Create a project plan
Phase 2: Data Understanding
Data Collection
- Detail sources and steps to extract data
- Analyze data for additional requirements
- Consider other data sources
Data properties
Data Quality
- Verifying attributes
- Identifying missing data
- Reveal inconsistencies
- Report solution - list the steps to solve
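The data quality checks above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library. The sample rows and the column names (`age`, `job`, `balance`) are invented for the example and are not taken from the demo dataset.

```python
import csv
import io
from collections import Counter

# Hypothetical sample standing in for a real CSV file;
# the column names and values are invented for illustration.
SAMPLE = """age,job,balance
34,admin,1200
,technician,300
51,Admin,
29,admin,abc
"""

NUMERIC_COLUMNS = ("age", "balance")  # assumption: these should hold integers


def quality_report(text):
    """Count missing values, non-numeric values, and label case variants."""
    rows = list(csv.DictReader(io.StringIO(text)))
    missing = Counter()
    non_numeric = Counter()
    for row in rows:
        for col, value in row.items():
            if value is None or value.strip() == "":
                missing[col] += 1
            elif col in NUMERIC_COLUMNS and not value.strip().lstrip("-").isdigit():
                non_numeric[col] += 1
    # Labels that differ only by case hint at inconsistent data entry.
    jobs = {row["job"] for row in rows if row["job"]}
    case_variants = len(jobs) - len({j.lower() for j in jobs})
    return {
        "rows": len(rows),
        "missing": dict(missing),
        "non_numeric": dict(non_numeric),
        "case_variants": case_variants,
    }


report = quality_report(SAMPLE)
print(report)
```

A report like this can feed the "list the steps to solve" item: each non-zero count becomes one cleaning step for Phase 3.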
AWS Services for Data understanding
- Amazon Athena
- Amazon QuickSight
- AWS Glue
AWS Glue
Amazon Athena
- Runs SQL on Amazon S3 data
- Serverless
- Schema-on-read
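As a rough sketch of running SQL on S3 data, the snippet below builds the arguments for Athena's `start_query_execution` call in boto3. The database name `demo_db`, the table name `banking`, and the results bucket are assumptions made for illustration. The function that actually submits the query needs boto3 and AWS credentials, so it is defined but not called here.

```python
def build_athena_request(sql, database, output_location):
    """Return keyword arguments for Athena's StartQueryExecution call."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def run_query(request):
    """Submit the query. Requires boto3 and AWS credentials; not called here."""
    import boto3  # imported lazily so the sketch loads without the SDK

    client = boto3.client("athena")
    response = client.start_query_execution(**request)
    return response["QueryExecutionId"]


# Hypothetical names: database, table, and results bucket are placeholders.
request = build_athena_request(
    "SELECT job, COUNT(*) AS n FROM banking GROUP BY job",
    database="demo_db",
    output_location="s3://example-athena-results/",
)
print(request["QueryString"])
```

Because Athena is serverless, the only resources named in the request are the catalog database and an S3 location for the results.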
Amazon QuickSight
- A BI service
- Low cost
- Use Storyboard for secure sharing and collaboration
Demo
- Data: banking.csv
- Use a Glue crawler to infer the schema of the CSV and create a table with columns
A table is created from the CSV.
View the data in Athena
View the data in QuickSight
Note: No Redshift cluster or any other instance is created. The queries run directly against the data in S3.
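The crawler step of the demo could be scripted roughly as follows with boto3's Glue client (`create_crawler` and `start_crawler`). Every name below is a placeholder invented for this sketch: the crawler name, the IAM role ARN, the database, and the S3 path. The function that talks to AWS is defined but not called, since it needs boto3 and credentials.

```python
def build_crawler_request(name, role_arn, database, s3_path):
    """Return keyword arguments for Glue's CreateCrawler call."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def create_and_start(request):
    """Create the crawler and run it once. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the sketch loads without the SDK

    glue = boto3.client("glue")
    glue.create_crawler(**request)
    glue.start_crawler(Name=request["Name"])


# All values are hypothetical placeholders, not taken from the demo.
crawler_request = build_crawler_request(
    name="banking-csv-crawler",
    role_arn="arn:aws:iam::123456789012:role/ExampleGlueRole",
    database="demo_db",
    s3_path="s3://example-bucket/banking/",
)
print(crawler_request["Targets"])
```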
Phase 3: Data Preparation
- Final data selection
- Analyze constraints: size, columns, data type
- Preparing the data
- Clean
- Transforming
- Merging into one final dataset
- Formatted to properly work with the model
Data cleaning
Transformation
Merging
Use joins and concatenation to merge data. It is recommended to revisit the dataset after merging.
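To make the two merge operations concrete, here is a small standard-library sketch. The tables and field names (`id`, `job`, `balance`) are invented for illustration. Concatenation stacks tables with the same columns, and a join adds columns from a second table by key.

```python
# Hypothetical records; the field names are invented for illustration.
customers_2023 = [{"id": 1, "job": "admin"}, {"id": 2, "job": "technician"}]
customers_2024 = [{"id": 3, "job": "services"}]
balances = [{"id": 1, "balance": 1200}, {"id": 3, "balance": 50}]


def concatenate(*tables):
    """Stack tables that share the same columns on top of each other."""
    return [dict(row) for table in tables for row in table]


def left_join(left, right, key):
    """Add the columns of `right` to each row of `left` that shares `key`."""
    lookup = {row[key]: row for row in right}
    merged = []
    for row in left:
        match = lookup.get(row[key], {})
        merged.append({**row, **match})
    return merged


customers = concatenate(customers_2023, customers_2024)
dataset = left_join(customers, balances, "id")

# Rows without a match are one reason to look at the merged data again.
unmatched = [row["id"] for row in dataset if "balance" not in row]
print(dataset)
print(unmatched)
```

The `unmatched` list shows why a second look is worthwhile: a join can silently introduce missing values that were not present in either source.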
Formatting
- Rearrange attributes
- Shuffle data
- Do proper encoding
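The three formatting steps can be illustrated together. This sketch assumes a target layout of label first, then features, which is one common convention and not something the notes specify. It uses one-hot encoding as the example of "proper encoding". The rows and column names are invented.

```python
import random

# Hypothetical rows; column names and values are invented for illustration.
rows = [
    {"job": "admin", "age": 34, "subscribed": "yes"},
    {"job": "technician", "age": 51, "subscribed": "no"},
    {"job": "admin", "age": 29, "subscribed": "no"},
]


def one_hot(value, categories):
    """Encode a categorical value as a list of 0/1 flags."""
    return [1 if value == c else 0 for c in categories]


def format_rows(rows, seed=0):
    """Rearrange (label first), encode the categorical column, then shuffle."""
    categories = sorted({r["job"] for r in rows})
    formatted = []
    for r in rows:
        label = 1 if r["subscribed"] == "yes" else 0
        formatted.append([label, r["age"]] + one_hot(r["job"], categories))
    # A seeded generator keeps the shuffle reproducible between runs.
    random.Random(seed).shuffle(formatted)
    return categories, formatted


categories, formatted = format_rows(rows)
print(categories)
print(formatted)
```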
Phase 4: Modeling
Work together with the Data Preparation phase
Generate a model testing plan
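One common ingredient of a model testing plan is a fixed split of the data into training, validation, and test sets. The sketch below is an assumption about what such a plan might contain, not something stated in the notes. The 70/15/15 percentages and the seed are arbitrary illustrative choices.

```python
import random


def split_dataset(rows, train_pct=70, validation_pct=15, seed=42):
    """Shuffle once, then cut into train / validation / test partitions."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    # Integer arithmetic avoids floating-point rounding in the cut points.
    n_train = len(shuffled) * train_pct // 100
    n_validation = len(shuffled) * validation_pct // 100
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_validation]
    test = shuffled[n_train + n_validation:]
    return train, validation, test


# Stand-in dataset: 100 integers instead of real records.
train_set, validation_set, test_set = split_dataset(range(100))
print(len(train_set), len(validation_set), len(test_set))
```

Fixing the seed means every candidate model is evaluated on the same held-out rows, which keeps comparisons fair.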
Another choice to SageMaker
Phase 5: Evaluation
Review the project
Creating the report is important. In this phase, the team decides whether to launch the model to production or to start again.
Phase 6: Deployment
Planning Deployment
- Amazon EC2 - virtual machines, with setup managed by the user and AWS
- Amazon EC2 Container Service - runs the application in Docker containers
- AWS Lambda - runs code without you having to set up and manage servers yourself
- Amazon SageMaker
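To give a feel for the Lambda option, here is a minimal handler in the `handler(event, context)` shape that Lambda uses for Python functions. The `predict` function is a hypothetical stand-in for a trained model. The event layout (a JSON string under `"body"`) is an assumption that matches an API Gateway proxy-style request.

```python
import json


def predict(features):
    """Hypothetical stand-in for a trained model: a fixed rule on one feature."""
    return 1 if features.get("balance", 0) > 1000 else 0


def lambda_handler(event, context):
    """Entry point in the handler(event, context) shape Lambda calls."""
    features = json.loads(event["body"])
    return {
        "statusCode": 200,
        "body": json.dumps({"prediction": predict(features)}),
    }


# Local invocation with a hand-built event; no AWS resources are involved.
response = lambda_handler({"body": json.dumps({"balance": 1500})}, None)
print(response)
```

The same function body could be deployed unchanged, since Lambda only needs the handler name and the packaged code.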
Application Deployment
Note: Identify these three services
Infrastructure Deployment
- CloudFormation - Build infrastructure from a template written in JSON or YAML.
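As an illustration of infrastructure as a template, the sketch below builds a very small CloudFormation template as a Python dictionary and serializes it to JSON. The logical resource name `ProjectDataBucket` and the description are invented for this example. A real project template would declare many more resources.

```python
import json

# Minimal template: a single S3 bucket with default settings.
# The logical name "ProjectDataBucket" is a placeholder chosen for this sketch.
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Description": "Sketch: one S3 bucket for project data",
    "Resources": {
        "ProjectDataBucket": {"Type": "AWS::S3::Bucket"},
    },
}

template_body = json.dumps(template, indent=2)
print(template_body)
```

The resulting string is what would be supplied as the template when creating a stack, so the infrastructure definition can be versioned alongside the code.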
Code Management
Monitoring
Final Report
Output of all above creates the report.
Project Review
Assess the outcomes of the project.