Six phases of DM
Highlight the critical features: the people and the resources involved
Define the criteria for a successful outcome of the project
- Detail sources and steps to extract data
- Analyze data for additional requirements
- Consider other data sources
- Verify attributes
- Identify missing data
- Reveal inconsistencies
- Report solution - list the steps to solve
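The checks listed above could be sketched with pandas roughly as follows. The small in-memory table and its column names (age, job, balance) are made up for illustration and are not taken from any real dataset.

```python
import io

import pandas as pd

# Hypothetical sample standing in for an extracted data source.
raw = io.StringIO(
    "age,job,balance\n"
    "34,admin,1200\n"
    "51,Admin,\n"
    "-3,technician,560\n"
)
df = pd.read_csv(raw)

# Verify attributes: which columns exist and which types were inferred.
attribute_types = df.dtypes.astype(str).to_dict()

# Identify missing data: number of empty cells per column.
missing_counts = df.isna().sum().to_dict()

# Reveal inconsistencies: impossible values and labels that differ only by case.
negative_ages = int((df["age"] < 0).sum())
job_labels = df["job"].nunique()
job_labels_normalized = df["job"].str.lower().nunique()

print(attribute_types)
print(missing_counts)
print(negative_ages, job_labels, job_labels_normalized)
```

Each finding (a missing balance, a negative age, "admin" vs "Admin") would then go into the report together with the steps to fix it.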
- Amazon Athena
- Amazon QuickSight
- AWS Glue
- Runs SQL on Amazon S3 data
- Serverless
- Schema-on-read
- A BI service
- Cheap
- Use Storyboard for secure sharing and collaboration
- Data: banking.csv
- Use Glue crawlers to turn the CSV into a table with columns
A table is built on top of the CSV.
View the data in Athena
View the data in QuickSight
Note: No Redshift cluster or any other instance is created. Everything runs against the data in S3.
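Querying the crawled table from code might look like the sketch below, which uses the boto3 Athena client. The database name "demo_db", the table name "banking" and the S3 output location are assumptions chosen for illustration. Only the request is built here, because actually running it needs boto3 and AWS credentials.

```python
def build_athena_request(sql, database, output_location):
    """Return keyword arguments for Athena's start_query_execution call."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def run_query(request):
    """Submit the query; needs boto3 and AWS credentials, so it is not called here."""
    import boto3  # imported lazily so the rest of the sketch runs without AWS access

    client = boto3.client("athena")
    response = client.start_query_execution(**request)
    return response["QueryExecutionId"]


# Hypothetical names: a Glue database "demo_db" holding a crawled table "banking".
request = build_athena_request(
    sql="SELECT * FROM banking LIMIT 10",
    database="demo_db",
    output_location="s3://example-bucket/athena-results/",
)
print(request["QueryString"])
```

Athena writes the query results to the given S3 location, which fits the note that nothing besides S3 has to be provisioned.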
- Final data selection
- Analyze constraints: size, columns, data type
- Preparing the data
- Clean
- Transform
- Merge into one final dataset
- Format to work properly with the model
Use join and concatenation to merge data. It is recommended to revisit the dataset after merging.
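A minimal pandas sketch of the two merge operations, assuming three made-up tables (two monthly extracts and a customer lookup). The table and column names are hypothetical.

```python
import pandas as pd

# Hypothetical tables: two monthly extracts plus a customer lookup.
january = pd.DataFrame({"customer_id": [1, 2], "balance": [1200, 560]})
february = pd.DataFrame({"customer_id": [3], "balance": [75]})
customers = pd.DataFrame(
    {"customer_id": [1, 2, 3], "job": ["admin", "technician", "retired"]}
)

# Concatenation stacks tables that share the same columns.
balances = pd.concat([january, february], ignore_index=True)

# A join adds columns from another table by matching on a key.
final = balances.merge(customers, on="customer_id", how="left")

# Revisit the merged dataset: check its shape and look for unmatched keys.
unmatched = int(final["job"].isna().sum())
print(final.shape)
print(unmatched)
```

A left join keeps every row of the stacked table, so rows without a matching customer show up as missing values instead of silently disappearing.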
- Rearrange attributes
- Shuffle data
- Do proper encoding
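The three steps above can be sketched with pandas as follows. The data and the target column "subscribed" are made up. Placing the target first is only one possible arrangement, one that some training tools expect for CSV input.

```python
import pandas as pd

# Hypothetical prepared data; "subscribed" is the assumed target column.
df = pd.DataFrame({
    "age": [34, 51, 29, 43],
    "job": ["admin", "technician", "admin", "retired"],
    "subscribed": ["no", "yes", "no", "yes"],
})

# Rearrange attributes: move the target to the first column.
df = df[["subscribed", "age", "job"]]

# Shuffle data: reorder rows at random; a fixed seed keeps it repeatable.
df = df.sample(frac=1, random_state=0).reset_index(drop=True)

# Do proper encoding: binary target to 0/1, categorical feature to one-hot columns.
df["subscribed"] = df["subscribed"].map({"no": 0, "yes": 1})
encoded = pd.get_dummies(df, columns=["job"])

print(list(encoded.columns))
```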
Work together with the Data Preparation phase
An alternative to SageMaker
Creating the report is important. In this phase, the team has to decide whether to launch the model to production or start again.
- AWS EC2 - VM setup managed by the user and the cloud
- AWS EC2 Container - VM via a Docker container
- AWS Lambda - runs code without having to set up and manage servers on your own
- AWS SageMaker
Note: Identify these three services
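To make the Lambda option concrete, here is a minimal sketch of a Python handler that returns a prediction. The `score` function is a made-up placeholder rule, not a real model. The event shape (a JSON string under "body") is an assumption modelled on an HTTP-style request.

```python
import json


def score(features):
    """Placeholder for a real model: a made-up rule used only for illustration."""
    return 1 if features.get("balance", 0) > 1000 else 0


def lambda_handler(event, context):
    """Entry point that Lambda calls with the request event and a runtime context."""
    features = json.loads(event["body"])
    return {
        "statusCode": 200,
        "body": json.dumps({"prediction": score(features)}),
    }


# Local call with a hand-written event; no AWS account is needed for this.
response = lambda_handler({"body": json.dumps({"balance": 1500})}, None)
print(response["body"])
```

Only the handler is written by the user. Provisioning and scaling of the machines that run it are left to the service.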
- CloudFormation - Build infrastructure using a streamlined template, for example in JSON.
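As an illustration of what such a template might contain, the sketch below builds a minimal one as a Python dictionary and prints it as JSON. The single S3 bucket and its logical name "DataBucket" are assumptions chosen for the example.

```python
import json

# Minimal template: one S3 bucket. "DataBucket" is a made-up logical name.
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Description": "Sketch: a bucket to hold the project data",
    "Resources": {
        "DataBucket": {"Type": "AWS::S3::Bucket"},
    },
}

template_json = json.dumps(template, indent=2)
print(template_json)
```

Saved to a file, this JSON could be supplied as the template when creating a stack.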
The output of all of the above goes into the report.
Assess the outcomes of the project.