How can you improve the accuracy of your machine learning models? How do you find which data columns make the most useful features? Here we discuss what separates good features from bad ones and how you can preprocess and transform them for optimal use in your machine learning models.
You will get hands-on practice choosing features and preprocessing them inside Google Cloud Platform with interactive labs. The code solutions will be made public for your reference as you work on your own future data science projects.
Course Objectives:
- Describe the major areas of Feature Engineering
- Turn raw data into features
- Compare good vs bad features
- Represent features
- Get started with preprocessing and feature creation
- Use Apache Beam and Cloud Dataflow for feature engineering
- Recognize where feature crosses are a powerful way to help machines learn
- Implement feature crosses in TensorFlow
- Incorporate feature creation as part of your ML pipeline
- Improve the taxifare model using feature crosses
- Implement feature preprocessing and feature creation using tf.transform
- Carry out feature processing efficiently, at scale and on streaming data
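As a rough illustration of what a feature cross is, here is a minimal plain-Python sketch. It is not the course's TensorFlow code: the function names, the bucket boundaries, and the choice of hour-of-day and weekend features are all assumptions made for this example. It buckets a numeric feature, combines it with a second categorical feature, and one-hot encodes the combination so that a linear model could learn a separate weight for each pairing.

```python
# Hypothetical bucket boundaries for hour of day (assumption for illustration):
# [0, 6) night, [6, 12) morning, [12, 18) afternoon, [18, 24) evening.
HOUR_BOUNDARIES = [6, 12, 18]
NUM_HOUR_BUCKETS = len(HOUR_BOUNDARIES) + 1
NUM_WEEKEND_VALUES = 2  # weekday = 0, weekend = 1


def bucketize(value, boundaries):
    """Return the index of the bucket that `value` falls into."""
    return sum(value >= b for b in boundaries)


def cross_index(index_a, index_b, size_b):
    """Map a pair of categorical indices to a single crossed index."""
    return index_a * size_b + index_b


def one_hot(index, size):
    """Return a one-hot list of length `size` with a 1 at `index`."""
    vector = [0] * size
    vector[index] = 1
    return vector


def crossed_feature(hour, is_weekend):
    """Cross the bucketized hour with the weekend flag and one-hot encode it."""
    hour_bucket = bucketize(hour, HOUR_BOUNDARIES)
    crossed = cross_index(hour_bucket, int(is_weekend), NUM_WEEKEND_VALUES)
    return one_hot(crossed, NUM_HOUR_BUCKETS * NUM_WEEKEND_VALUES)


# A ride at 7 am on a weekend lands in its own slot, distinct from 7 am on a weekday.
weekend_morning = crossed_feature(hour=7, is_weekend=True)
weekday_morning = crossed_feature(hour=7, is_weekend=False)
```

Because every (hour bucket, weekend flag) pair gets its own position, a model can treat a weekend morning differently from a weekday morning, which neither feature could express on its own.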
Labs and Demos:
Lab: Training Data Analyst
Lab: Improve model accuracy with new features
Lab: Simple Dataflow Pipeline (Python) -- grep.py and grepc.py
Lab: MapReduce in Dataflow (Python) -- is_popular.py
Lab: Computing Time-Windowed Features in Cloud Dataprep
Lab: Feature Crosses to create a good classifier