Table of Contents
- Reproduce the result they claims to be.
- Code, Data (Could be in database/S3/locally) are properly delivered.
Documents they give is easy to understand.
Timeline: Rudyi will give me the updated code on 5/25/2017, currently I can run the code, but we don't have complete data to reproduce all figures due to some S3, Database access problem. But it doesn't takes long
Innovizio's code is more likely a lab, scientific scripts going through the whole process. But in production, we need it:
- modulized: data source, data clean, pre-process, post-process module should be separate.
- pluggable: it should provide a framework style API to allow us point the data points to arbitrary data source input in very flexible way.
- testable: debug and log is needed.
- deployment as prod: a lots of efforts to do for deployment.
- Python2/3 Compat: Innovizio doesn't write code in Python2/3 compatible style. It's for Python3. But AWS Lambda needs Python2. I will rewrite the code in Python2/3 compatible style.
This list includes all pluggable module we need for this project.
Get, clean, structured raw feature data from any kinds of data source. Raw data is those data points are directly observed, such as indoorTemp, outdoorTemp, runtime; Not derived feature.
Derive other features, resolve NA, error data.
The model should able to self-update. Basically we could train it every 2-week (this value varies) when we have new data feeds. And save the trained model on data persistance layer (Should be somewhere on cloud).
- This module provides a simple, straightforward API for prediction usage.
Should have a utility method to evaluate the accuracy on arbitrarily big test dataset. This helps us dynamically adjust parameters.
Timeline: above 4 should able to be done in one week, once we are able to reproduce their result
- dependencies pack up and distributive
other software resource prepare, s3, lambda, deployment scripts
Timeline: this part needs team work, time line is not predictable at this time.
It should have a nicer documentation for:
- problem definition
- solution, model explaination
- features description
- tech detail
experiment, guide for model adjustment
Timeline: 2-3 days