
To-do list and timeline for delivering Innovizio's Runtime Prediction Model as a product

Evaluate Innovizio's work

  1. Reproduce the results they claim.
  2. Verify that code and data (which may live in a database, in S3, or locally) are properly delivered.
  3. Verify that the documents they provide are easy to understand.

    Timeline: Rudyi will give me the updated code on 5/25/2017. I can currently run the code, but we cannot reproduce all of the figures because of some S3 and database access problems. Resolving this should not take long.

Convert to Whiskerlabs Prod Code

Code refactor

Innovizio's code reads like lab-style scientific scripts that run through the whole process in one pass. For production, we need it to be:

  1. modular: the data source, data cleaning, pre-processing, and post-processing logic should be separate modules.
  2. pluggable: it should provide a framework-style API that lets us point the model at arbitrary data-source inputs in a flexible way.
  3. testable: debugging hooks and logging are needed.
  4. deployable as production code: deployment will take a lot of effort.
  5. Python 2/3 compatible: Innovizio did not write the code in a Python 2/3 compatible style; it targets Python 3 only, but AWS Lambda requires Python 2. I will rewrite the code in a Python 2/3 compatible style.
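Point 5 mostly comes down to `__future__` imports and avoiding Python-3-only syntax. A minimal sketch of the style (the `mean_runtime` function is illustrative, not from Innovizio's code):

```python
# -*- coding: utf-8 -*-
# These imports make division, print, and string literals behave the same
# on Python 2 and Python 3; they are no-ops on Python 3.
from __future__ import division, print_function, unicode_literals


def mean_runtime(runtimes):
    """Average runtime; true division behaves identically on Python 2 and 3."""
    return sum(runtimes) / len(runtimes)


avg = mean_runtime([10, 20, 40])
```

On Python 2 without the `division` import, `sum([10, 20, 40]) / 3` would silently truncate to `23`; with it, both interpreters return the same float.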

Modules

This list includes all the pluggable modules we need for this project.

1. Raw Feature Data Processor

Get, clean, and structure raw feature data from any kind of data source. Raw data consists of directly observed data points, such as indoorTemp, outdoorTemp, and runtime; derived features are handled separately.
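The pluggable requirement could be met with a small abstract interface that every concrete source implements. A sketch under assumed names (`RawFeatureSource`, `CsvSource`, `fetch` are illustrative, not an existing API); an S3 or database source would implement the same interface:

```python
import csv
import io


class RawFeatureSource(object):
    """Abstract interface so S3, database, and local-file sources are interchangeable."""

    def fetch(self):
        """Return a list of dicts with keys like indoorTemp, outdoorTemp, runtime."""
        raise NotImplementedError


class CsvSource(RawFeatureSource):
    """Local CSV implementation of the interface, for illustration."""

    def __init__(self, text):
        self.text = text

    def fetch(self):
        reader = csv.DictReader(io.StringIO(self.text))
        # Structure each row as a dict of floats.
        return [{k: float(v) for k, v in row.items()} for row in reader]


source = CsvSource("indoorTemp,outdoorTemp,runtime\n70.5,55.0,12.0\n")
rows = source.fetch()
```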

2. Feature Data Constructor

Derive additional features, and resolve NA and erroneous data.
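As a sketch of what this module does, here is one possible derived feature (`tempDelta`) and one possible NA strategy (mean imputation); both are illustrative assumptions, not Innovizio's actual feature set:

```python
def construct_features(rows):
    """Add tempDelta = indoorTemp - outdoorTemp; fill missing runtime with the mean."""
    observed = [r["runtime"] for r in rows if r["runtime"] is not None]
    mean_runtime = sum(observed) / len(observed)
    out = []
    for r in rows:
        row = dict(r)
        # Derived feature: temperature difference across the building envelope.
        row["tempDelta"] = row["indoorTemp"] - row["outdoorTemp"]
        if row["runtime"] is None:  # resolve NA by mean imputation
            row["runtime"] = mean_runtime
        out.append(row)
    return out


features = construct_features([
    {"indoorTemp": 70.0, "outdoorTemp": 60.0, "runtime": 10.0},
    {"indoorTemp": 68.0, "outdoorTemp": 50.0, "runtime": None},
])
```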

3. Train the model and save it

The model should be able to update itself. Basically, we can retrain it every two weeks (this interval may vary) as new data feeds arrive, and save the trained model to a data persistence layer (likely somewhere in the cloud).
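The retrain-and-persist cycle can be sketched as below. A local temp file stands in for the cloud persistence layer, and the "model" is a trivial mean predictor, both assumptions for illustration only:

```python
import os
import pickle
import tempfile


def train(runtimes):
    """Stand-in training step: the 'model' is just the mean runtime."""
    return {"mean_runtime": sum(runtimes) / len(runtimes)}


def save_model(model, path):
    """Persist the trained model (here locally; in production, to the cloud)."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path):
    with open(path, "rb") as f:
        return pickle.load(f)


# One retraining cycle: train on the latest feed, save, reload for serving.
path = os.path.join(tempfile.mkdtemp(), "model.pkl")
save_model(train([10.0, 20.0, 30.0]), path)
model = load_model(path)
```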

4. Predict/Test and parameter adjustment

  1. This module provides a simple, straightforward API for prediction.
  2. It should include a utility method for evaluating accuracy on an arbitrarily large test dataset, which helps us adjust parameters dynamically.

    Timeline: the four modules above should be doable within one week of reproducing their results.
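The accuracy utility from point 2 could look like the sketch below, using mean absolute error; the `predict` signature and the trivial mean-predictor model are illustrative assumptions:

```python
def predict(model, row):
    """Prediction API sketch: the stand-in model always predicts the mean runtime."""
    return model["mean_runtime"]


def mean_absolute_error(model, test_rows):
    """Evaluate prediction accuracy on an arbitrarily large test dataset."""
    errors = [abs(predict(model, r) - r["runtime"]) for r in test_rows]
    return sum(errors) / len(errors)


mae = mean_absolute_error(
    {"mean_runtime": 15.0},
    [{"runtime": 10.0}, {"runtime": 20.0}],
)
```

Running this after each parameter change gives a single number to compare, which is what makes dynamic parameter adjustment practical.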

5. Deployment package

  • pack up dependencies and make them distributable
  • prepare other software resources: S3, Lambda, deployment scripts

    Timeline: this part requires teamwork; the timeline is not predictable at this time.
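Packing up dependencies for Lambda amounts to zipping the code tree into a deployment package. A sketch (directory layout and `handler.py` are placeholders; a real build would also vendor third-party dependencies into the same zip):

```python
import os
import tempfile
import zipfile


def build_lambda_package(src_dir, zip_path):
    """Zip every file under src_dir, storing paths relative to src_dir."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(src_dir):
            for name in files:
                full = os.path.join(root, name)
                zf.write(full, os.path.relpath(full, src_dir))
    return zip_path


# Illustrative usage: a one-file "project" zipped into a deployment package.
src = tempfile.mkdtemp()
with open(os.path.join(src, "handler.py"), "w") as f:
    f.write("def handler(event, context):\n    return 'ok'\n")
pkg = build_lambda_package(src, os.path.join(tempfile.mkdtemp(), "pkg.zip"))
names = zipfile.ZipFile(pkg).namelist()
```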

Documentation

The project needs solid documentation covering:

  1. problem definition
  2. solution and model explanation
  3. feature descriptions
  4. technical details
  5. experiments, plus a guide for model adjustment

    Timeline: 2-3 days
