
To-do list and timeline for delivering Innovizio's Runtime Prediction Model as a product

Evaluate Innovizio's work

  1. Reproduce the results they claim.
  2. Verify that code and data (which may live in a database, in S3, or locally) are properly delivered.
  3. Verify that the documents they provide are easy to understand.

    Timeline: Rudyi will give me the updated code on 5/25/2017. I can currently run the code, but we cannot reproduce all of the figures because of some S3 and database access problems. Resolving this should not take long.

Convert to Whiskerlabs Prod Code

Code refactor

Innovizio's code reads like lab-style scientific scripts that run through the whole process in one pass. For production, we need it to be:

  1. modular: the data source, data cleaning, pre-processing, and post-processing logic should be separate modules.
  2. pluggable: it should provide a framework-style API that lets us point the model at arbitrary data-source inputs in a flexible way.
  3. testable: debugging hooks and logging are needed.
  4. deployable as production code: deployment will take a lot of effort.
  5. Python 2/3 compatible: Innovizio did not write the code in a Python 2/3 compatible style; it targets Python 3 only, but AWS Lambda requires Python 2. I will rewrite the code in a Python 2/3 compatible style.
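Point 5 mostly comes down to `__future__` imports and avoiding Python-3-only syntax. A minimal sketch of the style (the `mean_runtime` function is illustrative, not from Innovizio's code):

```python
# -*- coding: utf-8 -*-
# These imports make division, print, and string literals behave the same
# on Python 2 and Python 3; they are no-ops on Python 3.
from __future__ import division, print_function, unicode_literals


def mean_runtime(runtimes):
    """Average runtime; true division behaves identically on Python 2 and 3."""
    return sum(runtimes) / len(runtimes)


avg = mean_runtime([10, 20, 40])
```

On Python 2 without the `division` import, `sum([10, 20, 40]) / 3` would silently truncate to `23`; with it, both interpreters return the same float.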

Modules

This list includes all the pluggable modules we need for this project.

1. Raw Feature Data Processor

Get, clean, and structure raw feature data from any kind of data source. Raw data consists of directly observed data points, such as indoorTemp, outdoorTemp, and runtime; derived features are handled separately.
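The pluggable requirement could be met with a small abstract interface that every concrete source implements. A sketch under assumed names (`RawFeatureSource`, `CsvSource`, `fetch` are illustrative, not an existing API); an S3 or database source would implement the same interface:

```python
import csv
import io


class RawFeatureSource(object):
    """Abstract interface so S3, database, and local-file sources are interchangeable."""

    def fetch(self):
        """Return a list of dicts with keys like indoorTemp, outdoorTemp, runtime."""
        raise NotImplementedError


class CsvSource(RawFeatureSource):
    """Local CSV implementation of the interface, for illustration."""

    def __init__(self, text):
        self.text = text

    def fetch(self):
        reader = csv.DictReader(io.StringIO(self.text))
        # Structure each row as a dict of floats.
        return [{k: float(v) for k, v in row.items()} for row in reader]


source = CsvSource("indoorTemp,outdoorTemp,runtime\n70.5,55.0,12.0\n")
rows = source.fetch()
```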

2. Feature Data Constructor

Derive additional features, and resolve NA and erroneous data.
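As a sketch of what this module does, here is one possible derived feature (`tempDelta`) and one possible NA strategy (mean imputation); both are illustrative assumptions, not Innovizio's actual feature set:

```python
def construct_features(rows):
    """Add tempDelta = indoorTemp - outdoorTemp; fill missing runtime with the mean."""
    observed = [r["runtime"] for r in rows if r["runtime"] is not None]
    mean_runtime = sum(observed) / len(observed)
    out = []
    for r in rows:
        row = dict(r)
        # Derived feature: temperature difference across the building envelope.
        row["tempDelta"] = row["indoorTemp"] - row["outdoorTemp"]
        if row["runtime"] is None:  # resolve NA by mean imputation
            row["runtime"] = mean_runtime
        out.append(row)
    return out


features = construct_features([
    {"indoorTemp": 70.0, "outdoorTemp": 60.0, "runtime": 10.0},
    {"indoorTemp": 68.0, "outdoorTemp": 50.0, "runtime": None},
])
```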

3. Train the model and save it

The model should be able to update itself. Basically, we can retrain it every two weeks (this interval may vary) as new data feeds arrive, and save the trained model to a data persistence layer (likely somewhere in the cloud).
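The retrain-and-persist cycle can be sketched as below. A local temp file stands in for the cloud persistence layer, and the "model" is a trivial mean predictor, both assumptions for illustration only:

```python
import os
import pickle
import tempfile


def train(runtimes):
    """Stand-in training step: the 'model' is just the mean runtime."""
    return {"mean_runtime": sum(runtimes) / len(runtimes)}


def save_model(model, path):
    """Persist the trained model (here locally; in production, to the cloud)."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path):
    with open(path, "rb") as f:
        return pickle.load(f)


# One retraining cycle: train on the latest feed, save, reload for serving.
path = os.path.join(tempfile.mkdtemp(), "model.pkl")
save_model(train([10.0, 20.0, 30.0]), path)
model = load_model(path)
```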

4. Predict/Test and parameter adjustment

  1. This module provides a simple, straightforward API for prediction.
  2. It should include a utility method for evaluating accuracy on an arbitrarily large test dataset, which helps us adjust parameters dynamically.

    Timeline: the four modules above should be doable within one week of reproducing their results.
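The accuracy utility from point 2 could look like the sketch below, using mean absolute error; the `predict` signature and the trivial mean-predictor model are illustrative assumptions:

```python
def predict(model, row):
    """Prediction API sketch: the stand-in model always predicts the mean runtime."""
    return model["mean_runtime"]


def mean_absolute_error(model, test_rows):
    """Evaluate prediction accuracy on an arbitrarily large test dataset."""
    errors = [abs(predict(model, r) - r["runtime"]) for r in test_rows]
    return sum(errors) / len(errors)


mae = mean_absolute_error(
    {"mean_runtime": 15.0},
    [{"runtime": 10.0}, {"runtime": 20.0}],
)
```

Running this after each parameter change gives a single number to compare, which is what makes dynamic parameter adjustment practical.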

5. Deployment package

  • pack up dependencies and make them distributable
  • prepare other software resources: S3, Lambda, deployment scripts

    Timeline: this part requires teamwork; the timeline is not predictable at this time.
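Packing up dependencies for Lambda amounts to zipping the code tree into a deployment package. A sketch (directory layout and `handler.py` are placeholders; a real build would also vendor third-party dependencies into the same zip):

```python
import os
import tempfile
import zipfile


def build_lambda_package(src_dir, zip_path):
    """Zip every file under src_dir, storing paths relative to src_dir."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(src_dir):
            for name in files:
                full = os.path.join(root, name)
                zf.write(full, os.path.relpath(full, src_dir))
    return zip_path


# Illustrative usage: a one-file "project" zipped into a deployment package.
src = tempfile.mkdtemp()
with open(os.path.join(src, "handler.py"), "w") as f:
    f.write("def handler(event, context):\n    return 'ok'\n")
pkg = build_lambda_package(src, os.path.join(tempfile.mkdtemp(), "pkg.zip"))
names = zipfile.ZipFile(pkg).namelist()
```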

Documentation

The project needs solid documentation covering:

  1. problem definition
  2. solution and model explanation
  3. feature descriptions
  4. technical details
  5. experiments, plus a guide for model adjustment

    Timeline: 2-3 days
