🐢
Slow is smooth. Smooth is fast.

Louis Tiao ltiao

@ltiao
ltiao / scikit-learn_issues.md
Last active August 29, 2015 13:55
Issues with scikit-learn that I need to try to articulate more clearly and report to the issue tracker.
  1. CountVectorizer seems a bit dense and tries to do too much. This is not a bad thing in itself, but from a design viewpoint it could benefit from the Pipeline framework: define the preprocessor and tokenizer (or even just the analyzer) as transformers in their own right, and define CountVectorizer as a pipeline of these operations. It should conform to or make use of existing frameworks where possible, rather than extending its functionality to allow for exceptions.
  2. Functions from the metrics module such as confusion_matrix and classification_report ought to support cross-validation, perhaps by accepting a cv parameter. This might be awkward, given that cross_val_score and learning_curve already exist and are supposed to be the functions that take care of cross-validation scoring. It might make sense to use confusion_matrix as the scoring function for cross_val_score, but the latter only accepts scoring functions that return a single value, so that won't work. The other t
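As a rough illustration of the decomposition suggested in item 1, the sketch below chains a preprocessor, a tokenizer and a counter as separate, swappable steps. It is plain Python, not scikit-learn code: `compose`, `preprocess`, `tokenize` and `count_tokens` are hypothetical names invented for this example, and the token pattern (words of two or more characters) is an assumption.

```python
import re
from collections import Counter


def preprocess(doc):
    """Normalise the raw text (here: lowercase only)."""
    return doc.lower()


def tokenize(doc):
    """Split text into word tokens of two or more characters (assumed pattern)."""
    return re.findall(r"\b\w\w+\b", doc)


def count_tokens(tokens):
    """Map each token to the number of times it occurs."""
    return Counter(tokens)


def compose(*steps):
    """Chain steps so the output of each one feeds the next."""
    def run(value):
        for step in steps:
            value = step(value)
        return value
    return run


# The "vectorizer" is just a pipeline of the three independent steps.
count_vectorize = compose(preprocess, tokenize, count_tokens)
counts = count_vectorize("The cat sat on the mat")
```

Because each step is a separate callable, swapping in a different tokenizer only means passing a different function to `compose`; no special-case parameters are needed.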

Core Sentiment Analysis Package

  • Documentation, specifically docstrings. Set up Sphinx to organize the documentation.
  • The data loader needs to be refactored to respond gracefully to faults in the provided datasets.
  • Helper function for concatenating train, dev and test data and providing the CV params (important/useful for learning_curve and other methods that only accept a single set of data with cross-validation parameters).
  • Turn the miscellaneous scripts into either package scripts or helper methods.
  • Clean up the IPython Notebooks so that they can all be executed with "run all", and remove old/irrelevant ones.
  • Provide a Cookbook with useful snippets.
  • Fix the package installer (setup.py) to support one-click install, so that when the package is distributed it installs everything required, including e.g. NLTK corpora.
  • Refactor code so all components are consistent. So far, data.py and feature_extraction.py are fully up-to-date and compliant with the design philosophy, but n
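A minimal sketch of the concatenation helper mentioned in the list above, assuming the "CV params" are a list of (train indices, test indices) pairs, which is one form scikit-learn's `cv` argument accepts. The name `concat_with_cv` and the choice of two splits (train against dev, then train plus dev against test) are assumptions made for illustration, not the package's actual design.

```python
def concat_with_cv(train, dev, test):
    """Concatenate three datasets and describe the original boundaries
    as index-based splits over the combined data."""
    data = list(train) + list(dev) + list(test)

    n_train, n_dev, n_test = len(train), len(dev), len(test)
    train_idx = list(range(n_train))
    dev_idx = list(range(n_train, n_train + n_dev))
    test_idx = list(range(n_train + n_dev, n_train + n_dev + n_test))

    # Assumed splits: fit on train and score on dev,
    # then fit on train + dev and score on test.
    cv = [
        (train_idx, dev_idx),
        (train_idx + dev_idx, test_idx),
    ]
    return data, cv


data, cv = concat_with_cv(["a", "b", "c"], ["d"], ["e", "f"])
```

The returned `cv` list could then be passed wherever a single dataset plus cross-validation splits is expected, for example as the `cv` argument of learning_curve.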
@ltiao
ltiao / template
Last active August 29, 2015 13:57
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Structured General Purpose Assignment
% LaTeX Template
%
% This template has been downloaded from:
% http://www.latextemplates.com
%
% Original author:
% Ted Pavlic (http://www.tedpavlic.com)
%
@ltiao
ltiao / sbrk_syscall.c
Created June 9, 2014 13:13
I'm da real mvp
int sys_sbrk(__intptr_t change, void **retval) {
    struct addrspace *as = proc_getas();
    if (as == NULL) {
        return EFAULT;
    }
    vaddr_t heapbreak = as->as_heapvbase + as->as_heapsize;
    // we're not checking page alignment of
    // change (amount) here or anything
  • lect07 (Slide 30-36) Review Classic File Organization methods
  • lect16 (140513) 1:24: Top/bottom interrupt handling
  • lect16 (140514) 0:10: Drivers are re-entrant
  • lect16 (140514) Read up on the various I/O buffering in Tanenbaum
  • lect17 (140520) Why favour I/O bound processes over CPU-bound processes?
  • lect17 (140520) 00:31 Traditional UNIX scheduler (how priorities self-adjust with clock ticks etc.)
  • lect17 (140520) 00:48 (Slide 62) Formula for scheduling
  • lect17 (140520) 1:48 Why exactly do we favour Rate Monotonic scheduling other than that it is easier to implement (why is it easier to implement?)
  • lect18 (Slide 12) What is the per-CPU private memory used for exactly?
  • Write 2 double sided A4 sheets of handwritten notes (23 June 2014)
    • Polynomial Time Hierarchy
    • Running-time bounds on f(n) space Turing machines
    • Complexity Zoology
    • A language is decidable iff it is Turing-recognizable and co-Turing-recognizable
    • Rice's Theorem
    • List of undecidable problems relating to CFLs etc.
    • Different types of reductions (Log-space transducers)
    • Savitch's Theorem
  • Week 12 + 13 lectures (20 June 2014)
@ltiao
ltiao / django_quickstart.md
Last active August 29, 2015 14:05
Louis' Django Quickstart Checklist

This checklist is largely inspired by Two Scoops of Django: Best Practices for Django 1.5.

  • Install virtualenvwrapper
  • mkvirtualenv <env_name>
  • workon <env_name>
  • pip install django-toolbelt
  • Create a new git repository (from GitHub https://github.com/new) so we can get the automatically generated .gitignore, README.md, LICENSE, etc. (for the lazy)
  • Add .DS_Store and other junk to .gitignore
  • git clone <repo> <repo_root> && cd <repo_root>
  • django-admin.py startproject <dj_project> <dj_project_root>
@ltiao
ltiao / questions.md
Last active August 29, 2015 14:06
MS Interview 16 Sept 2014
  1. First interviewer: Russian lady

    Behavioural questions:

    • Tell me about a challenging/interesting project you've done.

    Technical questions:

    • Given 2 triangles, write a function to determine if they intersect.
      • Simpler variant: Given 2 rectangles, write a function to determine if they intersect.
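One possible answer to the simpler rectangle variant, sketched under two assumptions that the question does not state: the rectangles are axis-aligned, and rectangles that merely touch along an edge or corner count as intersecting. The `Rect` type and `rects_intersect` name are made up for this example.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Axis-aligned rectangle given by its min and max corners."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def rects_intersect(a, b):
    """Return True if the two rectangles overlap or touch.

    Two axis-aligned rectangles intersect exactly when their
    projections overlap on both the x axis and the y axis.
    """
    overlap_x = a.x_min <= b.x_max and b.x_min <= a.x_max
    overlap_y = a.y_min <= b.y_max and b.y_min <= a.y_max
    return overlap_x and overlap_y


unit = Rect(0, 0, 1, 1)
shifted = Rect(0.5, 0.5, 2, 2)
far = Rect(3, 3, 4, 4)
touching = Rect(1, 0, 2, 1)
```

The triangle version needs more work (for example, testing edge pairs for crossings plus a containment check), so it is not covered here.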