- `CountVectorizer` seems a bit dense and tries to do too much. This is not a bad thing, but from a design viewpoint it could benefit from using the `Pipeline` framework: define the `preprocessor` and `tokenizer` (or even just the `analyzer`) as transformers in and of themselves, and define `CountVectorizer` as a pipeline of these operations. It should conform to or make use of existing frameworks where possible rather than extending functionality to allow for exceptions.
- Functions from the `metric` module such as `confusion_matrix` and `classification_report` ought to support cross-validation, perhaps by allowing a `cv` parameter. This might be awkward given that we already have `cross_val_score` and `learning_curve`, which are supposed to be the functions that take care of cross-validation scoring. It might make sense to use `confusion_matrix` as the scoring function for `cross_val_score`, but the latter only accepts scoring functions which return a single value, so that won't work. The other t
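To make the cross-validated confusion matrix idea concrete, here is a minimal sketch in plain Python that sums per-fold confusion counts. It is not an existing library API: `cross_val_confusion`, `fit_predict`, and `majority_baseline` are hypothetical names chosen for illustration, and the folds are assumed to be given as `(train_indices, test_indices)` pairs.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count (true_label, predicted_label) pairs for a single fold."""
    return Counter(zip(y_true, y_pred))


def cross_val_confusion(fit_predict, X, y, folds):
    """Sum confusion counts over all folds.

    fit_predict: hypothetical callable (X_train, y_train, X_test) -> predictions.
    folds: iterable of (train_indices, test_indices) pairs (an assumption).
    """
    total = Counter()
    for train_idx, test_idx in folds:
        X_train = [X[i] for i in train_idx]
        y_train = [y[i] for i in train_idx]
        X_test = [X[i] for i in test_idx]
        y_test = [y[i] for i in test_idx]
        y_pred = fit_predict(X_train, y_train, X_test)
        total += confusion_counts(y_test, y_pred)
    return total


def majority_baseline(X_train, y_train, X_test):
    """Toy model for the example: always predict the most common training label."""
    label = Counter(y_train).most_common(1)[0][0]
    return [label] * len(X_test)


X = list(range(6))
y = ["a", "a", "b", "a", "a", "b"]
folds = [([0, 1, 2], [3, 4, 5]), ([3, 4, 5], [0, 1, 2])]
matrix = cross_val_confusion(majority_baseline, X, y, folds)
```

The result maps each `(true, predicted)` pair to a count, so it sidesteps the single-value restriction by aggregating outside the scoring function rather than inside it.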
Core Sentiment Analysis Package
- Documentation, specifically docstrings. Set up Sphinx to organize documentation.
- The data loader needs to be refactored to respond gracefully to faults in the provided datasets.
- Helper function for concatenating train, dev and test data and providing the CV params (important/useful for `learning_curve` or other methods that only accept a single set of data with cross-validation parameters).
- Turn the miscellaneous scripts into either package scripts or helper methods.
- Clean up the IPython Notebooks so they can all be executed with "run all", and remove old/irrelevant ones.
- Provide a Cookbook with useful snippets.
- Fix the package installer (`setup.py`) to support one-click install, so that when the package is distributed it will install everything required, including e.g. NLTK corpora, etc.
- Refactor code so all components are consistent. So far, `data.py` and `feature_extraction.py` are fully up-to-date and compliant with the design philosophy, but n
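The concatenation helper mentioned in the list above could look roughly like the following. This is a sketch under assumptions, not the package's actual code: the name `concat_with_splits` is hypothetical, and "CV params" is interpreted here as a list of `(train_indices, test_indices)` pairs over the concatenated data.

```python
def concat_with_splits(train, dev, test):
    """Concatenate train/dev/test and describe the original partition as index splits.

    Returns (data, cv), where cv is a list of (train_indices, test_indices) pairs:
      1. fit on train, evaluate on dev
      2. fit on train + dev, evaluate on test
    The choice of these two splits is an assumption made for illustration.
    """
    data = list(train) + list(dev) + list(test)
    n_train, n_dev = len(train), len(dev)
    train_idx = list(range(n_train))
    dev_idx = list(range(n_train, n_train + n_dev))
    test_idx = list(range(n_train + n_dev, len(data)))
    cv = [(train_idx, dev_idx), (train_idx + dev_idx, test_idx)]
    return data, cv


data, cv = concat_with_splits(["t1", "t2"], ["d1"], ["x1", "x2"])
```

scikit-learn functions that take a `cv` argument accept an iterable of `(train, test)` index splits, so a list in this shape could be passed along with the concatenated data.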
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Structured General Purpose Assignment
% LaTeX Template
%
% This template has been downloaded from:
% http://www.latextemplates.com
%
% Original author:
% Ted Pavlic (http://www.tedpavlic.com)
%
int sys_sbrk(__intptr_t change, void **retval) {
    struct addrspace *as = proc_getas();
    if (as == NULL) {
        return EFAULT;
    }
    vaddr_t heapbreak = as->as_heapvbase + as->as_heapsize;
    // we're not checking page alignment of
    // change (amount) here or anything
- lect07 (Slide 30-36) Review Classic File Organization methods
- lect16 (140513) 1:24: Top/bottom interrupt handling
- lect16 (140514) 0:10: Drivers are re-entrant
- lect16 (140514) Read up on the various I/O buffering in Tanenbaum
- lect17 (140520) Why favour I/O bound processes over CPU-bound processes?
- lect17 (140520) 00:31 Traditional UNIX scheduler (how priorities self-adjust with clock ticks etc.)
- lect17 (140520) 00:48 (Slide 62) Formula for scheduling
- lect17 (140520) 1:48 Why exactly do we favour Rate Monotonic scheduling other than that it is easier to implement (why is it easier to implement?)
- lect18 (Slide 12) What is the per-CPU private memory used for exactly?
- Write 2 double sided A4 sheets of handwritten notes (23 June 2014)
- Polynomial Time Hierarchy
- Running-time bounds on f(n) space Turing machines
- Complexity Zoology
- A language is decidable iff it is Turing-recognizable and co-Turing-recognizable
- Rice's Theorem
- List of undecidable problems relating to CFLs etc.
- Different types of reductions (Log-space transducers)
- Savitch's Theorem
- Week 12 + 13 lectures (20 June 2014)
This checklist is largely inspired by Two Scoops of Django: Best Practices for Django 1.5.
- Install `virtualenvwrapper`
- `mkvirtualenv <env_name>`
- `workon <env_name>`
- `pip install django-toolbelt`
- Create a new `git` repository (from GitHub, https://github.com/new) so we can get the automatically generated `.gitignore`, `README.md`, `LICENSE`, etc. (for the lazy)
- Add `.DS_Store` and other junk to `.gitignore`
- `git clone <repo> <repo_root> && cd <repo_root>`
- `django-admin.py startproject <dj_project> <dj_project_root>`
First interviewer: Russian lady
Behavioural questions:
- Tell me about a challenging/interesting project you've done.
Technical questions:
- Given 2 triangles, write a function to determine if they intersect.
- Simpler variant: Given 2 rectangles, write a function to determine if they intersect.
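One possible approach to both questions is the separating axis theorem for convex polygons, sketched below. The notes do not record which approach was used in the interview, so this is an illustration only: the function name `convex_polygons_intersect` is hypothetical, shapes are assumed to be non-degenerate and given as lists of `(x, y)` vertices in order, and shapes that merely touch are counted as intersecting.

```python
def _project(points, axis):
    """Project points onto an axis and return the (min, max) of the projections."""
    dots = [x * axis[0] + y * axis[1] for x, y in points]
    return min(dots), max(dots)


def convex_polygons_intersect(poly_a, poly_b):
    """Separating axis test for two convex polygons (triangles, rectangles, ...).

    If the projections of the two polygons onto the normal of any edge do not
    overlap, that normal is a separating axis and the polygons are disjoint.
    """
    for poly in (poly_a, poly_b):
        n = len(poly)
        for i in range(n):
            (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % n]
            axis = (y1 - y2, x2 - x1)  # perpendicular to the edge
            min_a, max_a = _project(poly_a, axis)
            min_b, max_b = _project(poly_b, axis)
            if max_a < min_b or max_b < min_a:
                return False  # found a separating axis
    return True  # no separating axis exists, so the polygons overlap


tri_a = [(0, 0), (4, 0), (0, 4)]
tri_b = [(1, 1), (5, 1), (1, 5)]   # overlaps tri_a
tri_c = [(4, 4), (3, 4), (4, 3)]   # bounding boxes overlap, triangles do not
rect_a = [(0, 0), (2, 0), (2, 2), (0, 2)]
rect_b = [(1, 1), (3, 1), (3, 3), (1, 3)]
```

For axis-aligned rectangles the same test reduces to checking that the x-intervals and the y-intervals both overlap, which is why the rectangle version is the simpler variant.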