Skip to content

Instantly share code, notes, and snippets.

View micaleel's full-sized avatar
:octocat:
I may be slow to respond.

Khalil micaleel

:octocat:
I may be slow to respond.
View GitHub Profile
@micaleel
micaleel / pmf
Created November 13, 2018 17:12 — forked from groverpr/pmf
pmf neural net fastai
ratings = pd.read_csv('ratings_small.csv') # loading data from csv
"""
ratings_small.csv has 4 columns - userId, movieId, ratings, and timestammp
it is most generic data format for CF related data
"""
val_indx = get_cv_idxs(len(ratings)) # index for validation set
wd = 2e-4 # weight decay
n_factors = 50 # n_factors - dimension of embedding matrix (D)
@micaleel
micaleel / request_handler_test.py
Created April 28, 2017 16:24 — forked from didip/request_handler_test.py
Testing Tornado RequestHandlers
import unittest, os, os.path, sys, urllib
import tornado.database
import tornado.options
from tornado.options import options
from tornado.testing import AsyncHTTPTestCase
# add application root to sys.path
APP_ROOT = os.path.abspath(os.path.join(os.path.dirname(__file__), '..'))
sys.path.append(os.path.join(APP_ROOT, '..'))
@micaleel
micaleel / gist:da3c3df40ea0e374a20f6d897d4a8c1c
Created February 27, 2017 16:51 — forked from entaroadun/gist:1653794
Recommendation and Ratings Public Data Sets For Machine Learning

Movies Recommendation:

Music Recommendation:

@micaleel
micaleel / rank_metrics.py
Created October 11, 2016 15:23 — forked from bwhite/rank_metrics.py
Ranking Metrics
"""Information Retrieval metrics
Useful Resources:
http://www.cs.utexas.edu/~mooney/ir-course/slides/Evaluation.ppt
http://www.nii.ac.jp/TechReports/05-014E.pdf
http://www.stanford.edu/class/cs276/handouts/EvaluationNew-handout-6-per.pdf
http://hal.archives-ouvertes.fr/docs/00/72/67/60/PDF/07-busa-fekete.pdf
Learning to Rank for Information Retrieval (Tie-Yan Liu)
"""
import numpy as np
@micaleel
micaleel / elasticsearch-cheatsheet.txt
Created August 28, 2016 00:26 — forked from stephen-puiszis/elasticsearch-cheatsheet.txt
Elasticsearch Cheatsheet - An Overview of Commonly Used Elasticsearch API Endpoints and What They Do
# Elasticsearch Cheatsheet - an overview of commonly used Elasticsearch API commands
# cat paths
/_cat/allocation
/_cat/shards
/_cat/shards/{index}
/_cat/master
/_cat/nodes
/_cat/indices
/_cat/indices/{index}

A Few Useful Things to Know about Machine Learning

The paper presents some key lessons and "folk wisdom" that machine learning researchers and practitioners have learnt from experience and which are hard to find in textbooks.

1. Learning = Representation + Evaluation + Optimization

All machine learning algorithms have three components:

  • Representation for a learner is the set if classifiers/functions that can be possibly learnt. This set is called hypothesis space. If a function is not in hypothesis space, it can not be learnt.
  • Evaluation function tells how good the machine learning model is.
  • Optimisation is the method to search for the most optimal learning model.
@micaleel
micaleel / customer-segmentation.py
Created June 16, 2016 14:51 — forked from glamp/customer-segmentation.py
Analysis for customer segmentation blog post
import pandas as pd
# http://blog.yhathq.com/static/misc/data/WineKMC.xlsx
df_offers = pd.read_excel("./WineKMC.xlsx", sheetname=0)
df_offers.columns = ["offer_id", "campaign", "varietal", "min_qty", "discount", "origin", "past_peak"]
df_offers.head()
df_transactions = pd.read_excel("./WineKMC.xlsx", sheetname=1)
df_transactions.columns = ["customer_name", "offer_id"]
df_transactions['n'] = 1
df_transactions.head()
@micaleel
micaleel / nbstripout
Created April 28, 2016 12:48 — forked from minrk/nbstripout
git pre-commit hook for stripping output from IPython notebooks
#!/usr/bin/env python
"""strip outputs from an IPython Notebook
Opens a notebook, strips its output, and writes the outputless version to the original file.
Useful mainly as a git filter or pre-commit hook for users who don't want to track output in VCS.
This does mostly the same thing as the `Clear All Output` command in the notebook UI.
LICENSE: Public Domain
@micaleel
micaleel / .zshrc
Created April 11, 2016 11:34 — forked from zanshin/.zshrc
My .zshrc file
# Path to your oh-my-zsh configuration.
export ZSH=$HOME/.oh-my-zsh
# Set name of the theme to load.
# Look in ~/.oh-my-zsh/themes/
# Optionally, if you set this to "random", it'll load a random theme each
# time that oh-my-zsh is loaded.
#export ZSH_THEME="robbyrussell"
export ZSH_THEME="zanshin"
@micaleel
micaleel / matplotlib_barplot.md
Created January 25, 2016 16:33 — forked from ctokheim/matplotlib_barplot.md
Matplotlib: Stacked and Grouped Bar Plot

Stacked and Grouped Bar Plot

Oddly enough ggplot2 has no support for a stacked and grouped (position="dodge") bar plot. The seaborn python package, although excellent, also does not provide an alternative. However, I knew it was surely possible to make such a plot in regular matplotlib. Matplotlib, although sometimes clunky, gives you enough flexibility to precisely place plotting elements which is needed for a stacked and grouped bar plot.

Below is a working example of making a stacked and grouped bar plot.

import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np