@amanahuja
amanahuja / keybase.md
Created October 5, 2020 17:11
keybase.md

Keybase proof

I hereby claim:

  • I am amanahuja on github.
  • I am amanqa (https://keybase.io/amanqa) on keybase.
  • I have a public key ASASllH9sUL7cRzrWMq-nIMWp7iil-P5Y3I_7ec4VcBSXwo

To claim this, I am signing this object:

@amanahuja
amanahuja / shoutbase_client.py
Last active June 9, 2018 22:40
Shoutbase API helper utils
# From Shoutbase team
# 2018 June 04
import requests
import time
import urllib
import csv
try:
    # for Python 2.x
@amanahuja
amanahuja / gini_coefficient_metric.py
Created January 26, 2017 20:38
Calculation of gini coefficient metric
"""
Calculation of gini coefficient metric
via https://www.kaggle.com/c/ClaimPredictionChallenge/forums/t/703/code-to-calculate-normalizedgini?forumMessageId=5897#post5897
I'm not the author; that would be Kaggle user Patrick.
See http://www.rhinorisk.com/Publications/Gini%20Coefficients.pdf
"""
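import numpy as np

def gini(actual, pred, cmpcol = 0, sortcol = 1):
    assert( len(actual) == len(pred) )
    # np.float was removed in NumPy 1.24; the builtin float is equivalent here
    all = np.asarray(np.c_[ actual, pred, np.arange(len(actual)) ], dtype=float)
    # sort by prediction (descending), breaking ties by original row order
    all = all[ np.lexsort((all[:,2], -1*all[:,1])) ]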
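
The preview above stops right after the sort. As a rough illustration of how a normalized Gini metric of this kind is commonly computed, here is a minimal, dependency-free sketch. It is an assumption about the general technique, not the remainder of the truncated gist. The names `gini_sketch` and `normalized_gini_sketch` are hypothetical, and the unused `cmpcol` and `sortcol` parameters are left out.

```python
def gini_sketch(actual, pred):
    """Unnormalized Gini of `actual` when rows are ranked by `pred`."""
    if len(actual) != len(pred):
        raise ValueError("actual and pred must have the same length")
    n = len(actual)
    total = float(sum(actual))
    if n == 0 or total == 0:
        raise ValueError("actual must be non-empty with a non-zero sum")

    # Rank rows by prediction, largest first; ties keep their original order.
    order = sorted(range(n), key=lambda i: (-pred[i], i))

    # Sum of the running cumulative totals of `actual` in ranked order.
    running = 0.0
    cumulative_sum = 0.0
    for i in order:
        running += actual[i]
        cumulative_sum += running

    # Subtract the value expected from a random ordering, then scale by n.
    return (cumulative_sum / total - (n + 1) / 2.0) / n


def normalized_gini_sketch(actual, pred):
    """Gini of the predictions divided by the Gini of a perfect ranking."""
    return gini_sketch(actual, pred) / gini_sketch(actual, actual)


if __name__ == "__main__":
    print(normalized_gini_sketch([1, 2, 3, 4], [0.1, 0.4, 0.35, 0.8]))
```

Dividing by the Gini of a perfect ranking means a model that orders the rows exactly as `actual` does scores 1, and one that orders them exactly backwards scores -1.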
@amanahuja
amanahuja / womens_stats_2015.ipynb
Created May 26, 2015 22:35
Women's stats #d1natties (temp)
@amanahuja
amanahuja / plotting_categorical_variables.py
Created May 16, 2014 22:20
Plotting a Categorical Variable in matplotlib with pandas
"""
Plotting a categorical variable
----------------------------------
`df` is a pandas dataframe with a timeseries index.
`df` has a column `categorical` of dtype object (strings and NaNs); it is a categorical variable representing events.
----------------------------------
>>> print(df[:5])
categorical
@amanahuja
amanahuja / andrews_curve_column_order.py
Last active May 4, 2017 23:50
Andrews plots in pandas of Rdatasets with changed column order
import matplotlib.pyplot as plt
import pandas as pd
import statsmodels.api as sm
#Change next two lines for dataset, such as in
#http://vincentarelbundock.github.io/Rdatasets/
data = sm.datasets.get_rdataset('airquality').data
class_column = 'Month'
fig, (ax1, ax2) = plt.subplots(nrows=2, ncols=1, sharex=True)
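
To see why column order changes an Andrews plot, it helps to look at the curve itself. The sketch below evaluates the usual Andrews function, f(t) = x1/sqrt(2) + x2 sin(t) + x3 cos(t) + x4 sin(2t) + ..., for a single row in plain Python. The helper name `andrews_curve_value` is hypothetical and is not part of the gist or of pandas.

```python
import math


def andrews_curve_value(row, t):
    """Evaluate the Andrews function of one data row at angle t (radians)."""
    # The first column only contributes a constant offset.
    value = row[0] / math.sqrt(2.0)
    # Remaining columns alternate sin/cos with increasing frequency:
    # x2*sin(t), x3*cos(t), x4*sin(2t), x5*cos(2t), ...
    for k, x in enumerate(row[1:], start=1):
        harmonic = (k + 1) // 2
        if k % 2 == 1:
            value += x * math.sin(harmonic * t)
        else:
            value += x * math.cos(harmonic * t)
    return value


if __name__ == "__main__":
    row = [1.0, 2.0, 3.0]
    reordered = [3.0, 2.0, 1.0]  # same values, different column order
    for t in (0.0, math.pi / 2):
        print(t, andrews_curve_value(row, t), andrews_curve_value(reordered, t))
```

Each column is tied to a fixed term of the series, so permuting the columns assigns the same values to different frequencies and produces a different curve.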
@amanahuja
amanahuja / cancer_data_expore.ipynb
Created September 2, 2013 22:28
Age-adjusted Urinary Bladder cancer occurrence, by state:
@amanahuja
amanahuja / sklearn-MAPE.py
Last active October 1, 2020 12:17
Mean Absolute Percentage Error (MAPE) metric for python sklearn. Written in response to a question on Cross Validated: http://stats.stackexchange.com/questions/58391/mean-absolute-percentage-error-mape-in-scikit-learn/62511#62511
# Note: check_arrays was removed in scikit-learn 0.16; newer releases
# provide check_array and check_consistent_length instead.
from sklearn.utils import check_arrays
def mean_absolute_percentage_error(y_true, y_pred):
    """
    Use of this metric is not recommended; for illustration only.
    See other regression metrics on sklearn docs:
    http://scikit-learn.org/stable/modules/classes.html#regression-metrics
    Use like any other metric
    >>> y_true = [3, -0.5, 2, 7]; y_pred = [2.5, -0.3, 2, 8]
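
The preview is cut off before the function body. For reference, the MAPE formula can be sketched without any dependencies as follows. This is an illustration of the standard definition, not the gist's own code. The name `mape_sketch`, the choice to return a percentage rather than a fraction, and the zero check are all assumptions.

```python
def mape_sketch(y_true, y_pred):
    """Mean absolute percentage error, expressed in percent."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("inputs must not be empty")
    if any(t == 0 for t in y_true):
        # The metric is undefined when a true value is zero.
        raise ValueError("y_true must not contain zeros")

    # Average of |error / true value| over all samples, scaled to percent.
    ratios = [abs((t - p) / t) for t, p in zip(y_true, y_pred)]
    return 100.0 * sum(ratios) / len(ratios)


if __name__ == "__main__":
    y_true = [3, -0.5, 2, 7]
    y_pred = [2.5, -0.3, 2, 8]
    print(mape_sketch(y_true, y_pred))
```

The division by each true value is what makes the metric unstable near zero, which is one reason the docstring above advises against relying on it.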
@amanahuja
amanahuja / news_01.py
Created September 30, 2012 19:49
Fetch news items and parse
import feedparser
import nltk
from collections import defaultdict
#Some useful parameters
nitemstoparse = 5
new_words = []
feedurls = [
    'http://www.nytimes.com/services/xml/rss/nyt/GlobalHome.xml',
@amanahuja
amanahuja / load-clean.py
Created May 27, 2012 02:39
Load and prepare data on Consumer Electronics sales and corresponding Google Search queries
# -*- coding: utf-8 -*-
"""
Created on Thu May 22 20:30:36 2012
http://www.meetup.com/r-enthusiasts/events/65306492/
Mirroring the work that we do in Python.
This is the code to import the sales and query data into a Py-Pandas
dataframe (with conversion to time series).
Author (twitter): @amanqa