
@mjbommar
mjbommar / is_ci_token_stopword_match.py
Last active August 29, 2015 14:02
Fuzzy sentence matching in Python - Bommarito Consulting, LLC: http://bommaritollc.com/2014/06/fuzzy-match-sentences-in-python
# ## IPython Notebook for [Bommarito Consulting](http://bommaritollc.com/) Blog Post
# ### **Link**: [Fuzzy sentence matching in Python](http://bommaritollc.com/2014/06/fuzzy-match-sentences-in-python): http://bommaritollc.com/2014/06/fuzzy-match-sentences-in-python
# **Author**: [Michael J. Bommarito II](https://www.linkedin.com/in/bommarito/)
# Imports
import nltk.corpus
import nltk.tokenize.punkt
import string
# Get default English stopwords and extend with punctuation
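The preview above stops before the matching function itself. The sketch below is an assumption, not the gist's actual code. It shows what a case-insensitive token/stopword match can look like: lowercase both sentences, drop stopwords and punctuation, and compare the remaining token sets. The regex tokenizer and the short `STOPWORDS` set are hypothetical stand-ins for the NLTK tokenizer and `nltk.corpus.stopwords.words('english')`, chosen so the example runs without downloading NLTK data.

```python
import re
import string

# Hypothetical stand-in for nltk.corpus.stopwords.words('english'):
# a tiny hand-picked set so the sketch runs without NLTK corpora.
STOPWORDS = {"a", "an", "the", "is", "are", "was", "were", "of", "on", "in", "and", "to"}
STOPWORDS.update(string.punctuation)  # each punctuation character as its own entry
STOPWORDS.add("")


def tokenize(sentence):
    """Split into runs of word characters and single punctuation marks (assumed tokenizer)."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def is_ci_token_stopword_match(a, b):
    """Case-insensitively compare the non-stopword token sets of two sentences."""
    tokens_a = {t.lower() for t in tokenize(a)} - STOPWORDS
    tokens_b = {t.lower() for t in tokenize(b)} - STOPWORDS
    return tokens_a == tokens_b


print(is_ci_token_stopword_match("The cat sat on the mat.", "cat SAT mat"))
```

Because sets are compared, word order and repeated words are ignored. That is one design choice among several; an overlap threshold (for example, Jaccard similarity) is a common looser alternative.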
@mjbommar
mjbommar / is_ci_token_stopword_stem_match.py
Last active August 29, 2015 14:02
Fuzzy sentence matching in Python - Bommarito Consulting, LLC: http://bommaritollc.com/2014/06/fuzzy-match-sentences-in-python
# Imports
import nltk.corpus
import nltk.tokenize.punkt
import nltk.stem.snowball
import string
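A stemming variant can be sketched the same way: reduce tokens to stems before comparing, so inflections such as "jumping" and "jumped" collapse together. The gist imports `nltk.stem.snowball`; this sketch swaps in a deliberately crude, hypothetical `toy_stem` suffix stripper so it runs without NLTK. It is far less accurate than a Snowball stemmer and is an assumption about the approach, not the gist's code.

```python
import re
import string

# Hypothetical stand-ins: a tiny stopword set and a crude suffix stripper
# in place of NLTK's stopword corpus and SnowballStemmer('english').
STOPWORDS = {"a", "an", "the", "is", "are", "was", "were", "of", "on", "in", "and", "to"}
STOPWORDS.update(string.punctuation)
STOPWORDS.add("")
SUFFIXES = ("ing", "ed", "es", "s")


def toy_stem(token):
    """Strip the first matching suffix, keeping at least three characters of stem."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def is_ci_token_stopword_stem_match(a, b):
    """Compare the sets of stemmed, lowercased, non-stopword tokens of two sentences."""
    def normalize(sentence):
        tokens = re.findall(r"\w+|[^\w\s]", sentence.lower())
        return {toy_stem(t) for t in tokens if t not in STOPWORDS}
    return normalize(a) == normalize(b)


print(is_ci_token_stopword_stem_match("The cats were jumping.", "the cat jumped"))
```

With NLTK installed, `nltk.stem.snowball.SnowballStemmer('english').stem` could be dropped in for `toy_stem`.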
@mjbommar
mjbommar / is_ci_token_stopword_lemma_match.py
Last active August 29, 2015 14:02
Fuzzy sentence matching in Python - Bommarito Consulting, LLC: http://bommaritollc.com/2014/06/fuzzy-match-sentences-in-python
# Imports
import nltk.corpus
import nltk.tokenize.punkt
import nltk.stem.snowball
from nltk.corpus import wordnet
import string
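The lemma variant also imports `wordnet`, which suggests WordNet-based lemmatization. Unlike suffix stripping, a lemmatizer can map irregular forms such as "mice" to "mouse". The sketch below assumes that idea but replaces WordNet with a tiny, hypothetical `LEMMAS` lookup table so it runs without NLTK data. Unknown words pass through unchanged.

```python
import re
import string

# Hypothetical stand-ins for NLTK's stopword corpus and a WordNet lemmatizer:
# a tiny stopword set and a hand-written table of irregular forms.
STOPWORDS = {"a", "an", "the", "is", "are", "was", "were", "of", "on", "in", "and", "to"}
STOPWORDS.update(string.punctuation)
STOPWORDS.add("")
LEMMAS = {"mice": "mouse", "geese": "goose", "ran": "run", "running": "run", "runs": "run"}


def lemmatize(token):
    """Look the token up in the lemma table; unknown tokens map to themselves."""
    return LEMMAS.get(token, token)


def is_ci_token_stopword_lemma_match(a, b):
    """Compare the sets of lemmatized, lowercased, non-stopword tokens of two sentences."""
    def normalize(sentence):
        tokens = re.findall(r"\w+|[^\w\s]", sentence.lower())
        return {lemmatize(t) for t in tokens if t not in STOPWORDS}
    return normalize(a) == normalize(b)


print(is_ci_token_stopword_lemma_match("The geese ran.", "a goose running"))
```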
# Imports
import nltk.corpus
import nltk.tokenize.punkt
import nltk.stem.snowball
import string
# Get default English stopwords and extend with punctuation
stopwords = nltk.corpus.stopwords.words('english')
stopwords.extend(string.punctuation)
stopwords.append('')
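Note that `stopwords.extend(string.punctuation)` adds each punctuation character as a separate entry, because extending a list with a string iterates over its characters. Appending `''` presumably catches empty tokens. One way they can arise, shown in the sketch below under that assumption, is stripping punctuation from a token made only of punctuation. The short stopword list and the token list are hypothetical.

```python
import string

# Hypothetical tiny stopword list standing in for nltk.corpus.stopwords.words('english').
stopwords = ["the", "a", "on"]
stopwords.extend(string.punctuation)  # adds "!", '"', "#", ... one character per entry
stopwords.append("")                  # empty string, e.g. what "--" or "..." becomes below

tokens = ["The", "cat", "--", "sat", "on", "the", "mat", "..."]
# Lowercase and strip surrounding punctuation; punctuation-only tokens become "".
cleaned = [t.lower().strip(string.punctuation) for t in tokens]
content = [t for t in cleaned if t not in stopwords]
print(content)
```

Without the `''` entry, the two empty strings would survive the filter.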
@mjbommar
mjbommar / is_male_name.ipynb
Last active August 29, 2015 14:03
Sample is_male name probability estimates, conditioned by name, name/year, name/state, and name/year/state
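The notebook itself did not render here. As a minimal sketch of what conditioning on different keys can mean (an assumption about the approach, not the notebook's code), P(male | key) can be estimated as the male share of all counts recorded for that key. The record layout, the `is_male_probability` function, and the counts below are hypothetical toy values, not real name statistics.

```python
from collections import defaultdict

# Hypothetical records: (name, year, state, sex, count). Made-up counts for illustration only.
RECORDS = [
    ("NAME_X", 1990, "CA", "M", 60), ("NAME_X", 1990, "CA", "F", 40),
    ("NAME_X", 2010, "TX", "M", 30), ("NAME_X", 2010, "TX", "F", 70),
]
FIELDS = {"name": 0, "year": 1, "state": 2}


def is_male_probability(records, by):
    """Estimate P(male | key) as male count / total count, grouping by the fields in `by`."""
    male = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        key = tuple(record[FIELDS[field]] for field in by)
        total[key] += record[4]
        if record[3] == "M":
            male[key] += record[4]
    return {key: male[key] / total[key] for key in total}


print(is_male_probability(RECORDS, ["name"]))          # conditioned on name only
print(is_male_probability(RECORDS, ["name", "year"]))  # conditioned on name/year
```

Finer conditioning (name/year/state) gives more specific estimates but rests on smaller counts per key.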
@mjbommar
mjbommar / kahan_response_20150103.ipynb
Created January 3, 2015 15:26
SCDB null model predictions
@mjbommar
mjbommar / numpy_column_normalize.ipynb
Created January 25, 2015 13:33
Simple column-sum normalization using vectorized numpy methods
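The notebook did not render here. A minimal sketch of column-sum normalization with vectorized numpy, assuming the goal is for every column to sum to 1, divides the matrix by its column sums. Broadcasting applies the shape-`(n,)` sums across every row, so no Python loop is needed. `normalize_columns` is a hypothetical name.

```python
import numpy


def normalize_columns(matrix):
    """Divide each column by its sum so every column sums to 1 (broadcast over rows)."""
    matrix = numpy.asarray(matrix, dtype=float)
    column_sums = matrix.sum(axis=0)  # shape (n,)
    return matrix / column_sums       # (m, n) / (n,) broadcasts along each row


print(normalize_columns([[1, 2], [3, 2]]))
```

A column that sums to zero yields `nan` or `inf` entries (with a RuntimeWarning), so real data may need a guard for that case.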
@mjbommar
mjbommar / isotonic_test_case_20150129.json
Last active August 29, 2015 14:14
Test case for a regression in IsotonicRegression: fit vs. fit_transform
{"nbformat_minor": 0, "cells": [{"execution_count": 1, "cell_type": "code", "source": "# Imports\nimport matplotlib.pyplot as plt\nimport numpy\nimport pandas\nimport scipy\nimport sklearn\nimport sklearn.isotonic\nimport sys", "outputs": [], "metadata": {"collapsed": true, "trusted": true}}, {"source": "## Version strings", "cell_type": "markdown", "metadata": {}}, {"execution_count": 2, "cell_type": "code", "source": "print(sys.version)\nprint(sklearn.__version__)", "outputs": [{"output_type": "stream", "name": "stdout", "text": "2.7.3 (default, Mar 13 2014, 11:03:55) \n[GCC 4.7.2]\n0.16.dev\n"}], "metadata": {"collapsed": false, "trusted": true}}, {"source": "## Generate samples with and without ties", "cell_type": "markdown", "metadata": {}}, {"execution_count": 3, "cell_type": "code", "source": "# Sample with x ties\ndata_with_ties = pandas.DataFrame()\ndata_with_ties[\"feature\"] = [0, 0, 1, 2, 3]\ndata_with_ties[\"target\"] = [0.1, 0.05, 0.15, 0.2, 0.35]\n\n# Sample without x ties\ndata_without_ties =
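The notebook preview is truncated above. The sketch below shows one way an isotonic fit can treat tied feature values: average the targets that share an x value, then run pool-adjacent-violators (PAVA) so fitted values never decrease. It is an illustrative assumption, not scikit-learn's implementation, and `isotonic_fit` is a hypothetical name. Because it returns fitted values for the training x directly, fitting and transforming the training data agree by construction.

```python
def isotonic_fit(x, y):
    """PAVA fit of a non-decreasing step function; returns one fitted value per input x."""
    # Pool tied x values first: each distinct x gets the mean of its targets,
    # weighted by how many samples share that x.
    sums, counts = {}, {}
    for xi, yi in zip(x, y):
        sums[xi] = sums.get(xi, 0.0) + yi
        counts[xi] = counts.get(xi, 0) + 1
    xs = sorted(sums)
    # Each block: [weighted mean, total weight, number of distinct x values covered].
    blocks = []
    for xi in xs:
        blocks.append([sums[xi] / counts[xi], counts[xi], 1])
        # Merge backwards while the non-decreasing constraint is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            mean2, w2, n2 = blocks.pop()
            mean1, w1, n1 = blocks.pop()
            weight = w1 + w2
            blocks.append([(mean1 * w1 + mean2 * w2) / weight, weight, n1 + n2])
    # Expand blocks back to one fitted value per distinct x.
    fitted, i = {}, 0
    for mean, _, n in blocks:
        for xi in xs[i:i + n]:
            fitted[xi] = mean
        i += n
    return [fitted[xi] for xi in x]


print(isotonic_fit([0, 0, 1, 2, 3], [0.1, 0.05, 0.15, 0.2, 0.35]))
```

On the tied sample from the test case (`feature = [0, 0, 1, 2, 3]`, `target = [0.1, 0.05, 0.15, 0.2, 0.35]`), both x = 0 rows receive the pooled value 0.075 (up to float rounding), and the remaining targets are already non-decreasing.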