David Yerrington (dyerrington)

@dyerrington
dyerrington / my_little_pony_lstm.py
Created July 16, 2018 05:57
As a point of comparison with the default Nietzsche example from the Keras repo, this little experiment swaps out the dataset for forum comments from the My Little Pony subreddit.
'''Example script to generate text from Nietzsche's writings.
At least 20 epochs are required before the generated text
starts sounding coherent.
It is recommended to run this script on GPU, as recurrent
networks are quite computationally intensive.
If you try this script on new data, make sure your corpus
has at least ~100k characters. ~1M is better.
'''
from __future__ import print_function
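
The preview above keeps the docstring from the stock Keras example. For context, a minimal sketch of the character-level LSTM setup that example uses, with the corpus swapped for the subreddit comments, might look like the following (the filename mlp_comments.txt is a placeholder, not the gist's actual data path):

import numpy as np
from keras.models import Sequential
from keras.layers import Dense, LSTM

# read the corpus and index its characters
with open("mlp_comments.txt") as f:  # placeholder path for the subreddit dump
    text = f.read().lower()
chars = sorted(set(text))
char_indices = {c: i for i, c in enumerate(chars)}

# cut the text into overlapping 40-character windows, each predicting the next character
maxlen, step = 40, 3
sentences, next_chars = [], []
for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i:i + maxlen])
    next_chars.append(text[i + maxlen])

# one-hot encode the windows and their target characters
x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
y = np.zeros((len(sentences), len(chars)), dtype=bool)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t, char_indices[char]] = 1
    y[i, char_indices[next_chars[i]]] = 1

# single-layer LSTM that predicts the next character's distribution
model = Sequential()
model.add(LSTM(128, input_shape=(maxlen, len(chars))))
model.add(Dense(len(chars), activation="softmax"))
model.compile(loss="categorical_crossentropy", optimizer="rmsprop")
model.fit(x, y, batch_size=128, epochs=20)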
@dyerrington
dyerrington / custom_vectorizer.py
Created July 12, 2018 00:32
Remove "n-grams" first, before stopwords with this handy class that extends the functionality of scikit-learn's CountVectorizer. Substitute the class extension for other types of vectorizers such as TfIDF in the class definition at the top.
# defines a custom vectorizer class
class CustomVectorizer(CountVectorizer):
    stop_grams = []
    def __init__(self, stop_grams=[], **opts):
        self.stop_grams = stop_grams
        super().__init__(**opts)
    def remove_ngrams(self, doc):
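
The preview cuts off at remove_ngrams. A self-contained sketch of how the full class might look and be used is below, assuming remove_ngrams strips each stop phrase by plain string replacement and is hooked into the pipeline through build_preprocessor (both assumptions, not necessarily the gist's exact implementation):

from sklearn.feature_extraction.text import CountVectorizer

class CustomVectorizer(CountVectorizer):
    """CountVectorizer that strips whole phrases before tokenization."""
    def __init__(self, stop_grams=None, **opts):
        self.stop_grams = stop_grams or []
        super().__init__(**opts)

    def remove_ngrams(self, doc):
        # drop each stop phrase from the raw text so it never reaches the tokenizer
        for stop_gram in self.stop_grams:
            doc = doc.replace(stop_gram, "")
        return doc

    def build_preprocessor(self):
        # run the stock preprocessor (lowercasing, etc.), then the phrase removal
        preprocess = super().build_preprocessor()
        return lambda doc: self.remove_ngrams(preprocess(doc))

# usage: "general assembly" is removed as a unit before single-word stopwords apply
docs = ["I studied data science at General Assembly in 2017."]
vectorizer = CustomVectorizer(stop_grams=["general assembly"], stop_words="english")
print(vectorizer.fit_transform(docs).toarray())
print(sorted(vectorizer.vocabulary_))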
@dyerrington
dyerrington / readme.md
Last active January 9, 2019 20:59
This is a very basic data generator for testing recommender systems. A future version may simulate the actual sparseness of ratings data with a simple bootstrap function, but for now the numpy generator does the job.

RecData

To use this snippet, install faker:

pip install faker
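
A minimal sketch of the kind of generator the gist describes is below. The class name RecData matches the readme, but the column layout, the 1-5 rating scale, and the use of faker names for user IDs are assumptions rather than the gist's exact code:

import numpy as np
import pandas as pd
from faker import Faker

class RecData:
    """Generate a dense fake user x item ratings matrix for recommender tests."""
    def __init__(self, n_users=100, n_items=50, seed=42):
        self.fake = Faker()
        self.rng = np.random.default_rng(seed)
        self.n_users, self.n_items = n_users, n_items

    def generate(self):
        # fake user names as the index, integer ratings 1-5 as the values
        users = [self.fake.name() for _ in range(self.n_users)]
        items = [f"item_{i}" for i in range(self.n_items)]
        ratings = self.rng.integers(1, 6, size=(self.n_users, self.n_items))
        return pd.DataFrame(ratings, index=users, columns=items)

ratings = RecData(n_users=10, n_items=5).generate()
print(ratings.head())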
@dyerrington
dyerrington / environment.yml
Last active April 19, 2018 01:52
This assumes you've created an environment called "dsi". To use this file, simply click on "raw" then download the contents to a new file on your local system called "environment".
name: dsi
channels:
- conda-forge
- defaults
dependencies:
- appnope=0.1.0=py36_0
- asn1crypto=0.22.0=py36_0
- attrs=17.2.0=py_1
- automat=0.6.0=py36_0
- backports=1.0=py36_1
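
Once the file is saved locally as environment.yml, the existing "dsi" environment can be updated from it with conda (a hedged example; adjust the file name and environment name as needed):

conda env update -n dsi -f environment.yml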
@dyerrington
dyerrington / environment.yml
Created April 19, 2018 01:07
This assumes you've created an environment called "dsi", like this:
name: dsi
channels:
- conda-forge
- defaults
dependencies:
- asn1crypto=0.22.0=py36_0
- beautifulsoup4=4.5.3=py36_0
- blas=1.1=openblas
- bleach=2.0.0=py36_0
- bokeh=0.12.9=py36_0
# ipywidgets sliders for an interactive two-group sampling demo
from ipywidgets import interact, interactive, fixed, interact_manual
from IPython.display import clear_output
import ipywidgets as widgets

# sliders controlling the first group's sampling parameters
g1_mean = widgets.IntSlider(description="G1 Mean", min=50, max=250, step=1, value=50)
g1_std = widgets.IntSlider(description="G1 STD", min=1, max=50, step=1, value=3)
g1_sample_size = widgets.IntSlider(description="G1 Size", min=50, max=500, step=10, value=10)
g1_items = [g1_mean, g1_std, g1_sample_size]

# sliders controlling the second group's sampling parameters
g2_mean = widgets.IntSlider(description="G2 Mean", min=50, max=250, step=1, value=60)
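
The preview stops mid-setup. A plausible way these sliders might be wired together with interact is sketched below; the plotting function and the two-group comparison are assumptions about the rest of the notebook, not the gist's actual code:

import numpy as np
import matplotlib.pyplot as plt
from ipywidgets import interact

def compare_groups(g1_mean, g1_std, g1_size, g2_mean):
    # draw two normal samples and overlay their histograms
    g1 = np.random.normal(g1_mean, g1_std, g1_size)
    g2 = np.random.normal(g2_mean, g1_std, g1_size)
    plt.hist(g1, alpha=0.5, label="Group 1")
    plt.hist(g2, alpha=0.5, label="Group 2")
    plt.legend()
    plt.show()

# reuses the slider widgets defined above as the controls for each argument
interact(compare_groups, g1_mean=g1_mean, g1_std=g1_std,
         g1_size=g1_sample_size, g2_mean=g2_mean)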