@seanickle
seanickle / quick pandas dataframe scrub code.md
Last active December 4, 2017 18:58
quick pandas dataframe scrub code

Given a DataFrame, scrub selected columns.

  • Two options:
    • fill the column with random data, or
    • replace values with random-looking data derived by hashing them, so that repeated values are replaced consistently
import uuid
import pandas as pd

def make_random_phone_dict(phones_list):
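A minimal sketch of both options, assuming the column holds strings or numbers. `random_scrub` and `hash_scrub` are hypothetical helpers, not a reconstruction of `make_random_phone_dict`. The keyed HMAC is a swapped-in choice: with a secret key, short and guessable values such as phone numbers cannot be recovered by simply hashing candidate numbers.

```python
import hashlib
import hmac
import uuid

import pandas as pd


def random_scrub(series):
    """Option 1: replace every value with an independent random token.

    Repeated inputs get different outputs, so joins/group-bys on the column break.
    """
    return pd.Series([uuid.uuid4().hex[:12] for _ in range(len(series))],
                     index=series.index)


def hash_scrub(series, key, length=12):
    """Option 2: replace each value with a keyed hash (HMAC-SHA256).

    Equal inputs always map to equal outputs, so repeats stay consistent.
    """
    key_bytes = key.encode("utf-8")

    def _scrub(value):
        digest = hmac.new(key_bytes, str(value).encode("utf-8"), hashlib.sha256)
        return digest.hexdigest()[:length]

    return series.map(_scrub)


df = pd.DataFrame({"phone": ["555-0100", "555-0199", "555-0100"]})  # toy data
df["phone_random"] = random_scrub(df["phone"])
df["phone_hashed"] = hash_scrub(df["phone"], key="keep-this-secret")  # hypothetical key
print(df)
```

Truncating the digest to 12 hex characters keeps the output short; on very large columns, keep more characters if distinct inputs must stay distinct.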
seanickle / quick nltk tools.md
Last active August 24, 2017 17:06
quick nltk tools

Basic string matching tools

import nltk
from copy import deepcopy

def best_match_terms(from_pair, end_list, cutoff=10, include_distances=False):
    '''
    from_pair: [from_identifier, from_term] 
    '''
    [from_identifier, from_term] = from_pair
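As a hedged sketch of this kind of string matching, the code below ranks candidate terms by Levenshtein edit distance. `levenshtein` and `rank_matches` are hypothetical names, and treating the cutoff as a maximum distance (`max_distance`) is an assumption about what `cutoff` means. A local edit-distance function keeps the example dependency-free; with NLTK installed, `nltk.edit_distance(s1, s2)` computes the same default Levenshtein distance.

```python
def levenshtein(a, b):
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def rank_matches(from_term, end_list, max_distance=10, include_distances=False):
    """Return terms from end_list within max_distance edits, closest first."""
    scored = sorted((levenshtein(from_term, term), term) for term in end_list)
    kept = [(dist, term) for dist, term in scored if dist <= max_distance]
    return kept if include_distances else [term for _, term in kept]


print(rank_matches("color", ["colour", "cooler", "banana"], max_distance=2))
```

Sorting (distance, term) tuples breaks ties alphabetically, which keeps the ranking deterministic.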
seanickle / ab testing.md
Last active August 13, 2017 23:17
A/B testing

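One-tailed vs. two-tailed tests

How large does the sample size need to be for a test?

  • It depends mainly on the required confidence level, that is, how strong the evidence must be to call a result statistically significant, together with the baseline conversion rate and the smallest lift worth detecting.
  • Per [1], the confidence requirement is phrased as: what false-positive rate are we willing to accept?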
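To make that dependence concrete, here is a sketch of the standard normal-approximation sample-size formula for comparing two conversion rates. `alpha` is the accepted false-positive rate and `power` is 1 minus the accepted false-negative rate; a one-tailed test puts all of `alpha` in one tail, so it needs fewer samples. `sample_size_per_arm` is a hypothetical helper and not necessarily the method used in [1].

```python
import math
from statistics import NormalDist


def sample_size_per_arm(p_baseline, p_variant, alpha=0.05, power=0.80, two_tailed=True):
    """Approximate users needed in EACH arm of a two-proportion z-test.

    alpha: accepted false-positive rate; power: 1 - accepted false-negative rate.
    """
    z = NormalDist().inv_cdf  # standard normal quantile function
    # A two-tailed test splits alpha across both tails, so its critical value is larger.
    z_alpha = z(1 - alpha / 2) if two_tailed else z(1 - alpha)
    z_beta = z(power)
    p_pooled = (p_baseline + p_variant) / 2
    spread = (z_alpha * math.sqrt(2 * p_pooled * (1 - p_pooled))
              + z_beta * math.sqrt(p_baseline * (1 - p_baseline)
                                   + p_variant * (1 - p_variant)))
    return math.ceil(spread ** 2 / (p_variant - p_baseline) ** 2)


print(sample_size_per_arm(0.10, 0.12))                    # two-tailed
print(sample_size_per_arm(0.10, 0.12, two_tailed=False))  # one-tailed
```

With a 10% baseline and a hoped-for lift to 12%, this gives roughly 3,800 users per arm two-tailed versus roughly 3,000 one-tailed; halving the detectable lift roughly quadruples the requirement.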

Baseline conversion rate

seanickle / gist:add41f95571225a20af4f18d62847909
Created July 19, 2017 16:58
graphviz dot with images for nodes
```
digraph structs {
node [shape=plaintext];
struct1 [label=<<TABLE>
<TR><TD><IMG SRC="jpegs/Screen Shot 2017-07-19 at 11.08.56 AM.jpg"/></TD></TR>
<TR><TD>caption1 </TD></TR>
seanickle / useful image conversion.md
Created July 19, 2017 16:56
useful image conversion
seanickle / .vimrc
Last active October 26, 2020 17:53
my vimrc so far
set ignorecase
syntax on
set number
set softtabstop=4
set tabstop=4
set shiftwidth=4
set expandtab
seanickle / multiprocessing foo.py
Created June 21, 2017 23:11
multiprocessing foo
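# from multiprocessing import Process
# from multiprocessing import Pool
from multiprocessing import Process, Pipe
try:
    import cPickle  # Python 2
except ImportError:
    import pickle as cPickle  # Python 3: pickle uses its C implementation automatically

def foo(multiproc, chunk_ids, feature_names, child_conn):
    # Send the chunk ids and feature names back to the parent process through the pipe.
    child_conn.send([chunk_ids, feature_names])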
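A self-contained sketch of the Process + Pipe pattern this snippet starts: each child works on one chunk and sends its result back through its end of a Pipe, and the parent receives before joining. `worker` and `run_chunks` are hypothetical names, the squaring is a placeholder computation, and the "fork" start method is an assumption (POSIX only) that lets children inherit the module's functions without re-importing it.

```python
import multiprocessing as mp


def worker(chunk_ids, child_conn):
    """Hypothetical worker: compute something per id and send it to the parent."""
    result = {i: i * i for i in chunk_ids}  # placeholder computation
    child_conn.send(result)                 # the payload is pickled automatically
    child_conn.close()


def run_chunks(chunks):
    # "fork" is assumed (POSIX only): children inherit this module's functions.
    ctx = mp.get_context("fork")
    jobs = []
    for chunk in chunks:
        parent_conn, child_conn = ctx.Pipe()
        proc = ctx.Process(target=worker, args=(chunk, child_conn))
        proc.start()
        jobs.append((parent_conn, proc))

    merged = {}
    for parent_conn, proc in jobs:
        # Receive before join: a child blocked sending on a full pipe would never exit.
        merged.update(parent_conn.recv())
        proc.join()
    return merged


if __name__ == "__main__":
    print(run_chunks([[1, 2], [3, 4]]))
```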
seanickle / migrations per tag.md
Created June 8, 2017 00:21
git and django migration foo
  • foo
last_n=15
# tags=($(git tag | egrep '^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$'))
tags=($(git tag --sort version:refname | egrep '^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$'))
begin=`expr ${#tags[@]} - ${last_n}`
subset_tags=(${tags[@]:${begin}})
# for (( i=`expr ${#subset_tags[@]} - 1`; i>=0 ; i-=1 )) ;   # reverse
for (( i=0; i< `expr ${#subset_tags[@]} - 1` ; i+=1 )) ; # forwards
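A Python sketch of the tag selection the script performs: keep tags of the form X.Y.Z, sort them numerically (as `git tag --sort version:refname` does), take the last `last_n`, and pair each with the next one. Pairing consecutive tags is an assumption about what the forwards loop does with index `i`, and `last_n_release_pairs` is a hypothetical name.

```python
import re

# Same pattern as the egrep filter: three dot-separated groups of 1-3 digits.
RELEASE_TAG = re.compile(r"[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}")


def last_n_release_pairs(tags, last_n=15):
    """Return consecutive (older, newer) pairs among the last_n release tags."""
    releases = [t for t in tags if RELEASE_TAG.fullmatch(t)]
    # Sort numerically so 1.10.0 comes after 1.9.1, as version:refname does.
    releases.sort(key=lambda t: tuple(int(part) for part in t.split(".")))
    subset = releases[-last_n:]
    return list(zip(subset, subset[1:]))


tags = ["1.2.0", "1.10.0", "1.9.1", "v2.0", "1.2.10", "1.2.9"]  # toy tag list
for older, newer in last_n_release_pairs(tags, last_n=3):
    print(older, "->", newer)
```

In a real repository the tag list could come from `subprocess.run(["git", "tag"], capture_output=True, text=True).stdout.split()`.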
seanickle / what.py
Created June 2, 2017 20:07
numpy bool
* numpy bool
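A short sketch of the pitfall, assuming column e has dtype bool (which pandas infers when a column holds only True/False): scalars read from it are NumPy booleans, not Python's `bool`, so identity checks such as `x is True` fail even though the value prints as True. The toy frame is a stand-in for the session's `outdf`, and `.loc` replaces `.ix`, which was removed in pandas 1.0.

```python
import numpy as np
import pandas as pd

outdf = pd.DataFrame({"e": [True, False, False]})  # toy stand-in for the session's frame
value = outdf.loc[0, "e"]  # .loc is the label-based replacement for .ix

print(type(value))          # a NumPy boolean scalar type, not bool
print(value is True)        # False: it is not the Python True singleton
print(value == True)        # equality still compares as expected
print(bool(value) is True)  # True once explicitly converted
```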
```python
ipdb> pp outdf
      a  b        c    d      e
0  None  a       []  foo   True
1   3.4  b   [asdf]   no  False
2    34  0  [{}, 0]  meh  False
...
...
ipdb> pp type(outdf.ix[0,'e'])
seanickle / django queryset query notes.md
Created April 6, 2017 18:44
django queryset query notes
  • Use the __iregex lookup for case-insensitive, regex-based filtering.
In [713]: BlahModel.objects.filter(phone__iregex=r"\d\d\d\d\d\d\d\d\d\d\d\d").values_list('phone')
Out[713]: [(u'51615415135135',), (u'3424234234234',), (u'0971557335940',), (u' 923006306666',), (u' 923006306666',), (u' 923006306666',), (u'491725152799',), (u'4921195799170',), (u'527821398854',), (u'860290084656',)]
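The regex lookup is unanchored: a value matches if any part of it matches, which is why the output above includes 13- and 14-digit numbers and values with a leading space. Python's `re.search` is a quick way to check a pattern before using it in a filter, assuming the database's regex dialect treats this simple pattern the same way (the session above suggests the backend supports `\d`). Anchoring with `^` and `$` restricts matches to values of exactly 12 digits.

```python
import re

# Values copied from the query output above, plus a short toy value.
phones = ["51615415135135", "3424234234234", " 923006306666", "527821398854", "12345"]

# Unanchored, like the __iregex filter: any run of 12 digits anywhere matches.
loose = [p for p in phones if re.search(r"\d{12}", p, re.IGNORECASE)]

# Anchored: the whole value must be exactly 12 digits.
# ORM equivalent: BlahModel.objects.filter(phone__iregex=r"^\d{12}$")
strict = [p for p in phones if re.search(r"^\d{12}$", p, re.IGNORECASE)]

print(loose)   # every value except "12345"
print(strict)  # only "527821398854"
```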