Matt usmcamp0811

@usmcamp0811
usmcamp0811 / BinDataFunc.py
Last active February 3, 2017 17:41
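Function to quickly bin data in a pandas DataFrame
import pandas as pd
import numpy as np

def bin_data(dataframe, field, num_bins):
    # Equal-width bin edges spanning the observed range of the column.
    # Note: num_bins edges produce num_bins - 1 intervals.
    bins = np.linspace(dataframe[field].min(), dataframe[field].max(), num_bins)
    # include_lowest=True keeps the minimum value inside the first bin.
    dataframe[field+'_Bins'] = pd.cut(dataframe[field], bins, include_lowest=True)
    return dataframe

if __name__ == "__main__":
    df = pd.DataFrame(np.random.uniform(0, 100, size=(100, 3)))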
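A minimal usage sketch of the binning idea, assuming named columns and a copy of the frame rather than in-place mutation. The helper name `bin_column` and the column names are hypothetical; it also uses `num_bins + 1` edges so that the result has exactly `num_bins` intervals, which is a choice made here, not something the gist states.

```python
import numpy as np
import pandas as pd

def bin_column(frame, field, num_bins):
    """Hypothetical helper: add a '<field>_Bins' column of equal-width intervals."""
    # num_bins + 1 edges give exactly num_bins intervals.
    edges = np.linspace(frame[field].min(), frame[field].max(), num_bins + 1)
    out = frame.copy()  # leave the caller's frame untouched
    # include_lowest=True keeps the minimum value inside the first interval.
    out[field + "_Bins"] = pd.cut(out[field], edges, include_lowest=True)
    return out

rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.uniform(0, 100, size=(100, 3)), columns=["a", "b", "c"])
binned = bin_column(demo, "a", 5)
print(binned["a_Bins"].value_counts().sort_index())
```

Without `include_lowest=True`, the row holding the column minimum would get `NaN` as its bin, because `pd.cut` intervals are open on the left by default.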
usmcamp0811 / IQR_OutliersFunc.py
Created February 3, 2017 17:32
A function to check whether a field has any outliers using the IQR
import pandas as pd
import numpy as np

def is_outlier(value, p25, p75):
    """Check if value is an outlier
    """
    lower = p25 - 1.5 * (p75 - p25)
    upper = p75 + 1.5 * (p75 - p25)
    return value <= lower or value >= upper
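As a sketch of how the same 1.5 * IQR fences might be applied to a whole column at once, the helper below computes the quartiles with `np.percentile` and returns a boolean mask. The name `iqr_outlier_mask` and the sample values are hypothetical.

```python
import numpy as np
import pandas as pd

def iqr_outlier_mask(series, k=1.5):
    """Hypothetical helper: True where a value lies on or outside the IQR fences."""
    p25, p75 = np.percentile(series, [25, 75])
    iqr = p75 - p25
    lower, upper = p25 - k * iqr, p75 + k * iqr
    # Same comparison as is_outlier, applied element-wise.
    return (series <= lower) | (series >= upper)

values = pd.Series([10, 12, 11, 13, 12, 11, 95])
mask = iqr_outlier_mask(values)
print(values[mask].tolist())  # [95]
print(bool(mask.any()))       # True: the field has at least one outlier
```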
usmcamp0811 / ManyCSV_to_TFRecords.py
Last active May 23, 2019 01:51
Takes many CSVs and converts them to a TFRecord file, then opens the TFRecord file and outputs training data
import tensorflow as tf
import pandas as pd
import numpy as np
from tqdm import tqdm
import os

def _int64_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))
usmcamp0811 / TFRecord_Generator.py
Created February 8, 2017 16:42
TFRecord Generator and Reader
import os
import random
import time

import numpy as np
import pandas as pd
import PIL
import tensorflow as tf
from PIL import Image, ImageDraw, ImageFont, ImageOps
from tqdm import tqdm
'''
This class returns a random window of time series data. It takes a dataframe
that is already sorted in time order, where index 0 is timestep 0
and index N is timestep N. The function window_maker steps incrementally through
the entire dataframe, returning the next window of data. The function random_window_maker
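The preview above is cut off, so the following is only a rough sketch of the windowing behaviour it describes, not the gist's actual class. The class name `WindowSampler`, the one-timestep stride, the wrap-around at the end of the data, and the uniform random start are all assumptions.

```python
import numpy as np
import pandas as pd

class WindowSampler:
    """Hypothetical stand-in for the class described above."""

    def __init__(self, frame, window_size, seed=None):
        self.frame = frame
        self.window_size = window_size
        self.position = 0
        self.rng = np.random.default_rng(seed)

    def window_maker(self):
        """Return the next window, advancing one timestep per call."""
        if self.position + self.window_size > len(self.frame):
            self.position = 0  # assumption: wrap around at the end
        window = self.frame.iloc[self.position:self.position + self.window_size]
        self.position += 1
        return window

    def random_window_maker(self):
        """Return a window starting at a uniformly random timestep."""
        start = self.rng.integers(0, len(self.frame) - self.window_size + 1)
        return self.frame.iloc[start:start + self.window_size]

series = pd.DataFrame({"t": range(10)})
sampler = WindowSampler(series, window_size=4, seed=0)
first, second = sampler.window_maker(), sampler.window_maker()
random_window = sampler.random_window_maker()
```

Both methods return contiguous row slices, so the time ordering inside each window is preserved.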
usmcamp0811 / text_class_encoder
Created February 25, 2017 17:34
This function takes a dataframe with N number of categorical variables and encodes them with the scikit label encoder. It will return the transformed dataframe and a dictionary with a label encoder for each text field so that an inverse transform can be done later.
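import pandas as pd
from sklearn.preprocessing import LabelEncoder

def text_class_encoder(df):
    # Columns with dtype 'object' are treated as text (categorical) fields.
    dtypes = pd.DataFrame(df.dtypes)
    text_cols = list(dtypes[dtypes.iloc[:,0] == 'object'].index)
    label_encoder_dict = {}
    for col in text_cols:
        # Keep one fitted encoder per column so it can be inverted later.
        label_encoder_dict[col] = LabelEncoder()
        label_encoder_dict[col].fit(df[col])
        df[col] = label_encoder_dict[col].transform(df[col])
    return df, label_encoder_dict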
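A small round-trip sketch of the encode-then-invert workflow the description mentions, using scikit-learn's `LabelEncoder` directly. The sample frame and its column names are made up for illustration, and the text column is given an explicit `object` dtype so the dtype check behaves the same across pandas versions.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

frame = pd.DataFrame({
    "color": pd.Series(["red", "blue", "red"], dtype=object),
    "size": [1, 2, 3],
})

# Fit one encoder per text column, keeping each so it can be inverted later.
encoders = {}
encoded = frame.copy()
for col in [c for c in encoded.columns if encoded[c].dtype == object]:
    encoders[col] = LabelEncoder()
    encoded[col] = encoders[col].fit_transform(encoded[col])

# Recover the original labels from the integer codes.
restored = encoders["color"].inverse_transform(encoded["color"])
print(encoded["color"].tolist(), list(restored))
```

`LabelEncoder` assigns codes in sorted label order, so here "blue" maps to 0 and "red" to 1.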
usmcamp0811 / batch_maker.py
Last active March 6, 2017 22:02
A class that takes a dataframe or array and returns batches on each call. It also returns a one-hot encoded target variable if one is provided.
import numpy as np

class batcher(object):
    def __init__(self, data, batch_size, target=None):
        self.data = data
        self.batch_size = batch_size
        self.batch_n = 0
        # Number of full batches; any remainder rows are not counted.
        self.n_batches = data.shape[0] // batch_size
        self.target = target
        # `is not None` avoids an element-wise comparison when target is an array.
        if target is not None:
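Since the class body is truncated in this listing, here is one way the batching and one-hot encoding could look, written as a generator over NumPy arrays. The names `one_hot` and `iter_batches` are hypothetical, and the sketch assumes integer class labels starting at 0 and drops any rows that do not fill a complete batch.

```python
import numpy as np

def one_hot(labels, n_classes):
    """Map integer class labels to one-hot rows."""
    return np.eye(n_classes)[labels]

def iter_batches(data, batch_size, target=None):
    """Hypothetical generator: yield full batches, dropping any remainder."""
    n_batches = data.shape[0] // batch_size
    encoded = None
    if target is not None:
        encoded = one_hot(target, int(target.max()) + 1)
    for i in range(n_batches):
        rows = slice(i * batch_size, (i + 1) * batch_size)
        if encoded is None:
            yield data[rows]
        else:
            yield data[rows], encoded[rows]

features = np.arange(20).reshape(10, 2)
labels = np.array([0, 1, 2, 0, 1, 2, 0, 1, 2, 0])
batches = list(iter_batches(features, batch_size=4, target=labels))
```

With 10 rows and a batch size of 4, this yields two batches and leaves the last two rows unused.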
usmcamp0811 / mat2py.py
Created March 10, 2017 17:29
First... MATLAB sucks! Second... This is some code I found on Stack Overflow that converts .mat files to Python dictionaries.
import scipy.io as spio

def loadmat(filename):
    '''
    This function should be called instead of calling spio.loadmat directly,
    as it cures the problem of not properly recovering Python dictionaries
    from mat files. It calls the function check keys to cure all entries
    which are still mat-objects.
    '''
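The function body is not shown in this listing. A minimal sketch of the approach the docstring describes could look like the following: load with `struct_as_record=False, squeeze_me=True`, then recursively turn any remaining mat-struct objects into dicts. The helper names `_to_dict` and `load_mat_as_dict` are hypothetical, and detecting structs by their `_fieldnames` attribute is a choice made here so that no version-specific SciPy import is needed.

```python
import os
import tempfile

import scipy.io as spio

def _to_dict(obj):
    """Recursively convert scipy mat-struct objects into plain dicts."""
    # Duck-typing on `_fieldnames` avoids importing the struct class, whose
    # module path differs between SciPy versions.
    if hasattr(obj, "_fieldnames"):
        return {name: _to_dict(getattr(obj, name)) for name in obj._fieldnames}
    return obj

def load_mat_as_dict(filename):
    """Hypothetical wrapper around spio.loadmat returning nested dicts."""
    raw = spio.loadmat(filename, struct_as_record=False, squeeze_me=True)
    return {key: _to_dict(value) for key, value in raw.items()}

# Round trip: write a nested struct to a temporary file, then load it back.
path = os.path.join(tempfile.mkdtemp(), "demo.mat")
spio.savemat(path, {"cfg": {"rate": 0.5, "inner": {"steps": 3}}})
loaded = load_mat_as_dict(path)
print(loaded["cfg"]["inner"]["steps"])
```

This sketch does not descend into cell arrays or arrays of structs, which a fuller version would also need to handle.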
import numpy as np
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
from sklearn import datasets
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import StratifiedKFold
#%matplotlib qt
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE
usmcamp0811 / plot_latent_space
Created April 26, 2017 15:08
Simple function for plotting the latent space of a VAE. It could be used for anything, because it includes a t-SNE step that reduces dimensionality to a size suitable for plotting.
from sklearn.manifold import TSNE
from mpl_toolkits.mplot3d import Axes3D
%matplotlib qt
from IPython import display
import matplotlib.cm as cmx
import matplotlib.colors as colors

def get_cmap(N):
    '''Returns a function that maps each index in 0, 1, ... N-1 to a distinct
    RGB color.'''
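The body of `get_cmap` is cut off here. One plausible completion, assuming the common recipe of normalizing the index range onto matplotlib's cyclic `hsv` colormap, is sketched below. The choice of `vmax=N` and the variable names are assumptions, and the returned colors are RGBA tuples rather than plain RGB.

```python
import matplotlib.cm as cmx
import matplotlib.colors as colors

def get_cmap(N):
    """Return a function mapping each index in 0..N-1 to a distinct RGBA color."""
    # Normalizing to [0, N] rather than [0, N-1] keeps the last index away
    # from the end of the cyclic 'hsv' map, where the color wraps back to red.
    norm = colors.Normalize(vmin=0, vmax=N)
    scalar_map = cmx.ScalarMappable(norm=norm, cmap="hsv")
    return scalar_map.to_rgba

color_for = get_cmap(5)
palette = [color_for(i) for i in range(5)]
```

Each entry of `palette` can be passed as the `color` argument when scattering one class of latent points.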