David Yerrington dyerrington

## dsi_student_install_guide.md

      
dyerrington / dsi_student_install_guide.md: 1 file, 2 forks, 0 comments, 1 star. Last active December 31, 2019 00:12.

Data Science Immersive "Installfest"


DSI Computer Setup
Anaconda + Python Configuration
Additional Software


DSI Computer Setup

Welcome to GA's Data Science Immersive! Before you start class, you'll need to download and install a few tools. Follow this guide to get your computer all set up, and let us know if you have any questions.

  
## test_ttest_power_diff.py
from statsmodels.stats.power import tt_ind_solve_power
from scipy.interpolate import interp1d
import matplotlib.pyplot as plt

def test_ttest_power_diff(mean, std, sample1_size=None, alpha=0.05, desired_power=0.8, mean_diff_percentages=[0.1, 0.05]):
    '''
    Calculates the power function for a given mean and std. The function plots a graph showing the comparison between the desired mean differences.
    :param mean: the desired mean
    :param std: the std value
    :param sample1_size: if None, it is assumed that both samples (first and second) will have same size. The function then will

## polar_plot.py
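from math import pi

import pandas as pd  # needed for pd.DataFrame below
# mpl_toolkits.axes_grid is deprecated; axes_grid1 provides the same inset_axes helper
from mpl_toolkits.axes_grid1.inset_locator import inset_axes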

# Set data
df = pd.DataFrame({
    # 'group': ['A','B','C','D'],
    'var1': [38, 1.5, 30, 4],
    'var2': [29, 10, 9, 34],
    'var3': [8, 39, 23, 24],
    'var4': [7, 31, 33, 14]

## generate_udf_js_big_query.py
# fighting == most common event type

def build_udf_prototype(event_types):

    null = "null" # default all types to null in the UDF function
    PIVOT_FEATURES = str({"col_" + event_name.replace("-", "_"): null for event_name in event_types.tolist()}).replace("'null'", "null")
    SQL_RETURN = "STRUCT<"
    for event_type in event_types.tolist():
        event_type = event_type.replace("-", "_")
        SQL_RETURN += f"col_{event_type} INT64, "

## sf_slicing_apply_map.ipynb

      
dyerrington / sf_slicing_apply_map.ipynb: 1 file, 3 forks, 0 comments, 0 stars. Created March 9, 2019 01:04.
    
## readme.md

      
dyerrington / readme.md: 2 files, 2 forks, 0 comments, 1 star. Last active January 9, 2019 20:59.

This is a very basic data generator for testing recommender systems. A future version may simulate the actual sparseness of ratings data with a simple bootstrap function, but for now a NumPy generator does the job.

RecData


To use this snippet, install faker:
pip install faker
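For a feel of what such a generator produces, here is a minimal sketch. It is not the gist's code: it uses only the standard library's `random` module as a stand-in for the NumPy generator mentioned above, identifies users and items by plain integers where faker would supply realistic names, and the function name `generate_ratings` and its `density` parameter are assumptions made for illustration.

```python
import random


def generate_ratings(n_users=5, n_items=8, density=0.3, seed=42):
    """Return a list of (user_id, item_id, rating) tuples.

    `density` is the fraction of user/item pairs that receive a rating,
    a crude, hypothetical stand-in for the sparseness of real ratings data.
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same data
    ratings = []
    for user_id in range(n_users):
        for item_id in range(n_items):
            # Keep only a random subset of pairs so the matrix is sparse
            if rng.random() < density:
                ratings.append((user_id, item_id, rng.randint(1, 5)))
    return ratings


ratings = generate_ratings()
for user_id, item_id, rating in ratings[:5]:
    print(f"user {user_id} rated item {item_id}: {rating}")
```

Lowering `density` yields a sparser set of ratings, which is closer to what a recommender system usually sees in practice.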


## parse_jupyter.md

      
dyerrington / parse_jupyter.md: 2 files, 1 fork, 0 comments, 1 star. Created October 16, 2018 19:55.

Parse Jupyter

This is a basic class that makes it convenient to parse notebooks. I built a larger version of it for a personal project, where it was used to cluster documents and create semantic indices that linked related content together. You can use it to parse notebooks for tasks like NLP or preprocessing.
Usage

parser = ParseJupyter("./Untitled.ipynb")
parser.get_cells(source_only=True, source_as_string=True)
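The class itself is not shown on this page, so the following is only a sketch of how a class with that interface could work, not the author's implementation. It assumes a version 4 notebook, which is a JSON file with a top-level `cells` list where each cell stores its `source` as a string or a list of strings. The meaning given to `source_only` and `source_as_string` is inferred from their names.

```python
import json
import os
import tempfile


class ParseJupyter:
    """Hypothetical minimal stand-in for the gist's class."""

    def __init__(self, path):
        # A .ipynb file is plain JSON, so the standard library can read it
        with open(path, encoding="utf-8") as f:
            self.notebook = json.load(f)

    def get_cells(self, source_only=False, source_as_string=False):
        cells = self.notebook.get("cells", [])
        if not source_only:
            return cells  # full cell dicts, including type and metadata
        sources = [cell.get("source", []) for cell in cells]
        if source_as_string:
            # Cell source may be a list of lines; join it into one string
            sources = ["".join(s) if isinstance(s, list) else s for s in sources]
        return sources


# Demo: write a tiny two-cell notebook to a temporary file and parse it
demo_notebook = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title\n", "Some text"]},
        {"cell_type": "code", "metadata": {}, "source": ["print('hi')"],
         "outputs": [], "execution_count": None},
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "Untitled.ipynb")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(demo_notebook, f)
    parser = ParseJupyter(path)
    sources = parser.get_cells(source_only=True, source_as_string=True)

print(sources)
```

Returning plain strings keeps the result ready for tokenizers or vectorizers, which fits the NLP and preprocessing uses mentioned above.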


## machine_learning_flashcards.py
import tweepy
import wget
import os

oauth = {
    "consumer_key":        "",
    "consumer_secret":     ""
}

access = {

## sf_review.ipynb

      
dyerrington / sf_review.ipynb: 1 file, 0 forks, 0 comments, 0 stars. Created September 17, 2018 01:36.
    
## my_little_pony_lstm.py
'''Example script to generate text from Nietzsche's writings.
At least 20 epochs are required before the generated text
starts sounding coherent.
It is recommended to run this script on GPU, as recurrent
networks are quite computationally intensive.
If you try this script on new data, make sure your corpus
has at least ~100k characters. ~1M is better.
'''

from __future__ import print_function