Welcome to GA's Data Science Immersive! Before you start class, you'll need to download and install a few tools. Follow this guide to get your computer all set up, and let us know if you have any questions.
from statsmodels.stats.power import tt_ind_solve_power
from scipy.interpolate import interp1d
import matplotlib.pyplot as plt

def test_ttest_power_diff(mean, std, sample1_size=None, alpha=0.05, desired_power=0.8, mean_diff_percentages=[0.1, 0.05]):
    '''
    calculates the power function for a given mean and std. the function plots a graph showing the comparison between desired mean differences
    :param mean: the desired mean
    :param std: the std value
    :param sample1_size: if None, it is assumed that both samples (first and second) will have same size. The function then will
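The preview above is cut off, so the following is only a rough cross-check of the kind of calculation it sets up, not the author's function. It swaps in the standard normal approximation for a two-sided, two-sample test instead of calling `tt_ind_solve_power`, so it needs nothing beyond the standard library and tends to come out slightly below the exact t-based answer. The function name `approx_samples_per_group` and the example values (`mean=100`, `std=20`) are assumptions made for illustration.

```python
from math import ceil
from statistics import NormalDist


def approx_samples_per_group(mean, std, diff_pct, alpha=0.05, power=0.8):
    """Approximate per-group sample size via the normal approximation.

    Hypothetical helper, not part of statsmodels: it assumes equal group
    sizes, a two-sided test, and a shared standard deviation.
    """
    # Standardized effect size (Cohen's d) for a relative change in the mean.
    effect_size = (mean * diff_pct) / std
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Illustrative values only: compare a 10% and a 5% shift in the mean.
for pct in (0.1, 0.05):
    print(pct, approx_samples_per_group(mean=100, std=20, diff_pct=pct))
```

Halving the detectable difference roughly quadruples the required sample size, which is the trade-off a power plot over several mean differences makes visible.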
from math import pi
import pandas as pd
# mpl_toolkits.axes_grid is deprecated; inset_axes lives in axes_grid1
from mpl_toolkits.axes_grid1.inset_locator import inset_axes

# Set data
df = pd.DataFrame({
    # 'group': ['A','B','C','D'],
    'var1': [38, 1.5, 30, 4],
    'var2': [29, 10, 9, 34],
    'var3': [8, 39, 23, 24],
    'var4': [7, 31, 33, 14]
# fighting == most common event type
def build_udf_prototype(event_types):
    null = "null"  # default all types to null in the UDF function
    PIVOT_FEATURES = str({"col_" + event_name.replace("-", "_"): null for event_name in event_types.tolist()}).replace("'null'", "null")
    SQL_RETURN = "STRUCT<"
    for event_type in event_types.tolist():
        event_type = event_type.replace("-", "_")
        SQL_RETURN += f"col_{event_type} INT64, "
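The preview stops before the `STRUCT<` string is closed, so how the original finishes it is unknown. As a minimal sketch of the same idea, one way to build such a return type is to join the field declarations, which avoids having to trim a trailing comma. The helper name `build_struct_type`, the closing `>`, and the sample event name `power-up` are assumptions for illustration; a plain list stands in for the object the original calls `.tolist()` on.

```python
def build_struct_type(event_types):
    """Build a STRUCT<...> type string with one INT64 field per event type.

    Hypothetical helper: mirrors the col_ prefix and the dash-to-underscore
    replacement seen in the preview, but is not the author's function.
    """
    fields = [f"col_{name.replace('-', '_')} INT64" for name in event_types]
    # Joining the fields means no trailing ", " needs to be stripped afterwards.
    return "STRUCT<" + ", ".join(fields) + ">"


print(build_struct_type(["fighting", "power-up"]))
```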
This is a basic class that makes it convenient to parse notebooks. For a personal project, I built a larger version of it that clustered documents to create semantic indices linking related content together. You can use this class to parse notebooks for tasks such as NLP or preprocessing.
parser = ParseJupyter("./Untitled.ipynb")
parser.get_cells(source_only=True, source_as_string=True)
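The `ParseJupyter` class itself is not shown here, so the following is only a sketch of what pulling cell sources out of a notebook can look like, not the author's implementation. It relies on the notebook being a JSON file with a top-level `cells` list, where each cell's `source` is either a string or a list of strings. The function name `read_cell_sources` and its `cell_type` parameter are assumptions for illustration.

```python
import json


def read_cell_sources(path, cell_type=None):
    """Return the source text of each cell in a notebook file.

    Hypothetical helper: pass cell_type="code" or "markdown" to keep only
    cells of that kind, or leave it as None to keep every cell.
    """
    with open(path, encoding="utf-8") as fh:
        notebook = json.load(fh)

    sources = []
    for cell in notebook.get("cells", []):
        if cell_type is not None and cell.get("cell_type") != cell_type:
            continue
        source = cell.get("source", "")
        # The notebook format allows source to be a list of lines or one string.
        if isinstance(source, list):
            source = "".join(source)
        sources.append(source)
    return sources
```

Calling `read_cell_sources("./Untitled.ipynb", cell_type="markdown")` would then give a list of strings ready for tokenizing or other preprocessing.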
import tweepy
import wget
import os

oauth = {
    "consumer_key": "",
    "consumer_secret": ""
}
access = {
'''Example script to generate text from Nietzsche's writings.
At least 20 epochs are required before the generated text
starts sounding coherent.
It is recommended to run this script on GPU, as recurrent
networks are quite computationally intensive.
If you try this script on new data, make sure your corpus
has at least ~100k characters. ~1M is better.
'''
from __future__ import print_function