@ssrosa
ssrosa / understanding-word-vectors.ipynb
Created May 14, 2019 15:17 — forked from aparrish/understanding-word-vectors.ipynb
Understanding word vectors: A tutorial for "Reading and Writing Electronic Text," a class I teach at ITP. (Python 2.7) Code examples released under CC0 https://creativecommons.org/choose/zero/, other text released under CC BY 4.0 https://creativecommons.org/licenses/by/4.0/
ssrosa / carrot_data.py
Last active September 19, 2019 22:07
For a Medium post on correlation in Python
import numpy as np
import pandas as pd
#Assumes carrot_lengths and n are already defined (see random_variable.py).
#Toy data: brightness of the color of each carrot from Fancy Farms.
#(Ideal brightness is at the mean.)
brightness = carrot_lengths / (5 + np.random.random(n))
#Extra toy data from the farm: dampness of the soil from which the
#carrot was pulled. 50 is normal dampness; 100 is sodden; 0 is dry.
dampness = np.random.normal(loc=50, scale=25, size=n)
ssrosa / random_variable.py
Last active September 19, 2019 21:50
For a Medium post on correlation in Python
import numpy as np
#Mean: carrot length in cm
mu = 20
#Size of population: number of carrots in market
n = 1000
#Random variable, normally distributed: carrot length
#(scale is not passed, so the standard deviation defaults to 1.0)
carrot_lengths = np.random.normal(loc=mu, size=n)
ssrosa / find_std.py
Last active September 19, 2019 21:50
For a Medium post on correlation in Python
#Function to calculate the (population) standard deviation
def find_std(X):
    mu = X.mean()
    n = len(X)
    #Square root of the mean squared deviation from the mean
    sigma = np.sqrt(
        np.sum(
            (X - mu)**2
        ) / n
    )
    return sigma
ssrosa / find_var.py
Last active September 19, 2019 21:50
For a Medium post on correlation in Python
#Function to calculate the (population) variance
def find_var(X):
    mu = X.mean()
    n = len(X)
    #Mean squared deviation from the mean
    variance = np.sum(
        (X - mu)**2
    ) / n
    return variance
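As a quick sanity check, the population variance computed by find_var should agree with NumPy's built-in np.var, whose default ddof=0 also divides by n. The sketch below restates find_var so that it runs on its own; the seed and the sample parameters are assumptions chosen only for illustration.

```python
import numpy as np

def find_var(X):
    # Population variance: mean squared deviation from the mean
    mu = X.mean()
    n = len(X)
    return np.sum((X - mu)**2) / n

# Illustrative sample (seed, mean and scale are assumptions)
rng = np.random.default_rng(0)
sample = rng.normal(loc=20, scale=3, size=1000)

var_manual = find_var(sample)
var_numpy = np.var(sample)            # ddof=0 by default: divides by n
var_sample = np.var(sample, ddof=1)   # sample variance: divides by n - 1

# The manual result matches np.var, and its square root matches np.std
print(np.isclose(var_manual, var_numpy))
print(np.isclose(np.sqrt(var_manual), np.std(sample)))
```

Passing ddof=1 instead gives the sample variance, which is slightly larger because it divides by n - 1.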
ssrosa / three_dists.py
Created September 19, 2019 21:47
For a Medium post on correlation in Python
#Another plotting library, a little fancier than matplotlib
import seaborn as sns
#Another distribution of carrot lengths
#(The 'scale' is the standard deviation parameter)
virginias_vegetables = np.random.normal(loc = mu, scale = 3, size = n)
#A third distribution of carrot lengths
raouls_roots = np.random.normal(loc = mu, scale = 5, size = n)
ssrosa / find_cov.py
Created September 19, 2019 21:48
For a Medium post on correlation in Python
def find_cov(X, Y):
    #Make sure both distributions have the same population size
    assert len(X) == len(Y), 'Distributions have different sizes.'
    muX = X.mean()
    muY = Y.mean()
    n = len(X)
    #Mean product of the paired deviations from each mean
    covariance = np.sum(
        (X - muX) * (Y - muY)
    ) / n
    return covariance
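The preview of find_cov is cut off, so the sketch below assumes it computes the population covariance (dividing by n, as find_var does) and compares that against np.cov. The seed and the toy data are assumptions made for illustration, not the author's data.

```python
import numpy as np

def find_cov(X, Y):
    # Population covariance: mean product of paired deviations (assumed form)
    assert len(X) == len(Y), 'Distributions have different sizes.'
    n = len(X)
    return np.sum((X - X.mean()) * (Y - Y.mean())) / n

# Illustrative data: Y follows X plus some noise, so covariance is positive
rng = np.random.default_rng(1)
X = rng.integers(1, 50, 1000).astype(float)
Y = X + rng.normal(0, 5, 1000)

cov_manual = find_cov(X, Y)
# np.cov divides by n - 1 by default; bias=True makes it divide by n
cov_numpy = np.cov(X, Y, bias=True)[0, 1]

print(np.isclose(cov_manual, cov_numpy))
# Covariance of a variable with itself is its variance
print(np.isclose(find_cov(X, X), np.var(X)))
```

np.cov returns a 2x2 matrix, so the covariance between X and Y is the off-diagonal entry at [0, 1].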
ssrosa / pos_cov.py
Created September 19, 2019 21:48
For a Medium post on correlation in Python
import numpy as np
import matplotlib.pyplot as plt
#Draw a plot to show positive covariance
#Toy data: 1000 random integers from 1 to 49 (the upper bound, 50, is exclusive)
X = np.random.randint(1, 50, 1000)
#1000 slightly different numbers: X plus random noise
Y = X + (np.random.randint(-15, 15, 1000) * np.random.random(1000))
plt.scatter(X, Y)
plt.title('Positive covariance')
ssrosa / neg_cov.py
Created September 19, 2019 21:49
For a Medium post on correlation in Python
import numpy as np
import matplotlib.pyplot as plt
#Draw a plot to show negative covariance
#Toy data: numbers 1 through 1000
X2 = np.arange(1, 1001, 1)
#The same numbers in reverse order, plus random noise
Y2 = np.flip(X2) + (np.random.randint(-200, 200, 1000) * np.random.random(1000))
plt.scatter(X2, Y2)
plt.title('Negative covariance')
ssrosa / find_corr.py
Created September 19, 2019 21:49
For a Medium post on correlation in Python
def find_corr(X, Y):
    #Correlation: covariance divided by the product of the standard deviations
    correlation = find_cov(X, Y) / (X.std() * Y.std())
    return correlation
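To see find_corr in action, the sketch below compares it with np.corrcoef on noisy, negatively related toy data. It restates find_cov in an assumed form (population covariance, dividing by n) so the example runs on its own; the seed and the noise level are also assumptions.

```python
import numpy as np

def find_cov(X, Y):
    # Population covariance (assumed form: divides by n)
    n = len(X)
    return np.sum((X - X.mean()) * (Y - Y.mean())) / n

def find_corr(X, Y):
    # NumPy's .std() also divides by n, so the scaling is consistent
    return find_cov(X, Y) / (X.std() * Y.std())

# Illustrative data: Y runs opposite to X, with added noise
rng = np.random.default_rng(2)
X = np.arange(1, 1001, dtype=float)
Y = X[::-1] + rng.normal(0, 100, 1000)

r_manual = find_corr(X, Y)
r_numpy = np.corrcoef(X, Y)[0, 1]

print(np.isclose(r_manual, r_numpy))
# A variable is perfectly correlated with itself
print(np.isclose(find_corr(X, X), 1.0))
```

Unlike covariance, the result is unitless and always falls between -1 and 1, which makes relationships comparable across variables with different scales.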