@dyerrington
Created February 6, 2020 23:41
Code Pearson correlation coefficient from scratch
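import math

# `input` is assumed to be a multi-line string defined earlier (it shadows the builtin).
# It must hold exactly two non-empty rows: two leading tokens, then integer scores.
lines = [line.split() for line in input.split("\n") if len(line)]
# Skip the two leading tokens of each row and parse the remaining tokens as scores.
X, y = [[int(score) for score in scores] for variable, _, *scores in lines]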
n = len(X)
sum_X, sum_y = sum(X), sum(y)
sum_Xy = sum(a * b for a, b in zip(X, y))
sum_X2 = sum(a ** 2 for a in X)
sum_y2 = sum(b ** 2 for b in y)
# Use the computational formula for the Pearson correlation coefficient:
# r = (n*Σxy - Σx*Σy) / sqrt((n*Σx² - (Σx)²) * (n*Σy² - (Σy)²))
corr = (n * sum_Xy - sum_X * sum_y) / math.sqrt(
    (n * sum_X2 - sum_X * sum_X) *
    (n * sum_y2 - sum_y * sum_y)
)
round(corr, 3)
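The snippet above depends on an `input` string defined elsewhere, so it cannot be run on its own. A minimal self-contained sketch of the same formula might look like the following. The function name `pearson_r`, its error handling, and the sample lists are hypothetical and are not part of the original gist.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length numeric sequences (illustrative sketch)."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(a * b for a, b in zip(xs, ys))
    sum_x2 = sum(a * a for a in xs)
    sum_y2 = sum(b * b for b in ys)
    # Same computational formula as in the snippet above.
    denominator = math.sqrt(
        (n * sum_x2 - sum_x * sum_x) * (n * sum_y2 - sum_y * sum_y)
    )
    if denominator == 0:
        # Assumption: treat a constant sequence (zero variance) as an error.
        raise ValueError("correlation is undefined when a sequence has zero variance")
    return (n * sum_xy - sum_x * sum_y) / denominator


# Made-up sample data: ys is exactly 2 * xs, so the correlation is perfect.
print(round(pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]), 3))  # prints 1.0
```

Raising on zero variance is one possible design choice; the original snippet would instead fail with a `ZeroDivisionError` in that case.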