@bukson
Created February 4, 2020 12:45
Calculating correlation for data with NaN using multiple cores
import pandas as pd
import numpy as np
from nancorrmp.nancorrmp import NaNCorrMp
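# Column 3 mixes NaN, inf and -inf. Those entries are skipped pairwise, so only
# rows 0 and 4 contribute to column 3 (hence the +/-1.0 values below).
dataset = pd.DataFrame([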
[2, 3000, 0, 0],
[10, 0, 100, float('inf')],
[1, 0, -1150, float('-inf')],
[1, 0, 0, float('NaN')],
[4, 4000, -800, -0.2]
])
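# Compute the correlation matrix on 4 worker processes. On platforms that spawn
# workers (Windows, macOS), run this under an `if __name__ == "__main__":` guard.
NaNCorrMp.calculate(dataset, n_jobs=4)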
# >>>
#           0         1         2    3
# 0  1.000000 -0.108525  0.385631 -1.0
# 1 -0.108525  1.000000 -0.137864 -1.0
# 2  0.385631 -0.137864  1.000000  1.0
# 3 -1.000000 -1.000000  1.000000  1.0
# Single-core pandas gives the same matrix: it also skips NaN and +/-inf pairwise.
dataset.corr()
# >>>
#           0         1         2    3
# 0  1.000000 -0.108525  0.385631 -1.0
# 1 -0.108525  1.000000 -0.137864 -1.0
# 2  0.385631 -0.137864  1.000000  1.0
# 3 -1.000000 -1.000000  1.000000  1.0
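To make the NaN handling concrete, here is a minimal single-process sketch of pairwise-complete Pearson correlation in plain NumPy. It is not nancorrmp's implementation, and `pairwise_corr` is a hypothetical helper. It assumes, as the matching outputs above suggest, that NaN, inf and -inf are all treated as missing and that each column pair uses only the rows where both values are finite.

```python
import numpy as np
import pandas as pd


def pairwise_corr(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: Pearson correlation where each column pair uses
    only rows in which both values are finite (NaN, inf, -inf are skipped)."""
    x = df.to_numpy(dtype=float)
    finite = np.isfinite(x)
    n_cols = x.shape[1]
    out = np.full((n_cols, n_cols), np.nan)
    for i in range(n_cols):
        for j in range(i, n_cols):
            # Rows where both columns hold usable values.
            mask = finite[:, i] & finite[:, j]
            if mask.sum() < 2:
                continue  # too few overlapping rows; leave NaN
            a = x[mask, i] - x[mask, i].mean()
            b = x[mask, j] - x[mask, j].mean()
            denom = np.sqrt((a * a).sum() * (b * b).sum())
            if denom > 0:  # a constant column has undefined correlation
                out[i, j] = out[j, i] = (a * b).sum() / denom
    return pd.DataFrame(out, index=df.columns, columns=df.columns)
```

Calling `pairwise_corr(dataset)` on the frame above should reproduce both matrices. The loop over column pairs is exactly the work a multi-core tool can split across processes; this sketch runs it serially.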