Instantly share code, notes, and snippets. fabianp/partial_corr.py Last active Sep 19, 2019

Partial Correlation in Python (clone of Matlab's partialcorr)
 """ Partial Correlation in Python (clone of Matlab's partialcorr) This uses the linear regression approach to compute the partial correlation (might be slow for a huge number of variables). The algorithm is detailed here: http://en.wikipedia.org/wiki/Partial_correlation#Using_linear_regression Taking X and Y two variables of interest and Z the matrix with all the variable minus {X, Y}, the algorithm can be summarized as 1) perform a normal linear least-squares regression with X as the target and Z as the predictor 2) calculate the residuals in Step #1 3) perform a normal linear least-squares regression with Y as the target and Z as the predictor 4) calculate the residuals in Step #3 5) calculate the correlation coefficient between the residuals from Steps #2 and #4; The result is the partial correlation between X and Y while controlling for the effect of Z Date: Nov 2014 Author: Fabian Pedregosa-Izquierdo, f@bianp.net Testing: Valentina Borghesani, valentinaborghesani@gmail.com """ import numpy as np from scipy import stats, linalg def partial_corr(C): """ Returns the sample linear partial correlation coefficients between pairs of variables in C, controlling for the remaining variables in C. Parameters ---------- C : array-like, shape (n, p) Array with the different variables. Each column of C is taken as a variable Returns ------- P : array-like, shape (p, p) P[i, j] contains the partial correlation of C[:, i] and C[:, j] controlling for the remaining variables in C. """ C = np.asarray(C) p = C.shape P_corr = np.zeros((p, p), dtype=np.float) for i in range(p): P_corr[i, i] = 1 for j in range(i+1, p): idx = np.ones(p, dtype=np.bool) idx[i] = False idx[j] = False beta_i = linalg.lstsq(C[:, idx], C[:, j]) beta_j = linalg.lstsq(C[:, idx], C[:, i]) res_j = C[:, j] - C[:, idx].dot( beta_i) res_i = C[:, i] - C[:, idx].dot(beta_j) corr = stats.pearsonr(res_i, res_j) P_corr[i, j] = corr P_corr[j, i] = corr return P_corr

jwcarr commented Aug 22, 2015

 Thanks for posting this! I couldn't find any other Python implementations of the partial correlation. However, I'm a little confused about the results I'm getting. Should this function be equivalent to ppcor in R? (see here: https://cran.r-project.org/web/packages/ppcor/index.html) Given this data: >>> data = np.array([[ 0.81214374, 0.75793651, 0. ], [ 1.8964884 , 0.75880456, 1. ], [ 4.44113951, 0.73611111, 2. ], [ 6.07757503, 0.68043155, 3. ], [ 3.4540138 , 0.57291667, 4. ], [ 9.06033523, 0.4609375 , 5. ], [ 4.52278321, 0.546875 , 6. ], [ 3.96472532, 0.47842262, 7. ], [ 3.23761437, 0.40190972, 8. ], [ 2.94132016, 0.46961806, 9. ], [ 0.81405421, 0.77777778, 0. ], [ 2.59633893, 0.73177083, 1. ], [ 8.74133555, 0.640625 , 2. ], [ 4.91243565, 0.60243056, 3. ], [ 6.52566722, 0.61631944, 4. ], [ 9.14605222, 0.53888889, 5. ], [ 8.35901455, 0.49878472, 6. ], [ 10.51921626, 0.5 , 7. ], [ 11.56800923, 0.45486111, 8. ], [ 7.5933584 , 0.55902778, 9. ], [ 0.81296971, 0.85329861, 0. ], [ -1.26243957, 0.88020833, 1. ], [ 6.27671297, 0.68402778, 2. ], [ 13.72624297, 0.63541667, 3. ], [ 3.90786197, 0.63020833, 4. ], [ 1.79030896, 0.65674603, 5. ], [ 3.51589642, 0.52876984, 6. ], [ 5.1629781 , 0.60069444, 7. ], [ 9.10774469, 0.52256944, 8. ], [ 4.66882455, 0.56339286, 9. ], [ 0.81752733, 0.80381944, 0. ], [ 1.20513141, 0.69201389, 1. ], [ 3.18574928, 0.59002149, 2. ], [ 7.47468745, 0.57589211, 3. ], [ 10.93493755, 0.44583333, 4. ], [ 9.39221578, 0.4549979 , 5. ], [ 11.51554589, 0.5292298 , 6. ], [ 9.0335466 , 0.3671875 , 7. ], [ 13.11844822, 0.30989583, 8. ], [ 9.00946829, 0.42534722, 9. ]]) I get these results: >>> partial_corr(data) array([[ 1. , 0.25363885, 0.67638027], [ 0.25363885, 1. , 0.17226437], [ 0.67638027, 0.17226437, 1. ]]) However, using ppcor in R, I get these results for the same data: > pcor(data) \$estimate [,1] [,2] [,3] [1,] 1.0000000 -0.5434100 -0.1407695 [2,] -0.5434100 1.0000000 -0.7620759 [3,] -0.1407695 -0.7620759 1.0000000 Any ideas? (It's quite possible I'm doing something pretty stupid!) Thanks!

snarkmaster commented Mar 6, 2016

 @jwcarr -- I think you might need to add a column of 1s to include an intercept in the model, via np.column_stack([data, np.ones(data.shape)]) or similar. I'm not familiar with Matlab or R, but given your example, I would guess that R includes an intercept by default, and Matlab does not.

ghost commented May 4, 2016

 Sometimes R and Matlab libraries standardize your data by default. For this problem, adding the intercept column and standardizing the data have the same effect (due to removing the mean out of the data). Regular: >partial_corr(data) array([[ 1. , 0.25363885, 0.67638027], [ 0.25363885, 1. , 0.17226437], [ 0.67638027, 0.17226437, 1. ]]) Intercept: data_int = np.hstack((np.ones((data.shape,1)), data)) partial_corr(data_int)[1:, 1:] array([[ 1. , -0.54341003, -0.14076948], [-0.54341003, 1. , -0.76207595], [-0.14076948, -0.76207595, 1. ]]) Standardized: data_std = data.copy() data_std -= data.mean(axis=0)[np.newaxis, :] data_std /= data.std(axis=0)[np.newaxis, :] partial_corr(data_std) array([[ 1. , -0.54341003, -0.14076948], [-0.54341003, 1. , -0.76207595], [-0.14076948, -0.76207595, 1. ]])

mwetzel7r commented Mar 29, 2018

 maybe they should add this to the scipy library

rubenSaro commented Mar 2, 2019

 there is a mistake in line 61 and 62, the subscripts of the betas are incorrect
to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.