# Calculate Pearson Correlation Confidence Interval in Python

import numpy as np
from scipy import stats


Recently, many studies have been arguing that we should report effect sizes along with confidence intervals, as opposed to simply reporting p values (e.g., see this paper).

In Python, however, there is no functions to directly obtain confidence intervals (CIs) of Pearson correlations. I therefore decided to do a quick ssearch and come up with a wrapper function to produce the correlation coefficients, p values, and CIs based on scipy.stats and numpy.

There are many tutorials on the detailed steps and I mainly followed this one.

### Detailed steps

Letâ€™s use a random dataset for an example.

> x = np.random.randint(1, 10, 10)
> y = np.random.randint(1, 10, 10)
> x
array([6, 4, 3, 3, 2, 5, 8, 2, 6, 1])
> y
array([3, 3, 9, 4, 9, 4, 6, 9, 7, 9])


The first step involves transformation of the correlation coefficient into a Fishersâ€™ Z-score.

> r, p = stats.pearsonr(x,y)
> r,p
(-0.5356559002279192, 0.11053303487716389)
> r_z = np.arctanh(r)
> r_z
-0.5980434968020534


The corresponding standard deviation is $se = \dfrac{1}{\sqrt{N-3}}$:

> se = 1/np.sqrt(x.size-3)
> se
0.3779644730092272


CI under the transformation can be calculated as $r_z \pm z_{\alpha/2}\times se$, where $z_{\alpha/2}$ is can be calculated using scipy.stats.norm.ppf function:

> alpha = 0.05
> z = stats.norm.ppf(1-alpha/2)
> lo_z, hi_z = r_z-z*se, r_z+z*se
> lo_z, hi_z
(-1.3388402513358, 0.14275325773169323)


Finally, we can reverse the transformation by np.tanh:

> lo, hi = np.tanh((lo_z, hi_z))
> lo, hi
array([-0.87139341,  0.1417914 ])


We can validate this in R:

> x=c(6, 4, 3, 3, 2, 5, 8, 2, 6, 1)
> y=c(3, 3, 9, 4, 9, 4, 6, 9, 7, 9)
> cor.test(x,y)

Pearson's product-moment correlation

data:  x and y
t = -1.7942, df = 8, p-value = 0.1105
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.8713934  0.1417914
sample estimates:
cor
-0.5356559


### A wrapper function

I wrapped all these steps up within a single function on Github gist: