Computing correlation between vectors is a pretty standard process – and that functionality is included in many standard programming packages spanning, I would venture, all common programming languages.

The basic idea: given two vectors $x$ and $y$, what is the mutual relationship between the two? A classic measure of “mutual relationship”, Pearson’s correlation coefficient, is simply the covariance of the two vectors divided by the product of their standard deviations: $r = \frac{\mathrm{cov}(x, y)}{\sigma_x \sigma_y}$.
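To make the definition concrete, here is a minimal sketch that computes the coefficient for two short vectors directly from the covariance and the standard deviations. The function name `pearson` and the sample values are illustrative assumptions, not part of the original post.

```python
import numpy as np

def pearson(x, y):
    """Pearson's r: covariance divided by the product of the standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Population covariance (divide by n); np.std also divides by n by default,
    # so the normalisation cancels in the ratio.
    cov_xy = np.mean((x - x.mean()) * (y - y.mean()))
    return cov_xy / (x.std() * y.std())

# Illustrative data (hypothetical values)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])
r = pearson(x, y)
print(r)
```

The result should agree with `np.corrcoef(x, y)[0, 1]`, which is a convenient cross-check.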

The problem can also be translated into a set of vector operations: computing the mean of each vector, the Sum of Squares (SSQ) of each vector, and the Sum of Products between the two vectors. (If readers are interested – we can dig deeper here).
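As a rough sketch of that reformulation, the snippet below builds the coefficient from the means, the centered sums of squares, and the centered sum of products only. The name `pearson_from_sums` and the test data are assumptions made for illustration.

```python
import numpy as np

def pearson_from_sums(x, y):
    """Pearson's r from means, sums of squares (SSQ) and the sum of products (SP)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    mean_x = x.sum() / n
    mean_y = y.sum() / n
    # Centered sums, using sum((x - mean)^2) = sum(x^2) - n * mean^2
    ssq_x = np.dot(x, x) - n * mean_x ** 2
    ssq_y = np.dot(y, y) - n * mean_y ** 2
    # Centered sum of products: sum(x * y) - n * mean_x * mean_y
    sp_xy = np.dot(x, y) - n * mean_x * mean_y
    return sp_xy / np.sqrt(ssq_x * ssq_y)

rng = np.random.default_rng(0)
x = rng.random(1000)
y = 0.5 * x + rng.random(1000)   # y is partly driven by x, so r should be positive
r_sums = pearson_from_sums(x, y)
print(r_sums)
```

One caveat worth noting: the `sum(x^2) - n * mean^2` shortcut can lose precision when the mean is large relative to the spread, in which case subtracting the mean first is the safer route.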

*But what if you have a matrix $M$ with $n$ rows, one vector per row? Can you efficiently compute the correlation matrix?*

First off, **let’s define a few helpers:**

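- Sum of Products: $SP = \mathrm{norm}(M)\,\mathrm{norm}(M)^T$, where `norm(M)` is assumed here to be $M$ with each row’s mean subtracted; and
- $d = \mathrm{diag}(SP)$, the diagonal of $SP$ as a vector.

Next, using the Sum of Products matrix and an $n$-column version of the diagonal ($D$, in which every row is $d$) as well as its transpose ($D^T$), we can compute Pearson’s correlation coefficient for every pair of rows.

Pearson’s correlation = $SP \oslash \sqrt{D \circ D^T}$, where $\circ$, $\oslash$ and the square root are applied element-wise, i.e. $R_{ij} = SP_{ij} / \sqrt{SP_{ii}\,SP_{jj}}$.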

Here’s a python function that returns that matrix:

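```python
import numpy as np

def norm(M):
    # Assumed helper (not defined in the original post): subtract each row's mean.
    return M - M.mean(axis=1, keepdims=True)

def matpcorr(M):
    Mc = norm(M)
    SP = np.dot(Mc, Mc.transpose())                  # sum-of-products matrix
    spDiag = np.tile(SP.diagonal(), (len(SP), 1))    # every row is the diagonal of SP
    denom = np.sqrt(np.multiply(spDiag, spDiag.transpose()))
    return np.divide(SP, denom)
```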
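One way to sanity-check the sum-of-products approach is to compare it against NumPy’s built-in `np.corrcoef`, which also treats each row as one variable. The sketch below is a self-contained variant that uses `np.outer` in place of tiling the diagonal. The name `corr_matrix` and the row-centering step are assumptions made for illustration.

```python
import numpy as np

def corr_matrix(M):
    """Row-wise Pearson correlation matrix via the sum-of-products matrix."""
    M = np.asarray(M, dtype=float)
    centered = M - M.mean(axis=1, keepdims=True)   # assumption: center each row
    sp = centered @ centered.T                     # sum-of-products matrix
    d = np.diag(sp)                                # per-row sums of squares
    return sp / np.sqrt(np.outer(d, d))            # entry (i, j) is divided by sqrt(d_i * d_j)

rng = np.random.default_rng(42)
M = rng.random((5, 200))                           # 5 vectors of length 200
R = corr_matrix(M)
print(np.allclose(R, np.corrcoef(M)))
```

`np.outer(d, d)` produces the same denominator as multiplying the tiled diagonal by its transpose, without materialising the tiled matrix first.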

And now for some sample code using that function:

```python
import matplotlib.pyplot as plt

# Using IPython? Include the following:
# %matplotlib inline

a = np.random.random((100, 1000))
R = np.abs(matpcorr(a))          # absolute correlation between every pair of rows
plt.imshow(R, interpolation='none', cmap='jet')
plt.colorbar()
plt.show()

print(R)
```

**So here are some results:**

1. 10 vectors of length 1000:

```python
a = np.random.random((10, 1000))
```

2. 100 vectors of length 1000:

```python
a = np.random.random((100, 1000))
```

As expected, independent random vectors (here drawn from a uniform distribution, which is what `np.random.random` samples) show low correlation with one another (and trivially perfect correlation with themselves).
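To put a rough number on “low”, the sketch below measures the off-diagonal entries for 100 independent uniform vectors of length 1000. It uses `np.corrcoef` so that it stands on its own; the seed and variable names are arbitrary choices. For independent vectors of length 1000, the typical magnitude should be on the order of $1/\sqrt{1000} \approx 0.03$.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((100, 1000))                  # 100 independent vectors of length 1000

R = np.corrcoef(a)                           # each row is treated as one variable
off_diag = R[~np.eye(len(R), dtype=bool)]    # drop the self-correlations

mean_abs_off = np.abs(off_diag).mean()
max_abs_off = np.abs(off_diag).max()
print(f"mean |r| off-diagonal: {mean_abs_off:.3f}")
print(f"max  |r| off-diagonal: {max_abs_off:.3f}")
print("diagonal all ones:", np.allclose(np.diag(R), 1.0))
```

Shorter vectors would show visibly larger spurious correlations, since the spread of the off-diagonal entries shrinks only with the square root of the vector length.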
