From: Paige Miller <paige.miller@kodak.com>
Subject: Re: PCA on covariance or correlation matrices?
Date: Tue, 09 May 2000 08:11:52 -0400
Newsgroups: sci.math.num-analysis,sci.stat.math
To: A T <alban.tsui@cognos.com>

A T wrote:
> 
> Sorry to post this again. I had the answer a while ago from the newsgroup
> but my machine crashed and I lost my files.
> 
> Could someone explain to me again the difference between using the
> covariance matrix and the correlation matrix for working out the
> principal components?

The calculations are the same in both cases (you extract the eigenvalues
and eigenvectors of whichever matrix you choose); the only difference is
in the interpretation.

If you use the covariance matrix, then you are implicitly assuming that
all of the original variables are in the same units. If they are not
... for example, if one variable is pH (range 0 to 14) and another
variable is RPMs on a motor (range 1000 to 10000 rpm) ... then the
variable with the larger variance (in this case RPMs) will dominate the
loadings that you find, which may or may not be what you want.
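
To make the covariance case concrete, here is a minimal sketch in Python
with NumPy. The data are synthetic and purely illustrative: pH and RPM
are drawn independently and uniformly over the ranges mentioned above,
and the variable names are my own assumptions, not anything from a real
data set.

```python
import numpy as np

# Synthetic, assumed data: pH in [0, 14] and motor speed in [1000, 10000] rpm.
rng = np.random.default_rng(0)
n = 500
ph = rng.uniform(0.0, 14.0, size=n)
rpm = rng.uniform(1000.0, 10000.0, size=n)
X = np.column_stack([ph, rpm])  # columns: [pH, RPM]

# PCA on the covariance matrix: eigendecomposition of cov(X).
cov = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order

first_loading = eigvecs[:, -1]           # loadings of the first component
explained = eigvals[-1] / eigvals.sum()  # share of total variance

print("variances (pH, RPM):", np.diag(cov))
print("first loading (pH, RPM):", first_loading)
print("variance explained by first component:", explained)
```

With variances this far apart, the first component should come out as
essentially the RPM axis, with pH contributing almost nothing to it.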

If you use the correlation matrix, then you are in effect rescaling each
of the original variables to unit variance, which removes the units.
Thus, they will all be given a priori the same weight in the analysis
(which may or may not be what you want).
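
One way to see this is that the correlation matrix is just the
covariance matrix of the variables after each has been centered and
divided by its standard deviation. The sketch below checks that in
Python with NumPy on synthetic data; the linear relation between pH and
RPM is a made-up assumption, included only so that the two variables are
correlated.

```python
import numpy as np

# Synthetic, assumed data: RPM loosely tied to pH by a hypothetical relation.
rng = np.random.default_rng(1)
n = 500
ph = rng.uniform(0.0, 14.0, size=n)
rpm = 1000.0 + 600.0 * ph + rng.normal(0.0, 1500.0, size=n)
X = np.column_stack([ph, rpm])  # columns: [pH, RPM]

# Standardize each column to mean 0 and (sample) variance 1.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

corr = np.corrcoef(X, rowvar=False)  # correlation matrix of the raw data
cov_z = np.cov(Z, rowvar=False)      # covariance matrix of standardized data
same = np.allclose(corr, cov_z)

# PCA on the correlation matrix.
eigvals, eigvecs = np.linalg.eigh(corr)  # eigenvalues in ascending order
first_loading = eigvecs[:, -1]

print("corr equals cov of standardized data:", same)
print("first loading (pH, RPM):", first_loading)
```

Note that with only two correlated variables the loadings from the
correlation matrix always have equal magnitude, so this small case shows
the "same weight" effect in its most extreme form; with more variables
the loadings will generally differ from one another.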

In the case I cited where one variable was pH and another variable was
RPMs, I believe most people would select the correlation matrix as the
approach.

-- 
Paige Miller
Eastman Kodak Company
paige.miller@kodak.com
"It's nothing until I call it!" -- Bill Klem, NL Umpire