From: "Alan Miller" <amiller @ vic.bigpond.net.au>
Subject: Re: Problems with multiplying large matrices
Date: Wed, 09 Feb 2000 01:43:25 GMT
Newsgroups: sci.math.num-analysis
Summary: [missing]

E. Robert Tisdale wrote in message <38A0BE33.42F2AC6D@netwood.net>...
>Alan Miller wrote:
>
>> I recommend that you use a QR factorization
>>
>> X = QR
>>
>> where Q is an orthonormal transformation (Q'Q = I)
>> and R is the upper-triangular Cholesky matrix,
>> which in your case will be the upper triangle of a 15x15 matrix.
>
>This needs a little explanation.
>Q is a 100 million by 100 million element matrix.
>
Q is 100 million x 100 million, but it is never formed.
It needs a lot of explanation.
Anyone who is not familiar with modern methods of
least squares calculation (post about 1970), should
read a book on it such as:

`Solving Least Squares Problems' by C.L.Lawson &
R.J.Hanson which has been reprinted as a paperback
by SIAM (1995).  (First edition - Prentice-Hall, 1974)

The original contained the code for each algorithm.
This can be downloaded from the lawson-hanson
directory at netlib.

For a more advanced (far more mathematical) text, see:
`Numerical Methods for Least Squares Problems' by
Ake Bjorck, also published by SIAM as a paperback
(1996).
--
Alan Miller, Retired Scientist (Statistician)
CSIRO Mathematical & Information Sciences
Alan.Miller -at- vic.cmis.csiro.au
http://www.ozemail.com.au/~milleraj
==============================================================================

From: hrubin@odds.stat.purdue.edu (Herman Rubin)
Subject: Re: Problems with multiplying large matrices
Date: 9 Feb 2000 12:35:05 -0500
Newsgroups: sci.math.num-analysis

In article <ve1o4.3000$k6.8836@news-server.bigpond.net.au>,
Alan Miller <amiller @ vic.bigpond.net.au> wrote:

>LazyLizard wrote in message <87q5bc$75o$1@nnrp1.deja.com>...
>>In article <389F3291.97EF9B3C@netwood.net>,

>>Thanks for the suggestion but I need to take care of the number of
>>values (that would be cases of the variable set) of the variables.
>>Those cases are of the order of 10-100 million, its part of a data-
>>minig problem. So, basically, I would have limited number of columns
>>(say 20) and almost unlimited number of rows (say 10-100 million) for
>>the input set. How do I transpose such a matrix? Or better still, how
>>can I find (X^T)times(X)? I would still need X^T in order to solve the
>>matrix equation, because I need to pre-multiply D with X^T.
>>Let me list out the steps that I am using to solve the linear
>>regression equation:
>>1. Find X^T
>>2. Premultiply both sides of the equation with X^T
>>    ((X^T)X)W = (X^T)D
>>3. Find ((X^T)X) inverse
>>4. Premultiply both sides of the equation with result of step 3
>>   W = (inverse of (X^T)X)((X^T)D)
>>Here, X is a matrix dimension:(100million X n), n<40
>>      W is a matrix dimension:(n X 1), n<40
>>      D is a matrix dimension:(100million X 1)

>>By the way I am using Java to develop the application.
>>So, now can you help me?

>>Thanks in anticipation ...


>This is not a good way to carry out linear least-squares
>calculations.   I recommend that you use a QR factorization

>X = QR

>where Q is an orthonormal transformation (Q'Q = I) and R
>is the upper-triangular Cholesky matrix, which in your case
>will be the upper triangle of a 15x15 matrix.

The number of operations in the QR factorization is
roughly twice that in computing by the usual method,
and the memory requirement is larger.  

>Using planar (Jacobi/Givens) rotations, this can be done
>one row at a time, so that you never have to store your
>1million x 15 matrix X in memory.

No, but there would have to be large movements of 
data in and out.  To carry out the operations on
the right side, the 15 vectors, of essentially full
size, will still have to be used repeatedly.

On the other hand, if the X matrix is available
by rows, one can multiply X^T by X, or by D,
without doing anything involving much transient
storage.  One can do this for multidimensional
D easily, and this, using the symmetry, gives
as good a way to do it to obtain X^T*X as any.

This was even done for regressions using large
data sets BC (before computers).  Just write
out the matrix multiplication as a loop, adding
the product of the (transpose of one row of X)
with (one row of D) to the partial sum at a time.
Only the partial sums need to be stored as 
intermediates.

-- 
This address is for information only.  I do not claim that these views
are those of the Statistics Department or of Purdue University.
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399
hrubin@stat.purdue.edu         Phone: (765)494-6054   FAX: (765)494-0558
==============================================================================

From: "Alan Miller" <amiller @ vic.bigpond.net.au>
Subject: Least squares, 10 million cases
Date: Thu, 10 Feb 2000 00:25:50 GMT
Newsgroups: sci.math.num-analysis

Some further comments:

Herman Rubin is only partially correct.
The `straight' planar rotation algorithm does take about
twice as many operations as forming the sum of squares
and products matrix (X'X).   My version (either at my
web site or Applied Statistics algorithm AS 274) uses
Sven Hammarling's algorithm which requires about
50% more operations than forming X'X.
N.B. My version stores the Cholesky factor, R, as a
vector, which saves time with subscript calculations
compared with 2D storage.

There is a fast planar rotation algorithm by Anda & Park,
but I don't know of any public domain code for this.
It takes about the same number of operations as forming
X'X.

I don't understand his comments about the Y variable.
It is treated in the same way as one of the X variables.

With 10 million cases, and 15 X-values and a Y-value
for each case, access to the data is obviously an
important consideration.   The planar rotation updating
method accesses the data by rows.   Presumably, when
he talks about punched cards, he was also accessing
the data by rows (cases), rather than by columns (variables).

My understanding is that Java stores data by rows, so that
the planar rotation algorithm would be a very appropriate
one for this language, whereas Householder reduction,
which operates on columns, can be very efficient in Fortran.

I did a web search for `QR Least squares Java' and found several
sites with least-squares code in Java, but all of those which
I looked at used translations of Householder reduction code
which had originally been coded in Fortran.   I have seen two
translations of my code into C (using f2c), but I don't know of
a translation into Java.

--
Alan Miller, Retired Scientist (Statistician)
CSIRO Mathematical & Information Sciences
Alan.Miller -at- vic.cmis.csiro.au
http://www.ozemail.com.au/~milleraj