From: borchers@newshost.nmt.edu (Brian Borchers)
Subject: Re: regularization
Date: 5 Oct 1999 10:24:39 -0600
Newsgroups: sci.math.num-analysis
Keywords: error propagation, regularization

smarsly@mpikg-golm.mpg.de wrote:
>who can help me with a problem concerning the error propagation of
>fitting data WITH additional regularization (or some other constraints)?
>
>The problem: in least square fits of data with a fitting function f(c),
>c being the vector containing the fitting parameters, one has to minimize
>
>    |f(c) - data|^2 + r*Reg(c).
>
>Reg(c) is a regularization term.  Let F be a matrix containing the
>derivatives d f(c)/dc AND the regularization terms; then the solution c
>of the fit is c = inv(F) * data (neglecting weighting, etc.) with the
>covariance matrix Cc = inv(F) * C_data * inv(F)' (Cc provides the
>intervals of "confidence" for the fitting parameters c, and C_data
>represents the data errors).
>
>My question: is it correct to include the regularization for the
>calculation of the covariance matrix (i.e. the intervals of confidence)
>in this way?  If not, how is the regularization taken into account
>correctly in calculating the error propagation for the parameters c?

There are several important things to consider here:

- First, you should always be aware that the estimates produced with
  regularization are biased, in the sense that the expected value of the
  regularized solution is not the value of the "true" solution.
  Depending on how large the regularization parameter r is, the results
  may be extremely biased.

- If you don't include the regularization in computing the confidence
  intervals, then you'll get confidence intervals that are just as large
  as those you would get from simple least squares.  Since you wouldn't
  be using regularization unless the least squares problem were badly
  conditioned, these intervals are likely to be so large that they're
  useless.

- If you do include the regularization in computing the confidence
  intervals, you have to be aware that the confidence intervals are also
  biased.  As r gets larger, the confidence intervals get tighter...

Two extreme cases illustrate the problem.  Suppose that you're using
simple 0th order Tikhonov regularization on a linear inverse problem
Ax = b.  That is, you're minimizing

    ||Ax - b||^2 + r^2 ||x||^2.

Suppose further that the least squares problem is very badly
conditioned.  If we use r = 0, we get the least squares solution with
very large confidence intervals for the parameters.  On the other hand,
in the limit as r goes to infinity, we get the solution x = 0 with
confidence intervals of +-0.  Which do you prefer: an unbiased solution
that tells you that the data don't tell you anything, or an incredibly
biased solution (a solution to the wrong problem!) with very tight
confidence intervals for the parameters?  This trade-off should be
considered in constructing confidence intervals for the parameters.

Note that papers in the geophysics literature often do include such
confidence intervals.  The typical approach is to find the generalized
inverse matrix (including the regularization at the level which was used
in constructing the inverse solution) and use it to compute confidence
intervals for the parameters.  An alternative approach is to use a
Monte Carlo method to generate lots of inverse solutions for simulated
data sets (with the same statistics as the actual data set) and then
generate confidence intervals from these solutions...  Rough sketches
of both ideas are given below.
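To make the generalized-inverse idea concrete, here is a minimal Python
(NumPy) sketch, assuming a made-up smoothing-kernel test problem, an
assumed noise level, and 0th order Tikhonov regularization; none of the
specific numbers come from the discussion above.  The parameter
covariance is obtained by propagating C_data through the regularized
generalized inverse G_r = inv(A'A + r^2 I) A'.

# Hedged illustration only: 0th order Tikhonov regularization of an
# ill-conditioned linear problem Ax = b, with the data covariance
# propagated through the regularized generalized inverse.
import numpy as np

rng = np.random.default_rng(0)

# Assumed test problem: a nearly singular Gaussian smoothing kernel.
n = 20
t = np.linspace(0.0, 1.0, n)
A = np.exp(-((t[:, None] - t[None, :]) ** 2) / 0.01)

x_true = np.sin(2.0 * np.pi * t)        # "true" parameters (assumed)
sigma = 0.01                            # assumed data noise std. dev.
b = A @ x_true + sigma * rng.standard_normal(n)
C_data = sigma ** 2 * np.eye(n)         # data covariance matrix

def tikhonov(A, b, r):
    # Solution and generalized inverse for min ||Ax - b||^2 + r^2 ||x||^2.
    G = np.linalg.solve(A.T @ A + r ** 2 * np.eye(A.shape[1]), A.T)
    return G @ b, G

# Small r: huge intervals (badly conditioned least squares).
# Large r: solution pushed toward 0, tiny intervals, large bias.
for r in (1e-6, 1e-3, 1e-1, 10.0):
    x_r, G = tikhonov(A, b, r)
    C_x = G @ C_data @ G.T              # propagated parameter covariance
    half = 1.96 * np.sqrt(np.diag(C_x)) # nominal 95% half-widths
    print(f"r = {r:7.1e}   max half-width = {half.max():9.3e}   "
          f"||x_r - x_true|| = {np.linalg.norm(x_r - x_true):9.3e}")

Running this shows the trade-off described above: the intervals shrink
as r grows, while the distance from the "true" parameters grows.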
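A similarly minimal sketch of the Monte Carlo alternative, under the
same assumed test problem: simulate many data sets with the same noise
statistics as the actual data, invert each one at the same
regularization level, and read confidence intervals off the percentiles
of the resulting solutions.  Intervals built this way reflect the
variance of the regularized estimator but not its bias.

# Hedged illustration only: Monte Carlo confidence intervals for the
# same assumed test problem and regularization level.
import numpy as np

rng = np.random.default_rng(1)

n = 20
t = np.linspace(0.0, 1.0, n)
A = np.exp(-((t[:, None] - t[None, :]) ** 2) / 0.01)
sigma = 0.01
b = A @ np.sin(2.0 * np.pi * t) + sigma * rng.standard_normal(n)

def tik_solve(A, b, r):
    # Regularized solution of min ||Ax - b||^2 + r^2 ||x||^2.
    return np.linalg.solve(A.T @ A + r ** 2 * np.eye(A.shape[1]), A.T @ b)

r = 1e-1                                # regularization level actually used
x_hat = tik_solve(A, b, r)              # inverse solution for the real data

# Simulated data sets with the same noise statistics as the actual data,
# each inverted with the same regularization.
n_trials = 2000
samples = np.array([tik_solve(A, A @ x_hat + sigma * rng.standard_normal(n), r)
                    for _ in range(n_trials)])

lo, hi = np.percentile(samples, [2.5, 97.5], axis=0)   # 95% band per parameter
print("max 95% half-width:", (hi - lo).max() / 2.0)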
There's a brief discussion of constructing confidence intervals for
discrete ill-posed problems (along with some references) in Per Hansen's
book, "Rank-Deficient and Discrete Ill-Posed Problems", on page 123.
--
Brian Borchers                           borchers@nmt.edu
Department of Mathematics                http://www.nmt.edu/~borchers/
New Mexico Tech                          Phone: 505-835-5813
Socorro, NM 87801                        FAX: 505-835-5366