From: spellucci@mathematik.tu-darmstadt.de (Peter Spellucci)
Subject: Re: nonlinear least squares (levenberg-marquardt)
Date: 16 Mar 2001 17:16:31 GMT
Newsgroups: sci.math.num-analysis
Summary: How is the Levenberg-Marquardt process used for fitting to data?

In article <98ropv01t59@news1.newsguy.com>, student@maths.edu (student) writes:
|> You say not to use normal equations for least squares, yet
|> LM requires normal equations (or so I understand). So what
|> you are saying then is do not use LM? This is very confusing,
|> since the texts I have looked at (not just NR) recommend LM
|> for nonlinear least squares. Are they all wrong?

No, although in most of these texts an important point is overlooked (see below).

What a modern LM version does:

Problem:   ||F(x)||_2^2 = min_x

where F is a smooth function with m components in n unknowns x, usually with m >> n; m >= n is definitely required. For a fit problem, the fit data are incorporated into the components, i.e. into the index "i", e.g.

  F_i(x) = y_i - ( x(1)*exp(-x(2)*t_i) + x(3) )

where (t_i, y_i) are the given data to be fitted by one exponential term plus an asymptotic term. It is simpler to describe everything in terms of F.

Given a guess x^k for the optimal x* (questions of local versus global optimization set aside), one linearizes the problem under a trust-region constraint:

step k:  minimize   ||F(x^k) + J_F(x^k)*d||_2^2
         subject to ||d|| <= Delta_k

The norm is typically a scaled Euclidean norm; I will write the constraint from now on as

  d'*B_k'*B_k*d <= Delta_k^2

with a regular matrix B_k, where the prime denotes transposition and J_F is the Jacobian matrix of F.

The solution process is as follows. First compute the minimizer without the constraint. This is an ordinary linear least squares problem, which can be solved via the SVD (minimum norm solution) even if J_F is rank deficient. If this d happens to satisfy the constraint, we take it as d^k. Otherwise it follows that the constraint is active. Then the multiplier rule of Kuhn and Tucker says that there is a positive lambda such that the d^k sought satisfies

  (J_F(x^k)'*J_F(x^k) + lambda*B_k'*B_k)*d^k = -J_F(x^k)'*F(x^k)      (*)

This lambda is unique. Hence the problem boils down to a one-dimensional zero-finding problem: find lambda such that (*) holds and in addition

  d'*B_k'*B_k*d = Delta_k^2      (observe: d = d(lambda))

Given a guess for lambda, we can compute d^k from (*) as a linear least squares problem

  || J_F(x^k)*d         + F(x^k) ||
  || sqrt(lambda)*B_k*d + O      ||  = min_d

where O is a zero vector of dimension n. Hence here again there is a point in using the SVD (although other solutions, e.g. a QR decomposition, would also be feasible).

With d^k computed we now test whether we can accept it: we check whether the

  actual reduction:     ||F(x^k)||^2 - ||F(x^k+d^k)||^2

is sufficiently positive and compare it with the

  predicted reduction:  ||F(x^k)||^2 - ||F(x^k)+J_F(x^k)*d^k||^2   (> 0)

If the quotient of the two is
  >= rho_0 (e.g. 10^{-4}):          accept x^k + d^k as x^{k+1};
  <  rho_0:                         reject, reduce Delta_k and repeat this step;
  in [rho_0, rho_1]:                keep Delta_{k+1} = Delta_k;
  >= rho_1 (typically 0.75 or so):  increase, Delta_{k+1} = 2*Delta_k.
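To make this concrete, here is a minimal Python sketch of one such trust-region LM step. It assumes for simplicity B_k = I (plain Euclidean norm) and uses a simple bracketing root finder (scipy's brentq) for the zero finding in lambda; all names are illustrative, and the numerically delicate points are ignored.

import numpy as np
from scipy.optimize import brentq

def lm_trust_region_step(F, J, x, Delta, rho0=1e-4, rho1=0.75):
    # One trust-region LM step as described above (sketch only: B_k = I,
    # no scaling, no safeguards).  Returns (x_new, Delta_new).
    Fx = F(x)
    Jx = J(x)
    n = x.size

    # unconstrained linearized problem: minimum norm solution via the SVD
    d = -np.linalg.pinv(Jx) @ Fx

    if np.linalg.norm(d) > Delta:
        # constraint active: find lambda > 0 with ||d(lambda)|| = Delta,
        # where d(lambda) solves the augmented least squares problem
        #   || J_F(x^k)*d       + F(x^k) ||
        #   || sqrt(lambda)*d   + O      ||  = min_d
        def d_of(lam):
            A = np.vstack([Jx, np.sqrt(lam) * np.eye(n)])
            b = np.concatenate([-Fx, np.zeros(n)])
            return np.linalg.lstsq(A, b, rcond=None)[0]

        def phi(lam):
            return np.linalg.norm(d_of(lam)) - Delta

        lam_hi = 1.0
        while phi(lam_hi) > 0.0:       # bracket the root: phi(0) > 0 here,
            lam_hi *= 10.0             # and phi -> -Delta as lambda grows
        lam = brentq(phi, 0.0, lam_hi)
        d = d_of(lam)

    # actual versus predicted reduction of ||F||^2
    ared = Fx @ Fx - F(x + d) @ F(x + d)
    pred = Fx @ Fx - (Fx + Jx @ d) @ (Fx + Jx @ d)
    ratio = ared / pred

    if ratio < rho0:
        return x, Delta / 2.0          # reject the step, shrink Delta
    if ratio >= rho1:
        return x + d, 2.0 * Delta      # very good model agreement: enlarge
    return x + d, Delta                # accept, keep Delta

One would call this in a loop, starting from some x^0 and Delta_0 > 0, until ||J_F(x^k)'*F(x^k)|| is small.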
There is a point here: for a bad fit problem it might be that, due to the neglect of the true second order information in the model (the part sum_i Hessian(F_i(x))*F_i(x) is missing), we never get a quotient > 0.75. If we once get a small Delta_k, it then remains small all the time, and a small Delta_k may force a lambda > 0; but a lambda > 0 produces slow progress (like a pure gradient step), and this in turn will produce a huge number of steps until the constraint becomes inactive.

The zero problem with respect to lambda can be solved in a variety of ways, for example by a special adaptation of Newton's method as proposed by Hebden; I propose simply Brent-Dekker. (There are some subtle points in numerically evaluating the expressions above, which I leave out.) Hence you see: the SVD is indeed helpful here, in order to avoid ill-conditioning which would result in a corrupted direction of correction.

|>
|> Also, could you recommend an on-line source that explains
|> (in layman's terms) how one would go about using SVD to solve
|> a nonlinear least-squares problem without forming the normal
|> equations? And if you solve by that method, do you still get
|> a covariance matrix for error propagation analysis?

If the problem is not "almost linear", i.e. if ||sum_i Hessian(F_i(x))*F_i(x)|| is not small compared to the smallest eigenvalue of J_F(x*)'*J_F(x*), you must compute the Hessian of ||F(x)||^2 at x* anyway, since otherwise your error propagation analysis will be corrupted. That is, J_F'*J_F does not suffice in general. If you want to use it nevertheless, you also get inverse(J_F'*J_F) from the SVD of J_F quite easily: if

  J_F = U*S*V'

with U, V unitary and S of the same shape as J_F, with the singular values of J_F along its diagonal and zero otherwise, then

  inverse(J_F'*J_F) = V*inverse(S'*S)*V'   and   inverse(S'*S) = diag(1/(singular values)^2).

hope that helps
peter
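PS: in case it helps, a small Python sketch of that last formula (names are again only illustrative); it drops negligible singular values so that near rank deficiency of J_F does not blow up the result:

import numpy as np

def covariance_from_svd(Jx, rcond=1e-12):
    # inverse(J_F'*J_F) = V * diag(1/s^2) * V'  from  J_F = U*S*V'
    U, s, Vt = np.linalg.svd(Jx, full_matrices=False)
    keep = s > rcond * s[0]          # guard against (near) rank deficiency
    V = Vt[keep].T
    return V @ np.diag(1.0 / s[keep]**2) @ V.T

Remember the caveat above: this is only the (unscaled) covariance estimate for the "almost linear" case.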