From: Dave Rusin Subject: Re: Q:convex underestimator ?? Date: Mon, 10 May 1999 12:58:37 -0500 (CDT) Newsgroups: [missing] To: musa@nectar.kaist.ac.kr Keywords: approximating functions with quadratics I am sorry about the delay in responding to you. Thank you for giving your last explanation; I think I understand your situation better now. But your problem may not have the answer you want! ================================ Let me repeat your statements. You begin with the function f = x*Sin[9*x] + (9-x^3)/(-6) and the interval I = [-0.2, +1.5]. You wish to compare this f to the quadratic function q = (-0.2-x)*(1.5-x) in two different ways. First, you ask for the minimum A such that f + A q is convex on I. This is easy; you want the minimal A such that f''(x) + 2A >= 0 on I, which means A = max( -f''(x)/2, x in I ) = (-1/2)*min( f''(x), x in I). By my computations this occurs at x=1.5, so we need A = 42.7256179 or so. Second, you ask for the minimum k such that (k q + B) <= (f + A q) on I, for some B. But I don't really understand this. You suggest a value like k=46.2 but I could easily take k=0 ! Of course then we need B = -30 or so. So you haven't really clarified exactly how you decide when one combination of k and B is better than another. Is that your question? "What is a good description of an optimal pair k,B ?" If so, I could make a suggestion, but of course you can adjust it to get many other ideas. Here's how I would set up the question. For any fixed k, your inequality is satisfied for any B <= min( CF - k q ); it seems clear that the "best" B would have B as large as possible consistent with this (to minimize errors), i.e. B is exactly equal to the min. In particular, it is then clear that k q + B will be an "underestimator" for your convex CF, for each k. Now, which k is "best"? One possible idea is to ask for the k which minimizes, say, the L^1 - distance between your convex function and k q + B, that is, minimizes the AREA of the region between the two graphs. Since kq+B is supposed to be less than CF this is just going to be the choice of k which maximizes the integral of kq+B over I. This integral is just equal to - k*(xU-xL)^3/6 + B(xU-xL); dividing by the constant xU-xL and substituting in what B is, we see we are trying to choose k to maximize -k * (xU-xL)^2/6 + min( CF(x) -k*q(x), x in I). If I may write C = (xU-xL)^2/6 and G(x,k) = -k*C - k*q(x) + CF(x), then you are looking for the k and x which are the points where the optimization occurs in the calculation of max_k min_{x in I} G(x,k) . I will show below that the optimal quadratic function in this sense is 41.404319 * q - 2.9483065 But again I must stress that this is _not necessarily_ what you were hoping to calculate; I am trying to _define_ a question for you so that there is a unique answer. You could decide that you want the answer to a different question, but it must be a question which is precise in some way, just as this question is precise. ================================ Let me show you how you can solve your problem in the formulation I have given. You want to choose (x0,k0) to be a critical point in (-0.2, 1.5)x(-oo, oo) for the function G, where "critical" here means what I wrote above. Since we insist that G(x0,k0) <= G(x, k0) for all x in I, we must have dG/dx = 0 there, unless x0 is one of the endpoints of I. Well, the equation dG/dx=0 specifies a curve in the x-k plane; we are restricted to the combinations (x0,k0) which lie on this curve. Actually, we are more restricted than that: for some values of k0 there will be several values of x0 with (x0, k0) on the curve; we must find the choice of these x0 which minimizes G(x0, k0). The best x0 will vary continuously with k0 in general, that is, we will be looking at point in various branches of the curve. The only times when x0 will "jump" on this curve are at points where either the curve does not extend across an interval of k's, or when there are two values of x0 for which (x0,k0) is on the curve and where G takes on the same value. If this is not clear, let me describe a simple example: suppose the equation dG/dx=0 happened to describe the curve k=x^3-x, and that G(x,k)=x-k. Then for most k there is a unique x with (x,k) on the curve, but not for small k, where we have 3 choices for x. We see that among these three, the one making G smallest is the smallest such x. Thus as we vary k continuously from -oo to 2sqrt(3)/9 we find that the value of x (with (x,k) on the curve) which minimizes G varies continuously to -sqrt(1/3); then as k increases beyond that, the optimal x jumps over to +sqrt(1/3) and then increases as k does. This is not a bad picture to keep in mind, although it isn't exactly the picture which applies in your case. It just makes it clearer what happens when you say you need to maximize a function defined on a curve with a discontinuity. OK, now let me return to your example. We have the function G(x,k) = -k*C - k*q(x) + CF(x) and we want to choose the x for every k which minimizes G, and then choose the k which maximizes this. The first step requires we stay on the curve 0 = dG/dx = -k*q'(x) + CF'(x) or else stay on the boundaries x = xU or x = xL. Well, it's not so easy to say exactly how x varies with k, but just as in my previous example, the _curve_ is very easily described nonetheless: it's just the set of points (x,k) with k = CF'(x)/q'(x). For some k there may be no such x, or just one, or several. When there are several, we must choose the one which really minimizes G, but this should change smoothly along the curve except at k's where the number of solutions changes or at which two x's give exactly the same value of G. Then we need to maximize G along this disconnected curve. Since on the curve we just have k = CF'(x)/q'(x), we can determine the values of G along the curve from the x coordinates alone: by substituting, we get G(x,k) = H(x):= CF(x) - k*(C + q(x)) = CF(x) - (CF'(x)/q'(x))*(C+q(x)). The maximum value of this function then occurs either where H'(x) = 0 or at the endpoints of the domain (that is, at the "special" values of x where the curve "jumps" or changes to x=-0.2 or x=1.5). Interestingly, H'(x) = (C + q(x)) * (CF'(x)/q'(x))', which means H'=0 is only possible where q(x) = -C or where dk/dx=0; this last condition means the curve over which we are doing the maximizing has a flat spot. Such points are already of interest since these are places where the optimal x can "jump". Since -C=q(x) is easily checked not to occur for x between xL and xU, this means that the optimal k you seek will be precisely one of the values of k where the optimal x "jumps" from one branch of the curve to another. OK, now I am in position to analyze your particular function. You have CF(x) = f + A q where f = x*Sin[9*x] + (9-x^3)/(-6), q=(-0.2-x)*(1.5-x), and A=42.7256179, approximately. For any k, the optimal x will give a point on the curve k=CF'(x)/q'(x), which in this example varies up until about x=-0.11, then down until x=.13, then up until x=.44, then down to an asymptote at x=.65; on the other side of the asymptote, the curve varies down until about x=1.04, then up until about x=1.4, then down. So for every k>51.8 and every k< 31.5 (approximately) there is a unique x; for intermediate values of k there are several x's which are possible places where G is minimized. I ran through some numerical calculations. It appears to be true that when k < 41.4 or so, the value of x which minimizes G is the one on the branch of the curve between the critical point at x=0.444 and the asymptote at x=.65 . When k is between 41.4 and 45.7, the minimizing value of x is the one between the critical points at x=1.04 and 1=1.40. When k > 45.7, the value of x which makes G minimal is the left endpoint x=-0.2 ! The transition points more precisely are at k=41.404319 G=-22.891387 x jumps from 0.5484492 to 1.209296 k=45.710914 G=-23.323987 x jumps from 1.2595947 to -0.2 Investigating the behaviour of the minimal G for other values of k shows this: as k increases to 41.4, then H:=min(G(x,k), x in I) increases to -22.89; as k increases further to 45.7, H decreases steadily to -23.32. As k increases beyond that, H = G(-0.2, k) decreases linearly to -oo. Thus the largest value for H occurs at k=41.4, with x=0.548 or 1.209 (a tie). Since the values of these numbers are important let me show you how I got them, once I had approximate locations for them. Using Maple, I used steps like these: (recall I am trying to find the value kk of k where the optimal value of G "jumps" from x1 to x2). for i from 0 to 10 do x1:=.54+i/10^3: #start with estimate of first x kk:=eval(subs(x=x1,crv)): #crv = CF'(x)/q'(x) h1:=eval(subs(x=x1,H)): #H = G(x,k) where k=CF'(x)/q'(x) x2:=fsolve(crv=kk,x=1.2..1.3): #numerically compute the other value of x h2:=eval(subs(x=x2,H)): #see if this other x makes H bigger or not lprint(x1,x2,h1,h2,h1-h2,kk):od: #show the answer Then repeat with an additional digit after "0.54", etc. So I have decided that the "best" k according to my criterion is k=41.404319; then B = min( CF - k q ) = CF(x)-k*q(x) (with x=0.5484492 or 1.209296) is B = -2.9483065. Your "best" quadratic underestimator is 41.404319 * q - 2.9483065 By the way, you don't really need to start with a convex function for any of this analysis. You could have started with your original f and concluded that the "best" coefficients r and s with r*q(x) + s <= f(x) for all x in I ("best" in the sense of minimizing the integral of f - (rq+s) ) would be -1.321298 (-.2 - x) (1.5 - x) - 2.9483065 where the integral is only 1.5189. ================================ Now let me return to the problem of what, exactly, is the question we should answer! Another idea is to look for an _arbitrary_ quadratic polynomial q. That is, do not try to force q to have its minimum point at the center of the interval. Then you can make the lowest point of q be precisely at the lowest point (x0, y0) of your convex function, and you still have a one-parameter family of functions q = y0 + k*(x-x0)^2 to choose from. Now it makes sense to ask for the largest k which would keep this q always less than the convex function. Of course that's an easy problem to solve, too: k = max{ (CF(x)-y0)/(x-x0)^2, x in I} In your problem this is about k=67.58375556 with x0=.6106942, y0=-32.697781 This quadratic gives a much better fit near the low points in the graph of CF and a much worse fit near the endpoints. This is probably an easier question to answer, in general. Is it the RIGHT question? That depends on what you want to do with your "quadratic underestimator". The version you proposed would be appropriate if the endpoints xU and xL are important, and the low point of the quadratic is to be exactly between those two. On the other hand, this last version I proposed assumes that the low point not be exactly between xL and xU but is to match the low point of the function exactly. I've also indicated that we can use either integrals or maximum errors as our measure of "best". Which is the right question to ask? That depends on your application. dave