From: Dave Rusin <rusin@math.niu.edu>
Subject: Re:  Q:convex underestimator ??
Date: Mon, 10 May 1999 12:58:37 -0500 (CDT)
Newsgroups: [missing]
To: musa@nectar.kaist.ac.kr
Keywords: approximating functions with quadratics

I am sorry about the delay in responding to you. Thank you for giving your
last explanation; I think I understand your situation better now. But your 
problem may not have the answer you want!

                ================================

Let me repeat your statements. You begin with the function
    f = x*Sin[9*x] + (9-x^3)/(-6)
and the interval
    I = [-0.2, +1.5].

You wish to compare this  f  to the quadratic function
    q = (-0.2-x)*(1.5-x)
in two different ways.

First, you ask for the minimum  A  such that   f + A q  is convex on  I.
This is easy; you want the minimal  A  such that  f''(x) + 2A >= 0  on  I,
which means  A = max( -f''(x)/2, x in I ) = (-1/2)*min( f''(x), x in I).
By my computations this occurs at  x=1.5, so we need  A = 42.7256179 or so.

Second, you ask for the minimum  k  such that  (k q + B) <= (f + A q)  on I,
for some  B. But I don't really understand this. You suggest a value like
k=46.2 but I could easily take  k=0 ! Of course then we need  B = -30 or so.

So you haven't really clarified exactly how you decide when one combination
of  k  and  B  is better than another.

Is that your question? "What is a good description of an optimal pair  k,B ?"
If so, I could make a suggestion, but of course you can adjust it to get 
many other ideas. Here's how I would set up the question.

For any fixed  k, your inequality is satisfied for any  B <= min( CF - k q );
it seems clear that the "best"  B  would have  B  as large as possible
consistent with this (to minimize errors), i.e. B is exactly equal to the min.
In particular, it is then clear that  k q + B  will be an "underestimator"
for your convex  CF, for each  k.  Now, which  k  is "best"? One possible
idea is to ask for the  k  which minimizes, say, the  L^1 - distance
between your convex function and  k q + B, that is, minimizes the AREA of
the region between the two graphs. Since  kq+B  is supposed to be less than
CF  this is just going to be the choice of  k  which maximizes the integral
of  kq+B  over  I. This integral is just equal to  - k*(xU-xL)^3/6 + B(xU-xL);
dividing by the constant  xU-xL  and substituting in what  B  is, we see we
are trying to choose  k  to maximize
    -k * (xU-xL)^2/6 + min( CF(x) -k*q(x),  x in I).
If I may write  C = (xU-xL)^2/6  and  G(x,k) = -k*C - k*q(x) + CF(x),
then you are looking for the  k  and  x  which are the points where the
optimization occurs in the calculation of
	max_k min_{x in I}  G(x,k) .

I will show below that the optimal quadratic function in this sense is
	41.404319 * q  - 2.9483065
But again I must stress that this is _not necessarily_ what you were hoping
to calculate; I am trying to _define_ a question for you so that there is
a unique answer. You could decide that you want the answer to a different
question, but it must be a question which is precise in some way, just
as this question is precise.

                ================================

Let me show you how you can solve your problem in the formulation I have
given.

You want to choose  (x0,k0)  to be a critical point in  (-0.2, 1.5)x(-oo, oo)
for the function  G,  where "critical" here means what I wrote above.
Since we insist that  G(x0,k0) <= G(x, k0)  for all  x  in I, we must have
dG/dx = 0 there, unless  x0  is one of the endpoints of  I. Well, the
equation  dG/dx=0  specifies a curve in the  x-k plane; we are
restricted to the combinations (x0,k0)  which lie on this curve.

Actually, we are more restricted than that: for some values of  k0  there
will be several values of  x0  with  (x0, k0) on the curve; we must find
the choice of these  x0  which minimizes  G(x0, k0). The best  x0  will
vary continuously with  k0  in general, that is, we will be looking at
point in various branches of the curve. The only times when  x0  will
"jump" on this curve are at points where either the curve does not
extend across an interval of  k's, or when there are two values of  x0
for which (x0,k0) is on the curve and where  G  takes on the same value.

If this is not clear, let me describe a simple example: suppose the equation
dG/dx=0  happened to describe the curve  k=x^3-x, and that  G(x,k)=x-k.
Then for most  k  there is a unique  x  with  (x,k) on the curve, but not
for small k, where we have 3 choices for  x. We see that among these
three, the one making  G  smallest is the smallest such  x.  Thus as we
vary  k  continuously from  -oo  to  2sqrt(3)/9  we find that the value of
x  (with  (x,k) on the curve) which minimizes  G  varies continuously
to  -sqrt(1/3); then as  k  increases beyond that, the optimal  x
jumps over to  +sqrt(1/3) and then increases as  k  does.

This is not a bad picture to keep in mind, although it isn't exactly the
picture which applies in your case. It just makes it clearer what happens
when you say you need to maximize a function defined on a curve with
a discontinuity.

OK, now let me return to your example. We have the function
	G(x,k) = -k*C - k*q(x) + CF(x)
and we want to choose the  x  for every  k  which minimizes  G, and
then choose the  k  which maximizes this. The first step requires we
stay on the curve  0 = dG/dx = -k*q'(x) + CF'(x)  or else stay on the 
boundaries  x = xU  or  x = xL.  Well, it's not so easy to say exactly
how  x  varies with  k, but just as in my previous example, the _curve_
is very easily described nonetheless: it's just the set of points (x,k) with
k = CF'(x)/q'(x). For some  k  there may be no such x, or just one, or
several. When there are several, we must choose the one which really
minimizes  G, but this should change smoothly along the curve except at k's
where the number of solutions changes or at which two x's give exactly
the same value of  G.

Then we need to maximize  G  along this disconnected curve. Since on the
curve we just have  k = CF'(x)/q'(x),  we can determine the values of  G
along the curve from the  x  coordinates alone: by substituting, we get
	G(x,k) = H(x):= CF(x) - k*(C + q(x)) = CF(x) - (CF'(x)/q'(x))*(C+q(x)).
The maximum value of this function then occurs either where  H'(x) = 0  or
at the endpoints of the domain (that is, at the "special" values of  x 
where the curve "jumps" or changes to x=-0.2 or x=1.5).

Interestingly,  H'(x) = (C + q(x)) * (CF'(x)/q'(x))', which means  H'=0
is only possible where  q(x) = -C  or where  dk/dx=0; this last condition
means the curve over which we are doing the maximizing has a flat spot.
Such points are already of interest since these are places where the
optimal  x  can "jump". Since  -C=q(x)  is easily checked not to occur
for  x  between  xL  and  xU, this means that the optimal  k  you
seek will be precisely one of the values of  k  where the optimal  x
"jumps" from one branch of the curve to another.


OK, now I am in position to analyze your particular function. You have
CF(x) =  f + A q  where  f = x*Sin[9*x] + (9-x^3)/(-6),  q=(-0.2-x)*(1.5-x),
and A=42.7256179, approximately. For any  k, the optimal  x  will give a
point on the curve  k=CF'(x)/q'(x), which in this example varies up until
about x=-0.11, then down until x=.13, then up until x=.44, then down to
an asymptote at x=.65; on the other side of the asymptote, the curve varies
down until about x=1.04, then up until about x=1.4, then down. So for
every  k>51.8 and every k< 31.5 (approximately) there is a unique  x;
for intermediate values of  k  there are several  x's  which are possible
places where  G  is minimized.

I ran through some numerical calculations. It appears to be true that when
k < 41.4 or so, the value of  x  which minimizes  G  is the one on the
branch of the curve between the critical point at x=0.444 and the 
asymptote at x=.65 . When  k  is between 41.4 and 45.7, the minimizing
value of  x  is the one between the critical points at x=1.04 and 1=1.40.
When k > 45.7, the value of  x  which makes  G  minimal is the left
endpoint  x=-0.2 ! The transition points more precisely are at
	k=41.404319  G=-22.891387  x jumps from 0.5484492 to 1.209296
	k=45.710914  G=-23.323987  x jumps from 1.2595947 to -0.2

Investigating the behaviour of the minimal  G  for other values of  k
shows this: as  k  increases to  41.4, then  H:=min(G(x,k), x in I) increases
to -22.89; as  k  increases further to 45.7, H decreases steadily to -23.32.
As k increases beyond that, H = G(-0.2, k) decreases linearly to -oo.

Thus the largest value for  H  occurs at  k=41.4, with x=0.548 or 1.209
(a tie).

Since the values of these numbers are important let me show you how I
got them, once I had approximate locations for them.  Using Maple, I used steps
like these: (recall I am trying to find the value  kk  of  k  where the
optimal value of  G  "jumps" from x1 to x2).
 for i from 0 to 10 do x1:=.54+i/10^3:  #start with estimate of first x
 kk:=eval(subs(x=x1,crv)):  #crv = CF'(x)/q'(x)
 h1:=eval(subs(x=x1,H)):    #H = G(x,k) where k=CF'(x)/q'(x)
 x2:=fsolve(crv=kk,x=1.2..1.3): #numerically compute the other value of  x
 h2:=eval(subs(x=x2,H)):    #see if this other  x  makes  H  bigger or not
 lprint(x1,x2,h1,h2,h1-h2,kk):od:  #show the answer
Then repeat with an additional digit after  "0.54", etc.

So I have decided that the "best" k according to my criterion is 
k=41.404319; then  B = min( CF - k q ) = CF(x)-k*q(x) (with x=0.5484492 or
1.209296) is  B = -2.9483065.  Your "best" quadratic underestimator is
	41.404319 * q  - 2.9483065

By the way, you don't really need to start with a convex function for any
of this analysis. You could have started with your original  f  and
concluded that the "best" coefficients   r  and  s  with
r*q(x) + s <= f(x)  for all  x  in I  ("best" in the sense of minimizing
the integral of  f - (rq+s)  )  would be
	-1.321298 (-.2 - x) (1.5 - x) - 2.9483065
where the integral is only 1.5189.

                ================================

Now let me return to the problem of what, exactly, is the question we
should answer!

Another idea is to look for an _arbitrary_ quadratic polynomial  q.
That is, do not try to force  q  to have its minimum point at the center of
the interval. Then you can make the lowest point of  q  be precisely at the
lowest point  (x0, y0) of your convex function, and you still have a 
one-parameter family of functions  q = y0 + k*(x-x0)^2  to choose from. Now it
makes sense to ask for the largest  k  which would keep this  q  always
less than the convex function.  Of course that's an easy problem to solve, too:
     k = max{ (CF(x)-y0)/(x-x0)^2,  x in  I}
In your problem this is about k=67.58375556 with x0=.6106942, y0=-32.697781
This quadratic gives a much better fit near the low points in the graph of CF
and a much worse fit near the endpoints.

This is probably an easier question to answer, in general. Is it the
RIGHT question? That depends on what you want to do with your 
"quadratic underestimator". The version you proposed would be appropriate
if the endpoints  xU  and  xL  are important, and the low point of the
quadratic is to be exactly between those two. On the other hand, this
last version I proposed assumes that the low point not be exactly between
xL and xU but is to match the low point of the function exactly.
I've also indicated that we can use either integrals or maximum errors
as our measure of "best".

Which is the right question to ask? That depends on your application.

dave