From: parendt@black.nmt.edu (Paul Arendt)
Subject: Re: legendre transformation
Date: 21 Mar 2000 23:42:54 GMT
Newsgroups: sci.math

In article <953663948.850903@marvin>,
mark <mark.vanloock@student.kuleuven.ac.be> wrote:
>what is the geometric interpretation of the legendre transformation

As it happens, I have an old post (from sci.physics.research)
on that topic junking up my disk space.  Here it is again...

*************************************************************

The Legendre transformation is used extensively in mechanics
(taking us from Lagrangians to Hamiltonians and back) and
thermodynamics (relating energy to Helmholtz and Gibbs free
energies, and enthalpy).  However, few texts show the geometric
meaning behind it, so let's look at that now.  I'll go into
more detail about the Lagrangian and Hamiltonian in a follow-up
post; this post is long enough as it is!

Let's imagine that we have a graph of some _convex_ function of
one variable (keeping things simple for now), called f(x)
for lack of a better term.  (By "convex" I mean that if you pick
two points on the function, the straight line connecting them never
crosses below the function.  The second derivative of f -- when it
exists -- is never negative.)  We can define a new function as
follows.  Imagine a straight line of slope "p" which we can place
somewhere far below the function, so that this line

   y = p x + b

doesn't intersect the given function (this won't always be
possible for every slope p).  Of course, "b" in the above
equation represents the intercept -- the y-value of the
point where the line crosses the y-axis.  Now, let's move
"b" up, while holding "p" fixed, until the line JUST touches
the function.  This means the above equation  (y = p x + b)
and

   y = f(x)

hold simultaneously for the value of x where this happens,
which we can call x0.  (Note that we might be touching the function
on an entire linear interval, so x0 may not be unique.)

This defines a special value for "b" which we'll call "b0",
again for lack of a better term.  It should
be clear that b0 will depend on both the function f(x) and the
slope p.  In fact, we can solve the above equations for b0 very
easily, by eliminating y:

  p x0 + b0 = f(x0)

(for any of the values x0 where f(x) and the line touch) so

b0 = f(x0) - p x0  .

Now, we can repeat this for every value of "p" which makes sense.
(Don't worry; some examples will be given very soon!)  Then we
have a "map" from p to "b0", which is unique under the assumptions
outlined above.  Given a value of p, we either have no line which
doesn't cross the function or we can find a unique b0.
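
For anyone who would rather see the "move the line up" procedure in
code, here is a minimal sketch.  It assumes we only sample f on a finite
grid of x values, so it is an approximation, and it only makes sense for
slopes p where the touching point lies inside the grid.  The names
intercept, xs and the choice f(x) = x^2 are mine, purely for illustration.

```python
def intercept(f, p, xs):
    """Slide the line y = p*x + b up until it touches f on the grid xs.

    Returns (b0, x0): the intercept at first contact, and one grid
    point x0 where the line touches the function.
    """
    # The line stays on or below f exactly when b <= f(x) - p*x for
    # every x, so the largest allowed b is the minimum of f(x) - p*x.
    x0 = min(xs, key=lambda x: f(x) - p * x)
    return f(x0) - p * x0, x0


# Illustrative choice: f(x) = x**2 sampled on [-5, 5] in steps of 0.001.
xs = [i / 1000 for i in range(-5000, 5001)]
f = lambda x: x * x

b0, x0 = intercept(f, 2.0, xs)
print(b0, x0)
```

For slope p = 2 the line touches the parabola at x0 = 1, where
b0 = f(1) - 2*1 = -1.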

What does one of these pairs (p,b0) tell us about the function?
Answer: not very much!  All we know for sure is that f(x)
doesn't go below the line  y = p x + b0, and that it touches
this line in at least one point.

However, the values (p,b0) for _every_ possible value of p tell
us a _lot_ about f(x)!!  Let's graph all of these values: they define
a new function, which we can call g(p).  Our new function g(p)
is only defined for the values of p for which the above procedure
works.  The new function g(p) is called the Legendre transform of
the function f(x).  Note that it depends on a different variable.

Time for some examples!  The easiest example is a straight line:

 f(x) = m x + b

We can't draw any straight lines which don't cross this unless
they have the same slope (non-Euclidean geometries aside!).  So
we draw a line of slope p = m, and move it up until it touches this
line.  The Legendre transform of f(x) = m x + b
is thus the single point (m,b) or

        { b,         if p = m
g(p) =  { undefined, otherwise 

Let's try a slightly harder example: two half-lines meeting at
a point.  f(x) looks like this (use a fixed-width font to view
this!):

                 y
                 
                 |              / f(x)
                 |             / 
                 |            / 
                 |           /slope p2
                 |          / 
                 |         / 
-----------------|--------/------------ x
                 |       /
                 |     .'x*
                 |   .'
                 | .'  
         b1___   |'    
               .'|  .
             .'  | .
   slope p1.'    |. 
         .'      | ---b2
                .|

 f(x) = {  p1 x + b1,  x <= x*
        {  p2 x + b2,  x >  x*

Now the Legendre transform clearly has the points (p1,b1)
and (p2,b2) on it, but it also has an infinitude of points
(p,b) with p1 < p < p2 and b1 > b > b2.  These correspond
to all lines which can be drawn tangent to the graph at
x = x*, while not crossing the function.  It's not hard
to convince oneself that these points lie on a straight
line in (p,b) space!  All the lines we can draw tangent
at x* satisfy

f(x*) = p x* + b

Since g(p) = b in this equation, we have a line segment
in p-space, with slope -x* and intercept which would be
at f(x*):

g(p) = - x* p + f(x*)

In fact, the Legendre transform of
f looks like:

         b

         |     p1      p2
         |     |       |
---------|------------------------------ p
         |
         |
     b1--|     ._
         |       `._ g(p), with slope -x*
         |          `._ 
     b2--|             `


This g(p) is just a line _segment_, with endpoints
corresponding to the two half-lines in f(x).  The interior
of the segment comes from the single point x*, where we
found a whole range of slopes that touched.  This is
a general feature of Legendre transformations: points
and lines are "dual" to one another!
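
The two-half-line example can be checked numerically.  The sketch below
is only an illustration: the numbers p1, b1, p2, b2 are made up, the
function is sampled on a finite grid, and g is evaluated only for slopes
between p1 and p2 (outside that range the transform is undefined).

```python
p1, b1 = 0.5, 1.0      # left half-line   y = p1 x + b1
p2, b2 = 2.0, -2.0     # right half-line  y = p2 x + b2
x_star = (b1 - b2) / (p2 - p1)   # the kink, where the two lines meet


def f(x):
    return p1 * x + b1 if x < x_star else p2 * x + b2


def g(p, xs):
    """Intercept of the line of slope p after sliding it up to touch f."""
    return min(f(x) - p * x for x in xs)


xs = [i / 100 for i in range(-1000, 1001)]   # grid on [-10, 10]
slopes = [p1 + k * (p2 - p1) / 10 for k in range(11)]

# Compare against the straight line g(p) = -x* p + f(x*) from the text.
worst = max(abs(g(p, xs) - (-x_star * p + f(x_star))) for p in slopes)
print(worst)
```

The printed discrepancy should be at the level of rounding error, and
the endpoints of the segment come out as (p1,b1) and (p2,b2).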

Now, we can take the Legendre transform of g(p) again,
and then the transform of that, etc. etc. and conceivably
construct an infinite series of convoluted transforms of
f(x).  The remarkable thing is that this series has only
two distinct members: the L.T. of g(p) gives back f(x) again.
Well, almost... because of the funny minus sign above,
we actually get f(-x).  In physics, the L.T. is usually
minus the one we have defined above, so that we keep track
of the pairs (p,-b), where b is the intercept of the line
with slope p.  But I didn't want to introduce that before
showing the motivation behind it.

We thus see a pattern emerging: a single point of the
L.T. tells us little of the original function, but the
entire L.T. tells us the whole function, as the "envelope
of tangents" to the original function.  It's good to stop
and think about this for a while.

Now, things get interesting when f is differentiable!  Then
we can proceed further with the formula for the Legendre
transformation, because the lines we're drawing will be
tangent to the graph of f(x).  Instead of _choosing_ "p",
then moving its line up until it touches the graph of f(x)
somewhere, we can pick an x, and take the derivative of f
there to _find_ p corresponding to that x.  Thus:

p = df/dx  (partial derivative, if f has other variables)

This equation needs a set of instructions to go with it: for
a given x0, we evaluate df/dx at the point x0.  This gives a "p"
corresponding to that x0.  But the same p could be found for
other values of x besides x0, as the linear f(x) examples above
show. So the map from "x" to "p" is not necessarily 1-1.
However, the graph g(p) for _all_ p is enough to know f(x) for
_all_ x, so we don't lose any information.

Thus the Legendre transform of a differentiable function looks
like

g(p) = x p - f(x)

where we've inserted the overall minus sign, so g(p) is _minus_
the intercept of the line with slope p which touches the graph
of f(x).

This formula also should have instructions.  We don't pick
a value of x and a value of p to find g, despite the appearance
of the equation!  Instead, we either:

- Pick a value of x=x0.  Find df/dx at x0, and call it p.  This
gives one point (p,g(p)) on the graph of the Legendre transform
of f(x).  Doing this for all values of x will produce the entire
graph of g(p).  So x is the independent variable in this case, and
p is a function of x.  If we don't restrict to differentiable f's,
we might get a whole slew of "p's" for that x, so it isn't
necessarily a function.

or:

- Pick a value of p.  We must use the "move the line up" procedure
to find the value(s) of x which go in the above equation.  However,
we don't really need to find them: we just read off the intercept of
the line when it first touches f(x).  In this case, the appearance of
"x" above is a bit spurious: x can be found given a "p", but we might
get a whole slew of "x's", so this isn't necessarily a function.

After g(p) has been constructed, this latter procedure is equivalent
to just looking at the graph of g(p) and reading off the value of g(p)
for a _given_ value of p, so p is the independent variable.
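
The two sets of instructions above can be put side by side in a short
sketch.  Assumptions, all mine: f(x) = exp(x) is just a convenient
differentiable convex function, the second route samples x on a finite
grid, and the function names are made up for the illustration.

```python
import math


def transform_from_x(f, dfdx, x):
    """Route 1: pick x, read off p = f'(x), return the point (p, g(p))."""
    p = dfdx(x)
    return p, x * p - f(x)


def transform_from_p(f, p, xs):
    """Route 2: pick p and slide the line up; g(p) is minus the intercept.

    The intercept at first contact is min(f(x) - p*x), so minus the
    intercept is max(p*x - f(x)).
    """
    return max(p * x - f(x) for x in xs)


# Illustrative choice: f(x) = exp(x), so f'(x) = exp(x) as well.
xs = [i / 100 for i in range(-300, 301)]

p, g_route1 = transform_from_x(math.exp, math.exp, 0.0)   # x = 0 gives p = 1
g_route2 = transform_from_p(math.exp, p, xs)
print(p, g_route1, g_route2)
```

Both routes land on the same point of the graph of g.  For this f the
transform also has the closed form g(p) = p ln p - p, which at p = 1
gives -1.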

Hope this makes things less confusing.  Now for some fun...

If we relax the condition that f be a convex function, and allow for
concave functions, multivalued graphs, and the like, we can define
Legendre transformations of "shapes" if we're careful.  (Just gotta
make sure the envelope of tangents defines the shape we're looking at;
this can be done by inspection more easily than by giving a bunch of
scary-looking conditions.)

As exercises, try to show that:

- The Legendre transform of a circle is a hyperbola
- The L.T. of a parabola is a parabola

This last fact is of great importance in nonrelativistic classical
mechanics.  For relativistic mechanics, we need to see that:

- The L.T. of f(x) = sqrt(1-x^2) (defined on the interval -1 < x < 1)
  is g(p) = sqrt(1+p^2)
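
A quick numerical sanity check of this last claim, as a sketch only.
It assumes the "intercept" sign convention from the first half of the
post (g is the y-intercept of the tangent line, with no extra minus
sign); with the physics convention the same g(p) comes from
f(x) = -sqrt(1-x^2).  The helper name tangent_intercept is made up.

```python
import math


def tangent_intercept(x):
    """Slope and y-intercept of the tangent to f(x) = sqrt(1 - x**2) at x."""
    f = math.sqrt(1.0 - x * x)
    p = -x / f          # f'(x)
    b = f - p * x       # the tangent line is y = p*x + b
    return p, b


samples = [tangent_intercept(i / 100) for i in range(-95, 96)]
max_err = max(abs(b - math.sqrt(1.0 + p * p)) for p, b in samples)
print(max_err)
```

Since b = sqrt(1+p^2) means b^2 - p^2 = 1, the same check shows the
tangents to the upper half of the unit circle tracing out one branch of
a hyperbola in the (p,b) plane.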

Next time, we'll see why we take the Legendre transform of the
Lagrangian with respect to v = dx/dt, and not with respect to
x, to get the Hamiltonian.  We'll also see why the Lagrangian
is a function on the tangent bundle of configuration space, and
the Hamiltonian a function on the cotangent bundle.