From: parendt@black.nmt.edu (Paul Arendt)
Subject: Re: legendre transformation
Date: 21 Mar 2000 23:42:54 GMT
Newsgroups: sci.math

In article <953663948.850903@marvin>, mark wrote:
>what is the geometric interpretation of the legendre transformation

As it happens, I have an old post (from sci.physics.research) on that topic junking up my disk space. Here it is again...

*************************************************************

The Legendre transformation is used extensively in mechanics (taking us from Lagrangians to Hamiltonians and back) and thermodynamics (relating energy to Helmholtz and Gibbs free energies, and enthalpy). However, few texts show the geometric meaning behind it, so let's look at that now. I'll go into more detail about the Lagrangian and Hamiltonian in a follow-up post; this post is long enough as it is!

Let's imagine that we have a graph of some _convex_ function of one variable (keeping things simple for now), called f(x) for lack of a better term. (By "convex" I mean that if you pick two points on the graph, the straight line connecting them never crosses below the graph. The second derivative of f -- when it exists -- is never negative.)

We can define a new function as follows. Imagine a straight line of slope "p" which we can place somewhere far below the function, so that this line

   y = p x + b

doesn't intersect the given function (this won't always be possible for every slope p). Of course, "b" in the above equation represents the intercept -- the y-value of the point where the line crosses the y-axis.

Now, let's move "b" up, while holding "p" fixed, until the line JUST touches the function. This means the above equation (y = p x + b) and

   y = f(x)

hold simultaneously for the value of x where this happens, which we can call x0. (Note that we might be touching the function on an entire linear interval, so x0 may not be unique.)
This defines a special value for "b" which we'll call "b0", again for lack of a better term. It should be clear that b0 will depend on both the function f(x) and the slope p. In fact, we can solve the above equations for b0 very easily, by eliminating y:

   p x0 + b0 = f(x0)      (for any of the values x0 where f(x) and the line touch)

so

   b0 = f(x0) - p x0 .

Now, we can repeat this for every value of "p" which makes sense. (Don't worry; some examples will be given very soon!) Then we have a "map" from p to "b0", which is unique under the assumptions outlined above. Given a value of p, either there is no line of that slope which stays below the function, or we can find a unique b0.

What does one of these pairs (p,b0) tell us about the function? Answer: not very much! All we know for sure is that f(x) doesn't go below the line y = p x + b0, and that it touches this line in at least one point.

However, the values (p,b0) for _every_ possible value of p tell us a _lot_ about f(x)!! Let's graph all of these values: they define a new function, which we can call g(p). Our new function g(p) is only defined for the values of p for which the above procedure works.

The new function g(p) is called the Legendre transform of the function f(x). Note that it depends on a different variable.

Time for some examples! The easiest example is a straight line:

   f(x) = m x + b

We can't draw any straight lines which don't cross this unless they have the same slope (non-Euclidean geometries aside!). So we draw a line of slope p = m, and move it up until it touches this line. The Legendre transform of f(x) = m x + b is thus the single point (m,b) or

          { b,          if p = m
   g(p) = {
          { undefined,  otherwise

Let's try a slightly harder example: two half-lines meeting at a point. f(x) looks like this (use a fixed-width font to view this!):

                 y
                 |              / f(x)
                 |             /
                 |            /
                 |           / slope p2
                 |          /
                 |         /
-----------------|--------/------------ x
                 |       /
                 |     .' x*
                 |   .'
                 | .'
           b1___ |'
               .'|  .
             .'  | .
  slope p1 .'    |.
         .'      |---b2
                .|

          { p1 x + b1,   x < x*
   f(x) = {
          { p2 x + b2,   x > x*

Now the Legendre transform clearly has the points (p1,b1) and (p2,b2) on it, but it also has an infinitude of points (p,b) with p1 < p < p2 and b1 > b > b2. These correspond to all lines which can be drawn touching the graph at x = x*, while not crossing the function.

It's not hard to convince oneself that these points lie on a straight line in (p,b) space! All the lines we can draw touching at x* satisfy

   f(x*) = p x* + b

Since g(p) = b in this equation, we have a line segment in p-space, with slope -x* and an intercept which (if the segment were extended to p = 0) would be at f(x*):

   g(p) = - x* p + f(x*)

In fact, the Legendre transform of f looks like:

         b
         |       p1      p2
         |       |       |
---------|------------------------------ p
         |
         |
     b1--|       ._
         |         `._      g(p), with slope -x*
         |            `._
     b2--|               `

This g(p) is just a line _segment_, with endpoints corresponding to the two half-lines in f(x). The segment in between comes from the point x*, where we found a whole range of slopes that touched. This is a general feature of Legendre transformations: points and lines are "dual" to one another!

Now, we can take the Legendre transform of g(p) again, and then the transform of that, etc. etc. and conceivably construct an infinite series of convoluted transforms of f(x). The remarkable thing is that this series has only two distinct members: the L.T. of g(p) gives back f(x) again.

Well, almost... because of the funny minus sign above, we actually get f(-x). In physics, the L.T. is usually minus the one we have defined above, so that we keep track of the pairs (p,-b), where b is the intercept of the line with slope p. But I didn't want to introduce that before showing the motivation behind it.

We thus see a pattern emerging: a single point of the L.T. tells us little of the original function, but the entire L.T. tells us the whole function, as the "envelope of tangents" to the original function. It's good to stop and think about this for a while.

Now, things get interesting when f is differentiable!
Then we can proceed further with the formula for the Legendre transformation, because the lines we're drawing will be tangent to the graph of f(x). Instead of _choosing_ "p", then moving its line up until it touches the graph of f(x) somewhere, we can pick an x, and take the derivative of f there to _find_ p corresponding to that x. Thus:

   p = df/dx      (partial derivative, if f has other variables)

This equation needs a set of instructions to go with it: for a given x0, we evaluate df/dx at the point x0. This gives a "p" corresponding to that x0. But the same p could be found for other values of x besides x0, as the linear f(x) examples above show. So the map from "x" to "p" is not necessarily 1-1. However, the graph g(p) for _all_ p is enough to know f(x) for _all_ x, so we don't lose any information.

Thus the Legendre transform of a differentiable function looks like

   g(p) = x p - f(x)

where we've inserted the overall minus sign, so g(p) is _minus_ the intercept of the line with slope p which touches the graph of f(x).

This formula also should have instructions. We don't pick a value of x and a value of p to find g, despite the appearance of the equation! Instead, we either:

- Pick a value of x=x0. Find df/dx at x0, and call it p. This gives one point (p,g(p)) on the graph of the Legendre transform of f(x). Doing this for all values of x will produce the entire graph of g(p). So x is the independent variable in this case, and p is a function of x. If we don't restrict to differentiable f's, we might get a whole slew of "p's" for that x, so it isn't necessarily a function.

or:

- Pick a value of p. We must use the "move the line up" procedure to find the value(s) of x which go in the above equation. However, we don't really need to find them: we just read off the intercept of the line when it first touches f(x). In this case, the appearance of "x" above is a bit spurious: x can be found given a "p", but we might get a whole slew of "x's", so this isn't necessarily a function. After g(p) has been constructed, this latter procedure is equivalent to just looking at the graph of g(p) and reading off the value of g(p) for a _given_ value of p, so p is the independent variable.

Hope this makes things less confusing. Now for some fun...

If we relax the condition that f be a convex function, and allow for concave functions, multivalued graphs, and the like, we can define Legendre transformations of "shapes" if we're careful. (Just gotta make sure the envelope of tangents defines the shape we're looking at; this can be done by inspection more easily than by giving a bunch of scary-looking conditions.)

As exercises, try to show that:

- The Legendre transform of a circle is a hyperbola

- The L.T. of a parabola is a parabola

This last fact is of great importance in nonrelativistic classical mechanics. For relativistic mechanics, we need to see that:

- The L.T. of f(x) = sqrt(1-x^2) (defined on the interval -1 < x < 1) is g(p) = sqrt(1+p^2)

Next time, we'll see why we take the Legendre transform of the Lagrangian with respect to v = dx/dt, and not with respect to x, to get the Hamiltonian. We'll also see why the Lagrangian is a function on the tangent bundle of configuration space, and the Hamiltonian a function on the cotangent bundle.
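
P.S. For anyone who would rather check a few of the claims above numerically before proving them, here is a minimal Python sketch. It is only an illustration under stated assumptions: the helper names (tangent_point, lowest_touching_intercept), the sample functions and the numbers chosen for the two half-lines are all made up for this purpose; the derivative is estimated by a central difference rather than computed exactly; and "moving the line up" is approximated by a minimum over a finite grid of x values. It tracks the intercept pairs (p, b), not the physics convention (p, -b).

```python
import math


def tangent_point(f, x, h=1e-6):
    """Return (p, b): slope and y-intercept of the tangent line to f at x.

    Assumption of this sketch: p = df/dx is estimated with a central
    difference; substitute the exact derivative if you have one.
    """
    p = (f(x + h) - f(x - h)) / (2 * h)
    b = f(x) - p * x  # b0 = f(x0) - p x0
    return p, b


def lowest_touching_intercept(f, p, xs):
    """'Move the line up' for a convex f, sampled on the grid xs.

    The line y = p x + b stays on or below f exactly when
    b <= f(x) - p x for every x, so the touching value b0 is the minimum.
    """
    return min(f(x) - p * x for x in xs)


def parabola(x, a=0.5):
    return a * x * x


def semicircle(x):
    return math.sqrt(1.0 - x * x)


# Two half-lines meeting at x* = 1 (hypothetical numbers, with p1 < p2).
P1, B1, P2, B2, X_STAR = 0.5, -1.5, 2.0, -3.0, 1.0


def kinked(x):
    return max(P1 * x + B1, P2 * x + B2)


# Parabola f = a x^2: the tangent intercepts satisfy b = -p^2 / (4 a),
# which is again a parabola, now in the variable p.
for x in (-2.0, -0.5, 0.0, 1.0, 3.0):
    p, b = tangent_point(parabola, x)
    assert abs(b + p * p / (4 * 0.5)) < 1e-6

# Semicircle f = sqrt(1 - x^2): the intercepts satisfy b = sqrt(1 + p^2).
for x in (-0.9, -0.3, 0.0, 0.6, 0.8):
    p, b = tangent_point(semicircle, x)
    assert abs(b - math.sqrt(1.0 + p * p)) < 1e-6

# Kinked f: for p1 < p < p2 the line touches at x*, so b0 = f(x*) - x* p.
grid = [i / 100 for i in range(-500, 501)]  # contains x* = 1.0 exactly
for p in (0.75, 1.0, 1.5):
    b0 = lowest_touching_intercept(kinked, p, grid)
    assert abs(b0 - (kinked(X_STAR) - X_STAR * p)) < 1e-12

print("all checks passed")
```

The first helper is the "pick an x, find p" recipe and the second is the "pick a p, move the line up" recipe, so the two bullets above can be compared side by side on any convex function you care to try.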