From: spellucci@mathematik.tu-darmstadt.de (Peter Spellucci)
Subject: Re: Numerical integration: a question
Date: 13 Apr 2000 09:25:38 GMT
Newsgroups: sci.math.num-analysis

In article <38F556B7.4AD3080E@NoSPAMeecs.umich.edu>,
 Thomas Kragh writes:
|> If your data is given to you on a uniformly-sampled set of points, and
|> you do not have any other information about the function, I would say
|> that an iterated Simpson's Rule is probably your best bet.
|>
|> Note that the "standard" Simpson's rule is >exact< for polynomials up to
|> 3rd-power, so fitting a cubic spline is a waste of time - the cubic
|> polynomial fit is "built into" the numerical integration algorithm
|> already.

This is not completely correct. Think about what Simpson's rule does: it
interpolates three consecutive points by a parabola, integrates this
exactly and sums up. By chance, if these three points are from a cubic,
it integrates this cubic exactly (because of the symmetry of weights and
nodes with respect to the midpoint of the interval).

Now, what does the questioner's code do? It interpolates the data
globally and obtains a cubic b e t w e e n  a n y  t w o grid points,
evaluates this piecewise cubic on a refined grid with half the step size,
and integrates that.

If the data are indeed smooth, the order of the error is O(h^4) in both
cases. But... assume that his data are subject to some (hopefully small)
errors. Then he can use a smoothing spline, do exactly the same thing,
and get a much more meaningful result than by simply applying Simpson's
rule to the raw data.

His question, whether there exists some "better" method, is hard to
answer. In principle one can use either piecewise integration by
higher-order Newton-Cotes formulae, or integration of an interpolating
spline of higher order (no problem to compute such), or smoothing splines
of higher order (also no problem in principle, but are there ready-to-use
codes out there, say for a fifth- or seventh-degree smoothing spline?).
But all this makes sense only if the errors in the data are very small,
ideally zero, a n d  t h e  h i g h e r  d e r i v a t i v e s of the
function underlying all this grow more slowly in magnitude, for order k,
than (1/h)^k, where h is his grid size. For smooth data and
high-precision arithmetic, one could decide that on the basis of the
higher-order divided differences of the data, but for data subject to
some noise this makes no sense.

Hope that helps,
Peter
==============================================================================

From: "r.e.s."
Subject: Simpson's paradox, continuous-case
Date: Sun, 16 Apr 2000 10:57:38 -0700
Newsgroups: sci.math,sci.stat.math

A general form of "Simpson's paradox" is as follows:

Random variables X,Y,Z, are distributed such that,
for y,z in the support of Y,Z,

   E(X|Y=y) is strictly increasing in y,
   but, for every z,
   E(X|Y=y,Z=z) is strictly decreasing in y.

(The distribution of (X,Y,Z) could be discrete,
continuous, or mixed.)

Are there especially nice & simple examples of this
in which E(X|Y=y) and E(X|Y=y,Z=z) are *continuous*
functions?

--r.e.s.
==============================================================================

From: Rich Ulrich
Subject: Re: Simpson's paradox, continuous-case
Date: Mon, 17 Apr 2000 16:16:53 -0400
Newsgroups: sci.math,sci.stat.math

On Sun, 16 Apr 2000 10:57:38 -0700, "r.e.s." wrote:

> A general form of "Simpson's paradox" is as follows:
>
> Random variables X,Y,Z, are distributed such that,
> for y,z in the support of Y,Z,
>
< snip; x is correlated with y;
controlling for z, x is negatively correlated with y. >

> Are there especially nice & simple examples of this
> in which E(X|Y=y) and E(X|Y=y,Z=z) are *continuous*
> functions?

The number of firemen attending to a fire and the cost of
damages are pretty-much continuous variables. Before you
control for seriousness, it does appear that firemen are
responsible for damages.

Simpson's paradox is also known as "Ecological Fallacy,"
and I think that the examples under that name may be more
apt to be continuous.

--
Rich Ulrich, wpilib@pitt.edu
http://www.pitt.edu/~wpilib/index.html
==============================================================================

From: "r.e.s."
Subject: Re: Simpson's paradox, continuous-case
Date: Tue, 18 Apr 2000 00:28:55 -0700
Newsgroups: sci.math,sci.stat.math

"Rich Ulrich" wrote ...
| "r.e.s." wrote:
|
| > A general form of "Simpson's paradox" is as follows:
| >
| > Random variables X,Y,Z, are distributed such that,
| > for y,z in the support of Y,Z,
| >
| < snip; x is correlated with y;
| controlling for z, x is negatively correlated with y.>

Of course the part that you snipped, viz.,

   E(X|Y=y) is strictly increasing in y,
   but, for every z,
   E(X|Y=y,Z=z) is strictly decreasing in y.

is more general than correlation effects. (It reads as though
you were saying they're equivalent. I agree that specializing to
correlations is a first step toward getting a "nice & simple"
(linear) example.)

| > Are there especially nice & simple examples of this
| > in which E(X|Y=y) and E(X|Y=y,Z=z) are *continuous*
| > functions?
|
| The number of firemen attending to a fire and the cost of
| damages are pretty-much continuous variables. Before you
| control for seriousness, it does appear that firemen are
| responsible for damages.

The example seems to be this:

   X=#firemen,
   Y=cost of damages,
   Z=some other measure of "seriousness".

It does seem reasonable that E(X|Y=y) might be increasing
in y; however, does it seem likely to you that,
for fixed z, E(X|Y=y,Z=z) is strictly decreasing in y?
(Because of the likely strong associations among X, Y, Z,
it seems to me that conditioning on Z=z might very much
weaken, but not "reverse", the dependency on y.
So this seems to be a case of "confounded" variates, and
more an example of the Ecological Fallacy than of a
generalized Simpson's paradox, which would involve the
"reversal" behavior.)

| Simpson's paradox is also known as "Ecological Fallacy,"
| and I think that the examples under that name may be more
| apt to be continuous.

Thanks, I hadn't seen that terminology. But from what I've
read so far, the Ecological Fallacy doesn't require the
pseudo-paradoxical "reversal" behavior -- i.e. the reversal
from strictly increasing to strictly decreasing y-dependency
between E(X|Y=y) and E(X|Y=y,Z=z) -- that's characteristic
of Simpson's "paradox".

http://www2.chass.ncsu.edu/garson/pa765/datalevl.htm
has this to say about the term:

"Coined by Robinson (1950), the ecological fallacy is
assuming that individual-level correlations are the
same as aggregate-level correlations. Robinson showed
that individual level correlations may be larger,
smaller, or even reverse in sign compared to aggregate
level correlations."

Elsewhere, the definition of "ecological fallacy" is
extended to refer to the use of aggregate data to draw
inferences about individuals -- a very broad definition,
indeed!

--r.e.s.
==============================================================================

From: Rich Ulrich
Subject: Re: Simpson's paradox, continuous-case
Date: Tue, 18 Apr 2000 11:53:14 -0400
Newsgroups: sci.math,sci.stat.math

Concerning my response to his post,
on Tue, 18 Apr 2000 00:28:55 -0700, "r.e.s." wrote:

< ... >
> The example seems to be this:
>
> X=#firemen,
> Y=cost of damages,
> Z=some other measure of "seriousness".
>
> It does seem reasonable that E(X|Y=y) might be increasing
> in y; however, does it seem likely to you that,
> for fixed z, E(X|Y=y,Z=z) is strictly decreasing in y?
< ... >

Yes, there is that reversal, so they go in opposite directions.
Pay attention to "reversal." It is not hard to write a
description of "strictly increasing...
decreasing" from a concrete example, but it is trickier to take
the abstract and construct the concrete in terms that still sound
natural.

Reversal:
 - For *fixed* seriousness, expect less damage with more firemen.
 - Whereas, overall, expect more damage with more firemen.

I snipped the E() lines the first time because I find them
confusing; now, I guess that r.e.s. did, too.

< snip; concerning Ecological Fallacy, which I described as
another name for Simpson's Paradox -- >

> http://www2.chass.ncsu.edu/garson/pa765/datalevl.htm
> has this to say about the term:
>
> "Coined by Robinson (1950), the ecological fallacy is
> assuming that individual-level correlations are the
> same as aggregate-level correlations. Robinson showed
> that individual level correlations may be larger,
> smaller, or even reverse in sign compared to aggregate
> level correlations."
>
> Elsewhere, the definition of "ecological fallacy" is
> extended to refer to the use of aggregate data to draw
> inferences about individuals -- a very broad definition,
> indeed!

Very good, and very well; I concede that the Ecological Fallacy
is a broader term than Simpson's Paradox. I guess the latter can
provide the harsh examples of the former. I had not paid attention
to that, but in the future, I will try to be more specific with my
description.

--
Rich Ulrich, wpilib@pitt.edu
http://www.pitt.edu/~wpilib/index.html
==============================================================================

From: israel@math.ubc.ca (Robert Israel)
Subject: Re: Simpson's paradox, continuous-case
Date: 18 Apr 2000 19:21:37 GMT
Newsgroups: sci.math,sci.stat.math

In article <8dcut8$pis$1@slb6.atl.mindspring.net>,
"r.e.s." writes:

> A general form of "Simpson's paradox" is as follows:
>
> Random variables X,Y,Z, are distributed such that,
> for y,z in the support of Y,Z,
>
> E(X|Y=y) is strictly increasing in y,
> but, for every z,
> E(X|Y=y,Z=z) is strictly decreasing in y.
>
> (The distribution of (X,Y,Z) could be discrete,
> continuous, or mixed.)
>
> Are there especially nice & simple examples of this
> in which E(X|Y=y) and E(X|Y=y,Z=z) are *continuous*
> functions?

Suppose (Y,Z) have the joint distribution

   f(y,z) = 2   for 0 <= z <= y <= 1
            0   otherwise

and X = Z - Y/4. Then E[Z|Y=y] = y/2, so E[X|Y=y] = y/4 is strictly
increasing in y, but E[X|Y=y,Z=z] = z - y/4 is strictly decreasing in y.

Robert Israel                            israel@math.ubc.ca
Department of Mathematics             http://www.math.ubc.ca/~israel
University of British Columbia
Vancouver, BC, Canada V6T 1Z2
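
The example in the last post can be checked numerically. What follows is
only a rough Monte Carlo sketch, not part of the original thread: the
function names, the sample size, the seed, the bin edges and the z-band
[0.30, 0.35) are all illustrative assumptions. It relies on the fact that
the maximum and minimum of two independent uniform(0,1) draws have joint
density 2 on 0 <= z <= y <= 1, which is the distribution given above.

```python
import random


def sample_xyz(n, seed=0):
    """Draw n triples (x, y, z) with (Y, Z) of density 2 on 0 <= z <= y <= 1.

    Assumption for this sketch: y and z are taken as the max and min of two
    independent uniform(0,1) draws, and x = z - y/4 as in the post.
    """
    rng = random.Random(seed)
    triples = []
    for _ in range(n):
        u, v = rng.random(), rng.random()
        y, z = max(u, v), min(u, v)
        triples.append((z - y / 4.0, y, z))
    return triples


def binned_means(pairs, edges):
    """Mean of x over the (x, y) pairs with y in each [edges[i], edges[i+1]).

    Assumes every bin receives at least one sample.
    """
    sums = [0.0] * (len(edges) - 1)
    counts = [0] * (len(edges) - 1)
    for x, y in pairs:
        for i in range(len(edges) - 1):
            if edges[i] <= y < edges[i + 1]:
                sums[i] += x
                counts[i] += 1
                break
    return [s / c for s, c in zip(sums, counts)]


samples = sample_xyz(200_000)

# Estimate of E[X | Y in bin]: should rise with y (theory: E[X|Y=y] = y/4).
marginal = binned_means([(x, y) for x, y, z in samples],
                        [0.0, 0.2, 0.4, 0.6, 0.8, 1.0])

# Same estimate restricted to a narrow band of z: should fall with y
# (theory: E[X|Y=y,Z=z] = z - y/4).
band = [(x, y) for x, y, z in samples if 0.30 <= z < 0.35]
conditional = binned_means(band, [0.35, 0.50, 0.65, 0.80, 1.0])

print("E[X | Y bin]             :", marginal)
print("E[X | Y bin, Z in band]  :", conditional)
```

The first list of bin means should increase from bin to bin and the second
should decrease, which is the reversal asked for in the original question.
Holding z in a narrow band only approximates conditioning on Z=z, so the
second list is an estimate rather than the exact value z - y/4.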