From: Jeffery J. Leader
Subject: Re: Standard Deviation - Why (n-1) in denominator ?
Date: Sun, 12 Mar 2000 17:58:55 -0500
Newsgroups: sci.math

On 12 Mar 2000 17:20:47 -0500, antichrist_0@hotmail.com (dominic)
wrote:

>In the definition of the standard deviation of a sample of data, the
>denominator is (n-1).

Yup, this is correct.

>But the population standard deviation definition has n (not (n-1)) as
>the denominator.

Also correct.

>Does anyone have a proof or something convincing to say?

It can be shown that the one with (n-1) in the denom. gives the right
results in the limit. Read up about 'unbiased estimators' for more
info. Also note that the population formula has the pop. mean in it,
not the sample mean--so there is another difference between the two
formulas. Having to estimate the pop. mean by the sample mean in some
sense 'uses up' 1 of the n values used to find the sample s.d. (read
up on degrees of freedom).

==============================================================================

From: hrubin@odds.stat.purdue.edu (Herman Rubin)
Subject: Re: Standard Deviation - Why (n-1) in denominator ?
Date: 12 Mar 2000 20:20:02 -0500
Newsgroups: sci.math

In article <238ocss9vbs9u8oukm5tpsv5dhc2n8t3pc@4ax.com>,
Jeffery J. Leader wrote:
>On 12 Mar 2000 17:20:47 -0500, antichrist_0@hotmail.com (dominic)
>wrote:

>>In the definition of the standard deviation of a sample of data, the
>>denominator is (n-1).

>Yup, this is correct.

>>But the population standard deviation definition has n (not (n-1)) as
>>the denominator.

>Also correct.

>>Does anyone have a proof or something convincing to say?

>It can be shown that the one with (n-1) in the denom. gives the right
>results in the limit. Read up about 'unbiased estimators' for more
>info. Also note that the population formula has the pop. mean in it,
>not the sample mean--so there is another difference between the two
>formulas. Having to estimate the pop.
>mean by the sample mean in some
>sense 'uses up' 1 of the n values used to find the sample s.d. (read
>up on degrees of freedom).

In the limit, it does not matter. The simple reason is that the
variance estimate with n-1 in the denominator is unbiased; i.e., its
expected value is the true variance, no matter what the distribution
happens to be.

In the case of normality, the distribution of the sum of squares of
deviations from the sample mean is the same as that of the sum of
squares of n-1 independent normal random variables with the same
variance and mean 0.
-- 
This address is for information only. I do not claim that these views
are those of the Statistics Department or of Purdue University.
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399
hrubin@stat.purdue.edu Phone: (765)494-6054 FAX: (765)494-0558

==============================================================================

From: Virgil
Subject: Re: Standard Deviation - Why (n-1) in denominator ?
Date: Sun, 12 Mar 2000 22:23:10 -0700
Newsgroups: sci.math

On 12 Mar 2000 17:20:47 -0500, antichrist_0@hotmail.com (dominic)
wrote:

>In the definition of the standard deviation of a sample of data, the
>denominator is (n-1).

>But the population standard deviation definition has n (not (n-1)) as
>the denominator.

>Does anyone have a proof or something convincing to say?

The reason is really based on variances, V, which are squares of
standard deviations, SD, and not on the standard deviations
themselves.

When a sample is used to estimate the population V, we must replace
the n in the population V's denominator by n-1 in order to have an
"unbiased" estimate of the population V. Unbiased estimates of V
will, in the long run, average to the true population value of V.

For the square roots, sqrt(V) (the SDs), the sample SD (with n-1 in
the denominator) is a good estimator of the population SD, but is not
quite unbiased, because square rooting throws things a little bit off.
There is a small correction factor that one can use, especially for
small n, but it is almost always ignored.
-- 
Virgil
vmhjr@frii.com

==============================================================================

From: horst.kraemer@t-online.de (Horst Kraemer)
Subject: Re: Standard Deviation - Why (n-1) in denominator ?
Date: Mon, 13 Mar 2000 11:54:01 GMT
Newsgroups: sci.math

On 12 Mar 2000 17:20:47 -0500, antichrist_0@hotmail.com (dominic)
wrote:

> In the definition of the standard deviation of a sample of data, the
> denominator is (n-1).
>
> But the population standard deviation definition has n (not (n-1)) as
> the denominator.
>
> This is supposedly because the sample s.d. with (n-1) gives a better
> estimate of the real s.d. of the population this way.

Precisely: The mean value (w.r.t. repeated sampling) of the variance
estimator with (N-1) in the denominator is exactly the variance of the
population - i.e. it is an "unbiased" estimator.

Proof:

We have to show that

      Sum (X_i - m)^2
  E ( --------------- ) = V(X)
            N-1

where V(X) = E(X^2) - E(X)^2 is the variance of the population,
assuming that the X_i are pairwise uncorrelated and have an identical
distribution. Sums are always taken from 1 to N and m is the mean
value of the sample.

It is easy to show that

                                 (Sum X_i)^2
  Sum (X_i - m)^2 = Sum X_i^2 -  -----------
                                      N

Taking apart the square yields

              Sum X_i^2 + Sum X_i*X_j
  Sum X_i^2 - -----------------------
                         N

    (N-1) Sum X_i^2 - Sum X_i*X_j
  = -----------------------------
                  N

Sum X_i*X_j means the sum of all products for i different from j.
This sum has N(N-1) terms.

Now

  E(X_i*X_j) = E(X_i)E(X_j) = E(X)^2   if i different from j

because the X_i are pairwise uncorrelated and have an identical
distribution, and E(X_i^2) = E(X^2) for every i.

Thus

      (N-1) Sum X_i^2 - Sum X_i*X_j
  E ( ----------------------------- )
                    N

    N(N-1)E(X^2) - N(N-1)E(X)^2
  = --------------------------- = (N-1)(E(X^2) - E(X)^2) = (N-1) V(X)
                 N

Dividing by N-1 gives the claim.

q.e.d.

Regards
Horst
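
The claims made in this thread can be checked exactly on a small
example. What follows is only a sketch in Python and is not taken from
any of the posts: the population (one fair die), the sample size n = 3
and all function names are assumptions chosen for illustration. It
enumerates every equally likely sample of n independent draws and
averages three estimators over them: the variance with n-1, the
variance with n, and the s.d. with n-1.

```python
from fractions import Fraction
from itertools import product
from math import sqrt


def population_variance(values):
    """Variance with N in the denominator, using the true population mean."""
    size = len(values)
    mean = Fraction(sum(values), size)
    return sum((Fraction(v) - mean) ** 2 for v in values) / size


def sum_sq_dev(sample):
    """Sum of squared deviations from the sample mean m."""
    m = Fraction(sum(sample), len(sample))
    return sum((Fraction(x) - m) ** 2 for x in sample)


def average_over_all_samples(values, n, statistic):
    """Exact mean of `statistic` over all equally likely samples of n i.i.d. draws."""
    samples = list(product(values, repeat=n))
    return sum(statistic(s) for s in samples) / len(samples)


# Hypothetical population for illustration: the faces of one fair die.
die = [1, 2, 3, 4, 5, 6]
n = 3

true_var = population_variance(die)

# Variance estimator with n-1 in the denominator (exact rational arithmetic).
mean_var_n_minus_1 = average_over_all_samples(
    die, n, lambda s: sum_sq_dev(s) / (n - 1))

# Variance estimator with n in the denominator.
mean_var_n = average_over_all_samples(
    die, n, lambda s: sum_sq_dev(s) / n)

# Sample s.d. with n-1 in the denominator (floating point because of sqrt).
mean_sd_n_minus_1 = average_over_all_samples(
    die, n, lambda s: sqrt(sum_sq_dev(s) / (n - 1)))

print("population variance          :", true_var)
print("mean of variance with n-1    :", mean_var_n_minus_1)
print("mean of variance with n      :", mean_var_n)
print("population s.d.              :", sqrt(true_var))
print("mean of sample s.d. with n-1 :", mean_sd_n_minus_1)
```

Under these assumptions the n-1 version averages to the population
variance exactly, the n version averages to (n-1)/n times it, and the
sample s.d. averages to slightly less than the population s.d., which
is the small bias Virgil mentions.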