From: "Matthew Wolf"
Subject: Re: M&M Color Distribution Problem: Please Help
Date: Sun, 31 Jan 1999 16:56:12 +0000
Newsgroups: sci.math
Keywords: Statistical tests to use for comparing populations

Your daughter's data are OBSERVED FREQUENCIES (O) and the M&M data are EXPECTED FREQUENCIES (E).

Your first idea - the paired sample t-test - was inappropriate for a number of reasons. First, the E are not a random sample from a population: they are fixed values determined by M&M. Second, the t-test requires that the two samples come from populations which are Normally distributed: since frequencies take integer values only, they clearly cannot be Normally distributed. Third, as you pointed out yourself, the t-test is only interested in the means of the populations from which the data are drawn, regardless of the colour distribution. You must be careful to distinguish frequencies (counts of things) from measurements of things.

The appropriate test is a chi-squared test, but specifically a goodness of fit test. The null hypothesis is that your daughter's data are drawn from a population whose colour distribution is that given by M&M. The test statistic is SUM[(O-E)^2/E] = 20.283, and the critical value is taken from a chi-squared distribution with 5 degrees of freedom (six colour categories minus one). The test statistic is significant even at the 0.5% level of significance (CV = 16.750), from which we conclude that your daughter's data were NOT drawn from a population whose colour distribution was given by M&M. A surprising result that merits further investigation! (For example, if you fit a uniform distribution - with all the Es = 100/6 - you get a very small test statistic [=3.56] and conclude that the uniform distribution is an excellent fit to the data.)

What you have done is to construct a 2x6 contingency table treating both the O and the E as if they were all one set of observed data. Your computer has then calculated 12 corresponding expected frequencies and calculated the test statistic (= 10.424) as above.
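
As a check on the arithmetic above, here is a minimal Python sketch that recomputes the three statistics quoted in this reply from the counts given in the original post. The function names are illustrative only, and the two critical values are copied from the text rather than computed, so this is a sanity check under those assumptions and not a full test procedure.

```python
# Counts taken from the original post. The company figures are percentages
# of 100 sweets, so they double as expected counts for a sample of 100.
# Order: blue, red, green, yellow, brown, orange.
observed = [14, 16, 15, 23, 14, 18]
expected = [10, 20, 10, 20, 30, 10]

# Critical values for chi-squared with 5 degrees of freedom, as quoted in
# the reply (assumed correct here, not derived in this sketch).
CV_0_5_PERCENT = 16.750
CV_5_PERCENT = 11.070


def goodness_of_fit(obs, exp):
    """Pearson goodness-of-fit statistic SUM[(O-E)^2/E] with fixed E."""
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))


def contingency_statistic(rows):
    """Pearson statistic for an r x c table of counts.

    Expected counts are derived from the row and column totals, which is
    what treating both rows as observed data amounts to.
    """
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    grand_total = sum(row_totals)
    stat = 0.0
    for row, row_total in zip(rows, row_totals):
        for o, col_total in zip(row, col_totals):
            e = row_total * col_total / grand_total
            stat += (o - e) ** 2 / e
    return stat


gof_company = goodness_of_fit(observed, expected)
gof_uniform = goodness_of_fit(observed, [100 / 6] * 6)
table_stat = contingency_statistic([expected, observed])

print(f"goodness of fit vs company figures: {gof_company:.3f}")
print(f"goodness of fit vs uniform:         {gof_uniform:.2f}")
print(f"2x6 contingency table statistic:    {table_stat:.3f}")
print("company fit rejected at 0.5%:", gof_company > CV_0_5_PERCENT)
print("contingency statistic significant at 5%:", table_stat > CV_5_PERCENT)
```

The only difference between the two functions is where the expected counts come from: fixed in advance for the goodness of fit test, estimated from the table margins for the contingency table test.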
This incorrect test statistic is not significant at even the 5% level of significance (CV = 11.070). (Coincidentally the number of degrees of freedom for the goodness of fit test is the same as the number of degrees of freedom for the contingency table test.)

There are two types of chi-squared test: goodness of fit (appropriate here) and contingency table (what your computer did: inappropriate here). A contingency table would arise where you collect data, say, on a sample of chocolate M&Ms and on a sample of peanut M&Ms: your null hypothesis would be that the colour distribution in the chocolate M&Ms is the same as that in the peanut M&Ms. You would construct a 2x6 table of observed data (the contingency table) and calculate a corresponding 2x6 table of expected data, and then proceed as above.

Matthew Wolf
Salinger's, Oxford

----------
>From: reid@kc.net
>Newsgroups: alt.sci.math.statistics.prediction,sci.math
>Subject: M&M Color Distribution Problem: Please Help
>Date: Sun, 31 Jan, 1999, 1:00 am
>
>I'm trying to show my daughter how to compare small sample populations with
>larger populations. She took 100 plain M&Ms and counted the following colors:
>
>blue 14
>red 16
>green 15
>yellow 23
>brown 14
>orange 18
>
>Then we checked the M&M website and found that they intend for the plain M&Ms
>to have the following distribution:
>
>blue 10
>red 20
>green 10
>yellow 20
>brown 30
>orange 10
>
>Now, I thought I would try the Student's PAIRED T-test, but got a P-value of
>1.0 and realized that even if there were 100 reds and no other colors in my
>daughter's sample that this test would yield the same result. Wrong approach.
>
>Then I tried to create a ? crosstab ? correspondence table (see below) and
>tried the chi-square test which resulted in a P-value of 0.0598.
>
>Does this seem reasonable?? Should I be using another test??
>
>Thanks in advance!
>
>Rob Reid, M.D. & Statistics-Idiot
>
>Crosstabs
>Population        Color
>Count             BLUE  BROWN  GREEN  ORANGE  RED  YELLOW
>Company Specs       10     30     10      10   20      20   100
>Ellie's Sample      14     14     15      18   16      23   100
>                    24     44     25      28   36      43   200
>
>Tests
>Source      DF   -LogLikelihood   RSquare (U)
>Model        5          5.30149        0.0382
>Error      194        133.32794
>C Total    199        138.62944
>Total Count 200
>
>Test               ChiSquare   Prob>ChiSq
>Likelihood Ratio      10.603       0.0598
>Pearson               10.424       0
>
>-----------== Posted via Deja News, The Discussion Network ==----------
>http://www.dejanews.com/ Search, Read, Discuss, or Start Your Own

==============================================================================

From: "Atul Sharma"
Subject: Re: Can anyone help with some stats?
Date: Tue, 02 Nov 1999 21:34:43 GMT
Newsgroups: sci.math.symbolic

If you're comparing frequency data (e.g. the proportion of employees who are male and female among clerical staff vs. upper management), chi square is usually appropriate. If you're comparing the mean between two groups (e.g. mean salaries of men vs. women), a Student's t test is more useful. If you're comparing the mean among three or more groups, ANOVA would be suitable.

With each of these methods, there are important caveats that you need to consider. For instance, if the groups are small, the chi square test may need to be replaced by Fisher's Exact test. Whether the data are normally distributed and whether the variances are equal between the groups are important considerations for any test comparing mean values.

You might want to consult an introductory statistics text for more details. Martin Bland's Introduction to Medical Statistics is short, easy to read, and full of practical tips, though it concentrates on medical examples more than business examples.

Your chances of getting an answer from a list like this are also likely better if you include more specific detail with your question.

A. S.
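
To make the small-sample caveat in the reply above concrete, here is a minimal Python sketch of a two-sided Fisher's Exact test for a 2x2 table, using only the standard library. The staff counts are hypothetical, invented purely for illustration (the bank data were never posted), and the function name is likewise an assumption. The two-sided rule used here, summing the probabilities of all tables no more likely than the observed one, is one common convention rather than the only one.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's Exact p-value for the 2x2 table [[a, b], [c, d]].

    With all margins held fixed, the top-left cell follows a hypergeometric
    distribution. The p-value sums the probabilities of every possible table
    that is no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Probability that the top-left cell equals x given the margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lowest = max(0, col1 - row2)   # smallest feasible top-left count
    highest = min(row1, col1)      # largest feasible top-left count
    p_observed = prob(a)
    total = sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)  # tolerance for float ties
    )
    return min(1.0, total)


# HYPOTHETICAL counts, for illustration only:
#                  male  female
# clerical            2       8
# upper management    7       3
p_value = fisher_exact_two_sided(2, 8, 7, 3)
print(f"two-sided Fisher's Exact p-value: {p_value:.4f}")
```

With groups this small the expected counts in a chi square test would be low, which is the situation in which an exact test is usually preferred.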
------------------------------------------------------------------
Atul Sharma MD, FRCP(C)
Pediatric Nephrologist,
McGill University/Montreal Children's Hospital

Charlotte wrote in message <381F2869.93151E23@email.msn.com>...
>Hello! I am trying to summarise and analyse some data but I really don't
>know exactly what to do with it. Should I do chi-squared test of some
>kind or is the data not suitable? The data is about the salary and other
>details of 474 bank employees and shouldn't be too complicated really.
>I hope there is someone out there who can give me some help in looking
>at this stuff. I'm a student and hope this isn't too simple for this
>page, if it is just delete me or something.
>Please send any replies to:
>mat7cmc@leeds.ac.uk
>Thank you
>Charlotte