
Rudy Moser’s Husker Probability Page


Goodness of Fit Tests[i]

I figured it would be useful to include some goodness-of-fit tests to check some of the claims made on this page.

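One such goodness-of-fit test is the chi-squared test, whose statistic is

$$\chi^2 = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i}$$

where o_i and e_i are the observed and expected values for game i, and k is the number of games.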

This gives us a test statistic that measures how far our observed values are from the expected values. If the observed values are close to the corresponding expected values, the χ² value will be small, indicating a good fit. If they differ considerably from the expected values, the χ² value will be large and the fit is poor. With 13 games there are 13 - 1 = 12 degrees of freedom, and the critical value of the chi-squared distribution with 12 degrees of freedom at the .05 level of significance is 21.026. That is, if the chi-squared statistic is greater than 21.026, we reject the hypothesis that the fit is good.
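As a quick check on the critical value, here is a minimal Python sketch using only the standard library. It assumes 12 degrees of freedom (13 games, so 13 - 1) and uses the closed form of the chi-squared upper tail that holds when the degrees of freedom are even, df = 2m: P(χ² > x) = e^(-x/2) times the sum of (x/2)^k / k! for k = 0, ..., m - 1. The function name `chi2_upper_tail_even_df` is only illustrative.

```python
import math


def chi2_upper_tail_even_df(x: float, df: int) -> float:
    """P(X > x) for a chi-squared variable X with an even number of degrees of freedom.

    Uses the closed form exp(-x/2) * sum_{k=0}^{m-1} (x/2)^k / k!, where df = 2m.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only applies to a positive, even df")
    half_x = x / 2.0
    m = df // 2
    total = sum(half_x ** k / math.factorial(k) for k in range(m))
    return math.exp(-half_x) * total


# 13 games -> 13 - 1 = 12 degrees of freedom
tail = chi2_upper_tail_even_df(21.026, 12)
print(f"P(chi-squared(12) > 21.026) = {tail:.4f}")
```

The tail area beyond 21.026 comes out at about .05, which is what makes 21.026 the .05 critical value for 12 degrees of freedom.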

 

| Opponent | Spread (Expected) | Observed | (oi - ei)^2/ei |
|---|---|---|---|
| Western Michigan | 14 | 23 | 5.785714 |
| San Jose St. | 26.5 | 23 | 0.462264 |
| New Mexico St. | 25.5 | 31 | 1.186275 |
| Virginia Tech | 6.5 | -5 | 20.34615 |
| Missouri | -10 | -35 | -62.5 |
| Texas Tech | -20.5 | -6 | -10.2561 |
| Iowa State | 7 | 28 | 63 |
| Baylor | 11 | 12 | 0.090909 |
| Oklahoma | -22 | -34 | -6.54545 |
| Kansas | 1 | 10 | 81 |
| Kansas State | 6 | 28 | 80.66667 |
| Colorado | 18 | 9 | 4.5 |
| Gator Bowl (Clemson) | -2.5 | 5 | -22.5 |
| Chi-Squared Statistic | | | 155.2364 |
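The arithmetic in the table can be reproduced with a short Python sketch. The `games` list and the helper name `chi_squared_statistic` are illustrative; the numbers are copied from the table. The sketch also lists the games whose expected value is below 5.

```python
# (opponent, spread = expected value, observed value), copied from the table above
games = [
    ("Western Michigan", 14, 23),
    ("San Jose St.", 26.5, 23),
    ("New Mexico St.", 25.5, 31),
    ("Virginia Tech", 6.5, -5),
    ("Missouri", -10, -35),
    ("Texas Tech", -20.5, -6),
    ("Iowa State", 7, 28),
    ("Baylor", 11, 12),
    ("Oklahoma", -22, -34),
    ("Kansas", 1, 10),
    ("Kansas State", 6, 28),
    ("Colorado", 18, 9),
    ("Gator Bowl (Clemson)", -2.5, 5),
]


def chi_squared_statistic(pairs):
    """Sum of (o - e)^2 / e over (expected, observed) pairs."""
    return sum((o - e) ** 2 / e for e, o in pairs)


stat = chi_squared_statistic((e, o) for _, e, o in games)
print(f"chi-squared statistic = {stat:.4f}")

# The usual rule of thumb wants every expected value to be at least 5.
# A negative expected value also produces a negative term, which real counts never do.
too_small = [name for name, e, _ in games if e < 5]
print("expected value below 5:", ", ".join(too_small))
```

The first line printed should match the 155.2364 in the table; the second names five games, four of which have a negative expected value.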

 

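However, it is not recommended to use a chi-squared test if some of the expected frequencies are less than 5. In this table five of the thirteen expected values are below 5, and four of them are negative, which is why some of the terms in the last column are negative; that cannot happen with genuine frequency counts. So this chi-squared statistic should not be taken at face value.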

 

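There are other tests, however. We are assuming that our data, the result of each game relative to the spread, follow a normal distribution. To test normality we can use Geary’s test, whose test statistic is given by

$$u = \frac{\sqrt{\pi/2}\,\sum_{i=1}^{n} |X_i - \bar{X}|/n}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2/n}}$$

where X_1, ..., X_n are the n observations and X̄ is their average.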

The denominator is a reasonable estimate of σ, the standard deviation, whether or not the distribution is normal. The numerator is a good estimator of σ if the distribution is normal, but it may overestimate or underestimate σ when there are departures from normality. Thus values of u differing considerably from 1.0 signal that the hypothesis of normality should be rejected. For large samples, a standardization of u is given by

$$z = \frac{u - 1}{0.2661/\sqrt{n}}$$

The standardized statistic then approximately follows a standard normal distribution, so we use a two-tailed test for significance. The table for the Husker data is as follows.

 

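| Opponent | Result vs. Spread (Xi) | Absolute deviation from average | Squared deviation from average |
|---|---|---|---|
| Western Michigan | 9 | 6.807692 | 46.34467 |
| San Jose St. | -3.5 | 5.692308 | 32.40237 |
| New Mexico St. | 5.5 | 3.307692 | 10.94083 |
| Virginia Tech | -11.5 | 13.69231 | 187.4793 |
| Missouri | -25 | 27.19231 | 739.4216 |
| Texas Tech | 14.5 | 12.30769 | 151.4793 |
| Iowa State | 21 | 18.80769 | 353.7293 |
| Baylor | 1 | 1.192308 | 1.421598 |
| Oklahoma | -12 | 14.19231 | 201.4216 |
| Kansas | 9 | 6.807692 | 46.34467 |
| Kansas State | 22 | 19.80769 | 392.3447 |
| Colorado | -9 | 11.19231 | 125.2678 |
| Gator Bowl (Clemson) | 7.5 | 5.307692 | 28.1716 |

| Summary statistic | Value |
|---|---|
| Average (X̄) | 2.192308 |
| Geary’s Test Statistic (u) | 1.056608 |
| Std. Deviation (sample, n - 1) | 13.89475 |
| p-value | 0.443069 |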
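The summary rows can be reproduced with a short Python sketch using only the standard library. The list `results` is copied from the Result vs. Spread column, and `geary_u` is an illustrative helper, not a library function. Note that the reported Std. Deviation (13.89475) is the sample standard deviation, which divides by n - 1, while the denominator of u divides by n; computing u with the n-denominator is what reproduces 1.056608.

```python
import math
import statistics

# "Result vs. Spread" column: observed value minus the spread, one entry per game
results = [9, -3.5, 5.5, -11.5, -25, 14.5, 21, 1, -12, 9, 22, -9, 7.5]


def geary_u(xs):
    """Geary's ratio: sqrt(pi/2) * mean absolute deviation / root-mean-square deviation.

    Both the mean absolute deviation and the root-mean-square deviation divide by n.
    """
    n = len(xs)
    mean = sum(xs) / n
    mean_abs_dev = sum(abs(x - mean) for x in xs) / n
    rms_dev = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    return math.sqrt(math.pi / 2) * mean_abs_dev / rms_dev


average = statistics.mean(results)
sample_sd = statistics.stdev(results)  # divides by n - 1, matching the Std. Deviation row
u = geary_u(results)

print(f"Average = {average:.6f}")
print(f"Std. Deviation = {sample_sd:.5f}")
print(f"u = {u:.6f}")
```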

 

The test statistic u is very close to 1 and the p-value, 0.443, is well above .05, which means we fail to reject the hypothesis that the data follow a normal distribution.
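The p-value can be checked from u and the number of games. This Python sketch assumes the standardization z = (u - 1)/(0.2661/√n) from Walpole et al. and a two-tailed test against the standard normal, using `math.erfc` so no statistics package is needed; the helper names `geary_z` and `two_tailed_p` are illustrative.

```python
import math


def geary_z(u, n):
    """Standardize Geary's u for a sample of size n: z = (u - 1) / (0.2661 / sqrt(n))."""
    return (u - 1.0) / (0.2661 / math.sqrt(n))


def two_tailed_p(z):
    """Two-tailed p-value P(|Z| > |z|) for a standard normal Z, via the complementary error function."""
    return math.erfc(abs(z) / math.sqrt(2.0))


u = 1.056608  # Geary's test statistic from the table
n = 13        # number of games
z = geary_z(u, n)
p = two_tailed_p(z)
print(f"z = {z:.3f}, p-value = {p:.4f}")

alpha = 0.05
print("reject normality" if p < alpha else "fail to reject normality")
```

This gives z of roughly 0.77 and a p-value of about 0.443, in line with the table. Since the normal approximation for z is meant for large samples, a p-value from only 13 games is best read as approximate.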

 

Rudy Moser: rmoser@lps.org



[i] Walpole, Myers, Myers, and Ye. Probability & Statistics for Engineers & Scientists, 7th ed. Prentice Hall, pp. 334-336.