TRYING TO UNDERSTAND THE ANDERSON DARLING TEST
by P Barber (March 2010)
INTRODUCTION
1. The Anderson Darling test (Anderson & Darling 1952) has been found to be one of the most effective tests for discriminating between candidate distributions when fitting a distribution to a data set (Stephens 1974). However, the author has struggled to understand how and why this test works. This paper is an attempt to set out the author's search for an explanation of this powerful test.
2. The results of the test do not depend upon the particular distribution being tested, which means that the test is not limited to a few particular distributions (Engineering Statistics Handbook).
3. The function underlying the test appears to be of the form: f(P) = -4 P Ln(P)
4. Where P represents the cumulative probability of idealised data points within the range zero to one. A plot of this function is shown below. The area under the graph is equal to one. Note that the curve below does not represent a statistical distribution: it might look like a frequency distribution curve, but it is a function, not a distribution.
5. As proof of the above equation, the website http://integrals.wolfram.com/index.jsp supplies the integral: Integral of P Ln(P) dP = (P^2/2) Ln(P) - P^2/4
6. Inserting the limit of P = 1 we have: (1/2) Ln(1) - 1/4 = -1/4
7. And for P = 0 we have: 0
8. We therefore have: the integral from 0 to 1 of -4 P Ln(P) dP = -4 x (-1/4) = 1
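This result can be checked numerically. The sketch below (a midpoint-rule sum, assuming the reconstructed form f(P) = -4 P Ln(P)) approximates the area under the curve:

```python
import math

# Midpoint-rule check that the area under f(P) = -4*P*Ln(P)
# over 0 < P < 1 equals one, as derived above.
n = 100_000
area = 0.0
for i in range(n):
    p = (2 * i + 1) / (2 * n)        # midpoint of the i-th sub-interval
    area += -4 * p * math.log(p) / n
print(round(area, 6))  # → 1.0
```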
9. When calculating the Anderson Darling statistic in a spreadsheet, integration is carried out across N rows of data, rather than across the probability range of P = zero to one. As a result the area under the graph is not equal to one, but to N (that is, 1 x N).
10. The area under the graph shown above is equal to N, the number of data elements; in the example shown, N = 10,000. As the value of N is increased, the value of P1, the smallest value of Pi (given by Pi = (2i - 1)/2N with i = 1), tends towards zero. The equation for the ideal fit becomes: ADi = -4 Pi Ln(Pi)
11. The reduction to this ideal form assumes that:
a) all the data elements are evenly spaced, such that their cumulative probabilities follow the form Pi = (2i - 1)/2N,
b) the probabilities fitted to the data points Xi, defined as Pf, satisfy Pf = Pi, and
c) given that the probabilities are evenly spaced, 1 - P(N+1-i) = Pi.
12. Hence the Anderson Darling elemental equation: ADi = -((2i - 1)/N) x [Ln(Pf(i)) + Ln(1 - Pf(N+1-i))]
13. Reduces to: ADi = -2 Pi x [Ln(Pf(i)) + Ln(1 - Pf(N+1-i))], since Pi = (2i - 1)/2N
14. Which, when Pf(i) = Pi and 1 - Pf(N+1-i) = Pi, reduces to the equation shown above in paragraph 10, ADi = -4 Pi Ln(Pi).
Note that this equation could also be expressed in the form: AD = -N - (1/N) x Sum over i of (2i - 1) x [Ln(Pf(i)) + Ln(1 - Pf(N+1-i))]
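The reduction above can be checked numerically. A minimal sketch, assuming the ideal-fit form ADi = -4 Pi Ln(Pi): with a perfect fit the elemental values should sum to approximately N, so that AD = (sum of ADi) - N is approximately zero.

```python
import math

# With a perfect fit (Pf = Pi), ADi = -4*Pi*Ln(Pi); the elemental
# values sum to approximately N, so AD = total - N is near zero.
n = 10_000
total = 0.0
for i in range(1, n + 1):
    p_i = (2 * i - 1) / (2 * n)        # elemental cumulative probability
    total += -4 * p_i * math.log(p_i)  # elemental Anderson Darling value
ad = total - n
print(total, ad)  # total is very close to 10,000
```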
THE CALCULATION OF THE AD STATISTIC
15. The table below sets out the calculation of the Anderson Darling statistic for a sample dataset of N = 10,000 elements:
   a)       b)        c)         d)          e)          f)       g)         h)         i)
 Rank     Data       Pi         Pf       Ln(Pf)    Ln(1-Pf)  InvRank    Ln(Pnf)       ADi
    1   18,937   0.00005  0.0000001  -16.40826   -0.000000    10000  -9.990734  0.002640
    2   20,549   0.00015  0.0000086  -11.65808   -0.000009     9999  -9.851263  0.006453
    3   21,946   0.00025  0.0000381  -10.17494   -0.000038     9998  -9.430134  0.009803
    4   22,060   0.00035  0.0000418  -10.08291   -0.000042     9997  -8.662173  0.013122
    5   22,183   0.00045  0.0000460   -9.98684   -0.000046     9996  -8.529218  0.016664
    6   23,155   0.00055  0.0000888   -9.32937   -0.000089     9995  -8.008338  0.019071
    7   23,559   0.00065  0.0001121   -9.09646   -0.000112     9994  -7.807460  0.021975
  ...      ...       ...        ...        ...         ...      ...        ...       ...
 9992  523,568   0.99915  0.9993558   -0.00064   -7.347528        9  -0.000228  0.001744
 9993  524,362   0.99925  0.9993738   -0.00063   -7.375826        8  -0.000202  0.001656
 9994  536,107   0.99935  0.9995933   -0.00041   -7.807460        7  -0.000112  0.001037
 9995  541,350   0.99945  0.9996673   -0.00033   -8.008338        6  -0.000089  0.000843
 9996  554,329   0.99955  0.9998024   -0.00020   -8.529218        5  -0.000046  0.000487
 9997  557,505   0.99965  0.9998270   -0.00017   -8.662173        4  -0.000042  0.000429
 9998  574,841   0.99975  0.9999197   -0.00008   -9.430134        3  -0.000038  0.000237
 9999  583,662   0.99985  0.9999473   -0.00005   -9.851263        2  -0.000009  0.000123
10000  586,484   0.99995  0.9999542   -0.00005   -9.990734        1  -0.000000  0.000092

Total ADi       10,000.205288
Less N          10,000.000000
AD statistic         0.205288
16. The columns in the table above are defined as follows:
a) Column a) shows a rank number for each element of data. This column is used to provide a row reference later in the process.
b) Column b) shows the values of the dataset which is to be tested. This data must be ranked in ascending order.
c) Column c) shows the value of Pi, which is a rank probability, calculated by the formula (2i - 1)/2N, where i is the rank number and N the total number of values in the dataset. For example, the value in the third row is calculated as (2 x 3 - 1)/(2 x 10,000) = 5/20,000 = 0.00025
d) Column d) shows the probability value Pf corresponding to the elemental value of the dataset. In the example the Excel function =BETADIST(data, Alpha, Beta, Minimum, Maximum) is used to calculate the value of Pf which would correspond to the data value under the distribution assumed. In this case it is assumed that the dataset corresponds to a Beta Distribution with parameters Alpha = 2.78692, Beta = 9.14524, Minimum = 18,581 and Maximum = 760,332. Using these parameters in combination with a data value of 23,559 (column b) provides a Beta cumulative probability value of 0.0001121, which appears as the seventh item in column d).
e) Column e) shows the natural logarithm of Pf. Referring to the seventh item, Ln(0.0001121) = -9.09646
f) Column f) shows the natural logarithm of (1 - Pf). Referring to the seventh item, Ln(1 - 0.0001121) = -0.00011
g) Column g) shows a reverse ranking: the Rank sequence shown in column a) sorted descending. This column is used as a reference.
h) Column h) uses a =VLOOKUP(ref, area, col, FALSE) command, set to =VLOOKUP(g, columns a) to f) of the N data rows, 6, FALSE). The effect of this command is to select the value of Ln(1 - Pf) from the opposite end of the table. Referring to the seventh item in the table: the Inverse Rank appearing in column g) is 9994, so the value of Ln(1 - Pf) corresponding to the 9994th element of the table, -7.807460, is selected.
i) The elemental value of ADi is calculated in column i), using the equation ADi = -2 Pi [Ln(Pf) + Ln(Pnf)], or, referring to the columns in the table above, = -2 x (c) x [(e) + (h)]. Thus the seventh item = -2 x 0.00065 x [-9.096460 + (-7.807460)] = 0.021975
The value of the Anderson Darling statistic is then found by summing the elemental values in column i) and subtracting N, the number of data elements. The sum of the elements is 10,000.205288, from which N = 10,000 is subtracted to provide a statistic of AD = 0.205288
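The spreadsheet recipe above can be sketched in code. A minimal Python version follows; a hypothetical standard-normal CDF stands in for the Beta fit used in the paper, and the column letters refer to the table above.

```python
import math
import random

def anderson_darling(data, cdf):
    """AD statistic for `data` tested against a fitted CDF."""
    xs = sorted(data)                         # column b): ranked ascending
    n = len(xs)
    pf = [cdf(x) for x in xs]                 # column d): fitted probabilities
    total = 0.0
    for i in range(1, n + 1):
        p_i = (2 * i - 1) / (2 * n)           # column c): rank probability Pi
        ln_pf = math.log(pf[i - 1])           # column e): Ln(Pf)
        ln_pnf = math.log(1 - pf[n - i])      # column h): Ln(Pnf), opposite end
        total += -2 * p_i * (ln_pf + ln_pnf)  # column i): elemental ADi
    return total - n                          # less N gives the AD statistic

# Usage: a normal sample tested against the normal CDF gives a small AD.
random.seed(1)
normal_cdf = lambda x: 0.5 * (1 + math.erf(x / math.sqrt(2)))
sample = [random.gauss(0, 1) for _ in range(1000)]
ad = anderson_darling(sample, normal_cdf)
print(ad)
```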
17. The graph below shows the value of the AD statistic calculated for datasets of increasing size N; the value of AD decreases as N is increased.
18. Wikipedia (April 2010) suggests an adjustment to the value of AD in respect of the size of the dataset N and provides a formula for use with a normal distribution, where AD* = AD(1 + 4/N - 25/N^2), and recommends that the hypothesis of normality is rejected if AD* exceeds 0.632 for a 10% level test, 0.751 for a 5% level, 0.870 for a 2.5% level and 1.029 for a 1% level.
19. Annis (2009) and Romeu (2003-5) suggest an adjustment for the size of the dataset N, suitable for use with Normal and Lognormal distributions, where AD* = AD(1 + 0.75/N + 2.25/N^2), with rejection of the hypothesis of normality if AD* exceeds: 0.631 at 10%, 0.752 at 5%, 0.873 at 2.5% and 1.035 at 1% levels of significance.
20. Annis (2009) and Romeu (2003-5) also suggest that for Weibull and Gumbel distributions the value of AD* = AD(1 + 0.2/sqrt(N)), with rejection of the hypothesis if the calculated value of AD* exceeds 0.637 at 10%, 0.757 at 5%, 0.877 at 2.5% and 1.038 at 1% levels of significance.
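The adjustments in paragraphs 19 and 20 can be sketched as simple functions. The formulas and critical values are those quoted above; the worked example reuses AD = 0.205288 from the earlier table purely to illustrate the arithmetic (that table was fitted with a Beta distribution, so applying the Normal adjustment here is an illustration only, not a valid test).

```python
import math

def ad_star_normal(ad, n):
    """Annis/Romeu adjustment for Normal and Lognormal fits."""
    return ad * (1 + 0.75 / n + 2.25 / n ** 2)

def ad_star_weibull(ad, n):
    """Annis/Romeu adjustment for Weibull and Gumbel fits."""
    return ad * (1 + 0.2 / math.sqrt(n))

# Illustration: AD = 0.205288 with N = 10,000, adjusted as if for a
# Normal fit, stays below the 5% critical value of 0.752.
adj = ad_star_normal(0.205288, 10_000)
print(adj)
```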
21. It is noted that the adjustments applied to the value of AD proposed in paragraphs 18, 19 and 20 would increase the value of AD, whereas the data derived in paragraph 17 clearly shows that the value of AD decreases as the size of the dataset, N, is increased. The data used to compile the graph in paragraph 17 was tested using both the
SENSITIVITY OF THE AD STATISTIC
22. The three graphs below show the resolution of the statistic in determining the best fit to the dataset as the value of Alpha is varied from 6.0 to 10.0. The range of variation is reduced as the graphs move from left to right. These graphs are based on a dataset with N = 100.
23. The way in which the Anderson Darling function provides such a discriminating result is not easy to see (meaning that I do not understand it). Apart from the issue of ranking, there appear to be three elements in the equation making up the Anderson Darling calculation (Engineering Statistics Handbook), namely:
a) Pi, where Pi = (2i - 1)/2N
b) Ln(Pf)
c) Ln(1 - Pf)
The distribution of each of these elements is shown graphically below.
24. As can be seen, the form of Pi is linear, while the other two have a very similar form; indeed it would be difficult to distinguish any difference between the last two graphs visually (in fact the last two graphs are identical under the condition that Pf = Pi). A graph showing the sum of the last two elements is shown below.
25. The result of multiplying the sum of Ln(Pf) and Ln(1 - Pf) by Pi and by -2 yields the elemental Anderson Darling statistic ADi, and this is shown graphically below. The area under this graph is equal to (AD + N).
26. The two graphs below illustrate the difference between the values of ADi for a curve which matches the underlying dataset and test curves in which the value of Alpha is varied between 6.0 and 9.0. The first graph shows the elemental differences, while the second graph shows the cumulative differences for the series of tests. The underlying dataset was created from a Beta Distribution with parameters Alpha = 8, Beta = 10, Min = 20, Max = 50.
27. In the graphs above the value of Alpha takes on the values of 6, 7, 9 and 10. The first graph illustrates that the distortion can occur in different forms and in different areas of the spectrum. It should be noted that in some areas of the spectrum (Alpha = 6 and Alpha = 7, denoted by diff 6 and diff 7) the distortion is initially negative before going positive. The final result, however, shown in the second graph, is always positive, resulting in an increase in the value of the Anderson Darling statistic as the quality of the fitted distribution becomes more remote.
INDEPENDENT OF DISTRIBUTION BEING TESTED
28. An important feature of the Anderson Darling test is that it is independent of the distribution being tested. If the cumulative probabilities (given by Pi = (2i - 1)/2N, where i is the rank order and N the total number of elements in the dataset) associated with a ranked sample drawn at random are plotted against the ranked values (Xi), then the result is the familiar Cumulative Probability Distribution, or S-Curve.
29. In the Anderson Darling test the ranked values of the dataset (Xi) are used to determine a probability Pf. Pf represents the cumulative probability that would be expected for the sampled value Xi if the sample had been drawn from a population with the assumed distribution. For example, in the segment of data illustrated in the table below, the ninth element of data has a value of Xi = 25.03353. If a Beta distribution with parameters Alpha = 8, Beta = 10, Minimum = 20 and Maximum = 50 has been assumed, then using the Excel Beta function it can be determined that the expected cumulative probability is Pf = 0.003622.
30. If the values of Pf are plotted against Pi, then, provided the assumed distribution fits the dataset, the result will be a straight line.
31. The graph below assumes that the underlying distribution had parameters Alpha = 12, Beta = 10, Minimum = 20 and Maximum = 50. The non-linear form of the graph clearly illustrates that these assumed parameters provide a very poor fit to the sampled data.
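The straight-line behaviour can be checked numerically. In the sketch below (a hypothetical example in which Normal distributions stand in for the Beta examples above), the maximum vertical deviation of the Pf-versus-Pi plot from the diagonal is small when the assumed distribution is correct and large when it is mis-specified:

```python
import random
from statistics import NormalDist

random.seed(42)
true_dist = NormalDist(mu=35, sigma=5)    # distribution the sample is drawn from
wrong_dist = NormalDist(mu=30, sigma=5)   # deliberately mis-specified assumption

n = 1000
xs = sorted(true_dist.inv_cdf(random.random()) for _ in range(n))
pi = [(2 * i - 1) / (2 * n) for i in range(1, n + 1)]  # rank probabilities

# Maximum vertical distance between the Pf-vs-Pi plot and the diagonal.
dev_right = max(abs(true_dist.cdf(x) - p) for x, p in zip(xs, pi))
dev_wrong = max(abs(wrong_dist.cdf(x) - p) for x, p in zip(xs, pi))
print(dev_right, dev_wrong)  # the wrong assumption strays far further
```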
32. The
SIGNIFICANCE
33. The two segments of data in the table below compare the values produced by Crystal Ball when using the same sequence of random numbers. In the first segment a Beta Distribution, with parameters Alpha = 8, Beta = 10, Minimum = 20 and Maximum = 50, has been used to produce the values of Xi and to determine the values of Pf. In the second segment a Normal Distribution, with parameters Mean = 35 and Standard Deviation = 5, has been used both to develop the values of Xi and to compute the values of Pf. As can be seen, the values of Pf in each segment are the same. In fact, if the Anderson Darling statistic is calculated for each segment, a value of AD = 1.3527 is found to apply to each; thereby demonstrating that the test is not only independent of the underlying distribution, but that the value of the statistic is dependent upon the sequence of random numbers selected.
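This independence can be sketched directly: pushing the same uniform random sequence through two different inverse cumulative distribution functions, then testing each sample against its own distribution, yields identical AD statistics, because both calculations reduce to the same fitted probabilities Pf. (A Normal and an Exponential distribution are used below as stand-ins for the paper's Beta/Normal pair; the seed and sample size are arbitrary.)

```python
import math
import random
from statistics import NormalDist

def anderson_darling(pf):
    """AD statistic from sorted fitted probabilities Pf."""
    n = len(pf)
    return -n - sum((2 * i - 1) / n *
                    (math.log(pf[i - 1]) + math.log(1 - pf[n - i]))
                    for i in range(1, n + 1))

random.seed(999)
u = [random.random() for _ in range(2000)]   # one shared random sequence

nd = NormalDist(mu=35, sigma=5)
x_norm = [nd.inv_cdf(v) for v in u]          # Normal(35, 5) sample
x_exp = [-100 * math.log(1 - v) for v in u]  # Exponential (mean 100) sample

# Test each sample against its own distribution: Pf reduces to u both times.
pf_norm = sorted(nd.cdf(x) for x in x_norm)
pf_exp = sorted(1 - math.exp(-x / 100) for x in x_exp)

ad_norm = anderson_darling(pf_norm)
ad_exp = anderson_darling(pf_exp)
print(ad_norm, ad_exp)  # the two statistics agree
```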
34. The question being asked when considering significance is: if one were to extract a random sample from a larger dataset with a known distribution, and one were to then test this random sample against the known distribution, what would be the range of Anderson Darling statistics that might be expected? The graph below sets out the results of 200 trials (100 using the Beta Distribution and 100 using the Normal Distribution, as defined above) and plots the range of AD statistics encountered.
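A rough Monte Carlo version of this experiment can be sketched as follows. Since the fitted probabilities of a correctly specified sample are uniformly distributed, sorted uniform random numbers can stand in for Pf; the trial count and sample size below are illustrative choices, not the paper's exact settings.

```python
import math
import random

def anderson_darling(pf):
    """AD statistic from sorted fitted probabilities Pf."""
    n = len(pf)
    return -n - sum((2 * i - 1) / n *
                    (math.log(pf[i - 1]) + math.log(1 - pf[n - i]))
                    for i in range(1, n + 1))

random.seed(0)
trials = sorted(
    anderson_darling(sorted(random.random() for _ in range(100)))
    for _ in range(200)
)
approx_95 = trials[int(0.95 * len(trials))]  # rough 5% significance cut-off
print(min(trials), approx_95, max(trials))
```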
35. A trendline has been added to this data in the graph below. Although this trendline is crude, it could provide a rough estimation of the significance level encountered.
36. While the graph above provides the basis for a significance test, it should be noted that the individual values are a function of:
a) the sample size, and
b) the sample values selected from the general population and placed in the dataset.
37. When using
38. As referred to in paragraph 20, both Annis (2009) and Romeu (2003-5) have suggested that different significance curves should be applied to different distributions. This assertion is disputed by the author, who believes that the results of the test are independent of the underlying distribution, but are dependent upon the random sequence used to select the data sample. In support of this assertion, the distributions shown below were tested using Oracle Crystal Ball, employing a Latin Hypercube selection with a fixed starting seed of 999 and a sample size of N = 2000. The same result of AD = 0.002662093 was found to apply in each case, thereby verifying that the shape of the significance curve is not affected by the particular underlying distribution, but is affected by the random selection sequence. Note that the Maximum Extreme Distribution is the same as the Gumbel Distribution.
CONCLUSION
39. It is concluded that the Anderson Darling test provides a powerful method of assessing the goodness of fit of a distribution to a data sample, and that the results of the test are independent of the underlying or assumed distribution. Significance levels are not affected by the distribution, but are affected by the size of the data sample and the method employed to produce the random dataset.
REFERENCES
Anderson, T. W.; Darling, D. A. (1952). Asymptotic theory of certain goodness-of-fit criteria based on stochastic processes. Annals of Mathematical Statistics 23: 193-212.
Annis, C. (2009), ref: http://www.statisticalengineering.com/goodness.htm
Engineering Statistics Handbook, ref: http://itl.nist.gov/div898/handbook/eda/section3/eda35e.htm
Romeu, J. L. Anderson-Darling: A Goodness of Fit Test for Small Samples Assumptions. START: Selected Topics in Assurance Related Technologies, Vol 10, Number 5 (2003-5), ref: http://rac.alionscience.com
Stephens, M. A. (1974). EDF Statistics for Goodness of Fit and Some Comparisons. Journal of the American Statistical Association 69: 730-737.
Wikipedia (April 2010), ref: http://en.wikipedia.org/wiki/Anderson%E2%80%93Darling_test