

# 4.14. Best Fit for Small Data Sets?

Applies to: @RISK 6.x/7.x, Professional and Industrial Editions

When I fit {1,2,3,4,5} as discrete data, @RISK prefers a RiskPoisson distribution, even though RiskIntUniform is clearly a better fit. Why is that?

In @RISK 6.x, the default statistic for measuring goodness of fit is AIC (more specifically, AICc, the version of AIC corrected for small samples). For small data sets, the AIC calculation strongly prefers distributions with fewer parameters. (This is an application of the principle of parsimony.) The Poisson distribution (RiskPoisson) and the geometric distribution (RiskGeomet) are both one-parameter distributions, but the integer uniform distribution (RiskIntUniform) is a two-parameter distribution. With a data set of only five points, the penalty that AIC applies for the extra parameter outweighs the higher likelihood that the integer uniform distribution achieves, so the one-parameter distributions rank ahead of it despite fitting the data less well.
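For reference, the usual textbook form of AICc is shown below. Whether @RISK computes the statistic with exactly this expression is an assumption here. The symbols are $k$ for the number of fitted parameters, $n$ for the number of data points, and $\hat{L}$ for the maximized likelihood:

$$
\mathrm{AICc} = -2\ln\hat{L} + 2k + \frac{2k(k+1)}{n-k-1}
$$

With $n = 5$, the penalty $2k + 2k(k+1)/(n-k-1)$ is about 3.33 for $k = 1$ and 10 for $k = 2$. Under this formula, a two-parameter distribution would have to lower $-2\ln\hat{L}$ by more than about 6.67 relative to a one-parameter distribution before it could rank first.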

There are three countermeasures:

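• For small data sets, consider changing Fit Ranking to BIC. Although BIC also favors distributions with fewer parameters, for data sets this small it doesn't favor them as strongly as AICc does. (In the standard formulas, BIC adds ln n per parameter, which is about 1.61 for five points, while AIC adds 2 per parameter before the small-sample correction.) (Please see attached illustration.)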

• Don't just take the first listed fit; examine the fitted distributions. Your data probably won't show the kind of dramatic difference that we got from this artificial data set, but you may find that a fit that doesn't have the best statistic actually does a better job in the region of the graph that you care most about.

• Use more data points. @RISK does allow fitting to as few as five data points, but in general, the more points you have, the better the fitted distribution will match the true theoretical distribution that those points represent. Extending this made-up data set to as few as nine points, {1,2,3,4,5,6,7,8,9}, is enough for @RISK to compute the smallest AIC statistic for the integer uniform distribution.
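The first and third countermeasures can be checked by hand with a short Python sketch. It rests on several assumptions that are not confirmed by @RISK documentation: the textbook AICc and BIC formulas, a Poisson fit whose mean is the sample mean, and an integer uniform fit whose bounds are the sample minimum and maximum. The function names are illustrative only, and the values @RISK reports may differ.

```python
import math


def aicc(loglik, k, n):
    """Textbook AICc: AIC plus the small-sample correction (assumed form)."""
    return -2 * loglik + 2 * k + 2 * k * (k + 1) / (n - k - 1)


def bic(loglik, k, n):
    """Textbook BIC (assumed form)."""
    return -2 * loglik + k * math.log(n)


def poisson_loglik(data):
    """Log-likelihood of a Poisson fit; the MLE of the mean is the sample mean."""
    lam = sum(data) / len(data)
    return sum(x * math.log(lam) - lam - math.lgamma(x + 1) for x in data)


def int_uniform_loglik(data):
    """Log-likelihood of an integer uniform fit on [min(data), max(data)]."""
    width = max(data) - min(data) + 1
    return -len(data) * math.log(width)


def rank(data):
    """Return AICc and BIC for both candidate fits (smaller is better)."""
    n = len(data)
    fits = {
        "Poisson": (poisson_loglik(data), 1),        # one parameter
        "IntUniform": (int_uniform_loglik(data), 2),  # two parameters
    }
    return {
        name: {"AICc": aicc(ll, k, n), "BIC": bic(ll, k, n)}
        for name, (ll, k) in fits.items()
    }


if __name__ == "__main__":
    for data in ([1, 2, 3, 4, 5], list(range(1, 10))):
        print(f"n = {len(data)}")
        for name, stats in rank(data).items():
            print(f"  {name:<10} AICc = {stats['AICc']:.2f}  BIC = {stats['BIC']:.2f}")
```

Under these assumptions, the five-point data set gives the Poisson fit the smaller AICc (about 21.28 versus 26.09) but gives the integer uniform fit the smaller BIC (about 19.31 versus 19.55). With the nine-point data set, the integer uniform fit has the smaller AICc as well (about 45.55 versus 45.64).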

Last edited: 2015-06-19