Palisade Knowledge Base

13.8. "Must have at least two more data points than independent variables."

Applies to: StatTools 6.x/7.x

When I click OK in the Regression dialog, I get the message

Must have at least two more data points than independent variables.

That doesn't make sense to me. If I have m independent variables and m+1 points, why doesn't the regression just solve m+1 equations in m+1 unknowns to find the coefficients of the m variables plus the constant term?

Let's define
m = number of independent variables
p = number of data points

Yes, if p = m+1 you can solve directly for the constant term and the m coefficients of the independent variables, but that's not really a regression analysis. The direct solution fits those m+1 points perfectly, with every residual equal to zero, so it tells you nothing at all about how well the equation would work with any other values of the independent variables.
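To make the p = m+1 case concrete, here is a minimal sketch in Python. It is not StatTools code; the function name and the two data points are made up for illustration. It takes m = 1 independent variable and p = 2 points and solves the two equations directly.

```python
def exact_line(p1, p2):
    """Solve y = a + b*x exactly through two points (m = 1, p = 2)."""
    (x1, y1), (x2, y2) = p1, p2
    b = (y2 - y1) / (x2 - x1)  # coefficient of the independent variable
    a = y1 - b * x1            # constant term
    return a, b

# Hypothetical data: two points, one independent variable.
points = [(1.0, 3.0), (2.0, 5.0)]
a, b = exact_line(*points)

# Residuals are the gaps between the observed and fitted values.
residuals = [y - (a + b * x) for x, y in points]
sse = sum(r * r for r in residuals)  # sum of squared errors
```

Both residuals come out as zero, so the sum of squared errors is zero too. The line reproduces the two points exactly, which leaves no information for judging how far off it might be anywhere else.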

StatTools regression is a statistical procedure to find the best set of m coefficients plus constant term, so that the resulting equation does the best possible job of predicting the dependent variable for all values of the independent variables. This works better with a large set of points. One rule of thumb is p ≥ 10·m. (Princeton University suggests p ≥ 20·m ideally, but no lower than 5·m.) The fewer data points you have, the larger the standard errors for the coefficients and the poorer your equation will do at predicting values of the dependent variable. Mathematically, the absolute minimum number of data points is p = m+2, because the number of error degrees of freedom in ANOVA is df = p − m − 1, and df must be greater than zero. With p = m+1 you would have df = 0, leaving nothing from which to estimate the error variance or the standard errors. (See the help topic "Regression Command" in StatTools for more about how StatTools does a regression.)
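The degrees-of-freedom requirement can be sketched in a few lines of Python. This only illustrates the arithmetic and is not how StatTools is implemented; the function names are hypothetical. The error variance is estimated as SSE / df, which is undefined when df is zero.

```python
def residual_df(p, m):
    """Error degrees of freedom: p data points, m independent variables, one constant."""
    return p - m - 1

def residual_variance(sse, p, m):
    """Estimate the error variance as SSE / df; needs df > 0, i.e. p >= m + 2."""
    df = residual_df(p, m)
    if df <= 0:
        raise ValueError(
            "Must have at least two more data points than independent variables."
        )
    return sse / df
```

For example, with m = 3 independent variables, p = 4 points gives df = 0 and the check fails, while p = 5 gives df = 1, the smallest case in which the variance can be estimated.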

Last edited: 2016-07-29
