What are deviance statistics?

Is R-Squared Really an Invalid Metric for Nonlinear Models?

I've read that R-squared is invalid for nonlinear models because the relationship SSR + SSE = SSTotal no longer holds. Can someone explain why that is?

SSR and SSE are just the squared norms of the regression and residual vectors, whose $i$-th components are $(\hat{Y}_i - \bar{Y})$ and $(Y_i - \hat{Y}_i)$, respectively. As long as these vectors are orthogonal to one another, shouldn't the above relationship always hold, regardless of the type of function used to map predictor values to fitted values?
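For reference, the identity behind this argument can be written out as a short worked expansion. Here $Y_i$ denotes the observations, $\hat{Y}_i$ the fitted values, and $\bar{Y}$ the sample mean:

$$
\sum_{i=1}^{n} (Y_i - \bar{Y})^2
= \sum_{i=1}^{n} \big[(\hat{Y}_i - \bar{Y}) + (Y_i - \hat{Y}_i)\big]^2
= \mathrm{SSR} + \mathrm{SSE} + 2\sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})(Y_i - \hat{Y}_i).
$$

So SSR + SSE = SSTotal holds exactly when the cross term, the inner product of the regression and residual vectors, is zero.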

Also, shouldn't the regression and residual vectors associated with any least-squares model be orthogonal, by the definition of least squares? The residual vector is the difference between the vector $(Y_i - \bar{Y})$ and the regression vector. If the regression vector is such that the residual/difference vector is not orthogonal to it, the regression vector can be multiplied by a constant so that it is now orthogonal to the residual/difference vector. This should also reduce the norm of the residual/difference vector.

If I explained it badly, please tell me and I'll try to clear it up.
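The orthogonality argument in the question can be checked numerically. The following is a minimal sketch, not part of the original thread: the data are made up, the one-parameter exponential model `exp(b * x)` is just one illustrative nonlinear choice, and the helper name `decomposition_gap` is hypothetical. It fits a straight line and the exponential curve by least squares and reports SSTotal - (SSR + SSE) for each.

```python
import math

def decomposition_gap(y, fitted):
    """Return SSTotal - (SSR + SSE), i.e. twice the cross term."""
    y_bar = sum(y) / len(y)
    sst = sum((yi - y_bar) ** 2 for yi in y)
    ssr = sum((fi - y_bar) ** 2 for fi in fitted)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return sst - (ssr + sse)

# Made-up illustrative data (an assumption, not from the thread).
x = [0.0, 1.0, 2.0, 3.0]
y = [1.2, 2.5, 8.0, 19.0]

# Linear least squares with an intercept (closed form).
x_bar = sum(x) / len(x)
y_bar = sum(y) / len(y)
slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
intercept = y_bar - slope * x_bar
linear_fit = [intercept + slope * xi for xi in x]

# Nonlinear least squares for f(x; b) = exp(b * x).
# The SSE is unimodal in b on [0, 2] for this data, so a ternary search suffices.
def sse_exp(b):
    return sum((yi - math.exp(b * xi)) ** 2 for xi, yi in zip(x, y))

lo, hi = 0.0, 2.0
for _ in range(200):
    m1 = lo + (hi - lo) / 3.0
    m2 = hi - (hi - lo) / 3.0
    if sse_exp(m1) < sse_exp(m2):
        hi = m2
    else:
        lo = m1
b_hat = 0.5 * (lo + hi)
exp_fit = [math.exp(b_hat * xi) for xi in x]

gap_linear = decomposition_gap(y, linear_fit)
gap_exp = decomposition_gap(y, exp_fit)
print("linear gap:     ", gap_linear)
print("exponential gap:", gap_exp)
```

In this sketch the linear gap is zero up to rounding, while the exponential gap is not. One way to read this: the least-squares condition for the exponential model makes the residuals orthogonal to the derivative of the fitted values with respect to `b`, not to the regression vector itself, and rescaling the fitted vector by a constant generally produces a vector that the model family cannot reproduce.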


The sums of squares in linear regression are special cases of the more general deviance values in the generalized linear model (GLM). In the more general model, there is a response distribution whose mean is linked to a linear function of the explanatory variables (with an intercept term). The three deviance statistics in a GLM are defined as follows:

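$$
\begin{aligned}
\text{Null Deviance} \quad & D_{TOT} = 2(\hat{\ell}_S - \hat{\ell}_0), \\
\text{Explained Deviance} \quad & D_{REG} = 2(\hat{\ell}_p - \hat{\ell}_0), \\
\text{Residual Deviance}^\dagger \quad & D_{RES} = 2(\hat{\ell}_S - \hat{\ell}_p).
\end{aligned}
$$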

In these expressions, $\hat{\ell}_S$ is the maximized log-likelihood under a saturated model (one parameter per data point), $\hat{\ell}_0$ is the maximized log-likelihood under a null model (intercept only), and $\hat{\ell}_p$ is the maximized log-likelihood under the model of interest (an intercept term and $p$ coefficients).
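To make the three definitions concrete, here is a minimal sketch that computes them for a Poisson GLM with a log link. Everything specific in it is an assumption for illustration: the data are made up, the Poisson/log-link choice is just one example of a GLM, and the helpers `poisson_loglik` and `fit_poisson` are hypothetical names written for this example rather than library functions.

```python
import math

def poisson_loglik(y, mu):
    """Poisson log-likelihood: sum of y*log(mu) - mu - log(y!), with 0*log(0) = 0."""
    total = 0.0
    for yi, mi in zip(y, mu):
        term = yi * math.log(mi) if yi > 0 else 0.0
        total += term - mi - math.lgamma(yi + 1)
    return total

def fit_poisson(x, y, iters=50):
    """Newton-Raphson fit of log(mu_i) = b0 + b1 * x_i (one predictor)."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(iters):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector (gradient of the log-likelihood).
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Fisher information matrix entries.
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system explicitly.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Made-up count data (an assumption for illustration).
x = [0, 1, 2, 3, 4, 5]
y = [1, 2, 2, 4, 6, 9]
n = len(y)

b0, b1 = fit_poisson(x, y)

ll_sat = poisson_loglik(y, y)                      # saturated: mu_i = y_i
ll_null = poisson_loglik(y, [sum(y) / n] * n)      # null: mu_i = mean(y)
ll_model = poisson_loglik(y, [math.exp(b0 + b1 * xi) for xi in x])

d_tot = 2 * (ll_sat - ll_null)    # null deviance
d_reg = 2 * (ll_model - ll_null)  # explained deviance
d_res = 2 * (ll_sat - ll_model)   # residual deviance

print("D_TOT =", d_tot)
print("D_REG =", d_reg)
print("D_RES =", d_res)
```

Because each deviance is a difference of the same three log-likelihoods, the printed values satisfy `d_tot == d_reg + d_res` up to rounding, whatever data are used.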

These deviance statistics play a role analogous to scaled versions of the sums of squares in linear regression. It is easy to see that they satisfy the decomposition $D_{TOT} = D_{REG} + D_{RES}$, which is analogous to the decomposition of the sums of squares in linear regression. In fact, if you have a normal response distribution with a linear link function, you get a linear regression model and the deviance statistics reduce to the following:

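$$
\begin{aligned}
D_{TOT} &= \frac{1}{\sigma^2} \sum_{i=1}^{n} (y_i - \bar{y})^2 = \frac{1}{\sigma^2} \cdot SS_{TOT}, \\
D_{REG} &= \frac{1}{\sigma^2} \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 = \frac{1}{\sigma^2} \cdot SS_{REG},
\end{aligned}
$$

DRES = 1σ2∑i = 1n (yi−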