
How do you interpret p-values and coefficients in regression analyses?

A regression analysis produces an equation that describes the statistical relationship between one or more predictor variables and the response variable. After you use Minitab Statistical Software to fit a regression model and confirm the fit by reviewing the residual plots, you are ready to interpret the results. In this post, I'll explain how the p-values and coefficients are interpreted in the output of a linear regression analysis.

How do I interpret the p-values in a linear regression analysis?

The p-value of each term tests the null hypothesis that the coefficient is zero (no effect). A low p-value (<0.05) indicates that the null hypothesis can be rejected. In other words, a predictor with a low p-value is likely to be a useful addition to the model because changes in the predictor value are related to changes in the response.

Conversely, a higher (insignificant) p-value indicates that changes in the predictor are not related to changes in the response.

In the output below, you can see that the North and South predictors are significant because both p-values are 0.000. The p-value for East (0.092), on the other hand, is greater than the common alpha level of 0.05, which indicates that the term is not statistically significant.

The p-values of the coefficients are usually used to determine which terms to keep in the regression model. In the model above, East should be removed.
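The Minitab output itself is not reproduced here, but the same logic can be sketched in a few lines of Python. The data below is simulated (all numbers are invented for illustration): North and South genuinely drive the response, while East contributes nothing, so its p-value should come out insignificant.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n = 100

# Simulated predictors named after the post's example (the data itself is invented)
north = rng.normal(10, 2, n)
south = rng.normal(10, 2, n)
east = rng.normal(10, 2, n)

# North and South drive the response; East has no effect
y = 5 + 3 * north + 2 * south + rng.normal(0, 2, n)

# Ordinary least squares by hand: beta = (X'X)^-1 X'y
X = np.column_stack([np.ones(n), north, south, east])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
dof = n - X.shape[1]
sigma2 = resid @ resid / dof
se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))

# Two-sided p-value for H0: coefficient = 0
t_vals = beta / se
p_vals = 2 * stats.t.sf(np.abs(t_vals), dof)

for name, b, p in zip(["const", "north", "south", "east"], beta, p_vals):
    print(f"{name:5s}  coef = {b:6.3f}  p = {p:.3f}")
```

Running this, the p-values for north and south come out essentially zero, while east's stays well above 0.05, mirroring the decision rule described above.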

How do I interpret the regression coefficients for linear relationships?

Regression coefficients represent the mean change in the response when the predictor variable changes by one unit while the other predictors in the model are held constant. This statistical control that regression provides is important because it isolates the effect of one variable from all of the other variables in the model.

The key to understanding the coefficients is to think of them as slopes, which is why they are often referred to as slope coefficients. This is illustrated in the fitted line plot below, which models a person's weight based on their height. This is the output in the Minitab Session window:

The fitted line plot shows the same regression results graphically.

The equation shows that the coefficient for height in meters is 106.5 kg. The coefficient indicates that for every additional meter of height, weight is expected to increase by 106.5 kg on average.

The blue fitted line represents this information graphically: as you move along the x-axis by one meter of height, the fitted line rises by 106.5 kg. However, these heights come from girls aged 11 to 14 and fall between 1.3 m and 1.7 m. The relationship is only meaningful in this range, so full 1-meter steps are not actually useful in this case.
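The slope interpretation can be checked numerically. Here is a minimal sketch on simulated data (heights and the intercept are invented; only the 1.3–1.7 m range and the 106.5 kg/m slope echo the example above), using the textbook simple-regression formula slope = cov(x, y) / var(x):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical heights (m) in the 1.3-1.7 m range, with a true slope of 106.5 kg/m
height = rng.uniform(1.3, 1.7, 30)
weight = -70 + 106.5 * height + rng.normal(0, 3, 30)

# Simple regression: slope = cov(x, y) / var(x)
slope = np.cov(height, weight, ddof=1)[0, 1] / np.var(height, ddof=1)
intercept = weight.mean() - slope * height.mean()

print(f"weight = {intercept:.1f} + {slope:.1f} * height")
# Moving 0.1 m along the x-axis changes the predicted weight by slope * 0.1 kg
```

The estimated slope lands near 106.5, and scaling it by a realistic step (0.1 m rather than a full meter) respects the range warning above.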

If the fitted line were horizontal (a slope coefficient of zero), the expected weight would not change no matter where you are on the line. A low p-value therefore indicates that the slope is not zero, which in turn indicates that changes in the predictor variable are related to changes in the response variable.

I used a fitted line plot here because it displays the values very clearly. Such a plot, however, can only display the results of a simple regression, i.e., one predictor variable and the response variable. The concepts described here also apply to multiple linear regression, but that would require an additional spatial dimension for each additional predictor to display the results. Unfortunately, this is beyond the capabilities of today's technology.

How do I interpret the regression coefficients for curved relationships and interaction terms?

In the example above, height has a linear effect: the slope is constant, which indicates that the effect is constant along the entire fitted line. When a model contains polynomial or interaction terms, however, the interpretation is not quite as intuitive.

As a reminder, polynomial terms model the curvature in the data, while interaction terms indicate that the effect of one predictor depends on the value of another.

The next example uses a data set in which the curvature must be modeled with a squared term. The output below shows that the p-values for the linear and quadratic terms are significant.

The residual plots (not shown) indicate a good fit, so we can move on to the interpretation. But how are the coefficients interpreted? A fitted line plot is of great help here.

You can see how the relationship between machine setting and energy consumption differs depending on which region of the fitted line you look at. For example, if you start at machine setting 12 and increase the setting by 1, you can expect a decrease in energy consumption. If you start at 25, however, an increase of 1 should result in higher energy consumption. And around a setting of 20, energy consumption should barely change at all.
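The setting-dependent effect described above can be demonstrated with a quadratic fit. The sketch below uses invented data whose true curve has its minimum near a setting of 20, echoing the example; the specific coefficients and noise level are assumptions, not the post's actual data set:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: energy consumption is a parabola in the machine setting,
# with its minimum near a setting of 20
setting = rng.uniform(10, 30, 60)
energy = 0.5 * (setting - 20) ** 2 + 30 + rng.normal(0, 2, 60)

# Fit energy = b0 + b1*setting + b2*setting^2
b2, b1, b0 = np.polyfit(setting, energy, 2)

def predict(x):
    return b0 + b1 * x + b2 * x ** 2

# The effect of a +1 change depends on where you start
print("change starting at 12:", predict(13) - predict(12))  # negative: consumption drops
print("change starting at 25:", predict(26) - predict(25))  # positive: consumption rises
print("turning point:", -b1 / (2 * b2))                     # near a setting of 20
```

The sign flip of the +1 effect, and the turning point at -b1 / (2 * b2), are exactly why a single slope number no longer summarizes the relationship once a squared term is present.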

A significant polynomial term can make the interpretation less intuitive, because the effect of changing the predictor varies depending on its value. Similarly, a significant interaction term indicates that the effect of one predictor varies depending on the value of another predictor.
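The interaction case can be sketched the same way. In the simulated model below (all names and numbers are invented for illustration), the slope of x1 is not a single number but b1 + b3 * x2, so reporting the main effect b1 alone would be misleading:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200

# Hypothetical data with a true interaction: the slope of x1 depends on x2
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 10, n)
y = 2 + 1.0 * x1 + 0.5 * x2 + 0.3 * x1 * x2 + rng.normal(0, 1, n)

# Fit y = b0 + b1*x1 + b2*x2 + b3*(x1*x2) by least squares
X = np.column_stack([np.ones(n), x1, x2, x1 * x2])
b0, b1, b2, b3 = np.linalg.lstsq(X, y, rcond=None)[0]

# With an interaction, the effect of x1 is b1 + b3 * x2, not a single slope
print("slope of x1 when x2 = 0: ", b1 + b3 * 0)
print("slope of x1 when x2 = 10:", b1 + b3 * 10)
```

The two printed slopes differ substantially, which is the interaction effect in action: you cannot state "the" effect of x1 without saying what x2 is.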

Take special care when interpreting regression models that contain such terms. Looking at the main effect (linear term) alone is not enough to understand the data! Unfortunately, the results of a multiple regression analysis cannot be interpreted with a fitted line plot. Subject-matter expertise is particularly important here!

Particularly attentive readers will have noticed that I did not cover the interpretation of the constant. I will explain that in my next post.
