What is binary logistic regression

Logistic regression

Are you wondering what logistic regression is and when to use it? Then this post is the right place for you.

Would you like to get your questions answered even faster? Then check out our video and find out everything you need to know about logistic regression.

Logistic regression explained in simple terms

Logistic regression is a form of regression analysis that you use to predict a nominally scaled, categorical criterion. This means that you use logistic regression whenever the dependent variable takes only a few, equivalent values. An example of a categorical criterion would be the outcome of an entrance examination, in which a candidate can only be either "accepted" or "rejected".

If the criterion in a logistic regression has only two values, it is called a binary logistic regression. If, on the other hand, the criterion has more than two categories, the method is referred to as multinomial logistic regression. In this article we will mainly focus on binary logistic regression with one predictor.

Logistic regression and probabilities

In contrast to linear regression, in logistic regression you do not predict the concrete values of the criterion. Instead, you estimate how probable it is that a person falls into one category of the criterion or the other. For example, you could predict how likely it is that a person with an IQ of 112 will pass the entrance exam. Logistic regression also uses a regression equation for the prediction. If you plot this regression equation in a coordinate system, you get the characteristic S-shaped curve of logistic regression. You can use it to estimate how likely a given value of the criterion is for a person with a certain predictor value, and how well the model fits your data.

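In standard notation, the logistic regression function looks like this:

$$P(y = 1) = \frac{1}{1 + e^{-z}}$$

Here $P(y = 1)$ is the probability that the criterion takes the value 1, and $z$ is a linear function of the predictor.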
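To make the curve more tangible, here is a minimal Python sketch that evaluates the logistic function for the IQ example. The names b0 (intercept) and b1 (slope) and their values are assumptions made up for this illustration; they are not estimated from real data.

```python
import math


def logistic(z: float) -> float:
    """Map any real number z to a value between 0 and 1: 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(x: float, b0: float, b1: float) -> float:
    """Probability that the criterion is 1 for predictor value x."""
    return logistic(b0 + b1 * x)


# Hypothetical coefficients, chosen only for illustration.
B0, B1 = -11.0, 0.1

# Probability of passing the entrance exam with an IQ of 112.
p_112 = predict_probability(112, B0, B1)
print(f"P(accepted | IQ = 112) = {p_112:.3f}")

# The curve is S-shaped: it rises with the predictor but stays between 0 and 1.
for iq in (80, 100, 110, 120, 140):
    print(iq, round(predict_probability(iq, B0, B1), 3))
```

With these made-up coefficients, the probability is exactly 0.5 at an IQ of 110, because the exponent is zero there.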

Logistic regression versus linear regression

Let's take a closer look at how logistic regression differs from linear regression. In both linear and logistic regression, you use a predictor variable to predict a criterion variable. However, the two forms of regression analysis differ in the type of criterion.

In linear regression, you use a continuous, interval-scaled criterion. An example of this would be body height. Height can take an infinite number of values in ascending order, all of which are equidistant from one another. Logistic regression is different: here you use a nominally scaled criterion. This criterion has only a few values, which have no natural order. An example would be a person's favorite school subject. Here it is not automatically clear whether "Math" or "German" should be assigned the higher value; both options are equivalent.

Logistic regression prediction

You probably know that in linear regression you try to estimate the values of your criterion as accurately as possible. This means that you are trying to predict, for example, how tall a person is as precisely as possible. In logistic regression this is a little different. Here you do not predict the values of the criterion directly. Instead, you estimate how probable each of the two values of the criterion is. As the result of the regression equation, you do not get a criterion value but a probability for one of the two criterion values.

To be able to include the two values of your categorical criterion in the regression analysis, you assign a number to each of them (usually 0 and 1). If a person is rejected at the entrance exam, for example, they get the criterion value 0, and if they are accepted, the value 1. If you now carry out the logistic regression, the result is always a value for P(y = 1), that is, how likely it is that a person with a certain predictor value was accepted.

In purely mathematical terms, you could also predict a criterion with two values using linear regression. However, the linear regression equation can also produce predicted values far below 0, far above 1, or somewhere in between. This makes little sense in terms of content, since only the value 0 or the value 1 can actually occur. It is therefore smarter to use logistic regression, because here it is not the value itself that is predicted but its probability of occurrence.
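The problem with linear regression on a 0/1 criterion can be seen in a small Python sketch. The data below are made up for illustration, and fit_line is a hypothetical helper that computes an ordinary least-squares line for a single predictor.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Made-up toy data: IQ scores and exam outcome (0 = rejected, 1 = accepted).
iq = [85, 90, 95, 100, 105, 110, 115, 120]
accepted = [0, 0, 0, 0, 1, 1, 1, 1]

intercept, slope = fit_line(iq, accepted)

# Predictions of the straight line for a low, a middle, and a high IQ.
for x in (70, 102, 140):
    print(x, round(intercept + slope * x, 2))
```

For the low IQ the line predicts a value below 0, and for the high IQ a value above 1. Neither can be read as a criterion value or as a probability.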

Regression equation

Logistic regression also has a regression equation. On the one hand, this equation describes the regression curve, which you can draw in a coordinate system. On the other hand, you can plug predictor values into the regression equation. If you then evaluate the equation, you get an estimate of how probable one of the two values of the criterion is.

The maximum likelihood method is used to determine the regression parameters of the regression equation. This method tries to find those parameters under which the available data are most likely to occur. Carrying out the maximum likelihood method is comparatively complicated and is usually done with the help of a computer program.
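To give a rough idea of what such a program does, here is a simplified Python sketch. It searches for the parameters by gradient ascent on the log-likelihood; statistics packages typically use faster, more refined algorithms, so this is an illustration rather than the procedure any particular program uses. The data, the rescaling of the predictor, and all names (b0, b1, fit_logistic, log_likelihood) are assumptions made for this example.

```python
import math


def log_likelihood(xs, ys, b0, b1):
    """Log of the probability of observing the data ys given b0 and b1."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


def fit_logistic(xs, ys, learning_rate=0.1, steps=5000):
    """Estimate b0 and b1 by gradient ascent on the log-likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad0 = grad1 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            grad0 += y - p          # derivative with respect to b0
            grad1 += (y - p) * x    # derivative with respect to b1
        # Move a small step in the direction that increases the likelihood.
        b0 += learning_rate * grad0 / n
        b1 += learning_rate * grad1 / n
    return b0, b1


# Made-up toy data: IQ scores and exam outcome (0 = rejected, 1 = accepted).
iq = [85, 90, 95, 100, 105, 110, 115, 120]
accepted = [0, 0, 1, 0, 1, 0, 1, 1]

# Rescale the predictor (distance from 100 in units of 10 IQ points)
# so that the simple gradient steps behave well.
xs = [(value - 100) / 10 for value in iq]

b0, b1 = fit_logistic(xs, accepted)
print("b0 =", round(b0, 3), "b1 =", round(b1, 3))
print("log-likelihood at start:", round(log_likelihood(xs, accepted, 0.0, 0.0), 3))
print("log-likelihood at fit:  ", round(log_likelihood(xs, accepted, b0, b1), 3))
```

The fitted parameters give the data a higher log-likelihood than the starting values, which is exactly what the method is looking for.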

You use the regression equation to estimate how likely it is that your criterion takes the value 1. If you have assigned the value "1" for accepted and "0" for rejected to the results of the entrance exam, then you use the regression equation to calculate the probability that a person will pass the entrance exam.

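In standard notation, the regression equation of logistic regression with one predictor looks like this:

$$P(y = 1) = \frac{1}{1 + e^{-(b_0 + b_1 \cdot x)}}$$

Here $x$ is the predictor value, $b_0$ is the intercept, and $b_1$ is the regression coefficient of the predictor.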

Interpretation of the logistic regression

Interpreting the regression coefficients is not quite as simple in logistic regression as it is in linear regression. First, however, you can look at the sign of the regression coefficient. If the coefficient is positive, the probability that the criterion takes the value 1 increases as the predictor value increases. If, on the other hand, the regression coefficient is negative, the probability decreases with increasing predictor values.

You can also look at the so-called odds ratios. The odds are the ratio of the probability of one value of the criterion to the probability of the other value. If, in the next step, you put the odds for different predictor values in relation to one another, you learn how strongly the chances change between the predictor values under consideration.
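A short Python sketch shows how odds and an odds ratio can be computed. The coefficients b0 and b1 are hypothetical values chosen for illustration, not estimates from real data.

```python
import math


def probability(x, b0, b1):
    """Probability that the criterion is 1 for predictor value x."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


def odds(p):
    """Odds: probability of one outcome divided by the probability of the other."""
    return p / (1.0 - p)


# Hypothetical coefficients, chosen only for illustration.
B0, B1 = -11.0, 0.1

# Odds of being accepted for two predictor values that differ by one point.
odds_112 = odds(probability(112, B0, B1))
odds_113 = odds(probability(113, B0, B1))

# Odds ratio: by what factor do the odds change per additional IQ point?
odds_ratio = odds_113 / odds_112
print("odds ratio:", odds_ratio)
print("exp(b1):   ", math.exp(B1))
```

For a one-unit increase in the predictor, the odds ratio equals e raised to the power of the regression coefficient. A value above 1 therefore goes with a positive coefficient, and a value below 1 with a negative one.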

You can also calculate a coefficient of determination for logistic regression. The coefficient of determination of logistic regression is also called pseudo-R² and exists in two variants: Cox & Snell's R² on the one hand and Nagelkerke's R² on the other. It is best to always report both.
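As a hedged sketch, the two measures can be computed in Python from the log-likelihood of a model without predictors (the null model) and the log-likelihood of the fitted model. The function names and the log-likelihood of the fitted model used below are made up for illustration.

```python
import math


def cox_snell_r2(ll_null, ll_model, n):
    """Cox & Snell: 1 - (L_null / L_model)^(2/n), written with log-likelihoods."""
    return 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)


def nagelkerke_r2(ll_null, ll_model, n):
    """Nagelkerke: Cox & Snell divided by its largest possible value."""
    max_r2 = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell_r2(ll_null, ll_model, n) / max_r2


# Example with 8 people, 4 of whom were accepted.
n = 8
ll_null = n * math.log(0.5)   # null model predicts 0.5 for everyone
ll_model = -4.0               # hypothetical log-likelihood of a fitted model

print("Cox & Snell:", round(cox_snell_r2(ll_null, ll_model, n), 3))
print("Nagelkerke: ", round(nagelkerke_r2(ll_null, ll_model, n), 3))
```

The Cox & Snell value cannot reach 1, even for a perfect model. Nagelkerke's variant rescales it so that the maximum is 1, which is why it is never smaller than the Cox & Snell value.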

Coefficient of determination

You can find out what the coefficient of determination is and how to calculate it in our video. Check it out right now!