The texts in this article were partly generated by artificial intelligence and corrected and revised by us.
Machine learning has many areas of application. These often involve classification, clustering or similar problems, and regression also plays a role. This article considers a special type of regression: logistic regression.
Definition of logistic regression
Logistic regression is a statistical model that predicts the probability that an event occurs, based on a linear combination of several variables. Classically, a binary dependent variable is modeled by a set of independent variables. Here, the independent variables can be binary (values $0$ or $1$ only) or continuous (all numbers in the interval $(-\infty, \infty)$). 1 2 3
In simple terms, a linear combination is a weighted sum: each variable is multiplied by a coefficient and the results are added up. This also works for elements of higher dimension, such as vectors.
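The idea of a linear combination can be illustrated with a minimal sketch. The function name `linear_combination`, the weights and the input values are made up for illustration and do not come from a real data set.

```python
def linear_combination(weights, values, intercept=0.0):
    """Return intercept + sum of weights[i] * values[i]."""
    if len(weights) != len(values):
        raise ValueError("weights and values must have the same length")
    return intercept + sum(w * v for w, v in zip(weights, values))


# Hypothetical example: two independent variables with made-up weights.
z = linear_combination([0.5, -2.0], [4.0, 1.0], intercept=1.0)
print(z)  # 1.0 + 0.5 * 4.0 + (-2.0) * 1.0 = 1.0
```

In logistic regression, a value like `z` is later passed through the logistic function to turn it into a probability.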
Requirements for the variables
There are two relevant requirements that should be placed on the occurring variables in order for logistic regression to be successful: 4 5
- The dependent variable must be a binary variable,
- the independent variables should not be multi-collinear.
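Multi-collinearity describes the situation in which two or more independent variables are strongly correlated with each other, i.e. one of them can be expressed approximately as a linear combination of the others.
Put simply, multi-collinearity means that one or more variables contribute no new information, because they can already be represented by one or more other variables. The model then cannot attribute an effect clearly to one variable or the other, which makes the estimated coefficients unreliable. The linear combination of the variables appears again later, in the argument of the logistic function.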
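A simple first check for redundant variables is the pairwise Pearson correlation. The following sketch assumes three made-up features; `height_cm`, `height_in` and `age` are hypothetical names. Pairwise correlation only reveals relationships between two variables at a time, so it is a rough indicator rather than a complete test for multi-collinearity.

```python
import math


def pearson_correlation(a, b):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical features: the same height in two units, plus an unrelated age.
height_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
height_in = [h / 2.54 for h in height_cm]
age = [31.0, 25.0, 47.0, 38.0, 29.0]

r_redundant = pearson_correlation(height_cm, height_in)  # very close to 1
r_other = pearson_correlation(height_cm, age)            # much weaker

print(round(r_redundant, 3), round(r_other, 3))
```

Here `height_in` carries no information beyond `height_cm`, so one of the two would be dropped before fitting the model.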
Mathematical basis of the model
The basis of the statistical model is the logistic function, defined by
$$p(x) = \frac{1}{1 + e^{-(x - \mu)/s}}$$
This definition contains the location parameter $\mu$, which must be calculated so that $p(\mu) = \frac{1}{2}$, and the scale parameter $s$. The function $p(x)$ represents the probability with which the corresponding event occurs. Since this is a probability, the function is limited to the interval $[0, 1]$, since an event cannot have a probability of occurrence below $0$ or above $1$. 4
Assumption for the problem: there is only one independent variable $x$.
Graphically, the expression $p(\mu) = \frac{1}{2}$ means that $\mu$ marks the center of the logistic curve.
By redefining the parameters, we obtain a classical linear function in the argument of the exponential function:
$$p(x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}$$
The new parameters arise here as follows:
$$\beta_0 = -\frac{\mu}{s}, \qquad \beta_1 = \frac{1}{s}$$
The task of the mathematical model is now to find corresponding parameters so that the most accurate prediction possible can be made.
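The two ways of writing the logistic function can be compared numerically. This sketch assumes the common parameterization with a location parameter `mu` and a scale parameter `s`, and the substitution `beta0 = -mu / s`, `beta1 = 1 / s`. The concrete values are made up for illustration.

```python
import math


def logistic(x, mu, s):
    """Logistic function with location mu (midpoint) and scale s > 0."""
    return 1.0 / (1.0 + math.exp(-(x - mu) / s))


def logistic_linear(x, beta0, beta1):
    """The same curve with the linear function beta0 + beta1 * x in the exponent."""
    return 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))


mu, s = 2.0, 0.5                  # made-up illustration values
beta0, beta1 = -mu / s, 1.0 / s   # redefined parameters

# Both forms agree up to floating-point rounding.
for x in (0.0, 2.0, 3.5):
    assert abs(logistic(x, mu, s) - logistic_linear(x, beta0, beta1)) < 1e-12

print(logistic(mu, mu, s))  # 0.5, the midpoint of the curve
```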
Training the model
The two parameters of the logistic function, the intercept $\beta_0$ and the slope $\beta_1$, cannot be chosen arbitrarily; optimal values must be determined. However, because the exponential function is nonlinear, the parameters cannot be determined directly by a simple closed-form calculation, as is the case with linear regression.
Classically, iterative numerical methods such as gradient descent are required. Such a method minimizes a loss function, which measures how poorly the statistical model fits the training data.
Gradient descent is based on the idea that a function can be minimized by choosing a descent direction and following it step by step. To do this, the gradient (the multidimensional derivative) is calculated. Because the gradient points in the direction of steepest ascent, its negative serves as the descent direction. 6
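The following is a minimal sketch of gradient descent for a logistic regression with one independent variable. It assumes the binary cross entropy as loss, whose gradient with respect to the intercept and slope is the mean of `(prediction - label)` and of `(prediction - label) * x`. The data set, the learning rate and the number of steps are made-up illustration values, not tuned recommendations.

```python
import math


def sigmoid(z):
    """Logistic function applied to the linear score z."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, learning_rate=0.1, steps=10000):
    """Fit intercept beta0 and slope beta1 by plain batch gradient descent."""
    beta0, beta1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad0 = 0.0
        grad1 = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(beta0 + beta1 * x) - y  # prediction minus label
            grad0 += error / n
            grad1 += error * x / n
        # Step against the gradient, i.e. in a descent direction.
        beta0 -= learning_rate * grad0
        beta1 -= learning_rate * grad1
    return beta0, beta1


# Hypothetical data: hours studied and whether an exam was passed (1) or not (0).
hours = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
passed = [0, 0, 0, 1, 0, 1, 1, 1]

beta0, beta1 = fit_logistic(hours, passed)
print(beta0, beta1)
```

The classes overlap on purpose (a pass at 2.0 hours, a fail at 2.5 hours), so the optimal parameters are finite. With perfectly separable data, the coefficients would keep growing as the number of steps increases.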
The loss function
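For logistic regression, the cross entropy is usually used as the loss function. For a data set with input variables $x_i$, corresponding labels $y_i \in \{0, 1\}$ and a total of $N$ data points, the binary cross entropy is defined as follows, where $p(x_i)$ is the probability predicted by the model: 7
$$L = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log p(x_i) + (1 - y_i) \log\bigl(1 - p(x_i)\bigr) \right]$$
Note: The factor $\frac{1}{N}$ does not change the location of the minimum; it serves only as a normalization factor.
This function is then minimized to find the optimal parameters. Minimizing the cross entropy is equivalent to maximizing the likelihood of the training data, which is why this procedure is also called maximum likelihood estimation. 1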
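The binary cross entropy can be computed directly from predicted probabilities and labels. The sketch below assumes the usual definition (negative mean log-likelihood) and clips probabilities with a small `eps`, a practical safeguard against `log(0)` that is an addition of this example. All probabilities are made up.

```python
import math


def binary_cross_entropy(probabilities, labels, eps=1e-12):
    """Mean binary cross entropy of predicted probabilities against 0/1 labels."""
    n = len(labels)
    total = 0.0
    for p, y in zip(probabilities, labels):
        p = min(max(p, eps), 1.0 - eps)  # keep log() away from 0
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / n


labels = [1, 0, 1, 0]
good = binary_cross_entropy([0.9, 0.1, 0.8, 0.2], labels)  # confident and correct
bad = binary_cross_entropy([0.1, 0.9, 0.2, 0.8], labels)   # confident and wrong
uninformed = binary_cross_entropy([0.5] * 4, labels)       # always predicts 0.5

print(good, uninformed, bad)  # the loss grows from left to right
```

A model that always predicts 0.5 has a loss of `log(2)`, which is a useful baseline when judging a trained model.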
Decision boundary
To classify whether the event occurs or not, a decision boundary must be defined. This boundary is used to decide whether the result of $p(x)$ should be set to $0$ or $1$.
As an example, in spam detection, a message can be classified as spam if $p(x)$ exceeds a chosen threshold, commonly $0.5$. There are different types of such decision boundaries; they have to be adapted to the application and the user's requirements. 3
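Turning a probability into a class label can be sketched as a simple threshold comparison. The threshold of 0.5 is a common default and an assumption of this example, as are the made-up spam probabilities.

```python
def classify(probability, threshold=0.5):
    """Return 1 (event occurs) if the probability reaches the threshold, else 0."""
    return 1 if probability >= threshold else 0


# Hypothetical model outputs for four messages.
spam_probabilities = [0.05, 0.45, 0.62, 0.97]

default_labels = [classify(p) for p in spam_probabilities]
strict_labels = [classify(p, threshold=0.9) for p in spam_probabilities]

print(default_labels)  # [0, 0, 1, 1]
print(strict_labels)   # [0, 0, 0, 1]
```

A stricter threshold flags fewer messages as spam, which may be preferable when wrongly discarding a legitimate message is more costly than letting spam through.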
Advantages of logistic regression
The following properties speak for an implementation of logistic regression for some use cases: 5
- logistic regression is easier to implement than some other methods.
- logistic regression works well when the data set is linearly separable.
- logistic regression provides information about the importance of a particular independent variable. The importance is reflected in the magnitude of the corresponding coefficient. In addition, the sign indicates whether the independent and dependent variables are positively or negatively correlated.
Linear separability here means that, when the data set is visualized, the classes can be separated by a straight line (or, with more than two variables, by a plane or hyperplane).
Note: Correlation does not always indicate a causal relationship, especially if the corresponding sample is very small!
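Linear separability, as described above, can be illustrated by checking whether a given line puts all points of one class on one side. The helper `separates`, the candidate lines and both point sets are hypothetical. The function tests only one specific line and does not search for a separating line.

```python
def separates(points, labels, w1, w2, b):
    """True if the line w1*x + w2*y + b = 0 has class 1 strictly on its
    positive side and class 0 on its non-positive side."""
    for (x, y), label in zip(points, labels):
        side = w1 * x + w2 * y + b
        if (side > 0) != (label == 1):
            return False
    return True


# Two made-up clusters that the line x + y - 6 = 0 separates.
points = [(1.0, 1.0), (2.0, 1.5), (4.0, 4.5), (5.0, 4.0)]
labels = [0, 0, 1, 1]
print(separates(points, labels, 1.0, 1.0, -6.0))  # True

# XOR-like pattern: this candidate line fails, and no straight line works here.
xor_points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
xor_labels = [0, 1, 1, 0]
print(separates(xor_points, xor_labels, 1.0, 1.0, -0.5))  # False
```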
Disadvantages of logistic regression
In addition to the advantages of logistic regression, there are also a number of disadvantages that come with its implementation: 5
- logistic regression cannot be used to predict a continuous variable, only a binary variable.
- logistic regression assumes a linear relationship between the independent variables and the log-odds of the dependent variable. This is a limitation because real-world relationships are often non-linear; a linear function often approximates them well, but it also introduces error and therefore cannot necessarily make perfect predictions.
- like many machine learning methods, it requires a sufficiently large set of data points. The necessary amount depends on various properties of the data set, such as the number of input variables.
Versions of logistic regression
Logistic regression comes in several variants, which differ in the type of dependent variable. They are characterized below. 2 3
Binary logistic regression
In binary logistic regression, the dependent variable can only take on the values $0$ and $1$.
Multinomial logistic regression
In multinomial logistic regression, the dependent variable can take on three or more categories that have no natural order.
Example: Preference for the food types vegan, vegetarian or non-vegetarian.
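One common way to obtain probabilities for more than two unordered classes is the softmax function, which generalizes the logistic function. The article does not specify a formulation, so the use of softmax here is an assumption, and the linear scores below are made up rather than fitted.

```python
import math


def softmax(scores):
    """Turn a list of real-valued scores into probabilities that sum to 1."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


classes = ["vegan", "vegetarian", "non-vegetarian"]
scores = [0.2, 1.5, 0.9]  # hypothetical linear scores, one per class

probabilities = softmax(scores)
predicted = classes[probabilities.index(max(probabilities))]

print(predicted)  # vegetarian
```

The predicted class is simply the one with the highest probability, which corresponds to the highest linear score.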
Ordinal logistic regression
In ordinal logistic regression, the dependent variable can take on three or more categories that have a natural order.
Example: Rating of a video from one to five stars.
Application examples of logistic regression
Logistic regression has many real world applications: 8 9
- medicine: understanding of causes when patients suffer from various diseases
- text editing: editing documents to transform them into a consistent format
- hotel bookings: predicting which hotels are likely to be booked by a particular user.