Graphic by Isaac Smith

Linear Regression: Identifying Simple Correlations in Data

In the Big Data business, correlations are often useful for making decisions. Here is one easy method to find such information: linear regression.

Henrik Bartsch


Classification of linear regression

Linear regression belongs to what is called supervised learning, a subtype of machine learning. In supervised learning, known input and output data are used to find correlations and to generate a model from them.

Applications of linear regression

Linear regression is a simple tool for establishing simple and understandable correlations in many data-driven areas and drawing insights from them. The quality of the results is not the main concern here; errors are acceptable up to a certain point. Despite this, linear regression is very good at revealing trends and other changes in data.

Classical subjects in which linear regression is used are psychology and economics, where studies need to be evaluated. This article also briefly explains how companies could use linear regression.

How does linear regression work?

Linear regression, in its basic form, tries to find a linear expression for the relationship between various metrics (the independent variables) and a target metric (the dependent variable), where data for all metrics must be present. Linear regression is limited to one dependent variable, while the number of independent variables, denoted here by n, is basically unlimited. There are also forms of linear regression that allow multiple dependent variables to be defined, but these will not be discussed here.

Example: We want to find out what the grade point averages of students depend on. For this purpose, we define grade point average as the dependent variable, while we define variables like school hours, homework effort (in hours), and parents' income (in euros) as independent variables.

By definition, we use a linear equation that has n + 1 parameters: n parameters for the correlations with the independent variables and one parameter for the "base value" of the dependent variable.

y: M_1 \times \cdots \times M_n \rightarrow M, \qquad y = a + b_1 x_1 + \cdots + b_n x_n.

In this equation, y denotes the dependent variable, x_i, i = 1, ..., n the independent variables, M_i the domains of the independent variables (all possible values), and M the domain of the dependent variable (classically the real numbers). By solving for the parameters a, b_i we finally obtain our model. 1 2
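As a minimal sketch of this model equation, the following evaluates y for an input vector, using made-up parameter values (the coefficients and the example input are illustrative assumptions, not fitted values):

```python
import numpy as np

# Hypothetical parameters for a model with n = 3 independent variables
a = 1.5                          # intercept: "base value" of the dependent variable
b = np.array([0.2, -0.4, 0.03])  # one coefficient b_i per independent variable x_i

def predict(x):
    """Evaluate y = a + b_1*x_1 + ... + b_n*x_n for an input vector x."""
    return a + b @ x

# e.g. school hours, homework hours, parents' income
x = np.array([6.0, 2.0, 3000.0])
y = predict(x)  # 1.5 + 0.2*6 - 0.4*2 + 0.03*3000 = 91.9
```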

Solving the parameters for linear regression

Now, to obtain the parameters for the linear regression, we use the minimum squared distance (also known as least-squares estimation). For this, we want to minimize the sum of the squared distances between our model prediction (here: y) and our N data points (y_i):

J := \sum_{i=1}^N |y - y_i|^2.

The function J here is called the cost function. It describes how well our model has fitted to our data: lower values represent a better solution, which is achieved only by an "ideal" choice of the parameters.
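The cost function can be sketched in a few lines; the toy prediction and data values below are made up purely for illustration:

```python
import numpy as np

def cost(y_pred, y_data):
    """Sum of squared distances between model predictions and data points."""
    return np.sum((y_pred - y_data) ** 2)

# Toy values: model predictions vs. observed data points
y_pred = np.array([1.0, 2.0, 3.0])
y_data = np.array([1.1, 1.9, 3.2])
J = cost(y_pred, y_data)  # 0.01 + 0.01 + 0.04 = 0.06; lower J means a better fit
```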

Simple independent variable

For a single independent variable, we have a model of the form

y = a + b_1 x_1.

Here, by least-squares estimation and for

\bar{y} := \frac{1}{N} \sum_{i=1}^N y_i, \qquad \bar{x} := \frac{1}{N} \sum_{i=1}^N x_i

we can compute the solution

a = \bar{y} - b_1 \bar{x}, \qquad b_1 = \frac{\sum_{i=1}^N (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^N (x_i - \bar{x})^2}.

This can be used directly to calculate correlations or trends.
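These two closed-form expressions translate directly into code. A minimal sketch, with test points generated on an exact line so the recovered parameters are known:

```python
import numpy as np

def simple_linear_regression(x, y):
    """Least-squares fit of y = a + b_1*x using the closed-form solution."""
    x_bar, y_bar = x.mean(), y.mean()
    b1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    a = y_bar - b1 * x_bar
    return a, b1

# Points lying exactly on y = 2 + 3x should be recovered exactly
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 + 3.0 * x
a, b1 = simple_linear_regression(x, y)  # a = 2.0, b1 = 3.0
```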

Multiple independent variables

When there are multiple independent variables, the solution is no longer practical to write down directly in this way. In this case, we use a different cost function:

J(\theta) := \frac{1}{2N} \sum_{i=1}^N (h_\theta(x_i) - y_i)^2

with the parameter vector \theta as the vector of all parameters (including a). The function h_\theta(x) then simply represents the linear combination of all parameters with the respective features of the input vector.

For the solution, the so-called gradient descent method is used. This is an iterative solver that subjects our parameters to continuous updates so that the cost function is gradually minimized. In general, the method converges after a while, but there is no guarantee. Changing the initial parameters is the classical approach taught for when it does not. 3
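A minimal sketch of gradient descent on the cost function J(θ) above; the learning rate, step count, and synthetic test data are illustrative assumptions:

```python
import numpy as np

def gradient_descent(X, y, lr=0.1, steps=1000):
    """Minimize J(theta) = 1/(2N) * sum (h_theta(x_i) - y_i)^2 iteratively.

    X is an (N, n) matrix of independent variables; a column of ones is
    prepended so that theta[0] plays the role of the intercept a.
    """
    N = len(y)
    Xb = np.hstack([np.ones((N, 1)), X])    # add intercept column
    theta = np.zeros(Xb.shape[1])           # initial parameters (all zero)
    for _ in range(steps):
        grad = Xb.T @ (Xb @ theta - y) / N  # gradient of J with respect to theta
        theta -= lr * grad                  # update step against the gradient
    return theta

# Recover y = 1 + 2*x1 - 3*x2 from noise-free synthetic data
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 1 + 2 * X[:, 0] - 3 * X[:, 1]
theta = gradient_descent(X, y)  # theta converges toward [1, 2, -3]
```

Because the data here are noise-free, the iterations converge to the exact parameters; with real data, J settles at a small but nonzero minimum.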

An application example

Linear regression can be applied to a wide variety of data.

The data from the above image was randomly generated for simplicity. Applying linear regression to the corresponding data gives the following picture:

Conclusions can be drawn from the calculated mathematical model and the image.

Referring to the example with school grades used earlier, we could read off relatively clearly from the slope of the fitted line that higher time expenditure leads to better grades. It should be noted that the data above are not related to this topic; this is only an illustration of the procedure.


An implementation of linear regression is basically not complex in most programming languages. Here is an example listing:
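One possible sketch in Python, using NumPy's built-in least-squares solver on randomly generated data (the noise level and true parameters below are arbitrary choices for the demo):

```python
import numpy as np

# Randomly generate noisy data around the line y = 2.0 + 0.5 * x
rng = np.random.default_rng(42)
x = np.linspace(0, 10, 50)
y = 2.0 + 0.5 * x + rng.normal(scale=0.1, size=x.size)

# Design matrix [1, x]: one column for the intercept a, one for b_1
A = np.column_stack([np.ones_like(x), x])
(a, b1), *_ = np.linalg.lstsq(A, y, rcond=None)
print(f"y = {a:.2f} + {b1:.2f} * x")
```

The same fit can of course be obtained with libraries such as scikit-learn; the NumPy version keeps the connection to the least-squares formulation visible.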