
In other blog posts from the series Machine Learning (ML) Basics, we have already seen the different types of Machine Learning and how optimization plays a key role in it. In this post, we will specifically focus on linear models, a fundamental model class that can be applied in supervised learning. Linear models assume a linear relationship between the observed data and a dependent target value. Linear regression is probably the most commonly used method to fit a linear model to data, and it works quickly and fairly reliably on both large and small datasets.
In supervised learning, we assume that we have a set of training data for which we know the value of a target function $F$. For example, our data points $x$ could be the settings on the temperature regulator of our refrigerator, and the target function values $F(x)$ could correspond to temperature measurements inside the fridge.

Our training data: Temperature measurements $F(x)$ in degrees Celsius for different controller settings $x$ of a fridge.
In our example, the training data consists of real numbers that we have plotted along the $x$-axis. The target function values that we know for the training data are plotted along the $y$-axis. Now, we want to “learn” $F$ so that we can predict the value of the target function for new data points that are not part of our training set. In our example, we might want to estimate how cold the refrigerator will get if we set the regulator to ten, for instance.
A Linear Relationship Between the Data and the Target Function Values
When we look at the data in the example above, we can observe that there is likely a linear trend: The higher we set the regulator on our refrigerator, the lower the temperature inside the fridge drops. Additionally, it seems that we can find a straight line that roughly describes this relationship—for example, the following:

An intuitively meaningful function $f(x)$ that describes the relationship between the controller settings $x$ and the measured temperature $F(x)$.
We now want to try to find such a straight line that best describes our training examples. A line like the one in our example can be described by a linear equation:
$$ f(x) = a \cdot x + b $$
for real numbers $a$ and $b$. Here, $a$ represents the slope of the line, and $b$ is the $y$-intercept. To predict the target function value for an unknown data point $x'$, we can simply substitute $x'$ into our linear equation and compute $f(x')$, assuming we know $a$ and $b$. This gives us a prediction $f(x')$ that hopefully approximates the unknown target function value $F(x')$ well.
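As a small illustration with made-up numbers (not the values we will later estimate from our data): if the slope were $a = -0.5$ and the intercept $b = 10$, the prediction for the regulator setting $x' = 4$ would be
$$ f(4) = -0.5 \cdot 4 + 10 = 8, $$
that is, roughly eight degrees Celsius inside the fridge.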
Linear Regression
We have now reduced our problem of predicting the target function value for an unknown data point $x'$ to determining the values of $a$ and $b$. Linear regression is a method that does exactly that, using the least squares method. The goal is to find, among the infinitely many possible lines that can be defined by the linear equation, the one that best fits our data. The least squares method (LSQ) minimizes an error function $E_{LSQ}$, which sums the squared differences between the line and the target function values over all our training points:
$$ E_{LSQ}(a, b) = \sum_{x \,\in\, \text{training set}} \left( a \cdot x + b - F(x) \right)^2 $$
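To make the error function concrete, the following minimal sketch (with made-up toy numbers, not our fridge data) evaluates $E_{LSQ}$ for two candidate lines in plain Python:

def e_lsq(a, b, points, targets):
    # Sum of squared differences between the line a*x + b and the target values
    return sum((a * x + b - fx) ** 2 for x, fx in zip(points, targets))

# Hypothetical toy data: regulator settings and measured temperatures
settings = [1.0, 3.0, 5.0, 8.0]
temps = [9.6, 8.9, 8.1, 6.8]

print(e_lsq(-0.4, 10.0, settings, temps))  # a plausible line -> small error
print(e_lsq(0.0, 8.0, settings, temps))    # a flat line -> much larger error

The line with the smallest possible error value is the one that linear regression will return.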
We have represented our line using its two parameters $a$ and $b$ and now seek the line (or, more precisely, the values $a$ and $b$) that minimizes $E_{LSQ}$. Next, we will see how we can accomplish this using the Python programming language and the NumPy package.
Practical Implementation of Linear Regression
We assume that some form of Python environment with the NumPy package installed is available, for example a locally running Jupyter Notebook or a notebook on Google Colab. We start by importing NumPy and loading our data, which is stored in a text file:
import numpy as np

# Load the two rows of data.csv: data points in the first row, target values in the second
data = np.loadtxt('data.csv', delimiter=',')
vecX = data[0]  # controller settings (training data)
vecY = data[1]  # measured temperatures (target function values)
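If you do not have a suitable data.csv at hand, a compatible file with made-up values (a hypothetical, noisy version of our fridge measurements, not the actual data) could be generated beforehand like this:

# Hypothetical stand-in data: controller settings 1 to 8 and noisy temperatures
settings = np.arange(1.0, 9.0)
temperatures = 10.0 - 0.4 * settings + 0.1 * np.random.randn(settings.size)
np.savetxt('data.csv', np.vstack((settings, temperatures)), delimiter=',')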
The code expects that the file data.csv contains the real-valued training data in the first row and the corresponding target function values in the second row. As a result, vecX now contains the training data, and vecY contains the target function values. Our error function $E_{LSQ}$ can now be expressed in matrix-vector notation as follows:
$$ E_{LSQ}(a, b) = \sum_{x \,\in\, \text{training set}} \left( \left( a, b \right) \cdot \begin{pmatrix} x \\ 1 \end{pmatrix} - F(x) \right)^2 $$
This allows us to define the following matrix. It consists of a row containing our data points, followed by a row containing only ones.
matF = np.vstack((vecX, np.ones_like(vecX)))  # one row of data points, one row of ones
The least squares method, which computes a vector vecW = (a, b) that minimizes our error function $E_{LSQ}$ on our training data, is already implemented in the NumPy subpackage linalg and can be called as follows.
vecW = np.linalg.lstsq(matF.T, vecY, rcond=None)[0]
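The routine expects the data points as rows, which is why we pass the transpose matF.T. To get a feel for how good the fit is, we can also evaluate the error function at the computed solution (an optional check, not part of the original example):

# Optional sanity check: value of E_LSQ at the fitted parameters
print(np.sum((matF.T @ vecW - vecY) ** 2))

For an over-determined, full-rank system like ours, np.linalg.lstsq also returns this sum of squared residuals as its second return value.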
We have now performed linear regression using the least squares method. How the function $f$ and the data can be plotted is shown in the accompanying code file, so here we only print the two computed line parameters:
print('a =', vecW[0])
print('b =', vecW[1])
This yields the following results:
$$ a = -0.399 \quad b = 10.099 $$
Thus, we have “learned” a linear model from our training data and can utilize this learned relationship in various ways. For example, we can display our discovered line or use the model to predict the temperature inside our refrigerator for a specific regulator setting. To make a prediction for any given data point, we can proceed with the coefficients as follows:
def predict(x, vecW):
    # Dot product of (x, 1) with (a, b) yields a*x + b
    return np.array([x, 1]) @ vecW

print(predict(10, vecW))
As a result, for our case, we get
$$ f(10) = 6.105 $$
This is consistent with computing $a \cdot 10 + b \approx -0.399 \cdot 10 + 10.099 \approx 6.1$ by hand; the small difference comes from rounding the printed coefficients. Here, @ is the operator that computes the dot product between two vectors, which in this case corresponds exactly to our definition of the linear equation. We can now freely use our prediction or simply display the data point along with its target function value (in yellow).

The temperature of the refrigerator predicted by $f$ for the controller setting $x = 10$.
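The exact plotting code can be found in the accompanying code file; one possible way to produce a similar figure with Matplotlib (a sketch, not the original code) is:

import matplotlib.pyplot as plt

plt.scatter(vecX, vecY, label='training data')                # measurements
xs = np.linspace(vecX.min(), 10, 100)
plt.plot(xs, vecW[0] * xs + vecW[1], label='learned line f')  # fitted line
plt.scatter([10], [predict(10, vecW)], color='yellow',
            label='prediction for x = 10')                    # predicted point
plt.xlabel('controller setting x')
plt.ylabel('temperature in degrees Celsius')
plt.legend()
plt.show()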
Outlook
We have learned about a simple Machine Learning method: linear regression using the least squares method. We have seen how to apply this method in Python to one-dimensional data to learn a line that accurately represents our data. We have also used this line to predict a target function value for an unknown data point.
For those interested in a more general formulation of this method, we recommend our ML2R Coding Nuggets series. In another blog post, we will explore how to implement a more robust training method that performs well even when our training data contains some outliers.