This post explains what lies behind the concept of classification, building on the previous contributions in our series on Machine Learning (ML) basics. Classification is a modeling approach, a thinking tool, for representing problems in a specific way and thus abstracting them. In the overview post on the various types of ML, we learned that any question with a fixed set of answers can be represented as a classification problem. The concept of classification thus contributes to problem solving in a wide variety of domains: in image processing for autonomous systems such as robots and self-driving cars, in medicine to assist with diagnoses, in physics to filter data, and in the analysis of formal and informal texts such as court judgments, job applications, product reviews, or social media posts, …
But how does this fundamental Machine Learning tool work? Learn more in the post below.
What is classification?
Classification is the second fundamental modeling approach in supervised learning, alongside regression. It can be described as the assignment or grouping of observations into predefined categories. These categories are referred to as classes, and the observations formally as data points. The program or function that assigns data points to classes is called a classifier. The label of a data point records the class to which the classifier has assigned that data point. While regression returns one or more real numbers as output for a data point, the response of a classifier is limited to a relatively small set of discrete values: the classes.
A classification problem is like a multiple-choice test for the computer: the answer options are always the same, and the questions differ only in the specific subject being considered. Object classifiers, for example, answer the question of which object, out of a set of possible objects (the answer options), can be seen in a particular given image (the specific subject of the question). As in multiple-choice tests, several types of questions are distinguished when modeling classification problems.
Binary classification problems: In this type of problem, there are only two possible answers. Often, the problem is formulated as a yes-no question by explicitly asking if one of the answers is true.
Example: What is the weather like? {Good, Bad} → Is the weather good? {Yes, No}
Multi-class classification problems: In this case, there are more than two answer options, but only one answer is requested. This is the most common case of classification problems in practice.
Example: What kind of object is in the picture? {animal, vehicle, food, piece of furniture, …}
Multi-label classification problems: In this scenario, there are also more than two answer options, but they are not mutually exclusive. The classifier can assign multiple labels to each data point.
Example: Which diagnoses apply to a patient? {flu, broken bone, inflammation, poisoning, …}
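The three question types above can be sketched as different label encodings in code. A minimal illustration (all example values are hypothetical, not taken from the post):

```python
# Binary: each data point gets one of exactly two possible labels.
weather_labels = ["good", "bad", "good"]

# Multi-class: each data point gets exactly one label out of several options.
object_labels = ["animal", "vehicle", "food"]

# Multi-label: each data point gets a (possibly empty) set of labels,
# because the answer options are not mutually exclusive.
diagnosis_labels = [{"flu", "inflammation"}, {"broken bone"}, set()]

# The answer options are fixed in advance in all three cases:
assert all(label in {"good", "bad"} for label in weather_labels)
assert all(d <= {"flu", "broken bone", "inflammation", "poisoning"}
           for d in diagnosis_labels)
```

Note how only the multi-label case allows a data point to carry more than one label at once.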
Classification means recognizing patterns in data
As can be seen from the classes given in the examples above, the concepts that a “class” is supposed to represent are usually clearly definable or intuitively understandable to a human. After all, for us humans, it is obvious what distinguishes a vehicle from an animal, what a broken bone looks like, or whether the weather is good or bad. But how do we get the classifier to recognize and distinguish data within these categories?
Classification is one of the approaches in supervised learning. Before a classifier is ready for use, it must first go through a training phase. During this phase, it processes training data in the form of labeled data points. Usually, these are data points that have been labeled manually by a human (for more information on training models, see the article “How do machines learn?”). The task of the developers is to ensure that the training data are “good” examples. To do this, the training data set must contain as many different examples as possible for each class.
During training, the classifier learns a kind of indirect definition for each class through the examples. It learns which characteristics speak for a certain class and which do not. This is also called pattern recognition: data points of the same class have more patterns in common than data points of different classes.
As a result, the classifier does not necessarily learn a “complete” definition of a class, but only as much as is necessary to distinguish the classes from each other.
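What such labeled training data can look like is sketched below for the binary weather question; the feature names and values are purely illustrative assumptions, not from the post:

```python
# Each training example pairs a data point with the class a human
# annotator assigned to it. Feature names/values are made up.
training_data = [
    ({"temperature_c": 24, "rain_mm": 0.0}, "good"),
    ({"temperature_c": 7,  "rain_mm": 12.5}, "bad"),
    ({"temperature_c": 19, "rain_mm": 0.2}, "good"),
]

# The set of classes the classifier must learn to distinguish:
classes = {label for _, label in training_data}
assert classes == {"good", "bad"}
```

From many such pairs per class, a classifier can pick up which feature values tend to co-occur with which label, without ever being given an explicit definition of "good weather".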
Feature engineering: defining the appearance of a data point
Now let’s take a closer look at what exactly a “data point” is. In feature engineering, developers define which information should be present in a data point and how the (raw) data should be processed before it is presented to the classifier. Developers must ensure that the data contain all the information needed to solve the task. Feature engineering is therefore important not only in the development of classification systems, but in all areas of Machine Learning.
To solve classification tasks, each data point must contain enough information to uniquely assign it to a particular class. However, it should also not contain too much information, as this can have unintended side effects and make it difficult for the classifier to identify the information relevant for classification.
Classify shapes
To further explain the basic concept of classification, we now set ourselves the following task – based on the post about the various types of Machine Learning:
“Categorize the figures in the image below according to their shape:”
First, we identify that there are three different shapes on the image. Let’s give the classes the names “circle”, “triangle” and “square”.
For the computer to understand the term “figure”, we need to describe the object it is supposed to categorize. In practice, there is usually already a data basis for this. Let’s say that in our case we are provided with the following features (properties) for each figure:
Figure = [X-coordinate, Y-coordinate, color, number of edges, number of corners].
The order in which the features are listed here is always the same for each figure.
Find representation
Would this list of features already be a good representation of a “shape” as a data point?
Let’s take a closer look: The features “X-coordinate” and “Y-coordinate” are not useful for the question, because there is no relation between position and shape of the figure. The feature “color” looks more promising: All triangles are green, all circles are orange, and all squares are turquoise. On the other hand, we can also imagine a green circle. We do not observe such a circle in the data, but our life experience tells us that shape and color are not necessarily related.
Remaining are the features “number of edges” and “number of corners”. With these features, we can uniquely assign all figures to their shape. For example, a square has both four edges and four corners. This avoids the problem we had with the “color” feature – no square will ever have more or fewer than four corners or edges. Thus, theoretically, we are done, but one thing still stands out: The number of edges and corners are equally suitable for classification, making the information redundant. In this case, it could happen that the classifier learns to recognize triangles based on the number of corners and squares based on the number of edges. This is not wrong and will always lead to the correct classification, but in practice one would choose to keep only one of several redundant features.
As a preprocessing step on the raw data, we remove four of the five given features. Our final representation of a shape as a data point is thus:
[number of edges]
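This preprocessing step can be sketched in code. The raw feature values below are made-up examples, following the feature order given earlier ([X-coordinate, Y-coordinate, color, number of edges, number of corners]):

```python
# Made-up raw figures in the feature order from the post:
# [x, y, color, number of edges, number of corners]
raw_figures = [
    [12, 40, "orange", 1, 0],     # a circle
    [55, 10, "green", 3, 3],      # a triangle
    [30, 75, "turquoise", 4, 4],  # a square
]

def preprocess(raw_figure):
    """Keep only the 'number of edges' feature (index 3)."""
    return [raw_figure[3]]

data_points = [preprocess(f) for f in raw_figures]
assert data_points == [[1], [3], [4]]
```

Dropping the redundant and irrelevant features leaves each data point with exactly the information needed to separate the three classes.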
Define the classification function
To complete the task, we now need to define the classification function that uses the representation we have developed. A possible function could be defined by the following three rules:
- assign the class “circle” to an object with one edge,
- assign the class “triangle” to an object with three edges, and
- assign the class “square” to an object that has neither one nor three edges.
The last rule reflects our assumption that we will not observe a figure with exactly two edges or with more than four edges.
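The three rules can be written out directly as a small rule-based classifier over our representation:

```python
def classify_shape(data_point):
    """Rule-based classifier over the representation [number of edges]."""
    edges = data_point[0]
    if edges == 1:
        return "circle"
    if edges == 3:
        return "triangle"
    return "square"  # neither one nor three edges

assert classify_shape([1]) == "circle"
assert classify_shape([3]) == "triangle"
assert classify_shape([4]) == "square"
```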
With this, we have now found a suitable representation for our data points “by hand” for our simple example and defined “by hand” a simple rule-based classifier.
Classification in practice
Of course, neither feature engineering nor the search for a classification function is done by hand in practice. Compared to our example, real-world data sets are much larger and the relationships between features much more complex. Developers therefore use statistical and visual analysis tools for feature engineering. After this analysis, they must decide which model class seems most appropriate for the task, i.e., which type of classification function should undergo the training phase mentioned above. Well-known examples of model classes are neural networks, support vector machines, Naive Bayes, decision trees, and k-nearest neighbors.
In practice, each model class has different advantages and disadvantages depending on the application. For example, while neural networks are very good at doing much of the feature engineering themselves, their training phase often takes longer, and comparatively large amounts of training data are required.
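As a sketch of how a library model class replaces hand-written rules, the following trains a decision tree on our shape representation. It assumes scikit-learn is installed; the tiny training set is illustrative only:

```python
# Sketch: a decision tree (one of the model classes named above) learns
# the classification function from labeled data instead of fixed rules.
# Assumes scikit-learn is installed; the data set is illustrative.
from sklearn.tree import DecisionTreeClassifier

X = [[1], [3], [4], [1], [3], [4]]        # data points: [number of edges]
y = ["circle", "triangle", "square"] * 2  # human-assigned labels

clf = DecisionTreeClassifier()
clf.fit(X, y)

assert clf.predict([[3]])[0] == "triangle"
assert clf.predict([[4]])[0] == "square"
```

On this trivially separable data the learned tree reproduces our hand-written rules; with larger, noisier data sets, letting the model find the decision boundaries is what makes the approach scale.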
Classification as a thinking tool in Machine Learning
In summary, we can conclude: classification frames a problem as the assignment of data to different classes. A distinction is made between binary, multi-class, and multi-label problems. Classifiers do not grasp the semantic concepts behind classes directly via a definition, but indirectly, by learning to recognize and distinguish patterns in the data. Feature engineering is about finding a representation of the data that lets the classifier find and learn characteristic patterns as easily as possible.
Classification is used in a variety of Machine Learning applications. These posts of our ML blog present exemplary algorithms that are suitable as classifiers or discuss real-world problems that are solved using classifiers:
- Convolutional Neural Networks in image classification (in German)
- Convolutional Neural Networks for face recognition (in German)
- Recurrent neural networks in the financial sector
- Neural networks in astronomy
- Neural networks for verdict text analysis
- Support Vector Machines to better assess disease progression
- Recurrent Neural Networks for contradiction detection in statements (in German)