Deep Learning-based Re-Identification in Logistics

The importance of Machine Learning (ML) technologies in industry has increased greatly as part of the digital transformation. Intelligent systems not only facilitate the work of employees but also optimize many processes and material flows. To this day, however, significant areas remain non-automated, which causes additional manual work and ties up time and capacity.

One of the many capabilities of Machine Learning is the classification of images and the localization of objects within them (Object Detection), both of which belong to the field of Computer Vision. A common use case building on these capabilities is the re-identification of specific items or objects. In this blog post, we explain how re-identification works.

Re-Identification

Re-Identification (Re-ID) is the process in which an individual, an object, or a pattern is identified across different images. Often, this process occurs across multiple non-overlapping cameras. A well-known example of re-identification is facial recognition. In this case, new images of faces (and extracted features) need to be compared to a database of stored faces (and features) to determine whether they can be matched to a previously known face or not.

Recognizing objects in images is very similar to the task of classification. Re-identification, however, has to distinguish not only between types of objects but also between individual objects of the same type. Additionally, the set of entities is usually open-ended, that is, not all entities are known in advance.

A similar use case is object tracking. It refers to the process of locating and following the movement of an object in a sequence of frames or images within a video. Re-identification can take place over arbitrary time periods and can support the tracking of objects.

Cars driving on a highway: an example of Object Tracking. In video analysis, object tracking not only determines the position and class of an object but also maintains a unique ID for each recognized object throughout the course of the video.
© Lance Chang/unsplash.com & Fraunhofer IML

Methods of Re-Identification

An established Deep Learning-based method for re-identification is the Siamese Network. Such a network incorporates two or more identical subnetworks, meaning they share the same layers, parameters and weights. Siamese Networks are used to measure the similarity of different inputs based on the computed feature vectors.

The Siamese Network architecture takes two input images, employs identical subnetworks for each input and determines the similarity of the input images based on the calculated distance between the outputs.
© https://arxiv.org/abs/1707.02131

The result of the computation in a Siamese Network is a similarity score that indicates how likely it is that two inputs belong to the same entity. During training, the network is presented with pairs of inputs: positive pairs (for example, two different photos of the same person) and negative pairs (for example, two photos of different persons). The training objective is to minimize the distance between the outputs for positive pairs and to maximize it for negative pairs.
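The pair-based objective can be sketched with a contrastive loss. This is one common formulation and an assumption on our part, since the post does not name a specific loss; the function names, the toy two-dimensional vectors and the margin value are purely illustrative.

```python
import math


def euclidean_distance(u, v):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contrastive_loss(u, v, same_entity, margin=1.0):
    """Contrastive loss for one pair of subnetwork outputs (illustrative).

    Positive pairs are penalised by their squared distance, so training
    pulls them together. Negative pairs are penalised only while they are
    closer than `margin`, so training pushes them apart up to that margin.
    """
    d = euclidean_distance(u, v)
    if same_entity:
        return d ** 2
    return max(margin - d, 0.0) ** 2


# Toy outputs of the two identical subnetworks (hypothetical values).
print(contrastive_loss([0.0, 0.0], [3.0, 4.0], same_entity=True))   # distance 5, large loss
print(contrastive_loss([0.0, 0.0], [3.0, 4.0], same_entity=False))  # already far apart, no loss
```

In a real training setup the same idea would be expressed in a Deep Learning framework so that the loss can be backpropagated through the shared weights.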

Another method to train Siamese Networks is the so-called “Triplet Loss.” In this approach, the network is presented with three inputs instead of two: a so-called anchor as a reference image, a positive example corresponding to the same entity and a negative example representing a different entity. In terms of the similarity score, the goal during training is to ensure that the positive example is closer to the anchor input than the negative example by a predetermined margin, defined as a hyperparameter. This helps the network learn more robust features for distinguishing between different entities with a clear delineation between them.

Triplet Loss formula: $L = \max\{d(a, p) - d(a, n) + \text{margin},\ 0\}$

Visualization of Triplet Loss using a positive pair (upper and middle face) and a negative pair (upper and lower face).
© Christian Buehner/unsplash.com & Fraunhofer IML
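The Triplet Loss formula above can be written out in a few lines. The following is a minimal sketch that assumes the Euclidean distance as d and uses made-up two-dimensional feature vectors; the margin value is an arbitrary example of the hyperparameter.

```python
import math


def euclidean_distance(u, v):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=0.5):
    """L = max(d(a, p) - d(a, n) + margin, 0) for a single triplet."""
    d_ap = euclidean_distance(anchor, positive)
    d_an = euclidean_distance(anchor, negative)
    return max(d_ap - d_an + margin, 0.0)


anchor = [0.0, 0.0]
positive = [0.0, 1.0]       # same entity as the anchor
far_negative = [0.0, 3.0]   # different entity, already well separated
near_negative = [0.0, 1.2]  # different entity, still too close

print(triplet_loss(anchor, positive, far_negative))   # margin satisfied, loss is 0
print(triplet_loss(anchor, positive, near_negative))  # margin violated, loss is positive
```

The loss is zero as soon as the negative example is farther from the anchor than the positive example by at least the margin, so only triplets that violate the margin contribute to training.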

Classifiers as the Basis for Re-Identification

An alternative to directly learning differences is to initially train a classifier. In this approach, the IDs of the entities are used as class labels, allowing the model to implicitly learn the differences between the entities.

In the case of re-identification, however, the total number of IDs is often not known, even at the time of training. Therefore, the classification layer is discarded after training, and only the feature vector that would have been fed into it is extracted. Using this feature vector, inputs can be compared to each other with similarity or distance measures (such as cosine similarity or Euclidean distance).
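The two measures mentioned here can be sketched as follows. The feature vectors are tiny made-up examples; real networks typically produce vectors with hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between u and v: 1 means same direction, 0 means orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    """Straight-line distance between u and v: 0 means identical vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


features_a = [1.0, 2.0]
features_b = [2.0, 4.0]  # same direction as features_a, twice the length

print(cosine_similarity(features_a, features_b))   # close to 1: direction matches
print(euclidean_distance(features_a, features_b))  # greater than 0: length differs
```

The example shows why the choice of measure matters: cosine similarity ignores the length of the vectors, while Euclidean distance does not.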

For this purpose, a database of feature vectors of already known entities is maintained and compared with the vectors of new inputs. The new input is assigned to the entity with the highest similarity, provided that the similarity reaches a minimum threshold. If the threshold is not met for any known entity, the input is treated as a previously unknown entity and added to the database.
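The matching procedure described above might look roughly like this. It is a sketch under assumptions: the database is a plain dictionary, cosine similarity is used as the measure, and the function name `identify`, the ID scheme and the threshold of 0.8 are hypothetical choices for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two feature vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def identify(gallery, query, threshold=0.8):
    """Match `query` against `gallery` (dict: entity ID -> feature vector).

    Returns (entity_id, is_new). If no known entity reaches `threshold`,
    the query is stored under a fresh ID and reported as new.
    """
    best_id, best_similarity = None, -math.inf
    for entity_id, known_vector in gallery.items():
        similarity = cosine_similarity(known_vector, query)
        if similarity > best_similarity:
            best_id, best_similarity = entity_id, similarity

    if best_id is not None and best_similarity >= threshold:
        return best_id, False

    new_id = f"entity_{len(gallery)}"  # hypothetical ID scheme
    gallery[new_id] = list(query)
    return new_id, True


gallery = {}
print(identify(gallery, [1.0, 0.0]))  # empty database: stored as a new entity
print(identify(gallery, [0.9, 0.1]))  # very similar: matched to the first entity
print(identify(gallery, [0.0, 1.0]))  # dissimilar: stored as another new entity
```

A linear scan like this is fine for small databases; larger ones would typically use an approximate nearest-neighbour index instead.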

A network architecture specifically developed for this case is the Part-based Convolutional Baseline (PCB). It can use any Convolutional Network as its backbone and adds a special pooling layer that divides the features into a predefined number of parts. The feature vector of each part is then fed into its own fully connected layer for ID classification. At inference time, the feature vectors of the individual parts can be reassembled and compared as a whole.

How a PCB works: The input image passes through the Convolutional Network and the special pooling layer. Each resulting column vector is fed into its own classifier, which predicts the identity of the input image during training.
© https://arxiv.org/abs/1711.09349
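To make the part pooling more concrete, the sketch below splits a feature map into uniform horizontal stripes and average-pools each stripe into one vector. This is a simplification and an assumption about the pooling scheme: the backbone, any dimension reduction and the per-part classifiers are omitted, and the function names and the tiny feature map are made up for illustration.

```python
def part_pool(feature_map, num_parts):
    """Average-pool a feature map into `num_parts` horizontal stripes.

    feature_map: list of C channels, each a list of H rows with W values.
    Returns `num_parts` vectors of length C, one per stripe.
    """
    height = len(feature_map[0])
    if height % num_parts != 0:
        raise ValueError("height must be divisible by num_parts")
    stripe_height = height // num_parts

    parts = []
    for p in range(num_parts):
        rows = range(p * stripe_height, (p + 1) * stripe_height)
        part_vector = []
        for channel in feature_map:
            values = [x for r in rows for x in channel[r]]
            part_vector.append(sum(values) / len(values))
        parts.append(part_vector)
    return parts


def concatenate(parts):
    """Reassemble the part vectors into one descriptor for comparison."""
    return [x for part_vector in parts for x in part_vector]


# Toy feature map with 2 channels, 4 rows and 2 columns.
feature_map = [
    [[1, 1], [3, 3], [5, 5], [7, 7]],
    [[0, 2], [0, 2], [4, 4], [8, 8]],
]
parts = part_pool(feature_map, num_parts=2)
print(parts)               # one vector per stripe, used for per-part classification
print(concatenate(parts))  # joined descriptor, used when comparing inputs
```

During training each part vector would get its own classifier; when the network is used for re-identification, only the concatenated descriptor is compared.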

Automation in Intralogistics through Re-Identification

In logistics, it is often necessary to recognize load carriers, such as Euro pallets, in various process steps (e.g. receipt and dispatch of goods). Typically, label-based methods, such as barcodes, RFID tags, and similar technologies, are used for this purpose. These labels often need to be manually attached to the load carrier or the goods and packages on it, requiring additional processes. To further increase the level of automation in intralogistics, researchers have investigated methods for recognizing load carriers entirely without manual labeling.

To achieve this, the explained re-identification method, Part-based Convolutional Baseline, was tested using Euro pallets as an example. The procedure involves capturing camera images of pallets moving on a conveyor belt in transshipment warehouses. From these images, the connecting blocks of the pallets are extracted using object recognition, and the images of these blocks are then individually transformed into feature vectors, known as “fingerprints,” using a PCB network.

Example recordings of pallet blocks: The relevant visual features are captured from the recorded images of the pallet blocks. These features serve as the basis for calculating the fingerprints.
© https://ieeexplore.ieee.org/abstract/document/10068869

Based on the fingerprints of the individual blocks, entire pallets can now be recognized. The division of the pallet into individual blocks (six per pallet) ensures sufficient redundancy in the recognition process.
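One way to exploit this redundancy is a simple majority vote over the matches of the individual blocks. The post does not describe how the block results are combined, so the voting rule, the function name `identify_pallet` and the minimum of four agreeing blocks are assumptions made for illustration only.

```python
from collections import Counter


def identify_pallet(block_matches, min_votes=4):
    """Combine per-block matches into one pallet decision (illustrative).

    block_matches: one entry per detected block, either the pallet ID that
    the block's fingerprint matched or None if no match was found.
    Returns the winning pallet ID, or None if fewer than `min_votes`
    blocks agree on it.
    """
    votes = Counter(match for match in block_matches if match is not None)
    if not votes:
        return None
    pallet_id, count = votes.most_common(1)[0]
    return pallet_id if count >= min_votes else None


# Four of six blocks agree; one is unmatched and one is mismatched.
print(identify_pallet(["P1", "P1", "P1", "P1", None, "P7"]))
# No agreement between the blocks: the pallet is not recognised.
print(identify_pallet(["P1", "P2", "P3", None, None, None]))
```

With a rule like this, a single damaged, dirty or occluded block does not prevent the pallet as a whole from being recognized.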

At goods receipt, the fingerprints of the pallets are stored in a database; at goods dispatch, newly computed fingerprints are compared against them. This automates the tracking of incoming and outgoing pallets in a transshipment warehouse.

Still, a current challenge in re-identification lies in tracking over longer periods. Euro pallets, especially, can change their appearance over time, for example, due to aging or damage.

Potentials of Re-Identification

In contrast to classification, re-identification allows the tracking of actual entities, not just types of products. This means that not only quantities and consumption of various production materials can be tracked but also actual flows of goods. Processes that require clear recognition of entities for traceability or security reasons can be carried out more efficiently and without manual effort. This type of tracking would not be possible through classification and object recognition alone, since all possible entities would need to be known in advance.

Additional use cases that could be addressed by this Machine Learning approach include improving shipment tracking by eliminating physical IDs, optimizing material flows, and implementing various legal requirements for the traceability of products or production materials. The traceability of product combinations instead of individual entities is also a possible application.

Curious? Then find out more in these papers:

PCB ref: Beyond Part Models: Person Retrieval with Refined Part Pooling: https://arxiv.org/abs/1711.09349

Deep Learning Based Re-Identification of Wooden Euro-pallets: https://ieeexplore.ieee.org/abstract/document/10068869

Towards Re-Identification for Warehousing Entities – A Work-in-Progress Study: https://ieeexplore.ieee.org/abstract/document/9613250

https://www.silicon-economy.com/one-in-500-million-algorithm-identifies-pallets-by-its-grain/

Datasets:

https://zenodo.org/records/8125376

https://zenodo.org/records/6358607

https://zenodo.org/records/6353714

Christian Pionzewski, Antonia Ponikarov

17 January 2024