Machine Learning (ML) technologies already support employees in many industrial applications by providing intelligent assistance for a variety of tasks. In logistics in particular, however, many processes still lack automation or software-based support. Employees often rely on traditional paper documents or handheld devices, which restrict their mobility, are inconvenient, and cost time and resources.
In this context, the Intelligent Shelf project was initiated as part of the Innovation Lab – Hybrid Services in Logistics at Fraunhofer IML, funded by the Federal Ministry of Education and Research. The Innovation Lab addresses the design of efficient human-machine interaction in a connected economy, also known as the Hybrid Economy, under the themes of Industry 4.0 and the Internet of Things. Accordingly, the Intelligent Shelf focuses on putting concepts for human-machine communication developed in research into practice. Its distinguishing feature is the design of flexible software components that run on small “smart devices,” which can be deployed directly on-site where they are needed. One potential application is as a picking assistant in inventory management. Employees then no longer need to carry devices or documents with them. Instead, they can access their orders directly in the warehouse and record missing or defective products in the system with little effort.
Design of intelligent interfaces
The Intelligent Shelf was designed around two key requirements: user-friendliness and energy efficiency. A flexible design was adopted, with individual interaction components that can work together or be used on their own. This applies both to the physical interaction modules, which can be placed freely on shelves in the warehouse, and to the software. The mounted modules operate independently of each other and can therefore be used simultaneously by different employees. The information flow is coordinated by a control center.
The intelligent software components sit at the interface between the employee and the interaction module. Employees can interact with the Warehouse Management System (WMS) through an ML-based voice assistant and use it to access their current picking orders. If problems arise or an order is completed, they can inform the WMS by voice command. Another essential component of the Intelligent Shelf is person identification, which serves both as a security function and as a basis for personalization. Instead of a password, login to the Intelligent Shelf is handled through visual identification in the form of facial recognition. This not only grants access to the system but also retrieves the orders stored for that user and adapts the voice assistant to their preferences.
Face detection with pre-trained networks
Initially, the module is in a kind of sleep mode so that it consumes as little energy as possible while inactive. In this state, essentially only a presence detection system is powered. When it detects someone approaching the module, the rest of the module is woken up and captures an image of the area directly in front of it. The image is then forwarded to person identification. A predefined “dwell time” and a predetermined radius prevent images from being taken of people who merely happen to walk past.
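The wake-up rule described above can be sketched as follows. This is a minimal illustration, not the project's implementation: the class name `WakeGate`, the sensor reading format, and the default values for radius and dwell time are assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class WakeGate:
    """Decides when to wake the module from hypothetical presence readings."""

    max_distance_m: float = 1.0  # assumed detection radius
    dwell_time_s: float = 1.5    # assumed minimum stay inside the radius

    def first_wake_time(
        self, readings: Iterable[Tuple[float, float]]
    ) -> Optional[float]:
        """Return the timestamp at which the module should wake, or None.

        `readings` is a time-ordered sequence of (timestamp_s, distance_m).
        """
        entered_at = None
        for timestamp, distance in readings:
            if distance <= self.max_distance_m:
                if entered_at is None:
                    entered_at = timestamp  # person just entered the radius
                if timestamp - entered_at >= self.dwell_time_s:
                    return timestamp  # stayed long enough: wake the module
            else:
                entered_at = None  # person left the radius: reset the timer
        return None


gate = WakeGate()
passing_by = [(0.0, 2.0), (0.5, 0.8), (1.0, 0.9), (1.5, 2.5)]
stopping = [(0.0, 0.9), (0.5, 0.8), (1.0, 0.7), (1.5, 0.6), (2.0, 0.6)]
print(gate.first_wake_time(passing_by))  # no wake-up for a passer-by
print(gate.first_wake_time(stopping))    # wake-up once the dwell time is reached
```

Resetting the timer whenever the person leaves the radius is what keeps brief crossings from triggering the camera.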
Persons are identified based on their individual facial features (e.g., general face shape, eye shape, glasses). This happens in a nested, multi-step process: first, the snapshot of the employee is passed through a face detection model, then numerical feature vectors (embeddings) are generated and forwarded to the actual face identification. Identification is treated as a classification problem, in which a decision is made about the identity of the employee. Face detection is crucial for framing the face to be identified accurately. If another person is visible in the background, they are filtered out at this stage.
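The text does not state how background persons are filtered out. One plausible approach, sketched here purely as an assumption, is to keep only the largest detected bounding box, since the person standing directly in front of the module usually appears largest in the image. The function names and the `(x, y, width, height)` box format are hypothetical.

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # assumed format: (x, y, width, height) in pixels


def select_primary_face(boxes: List[Box], min_area: int = 0) -> Optional[Box]:
    """Return the largest box, or None if no box reaches `min_area`."""
    candidates = [b for b in boxes if b[2] * b[3] >= min_area]
    if not candidates:
        return None
    return max(candidates, key=lambda b: b[2] * b[3])


def crop(image: List[List[int]], box: Box) -> List[List[int]]:
    """Cut the box out of an image given as a list of pixel rows."""
    x, y, w, h = box
    return [row[x:x + w] for row in image[y:y + h]]


# Two detections: a small face in the background and a large one up front.
detections = [(10, 10, 40, 50), (200, 30, 120, 150)]
print(select_primary_face(detections))  # the larger, foreground box
```

Only the cropped region of the selected box would then be passed on to the embedding step.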
In the implementation, several pre-trained neural networks were used to cover a wide diversity of faces and improve the system’s reliability. These include Multi-task Cascaded Convolutional Networks (MTCNN) for face detection and FaceNet for generating facial embeddings.
MTCNN consists of a cascade of Convolutional Neural Networks (CNNs). It detects potential faces with high accuracy and, for each face, outputs bounding box coordinates along with five key points (the two eyes, the nose, and the two mouth corners). FaceNet is then used to generate the feature vectors. It is specialized in encoding facial features and produces vectors that capture the individual facial characteristics of a person in detail. Trained on a dataset with several thousand people, it can encode key features and produce representative embeddings for people of any appearance.
Challenges of multi-class classification
One use case that strongly influenced the design of person identification is the login attempt by an unregistered user, i.e., a person “unknown” to the system. An unknown person must not be mistakenly assigned to one of the known persons and thereby gain access. To guard against this case, an ensemble (a combination of several classifiers) consisting of Support Vector Machines (SVMs) and a cosine distance algorithm was implemented for identification. Both models classify independently, and their results are then evaluated jointly. For registered users, both should reach the same decision; for unregistered users, the two should conflict.
SVMs are good at recognizing a known user but cannot reject unknown persons on their own, because they always assign an input to one of the classes they were trained on. This is where the cosine model comes in: it calculates a similarity score for each registered user. On this basis, a well-founded decision can be made as to whether a person is known to the system.
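The joint decision can be sketched roughly as follows. This is an illustrative sketch under assumptions, not the project's code: the SVM prediction is passed in as a plain label rather than computed, each user is represented by a single reference embedding, and the threshold of 0.8 as well as the names `identify` and `references` are made up for the example.

```python
import math
from typing import Dict, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify(
    embedding: Sequence[float],
    references: Dict[str, Sequence[float]],
    svm_label: str,
    threshold: float = 0.8,  # assumed value; would need tuning on real data
) -> Optional[str]:
    """Return a user name only if both models agree, otherwise None."""
    scores = {user: cosine_similarity(embedding, ref)
              for user, ref in references.items()}
    if not scores:
        return None
    best_user = max(scores, key=lambda user: scores[user])
    if scores[best_user] < threshold:
        return None  # not similar enough to anyone: treat as unknown
    if best_user != svm_label:
        return None  # the two models conflict: treat as unknown
    return best_user


# Toy 3-dimensional embeddings; real FaceNet embeddings are much longer.
references = {"alice": [1.0, 0.0, 0.0], "bob": [0.0, 1.0, 0.0]}
print(identify([0.9, 0.1, 0.0], references, svm_label="alice"))  # alice
print(identify([0.5, 0.5, 0.7], references, svm_label="alice"))  # None
```

Requiring agreement means a false accept needs both models to make the same mistake, at the cost of occasionally rejecting a registered user who then has to retry.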
Once a registered user has been successfully identified, the details of their current picking order are displayed on the smart device. The user can then process the order and mark it as completed or report problems, either through the voice assistant or manually using buttons.
Prototyping vs. industrial application
The aim of this work is to explore what is technically possible and to develop a solution to a current problem using an exemplary use case. The result is not yet a finished, industrial-grade product but a so-called “proof-of-concept.” At the beginning of the project, a set of processes, called user stories, was defined that the prototype had to implement.
Since the focus of such work is on technical feasibility, certain aspects are explicitly not considered. In this case, concepts for data protection and IT security were deliberately excluded.
For industrial use, these aspects would also need to be addressed, and the prototype would have to be extended to cover customer requirements and quality assurance. Nevertheless, the Intelligent Shelf already offers promising approaches that are to be developed further in subsequent projects.
More information about interacting with the Intelligent Shelf can be found in the following video: