ML in logistics: How can CNNs make automation in bin picking more efficient?


This article focuses on how Convolutional Neural Networks (CNNs) can be implemented more rapidly in industrial automation by eliminating individual customization processes, allowing near-instant use. We demonstrate this through our main application, Bin Picking, showing how CNNs can replace traditional algorithms like clustering and 3D matching. Our core concept of class-agnostic segmentation can be applied not only to bin picking but also to robotic manipulations in industrial applications. The improvements in image processing systems for bin picking can lead not only to automating the task itself but also to improving material flow efficiency between and within warehouses and factories.

The introduction of new technology in industrial automation mainly depends on two factors: its integration capability and its reliability. CNN-based object segmentation is a relatively new technology in industrial automation, and integrating it would allow many tasks that are currently done manually to be automated. However, CNNs usually require a long customization process before they can be used in a given application, which leads to poor initial integrability and high effort. In addition, new methods always face skepticism in industry about how reliable and reproducible the final implementation will be. As a result, CNN-based object segmentation is well developed in research but has seen limited use in automation.

This gap between CNN capabilities in research and actual use in automation applications arises because researchers overlook a key piece of information that can be used in real-world applications: prior knowledge of the objects or object types being handled. This prior knowledge is simply the database of the factory or warehouse, which contains lists of all stored objects and their locations.

In research, object segmentation typically deals with a limited number of classes. In logistics and industry, however, hundreds or even thousands of object types often need to be segmented, sometimes within the same facility. Moreover, the objects change frequently, as new ones are added and others are removed. A CNN trained on a specific, limited set of objects would be of little use in industry, because it would have to be retrained whenever the target objects change, which requires significant effort. Many facilities would therefore only be interested in a solution that applies universally and covers a wide range of objects, rather than a model limited to a fixed set of classes.

Class-agnostic segmentation pipeline

The main idea behind our proposed pipeline is based on class-agnostic segmentation, addressing the challenges of CNN use in automation mentioned earlier. This approach breaks down the segmentation process into the following three steps:

Class-independent segmentation: A CNN segments all objects in the image into a single class, regardless of object type. This allows the same CNN to be used for a broader range of objects, simplifying its integration into industry.
Retrieval of candidate objects and images: A list of objects stored in the bin is retrieved from the warehouse database. For each object, six pre-stored images covering all sides are retrieved.
Matching step: The object masks are matched with the list of candidate objects. After this step, we can classify the segmentation masks of all objects.
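The three steps above can be sketched as a small orchestration function. This is only an illustrative outline, not our actual implementation: the names `Candidate`, `classify_bin`, `segment`, `lookup_candidates`, and `match_score` are hypothetical, and the segmentation network, the database access, and the matching function are passed in as plain callables.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Sequence, Tuple


@dataclass
class Candidate:
    """One object type listed for the bin in the warehouse database (hypothetical type)."""
    name: str
    views: Sequence[Any]  # pre-stored reference images, one per side


def classify_bin(
    image: Any,
    bin_id: str,
    segment: Callable[[Any], List[Any]],
    lookup_candidates: Callable[[str], List[Candidate]],
    match_score: Callable[[Any, Any], float],
) -> List[Tuple[Any, Optional[str]]]:
    """Run the three pipeline steps and return (mask, object name) pairs."""
    # Step 1: class-independent segmentation, one mask per object instance.
    masks = segment(image)
    # Step 2: candidate objects and their stored views for this bin.
    candidates = lookup_candidates(bin_id)

    results = []
    for mask in masks:
        # Step 3: score the mask against every view of every candidate
        # and keep the candidate whose best view scores highest.
        best_name, best_score = None, float("-inf")
        for cand in candidates:
            score = max(
                (match_score(mask, view) for view in cand.views),
                default=float("-inf"),
            )
            if score > best_score:
                best_name, best_score = cand.name, score
        results.append((mask, best_name))
    return results
```

Keeping the three stages behind separate callables mirrors the point of the pipeline: the segmentation network stays the same when the stored objects change, and only the candidate list returned by the database lookup differs from bin to bin.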

[Figure: Architecture of our proposed pipeline. © ML2R]

Training our CNN with synthetic data

Our class-agnostic CNN is based on Mask R-CNN. For training, we used the synthetically generated NVIDIA Falling Things (FAT) dataset, which contains a subset of 21 objects from the YCB Object Dataset. The training dataset includes 80,000 images in which no other objects are present in the background. This absence of background objects is critical during training: because such objects are not included in the annotations, a correct detection of one of them would be counted as a false positive, which would confuse the class-agnostic CNN.
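Preparing class-agnostic training data essentially means discarding the class labels while keeping the instance masks. The following is a minimal sketch of that idea, assuming the annotations are available as a COCO-style dictionary. That format is an assumption made for illustration, and the function name `to_class_agnostic` and the category name `"object"` are our own choices here.

```python
import copy
from typing import Any, Dict

# Single foreground class; the id and name are illustrative choices.
OBJECT_CATEGORY = {"id": 1, "name": "object"}


def to_class_agnostic(coco: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a COCO-style annotation dict with all classes merged into one."""
    merged = copy.deepcopy(coco)  # leave the caller's data untouched
    merged["categories"] = [dict(OBJECT_CATEGORY)]
    for ann in merged.get("annotations", []):
        # Instance masks and boxes stay as they are; only the label changes.
        ann["category_id"] = OBJECT_CATEGORY["id"]
    return merged
```

With labels merged this way, an instance segmentation network is trained to separate individual objects from the background and from each other, without learning which of the 21 object types each instance belongs to.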

The following image shows a segmentation result on the validation set. The network was able to segment objects it had never seen and that are not included in the dataset, such as a frying pan, a milk carton, bread, plates, and apples.

[Figure: Segmentation result on the validation set. © ML2R]

Inference on real data (first step of our pipeline)

But can training with synthetic data be transferred to real data? The next two images show an example from our DoPose dataset. It can be seen that, after quick fine-tuning, the trained network transfers well to real images. Notably, none of the detected objects were part of the training dataset. This shows how well the network was able to generalize.

[Figures: Example from our DoPose dataset (two images). © ML2R]

Feature matching (third step of our pipeline)

The final part of our pipeline is feature matching. The image below shows classic ORB feature correspondences between an image of the object stored in the database (left) and a scene image (right). The top 30 matches between the two images all correspond to the same object. This means that classic feature matching can readily be used to match segmented masks with candidate objects by using the pre-stored images of the objects. For each object, six pre-stored images should be used, covering all six sides; the following image shows only one to simplify the visualization. This process is repeated for each segmented mask, which is then assigned the candidate object with the highest match.

[Figure: ORB feature correspondences between a stored object image (left) and a scene image (right). © ML2R]
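To make the matching step more concrete, here is a minimal sketch of how a segmented mask could be assigned to a candidate object. ORB descriptors are binary strings that are compared with the Hamming distance, so the sketch represents each descriptor as a Python integer and leaves out the feature extraction itself. In practice the descriptors would come from an ORB implementation such as OpenCV's `cv2.ORB_create()`, with matching done by `cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)`. The function names, the scoring rule (counting good matches), and the `max_distance` threshold are assumptions made for illustration; only the top-30 cut-off is taken from the example above.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary descriptors."""
    return bin(a ^ b).count("1")


def cross_checked_matches(
    query: Sequence[int], train: Sequence[int]
) -> List[Tuple[int, int, int]]:
    """Return (query_idx, train_idx, distance) for mutual nearest neighbours."""
    if not query or not train:
        return []
    # Nearest stored descriptor for each mask descriptor, and vice versa.
    nearest_train = [
        min(range(len(train)), key=lambda j: hamming(q, train[j])) for q in query
    ]
    nearest_query = [
        min(range(len(query)), key=lambda i: hamming(query[i], t)) for t in train
    ]
    # Cross-check: keep a pair only if each side picks the other.
    matches = [
        (i, j, hamming(query[i], train[j]))
        for i, j in enumerate(nearest_train)
        if nearest_query[j] == i
    ]
    return sorted(matches, key=lambda m: m[2])  # best matches first


def classify_mask(
    mask_desc: Sequence[int],
    candidates: Dict[str, List[Sequence[int]]],
    max_distance: int = 64,  # assumed threshold for a "good" match
    top_k: int = 30,         # number of best matches considered per view
) -> Optional[str]:
    """Pick the candidate whose best view has the most good matches."""
    best_name, best_count = None, 0
    for name, views in candidates.items():
        for view_desc in views:  # one descriptor set per stored side
            matches = cross_checked_matches(mask_desc, view_desc)[:top_k]
            good = sum(1 for _, _, dist in matches if dist <= max_distance)
            if good > best_count:
                best_name, best_count = name, good
    return best_name
```

Here `candidates` maps each object name retrieved from the database to the descriptor sets of its stored views, and `classify_mask` returns `None` when no view produces a good match. Other scoring rules, such as the mean distance of the top matches, would fit the same structure.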

Conclusions

The results presented above indicate that our proposed pipeline can replace the classic object recognition algorithms (e.g., ICP, 3D matching) still used for object recognition in bin picking. We expect it to outperform these algorithms because CNNs generally achieve better segmentation results. Furthermore, the pipeline has shown that, despite its learning-based approach, it is general enough to replace these algorithms. This could allow a wider range of products to be processed with the same robotic cell in the future. It would also enable the automation of further tasks, such as picking from bins containing mixed objects, the handling of returns, and much more.

For more details on this topic, see our article “Object class-agnostic segmentation for practical CNN utilization in industry,” published in the IEEE ICMERR 2021 Conference Proceedings.

Anas Gouda

April 13, 2022
