They are now everywhere: the devices of the Internet of Things (IoT) – cameras, drones, industrial plants, or the first precursors of autonomous vehicles. What all these devices have in common is that they generate immense amounts of data, which is typically processed automatically and, increasingly, with Machine Learning methods. In classical AI infrastructures, the data is collected centrally, for example in a data center, and made available to data-driven algorithms. The results of these algorithms are then transferred to their respective destinations. In industrial systems, for example, sensor data can be used to detect anomalies in data patterns and to notify the maintenance service of detected or impending disruptions.
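To make the kind of algorithm running centrally on such sensor data concrete, here is a minimal rolling z-score anomaly detector; the window size, threshold, and synthetic temperature signal are illustrative choices, not part of any particular setup:

```python
import numpy as np

def detect_anomalies(values, window=50, threshold=3.0):
    """Flag points deviating more than `threshold` standard deviations
    from the rolling mean of the previous `window` samples."""
    values = np.asarray(values, dtype=float)
    flags = np.zeros(len(values), dtype=bool)
    for i in range(window, len(values)):
        ref = values[i - window:i]
        std = ref.std()
        if std > 0 and abs(values[i] - ref.mean()) > threshold * std:
            flags[i] = True
    return flags

# Synthetic sensor signal (e.g. a temperature in deg C) with one injected fault
rng = np.random.default_rng(1)
signal = rng.normal(loc=20.0, scale=0.5, size=300)
signal[200] += 5.0                                   # sudden spike
print(np.flatnonzero(detect_anomalies(signal)))      # -> [200]
```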
This becomes critical, however, when data streams are very large or come from many devices, when real-time analysis is required, or when the transferred data is sensitive. Large data streams from a single device, or streams from many devices, place very high bandwidth demands on the network. The necessary transfer to the central data center introduces delays that are harmful to real-time applications, and sensitive data may rule out central storage altogether. Environmental and social factors pose a further challenge: the energy consumption of AI models, control over centrally stored data, and cost-effective access to AI technologies are all increasingly coming into focus.
Edge Deployment – a possible solution?
Many of these challenges can be addressed by the so-called Edge Deployment of AI solutions, in which data processing and evaluation take place decentrally at the edge of the network. In contrast to central deployment in the data center, the evaluation algorithm runs on one or more decentralized computers or microprocessors – typically right where the data is generated.
A typical AI application from image processing will serve as an illustration: object recognition. With classic central deployment, the cameras' video streams are collected centrally and passed through the object recognition algorithm for inference. The result consists of the confidence scores, bounding-box coordinates, and class names of the recognized objects. If real-time processing is required, each camera's video stream must be transmitted at the required resolution and frame rate (FPS). This data stream is multiplied by the number of devices and quickly reaches transfer rates of several megabytes or even gigabytes per second (depending on the setup) that must be carried to the data center. The cost of this bandwidth is, of course, far from negligible.
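To put rough numbers on this, here is a back-of-the-envelope estimate for a single camera; the resolution, frame rate, and compression ratio are illustrative assumptions, not measurements:

```python
# Back-of-the-envelope bandwidth estimate for one camera stream
# (all figures are illustrative assumptions).
width, height = 1920, 1080      # Full-HD resolution
bytes_per_pixel = 3             # 8-bit RGB
fps = 30                        # frames per second

raw_rate = width * height * bytes_per_pixel * fps   # bytes/s, uncompressed
compressed_rate = raw_rate / 50                     # assume ~50:1 video compression

print(f"raw:         {raw_rate / 1e6:.0f} MB/s")         # ~187 MB/s
print(f"compressed:  {compressed_rate / 1e6:.1f} MB/s")  # ~3.7 MB/s per camera
print(f"100 cameras: {100 * compressed_rate / 1e6:.0f} MB/s")  # ~373 MB/s
```

Even with strong compression, a moderate fleet of cameras saturates links that a single result string never would.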
A possible edge solution to this problem could be, for example, a computer connected directly to the camera, on which the object recognition algorithm is executed. Only the result string is transmitted, which requires just a few kilobytes per second, compared to the full video stream. Since the edge computer in this scenario only needs to evaluate the video stream and transmit the results, typical low-cost microcomputers can be used, which are already available for under 100 euros. This approach is particularly attractive when a large number of devices is needed to solve the task at hand. However, it requires precise coordination between the edge computer and the AI model.
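A minimal sketch of this pattern in Python; run_object_detection stands in for whatever inference engine is actually deployed on the device (it is a hypothetical helper, as is the dummy frame), and only the compact result string would leave the device:

```python
import json

def run_object_detection(frame):
    """Hypothetical stand-in for the on-device inference engine;
    returns (class_name, confidence, bounding_box) tuples."""
    return [("person", 0.91, (34, 50, 210, 480))]

def serialize(detections):
    # Only this compact string leaves the device - a few hundred
    # bytes instead of a multi-megabyte video frame.
    return json.dumps([
        {"class": c, "confidence": conf, "box": box}
        for c, conf, box in detections
    ])

# On the device, a loop would read frames from the camera driver;
# a single dummy frame suffices here to illustrate the payload size.
result = serialize(run_object_detection(frame=None))
print(result, f"({len(result.encode())} bytes)")
```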
Autonomous driving as a pioneer
The automotive industry has contributed significant advances to the field of Edge Deployment through its work on autonomous vehicles. Here, the need for small, resource-efficient solutions is particularly high: they must, on the one hand, reliably process large data streams (for example, from camera recordings) in real-time and, on the other hand, remain competitive in the mass market. The required solutions are strikingly holistic, ranging from tailor-made, resource-optimized hardware to specially developed neural network architectures that, compared to established object detection algorithms, favor deployment on resource-constrained hardware. In neural networks in particular, which have strongly shaped the field of image processing in recent years, the number of model parameters is critical. Reducing the number of parameters, and with it the number of required operations, is therefore desirable to maximize inference speed and thus the achievable FPS rate. Unfortunately, many industries do not have the financial means of the automotive industry. Still, many of the developed solutions can be transferred to other areas with relatively little effort, especially for widespread data types such as images, text, and audio.
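One concrete lever for shrinking parameter counts, popularized by mobile-oriented architectures such as MobileNet, is replacing standard convolutions with depthwise separable ones; the layer dimensions below are purely illustrative:

```python
# Parameter count of a standard vs. a depthwise separable convolution
# (layer dimensions are illustrative).
k = 3                     # kernel size
c_in, c_out = 128, 256    # input / output channels

standard = k * k * c_in * c_out            # one dense 3x3 convolution
separable = k * k * c_in + c_in * c_out    # depthwise 3x3 + pointwise 1x1

print(f"standard:  {standard:,} parameters")   # 294,912
print(f"separable: {separable:,} parameters")  # 33,920 (~8.7x fewer)
```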
Decentralized training and data management
A new aspect of Edge Deployment is the expansion of model training options. A distinction must be made between central training with subsequent decentralized Edge Deployment, and completely decentralized training (federated learning). Completely decentralized training stands out because it removes the requirement that training data be stored centrally, making it an efficient way to train AI models safely, especially with sensitive or very large data sets. Both distributed learning and model compression are usually iterative and require careful management of training runs. Here, it is advisable to draw on insights from the field of Machine Learning Operations (MLOps) and to map out all aspects, from the initial architecture design to operational training and revision of AI models, at an early stage. MLOps describes a blend of classical DevOps and Machine Learning practices and pursues the systematic planning and implementation of such solutions.
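A minimal sketch of the core idea behind federated averaging, using a linear model and synthetic data (all of it illustrative); each client trains locally on private data, and only the updated weights are shared with the server:

```python
import numpy as np

rng = np.random.default_rng(0)

def local_update(weights, X, y, lr=0.1, steps=10):
    """One client's training round on its private data (linear model,
    squared loss); only the updated weights leave the device."""
    w = weights.copy()
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

# Three clients, each with private data that never leaves the device.
true_w = np.array([2.0, -1.0])
clients = []
for _ in range(3):
    X = rng.normal(size=(50, 2))
    y = X @ true_w + 0.1 * rng.normal(size=50)
    clients.append((X, y))

# Federated averaging: the server broadcasts the global weights,
# clients train locally, and the server averages the returned weights.
global_w = np.zeros(2)
for _ in range(20):
    updates = [local_update(global_w, X, y) for X, y in clients]
    global_w = np.mean(updates, axis=0)

print("learned weights:", global_w)   # approaches [2.0, -1.0]
```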
Furthermore, monitoring and updating such edge solutions is more challenging than with central deployment in the data center and requires careful planning during the design phase. Many of these challenges are likewise the subject of investigation in the MLOps field.
Hardware-model coordination: on the way to the TinyML movement
As mentioned above, the performance of the edge computer and the type and size of the AI model must be coordinated. It is therefore not surprising that model size, energy consumption, and hardware performance are strongly correlated. A particularly pronounced variant of Edge Deployment is the use (deployment) or embedding of Machine Learning solutions on very small platforms such as microcontrollers. These have only a few milliamperes of current and a few kilobytes of memory available and can, in theory, run for weeks or even months on a solar cell or a small battery. From these extreme constraints an independent movement has emerged, the TinyML movement, which is dedicated to solving exactly these problems. Here, model compression and highly optimized, reduced Machine Learning approaches play a significant role. With TinyML, AI applications become mass-produced goods. Intelligent clothing or components with integrated microprocessors are conceivable and have long since ceased to be a vision of the future.
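Model compression often starts with post-training quantization. A minimal sketch using TensorFlow Lite's converter, assuming a trained model saved at the hypothetical path "model/":

```python
import tensorflow as tf

# Post-training (dynamic-range) quantization: weights are stored as
# 8-bit integers, shrinking the model to roughly a quarter of its
# float32 size. "model/" is a hypothetical path to a trained SavedModel.
converter = tf.lite.TFLiteConverter.from_saved_model("model/")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

with open("model.tflite", "wb") as f:
    f.write(tflite_model)

# For full integer quantization (required by many microcontrollers),
# a representative dataset would additionally be supplied via
# converter.representative_dataset.
```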
In summary, Edge Deployment represents an attractive option for bringing AI-based applications into production. The advantages include reduced data transfer, the possibility of real-time analysis on the device, secure handling of sensitive data, and the potential for decentralized data management and training. The disadvantages include more complex fleet and software management and the limited computing resources of edge computers. The AI algorithm and the choice of edge computer must therefore be precisely coordinated, and this becomes all the more relevant the smaller and more economical the target device is meant to be. TinyML forms the lowest level, where deployment takes place on microcontrollers and embedded systems. AI models can be designed to be small from the outset for this purpose, or existing models can be compressed and optimized. Almost all of the aspects mentioned require careful design and revision of the associated processes. The field of Machine Learning Operations (MLOps) can help to map out these aspects at a very early stage. In the future, with increasing productization in the form of edge products, the MLOps field will become increasingly important in both companies and research institutions, ensuring not only the basic operation of a model but also its monitoring and its updating with newly available data.