For many companies, Machine Learning (ML) solutions are still largely confined to research and development and are rarely used in day-to-day operations. In practice, Data Science today is carried out mainly in projects that analyze potential or test feasibility and end with a demonstrator. Because such projects are inherently limited in time and resources, their methods and approaches are often ill-suited to establishing ML solutions as long-term, sustainable products.
In some companies, however, Machine Learning is already in use and contributes to value creation. These ML pioneers include not only technology giants and startups but also established companies from various industries. Their ML solutions contribute to more resource-efficient production, facilitate logistics planning, or help sales and marketing to better understand customer needs.
So the question is: how have these companies managed to move ML from a research-oriented niche into the structured daily business of the company? A holistic view of the entire lifecycle of ML solutions is particularly helpful here, as we explore below.
Transitioning from project to product is challenging
Understandably, many ML solutions are built with established software development processes. However, due to their structure and development cycle, ML solutions impose requirements on development, quality assurance, and operations that existing methods cannot fully cover. For instance, CRISP-DM (Cross Industry Standard Process for Data Mining), the established approach for Data Science projects, is primarily intended for model development and does not explicitly describe how to build and operate ML software applications (for more information on the CRISP-DM model, refer to our blog post “Machine Learning in Practice”). Overall, there are currently few concepts for a smooth transition from model development to integrated use.
On a technical level, appropriate testing procedures must be employed for quality assurance, for example, to evaluate the models themselves. Additionally, versioning of data, models, and associated artifacts with common tools is only feasible to a limited extent. However, the detailed traceability of the dependencies between training data, test data, and derived models is crucial for operational purposes.
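To make the traceability requirement concrete, the following is a minimal sketch of one possible approach, not a description of any specific tool: each artifact is identified by a content hash, and a small record links a model to the exact training and test data it was derived from. The function names, the record fields, and the toy byte strings are assumptions chosen for illustration.

```python
import hashlib
import json


def fingerprint(payload: bytes) -> str:
    """Return a SHA-256 hex digest that identifies the exact content."""
    return hashlib.sha256(payload).hexdigest()


def build_lineage_record(train_data: bytes, test_data: bytes, model: bytes) -> dict:
    """Link a model artifact to the exact data it was trained and tested on."""
    return {
        "train_data_sha256": fingerprint(train_data),
        "test_data_sha256": fingerprint(test_data),
        "model_sha256": fingerprint(model),
    }


def matches_record(record: dict, train_data: bytes, test_data: bytes, model: bytes) -> bool:
    """Check whether the given artifacts are the ones described by the record."""
    return record == build_lineage_record(train_data, test_data, model)


# Toy byte strings stand in for real files (hypothetical content).
record = build_lineage_record(b"x,y\n1,2\n", b"x,y\n3,4\n", b"serialized-model")
print(json.dumps(record, indent=2))
```

Stored next to the model, such a record allows operations staff to check later whether a deployed model really belongs to a given data snapshot. In practice the bytes would be read from the actual data and model files.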
Furthermore, on the organizational level, challenges arise regarding the skills and tasks of employees. Who monitors the training and testing procedures of productive ML solutions? How are the ML solutions operated and monitored? Is ML expertise necessary here? How is the interaction between Data Scientists, IT operations, and customers organized in general? Many of these questions pose fundamental requirements and challenges for the organization. They cannot be answered for a single ML solution in isolation but require a cross-cutting, comprehensive approach.
The challenges must be tackled holistically
As we have seen, the challenges are extensive and not localized to a single point; hence, a holistic perspective is crucial here:
To bring Machine Learning into application, purely technical solutions are not enough. On the organizational level, it is essential to involve experts from the relevant departments early in concept and use-case development. In particular, the advantages and disadvantages of ML solutions must be understood at all levels of expertise within the company, from development through to productive deployment. Making all employees aware early on of the value of ML technologies, as well as of the specific requirements and challenges behind them, lays an important foundation for implementing ML in the company.
From our perspective, it is crucial to prevent the emergence of solutions that are deeply embedded in extensive applications and can only be understood, maintained, and further developed by highly qualified experts. Banks, for example, still face significant challenges today because they must maintain or replace solutions that were integrated in an unsustainable way, without disrupting ongoing operations.
The MLOps process model can provide a structured remedy
In software development, there are already processes, procedures, and tools aimed at avoiding the mistakes made in the past when new software technologies were introduced. Chief among them are agile forms of project organization. The state-of-the-art model “DevOps” (Development and Operations) closely integrates the customer side, development, and IT operations.
At Fraunhofer IAIS, we work with the process schema “MLOps” (Machine Learning Operations), an extension and adaptation of DevOps. MLOps integrates elements from DevOps and CRISP-DM and expands them with ML-specific characteristics. This provides guidelines for targeted development, rapid integration, and secure operation of ML technologies. CRISP-DM primarily contributes to the analysis and exploration phase, which iterates repeatedly over business understanding, data understanding, and model development. DevOps provides the fundamental framework and understanding for developing software projects using agile methods and with strong involvement of the customer and IT operations.
The focus of MLOps lies in a technically highly automated approach that delivers ML solutions sustainably and reliably. Additionally, current virtualization technologies, such as Docker containers, can be used to build a seamless architecture that supports the transition of ML solutions from development to operational deployment.
A variety of platforms support the integration, scaling, and monitoring of ML solutions on different infrastructures, from individual workstations to cloud implementations. There are also tools and methods that facilitate the implementation of MLOps processes: each phase of the process is supported by specific tooling. For example, ML-specific tooling is used for exploration (such as Jupyter Notebooks, and MLflow for experiment tracking).
During the transition from exploration to development, ML pipeline tools are used (for example, Apache Airflow for implementing automated ML pipelines), and the typical procedures and tools of professional software development are applied (standard tooling for repositories and registries, such as Git or Docker registries).
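The idea behind an automated ML pipeline can be sketched without a pipeline tool. The example below is only an illustration using Python's standard library (`graphlib`, Python 3.9 or later), not Airflow code: steps declare the steps they depend on, and a small runner executes them in a valid order. The step names and the placeholder actions are hypothetical.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps it depends on (hypothetical step names).
PIPELINE = {
    "load_data": set(),
    "prepare_features": {"load_data"},
    "train_model": {"prepare_features"},
    "evaluate_model": {"train_model", "prepare_features"},
}


def run_pipeline(steps: dict, actions: dict) -> list:
    """Run every step after its dependencies and return the execution order."""
    executed = []
    for name in TopologicalSorter(steps).static_order():
        actions[name]()
        executed.append(name)
    return executed


# Placeholder actions; a real pipeline would call data and training code here.
actions = {name: (lambda: None) for name in PIPELINE}
order = run_pipeline(PIPELINE, actions)
print(order)
```

Pipeline tools build on the same principle and add scheduling, retries, logging, and monitoring on top of the dependency graph.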
Tools such as GitLab CI, Bamboo, and Jenkins introduce Continuous Integration principles into the process and are used for testing, quality assurance, and automation. This covers both the software engineering aspects and the ML-specific aspects: the entire ML pipeline is verified, and system and integration tests are executed.
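One ML-specific check that a Continuous Integration job could run is a quality gate on model performance. The following is a minimal sketch under stated assumptions: the metric (accuracy), the function names, and the threshold are chosen for illustration only and are not prescribed by MLOps or by any of the tools named above.

```python
def accuracy(predictions: list, labels: list) -> float:
    """Return the fraction of predictions that match the labels."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and of equal length")
    correct = sum(p == t for p, t in zip(predictions, labels))
    return correct / len(labels)


def passes_quality_gate(predictions: list, labels: list, min_accuracy: float) -> bool:
    """Decide whether a candidate model is good enough to move on in the pipeline."""
    return accuracy(predictions, labels) >= min_accuracy


# Toy predictions on a held-out test set (hypothetical values).
predictions = [1, 0, 1, 1]
labels = [1, 0, 0, 1]

# A CI job could fail the build when the gate is not passed.
print(passes_quality_gate(predictions, labels, min_accuracy=0.7))
```

In a real setup, the predictions would come from the freshly trained model evaluated on versioned test data, and the threshold would be agreed with the relevant department.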
The same tooling also supports the automation of deployments through Continuous Delivery and Continuous Deployment. How ML solutions are then operated is highly specific to the company and the solution. On the one hand, containerization (for example, with Docker) makes it possible to target scalable platforms such as Kubernetes and OpenShift; on the other hand, the application can also run as an individual solution on dedicated infrastructure.
In addition to technology and tools, communication plays a central role in the successful implementation of ML solutions. MLOps is an agile process with strong and frequent feedback between users and developers within Data Science projects. Scrum is often used as the project management method because it focuses on short, iterative cycles, which suits the development of ML solutions. Productive operation also requires responsibilities to be clearly identified and defined in terms of roles and processes. Employees are often assigned to tasks for one specific solution or, depending on the size and scope of the ML solutions, to cross-cutting tasks such as data management, testing and quality management, and release management. Overall, the tool-supported process ensures that new ML requirements and models can be brought into production quickly through automated, quality-assured procedures.
In summary, ML solutions are moving into operational use in many areas. This brings various challenges on both the technical and the organizational level. MLOps is a holistic process approach that addresses these challenges with appropriate methods and tools. Further and more detailed insights are available in our whitepaper (in German).