ML lifecycle – technical aspects
While you might think pushing a model into production is the final step of operationalizing ML, it is really just the beginning of the story. Monitoring the model – making sure that it is doing what it is supposed to do, when it is supposed to do it – is essential. A key consideration is model performance. You need to monitor it closely because the performance of an ML model generally degrades over time, and you need to notice when the degradation is severe enough to require a model update or replacement. There are many possible reasons for this degradation, but a main one is data drift: the data your model was trained on no longer represents the data it sees in production, so the patterns the model relies on no longer reflect reality. The underlying causes vary, ranging from a change in your clients’ behavior to a technical change in the schema of the data you are collecting.
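To make the monitoring idea concrete, here is a minimal sketch of one common drift check: the Population Stability Index (PSI), which compares the distribution of a single numeric feature in the training data against what the model currently sees in production. The function, the bin count, and the 0.2 rule-of-thumb threshold are illustrative assumptions, not a prescription.

```python
import math
import random

def psi(reference, current, bins=10, eps=1e-4):
    """Population Stability Index between two samples of one numeric feature.
    Bins come from quantiles of the reference (training) sample; eps guards
    against empty bins. A common rule of thumb: PSI > 0.2 suggests drift."""
    ref = sorted(reference)
    # Quantile edges taken from the reference distribution
    edges = [ref[int(i * (len(ref) - 1) / bins)] for i in range(1, bins)]

    def proportions(sample):
        counts = [0] * bins
        for x in sample:
            i = sum(x > e for e in edges)  # index of the bin x falls into
            counts[i] += 1
        return [max(c / len(sample), eps) for c in counts]

    p_ref, p_cur = proportions(reference), proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(p_ref, p_cur))

random.seed(0)
train = [random.gauss(0.0, 1.0) for _ in range(5000)]
prod_ok = [random.gauss(0.0, 1.0) for _ in range(5000)]
prod_drift = [random.gauss(1.5, 1.0) for _ in range(5000)]  # mean has shifted

print(f"stable feature:  PSI = {psi(train, prod_ok):.3f}")
print(f"drifted feature: PSI = {psi(train, prod_drift):.3f}")
```

In practice you would run a check like this per feature on a schedule, and treat a sustained high PSI as a signal to investigate or retrain.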
This might not cause a lot of extra work if you already have a Continuous Training (CT) process in place. In that case, a triggering mechanism can automatically train a new model and push it into production. However, a CT process usually only covers the simpler cases of drift, where retraining the same ML model (with the same architecture) on fresh training data is enough. For scenarios in which you face a new business question, a fundamental change in behavior, or newly available data, the architecture of your ML model must also be updated – leaving you with a manual update (in fact, a redevelopment) by a data scientist. In other words, if you do not have a CT process, or if the CT process cannot handle the model update, you need to ask your data scientists to rebuild (retrain) the model and deploy it to production.
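The triggering part of a CT process can be very simple at its core: compare a monitored metric against the level measured at deployment time, and launch retraining when it falls too far. The sketch below illustrates that idea; the class name, the tolerance value, and the `retrain` hook are all illustrative assumptions rather than the API of any particular platform.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class RetrainTrigger:
    """Fire a retraining job when monitored accuracy drops too far
    below the accuracy measured at deployment time."""
    baseline_accuracy: float
    tolerance: float                 # allowed absolute drop before retraining
    retrain: Callable[[], None]      # hook that launches the CT pipeline

    def check(self, live_accuracy: float) -> bool:
        degraded = self.baseline_accuracy - live_accuracy > self.tolerance
        if degraded:
            self.retrain()           # same architecture, fresh training data
        return degraded

events = []
trigger = RetrainTrigger(baseline_accuracy=0.92, tolerance=0.05,
                         retrain=lambda: events.append("retrain started"))

print(trigger.check(0.90))  # drop of 0.02, within tolerance -> False
print(trigger.check(0.83))  # drop of 0.09 -> True, retraining fired
print(events)
```

Note what this sketch cannot do: it only reruns the existing training pipeline. A new business question or a new data source still needs a data scientist to rethink the model itself.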
ML lifecycle – resource and competence
As I have pointed out a couple of times, different competencies are essential to a well-functioning MLOps process.
- Business experts who have domain knowledge can (1) assess the value the ML model brings to the business, (2) help to set the model acceptance and evaluation criteria during development, and (3) decide if the model is performing satisfactorily in production.
- Data scientists are essential for (1) translating business needs to a math problem, (2) conducting data exploration, and (3) building and measuring the performance of an ML model.
- Data engineers and DevOps professionals transfer data, build pipelines, and maintain the development and production environments.
This means that a finely tuned, efficient MLOps process requires a multi-skilled team of individuals who can communicate respectfully and effectively across perspectives.
Having highlighted challenges in the end-to-end MLOps methodology, let me recap. ML models are built on both data and code, making their life cycle management even more challenging than that of software. It means putting version control not only on the code and algorithm, but also on the data. Constantly changing data drives the need for a more dynamic life cycle and increases the importance of putting in place a proper monitoring procedure to decide when to retrain or replace the ML model.
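One lightweight way to put "version control on the data" is to record a content hash of the training set alongside the model artifact, so that any change in the data is detectable later. A minimal sketch using only the standard library (the example rows and field names are made up for illustration):

```python
import hashlib
import json

def dataset_fingerprint(rows):
    """Deterministic SHA-256 over the serialized rows: if the training
    data changes in any way, the fingerprint changes with it."""
    h = hashlib.sha256()
    for row in rows:
        h.update(json.dumps(row, sort_keys=True).encode())
    return h.hexdigest()

v1 = [{"age": 34, "churned": 0}, {"age": 51, "churned": 1}]
v2 = [{"age": 34, "churned": 0}, {"age": 51, "churned": 0}]  # one label changed

print(dataset_fingerprint(v1) == dataset_fingerprint(v1))  # True: identical data
print(dataset_fingerprint(v1) == dataset_fingerprint(v2))  # False: data changed
```

Storing such a fingerprint with each trained model ties the model version to the exact data it was built on, complementing the usual version control on code.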
Beyond the technical and process challenges, ML models require multiple people with different skills and perspectives to work together, which underlines the importance of a good communication channel. This communication must be precise, because the same word can mean different things to different roles. Take "performance": to a data scientist it might mean the accuracy of the model; to a business expert, the value the model brings to the organization; to a software developer, the speed of calculation; and to a data engineer, the throughput of the pipeline.
In this blog post series, I tried to shed some light on the core principles of MLOps and highlight things to consider when you use, maintain, and derive real value from ML models. Of course, there are other complexities and considerations in the life cycle management of ML models that I did not discuss here, e.g., governance and Responsible AI. I leave each of these to its own dedicated blog post.
I would like to thank Diane Reynolds, Manne Fagerlind, and Julatouch Pipatanangura for their valuable comments and suggestions that led to improvement of this blog post series.