Written by — Joonas Kivinen, ML ENGINEER
It's time to talk about MLOps. Why should you invest in data science development even without instant gratification? Well, maybe precisely for that reason.
At the time of writing this, GenAI has been buzzing for a while, increasing interest and opening up new business opportunities in the data science domain. Now is a good time to step off the hype train for a minute and think about the practical challenges that many organizations still face.
Big-scale data science has come to resemble software development in many ways, so it feels natural for many to apply the same well-established techniques and principles to data work. Understandably so. Not all organizations, and especially not all data departments, have a history in that domain, however, and DevOps culture is not widely adopted in the industry. Many data science practitioners don't have an extensive software development background either – the author of this text included. So let's start with the basics real quick.
AWS defines DevOps as the combination of cultural philosophies, practices, and tools that increase an organization's ability to deliver applications and services at high velocity. Sounds pretty easy to just implement in big-scale data work. Then again, though we write code and usually run it in the same environments, data science has some unique aspects that make it a little different.
That's why there is MLOps, Machine Learning Operations.
The yet-another-Ops fatigue is real, but the existence of the word MLOps is justified. You see, a narrow definition of MLOps could be rather technical and cover the things that MLOps tools and services are designed to tackle: developing, deploying, and monitoring ML models. A broader definition brings in DevOps, the developer's experience, and all the challenges that are part of daily data science work. Personally, I prefer the latter definition: the path from an idea to production includes many steps and can require skills in, for example, data science, data engineering, cloud, data warehousing, CI/CD, APIs, and backend development, along with good team conventions and business knowledge. If we focus solely on MLOps tools, we can already ease the process a lot, but things can go wrong if they're not built on solid foundations.
To put it bluntly, this blog aims to justify investments in data science development even when there is no instant gratification. Instead, there is long-term business value. Data science is often a domain where projects come and go, and without big-picture planning, maintenance becomes a burden – let alone scaling things. The larger the data science team, the number of projects, or the scope of a typical project, the larger the returns. This blog concentrates on the why side of things. The how will be a separate post in the near future, so remember to follow our socials so you don't miss it!
Sidenote: For lack of a better word, data science is used here to cover ML, AI, data science, and so on – everything that could fall under the remit of someone working with the title Data Scientist.
In other words, machine learning has some distinctive aspects that set it apart from traditional software development.
A typical data science workflow goes something like this: we try to validate and define a business case, and if there is one, we probably start with something simple, explore, and iterate until someone decides the solution is a good-enough MVP. Then someone probably (and hopefully) wants to use the end product, so we put the code somewhere to run. Depending on the use case, we then move on to a new project – or, if the use case at hand is a serious one, we keep iterating, exploring, and trying to improve the model.
In the case of a customer-facing solution, we want real feedback to see whether our model is working or not. In other words, we need a feedback loop. If we have multiple models, we need the ability to do A/B testing to see which one works best. At some point, we might want to test something completely new, like replacing our simple model with a fancy neural network. At all times, we still want to monitor not only the technical side but also the performance of our model. Depending on the problem, all this can take anything from weeks to months. It could also be that we leave our model to run for a while and come back to it later when there is new data, a new algorithm available, or something else needs to be changed.
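The A/B-testing part of that feedback loop can be sketched very simply. Here is a minimal, hypothetical example (the names `MODEL_VARIANTS` and `assign_variant` are illustrative, not from any particular tool): each user is deterministically hashed into a bucket so the same user always sees the same model variant, and the assignment can be logged alongside the outcome to close the loop.

```python
import hashlib

# Hypothetical variant registry: two competing models behind one endpoint.
MODEL_VARIANTS = {"a": "simple_model_v1", "b": "fancy_nn_v2"}

def assign_variant(user_id: str, split: float = 0.5) -> str:
    """Hash the user id so the same user always gets the same variant."""
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    bucket = int(digest, 16) % 1000 / 1000  # stable value in [0, 1)
    return MODEL_VARIANTS["a"] if bucket < split else MODEL_VARIANTS["b"]

# In a real system, log (user_id, variant, outcome) to close the feedback loop.
print(assign_variant("user-42"))
```

The deterministic hash matters: random assignment on every request would show users inconsistent predictions and make the per-user outcome data noisy.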
Do any of these sound appealing:

- A smooth path from an initial idea all the way to production
- Less time spent keeping old projects running, more time on the things that matter
- The ability to scale to new projects without reinventing the wheel every time
- A happy data science team and a good developer experience

Great, because that's what we want to achieve with MLOps. It all pretty much comes down to the last bullet, though: if the data science team is happy and the developer experience is good, many of the other things are probably at least at a decent level.
Now let's think about what might happen if we do everything in the wild, without any forward thinking. Assume we start from scratch with a fixed-size data science team, an empty cloud account, an empty codebase, and so on. As the workflow above shows, there are quite a few steps from an initial business idea to production, and those steps tend to repeat heavily from model to model. The actual model code, the data science part, is usually just a small part of the end-to-end pipeline. Now say some years have passed and we have done several projects, each started separately. Things have begun to accumulate: there's a slightly different pipeline for every project, the codebase and the (cloud) environments are a mess, and there are ten different deployment pipelines for ten models.
The harsh reality is that projects need to be maintained and that things do and will break every now and then. Do we want to end up in a situation where most of our time is spent just keeping current projects running? The tricky part is that things accumulate rather slowly, and it's hard to point out the exact moment we pass the critical point. Even if our workflow and processes were perfect, some accumulation will always happen as time goes by. In a perfect world, the overall complexity – everything a data science team needs to maintain – would grow only by the model-specific code and other model-specific things. The opposite extreme is the previous scenario, with the whole process reinvented for every project on top of that. Luckily, many of the steps in the development workflow can be abstracted and automated. And even if nothing has accumulated yet, we want the process to be smooth so that when a potential business case appears, we don't have to spend weeks just getting started.
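The idea that complexity should grow only by the model-specific code can be sketched as a shared pipeline skeleton. This is a toy illustration, not a recommendation of any specific tool: the `Project` and `run_pipeline` names are made up, and the shared function stands in for all the repeated steps (logging, deployment, monitoring hooks) that would otherwise be rebuilt per project.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Project:
    """Only the model-specific pieces vary from project to project."""
    name: str
    load_data: Callable[[], Any]           # model-specific: data source
    train: Callable[[Any], Any]            # model-specific: the data science part
    evaluate: Callable[[Any, Any], float]  # model-specific: success metric

def run_pipeline(project: Project) -> float:
    """The shared steps live here once, instead of in ten copies."""
    data = project.load_data()
    model = project.train(data)
    score = project.evaluate(model, data)
    print(f"[{project.name}] score={score:.2f}")  # stand-in for real monitoring
    return score

# A toy project: a "model" that just averages its inputs.
toy = Project(
    name="toy-average",
    load_data=lambda: [1.0, 2.0, 3.0],
    train=lambda data: sum(data) / len(data),
    evaluate=lambda model, data: model,
)
run_pipeline(toy)
```

Adding an eleventh project then means writing three small functions, not an eleventh deployment pipeline – which is exactly the kind of abstraction MLOps tooling aims to give you out of the box.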
As hinted above, a lot comes down to the developer experience. If the workflow is smooth and the annoying things are easy, the overall end product is usually going to be in good shape. Data scientists are also a scarce resource, and many can be very picky about their workplaces – so why would they work for an organization where their work is pretty much just pain? Another good litmus test: if a project is outsourced, are there development guidelines to follow, or is the job just to get something done and then get out? It would be nice if our own data science team could easily take over and make future updates.
This blog gives some ideas on why it is important to think ahead when doing data science. The good thing is that in general, things tend to get easier as more and more aspects get abstracted and industry standards develop. All of the major cloud providers have tools and services for MLOps on top of other commercial and open-source solutions.
As we learned here, though, it is more than just installing some tools; it all starts with the fundamentals. This does not mean that you should stop everything and start building a platform to tackle every possible use case. It is, however, good to have some long-term goals in mind and then build towards them step by step. At some point, you might even realize you have a platform in the making – or at least you're no longer spending so much time on the annoying stuff but instead on the things that matter. If there is an opportunity, moving fast is great, but in the long run, sustainability should be the goal.
Now we know the why, but the big question of how still remains. More about that in part two – let us know if you wanna get a heads-up when it's out.