DevOps Pipeline for a Machine Learning Project
- by 7wData
Machine learning is getting more and more popular in applications and software products, from accounting to hot dog recognition apps. When you add machine learning techniques to exciting projects, you need to be ready for a number of difficulties. The Statsbot team asked Boris Tvaroska to tell us how to prepare a DevOps pipeline for an ML based project.
There is no shortage in tutorials and beginner training for data science. Most of them focus on “report” data science. A one-time activity is needed to dig into a data set, clean it, process it, optimize hyperparameters, call .fit() method, and present the results. On the other hand, SaaS or mobile apps are never finished, always changing and upgrading complex sets of algorithms, data, and configuration.
There are two different types of applications:
Machine learning often expands functionality of existing applications — recommendations on a web shop, utterances classification in a chat bot, etc. It means it will be part of the bigger cycle of adding new features, fixing bugs, or other reasons for frequent changes in overall code.
One of my last projects was to build a productivity bot — help knowledge workers in the large company to find and execute the right process. It has a lot of software engineering — integration, monitoring, forwarding the chat to the person, several machine learning components, etc. The main one was intent recognition where we decided to use our own instead of online services (such as wit.ai, luis.ai, api.ai) because we were using a person’s role, location, and more as additional features. As any agile project, we were aiming to get new functionality on a fast and regular basis.
The standard application that manages this fast-moving environment is managed through the pipeline of version control, test, build, and deployment tools (CI/CD cycle). In the best case, it means a fully automated cycle; from a software engineer submitting the code into central version control (for example github.com), through building and testing till deployment to the production environment.
Adding machine learning into this life cycle brings new challenges and changes in a DevOps pipeline.
Traditional unit and integrations testing run on a small set of inputs and expect to produce stable results. The test will either pass or fail. In machine learning, part of the application has statistical results — some of the results will be as expected, some not.
We have started with 80% accuracy and put the stretch target to move 1 point per week for 10 weeks. The important decision here is what can an end user still accept as help or improvement over application without ML. Our targets were too strict and we were “failing” too many builds, which the client wanted to see in production.
The second important area in ML testing is testing the data set. There is only one simple rule — more is better.
[Social9_Share class=”s9-widget-wrapper”]
Upcoming Events
From Text to Value: Pairing Text Analytics and Generative AI
21 May 2024
5 PM CET – 6 PM CET
Read MoreCategories
You Might Be Interested In
What you should know about AI
4 Aug, 2017Artificial intelligence seems to be nearly everywhere these days, yet most people have little understanding of AI technology, its capabilities …
Strategic Placement for Big Data in Organizations
15 Sep, 2016I tend to examine the different roles played by data. For instance, when I work on computer code, I often …
Lead your own data science projects with the 3 Ps
28 Sep, 2016There’s a lot of literature on learning the technical aspect of data science: statistics, machine learning, data munging, big data. …
Recent Jobs
Do You Want to Share Your Story?
Bring your insights on Data, Visualization, Innovation or Business Agility to our community. Let them learn from your experience.