DevOps Pipeline for a Machine Learning Project

DevOps Pipeline for a Machine Learning Project

Machine learning is getting more and more popular in applications and software products, from accounting to hot dog recognition apps. When you add machine learning techniques to exciting projects, you need to be ready for a number of difficulties. The Statsbot team asked Boris Tvaroska to tell us how to prepare a DevOps pipeline for an ML based project.

There is no shortage in tutorials and beginner training for data science. Most of them focus on “report” data science. A one-time activity is needed to dig into a data set, clean it, process it, optimize hyperparameters, call .fit() method, and present the results. On the other hand, SaaS or mobile apps are never finished, always changing and upgrading complex sets of algorithms, data, and configuration.

There are two different types of applications:

Machine learning often expands functionality of existing applications — recommendations on a web shop, utterances classification in a chat bot, etc. It means it will be part of the bigger cycle of adding new features, fixing bugs, or other reasons for frequent changes in overall code.

One of my last projects was to build a productivity bot — help knowledge workers in the large company to find and execute the right process. It has a lot of software engineering — integration, monitoring, forwarding the chat to the person, several machine learning components, etc. The main one was intent recognition where we decided to use our own instead of online services (such as wit.ai, luis.ai, api.ai) because we were using a person’s role, location, and more as additional features. As any agile project, we were aiming to get new functionality on a fast and regular basis.

The standard application that manages this fast-moving environment is managed through the pipeline of version control, test, build, and deployment tools (CI/CD cycle). In the best case, it means a fully automated cycle; from a software engineer submitting the code into central version control (for example github.com), through building and testing till deployment to the production environment.

Adding machine learning into this life cycle brings new challenges and changes in a DevOps pipeline.

Traditional unit and integrations testing run on a small set of inputs and expect to produce stable results. The test will either pass or fail. In machine learning, part of the application has statistical results — some of the results will be as expected, some not.

We have started with 80% accuracy and put the stretch target to move 1 point per week for 10 weeks. The important decision here is what can an end user still accept as help or improvement over application without ML. Our targets were too strict and we were “failing” too many builds, which the client wanted to see in production.

The second important area in ML testing is testing the data set. There is only one simple rule — more is better.

Share it:
Share it:

[Social9_Share class=”s9-widget-wrapper”]

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

You Might Be Interested In

What you should know about AI

4 Aug, 2017

Artificial intelligence seems to be nearly everywhere these days, yet most people have little understanding of AI technology, its capabilities …

Read more

Strategic Placement for Big Data in Organizations

15 Sep, 2016

I tend to examine the different roles played by data. For instance, when I work on computer code, I often …

Read more

Lead your own data science projects with the 3 Ps

28 Sep, 2016

There’s a lot of literature on learning the technical aspect of data science: statistics, machine learning, data munging, big data. …

Read more

Recent Jobs

IT Engineer

Washington D.C., DC, USA

1 May, 2024

Read More

Data Engineer

Washington D.C., DC, USA

1 May, 2024

Read More

Applications Developer

Washington D.C., DC, USA

1 May, 2024

Read More

D365 Business Analyst

South Bend, IN, USA

22 Apr, 2024

Read More

Do You Want to Share Your Story?

Bring your insights on Data, Visualization, Innovation or Business Agility to our community. Let them learn from your experience.

Get the 3 STEPS

To Drive Analytics Adoption
And manage change

3-steps-to-drive-analytics-adoption

Get Access to Event Discounts

Switch your 7wData account from Subscriber to Event Discount Member by clicking the button below and get access to event discounts. Learn & Grow together with us in a more profitable way!

Get Access to Event Discounts

Create a 7wData account and get access to event discounts. Learn & Grow together with us in a more profitable way!

Don't miss Out!

Stay in touch and receive in depth articles, guides, news & commentary of all things data.