12 Steps to Applied AI

For those who’ve been looking for a 12-step program to get rid of bad data habits, here’s a handy applied ML/AI project roadmap. Well, it should properly be 13 steps, so we’ll start counting at zero to make it work.

Check that you actually need ML/AI. Can you identify many small decisions you need help with? Has the non-ML/AI approach already been shown to be worthless? Do you have data to learn from? Do you have access to hardware? If not, don’t pass GO.

Pro tip: besides looking like a bunch of rampaging amateurs, leaders who try to shove AI approaches where they don’t belong usually end up with solutions which are too costly to maintain in production. Instead, find a good problem to solve and may the best solution win. If you can do it without AI, so much the better; it’ll probably be cheaper to maintain. ML/AI is for those situations where the other approaches don’t get you the performance you need. It’s useful and it’s here to stay, but it’s not for everything.

Clearly express what success means for your project. Your ML/AI system is going to produce a bunch of labels for you: how will you score its performance on the task you set it? How promising does it need to be in order to be worth productionizing? What’s the minimum acceptable performance for it to be worth launching?

Pro tip: skipping this step or doing it out of sequence is the leading cause of data science project failure. Don’t. Even. Think. About. Skipping. It. Make sure this part is done by whoever knows the business best and has the sharpest decision-making skills, not the best equation nerdery.
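One way to make those answers hard to fudge later is to write them down as code before any modeling starts. The sketch below is only an illustration: the `SuccessCriteria` class, the choice of plain accuracy as the score, and the threshold numbers are all hypothetical placeholders for whatever your decision-maker actually picks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuccessCriteria:
    """Hypothetical thresholds, agreed on before anyone touches a model."""
    min_to_productionize: float  # how promising a candidate must look to be worth productionizing
    min_to_launch: float         # minimum acceptable performance for launch


def accuracy(predicted, actual):
    """Fraction of labels the system got right (a stand-in for your real metric)."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def worth_productionizing(score, criteria):
    """True if the score clears the bar for investing in production work."""
    return score >= criteria.min_to_productionize


def ok_to_launch(score, criteria):
    """True if the score clears the minimum acceptable launch performance."""
    return score >= criteria.min_to_launch


# Example with made-up labels and made-up thresholds.
criteria = SuccessCriteria(min_to_productionize=0.70, min_to_launch=0.90)
score = accuracy(["cat", "dog", "cat", "cat"], ["cat", "dog", "dog", "cat"])
print(score, worth_productionizing(score, criteria), ok_to_launch(score, criteria))
```

Freezing the dataclass is a small nudge against moving the goalposts once results start coming in.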

My detailed Step 1 guide is here.

Create the processes and code that collects instance IDs, some features that go with those IDs, and the correct labels if you’re doing supervised or semi-supervised learning. Don’t look at the data yet.

Pro tip: consider a dress rehearsal with simulated data before purchasing data or going out into the real world to collect your own.
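A dress rehearsal can be as simple as generating fake records with the same shape you expect from the real thing: an instance ID, a few features, and a label. This is a minimal sketch using only the standard library; the field names (`hours`, `device`, `label`) and the rule that produces the labels are invented for illustration and say nothing about your actual problem.

```python
import random


def simulate_records(n, seed=0):
    """Generate n fake records: an instance ID, two features, and a label."""
    rng = random.Random(seed)  # seeded so the rehearsal is repeatable
    records = []
    for i in range(n):
        hours = rng.uniform(0, 10)
        device = rng.choice(["phone", "laptop"])
        # Made-up labeling rule with noise, so the pattern is learnable but not perfect.
        label = "yes" if hours + rng.gauss(0, 1) > 5 else "no"
        records.append({
            "id": f"sim-{i:05d}",
            "hours": round(hours, 2),
            "device": device,
            "label": label,
        })
    return records


for record in simulate_records(3):
    print(record)
```

If your collection and storage code chokes on fake data, better to find out before you have paid for the real data.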

Set some of your data aside so that you have the opportunity to check how well your pattern-based recipes work outside the data you found them in. It’s crucial that you evaluate performance where it matters: on fresh, relevant data you haven’t used for anything else.

Split your data into 3 datasets: training, validation, and test. (You’ll later split your training dataset further into two pieces for model fitting and debugging, but don’t worry about that just yet.)

Pro tip: implement splitting at the infrastructure level if you can and have tight access control so your test data don’t get misused accidentally.
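One possible way to do the split (not the only one) is to assign each instance to a dataset by hashing its ID, so the same instance always lands in the same place no matter who runs the code or when. The sketch below assumes records are dictionaries with an `id` field, and the 70/15/15 proportions are an arbitrary choice for illustration.

```python
import hashlib


def assign_split(instance_id, validation_pct=15, test_pct=15):
    """Map an instance ID to 'training', 'validation', or 'test', deterministically."""
    digest = hashlib.sha256(instance_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in 0..99
    if bucket < test_pct:
        return "test"
    if bucket < test_pct + validation_pct:
        return "validation"
    return "training"


def split_records(records, id_key="id"):
    """Group records into the three datasets based on their IDs."""
    splits = {"training": [], "validation": [], "test": []}
    for record in records:
        splits[assign_split(record[id_key])].append(record)
    return splits


records = [{"id": f"id-{i}"} for i in range(1000)]
splits = split_records(records)
print({name: len(rows) for name, rows in splits.items()})
```

Because the assignment depends only on the ID, it can live in the data pipeline itself, and access to the test rows can be restricted separately.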

It’s time for analytics! Look at some (not all!) of your data. Use your training dataset to plot data, complete sanity checks, and engineer new features. Never forget that real world data are messy, so trust no one and trust nothing. Instead, think of your dataset as a textbook you’re using to teach your machine students. Only a daft teacher assigns a textbook they haven’t looked inside.

Pro tip: don’t forget to apply the code you write to clean your data and create new features to your validation and test datasets… without poking around in them.
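In practice this means learning any cleaning parameters from the training data alone and then applying them, unchanged, to the other datasets. Here is a minimal sketch for a single numeric feature, assuming missing values arrive as `None`; filling with the training mean and then standardizing is just one illustrative choice of cleaning step.

```python
from statistics import mean, pstdev


def fit_cleaner(training_values):
    """Learn cleaning parameters from the training data only."""
    present = [v for v in training_values if v is not None]
    mu = mean(present)
    sigma = pstdev(present) or 1.0  # avoid dividing by zero for a constant feature
    return {"fill": mu, "mean": mu, "std": sigma}


def apply_cleaner(values, params):
    """Fill missing values and standardize using previously learned parameters."""
    filled = [params["fill"] if v is None else v for v in values]
    return [(v - params["mean"]) / params["std"] for v in filled]


training = [1.0, 2.0, None, 3.0]
validation = [None, 4.0]

params = fit_cleaner(training)                 # looks at training data only
clean_training = apply_cleaner(training, params)
clean_validation = apply_cleaner(validation, params)  # no peeking required
print(clean_training, clean_validation)
```

The validation and test values pass through the function without anyone plotting or inspecting them.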

This is where you make friends with your ML/AI toolbox and get to know all the pattern-finding algorithms you’re going to try running. Don’t expect your data to be in a format those packages will accept — you’ll likely need to do a bunch of setup and code wrangling to get those algorithms to accept your data.
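As a small taste of that wrangling: many algorithms want each instance as a row of numbers, while your records may mix numbers and categories. The sketch below turns dictionary records into numeric rows using one-hot encoding, with the list of categories learned from the training data. The function names and record fields are hypothetical, and real packages usually ship their own encoders, so treat this as an illustration of the idea only.

```python
def build_encoder(training_records, numeric_keys, categorical_keys):
    """Record which category values appear in the training data."""
    categories = {
        key: sorted({record[key] for record in training_records})
        for key in categorical_keys
    }
    return {"numeric": list(numeric_keys), "categories": categories}


def encode(record, encoder):
    """Turn one record into a flat list of floats."""
    row = [float(record[key]) for key in encoder["numeric"]]
    for key, values in encoder["categories"].items():
        # One column per known category; unseen categories become all zeros.
        row.extend(1.0 if record[key] == value else 0.0 for value in values)
    return row


training = [
    {"hours": 2, "device": "phone"},
    {"hours": 7.5, "device": "laptop"},
]
encoder = build_encoder(training, numeric_keys=["hours"], categorical_keys=["device"])
print([encode(record, encoder) for record in training])
```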
