12 Steps to Applied AI
- by 7wData
For those who’ve been looking for a 12 step program to get rid of bad data habits, here’s a handy applied ML/AI project roadmap. Well, it should properly be 13 steps, so we’ll start counting at zero to make it work.
Step 0: Check that you actually need ML/AI. Can you identify many small decisions you need help with? Has the non-ML/AI approach already been shown to be worthless? Do you have data to learn from? Do you have access to hardware? If not, don’t pass GO.
Pro tip: besides looking like a bunch of rampaging amateurs, leaders who try to shove AI approaches where they don’t belong usually end up with solutions that are too costly to maintain in production. Instead, find a good problem to solve and may the best solution win. If you can do it without AI, so much the better: it’ll probably be cheaper to maintain. ML/AI is for those situations where the other approaches don’t get you the performance you need. It’s useful and it’s here to stay, but it’s not for everything.
Step 1: Clearly express what success means for your project. Your ML/AI system is going to produce a bunch of labels for you: how will you score its performance on the task you set it? How promising does it need to be in order to be worth productionizing? What’s the minimum acceptable performance for it to be worth launching?
Pro tip: skipping this step or doing it out of sequence is the leading cause of data science project failure. Don’t. Even. Think. About. Skipping. It. Make sure this part is done by whoever knows the business best and has the sharpest decision-making skills, not the best equation nerdery.
My detailed Step 1 guide is here.
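One way to make "what success means" concrete is to write it down as a metric plus a launch threshold before any modeling starts. The sketch below is a minimal, hypothetical example: the metric (plain accuracy) and the 0.85 launch bar are placeholder assumptions standing in for whatever your business decision-makers choose.

```python
# Minimal sketch: success criterion = metric + launch threshold.
# Both the metric (accuracy) and the 0.85 bar are hypothetical.

def accuracy(predicted, actual):
    """Fraction of labels the system got right."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

LAUNCH_THRESHOLD = 0.85  # minimum acceptable performance (assumed)

def worth_launching(predicted, actual):
    """True only if measured performance clears the launch bar."""
    return accuracy(predicted, actual) >= LAUNCH_THRESHOLD

# Example: 9 of 10 labels correct -> 0.9 >= 0.85
print(worth_launching([1] * 9 + [0], [1] * 10))  # True
```

The point is that the threshold is decided (and written into code) by whoever knows the business, not reverse-engineered from whatever the model happens to achieve.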
Step 2: Create the processes and code that collect instance IDs, some features that go with those IDs, and the correct labels if you’re doing supervised or semi-supervised learning. Don’t look at the data yet.
Pro tip: consider a dress rehearsal with simulated data before purchasing data or going out into the real world to collect your own.
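The collection step above can be sketched as a simple record schema plus a simulated "dress rehearsal" dataset. The field names (`id`, `features`, `label`) and the two features are illustrative assumptions, not a prescribed format.

```python
# Sketch of a collection record: instance ID, features, and (for
# supervised learning) a label. Field names are illustrative.
import random

def make_record(instance_id, features, label=None):
    return {"id": instance_id, "features": features, "label": label}

# Dress rehearsal: simulated data before any real-world collection.
rng = random.Random(0)  # seeded for repeatability
simulated = [
    make_record(i,
                {"x1": rng.random(), "x2": rng.random()},
                label=rng.choice([0, 1]))
    for i in range(100)
]
```

Running the whole pipeline end-to-end on records like these can surface schema and plumbing bugs before you pay for real data.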
Step 3: Set some of your data aside so that you have the opportunity to check how well your pattern-based recipes work outside the data you found them in. It’s crucial that you evaluate performance where it matters: on fresh, relevant data you haven’t used for anything else.
Split your data into 3 datasets: training, validation, and test. (You’ll later split your training dataset further into two pieces for model fitting and debugging, but don’t worry about that just yet.)
Pro tip: implement splitting at the infrastructure level if you can and have tight access control so your test data don’t get misused accidentally.
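A minimal sketch of the three-way split, using only the standard library. The 60/20/20 proportions and the fixed seed are assumptions for illustration; the essential properties are that the split is deterministic and that the test slice is produced once and then left alone.

```python
# Three-way split sketch: training / validation / test.
# Proportions (60/20/20) and seed are illustrative assumptions.
import random

def split_three_ways(records, seed=42, train=0.6, valid=0.2):
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)  # deterministic shuffle
    n = len(shuffled)
    n_train = int(n * train)
    n_valid = int(n * valid)
    return (shuffled[:n_train],                    # training
            shuffled[n_train:n_train + n_valid],   # validation
            shuffled[n_train + n_valid:])          # test: don't touch

train_set, valid_set, test_set = split_three_ways(list(range(100)))
print(len(train_set), len(valid_set), len(test_set))  # 60 20 20
```

In production you’d enforce this at the infrastructure level, as the pro tip says; a code-level split like this is the small-scale version of the same idea.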
Step 4: It’s time for analytics! Look at some (not all!) of your data. Use your training dataset to plot data, complete sanity checks, and engineer new features. Never forget that real world data are messy, so trust no one and trust nothing. Instead, think of your dataset as a textbook you’re using to teach your machine students. Only a daft teacher assigns a textbook they haven’t looked inside.
Pro tip: don’t forget to apply the data-cleaning and feature-engineering code you write to your validation and test datasets too… without poking around in them.
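The sanity checks above can be as simple as a function run over the training split only. The specific checks here (missing values, out-of-range features) and the record shape are illustrative assumptions; the same function can later be applied mechanically to the validation and test splits without eyeballing them.

```python
# Sanity-check sketch, run on the TRAINING split only.
# The checks and the expected feature range are illustrative.

def sanity_check(records, feature_range=(0.0, 1.0)):
    """Return a list of (id, feature, problem) tuples found."""
    problems = []
    lo, hi = feature_range
    for r in records:
        for name, value in r["features"].items():
            if value is None:
                problems.append((r["id"], name, "missing"))
            elif not (lo <= value <= hi):
                problems.append((r["id"], name, "out of range"))
    return problems

records = [
    {"id": 1, "features": {"x1": 0.4, "x2": 0.9}},
    {"id": 2, "features": {"x1": None, "x2": 1.7}},
]
print(sanity_check(records))
```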
Step 5: This is where you make friends with your ML/AI toolbox and get to know all the pattern-finding algorithms you’re going to try running. Don’t expect your data to be in a format those packages will accept — you’ll likely need to do a bunch of setup and code wrangling to get those algorithms to accept your data.
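As a taste of that wrangling: most toolkits want a feature matrix and a parallel label vector rather than dict-style records. This sketch assumes the record shape from the earlier collection step; the field names and fixed column order are illustrative, not any particular library’s requirement.

```python
# "Code wrangling" sketch: dict records -> feature matrix + labels,
# the parallel-array shape most ML toolkits expect. Field names
# are assumptions carried over from the collection step.

FEATURE_ORDER = ["x1", "x2"]  # fixed column order for the matrix

def to_matrix(records):
    X = [[r["features"][name] for name in FEATURE_ORDER]
         for r in records]
    y = [r["label"] for r in records]
    return X, y

records = [
    {"id": 1, "features": {"x1": 0.2, "x2": 0.7}, "label": 0},
    {"id": 2, "features": {"x1": 0.9, "x2": 0.1}, "label": 1},
]
X, y = to_matrix(records)
print(X, y)  # [[0.2, 0.7], [0.9, 0.1]] [0, 1]
```

Pinning the column order in one place (rather than relying on dict iteration) is what keeps training and later serving code feeding features to the model in the same order.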