12 Steps to Applied AI
- by 7wData
For those who’ve been looking for a 12 step program to get rid of bad data habits, here’s a handy applied ML/AI project roadmap. Well, it should properly be 13 steps, so we’ll start counting at zero to make it work.
Step 0: Check that you actually need ML/AI. Can you identify many small decisions you need help with? Has the non-ML/AI approach already been shown to be worthless? Do you have data to learn from? Do you have access to hardware? If not, don’t pass GO.
Pro tip: besides looking like a bunch of rampaging amateurs, leaders who try to shove AI approaches where they don’t belong usually end up with solutions that are too costly to maintain in production. Instead, find a good problem to solve and may the best solution win. If you can do it without AI, so much the better: it’ll probably be cheaper to maintain. ML/AI is for those situations where the other approaches don’t get you the performance you need. It’s useful and it’s here to stay, but it’s not for everything.
Step 1: Clearly express what success means for your project. Your ML/AI system is going to produce a bunch of labels for you: how will you score its performance on the task you set it? How promising does it need to be in order to be worth productionizing? What’s the minimum acceptable performance for it to be worth launching?
Pro tip: skipping this step or doing it out of sequence is the leading cause of data science project failure. Don’t. Even. Think. About. Skipping. It. Make sure this part is done by whoever knows the business best and has the sharpest decision-making skills, not the best equation nerdery.
My detailed Step 1 guide is here.
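One way to make "what success means" concrete is to write it down as a metric plus a launch threshold before any modeling starts. The sketch below is a minimal, hypothetical example: the metric (plain accuracy) and the 0.85 launch bar are placeholder assumptions standing in for whatever your business decision-makers choose.

```python
# Minimal sketch: success criterion = metric + launch threshold.
# Both the metric (accuracy) and the 0.85 bar are hypothetical.

def accuracy(predicted, actual):
    """Fraction of labels the system got right."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

LAUNCH_THRESHOLD = 0.85  # minimum acceptable performance (assumed)

def worth_launching(predicted, actual):
    """True only if measured performance clears the launch bar."""
    return accuracy(predicted, actual) >= LAUNCH_THRESHOLD

# Example: 9 of 10 labels correct -> 0.9 >= 0.85
print(worth_launching([1] * 9 + [0], [1] * 10))  # True
```

The point is that the threshold is decided (and written into code) by whoever knows the business, not reverse-engineered from whatever the model happens to achieve.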
Step 2: Create the processes and code that collect instance IDs, some features that go with those IDs, and the correct labels if you’re doing supervised or semi-supervised learning. Don’t look at the data yet.
Pro tip: consider a dress rehearsal with simulated data before purchasing data or going out into the real world to collect your own.
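The collection step above can be sketched as a simple record schema plus a simulated "dress rehearsal" dataset. The field names (`id`, `features`, `label`) and the two features are illustrative assumptions, not a prescribed format.

```python
# Sketch of a collection record: instance ID, features, and (for
# supervised learning) a label. Field names are illustrative.
import random

def make_record(instance_id, features, label=None):
    return {"id": instance_id, "features": features, "label": label}

# Dress rehearsal: simulated data before any real-world collection.
rng = random.Random(0)  # seeded for repeatability
simulated = [
    make_record(i,
                {"x1": rng.random(), "x2": rng.random()},
                label=rng.choice([0, 1]))
    for i in range(100)
]
```

Running the whole pipeline end-to-end on records like these can surface schema and plumbing bugs before you pay for real data.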
Step 3: Set some of your data aside so that you have the opportunity to check how well your pattern-based recipes work outside the data you found them in. It’s crucial that you evaluate performance where it matters: on fresh, relevant data you haven’t used for anything else.
Split your data into 3 datasets: training, validation, and test. (You’ll later split your training dataset further into two pieces for model fitting and debugging, but don’t worry about that just yet.)
Pro tip: implement splitting at the infrastructure level if you can and have tight access control so your test data don’t get misused accidentally.
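A minimal sketch of the three-way split, using only the standard library. The 60/20/20 proportions and the fixed seed are assumptions for illustration; the essential properties are that the split is deterministic and that the test slice is produced once and then left alone.

```python
# Three-way split sketch: training / validation / test.
# Proportions (60/20/20) and seed are illustrative assumptions.
import random

def split_three_ways(records, seed=42, train=0.6, valid=0.2):
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)  # deterministic shuffle
    n = len(shuffled)
    n_train = int(n * train)
    n_valid = int(n * valid)
    return (shuffled[:n_train],                    # training
            shuffled[n_train:n_train + n_valid],   # validation
            shuffled[n_train + n_valid:])          # test: don't touch

train_set, valid_set, test_set = split_three_ways(list(range(100)))
print(len(train_set), len(valid_set), len(test_set))  # 60 20 20
```

In production you’d enforce this at the infrastructure level, as the pro tip says; a code-level split like this is the small-scale version of the same idea.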
Step 4: It’s time for analytics! Look at some (not all!) of your data. Use your training dataset to plot data, complete sanity checks, and engineer new features. Never forget that real world data are messy, so trust no one and trust nothing. Instead, think of your dataset as a textbook you’re using to teach your machine students. Only a daft teacher assigns a textbook they haven’t looked inside.
Pro tip: don’t forget to apply the data-cleaning and feature-engineering code you write to your validation and test datasets too… without poking around in them.
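The sanity checks above can be as simple as a function run over the training split only. The specific checks here (missing values, out-of-range features) and the record shape are illustrative assumptions; the same function can later be applied mechanically to the validation and test splits without eyeballing them.

```python
# Sanity-check sketch, run on the TRAINING split only.
# The checks and the expected feature range are illustrative.

def sanity_check(records, feature_range=(0.0, 1.0)):
    """Return a list of (id, feature, problem) tuples found."""
    problems = []
    lo, hi = feature_range
    for r in records:
        for name, value in r["features"].items():
            if value is None:
                problems.append((r["id"], name, "missing"))
            elif not (lo <= value <= hi):
                problems.append((r["id"], name, "out of range"))
    return problems

records = [
    {"id": 1, "features": {"x1": 0.4, "x2": 0.9}},
    {"id": 2, "features": {"x1": None, "x2": 1.7}},
]
print(sanity_check(records))
```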
Step 5: This is where you make friends with your ML/AI toolbox and get to know all the pattern-finding algorithms you’re going to try running. Don’t expect your data to be in a format those packages will accept — you’ll likely need to do a bunch of setup and code wrangling to get those algorithms to accept your data.
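As a taste of that wrangling: most toolkits want a feature matrix and a parallel label vector rather than dict-style records. This sketch assumes the record shape from the earlier collection step; the field names and fixed column order are illustrative, not any particular library’s requirement.

```python
# "Code wrangling" sketch: dict records -> feature matrix + labels,
# the parallel-array shape most ML toolkits expect. Field names
# are assumptions carried over from the collection step.

FEATURE_ORDER = ["x1", "x2"]  # fixed column order for the matrix

def to_matrix(records):
    X = [[r["features"][name] for name in FEATURE_ORDER]
         for r in records]
    y = [r["label"] for r in records]
    return X, y

records = [
    {"id": 1, "features": {"x1": 0.2, "x2": 0.7}, "label": 0},
    {"id": 2, "features": {"x1": 0.9, "x2": 0.1}, "label": 1},
]
X, y = to_matrix(records)
print(X, y)  # [[0.2, 0.7], [0.9, 0.1]] [0, 1]
```

Pinning the column order in one place (rather than relying on dict iteration) is what keeps training and later serving code feeding features to the model in the same order.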