Why software engineering processes and tools don’t work for machine learning

While AI may be the new electricity, significant challenges remain before AI’s potential is realized. Here we examine why data scientists and teams can’t rely on software engineering tools and processes for machine learning.

“AI is the new electricity.” At least, that’s what Andrew Ng suggested at this year’s Amazon re:MARS conference. In his keynote address, Ng discussed the rapid growth of Artificial Intelligence (AI) — its steady march into industry after industry; the unrelenting presence of AI breakthroughs, technologies, or fears in the headlines each day; the tremendous amount of investment, both from established enterprises seeking to modernize (see: Sony, a couple of weeks ago) as well as from venture investors parachuting into the market riding a wave of AI-focused founders.

“AI is the next big transformation,” Ng insists, and we’re watching the transformation unfold.

While AI may be the new electricity (and as a data scientist at Comet, I don’t need much convincing), significant challenges remain for the field to realize this potential. In this blog post, I’m going to talk about why, for machine learning (ML), data scientists and teams can’t rely on the tools and processes that software engineering teams have been using for the last 20 years.

The reliance on the tools and processes of software engineering makes sense – data science and software engineering are both disciplines whose principal tool is code. Yet what is being done in data science teams is radically different from what is being done in software engineering teams. An inspection of the core differences between the two disciplines is a helpful exercise in clarifying how we should think about structuring our tools and processes for doing AI.

At Comet, we believe the adoption of tools and processes designed specifically for AI will help practitioners unlock and enable the type of revolutionary transformation Ng is speaking about.

Software engineering is a discipline whose aim is, considered broadly, the design and implementation of programs that a computer can execute to perform a defined function. Assuming the input to a software program is within the expected (or constrained) range of inputs, its behavior is knowable. In a talk at ICML in 2015, Léon Bottou formulated this well: in software engineering, an algorithm or program can be proven correct, in the sense that given particular assumptions about the input, certain properties will be true when the algorithm or program terminates.

The provable correctness of software programs has shaped the tools and processes we have built for doing software engineering. Consider one corollary characteristic of software programming that follows from provable correctness: if a program is provably correct for some input values, then the program contains sub-programs that are also provably correct for those input values. This is why engineering processes like Agile are, broadly speaking, successful and productive for software teams: breaking these projects apart into sub-tasks works. Most waterfall and Scrum implementations include sub-tasking as well.
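To make the idea of correct sub-programs concrete, here is a minimal sketch in Python. The example (merge sort) and the function names are my own illustration, not something taken from Bottou's talk. Each function has a stated assumption about its input and a property that holds when it returns, so the two pieces can be written, tested, and reasoned about separately.

```python
from typing import List


def merge(left: List[int], right: List[int]) -> List[int]:
    """Sub-program. Assumes both inputs are sorted in ascending order.

    On return, the result is sorted and contains every item from both inputs.
    """
    out: List[int] = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    # At most one of these two slices is non-empty, and it is already sorted.
    out.extend(left[i:])
    out.extend(right[j:])
    return out


def merge_sort(xs: List[int]) -> List[int]:
    """Whole program. Assumes only that xs is a list of integers.

    Its correctness follows from the correctness of merge plus the recursion.
    """
    if len(xs) <= 1:
        return list(xs)
    mid = len(xs) // 2
    return merge(merge_sort(xs[:mid]), merge_sort(xs[mid:]))
```

If `merge` keeps its contract, `merge_sort` is correct for every input, which is exactly the kind of decomposition that makes sub-tasking a software project safe.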

We see a lot of data science teams using workflow processes that are identical or broadly similar to these software methodologies. Unfortunately, they don’t work very well. The reason? The provable correctness of software engineering does not extend to AI and machine learning. In (supervised) machine learning, the only guarantee we have about a model we’ve built is that if the training set is an iid (independent and identically distributed) sample from some distribution, then performance on another iid sample from the same distribution will be close to the performance on the training set. Because uncertainty is an intrinsic property of machine learning, sub-tasking can lead to unforeseeable downstream effects.
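The limited nature of that guarantee can be sketched with a toy experiment. Everything below is an assumption made for illustration: the synthetic two-class data, the one-parameter threshold "model", and the size of the shift are hypothetical and stand in for a real dataset and model. The sketch fits the threshold on one sample, then scores it on a second sample from the same distribution and on a sample whose inputs have drifted.

```python
import random


def sample(n, rng, shift=0.0):
    """Draw n labelled points: class 0 ~ N(-1, 1), class 1 ~ N(+1, 1).

    A non-zero shift moves every input, simulating a change in the
    data distribution after the model was trained.
    """
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        x = rng.gauss(1.0 if y == 1 else -1.0, 1.0) + shift
        data.append((x, y))
    return data


def fit_threshold(data):
    """'Train' by placing a threshold midway between the two class means."""
    xs0 = [x for x, y in data if y == 0]
    xs1 = [x for x, y in data if y == 1]
    return (sum(xs0) / len(xs0) + sum(xs1) / len(xs1)) / 2


def accuracy(threshold, data):
    """Fraction of points where 'x above threshold' matches label 1."""
    correct = sum(1 for x, y in data if (x > threshold) == (y == 1))
    return correct / len(data)


rng = random.Random(0)  # fixed seed so the run is repeatable
train = sample(5000, rng)
threshold = fit_threshold(train)

train_acc = accuracy(threshold, train)
iid_acc = accuracy(threshold, sample(5000, rng))
shifted_acc = accuracy(threshold, sample(5000, rng, shift=1.5))

print(f"train: {train_acc:.3f}  iid test: {iid_acc:.3f}  shifted test: {shifted_acc:.3f}")
```

In this setup the accuracy on the second iid sample stays close to the training accuracy, while the accuracy on the shifted sample drops noticeably. Nothing in the code changed between the two evaluations; only the data did, which is why a sub-task that looked finished can stop working downstream.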
