Why software engineering processes and tools don’t work for machine learning

by 7wData
February 18, 2020

While AI may be the new electricity, significant challenges remain before AI can realize its potential. Here we examine why data scientists and teams can’t rely on software engineering tools and processes for machine learning.

“AI is the new electricity.” At least, that’s what Andrew Ng suggested at this year’s Amazon re:MARS conference. In his keynote address, Ng discussed the rapid growth of Artificial Intelligence (AI): its steady march into industry after industry; the unrelenting presence of AI breakthroughs, technologies, or fears in the headlines each day; and the tremendous amount of investment, from established enterprises seeking to modernize (see: Sony, a couple of weeks ago) as well as from venture investors parachuting into the market on a wave of AI-focused founders.

“AI is the next big transformation,” Ng insists, and we’re watching the transformation unfold.

While AI may be the new electricity (and as a Data Scientist at Comet, I don’t need much convincing), significant challenges remain for the field to realize this potential. In this blog post, I’m going to talk about why data scientists and teams doing machine learning (ML) can’t rely on the tools and processes that software engineering teams have been using for the last 20 years.

The reliance on the tools and processes of software engineering makes sense – data science and software engineering are both disciplines whose principal tool is code. Yet what is being done in data science teams is radically different from what is being done in software engineering teams. An inspection of the core differences between the two disciplines is a helpful exercise in clarifying how we should think about structuring our tools and processes for doing AI.

At Comet, we believe the adoption of tools and processes designed specifically for AI will help practitioners unlock the type of revolutionary transformation Ng is speaking about.

Software engineering is a discipline whose aim is, considered broadly, the design and implementation of programs that a computer can execute to perform a defined function. Assuming the input to a software program is within the expected (or constrained) range of inputs, its behavior is knowable. In a talk at ICML in 2015, Léon Bottou formulated this well: in software engineering, an algorithm or program can be proven correct, in the sense that given particular assumptions about the input, certain properties will be true when the algorithm or program terminates.
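To make "assumptions about the input" and "properties at termination" concrete, here is a minimal sketch in Python. The function `integer_sqrt` is a hypothetical example chosen for illustration and does not come from Bottou's talk. The precondition is that `n` is a non-negative integer; the postcondition is that the result `r` satisfies `r * r <= n < (r + 1) * (r + 1)`.

```python
def integer_sqrt(n: int) -> int:
    """Return the largest integer r such that r * r <= n.

    Precondition (assumption about the input): n >= 0.
    Postcondition (property at termination): r * r <= n < (r + 1) * (r + 1).
    """
    if n < 0:
        raise ValueError("precondition violated: n must be non-negative")

    r = 0
    # Loop invariant: r * r <= n holds before and after every iteration.
    while (r + 1) * (r + 1) <= n:
        r += 1

    # The invariant plus the loop's exit condition give the postcondition.
    assert r * r <= n < (r + 1) * (r + 1)
    return r


print(integer_sqrt(15), integer_sqrt(16))
```

The point is not the algorithm itself (a linear scan is slow for large `n`) but that the behavior can be argued for every valid input, without running the program on any data.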

The provable correctness of software programs has shaped the tools and processes we have built for doing software engineering. Consider one corollary of provable correctness: if a program is provably correct for some input values, then the program contains sub-programs that are also provably correct for those input values. This is why engineering processes like Agile are, broadly speaking, successful and productive for software teams: breaking projects apart into sub-tasks works. Most waterfall and scrum implementations include sub-tasking as well.
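As a rough illustration of why sub-tasking works for conventional software, the sketch below builds a `median` function from two sub-programs, each with its own checkable contract. All three function names are hypothetical and chosen for this example; the assumption is that each piece could be assigned to a different person and verified in isolation.

```python
def insertion_sort(xs):
    """Sub-program 1: return a new list holding the items of xs in ascending order."""
    out = []
    for x in xs:
        i = len(out)
        # Walk left past every element larger than x to find its slot.
        while i > 0 and out[i - 1] > x:
            i -= 1
        out.insert(i, x)
    # Contract for this piece alone: the output is sorted.
    assert all(a <= b for a, b in zip(out, out[1:]))
    return out


def middle(sorted_xs):
    """Sub-program 2: return the middle value of a non-empty sorted list."""
    n = len(sorted_xs)
    if n == 0:
        raise ValueError("precondition violated: list must be non-empty")
    mid = n // 2
    if n % 2 == 1:
        return sorted_xs[mid]
    # Even length: average the two central values.
    return (sorted_xs[mid - 1] + sorted_xs[mid]) / 2


def median(xs):
    """The whole program is correct because each part meets its contract."""
    return middle(insertion_sort(xs))


print(median([3, 1, 2]), median([4, 1, 3, 2]))
```

Once `insertion_sort` and `middle` are each known to be correct, nothing about their combination can surprise us. That is the property the rest of this post argues machine learning lacks.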

We see a lot of data science teams using workflow processes that are identical or broadly similar to these software methodologies. Unfortunately, they don’t work very well. The reason? The provable correctness of software engineering does not extend to AI and machine learning. In (supervised) machine learning, the only guarantee we have about a model we’ve built is that if the training set is an i.i.d. (independent and identically distributed) sample from some distribution, then performance on another i.i.d. sample from the same distribution will be close to the performance on the training set. Because uncertainty is an intrinsic property of machine learning, sub-tasking can lead to unforeseeable downstream effects.
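The i.i.d. guarantee can be seen in a small simulation. This is a toy sketch under stated assumptions, not a real workflow: the data are synthetic (two one-dimensional Gaussian classes), the "model" is a single threshold, and the names `sample`, `fit_threshold`, and `accuracy` are hypothetical. It compares accuracy on the training set, on a fresh sample from the same distribution, and on a sample whose inputs have been shifted.

```python
import random


def sample(n, rng, shift=0.0):
    """Draw n labelled points: class 0 ~ N(0, 1), class 1 ~ N(2, 1), plus an optional shift."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        x = rng.gauss(2.0 * label, 1.0) + shift
        data.append((x, label))
    return data


def fit_threshold(data):
    """'Train' by placing the decision threshold halfway between the two class means."""
    xs0 = [x for x, label in data if label == 0]
    xs1 = [x for x, label in data if label == 1]
    return (sum(xs0) / len(xs0) + sum(xs1) / len(xs1)) / 2


def accuracy(threshold, data):
    """Fraction of points where 'x >= threshold' agrees with the label."""
    correct = sum((x >= threshold) == (label == 1) for x, label in data)
    return correct / len(data)


rng = random.Random(0)  # fixed seed so the run is repeatable
train = sample(5000, rng)
threshold = fit_threshold(train)

train_acc = accuracy(threshold, train)
iid_acc = accuracy(threshold, sample(5000, rng))                 # same distribution
shifted_acc = accuracy(threshold, sample(5000, rng, shift=2.0))  # inputs moved by +2

print(f"train={train_acc:.3f}  iid={iid_acc:.3f}  shifted={shifted_acc:.3f}")
```

Training and i.i.d. accuracy should land close together, while the shifted sample should score noticeably worse even though not a single line of code changed. No unit test on `fit_threshold` or `accuracy` would have caught that.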
