Enabling Citizen Data Scientists to Reach Their Full Potential

With data scientists regularly topping the charts as one of the most in-demand roles globally, many organizations are increasingly turning to non-traditional employees to help make sense of their most valuable asset: data.

These so-called citizen data scientists, typically self-taught specialists from any field who have a penchant for analysis, are also becoming champions for important projects with business-defining impact. They often lead the charge in adopting machine learning (ML) and artificial intelligence (AI), for example, and can arm senior leaders with the intelligence needed to navigate business disruption.

Chances are you’ve seen several articles from industry luminaries and analysts talking about how important these roles are for the future. But seemingly every opinion piece overlooks the most crucial challenge facing citizen data scientists today: collecting better data.

The most pressing concern is not tooling or whether to use R or Python, but something more foundational. Because data collection and preparation go unaddressed, many citizen data scientists lack the most basic building blocks they need to accomplish their goals. And without better data, it becomes much harder to turn potentially great ideas into tangible business outcomes in a simple, repeatable, and cost-efficient way.

Quality Data Is at the Heart of ML Deployment

On the path to deployment, the process by which machine learning models are (or are not) operationalized, we see the same three patterns crop up repeatedly. Success is often determined by the quality of the data collected and by how difficult the models are to set up and maintain.

The first category occurs in data-savvy companies where the business identifies a machine learning requirement. A team of engineers and data scientists is assembled, and this team spends extraordinary amounts of time building data pipelines, creating training data sets, moving and transforming data, building models, and eventually deploying the model into production. This process typically takes six to twelve months. The result is expensive to operationalize, fragile to maintain, and difficult to evolve.

The second category is where a citizen data scientist creates a prototype ML model. This model is often the result of a moment of inspiration, insight, or even an intuitive hunch. The model shows some encouraging results, and it is proposed to the business. The problem is that getting this prototype into production requires all the painful steps described in the first category. Unless the model shows something extraordinary, it is put on a backlog and rarely seen again.

The last, and perhaps most demoralizing, category covers ideas that never even get explored because roadblocks make them difficult, if not impossible, to operationalize. This category has all sorts of nuances, some of which are not at all obvious. For example, consider a data scientist who wants model features that reflect certain behaviors of visitors to the company's website or mobile application. How do they get that data? Often, the answer is to raise a change request with the IT team to tag the applications so the data is collected.
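Even after the applications are tagged, the raw events still have to be turned into model features. The sketch below illustrates that aggregation step under loudly stated assumptions: the `Event` schema (`visitor_id`, `event_type`, `timestamp`), the event names ("page_view", "add_to_cart", "purchase") and the function `behavior_features` are all hypothetical and do not reflect any particular analytics or tag-management product.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One tagged interaction (hypothetical schema, assumed for illustration)."""
    visitor_id: str
    event_type: str   # assumed names, e.g. "page_view", "add_to_cart", "purchase"
    timestamp: float  # seconds since the Unix epoch


def behavior_features(events):
    """Aggregate raw tagged events into one flat feature dict per visitor."""
    counts = defaultdict(Counter)  # visitor_id -> Counter of event types
    first_seen = {}                # visitor_id -> earliest timestamp seen
    last_seen = {}                 # visitor_id -> latest timestamp seen

    for e in events:
        counts[e.visitor_id][e.event_type] += 1
        first_seen[e.visitor_id] = min(first_seen.get(e.visitor_id, e.timestamp), e.timestamp)
        last_seen[e.visitor_id] = max(last_seen.get(e.visitor_id, e.timestamp), e.timestamp)

    features = {}
    for vid, c in counts.items():
        views = c["page_view"]  # Counter returns 0 for event types never seen
        features[vid] = {
            "page_views": views,
            "add_to_cart": c["add_to_cart"],
            "purchases": c["purchase"],
            # Share of page views followed by a cart addition (0.0 if no views).
            "cart_rate": c["add_to_cart"] / views if views else 0.0,
            # Time between the visitor's first and last tagged event.
            "active_seconds": last_seen[vid] - first_seen[vid],
        }
    return features


if __name__ == "__main__":
    sample = [
        Event("v1", "page_view", 0.0),
        Event("v1", "page_view", 30.0),
        Event("v1", "add_to_cart", 45.0),
        Event("v2", "page_view", 10.0),
    ]
    for vid, feats in behavior_features(sample).items():
        print(vid, feats)
```

The point of the sketch is that none of these features can be computed until the tagging exists: if the application never emits an "add_to_cart" event, the `cart_rate` column is simply zero for everyone, which is exactly the kind of roadblock that stops an idea before it is explored.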
