Enabling Citizen Data Scientists to Reach Their Full Potential
- by 7wData
With data scientists regularly topping the charts as one of the most in-demand roles globally, many organizations are increasingly turning to non-traditional employees to help make sense of their most valuable asset: data.
These so-called citizen data scientists, typically self-taught specialists in any given field with a penchant for analysis, are likewise becoming champions for important projects with business-defining impact. They’re often leading the charge when it comes to the global adoption of machine learning (ML) and artificial intelligence (AI), for example, and can arm senior leaders with the intelligence needed to navigate business disruption.
Chances are you’ve seen several articles from industry luminaries and analysts talking about how important these roles are for the future. But seemingly every opinion piece overlooks the most crucial challenge facing citizen data scientists today: collecting better data.
The most pressing concern is not about tooling or using R or Python2 but, instead, something more foundational. By neglecting to address data collection and preparation, many citizen data scientists do not have the most basic building blocks needed to accomplish their goals. And without better data, it becomes much more challenging to turn potentially great ideas into tangible business outcomes in a simple, repeatable, and cost-efficient way.
Quality Data is at the Heart of ML Deployment
When it comes to how machine learning models are operationalized (or not), otherwise known as the path to deployment, we see the same three patterns crop up repeatedly. Often, success is determined by the quality of the data collected and how difficult it is to set up and maintain these models.
The first category occurs in data-savvy companies where the business identifies a machine learning requirement. A team of engineers and data scientists is assembled to get started, and these teams spend extraordinary amounts of time building data pipelines, creating training data sets, moving and transforming data, building models, and eventually deploying the model into production. This process typically takes six to 12 months. It is expensive to operationalize, fragile to maintain, and difficult to evolve.
The second category is where a citizen data scientist creates a prototype ML model. This model is often the result of a moment of inspiration, insight, or even an intuitive hunch. The model shows some encouraging results, and it is proposed to the business. The problem is that to get this prototype model into production requires all the painful steps highlighted in the first category. Unless the model shows something extraordinary, it is put on a backlog and is rarely seen again.
The last, and perhaps the most demoralizing category of all, are those ideas that never even get explored because of roadblocks that make it difficult, if not impossible, to operationalize. This category has all sorts of nuances, some of which are not at all obvious. For example, consider the data scientist who wants features in their model that reflect certain behaviors of visitors on their website or mobile application. How do they get that data? The answer is often to raise a change request with the IT team to tag the applications to collect it.
[Social9_Share class=”s9-widget-wrapper”]
Upcoming Events
From Text to Value: Pairing Text Analytics and Generative AI
21 May 2024
5 PM CET – 6 PM CET
Read More