How (& Why) Data Scientists and Data Engineers Should Share a Platform

Sharing one platform has some obvious benefits for Data Science and Data Engineering teams, but technical, language and process differences often make this difficult. Learn how one company implemented a single cloud platform for R, Python and other workloads, and some of the unexpected benefits they discovered along the way.

Attending analytic conferences really exposes the range and sophistication of analytic techniques that people with the right skills can apply to data. An example of this is the EARL Boston conference, which focuses on how to best use the R programming language to produce analytic outcomes. At the most recent conference, I led a session based on many conversations my industry colleagues and I have had with companies trying to accelerate their analytics programs, and on a specific type of problem that arises over and over again: how two distinct groups, Data Engineers and Data Scientists, can work more collaboratively despite the stark differences between their skills.

These teams often work independently. The Data Engineering team typically works on a shared platform, with a toolset and associated processes that optimize their flows, while the Data Science teams tend to have their own, separate set of tools and processes and generally work locally on their laptops. This creates inefficiencies, which I heard about firsthand from the Data Scientists at the EARL conference.

Many of them described extracting data from central systems with varying degrees of pain and compliance, then spending time refactoring that data to fit the analysis they wanted to do, and only then starting the analysis process. This works, but it’s not the most efficient overall flow: it leads to costly duplication of effort, as multiple users may extract the same data and waste time doing the same refactoring or transformations. For these reasons, it’s not surprising that more organizations are trying to have both teams work on a single platform.

The Challenges and Benefits of a Single Platform for Data Engineering & Data Science

While the concept of a single platform is a familiar topic in data strategy discussions, the flexibility of the cloud now makes it possible, though not necessarily easy. Ideally, everyone should be able to use their own tools and a variety of languages while being supported by a common underlying data and compute platform. This is challenging partly because it requires delivering secure access to data across a variety of teams and locations, and applying a common governance model across a disparate set of tools and processes.

Consider this real-world example from a relatively advanced Data Science team that I work with at a large corporation. The Data Engineering team predominantly uses Python for its data wrangling processes, while the Data Science team prefers R.
