Why Data Scientists aren’t Data Engineers

As AI becomes a focus for a growing number of enterprises, these organizations are realizing how important it is to have the right people and skills in place. In particular, demand for data scientists has risen sharply as AI, various applications of machine learning (ML), non-ML predictive analytics, and other so-called “big data” approaches gain traction in the enterprise. That demand has led to the talent crunch we are now seeing across many enterprises and organizations. However, given that 80% of an AI project has to do with data preparation and data engineering activities, perhaps organizations should be searching for data engineers even more than for data scientists.

Companies are competing for increasingly scarce data scientists. Salaries and signing bonuses for skilled data scientists continue to skyrocket, and the sheer number of code academies now focusing on data science is evidence of the demand for these skills. However, do these organizations always need data scientists? Many enterprises, vendors, and startups confuse the roles of data scientist and data engineer. While the two roles share some traits and skills, at their core they are job descriptions with very different skill sets that are not easily interchangeable.

In the mid-2000s, we saw the emergence of the Data Scientist position. As cited in the O’Reilly article: “This increase in the demand for data scientists has been driven by the success of the major Internet companies. Google, Facebook, LinkedIn, and Amazon have all made their marks by using data creatively: not just warehousing data, but turning it into something of value.” Not surprisingly, any organization that has data of value is looking at data science and data scientists to increasingly extract more value from that information.

With roots in statistical modeling and data analysis, data scientists have backgrounds in advanced math and statistics, advanced analytics, and increasingly machine learning and AI. The focus of data scientists is, unsurprisingly, data science: how to extract useful information from a sea of data, and how to translate business and scientific informational needs into the language of information and math. Data scientists need to be masters of the statistics, probability, mathematics, and algorithms that help glean useful insights from huge piles of information. They have usually learned programming out of necessity, in order to run programs and advanced analysis on data. As a result, the code that data scientists are typically tasked to write is minimal, only what is necessary to accomplish a data science task (R is a common language for them to use), and they work best when they are provided clean data to run advanced analytics on. A data scientist is a scientist who creates hypotheses, runs tests and analysis on the data, and then translates the results so that someone else in the organization can easily view and understand them.
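To make the hypothesis-and-test workflow concrete, here is a minimal sketch in Python of one way such a test could look: a permutation test for a difference in means. The measurements, the group names, and the `permutation_test` helper are all hypothetical and invented for illustration; the article does not prescribe any particular test or language.

```python
import random
from statistics import mean

def permutation_test(a, b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns the observed difference (mean(a) - mean(b)) and an
    approximate p-value estimated from random relabelings.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = mean(a) - mean(b)
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = mean(pooled[:len(a)]) - mean(pooled[len(a):])
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, extreme / n_permutations

# Hypothetical measurements for two groups (illustrative values only).
control = [12.1, 11.8, 12.4, 12.0, 11.9, 12.2]
variant = [12.9, 13.1, 12.7, 13.0, 12.8, 13.2]

observed_diff, p_value = permutation_test(variant, control)
print(f"difference in means: {observed_diff:.2f}, p-value: {p_value:.4f}")
```

Note that the sketch assumes the two lists are already clean and numeric, which is exactly the condition the paragraph above says data scientists depend on.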

On the other hand, data scientists can’t perform their jobs without access to large volumes of clean data. Extracting, cleaning, and moving data is not really the role of a data scientist, but rather that of a data engineer. Data engineers have programming and technology expertise, and have typically worked on data integration, middleware, analytics, business data portals, and extract-transform-load (ETL) operations. The data engineer’s center of gravity is big data and distributed systems, backed by experience with programming languages such as Java, Python, and Scala, along with scripting tools and techniques.
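As a rough illustration of the extract, clean, and move work described above, the following Python sketch runs a tiny ETL pass using only the standard library. The CSV contents, the `orders` table, and the `extract`, `transform`, and `load` function names are assumptions made up for this example; production pipelines would normally run on distributed tooling rather than an in-memory SQLite database.

```python
import csv
import io
import sqlite3

# Hypothetical raw export: stray whitespace, a missing value, a bad number.
RAW_CSV = """customer_id,country,amount
1, us ,19.99
2,DE,
3,fr,abc
4,US,5.00
"""

def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: normalize fields and drop rows that cannot be repaired."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except (TypeError, ValueError):
            continue  # skip rows with missing or non-numeric amounts
        country = row["country"].strip().upper()
        clean.append((int(row["customer_id"]), country, amount))
    return clean

def load(rows, conn):
    """Load: write the cleaned rows into a SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(customer_id INTEGER, country TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)

# Downstream consumers (for example, a data scientist) query the clean table.
totals = conn.execute(
    "SELECT country, ROUND(SUM(amount), 2) FROM orders GROUP BY country"
).fetchall()
print(totals)
```

Keeping the three stages as separate functions is a design choice for the sketch: each stage can be tested on its own, and the cleaning rules stay in one place.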
