The Difference Between Managing Large and Small Data Science Teams

As advanced analytics and data science have matured into must-have skills, data science groups within large companies have themselves become much larger.  This has led to some unique problems and solutions that you’ll want to consider as your own DS group grows larger.

It seems like only two or three years ago you wouldn’t have had to ask this question.  Unless you were Google, Amazon, or an equally big player, your data science teams were small, maybe in the range of 3 to 12, and were still trying to find their place in your organization.

Fast forward to today and it’s not unusual to find teams of 20 or 40 or even more, and that’s a game changer.  It’s no longer like Cheers, where everyone knows your name.  In larger organizations, more order, organization, and process become necessary.

The large analytic platform providers clearly understand this.  I’m thinking IBM, Microsoft, SAS, Alteryx and similar.  Over the last year or so there’s been an increasing focus on elephant hunting, slang I’m sure you’ll recognize for trying to get a foothold where the big teams live.

Here are some topics and questions that commonly arise when managing larger DS orgs.

You’re Going to Want to Standardize on a Process Which Probably Means Standardizing on a Platform

If you’ve got 40 people in a data science team that implies a large number of projects and as a result a large number of models or product features to keep track of and maintain.  You can’t have everyone freelancing in tools and project structure or you’ll never keep up.

As you’ve approached this scale you probably tried to adhere to a common process such as CRISP-DM and may even have written some internal standards about how that’s implemented.  Another common situation is for a DS group to have coalesced around a comprehensive platform.
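As a rough illustration of what adhering to a common process buys you, here is a minimal sketch of a shared project registry keyed by the six standard CRISP-DM phases. The project names, owners, and the `phase_summary` helper are hypothetical, invented only for this example; a real team would more likely keep this in its platform or project tracker.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Phase(Enum):
    """The six CRISP-DM phases, in their usual order."""
    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELING = 4
    EVALUATION = 5
    DEPLOYMENT = 6


@dataclass
class Project:
    name: str    # hypothetical project name
    owner: str   # hypothetical owner
    phase: Phase


def phase_summary(projects):
    """Count how many projects sit in each phase, listed in phase order."""
    counts = Counter(p.phase for p in projects)
    return {phase.name: counts.get(phase, 0) for phase in Phase}


# Illustrative data only, not taken from any real team.
projects = [
    Project("churn-model", "ana", Phase.MODELING),
    Project("demand-forecast", "raj", Phase.DATA_PREPARATION),
    Project("fraud-scoring", "li", Phase.MODELING),
]

for name, count in phase_summary(projects).items():
    print(f"{name:<24}{count}")
```

Because every project reports its status against the same phase list, a manager can see at a glance where work is bunching up, which is hard to do when each data scientist structures projects differently.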

Take a comprehensive platform such as Alteryx, for example, which supports the process from data blending through modeling.  You’re all using the platform, so it’s easy to communicate where you are in a modeling project, and there can be project-by-project discussion of when and if, for example, you’re going to use custom code as opposed to the built-in tools.

That will get you part of the way there, and some will be happy with this semi-formal level of organization.  However, the lessons we take from project management and from application development disciplines like Agile are that more organization can be better and doesn’t necessarily bog things down.

Recently both IBM and Microsoft have created offerings to template this level of organization for you in hopes of getting you to focus on their DS and cloud offerings.  IBM has the Data Science Experience and Watson Studio.  Microsoft introduced the Team Data Science Process.

Some Examples from the Microsoft Team Data Science Process (TDSP)

Of the ‘systems’ we were able to identify, the Microsoft TDSP seems to be the most comprehensive, literally defining project steps and individual roles and responsibilities.  TDSP includes:

The high-level process diagram is not all that different from CRISP-DM, but it is easy to use when describing the steps.

Going down one more level, the TDSP even lays out specific roles and responsibilities for each common DS role including Solution Architect, Project Manager, Project Lead, and Data Scientists.

The entire package is quite comprehensive and detailed.  If you haven’t addressed this level of detail for your organization this would be a good starting point.

Another common pain point in larger DS organizations is acquiring and maintaining data pipelines.  As your organization grew, this work was probably assigned to junior data scientists at first.  Eventually you realized this wasn’t a good use of resources.

Over the last two or three years, Data Engineer has emerged as a formal, separate discipline supporting the data science process.
