The Difference Between Managing Large and Small Data Science Teams
- by 7wData
As advanced analytics and data science have matured into must-have skills, data science groups within large companies have themselves become much larger. This has led to some unique problems and solutions that you’ll want to consider as your own DS group grows larger.
It seems like only two or three years ago you wouldn’t have had to ask this question. Unless you were Google, Amazon, or an equally big player, your data science teams were small, maybe in the range of 3 to 12, and were still trying to find their place in your organization.
Fast forward to today and it’s not unusual to find teams of 20 or 40 or even more, and that’s a game changer. It’s no longer like Cheers, where everyone knows your name. In larger organizations, more order, organization, and process become necessary.
The large analytic platform providers clearly understand this. I’m thinking IBM, Microsoft, SAS, Alteryx and similar. Over the last year or so there’s been an increasing focus on elephant hunting, slang I’m sure you’ll recognize for trying to get a foothold where the big teams live.
Here are some topics and questions that seem to arise as common ground from trying to manage larger DS orgs.
You’re Going to Want to Standardize on a Process Which Probably Means Standardizing on a Platform
If you’ve got 40 people in a data science team, that implies a large number of projects and, as a result, a large number of models or product features to keep track of and maintain. You can’t have everyone freelancing in tools and project structure or you’ll never keep up.
As you’ve approached this scale you probably tried to adhere to a common process such as CRISP-DM and may even have written some internal standards about how that’s implemented. Another common situation is for a DS group to have coalesced around a comprehensive platform.
Take Alteryx, for example, which supports the process from data blending through modeling. You’re all using the platform, so it’s easy to communicate where you are in a modeling project, and there can be project-by-project discussion of when and if, for example, you’re going to use custom code as opposed to the built-in tools.
That will get you part of the way there, and some will be happy with this semi-formal level of organization. However, the lesson we take from project management and from application development disciplines like Agile is that more organization can be better and doesn’t necessarily bog things down.
Recently both IBM and Microsoft have created offerings to template this level of organization for you in hopes of getting you to focus on their DS and cloud offerings. IBM has the Data Science Experience and Watson Studio. Microsoft introduced the Team Data Science Process.
Some Examples from the Microsoft Team Data Science Process (TDSP)
Of the ‘systems’ we were able to identify, the Microsoft TDSP seems to be the most comprehensive, literally defining project steps and individual roles and responsibilities. TDSP includes:
The high-level process diagram is not all that different from CRISP-DM, but it is easy to use when describing the steps.
Going down one more level, the TDSP even lays out specific roles and responsibilities for each common DS role including Solution Architect, Project Manager, Project Lead, and Data Scientists.
The entire package is quite comprehensive and detailed. If you haven’t addressed this level of detail for your organization, this would be a good starting point.
Another common pain point in larger DS organizations is acquiring and maintaining data pipelines. When your organization was smaller, this work was probably assigned to junior data scientists. As you grew, you realized this wasn’t a good use of resources.
Over the last two or three years, data engineering has emerged as a formal, separate role that supports the data science process.