Agile Data Scientists Do Scale

Due to the hype and rapid growth of Big Data Engineering and Data Science, many companies and practitioners seem to have become so excited by hiring, building infrastructure, fashionable models and shiny technology that one crucial part of the field has gone missing: delivery. I hear countless stories, from both small and large companies, where teams are built, clusters bought, prototype algorithms written and software installed, yet it then takes months or even longer to deliver working data-driven applications, or for insights to be acted on. Hype is thick in the air but delivery is thin on the ground.

The review and related blogs correctly point out that we should focus on AI applications, that is, on automation. My addition is that in many domains these applications cannot easily be bought in. In such cases they should be built in-house, and the builders ought to be Agile Big Data Engineers and Data Scientists who understand the importance of iterating weekly or fortnightly. The title of Data Scientist is not dead, but keeping Data Science alive will mean shifting the Data Scientist's focus away from hacking, ad hoc analysis and prototyping and towards high-quality code, automation, applications and Agile methodologies. Let's remember that the technology industry has a habit of automating away the jobs of those who lack the imagination to become automators themselves, i.e. those who cannot be scaled.

Why Agile methodologies are so lacking in Data Science and Big Data is puzzling. Perhaps it is down to the youth of the industry? Frankly, I believe a smidgen of elitism is at work, an urge to distinguish these fields from regular software development, as though its practices and principles were beneath the concerns of mighty data minds. Another issue seems to be a widespread misconception that "exploratory" work precludes frequent iteration over automated end-to-end applications; that is, Data Scientists claim they need to "explore" for a month or two before they can deliver anything. I find this ironic, since the tension between exploratory work and continuous delivery is exactly what Agile resolves. Finally, another recurring misconception is that the day-to-day practices of Agile, such as tests, automation, clean code and clean structure, are "time consuming" and will slow down "exploratory" work. This too is ironic, since Agile aims to make exploratory work faster and less laborious. Hopefully the details in my posts will flesh out why these objections are misconceptions.

Automated tests are absolutely critical to practicing Agile correctly, and from TDD have evolved more acronyms and terms than many Data Scientists have written tests: TDD, BDD, DDD, ATDD, SDD, EDD, CDD, unit tests, integration tests, black box tests, end-to-end tests, system tests, acceptance tests, property-based tests, example-based tests, functional tests, contract-based tests, etc. At first glance, things like interactive work, long-running jobs, unclear objectives and peculiar development environments seem to preclude *DD approaches. Nevertheless, if one strips away the unnecessary verbosity of *DD, the remaining core can easily accommodate such problems.
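To make that stripped-down core concrete, here is a minimal sketch in Python. It assumes a hypothetical feature-scaling step, `min_max_scale`, which is not from this post and stands in for any small piece of a data pipeline. The tests are plain functions with bare `assert` statements (the style pytest collects), combining one hand-checked example-based test, one edge case, and a property-style test over many small random samples. Running on small, seeded samples keeps the tests fast and reproducible even when the production job they guard is long-running.

```python
import random


def min_max_scale(values):
    """Hypothetical pipeline step: map a list of numbers onto [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant input: avoid division by zero and return all zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def test_hand_checked_example():
    # Example-based test: a tiny input whose answer can be checked by hand.
    assert min_max_scale([2, 4, 6]) == [0.0, 0.5, 1.0]


def test_constant_input():
    # Edge case: a constant column must not trigger a division by zero.
    assert min_max_scale([3, 3, 3]) == [0.0, 0.0, 0.0]


def test_properties_on_random_samples():
    # Property-style test: rather than fixing expected outputs, check
    # invariants that must hold for any non-constant input.
    rng = random.Random(42)  # fixed seed keeps the test reproducible
    for _ in range(200):
        values = [rng.uniform(-1e3, 1e3) for _ in range(rng.randint(2, 50))]
        scaled = min_max_scale(values)
        assert len(scaled) == len(values)
        assert all(0.0 <= s <= 1.0 for s in scaled)
        assert min(scaled) == 0.0 and max(scaled) == 1.0


if __name__ == "__main__":
    # pytest would discover these functions automatically; here they are
    # simply called in turn so the file also runs as a plain script.
    for test in (test_hand_checked_example,
                 test_constant_input,
                 test_properties_on_random_samples):
        test()
    print("all tests passed")
```

The property-style test is a hand-rolled stand-in for a dedicated property-based testing library; the point is that none of this requires heavyweight *DD ceremony, only a few fast checks that run on every iteration.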

Sometimes Data Science can feel like academia, only much better paid. So until the bubble bursts, which still doesn't seem likely any time soon, should we just have as much fun as possible?
