How We Improved Data Discovery for Data Scientists at Spotify

At Spotify, we believe strongly in data-informed decision making. Whether we’re considering a big shift in our product strategy or we’re making a relatively quick decision about which track to add to one of our editorially programmed playlists, data provides a foundation for sound decision making. An insight is a conclusion drawn from data that can help influence decisions and drive change. To enable Spotifiers to make faster, smarter decisions, we’ve developed a suite of internal products to accelerate the production and consumption of insights. One of these products is Lexikon, a library of data and insights that helps employees find and understand the data and knowledge generated by members of our insights community.

We’ve learned a lot since we first launched this product. In this blog post, we want to share the story of how we iterated on Lexikon to better support data discovery.

In 2016, as we started migrating to the Google Cloud Platform, we saw an explosion of dataset creation in BigQuery. At the same time, we drastically increased our hiring of insights specialists (data scientists, analysts, user researchers, etc.) at Spotify, resulting in more research and insights being produced across the company. However, research would often have only a localized impact in certain parts of the business, going unseen by others who might have found it useful in their own decision making. Datasets lacked clear ownership or documentation, making it difficult for data scientists to find them. We believed that the crux of the problem was that we lacked a centralized catalog of these data and insights resources.

In early 2017, we released Lexikon, a library for data and insights, as the solution to this problem. The first release allowed users to search and browse available BigQuery tables (i.e. datasets), as well as discover knowledge generated through past research and analysis. The insights community at Spotify was quite excited to have this new tool, and it quickly became one of the most widely used tools amongst data scientists, with ~75% of data scientists using it regularly and ~550 monthly active users.

However, months after the initial launch, we surveyed the insights community and learned that data scientists still considered data discovery a major pain point and spent significant time finding the right dataset. The typical data scientist at Spotify works with ~25-30 different datasets in a month. If data discovery is time-consuming, it significantly increases the time it takes to produce insights, which means either that it takes longer to make a decision informed by those insights or, worse, that we won’t have enough data and insights to inform a decision.

Our team decided to focus on this specific issue by iterating on Lexikon, with the goal of improving the data discovery experience for data scientists and ultimately accelerating insights production. We were able to significantly improve the data discovery experience by (1) gaining a better understanding of our users’ intent, (2) enabling knowledge exchange through people, and (3) helping users get started with a dataset they’ve discovered.

To kick things off, we conducted user research to learn more about our users, their needs, and their specific pain points regarding data discovery. In doing so, we gained a better understanding of our users’ intent within the context of data discovery, and used this understanding to drive product development.

Let’s say you’re having a rough day and you want to listen to some music to lift your spirits. So, you open up Spotify, browse some of the mood playlists, and put on the Mood Booster playlist. You’ve just had a low-intent discovery experience! You had a broad goal of lifting your mood, and you didn’t have strict requirements for what you wanted to listen to.
