How Does Spotify Know You So Well? – Member Feature Stories – Medium

Let’s dive into how each of these recommendation models works!

Recommendation Model #1: Collaborative filtering

First, some background: when people hear the words “collaborative filtering,” they generally think of Netflix, one of the first companies to use this method to power a recommendation model. Netflix took users’ star-based movie ratings and used them to decide which movies to recommend to other, similar users.
After Netflix’s success, collaborative filtering spread quickly, and it is now often the starting point for anyone building a recommendation model.

Unlike Netflix, Spotify doesn’t have a star-based system with which users rate their music. Instead, Spotify’s data is implicit feedback — specifically, the stream counts of the tracks and additional streaming data, such as whether a user saved the track to their own playlist, or visited the artist’s page after listening to a song.
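To make “implicit feedback” concrete, here is a minimal Python sketch. It folds a user’s raw signals for one track into two numbers: a yes/no preference, and a confidence in that preference. The signal names, the weights in `SIGNAL_WEIGHTS` and the `ALPHA` constant are all hypothetical. The article names these kinds of signals but not how Spotify weights them.

```python
# Hypothetical weights: the signals mirror the ones described above, but the
# numbers are illustrative assumptions, not Spotify's actual values.
SIGNAL_WEIGHTS = {
    "stream": 1.0,
    "saved_to_playlist": 5.0,
    "visited_artist_page": 2.0,
}
ALPHA = 0.5  # hypothetical scaling factor: how fast confidence grows with activity


def implicit_feedback(events):
    """Fold (signal, count) pairs for one user and one track into
    (preference, confidence).

    preference is 1 if the user interacted with the track at all, else 0;
    confidence grows with the amount of interaction and is never below 1.
    """
    raw = sum(SIGNAL_WEIGHTS[signal] * count for signal, count in events)
    preference = 1 if raw > 0 else 0
    confidence = 1.0 + ALPHA * raw
    return preference, confidence


# Four streams plus a playlist save: clearly liked, with high confidence.
print(implicit_feedback([("stream", 4), ("saved_to_playlist", 1)]))  # (1, 5.5)
# No activity at all: treated as "not liked", with the minimum confidence.
print(implicit_feedback([]))  # (0, 1.0)
```

Keeping preference and confidence separate means a track the user never streamed counts as a weak “no” rather than a strong one. That suits data where silence may simply mean the user never came across the song.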
But what is collaborative filtering, truly, and how does it work? Here’s a high-level rundown, explained in a quick conversation:

What’s going on here? Each of these individuals has track preferences: the one on the left likes tracks P, Q, R, and S, while the one on the right likes tracks Q, R, S, and T.

Collaborative filtering then uses that data to say:

“Hmmm… You both like three of the same tracks — Q, R, and S — so you are probably similar users. Therefore, you’re each likely to enjoy other tracks that the other person has listened to, that you haven’t heard yet.”
So it suggests that the one on the right check out track P (the one track his “similar” counterpart enjoyed that he hasn’t heard yet) and that the one on the left check out track T, for the same reason. Simple, right?
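The two-listener reasoning can be sketched in a few lines of Python. The function name and the `min_overlap=3` threshold are hypothetical; the threshold is chosen to match the three shared tracks in the example. A production system would typically score similarity on a continuous scale rather than apply a fixed cutoff like this.

```python
def recommend_from_similar(user_likes, other_likes, min_overlap=3):
    """Suggest tracks that `other` likes and `user` hasn't heard,
    but only if the two listeners share at least `min_overlap` tracks."""
    shared = user_likes & other_likes
    if len(shared) < min_overlap:
        return set()  # not similar enough to borrow each other's taste
    return other_likes - user_likes


left = {"P", "Q", "R", "S"}
right = {"Q", "R", "S", "T"}

print(sorted(recommend_from_similar(right, left)))  # ['P']
print(sorted(recommend_from_similar(left, right)))  # ['T']
```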
But how does Spotify actually use that concept in practice to calculate millions of users’ suggested tracks based on millions of other users’ preferences?

With matrix math, done with Python libraries!
In practice, this matrix is gigantic. Each row represents one of Spotify’s 140 million users (if you use Spotify, you yourself are a row in this matrix), and each column represents one of the 30 million songs in Spotify’s database.
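At that scale, a dense matrix would have 140 million by 30 million cells, or 4.2 quadrillion. Almost all of them would be zero, because any one listener streams only a sliver of the catalog, so matrices like this are stored sparsely and keep only the non-zero entries.

Below is a minimal standard-library sketch that builds such a sparse user/song matrix of stream counts. It reads from a hypothetical log of `(user_id, song_id)` stream events. A real system would use a dedicated sparse-matrix format rather than nested dictionaries.

```python
from collections import defaultdict


def build_play_matrix(stream_log):
    """Build a sparse user x song matrix of stream counts.

    stream_log is an iterable of (user_id, song_id) pairs, one per stream.
    Only non-zero cells are stored: {user_id: {song_id: count}}.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for user_id, song_id in stream_log:
        counts[user_id][song_id] += 1
    # Convert to plain dicts so that looking up a missing cell doesn't create it.
    return {user: dict(songs) for user, songs in counts.items()}


log = [("alice", "song_q"), ("alice", "song_q"), ("alice", "song_r"), ("bob", "song_t")]
plays = build_play_matrix(log)
print(plays["alice"])                 # {'song_q': 2, 'song_r': 1}
print(plays["bob"].get("song_q", 0))  # 0: a missing cell means zero streams
```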
Then, the Python library runs a long, complicated matrix factorization formula over the matrix.
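For a concrete picture of this step, below is one widely used objective for factorizing implicit-feedback data such as stream counts. It comes from Hu, Koren and Volinsky’s 2008 paper “Collaborative Filtering for Implicit Feedback Datasets.” Whether Spotify uses exactly this form is an assumption.

The symbols are:
- r_ui is the number of times user u streamed song i.
- x_u is user u’s vector (a row of X).
- y_i is song i’s vector (a row of Y).
- alpha and lambda are tuning constants.

```latex
\min_{X,\,Y} \; \sum_{u,i} c_{ui}\,\bigl(p_{ui} - x_u^{\top} y_i\bigr)^2
  \;+\; \lambda \Bigl( \sum_u \lVert x_u \rVert^2 + \sum_i \lVert y_i \rVert^2 \Bigr),
\qquad
p_{ui} = \begin{cases} 1 & r_{ui} > 0 \\ 0 & r_{ui} = 0 \end{cases},
\qquad
c_{ui} = 1 + \alpha\, r_{ui}
```

The weight c_ui makes a track streamed fifty times count for more than one streamed once. The lambda term discourages the vectors from growing large just to fit noise.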

When it finishes, we end up with two types of vectors, represented here by X and Y. X is a user vector, representing one single user’s taste, and Y is a song vector, representing one single song’s profile.
The User/Song matrix produces two types of vectors: user vectors and song vectors. Image source: “From Idea to Execution: Spotify’s Discover Weekly,” by Chris Johnson, ex-Spotify.
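The factorization step can be imitated at toy scale with the minimal, standard-library sketch below. It is not the Python library the article refers to. It assumes a widely used implicit-feedback setup:
- Each cell’s target is a 0/1 preference (did the user stream the song?).
- That target is weighted by a confidence of `1 + alpha * streams`.
- A small penalty `lam` is applied to vector size.

For brevity, the sketch fits that objective with plain stochastic gradient descent; libraries built for implicit feedback often use alternating least squares instead. The function name and all default hyperparameters are hypothetical.

```python
import random


def factorize(streams, n_users, n_songs, k=2, alpha=1.0, lam=0.01,
              lr=0.05, epochs=2000, seed=0):
    """Weighted matrix factorization fitted by stochastic gradient descent.

    streams maps (user, song) index pairs to stream counts; missing pairs
    mean zero streams. Returns X (one k-dim vector per user) and Y (one
    k-dim vector per song) such that dot(X[u], Y[i]) approximates the
    preference p = 1 if user u streamed song i, else 0.
    """
    rng = random.Random(seed)
    # Small random starting vectors break the symmetry between factors.
    X = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    Y = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_songs)]
    # Implicit feedback: every cell counts, including the zeros.
    cells = [(u, i) for u in range(n_users) for i in range(n_songs)]

    for _ in range(epochs):
        rng.shuffle(cells)
        for u, i in cells:
            r = streams.get((u, i), 0)
            p = 1.0 if r > 0 else 0.0  # preference
            c = 1.0 + alpha * r        # confidence weight
            err = p - sum(a * b for a, b in zip(X[u], Y[i]))
            for f in range(k):
                xu, yi = X[u][f], Y[i][f]
                # Gradient step on c * err**2 + lam * (|x_u|**2 + |y_i|**2),
                # with the constant factor of 2 folded into lr.
                X[u][f] += lr * (c * err * yi - lam * xu)
                Y[i][f] += lr * (c * err * xu - lam * yi)
    return X, Y


# Tracks P, Q, R, S, T from the two-listener example are columns 0 to 4.
# User 0 (left) streamed P, Q, R, S once each; user 1 (right) streamed Q, R, S, T.
streams = {(0, 0): 1, (0, 1): 1, (0, 2): 1, (0, 3): 1,
           (1, 1): 1, (1, 2): 1, (1, 3): 1, (1, 4): 1}
X, Y = factorize(streams, n_users=2, n_songs=5)
for u, x in enumerate(X):
    print(u, [round(sum(a * b for a, b in zip(x, y)), 2) for y in Y])
```

Each printed row should land close to that user’s 0/1 streaming pattern. The vectors in X and Y are themselves just lists of numbers with no obvious meaning.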

Now we have 140 million user vectors and 30 million song vectors. The actual content of these vectors is just a bunch of numbers that are essentially meaningless on their own, but are hugely useful when compared.
To find out which users’ musical tastes are most similar to mine, collaborative filtering compares my vector with all of the other users’ vectors, ultimately spitting out which users are the closest matches. The same goes for the song vectors (Y): you can compare a single song’s vector with all the others and find out which songs are most similar to the one in question.
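Comparing vectors usually means measuring the angle between them. The sketch below ranks other users by cosine similarity to one user’s vector, a common choice for this kind of comparison. The user names and three-number vectors are made up for illustration. The same two functions work unchanged for song vectors.

```python
import math


def cosine(a, b):
    """Cosine similarity: 1.0 when two vectors point the same way,
    0.0 when they are orthogonal, negative when they point apart."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0  # treat an all-zero vector as unrelated


def most_similar(query, vectors, top_n=2):
    """Return the keys of the top_n vectors closest to `query`."""
    ranked = sorted(vectors, key=lambda name: cosine(query, vectors[name]), reverse=True)
    return ranked[:top_n]


me = [0.9, 0.1, 0.3]
others = {
    "ana": [0.8, 0.2, 0.4],
    "ben": [-0.5, 0.9, 0.0],
    "cy": [0.1, 0.1, 0.9],
}
print(most_similar(me, others))  # ['ana', 'cy']
```

This scans every candidate, which is fine for a handful of vectors. Systems with millions of vectors commonly use approximate nearest-neighbor indexes instead of an exhaustive scan.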

Collaborative filtering does a pretty good job, but Spotify knew they could do even better by adding another engine. Enter NLP.

Recommendation Model #2: Natural Language Processing (NLP)
The second type of recommendation model that Spotify employs is Natural Language Processing (NLP). The source data for these models, as the name suggests, are regular ol’ words: track metadata, news articles, blogs, and other text around the internet.

Natural Language Processing, the ability of a computer to understand human language as people actually speak and write it, is a vast field unto itself, often harnessed through sentiment analysis APIs.
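As a rough illustration of turning “regular ol’ words” into something comparable, the sketch below counts the descriptive words used in text about each artist. It then compares those counts with cosine similarity. This is a generic bag-of-words technique, not a description of Spotify’s actual NLP models. The stop-word list and the example snippets are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; real lists are much longer.
STOP_WORDS = {"a", "an", "and", "the", "of", "with", "on", "in"}


def term_profile(texts):
    """Count the non-stop-words across every snippet written about one artist."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts


def profile_similarity(a, b):
    """Cosine similarity between two term-count profiles."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norms = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norms if norms else 0.0


# Made-up snippets standing in for blog posts and reviews about three artists.
artist_a = term_profile(["Dreamy synth pop with hazy vocals", "A hazy, dreamy synth record"])
artist_b = term_profile(["Lush, dreamy synth textures"])
artist_c = term_profile(["Fast, loud punk guitars"])

print(profile_similarity(artist_a, artist_b) > profile_similarity(artist_a, artist_c))  # True
```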
