Music Discovery & the Cold Start Problem

114k

Spotify tracks analyzed

R² 0.51

Best model — tuned Random Forest

3

Intervention points proposed

Big data analysis

EDA · feature engineering · ML modeling

Product strategy

Internal tool design

This project originated as the final project for CIS 5450: Big Data Analytics at the University of Pennsylvania, completed as a three-person team in which I led the full analysis.
The product extension was developed independently afterward.

Why Do Promising Songs Disappear?

This project started as a data question and became a product one:

  • What makes a track popular? →

  • What happens to a good song before it has any plays at all?

The Observation

The data showed that a large share of songs have never been meaningfully played.

The Reframe

I shifted focus from popularity prediction to the cold start problem.

total tracks

114,000

tracks with a score of 0

20,170

mean popularity

33.2/100

~18% of tracks have never been meaningfully played

From Data To Intervention

Finding 1

Individual audio features carry almost no signal for popularity.

Finding 2

The useful signal lives in how features combine

Finding 3

Genre structurally influences popularity

Audio features barely predict popularity

DATA

Across 114,000 tracks, almost every audio feature correlates with popularity at r < 0.05. The baseline linear model explains just R² = 0.03 of variance.

INTERPRETATION

Most songs stuck at popularity = 0 cannot be explained by their audio characteristics alone.

DESIGN DECISION

Single-feature signals should not drive playlist promotion decisions.

strongest signal

instrumentalness

r = −0.10 · tracks without vocals tend to underperform

most counterintuitive

danceability & energy

r ≈ 0.00 · no meaningful relationship with popularity
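The single-feature check behind Finding 1 can be sketched as a columnwise Pearson correlation against popularity. The column names follow Spotify's audio-feature schema, but the synthetic frame below is a stand-in for the real 114k-track dataset, with effect sizes chosen only to mimic the reported pattern:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the 114k-track dataset (illustrative only).
rng = np.random.default_rng(0)
n = 5000
df = pd.DataFrame({
    "danceability": rng.uniform(0, 1, n),
    "energy": rng.uniform(0, 1, n),
    "instrumentalness": rng.uniform(0, 1, n),
})
# Popularity is mostly noise with respect to any single feature,
# with a weak negative pull from instrumentalness (as in the findings).
df["popularity"] = np.clip(
    33 - 10 * df["instrumentalness"] + rng.normal(0, 25, n), 0, 100
)

# Pearson r of each audio feature against popularity.
corrs = df.drop(columns="popularity").corrwith(df["popularity"])
print(corrs.round(3).sort_values())
```

On the real data, this kind of table is what shows every |r| sitting below 0.05 except instrumentalness.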

Combined features explain 51% of popularity variance

DATA

We tested five model classes across two rounds of feature engineering.

  • Linear models plateaued at R² = 0.26 regardless of regularization.

  • Tuned Random Forest reached R² = 0.51, a 17× improvement over the linear baseline.

INTERPRETATION

The useful signal lives in feature combinations.

DESIGN DECISION

A curator tool should surface interaction-based predictions.

linear models

top out at R² = 0.26

regularization makes no difference

random forests

R² 0.26 → 0.51

interactions captured after tuning
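The linear-vs-forest gap in Finding 2 comes down to feature interactions. A minimal sketch of that comparison, using synthetic data whose target depends on a multiplicative term a linear model cannot represent (the feature matrix, weights, and noise level are illustrative assumptions, not the project's real pipeline):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the engineered audio-feature matrix.
rng = np.random.default_rng(42)
X = rng.uniform(0, 1, size=(4000, 5))
# Target driven largely by a centered interaction term that is
# invisible to a linear model, plus one additive feature and noise.
y = 40 * (X[:, 0] - 0.5) * (X[:, 1] - 0.5) + 10 * X[:, 2] + rng.normal(0, 2, 4000)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

linear_r2 = Ridge(alpha=1.0).fit(X_tr, y_tr).score(X_te, y_te)
forest_r2 = RandomForestRegressor(
    n_estimators=200, random_state=0
).fit(X_tr, y_tr).score(X_te, y_te)

print(f"ridge R2:  {linear_r2:.2f}")
print(f"forest R2: {forest_r2:.2f}")
```

The ridge model plateaus regardless of regularization strength because the interaction term is orthogonal to every individual column; the forest picks it up through recursive splits.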

Genre structurally influences popularity

DATA

Median popularity ranges from 58 (Hip-Hop) to 0 (Jazz/Blues), a 58-point gap that no audio feature can explain.

Jazz/Blues (1,976 tracks) has a median of 0, while its top track scores 84.

INTERPRETATION

Some genres have strong songs but are not being surfaced. They occupy a structurally disadvantaged position on the platform.

DESIGN DECISION

Without genre context, a curator will unconsciously reinforce these gaps. The tool must surface this context.

largest median gap

58 points

Hip-Hop (58) vs Jazz/Blues (0)

Jazz/Blues ~ 2,000 tracks

median = 0, top = 84

good songs exist, they're not being surfaced
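The genre check behind Finding 3 is a straightforward groupby. A tiny hand-made frame stands in for the real dataset here; the rows are illustrative, chosen only so the jazz group reproduces the median-0 / high-max shape described above:

```python
import pandas as pd

# Illustrative rows, not the real 114k-track data.
tracks = pd.DataFrame({
    "genre":      ["hip-hop", "hip-hop", "jazz", "jazz", "jazz", "pop"],
    "popularity": [62, 55, 0, 0, 84, 47],
})

# Median, best score, and track count per genre.
by_genre = tracks.groupby("genre")["popularity"].agg(["median", "max", "count"])
print(by_genre.sort_values("median", ascending=False))
```

A genre whose median is 0 while its maximum is high is exactly the "good songs exist but aren't surfaced" signature.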

The popularity distribution demonstrates:

popularity = 0

18% of the 114k tracks have a popularity of 0

A large portion of tracks never gain meaningful exposure

Our best model reveals:

Feature Combination

Genre Structure

Unobservable variables

Platform exposure gap

The model explains 51% of popularity variance; these are the signals used to evaluate track potential.

49% remains unexplained. The exact split is unknown, but platform exposure is the part we can act on.

The 51% tells me what signal to use. The 49% tells me where to act.

A Discovery Problem, Not A Quality Problem

Spotify's platform has no proactive mechanism to surface high-potential new tracks before they disappear.

Problem Statement

HOW MIGHT WE

help Spotify's editorial team proactively identify high-potential tracks stuck in the cold start loop?

A structural platform information gap

Proactively surface cold-start tracks to editorial curators

Proactive Discovery

Data-backed Decision

An internal monitoring tool that proactively pushes high-potential but never-exposed tracks to the right editorial curators, and assists their decisions with data-based analysis.

“Create new opportunities for potentially popular tracks that have never been exposed before.”

Start From System Detection

SYSTEM

The system continuously scans newly released tracks with no play history. When a track's feature interaction score indicates high potential, it is automatically flagged and sent to the relevant curator, along with the supporting data analysis.

USER (CURATORS)

The curator receives a notification when a high-potential cold-start track has been identified.

Review the Queue

SYSTEM

The Cold Start Queue is automatically populated and ranked daily. Tracks are selected based on feature interaction score, filtered for zero play history, and tagged with genre equity flags where applicable.

USER (CURATORS)

The curator reviews the ranked queue, working through the day's flagged tracks from highest predicted potential down.
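The daily queue build described above can be sketched as a filter-score-rank pass. Everything here is a design sketch, not Spotify internals: `interaction_score` stands in for the tuned Random Forest's prediction, and the genre-median table and equity threshold are illustrative assumptions:

```python
from dataclasses import dataclass

# Illustrative genre medians (real values would come from the analysis).
GENRE_MEDIAN_POPULARITY = {"jazz": 0, "hip-hop": 58, "pop": 47}

@dataclass
class Track:
    title: str
    genre: str
    play_count: int
    interaction_score: float  # model-predicted potential, 0..1

def build_queue(tracks, top_n=50):
    """Rank zero-play tracks by predicted potential, tagging
    structurally disadvantaged genres with an equity flag."""
    cold = [t for t in tracks if t.play_count == 0]
    ranked = sorted(cold, key=lambda t: t.interaction_score, reverse=True)
    return [
        {
            "title": t.title,
            "score": t.interaction_score,
            # Flag genres whose platform-wide median sits below an
            # assumed threshold of 20.
            "genre_equity_flag": GENRE_MEDIAN_POPULARITY.get(t.genre, 0) < 20,
        }
        for t in ranked[:top_n]
    ]

queue = build_queue([
    Track("Blue Lines", "jazz", 0, 0.81),
    Track("Night Drive", "pop", 0, 0.64),
    Track("Old Hit", "hip-hop", 10_000, 0.90),  # already exposed, excluded
])
print(queue)
```

The zero-play filter keeps the queue focused on the cold start loop; already-exposed tracks never compete for curator attention.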

Listen. Analyze. Decide.

Listen and decide

The curator can preview the track directly within the interface, hearing the music alongside the data.

Music waveform

Visual representation of the track's audio profile. Mood tags derived from artist pitch data give the curator immediate tonal context.

Action button

Always accessible. The curator can decide at any point.

Similar tracks

Tracks with similar feature interaction patterns. Gives curators a real-world reference point for what this track could become.

Data-backed analysis

The right panel translates the model's findings into three curator-facing insights. Each module maps directly to a data finding from the analysis, telling curators not just what the data says, but what to do with it.

Audio features

An EQ-style breakdown of the track's audio features against the genre average.

Predict score

Discovery Potential score based on feature interaction model (R²=0.51). Model explains 51% of popularity variance, human judgment fills the rest.

Genre context

Jazz/Blues median popularity = 0. This panel surfaces the structural disadvantage of the genre, so curators can make a more informed and equitable decision.

What I Learned From Letting Data Ask Questions

One step further than required

This project began as a big data analysis assignment. After leading the team through the full modeling pipeline and presenting our findings, most people would have stopped there.

I didn't.

After the course ended, I kept thinking about what the data was actually pointing at, not as a modeling result, but as a signal about how people and platforms interact. That's when I started asking a different question: if these findings are real, what should exist that doesn't yet?

The shift from 'what does the data say' to 'what should we act on' is how I understand the purpose of data analysis. Numbers and models are not cold abstractions. They are the direct, objective record of human behavior at scale.

Reading that signal, and turning it into something useful, is what this project became.

Limitations

The biggest gap in this project is one I knew about the whole time: I have never spoken to an actual Spotify curator.

Everything I know about how they work comes from blog posts, Spotify's own documentation, and a reasonable amount of inference. That's enough to build a coherent product argument. It's not enough to know if I got it right.

There's a version of this project where the curator looks at my interface and says 'this is exactly what I needed.' There's also a version where they say 'interesting, but you've misunderstood how we actually think about this.' I genuinely don't know which one it is yet.

If I had more time

If I could do one more thing, it would be to talk with actual Spotify curators. Not to show them the designs, but to find out whether I've even framed the problem correctly.

The data half of this project is done. The human half hasn't started yet.