Music Discovery & the Cold Start Problem
114k
Spotify tracks analyzed
R² 0.51
Best model — tuned Random Forest
3
Intervention points proposed
Big data analysis
EDA · feature engineering · ML modeling
Product strategy
Internal tool design

This project originated as the final project for CIS 5450: Big Data Analytics at the University of Pennsylvania. It was completed as a three-person team; I led the full analysis.
The product extension was developed independently afterward.
Data resource: Spotify Tracks Dataset via Kaggle
The dataset contains 114,000+ songs with metadata and audio features extracted from Spotify’s Web API.
Full data analysis report: CIS 5450_Final_Project.ipynb
Why Do Promising Songs Disappear?
This project started as a data question and turned into a product one:
what makes a track popular? →
what happens to a good song before it has any plays at all?
The Observation
The analysis revealed that a large share of songs have never been meaningfully played.
The Reframe
I shifted the focus from popularity prediction to a cold-start problem.
total tracks
114,000
tracks with a score of 0
20,170
mean popularity
33.2/100

~30% of tracks have never been meaningfully played
From Data To Intervention
Finding 1
Individual audio features carry almost no signal for popularity.
Finding 2
The useful signal lives in how features combine
Finding 3
Genre structurally influences popularity
Audio features barely predict popularity
DATA
Across 114,000 tracks, almost every audio feature correlates with popularity at r < 0.05. The baseline linear model explains just R² = 0.03 of variance.
INTERPRETATION
Most songs stuck at popularity = 0 cannot be explained by their audio characteristics alone.
DESIGN DECISION
Single-feature signals should not drive playlist promotion decisions.

strongest signal
instrumentalness
r = -0.10 · tracks without vocals tend to underperform
most counterintuitive
danceability & energy
r ≈ 0.00 · no meaningful relationship with popularity
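The single-feature check behind Finding 1 can be sketched like this. Column names follow the Kaggle Spotify Tracks Dataset; a small synthetic frame stands in for the real 114k rows, so the printed values are illustrative only.

```python
# Finding 1 sketch: correlate each audio feature with popularity,
# one feature at a time. Synthetic data stands in for the dataset.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "danceability": rng.uniform(0, 1, n),
    "energy": rng.uniform(0, 1, n),
    "instrumentalness": rng.uniform(0, 1, n),
})
# Popularity drawn independently of any single feature, mimicking
# the near-zero correlations observed in the real data.
df["popularity"] = rng.integers(0, 100, n)

audio_features = ["danceability", "energy", "instrumentalness"]
corrs = df[audio_features].corrwith(df["popularity"]).sort_values()
print(corrs)
```

In the real dataset almost every |r| stayed below 0.05, with instrumentalness (r ≈ -0.10) the strongest single signal.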
Combined features explain 51% of popularity variance
DATA
We tested five model classes across two rounds of feature engineering.
Linear models plateaued at R² = 0.26 regardless of regularization.
Tuned Random Forest reached R² = 0.51, a 17× improvement over the linear baseline.
INTERPRETATION
The useful signal lives in feature combinations.
DESIGN DECISION
A curator tool should surface interaction-based predictions.

linear models
top out at R² = 0.26
regularization makes no difference
random forests
R² 0.26 → 0.51
interactions captured after tuning
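Finding 2 boils down to a model comparison: when the signal lives in how features combine, a linear model plateaus while a random forest picks it up. This is a minimal sketch on synthetic data built with an explicit interaction term, not the project's actual pipeline or its exact scores.

```python
# Finding 2 sketch: linear model vs random forest on a target
# driven by a feature *interaction* (synthetic stand-in data).
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(4000, 4))  # two informative, two distractor features
# Centered product: zero correlation with either feature alone,
# so single-feature / linear views see almost nothing.
y = 50 * (X[:, 0] - 0.5) * (X[:, 1] - 0.5) + rng.normal(0, 1, 4000)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

r2_linear = r2_score(y_te, Ridge().fit(X_tr, y_tr).predict(X_te))
r2_forest = r2_score(
    y_te,
    RandomForestRegressor(n_estimators=200, random_state=0)
    .fit(X_tr, y_tr)
    .predict(X_te),
)
print(f"linear R^2 = {r2_linear:.2f}, forest R^2 = {r2_forest:.2f}")
```

The qualitative gap mirrors the project's result (linear plateau vs. tuned Random Forest at R² = 0.51), though the numbers here come from the synthetic setup.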
Genre structurally influences popularity
DATA
Median popularity ranges from 58 (Hip-Hop) to 0 (Jazz/Blues), a 58-point gap that no audio feature can explain.
Jazz/Blues (1,976 tracks) has a median of 0, while its top track scores 84.
INTERPRETATION
Some genres have strong songs but are not being surfaced. They occupy a structurally disadvantaged position on the platform.
DESIGN DECISION
Without genre context, a curator will unconsciously reinforce these gaps. The tool must surface this context.

largest median gap
58 points
Hip-Hop (58) vs Jazz/Blues (0)
Jazz/Blues ~ 2,000 tracks
median = 0, top = 84
good songs exist, they're not being surfaced
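The genre-level check behind Finding 3 is a simple groupby. Synthetic rows stand in for the Kaggle data here; in the real dataset the median ranged from 58 (Hip-Hop) down to 0 (Jazz/Blues).

```python
# Finding 3 sketch: median popularity per genre, plus the
# "median 0 but top track 84" pattern (synthetic stand-in rows).
import pandas as pd

df = pd.DataFrame({
    "track_genre": ["hip-hop"] * 4 + ["jazz"] * 4,
    "popularity": [55, 58, 60, 62, 0, 0, 0, 84],
})

medians = (
    df.groupby("track_genre")["popularity"].median().sort_values(ascending=False)
)
print(medians)

# A genre can sit at median 0 and still contain a high-scoring track.
top_jazz = df.loc[df["track_genre"] == "jazz", "popularity"].max()
print(top_jazz)
```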
The popularity distribution demonstrates:
popularity = 0

18% of the 114k tracks have a popularity score of 0
A large portion of tracks never gain meaningful exposure
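The distribution check behind that 18% figure is one line of pandas. A tiny synthetic frame stands in for the real 114k-row dataset, so the printed share is illustrative.

```python
# Sketch of the zero-popularity share computation (synthetic rows).
import pandas as pd

df = pd.DataFrame({"popularity": [0, 0, 33, 45, 60, 0, 12, 80, 0, 25]})
zero_share = (df["popularity"] == 0).mean()
print(f"{zero_share:.0%} of tracks have popularity 0")
```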
Our best model reveals:
Feature Combination
Genre Structure

Unobservable variables
Platform exposure gap
The model explains 51% of popularity variance; that explained portion is the signal used to evaluate track potential.
49% remains unexplained. The exact split is unknown, but platform exposure is the part we can act on.
The 51% tells me what signal to use. The 49% tells me where to act.
A Discovery Problem, Not A Quality Problem
Spotify's platform has no proactive mechanism to surface high-potential new tracks before they disappear.
Problem Statement
HOW MIGHT WE
help Spotify's editorial team proactively identify high-potential tracks stuck in the cold start loop?

A structural platform information gap
Proactively surface cold-start tracks to editorial curators
Proactive Discovery
Data-backed Decision
An internal monitoring tool that proactively pushes high-potential but never-exposed tracks to the right editorial curators, and assists their decisions with data-based analysis.
“Create new opportunities for potentially popular tracks that have never been exposed before.”
Start From System Detection

SYSTEM
The system continuously scans newly released tracks with no play history. When a track's feature interaction score indicates high potential, it is automatically flagged and sent to the relevant curator, along with the supporting data analysis.
USER (CURATORS)
The curator receives a notification when a high-potential cold-start track has been identified.
Review the Queue

SYSTEM
The Cold Start Queue is automatically populated and ranked daily. Tracks are selected based on feature interaction score, filtered for zero play history, and tagged with genre equity flags where applicable.
USER (CURATORS)
The curator opens the ranked queue and reviews the flagged tracks.
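The daily queue build described above can be sketched as a filter-score-flag pipeline. Everything here is hypothetical: the column names (`play_count`, `potential_score`) and the `build_queue` helper are illustrative, not a real Spotify system or API.

```python
# Hypothetical sketch of the daily Cold Start Queue build:
# filter for zero play history, rank by the model's interaction
# score, and tag genre equity flags. All names are illustrative.
import pandas as pd

def build_queue(tracks, low_exposure_genres):
    """Return zero-play tracks ranked by potential, with equity flags."""
    queue = tracks[tracks["play_count"] == 0].copy()
    queue["genre_equity_flag"] = queue["genre"].isin(low_exposure_genres)
    return queue.sort_values("potential_score", ascending=False).reset_index(drop=True)

tracks = pd.DataFrame({
    "track_id": ["a", "b", "c", "d"],
    "play_count": [0, 120, 0, 0],
    "genre": ["jazz", "pop", "hip-hop", "jazz"],
    "potential_score": [0.91, 0.40, 0.62, 0.75],
})
queue = build_queue(tracks, low_exposure_genres={"jazz", "blues"})
print(queue[["track_id", "potential_score", "genre_equity_flag"]])
```

Track "b" drops out (it already has plays); the two Jazz tracks carry equity flags so the structural disadvantage from Finding 3 stays visible to the curator.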
Listen. Analyze. Decide.

Listen and decide
The curator can preview the track directly within the interface, hearing the music alongside the data.
Music waveform
Visual representation of the track's audio profile. Mood tags derived from artist pitch data give the curator immediate tonal context.

Action button
Always accessible. The curator can decide at any point.
Similar tracks
Tracks with similar feature interaction patterns. Gives curators a real-world reference point for what this track could become.
Data-backed analysis
The right panel translates the model's findings into three curator-facing insights. Each module maps directly to a data finding from the analysis, telling curators not just what the data says, but what to do with it.
Audio features
An EQ-style breakdown of the track's audio features against the genre average.

Predicted score
Discovery Potential score based on the feature interaction model (R² = 0.51). The model explains 51% of popularity variance; human judgment fills the rest.
Genre context
Jazz/Blues median popularity = 0. This panel surfaces the structural disadvantage of the genre, so curators can make a more informed and equitable decision.
What I Learned From Letting Data Ask Questions
One step further than required
This project began as a big data analysis assignment. After leading the team through the full modeling pipeline and presenting our findings, most people would have stopped there.
I didn't.
After the course ended, I kept thinking about what the data was actually pointing at, not as a modeling result, but as a signal about how people and platforms interact. That's when I started asking a different question: if these findings are real, what should exist that doesn't yet?
The shift from 'what does the data say' to 'what should we act on' is how I understand the purpose of data analysis. Numbers and models are not cold abstractions. They are the direct, objective record of human behavior at scale.
Reading that signal, and turning it into something useful, is what this project became.
Limitations
The biggest gap in this project is one I knew about the whole time: I have never spoken to an actual Spotify curator.
Everything I know about how they work comes from blog posts, Spotify's own documentation, and a reasonable amount of inference. That's enough to build a coherent product argument. It's not enough to know if I got it right.
There's a version of this project where the curator looks at my interface and says 'this is exactly what I needed.' There's also a version where they say 'interesting, but you've misunderstood how we actually think about this.' I genuinely don't know which one it is yet.
If I had more time
If I could do one more thing, it would be to talk with actual Spotify curators. Not to show them the designs, but to find out if I've even framed the problem correctly.
The data half of this project is done. The human half hasn't started yet.