Pipeline Walkthrough
1️⃣ Data Preparation and Ingestion
We start by loading data from MovieLens (small or 1M).
- User–item interactions are cleaned and indexed.
- Metadata like title, genres, and tags is merged in.
- Missing information is handled gracefully (important for cold-start items or incomplete metadata).
📌 Example:
- Input: Nixon (1995) → Genres: Drama, Biography → Description: (missing)
- Output (after LLM enrichment): "Nixon (1995) explores the troubled psyche and political career of America's 37th president..."
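As a rough sketch of this step, here is a minimal pandas version. The toy in-memory frames stand in for the MovieLens `ratings.csv` / `movies.csv` files; column names follow the MovieLens layout, but everything else is illustrative:

```python
import pandas as pd

# Toy stand-ins for the MovieLens files (ratings.csv / movies.csv layout).
ratings = pd.DataFrame({
    "userId": [1, 1, 2],
    "movieId": [10, 20, 30],
    "rating": [4.0, 5.0, 3.0],
})
movies = pd.DataFrame({
    "movieId": [10, 20, 30],
    "title": ["Nixon (1995)", "The Post (2017)", "Frost/Nixon (2008)"],
    "genres": ["Drama|Biography", "Drama|History", None],
})

# Merge metadata onto interactions; a left join keeps every interaction.
df = ratings.merge(movies, on="movieId", how="left")

# Handle missing metadata gracefully (placeholder instead of NaN).
df["genres"] = df["genres"].fillna("(unknown)")

# Build contiguous integer indices for users and items.
df["user_idx"] = df["userId"].astype("category").cat.codes
df["item_idx"] = df["movieId"].astype("category").cat.codes
```

The placeholder string keeps downstream text pipelines (e.g., LLM enrichment prompts) from choking on NaN values.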
We also tag each movie by popularity:
- Head (top 10%)
- Mid-tail (next 40%)
- Long-tail (bottom 50%)
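The popularity split above can be sketched as follows (toy interaction counts; the thresholds follow the 10/40/50 split):

```python
import pandas as pd

# Interaction counts per movie (toy data; real counts come from the ratings).
counts = pd.Series({10: 500, 20: 120, 30: 40, 40: 12, 50: 3})

# Rank items by popularity, then split:
# head = top 10%, mid-tail = next 40%, long-tail = bottom 50%.
ranked = counts.sort_values(ascending=False)
n = len(ranked)
tags = {}
for i, movie_id in enumerate(ranked.index):
    if i < 0.10 * n:
        tags[movie_id] = "head"
    elif i < 0.50 * n:  # 10% + 40%
        tags[movie_id] = "mid"
    else:
        tags[movie_id] = "long_tail"
```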
2️⃣ Multimodal Embedding Extraction
For each item, we generate embeddings (vector representations) from different sources:
- Text → from movie descriptions using models like OpenAI Ada, MiniLM, or LLaMA.
- Visual → from trailer keyframes using ResNet-50 (2048-dim vectors).
- Audio (optional) → MFCC features (128-dim).
- Fusion → combine modalities using methods like:
  - Concat (stack vectors)
  - PCA (reduce to 128-dim)
  - CCA (align text & visual spaces)
  - Avg (simple average)
📌 Example:
- Text embedding: `[... -0.31, 0.54, 1.02 ...]`
- Visual embedding: `[... 0.11, -0.22, 0.91 ...]`
- CCA fusion → joint 64-dim vector: `[... 0.44, 0.08, -0.32 ...]`
3️⃣ Embedding Swap & Re-embedding
If movie info changes (e.g., after augmentation), we re-run the embedding step. ✅ Ensures all experiments use up-to-date representations.
4️⃣ User Embedding Construction
We create user profiles in the same space as items. Options:
- Random → sanity-check baseline.
- Average → mean of all liked items.
- Temporal → recent interactions get more weight.
📌 Example: User 42 watched Nixon (1995), The Post, and Frost/Nixon.
- More recent movies (e.g., The Post) count more in their profile.
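A sketch of the average vs. temporal strategies, assuming a simple exponential decay over interaction recency (the 2-dim vectors and decay factor are illustrative):

```python
import numpy as np

# Toy item embeddings for three movies the user watched, oldest to newest.
item_vecs = np.array([
    [0.9, 0.1],   # Nixon (1995)
    [0.8, 0.3],   # Frost/Nixon
    [0.2, 0.9],   # The Post (most recent)
])

# Average strategy: plain mean of all liked items.
user_avg = item_vecs.mean(axis=0)

# Temporal strategy: exponential decay, newer interactions weigh more.
decay = 0.5
weights = decay ** np.arange(len(item_vecs) - 1, -1, -1)  # [0.25, 0.5, 1.0]
user_temporal = (weights[:, None] * item_vecs).sum(axis=0) / weights.sum()
```

With these toy vectors, the temporal profile sits noticeably closer to The Post than the plain average does.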
5️⃣ Candidate Retrieval
With user embeddings ready:
- We build a kNN index of all items.
- For each user → find the top-N most similar items.
This step reflects all upstream choices (embedding model, fusion method, user strategy).
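A minimal retrieval sketch using scikit-learn's `NearestNeighbors` with cosine distance (random embeddings as stand-ins for the real item and user vectors):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(1)
item_embeddings = rng.normal(size=(1000, 64))  # all items, shared space
user_embedding = rng.normal(size=(1, 64))      # one user, same space

# Build a cosine-similarity kNN index over all items.
index = NearestNeighbors(n_neighbors=10, metric="cosine").fit(item_embeddings)

# Retrieve the top-N most similar items for this user.
distances, candidate_ids = index.kneighbors(user_embedding)
```

For larger catalogs an approximate index (e.g., FAISS) is the usual drop-in replacement; the interface stays the same in spirit.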
6️⃣ Profile Augmentation & LLM Prompting
We enhance user profiles with structured info:
- Manual: Extracted from history (genres, top movies, tags).
- LLM-based: A short natural-language summary generated by the model.
📌 Example Profile:
{
"Genres": ["Drama", "Biography"],
"Top items": ["Nixon (1995)", "The Post"],
"Taste": "Prefers political dramas exploring real historical events."
}
This profile + candidate movies → passed to the LLM with instructions.
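One way this assembly might look; the template and the `build_prompt` helper are illustrative, not the pipeline's actual prompt:

```python
import json

def build_prompt(profile: dict, candidates: list) -> str:
    """Assemble a re-ranking prompt from a user profile and candidate titles.
    (Hypothetical format; the real template is pipeline-specific.)"""
    return (
        "You are a movie recommender.\n"
        f"User profile:\n{json.dumps(profile, indent=2)}\n"
        "Candidate movies:\n"
        + "\n".join(f"{i}. {t}" for i, t in enumerate(candidates, 1))
        + "\nReturn the candidates ranked by fit, best first, as a numbered list."
    )

prompt = build_prompt(
    {"Genres": ["Drama", "Biography"],
     "Top items": ["Nixon (1995)", "The Post"],
     "Taste": "Prefers political dramas exploring real historical events."},
    ["Frost/Nixon (2008)", "All the President's Men (1976)"],
)
```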
7️⃣ LLM Re-ranking
The LLM receives:
- User profile
- Candidate movies
- Instructions
It outputs a ranked list.
- ID-only mode → just movie IDs (privacy-friendly).
- Explainable mode → IDs + reasoning.
If parsing fails → fall back to the kNN list.
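The parse-with-fallback logic could look like this; the `parse_ranked_ids` helper is hypothetical, and a real parser would depend on the exact output format the LLM is instructed to use:

```python
import re

def parse_ranked_ids(llm_output: str, fallback: list) -> list:
    """Extract movie IDs from the LLM's ranked list; fall back to the
    kNN order if nothing parseable comes back."""
    ids = [int(m) for m in re.findall(r"\b(\d+)\b", llm_output)]
    # Keep only known candidates, preserve order, drop duplicates.
    seen, ranked = set(), []
    for i in ids:
        if i in fallback and i not in seen:
            seen.add(i)
            ranked.append(i)
    return ranked if ranked else list(fallback)

knn_candidates = [101, 202, 303]
reranked = parse_ranked_ids("1. 303\n2. 101\n3. 202", knn_candidates)
```

Ranked IDs that are not in the candidate set are dropped, so the LLM cannot inject items that retrieval never surfaced.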
8️⃣ Evaluation & Logging
Every run is measured using:
- Accuracy → Recall@K, nDCG@K, MAP, MRR.
- Beyond-accuracy → coverage, novelty, diversity, long-tail share.
- Fairness/robustness → cold-start performance, exposure balance.
Results are logged per user, averaged, and exported (CSV/Parquet). ✅ Intermediate artifacts (embeddings, candidates, logs) are checkpointed for reproducibility.
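Two of the accuracy metrics, sketched as standalone per-user functions (scores that would then be averaged across users; binary relevance assumed):

```python
import numpy as np

def recall_at_k(recommended: list, relevant: set, k: int) -> float:
    """Fraction of relevant items that appear in the top-k recommendations."""
    hits = len(set(recommended[:k]) & relevant)
    return hits / len(relevant)

def ndcg_at_k(recommended: list, relevant: set, k: int) -> float:
    """Discounted gain of hits in the top-k, normalized by the ideal ranking."""
    dcg = sum(1.0 / np.log2(rank + 2)
              for rank, item in enumerate(recommended[:k]) if item in relevant)
    ideal = sum(1.0 / np.log2(rank + 2)
                for rank in range(min(len(relevant), k)))
    return dcg / ideal if ideal > 0 else 0.0
```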