How it works
TasteGraph's recommender logic, signals, and how to interpret results. Updated as the system evolves.
How the pipeline works: Inputs → Features & signals → Recommendation paths → Compare & interpret → Outputs
Your library data (ratings, watchlist, metadata) feeds every mode on Home. High-Fit scores explicit overlap with taste signals; ML learns feature weights from your rated history; Search adds grounded retrieval over your own library, and the provider catalogs add external snapshot pools.
1. Recommendation modes on Home
- Explore your favorites: Titles you already rated 8+, browsable by genre, country, decade, and type. These are not "new" picks; this is a filterable view of your strong favorites.
- Watchlist: Unrated items you saved, filtered the same way. An optional "include rated" setting adds rated items for comparison.
- High-Fit: Rule-based taste alignment that scores overlap with genres, countries, decades, people, and lists derived from your 8+ history. Each title gets an explainable score and reasons.
- ML: The same watchlist pool, ranked by a model that estimates P(rate 8+ | title) from your past ratings and metadata. It answers "what might land as a strong favorite?" rather than "what matches these tags?"
- Search: Natural-language queries over your watchlist or watched titles only. This is not open web search: results always come from rows already in your library.
- BritBox · MUBI (Providers): Separate tabs that take titles from a provider catalog snapshot (e.g. from Watchmode), match them to your local TitleMetadata, then score them with the same taste machinery. High-Fit and ML are both available there as scoring styles (see below).
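The two browsing modes above boil down to facet filters over different pools. A minimal sketch, assuming a hypothetical `Title` row with an `on_watchlist` flag (not TasteGraph's real schema); only the 8+ cutoff and the facet list come from the text:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Title:
    # Hypothetical minimal row; the real TitleMetadata schema may differ.
    imdb_id: str
    name: str
    year: int
    title_type: str
    genres: set = field(default_factory=set)
    countries: set = field(default_factory=set)
    rating: Optional[int] = None  # None means not rated yet
    on_watchlist: bool = False


def facet_filter(titles, genre=None, country=None, decade=None, title_type=None):
    """Keep titles matching every facet that was supplied (None means 'any')."""
    kept = []
    for t in titles:
        if genre is not None and genre not in t.genres:
            continue
        if country is not None and country not in t.countries:
            continue
        if decade is not None and (t.year // 10) * 10 != decade:
            continue
        if title_type is not None and t.title_type != title_type:
            continue
        kept.append(t)
    return kept


def explore_favorites(titles, **facets):
    """Strong favorites only: titles already rated 8 or higher."""
    favorites = [t for t in titles if t.rating is not None and t.rating >= 8]
    return facet_filter(favorites, **facets)


def watchlist(titles, include_rated=False, **facets):
    """Saved titles; unrated by default, optionally including rated ones."""
    saved = [t for t in titles if t.on_watchlist and (include_rated or t.rating is None)]
    return facet_filter(saved, **facets)
```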
Provider catalogs (BritBox, MUBI)
These modes start from an on-disk snapshot of what's on the service—not your watchlist. Only titles with an IMDb id in the snapshot and matching rows in your database can be ranked. Decade, country, genre, type, and "similar to" narrow that pool before scoring.
If a catalog title has no (or thin) local metadata, it contributes less to ranking and may not appear at all. Enrichment improves coverage; snapshots need periodic refresh to stay current with the real catalog.
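A minimal sketch of how a snapshot might be joined to local metadata and then narrowed, assuming a hypothetical dict shape, a hypothetical `min_fields` cutoff for "thin" metadata, and a function name invented for illustration. It simply drops thin rows, whereas the real system may down-weight them instead:

```python
# Hypothetical: fields counted when judging whether local metadata is too thin.
METADATA_FIELDS = ("genres", "countries", "year", "title_type")


def build_catalog_pool(snapshot, local_meta, min_fields=2, decade=None, genre=None):
    """Join a provider snapshot to local metadata by IMDb id, then narrow the pool.

    snapshot:   list of dicts from the catalog snapshot, each maybe carrying "imdb_id".
    local_meta: dict mapping IMDb id -> dict of local metadata fields.
    Returns (pool, coverage): coverage is the share of snapshot rows that could be
    matched to usable local metadata, measured before the narrowing filters.
    """
    pool, matched = [], 0
    for entry in snapshot:
        imdb_id = entry.get("imdb_id")
        meta = local_meta.get(imdb_id) if imdb_id else None
        if meta is None:
            continue  # no IMDb id, or no matching local row: cannot be ranked
        if sum(1 for f in METADATA_FIELDS if meta.get(f)) < min_fields:
            continue  # metadata too thin to score (hypothetical cutoff)
        matched += 1
        if decade is not None and (meta.get("year") or 0) // 10 * 10 != decade:
            continue
        if genre is not None and genre not in meta.get("genres", ()):
            continue
        pool.append({"imdb_id": imdb_id, **meta})
    coverage = matched / len(snapshot) if snapshot else 0.0
    return pool, coverage
```

Reporting coverage before the narrowing filters keeps two problems apart: a low number points at a stale snapshot or missing enrichment, while an empty pool with decent coverage just means the filters are tight.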
High-Fit vs ML (same system, different question)
High-Fit is interpretable: fixed rules and bonuses for overlap with signals you can see (genres you love, lift-based countries, favorite people, etc.).
ML is a logistic model trained on which titles you actually rated 8+. It outputs a probability, not a story—use it when you want a learned ordering from history.
On the Watchlist and provider tabs, you can narrow the pool first (decade, country, genres, type, and similar-to where offered), then switch between High-Fit and ML to rank inside that slice. Disagreement between the two is normal and informative.
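Because High-Fit scores and ML probabilities sit on different scales, comparing rank order is one simple way to quantify their disagreement. The sketch below uses top-k overlap and rank gaps; both the metrics and the function names are assumptions, not necessarily what Model Lab reports:

```python
def rank_ids(scores):
    """Title ids ordered by descending score; ties broken by id for a stable order."""
    return [tid for tid, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))]


def top_k_overlap(high_fit, ml, k):
    """Fraction of the top-k titles that both rankers share (1.0 = same top-k set)."""
    if k <= 0:
        raise ValueError("k must be positive")
    return len(set(rank_ids(high_fit)[:k]) & set(rank_ids(ml)[:k])) / k


def biggest_disagreements(high_fit, ml, n=3):
    """Titles whose rank position differs most between the two rankers."""
    hf_pos = {tid: i for i, tid in enumerate(rank_ids(high_fit))}
    ml_pos = {tid: i for i, tid in enumerate(rank_ids(ml))}
    shared = hf_pos.keys() & ml_pos.keys()
    return sorted(shared, key=lambda t: (-abs(hf_pos[t] - ml_pos[t]), t))[:n]
```

The titles returned by `biggest_disagreements` are often the most interesting ones to inspect: a tag-heavy match the model doubts, or a sparse-tag title the model likes.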
Taste signals: 8+ and 7-rated titles
8+ remains the core definition of "strong favorite" for building genres, decades, lift-based countries, and most taste signals.
7 is still a good rating—not a penalty. In some heuristic paths, titles you rated exactly 7 contribute a softer layer (smaller weights, separate caps) so overlap with "things you found fine" can nudge explanations and scores without diluting how 8+ signals are built.
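The 8+ / 7 split can be sketched with genres alone. Everything here is hypothetical (the weights, the caps, genre-only signals, and the function names). The point is the structure: strong signals come only from 8+ titles, and the 7-rated layer has a smaller weight and its own cap.

```python
# Hypothetical weights and caps; the real values are not documented here.
STRONG_WEIGHT, STRONG_CAP = 1.0, 3.0   # per-match weight and cap for 8+ signals
SOFT_WEIGHT, SOFT_CAP = 0.3, 0.6       # smaller weight and separate cap for 7-rated signals


def taste_signals(history):
    """Build genre signals: strong from 8+ titles, soft from titles rated exactly 7.

    history: list of (genres, rating) pairs. The strong set is built from 8+ only,
    so adding 7-rated titles never changes it.
    """
    strong, soft = set(), set()
    for genres, rating in history:
        if rating >= 8:
            strong.update(genres)
        elif rating == 7:
            soft.update(genres)
    return strong, soft - strong  # don't double-count a genre that is already strong


def high_fit_score(title_genres, strong, soft):
    """Capped overlap score plus human-readable reasons."""
    strong_hits = sorted(set(title_genres) & strong)
    soft_hits = sorted(set(title_genres) & soft)
    score = (min(len(strong_hits) * STRONG_WEIGHT, STRONG_CAP)
             + min(len(soft_hits) * SOFT_WEIGHT, SOFT_CAP))
    reasons = ([f"shares 8+ genre {g}" for g in strong_hits]
               + [f"shares 7-rated genre {g}" for g in soft_hits])
    return score, reasons
```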
The watchlist ML model is still trained as binary 8+ vs not; it does not treat 7 as a separate class. Watchlist, favorite people, favorite list, and enriched metadata round out the inputs. There is no collaborative filtering: nothing is learned from other users' ratings.
"Similar to" and embeddings
When Search (or a similar-to hint) resolves a real title in your data, optional title and plot embeddings add cosine-similarity scores on top of metadata and taste overlap. Hard filters (type, decade, etc.) still apply. If embeddings are missing, the metadata-backed path still runs. For many queries this improves quality over metadata-only matching, but it is not perfect; fully personalized "similar for me" is still on the roadmap.
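A minimal sketch of blending metadata overlap with embedding similarity, assuming the caller supplies a `meta_score` function and a dict of precomputed vectors. It uses a single vector per title for brevity (the text describes title and plot embeddings), and the blend weight `alpha` is a hypothetical value. Titles without vectors fall back to the metadata score, and hard filters run before scoring:

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_similar(anchor_id, candidate_ids, meta_score, embeddings, alpha=0.5, keep=None):
    """Rank candidates against a resolved anchor title.

    meta_score(anchor_id, cid) -> float in [0, 1]: metadata/taste overlap (supplied by caller).
    embeddings: dict id -> vector; entries may be missing.
    keep(cid) -> bool: optional hard filter (type, decade, ...), applied before scoring.
    alpha: hypothetical blend weight for the embedding term.
    """
    anchor_vec = embeddings.get(anchor_id)
    ranked = []
    for cid in candidate_ids:
        if cid == anchor_id or (keep is not None and not keep(cid)):
            continue
        score = meta_score(anchor_id, cid)
        vec = embeddings.get(cid)
        if anchor_vec is not None and vec is not None:
            score = (1 - alpha) * score + alpha * cosine(anchor_vec, vec)
        # Otherwise keep the metadata-only score as the fallback.
        ranked.append((cid, score))
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked
```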
2. Current ML snapshot
What it is
Logistic regression on your rated history, target = rated 8+ vs not. Outputs P(rate 8+ | title) for candidates. Used on the Watchlist ML tab and, when the model files are present, as the ML scoring mode inside provider catalog tabs (same trained weights applied to catalog titles that have feature rows).
Features
Genres, countries, decade, title type (support-thresholded), plus taste flags such as favorite-people match and favorite-list membership.
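To show what "logistic regression on support-thresholded features" means mechanically, here is a dependency-free sketch: one-hot features, a minimum-support vocabulary, and batch gradient descent on log loss with a binary 8+ target. The feature names, `MIN_SUPPORT`, learning rate, and regularization are assumptions; `app.ml.train_8plus_baseline` may use a library solver and different settings.

```python
import math
from collections import Counter

MIN_SUPPORT = 2  # hypothetical threshold: a feature must occur in >= 2 rated titles


def feature_keys(title):
    """One-hot feature names for a title (genres, countries, decade, type, taste flags)."""
    keys = [f"genre={g}" for g in title["genres"]]
    keys += [f"country={c}" for c in title["countries"]]
    keys.append(f"decade={title['year'] // 10 * 10}")
    keys.append(f"type={title['type']}")
    if title.get("favorite_person"):
        keys.append("flag=favorite_person")
    if title.get("in_favorite_list"):
        keys.append("flag=favorite_list")
    return keys


def build_vocab(rated, min_support=MIN_SUPPORT):
    """Keep only features seen in at least `min_support` rated titles."""
    counts = Counter(k for t in rated for k in set(feature_keys(t)))
    return sorted(k for k, n in counts.items() if n >= min_support)


def vectorize(title, vocab):
    present = set(feature_keys(title))
    return [1.0 if k in present else 0.0 for k in vocab]


def sigmoid(z):
    # Split on sign so math.exp never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_8plus(rated, vocab, lr=0.5, epochs=500, l2=0.01):
    """Batch gradient descent on L2-regularized log loss; target is rating >= 8."""
    X = [vectorize(t, vocab) for t in rated]
    y = [1.0 if t["rating"] >= 8 else 0.0 for t in rated]
    w, b, n = [0.0] * len(vocab), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(vocab), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
        b -= lr * grad_b / n
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
    return w, b


def predict_8plus(title, vocab, w, b):
    """P(rate 8+ | title) under the trained weights."""
    x = vectorize(title, vocab)
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))
```

A real training script would more likely call a library solver; the hand-rolled loop only keeps the sketch self-contained. Catalog titles with no surviving features get a score driven by the intercept alone, which is one way sparse metadata weakens a prediction.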
Not the same as Search similarity
"Similar to X" in Search uses the embedding layer when available, not this classifier.
Train: python -m app.ml.train_8plus_baseline. Inspect coefficients and ML vs High-Fit overlap on Model Lab. Deeper reference: docs/ml-current-snapshot.md.
3. How to interpret results
Heuristic / High-Fit
Higher overlap with your signals usually means a better story for why something fits—not a guarantee you'll rate it 8+.
ML probabilities
Treat percentages as ordering hints from past behavior, not promises. The model is binary (8+ vs not), and titles with sparse metadata get weaker scores.
Studies / lift
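Lift is your 8+ rate among titles with a given feature divided by your overall 8+ rate; a value above 1 means the feature is over-represented among your favorites. A minimum-support threshold cuts noise from rarely seen features. Association ≠ causation.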
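The lift definition above translates directly into code. A minimal sketch, assuming history arrives as hypothetical `(features, rating)` pairs and using an illustrative function name:

```python
from collections import Counter


def lift_table(titles, min_support=5):
    """Lift of each feature for the 8+ outcome.

    titles: list of (features, rating) pairs from your rated history.
    Returns {feature: (lift, support)} for features with support >= min_support.
    """
    n = len(titles)
    favorites = sum(1 for _, rating in titles if rating >= 8)
    if n == 0 or favorites == 0:
        return {}  # lift is undefined without any 8+ titles
    base_rate = favorites / n
    support, hits = Counter(), Counter()
    for features, rating in titles:
        for f in set(features):  # count each feature once per title
            support[f] += 1
            if rating >= 8:
                hits[f] += 1
    return {
        f: ((hits[f] / s) / base_rate, s)
        for f, s in support.items()
        if s >= min_support
    }
```

In the test data used for this sketch, with `min_support=3`, FR (two titles, both 8+) is dropped even though its raw lift would be the highest. That is exactly the kind of noise the threshold is meant to cut.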
4. How Search works
- Scope: Watchlist or Watched only. Nothing outside your imported library.
- UI pool: You can optionally constrain results by release decade before search runs; that limit applies regardless of the wording in the query.
- Intent: Groq maps text to filters (genres, countries, type, similar-to, min rating on watched, etc.). If GROQ_API_KEY is missing, a heuristic fallback still searches your data.
- Similar-to: Resolves to a real title, then blends metadata/taste overlap with embedding cosine similarity when artifacts exist.
- Output: Ranked rows from your DB, with explanations drawn from real metadata. Not web-wide search or open-ended chat.
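A rough sketch of the fallback path: a keyword parser stands in for the Groq intent step, the UI decade pool is applied before any query-derived filter, and only rows already in the library are considered. The keyword vocabularies, the `llm_parse` hook, and the match-count ranking are hypothetical. Similar-to is only extracted here, not resolved or blended with embeddings.

```python
import os
import re

# Hypothetical keyword vocabularies for the fallback parser.
GENRE_WORDS = {"drama", "comedy", "horror", "thriller", "crime", "documentary"}
COUNTRY_WORDS = {"japanese": "JP", "british": "UK", "french": "FR", "korean": "KR"}
TYPE_WORDS = {"film": "movie", "films": "movie", "movie": "movie", "movies": "movie",
              "series": "series", "show": "series", "shows": "series"}


def heuristic_intent(query):
    """Keyword-based stand-in for the LLM intent step."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    min_rating = re.search(r"(\d{1,2})\s*\+", query)
    similar = re.search(r"similar to (.+)$", query, re.IGNORECASE)
    return {
        "genres": {w for w in words if w in GENRE_WORDS},
        "countries": {COUNTRY_WORDS[w] for w in words if w in COUNTRY_WORDS},
        "type": next((TYPE_WORDS[w] for w in words if w in TYPE_WORDS), None),
        "min_rating": int(min_rating.group(1)) if min_rating else None,
        "similar_to": similar.group(1).strip() if similar else None,
    }


def parse_intent(query, llm_parse=None):
    """Prefer an LLM parser when one is supplied and GROQ_API_KEY is set."""
    if llm_parse is not None and os.environ.get("GROQ_API_KEY"):
        return llm_parse(query)
    return heuristic_intent(query)


def search(rows, query, ui_decade=None, llm_parse=None):
    """Filter and rank rows that already exist in the user's library."""
    intent = parse_intent(query, llm_parse)
    scored = []
    for row in rows:
        # The UI decade pool applies first, whatever the query says.
        if ui_decade is not None and row["year"] // 10 * 10 != ui_decade:
            continue
        row_genres = {g.lower() for g in row["genres"]}
        if intent["genres"] and not intent["genres"] & row_genres:
            continue
        if intent["countries"] and not intent["countries"] & set(row["countries"]):
            continue
        if intent["type"] and row["type"] != intent["type"]:
            continue
        rating = row.get("rating")  # only watched rows carry a rating
        if intent["min_rating"] and rating is not None and rating < intent["min_rating"]:
            continue
        scored.append((len(intent["genres"] & row_genres), row["title"]))
    # More matched genres first, then alphabetical for a stable order.
    return [title for _, title in sorted(scored, key=lambda s: (-s[0], s[1]))]
```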
5. Where else to look
- Insights & Studies: Distributions, evolution, lift, and creator stats from your ratings and watchlist.
- Model Lab: ML diagnostics, coefficients, side-by-side ML vs High-Fit on watchlist, and notes on embeddings and catalog data.
6. What's next
- Richer blending of semantic similarity with personal taste ("similar for me")
- Stronger or additional models (e.g. ordinal / "likely to enjoy" targets) alongside today's 8+ baseline
- Tighter integration between Search ranking and catalog/provider modes where it makes sense