Inputs

What feeds the research. Five sources: one primary scrape, one historical seed, one optional dump, a hand-written editorial backbone, and an LLM-extracted derived layer.

BoardGameGeek XML API

Manual, on demand

Primary source

Game metadata (name, year, weight, ratings, mechanism/category tags). Pulled via the /thing endpoint with a Bearer token, rate-limited to 2 req/sec. Only scripts/refresh_bgg.py talks to it.

primary, rate-limited
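The throttled, token-authenticated fetch described above could look roughly like the sketch below. It is not the contents of scripts/refresh_bgg.py: the base URL, the `stats=1` parameter, the batching of IDs, and the names `Throttle`, `build_thing_request`, and `fetch_things` are all assumptions made for illustration. Only the /thing endpoint, the Bearer token, and the 2 req/sec limit come from the description.

```python
import time
import urllib.parse
import urllib.request

# Assumed base URL; the description only names the /thing endpoint.
BGG_THING_URL = "https://boardgamegeek.com/xmlapi2/thing"
MIN_INTERVAL = 0.5  # seconds between requests, i.e. 2 req/sec


class Throttle:
    """Blocks so that consecutive calls are at least `interval` seconds apart."""

    def __init__(self, interval=MIN_INTERVAL, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self.clock = clock
        self.sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        now = self.clock()
        if self._last is not None:
            remaining = self.interval - (now - self._last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self._last = now


def build_thing_request(game_ids, token):
    """Build an authenticated GET request for one batch of game IDs."""
    query = urllib.parse.urlencode(
        {"id": ",".join(map(str, game_ids)), "stats": 1},  # stats=1 is assumed
        safe=",",
    )
    return urllib.request.Request(
        f"{BGG_THING_URL}?{query}",
        headers={"Authorization": f"Bearer {token}"},
    )


def fetch_things(batches, token, throttle=None):
    """Yield the raw XML bytes for each batch, spaced out by the throttle."""
    throttle = throttle or Throttle()
    for batch in batches:
        throttle.wait()
        request = build_thing_request(batch, token)
        with urllib.request.urlopen(request, timeout=30) as response:
            yield response.read()
```

The clock and sleep functions are injectable so the pacing logic can be checked without touching the network or actually waiting.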

2019 historical rank seed

One-time historical

Auxiliary input

data/seed/bgg_ranks_2019.csv from the beefsack/bgg-ranking-historicals dump. It serves two purposes: a stable list of ~17k IDs that seeds the scraper, and a join in pipeline 3 that computes rank_delta_since_2019 (rank trajectory).

historical
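As a rough illustration of the rank-trajectory join, the sketch below reads a ranks CSV and computes a per-game delta against current ranks. The column names `ID` and `Rank`, the helper names, and the sign convention (positive means the game has climbed since 2019) are assumptions; the description only says that the seed is joined to produce rank_delta_since_2019.

```python
import csv
import io


def load_ranks(csv_file, id_col="ID", rank_col="Rank"):
    """Read a ranks CSV into {game_id: rank}, skipping rows without a numeric rank."""
    ranks = {}
    for row in csv.DictReader(csv_file):
        try:
            ranks[int(row[id_col])] = int(row[rank_col])
        except (KeyError, TypeError, ValueError):
            continue  # unranked or malformed row
    return ranks


def rank_delta_since_2019(ranks_2019, ranks_now):
    """Left-join current ranks onto the 2019 seed.

    Assumed convention: delta = rank_2019 - rank_now, so a positive value
    means the game moved up the list. Games with no current rank get None.
    """
    return {
        game_id: (old_rank - ranks_now[game_id]) if game_id in ranks_now else None
        for game_id, old_rank in ranks_2019.items()
    }


if __name__ == "__main__":
    seed = io.StringIO("ID,Name,Rank\n1,A,10\n2,B,5\n")
    print(rank_delta_since_2019(load_ranks(seed), {1: 4}))
```

Keeping the 2019 seed as the left side of the join means every seeded ID appears in the result, with `None` marking games that no longer have a rank.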

Kaggle dump

When dropped in

Optional input

data/kaggle/games.csv from threnjen/board-games-database-from-boardgamegeek. Treated as a watch-folder: when the file is present, pipeline 3 joins it automatically. Currently empty.

optional
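The watch-folder behaviour amounts to a presence check before the join. A minimal sketch, assuming the pipeline treats a missing or zero-byte file as "nothing to join"; the function name `load_optional_rows` is hypothetical and the real pipeline 3 may detect the file differently.

```python
import csv
from pathlib import Path

KAGGLE_CSV = Path("data/kaggle/games.csv")


def load_optional_rows(path=KAGGLE_CSV):
    """Return the dump's rows as dicts, or [] when the file is absent or empty."""
    if not path.is_file() or path.stat().st_size == 0:
        return []
    with path.open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

A caller can then skip the join whenever the returned list is empty, so dropping the file into place is the only step needed to enable it.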

Hand-written notes

Continuous, by hand

Research backbone

notes/mechanics.md (mechanism taxonomy + translation-difficulty heuristic), notes/lineages.md (game-to-game inheritance graph), and 8 per-game deep dives. These set the prior for what good translation candidates look like.

editorial

LLM extractions

Per pipeline 2 run

Derived input

Two passes: a v2 wide pass producing terse phrases for 522 games (game_llm_primitives), and a v3 deep pass producing paragraphs plus a theme axis for 40 top candidates (game_llm_deep). Both are driven by web-search-equipped agents using prompts/extraction_v{2,3}.md.

derived