Tell us what's in your machine, and we'll rank every model that fits — by real benchmarks (LiveBench, Aider, Arena), with the math shown. No fluff, no upsell.
| Source | Notes | Fields | Source date | Status |
|---|---|---|---|---|
| Aider Polyglot | | polyglot pass_rate · total_cost_usd · edit_format | 2025-11 | ✓ Fresh |
| LiveBench | Scores are aggregated client-side from the per-question parquet files; verify against livebench.ai if numbers differ by more than 2%. | Reasoning · Coding · Math · Data · Language · IFEval (aggregated to overall) | 2026-01-08 | ✓ Fresh |
| Chatbot Arena (legacy MT-bench/MMLU) | The HF Space CSVs use the old leaderboard format, NOT current Arena Elo. A real Elo scraper (fboulnois mirror) lands in the next refresh. | MT-bench · MMLU · License · Organization · Link | 2025-08-04 | ⚠ Stale |
| HuggingFace API | | model_id · downloads · params · context · license · tags | live | ✓ Live |
| Open LLM Leaderboard v2 | Project effectively abandoned; we dropped this source. | (not used; project archived) | 2024-08-07 | ✗ Dropped |
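
The LiveBench row above mentions aggregating per-question scores client-side and checking the result against livebench.ai. A minimal sketch of that kind of aggregation follows. It assumes each question carries a category and a score in [0, 1], that the overall score is the unweighted mean of the category means, and that "differ by >2%" means 2 percentage points; none of these details are specified on this page, and the function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def aggregate_overall(rows):
    """Aggregate per-question scores into category means and an overall score.

    rows: iterable of (category, score) pairs, one per question,
          with score in [0, 1] (assumed layout, not the real parquet schema).
    Returns (category_means, overall), both on a 0-100 scale.
    """
    by_category = defaultdict(list)
    for category, score in rows:
        by_category[category].append(score)

    # Mean per category, scaled to percent.
    category_means = {c: 100 * mean(s) for c, s in by_category.items()}

    # Assumption: overall is the unweighted mean of the category means.
    overall = mean(list(category_means.values()))
    return category_means, overall


def differs_by_more_than(ours, reference, threshold_points=2.0):
    """True if our score is more than threshold_points away from the reference."""
    return abs(ours - reference) > threshold_points


rows = [("Coding", 1.0), ("Coding", 0.0), ("Math", 1.0)]
category_means, overall = aggregate_overall(rows)
print(category_means, overall)
```

Averaging category means rather than all questions at once keeps a category with many questions from dominating the overall score; if the official leaderboard weights things differently, the discrepancy check is what would surface it.
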
Why some sources lag: automated benchmark scraping (a daily GitHub Actions cron job) is on the v1.1 roadmap. Until then, we ship a known-good snapshot and mark its date so you can compare it against fresher sources yourself.
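
To illustrate how a snapshot date could translate into a Fresh or Stale label, here is a small sketch. The 90-day cutoff and the function names are hypothetical; this page does not state the actual threshold, and the Dropped status reflects a manual decision rather than a date check.

```python
from datetime import date


def parse_source_date(text):
    """Parse 'YYYY-MM-DD' or 'YYYY-MM' (day defaults to 1) into a date."""
    parts = [int(p) for p in text.split("-")]
    year, month = parts[0], parts[1]
    day = parts[2] if len(parts) > 2 else 1
    return date(year, month, day)


def status_for(source_date, today, max_age_days=90):
    """Label a source by the age of its snapshot.

    max_age_days=90 is an assumed cutoff, not the project's real rule.
    """
    if source_date == "live":
        return "Live"
    age_days = (today - parse_source_date(source_date)).days
    return "Fresh" if age_days <= max_age_days else "Stale"


today = date(2026, 1, 15)  # example reference date
for source_date in ["2025-11", "2026-01-08", "2025-08-04", "live"]:
    print(source_date, status_for(source_date, today))
```
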