🛠️ RunLocalAI — local AI leaderboard & catalog

Reproducible benchmark scores and the full open-weight model catalog for running AI on your own hardware. Every benchmark is measured first-party with a public run log and a one-line reproduction command — no vibes, no leaderboard laundering.

Source of truth: runlocalai.co · Data license: CC-BY-4.0

Ranked, reproducible quality scores on real consumer GPUs, with a head-to-head ranking for each benchmark.

What these scores mean

  • HumanEval+ (EvalPlus): pass@1, out of 100 · coding · Liu et al., 2023 (NeurIPS). Extension of Chen et al. (2021) HumanEval.
  • TurkishMMLU (Generative): accuracy, out of 100 · turkish-language, knowledge, multilingual · Yuksel et al., 2024
  • MBPP+ (EvalPlus): pass@1, out of 100 · coding · Liu et al., 2023 (NeurIPS). Extension of Austin et al. (2021) MBPP.
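
As a rough illustration of how a pass@1 score out of 100 can be derived, the sketch below uses the unbiased pass@k estimator from Chen et al. (2021) and averages it over problems. It is a minimal example under stated assumptions, not the RunLocalAI harness: the function names and the sample data are hypothetical, and the sampling setup used for the published runs is described in the methodology page rather than here. With one sample per problem, pass@1 reduces to the share of problems whose single completion passes every test.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem (Chen et al., 2021).

    n: completions sampled, c: completions that pass all tests, k: budget.
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples: any draw of k contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_score(results: list[tuple[int, int]], k: int = 1) -> float:
    """Mean pass@k over all problems, scaled to the 0-100 range."""
    return 100.0 * sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical data: (samples, passing samples) for four problems,
# one greedy completion each. Three of four pass.
example = [(1, 1), (1, 0), (1, 1), (1, 1)]
print(benchmark_score(example))  # 75.0
```

An accuracy score such as the TurkishMMLU one is simpler still: correct answers divided by total questions, times 100.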

Every run is measured first-party on real consumer hardware and carries a public run log and a one-line reproduction command. Methodology: runlocalai.co/benchmarks/methodology.


Catalog hubs: Small LMs · Embeddings · Audio · Image · Coding · Turkish · Benchmarks

Machine-readable: models · quality-benchmarks · OpenAPI

Data licensed CC-BY-4.0 — attribute to runlocalai.co with a link.