Gathering benchmark spaces on the hub (beyond the Open LLM Leaderboard)

Open LLM Leaderboard
Track, rank and evaluate open LLMs and chatbots
Note 🏆 The 🤗 Open LLM Leaderboard aims to track, rank and evaluate open LLMs and chatbots. 🤗 Submit a model for automated evaluation on the 🤗 GPU cluster on the "Submit" page!
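The evaluations behind this leaderboard are run with EleutherAI's lm-evaluation-harness; below is a minimal sketch of scoring one benchmark locally through the harness's documented `simple_evaluate` entry point. The model name, task name, and few-shot count are illustrative assumptions, not the leaderboard's exact configuration.

```python
import lm_eval

# Illustrative: evaluate a small Hub model on one few-shot task.
# Task choice and num_fewshot are assumptions, not the leaderboard's
# exact settings.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=gpt2",
    tasks=["hellaswag"],
    num_fewshot=10,
    batch_size=8,
)
print(results["results"]["hellaswag"])
```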
Select and filter benchmarks for text embedding tasks
Note Massive Text Embedding Benchmark (MTEB) Leaderboard.
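To get a feel for how results on this leaderboard are produced, here is a minimal sketch of evaluating an embedding model on a single MTEB task with the mteb package, following its documented usage pattern; the model and task names are illustrative choices, not a prescribed setup.

```python
from mteb import MTEB
from sentence_transformers import SentenceTransformer

# Illustrative model and task; MTEB covers many more tasks and languages
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
evaluation = MTEB(tasks=["Banking77Classification"])
evaluation.run(model, output_folder="results/all-MiniLM-L6-v2")
```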
Note 🏆 This leaderboard is based on the following three benchmarks:
- Chatbot Arena - a crowdsourced, randomized battle platform. We use 70K+ user votes to compute Elo ratings (sketched below).
- MT-Bench - a set of challenging multi-turn questions. We use GPT-4 to grade the model responses.
- MMLU (5-shot) - a test to measure a model's multitask accuracy on 57 tasks.
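For context, here is a minimal sketch of how pairwise battle votes can be turned into ratings using the standard Elo update rule; the K-factor, initial rating, and toy battle log are illustrative assumptions, not the Arena's exact parameters.

```python
from collections import defaultdict

def update_elo(ratings, model_a, model_b, winner, k=32, base=400):
    """Standard Elo update for one battle; k and base are illustrative."""
    ra, rb = ratings[model_a], ratings[model_b]
    ea = 1 / (1 + 10 ** ((rb - ra) / base))  # expected score for model_a
    sa = 1.0 if winner == model_a else 0.5 if winner == "tie" else 0.0
    ratings[model_a] = ra + k * (sa - ea)
    ratings[model_b] = rb + k * ((1 - sa) - (1 - ea))

# Toy battle log: (model_a, model_b, winner)
battles = [
    ("vicuna-13b", "alpaca-13b", "vicuna-13b"),
    ("vicuna-13b", "llama-13b", "tie"),
    ("alpaca-13b", "llama-13b", "alpaca-13b"),
]

ratings = defaultdict(lambda: 1000.0)  # every model starts at 1000
for a, b, w in battles:
    update_elo(ratings, a, b, w)
print(dict(ratings))
```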
Explore hardware performance for language models
Note The 🤗 LLM-Perf Leaderboard 🏋️ aims to benchmark the performance (latency, throughput & memory) of Large Language Models (LLMs) across different hardware, backends and optimizations using Optimum-Benchmark and Optimum flavors. Anyone from the community can request a model or a hardware/backend/optimization configuration for automated benchmarking.
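Optimum-Benchmark has its own configuration-driven API; as a rough illustration of what the measured quantities mean, here is a minimal latency/throughput sketch in plain transformers. The model name and generation length are arbitrary choices, and this is not how the leaderboard itself runs its measurements.

```python
import time
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # illustrative; any causal LM on the Hub works
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Benchmarking is", return_tensors="pt")
new_tokens = 64

# Warmup run so one-time initialization doesn't pollute the timing
model.generate(**inputs, max_new_tokens=new_tokens, do_sample=False)

start = time.perf_counter()
model.generate(**inputs, max_new_tokens=new_tokens, do_sample=False)
latency = time.perf_counter() - start

print(f"latency: {latency:.2f} s")
print(f"decode throughput: {new_tokens / latency:.1f} tokens/s")
```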
Submit code models for evaluation on benchmarks
Note Compare the performance of base multilingual code generation models on the HumanEval and MultiPL-E benchmarks. We also measure throughput and provide information about the models. We only compare open pre-trained multilingual code models that people can use as base models for their own training.
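HumanEval-style benchmarks are typically scored with pass@k; here is a minimal sketch of the standard unbiased estimator from the Codex paper, where n samples are drawn per problem and c of them pass the unit tests. The example numbers are illustrative.

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        return 1.0  # too few failing samples to fill a k-sample draw
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)

# Example: 200 samples per problem, 37 pass -> estimated pass@10
print(pass_at_k(n=200, c=37, k=10))
```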
Request evaluation results for a speech model
Note The 🤗 Open ASR Leaderboard ranks and evaluates speech recognition models on the Hugging Face Hub. We report the Average WER (⬇️) and RTF (⬇️) - lower is better. Models are ranked based on their Average WER, from lowest to highest.
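For reference, WER is the word-level edit distance between the reference and hypothesis transcripts, normalized by reference length, and RTF is processing time divided by audio duration. A minimal WER sketch follows; the leaderboard itself relies on established tooling rather than a hand-rolled implementation like this one.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat on the mat", "the cat sat on mat"))  # 1 deletion / 6 words
```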
Compare model answers to questions
Note The MT-Bench Browser (see Chatbot Arena).
Display ToolBench model performance results
Display a web page
View and filter MMBench leaderboard data
Explore and filter language model benchmark results
Submit and filter LLM models for evaluation
Open Persian LLM Leaderboard