Eval Leaderboards - an andrewrreed Collection

updated Dec 16, 2024

Running

3.96k

Chatbot Arena Leaderboard

🏆
Running on CPU Upgrade

12.4k

Open LLM Leaderboard

🏆

Track, rank and evaluate open LLMs and chatbots
Running on CPU Upgrade

4.7k

MTEB Leaderboard

🥇

Select and filter benchmarks for text embedding tasks
Running

416

LLM-Perf Leaderboard

🏆

Explore hardware performance for language models
Running on CPU Upgrade

609

Open ASR Leaderboard

🏆

Request evaluation results for a speech model
Running

1.11k

Big Code Models Leaderboard

📈

Submit code models for evaluation on benchmarks
Running on CPU Upgrade

128

Hallucinations Leaderboard

🔥

View and submit LLM evaluations
Runtime error

104

Enterprise Scenarios Leaderboard

🥇
Running on CPU Upgrade

87

LLM Safety Leaderboard

🥇

View and submit machine learning model evaluations
Running

222

AI2 WildBench Leaderboard (V2)

🦁

Display and explore model leaderboards and chat history
Running

152

Open Object Detection Leaderboard

🏆

Request model evaluation on COCO val 2017 dataset
Running

41

Redteaming Resistance Leaderboard

💻

Display model benchmark results
Runtime error

30

Contextual Leaderboard

🐨
Running

185

Yet Another LLM Leaderboard

🌖

Run a Streamlit web app
Running on CPU Upgrade

595

Open VLM Leaderboard

🌎

VLMEvalKit Evaluation Results Collection
Running

539

Vision Arena (Testing VLMs side-by-side)

🖼

Analyze images to detect and label objects
Configuration error

34

Leaderboard

🐠
Running on CPU Upgrade

332

Open Medical-LLM Leaderboard

🥇

Browse and submit LLM evaluations
Running on CPU Upgrade

50

Open CoT Leaderboard

🥇

Track, rank and evaluate open LLMs' CoT quality
Running

23

MM-UPD Leaderboard

🥇

Submit and evaluate model results for the MM-AAD leaderboard
Running

176

BigCodeBench Leaderboard

🥇

Explore and analyze code evaluation data
Running

10

MJ Bench Leaderboard

🥇

Display and filter multimodal model leaderboard results
Running

325

Reward Bench Leaderboard

📐

Explore and analyze RewardBench leaderboard data