Eval Leaderboards
- 3.96k
- 12.4k
Open LLM Leaderboard
Track, rank and evaluate open LLMs and chatbots
- 4.7k
MTEB Leaderboard
Select and filter benchmarks for text embedding tasks
- 416
LLM-Perf Leaderboard
Explore hardware performance for language models
- 609
Open ASR Leaderboard
Request evaluation results for a speech model
- 1.11k
Big Code Models Leaderboard
Submit code models for evaluation on benchmarks
- 128
Hallucinations Leaderboard
View and submit LLM evaluations
- 104
Enterprise Scenarios Leaderboard
- 87
LLM Safety Leaderboard
View and submit machine learning model evaluations
- 222
AI2 WildBench Leaderboard (V2)
Display and explore model leaderboards and chat history
- 152
Open Object Detection Leaderboard
Request model evaluation on the COCO val 2017 dataset
- 41
Redteaming Resistance Leaderboard
Display model benchmark results
- 30
Contextual Leaderboard
- 185
Yet Another LLM Leaderboard
Run a Streamlit web app
- 595
Open VLM Leaderboard
VLMEvalKit Evaluation Results Collection
- 539
Vision Arena (Testing VLMs side-by-side)
Analyze images to detect and label objects
- 34
Leaderboard
- 332
Open Medical-LLM Leaderboard
Browse and submit LLM evaluations
- 50
Open CoT Leaderboard
Track, rank and evaluate open LLMs' CoT quality
- 23
MM-UPD Leaderboard
Submit and evaluate model results for the MM-AAD leaderboard
- 176
BigCodeBench Leaderboard
Explore and analyze code evaluation data
- 10
MJ Bench Leaderboard
Display and filter multimodal model leaderboard results
- 325
Reward Bench Leaderboard
Explore and analyze RewardBench leaderboard data