MMLU-Pro Leaderboard
More advanced and challenging multi-task evaluation
Visualize LLM progress with interactive filters
View how beam search decoding works, in detail!
Leaderboard for long-context LLMs on in-context learning
Submit and evaluate models on a leaderboard