Open LLM Leaderboard
Track, rank and evaluate open LLMs and chatbots
Select and filter benchmarks for text embedding tasks
Explore hardware performance for language models
Request evaluation results for a speech model
Submit code models for evaluation on benchmarks
Request model evaluation on COCO val 2017 dataset
Display ToolBench model performance results
Display a web page
Browse and compare AI model evaluations
View and submit LLM evaluations
Submit model evaluation and view leaderboard
Explore GenAI model efficiency on ML.ENERGY leaderboard
Explore and compare LLMs through a leaderboard
Upload and evaluate video models
Run a Streamlit web app
Evaluate LLM cybersecurity risks
Search for model performance across languages and benchmarks
Browse and filter leaderboard of language models
VLMEvalKit Evaluation Results Collection
Explore and analyze RewardBench leaderboard data
Jailbreak the LLM and privacy guardrails
Filter and display leaderboards based on selected criteria
Check datasets or models for data contamination
Track, rank and evaluate open Arabic LLMs and chatbots
Explore benchmark results for QA and long doc models
Submit and evaluate model results for the MM-AAD leaderboard
Explore and analyze code evaluation data
Evaluate open LLMs in the languages of LATAM and Spain