MMLU-Pro Leaderboard (Running on CPU Upgrade, 184): More advanced and challenging multi-task evaluation
Vision Arena (Testing VLMs side-by-side) (Running, 542): Analyze images to detect and label objects