-
181
MMLU Pro
More advanced and challenging multi-task evaluation
-
36
Stick To Your Role! Leaderboard
Compare LLMs on role consistency across contexts
-
50
ZeroEval Leaderboard
Embed and use ZeroEval for evaluation tasks
-
24
Decentralized Arena Leaderboard
Display model leaderboard evaluations
Hristo Panev
hppdqdq
AI & ML interests
None yet
Recent Activity
liked a model about 13 hours ago
bartowski/simplescaling_s1-32B-GGUF
upvoted an article 2 days ago
Open-source DeepResearch - Freeing our search agents
upvoted an article 4 days ago
Open-R1: Update #1
Organizations
None yet
Collections
1
models
None public yet
datasets
None public yet