SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published 1 day ago • 61
Article Yay! Organizations can now publish blog Articles By huggingface and 3 others • 17 days ago • 30
Article Crowd-sourced Open Preference Dataset for Text-to-Image Generation By RapidataAI and 4 others • 30 days ago • 18
Article FineWeb2-C: Help Build Better Language Models in Your Language By davanstrien and 5 others • Dec 23, 2024 • 18
Article Fine-tune ModernBERT for text classification using synthetic data By davidberenstein1957 • Dec 30, 2024 • 31
Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation Paper • 2412.03304 • Published Dec 4, 2024 • 17
Article Let’s make a generation of amazing image generation models By burtenshaw and 4 others • Nov 26, 2024 • 34
Dataset Creation Collection Spaces and utilities for creating datasets and getting them on the Hub • 3 items • Updated Nov 10, 2024 • 10
Article Halo: Open Source Health Tracking with Wearables By cyrilzakka • Nov 19, 2024 • 105
Article How to build a custom text classifier without days of human labeling By sdiazlor and 4 others • Oct 17, 2024 • 55
Article How to optimize your data labelling project with custom interfaces By burtenshaw and 9 others • Oct 16, 2024 • 18
LLM Reasoning Papers Collection Papers to improve reasoning capabilities of LLMs • 20 items • Updated 22 days ago • 113
Critique-out-Loud Reward Models Collection Paper: https://arxiv.org/abs/2408.11791 | Code: https://github.com/zankner/CLoud • 7 items • Updated Sep 5, 2024 • 3
Article ColPali: Efficient Document Retrieval with Vision Language Models 👀 By manu • Jul 5, 2024 • 196