AI-assisted German Employment Contract Review: A Benchmark Dataset Paper • 2501.17194 • Published 10 days ago • 1
Familiarity: Better Evaluation of Zero-Shot Named Entity Recognition by Quantifying Label Shifts in Synthetic Training Data Paper • 2412.10121 • Published Dec 13, 2024 • 1
MorphBPE: A Morpho-Aware Tokenizer Bridging Linguistic Complexity for Efficient LLM Training Across Morphologies Paper • 2502.00894 • Published 4 days ago • 1
oberbics/Multilingual_Topic-Specific_Article-Extraction_and_Classification Viewer • Updated 6 days ago • 874 • 99 • 1
🌍 Big step for multilingual AI data! The Hugging Face community has rated educational content in languages spoken by 1.6 billion people! New additions: • Japanese • Italian • Old High German. Learn more and contribute: https://huggingface.co/blog/davanstrien/fineweb2-community These ratings can help enhance training data for major world languages.
Analyzing the Effect of Linguistic Similarity on Cross-Lingual Transfer: Tasks and Experimental Setups Matter Paper • 2501.14491 • Published 13 days ago • 1
Hierarchical Autoregressive Transformers: Combining Byte- and Word-Level Processing for Robust, Adaptable Language Models Paper • 2501.10322 • Published 20 days ago • 1 • 2