Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2406.11794

DataComp-LM: In search of the next generation of training sets for language models

Paper • 2406.11794 • Published Jun 17, 2024 • 50
Training Language Models on Synthetic Edit Sequences Improves Code Synthesis

Paper • 2410.02749 • Published Oct 3, 2024 • 12
Fewer Truncations Improve Language Modeling

Paper • 2404.10830 • Published Apr 16, 2024 • 3
How to Train Long-Context Language Models (Effectively)

Paper • 2410.02660 • Published Oct 3, 2024 • 2

LLM Tech Report

Qwen2.5 Technical Report

Paper • 2412.15115 • Published Dec 19, 2024 • 345
Qwen2.5-Coder Technical Report

Paper • 2409.12186 • Published Sep 18, 2024 • 140
Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement

Paper • 2409.12122 • Published Sep 18, 2024 • 3
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism

Paper • 2401.02954 • Published Jan 5, 2024 • 43

Training-related

DataComp-LM: In search of the next generation of training sets for language models

Paper • 2406.11794 • Published Jun 17, 2024 • 50
griffin/chain_of_density

Viewer • Updated Sep 8, 2023 • 1.1k • 255 • 70
HuggingFaceFV/finevideo

Viewer • Updated Dec 16, 2024 • 39.5k • 4.07k • 293

Relevant-Papers-Midterm

Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models

Paper • 2402.14848 • Published Feb 19, 2024 • 18
The Prompt Report: A Systematic Survey of Prompting Techniques

Paper • 2406.06608 • Published Jun 6, 2024 • 58
CRAG -- Comprehensive RAG Benchmark

Paper • 2406.04744 • Published Jun 7, 2024 • 45
Transformers meet Neural Algorithmic Reasoners

Paper • 2406.09308 • Published Jun 13, 2024 • 44

about 10 hours ago

MS MARCO Web Search: a Large-scale Information-rich Web Dataset with Millions of Real Click Labels

Paper • 2405.07526 • Published May 13, 2024 • 19
Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach

Paper • 2405.15613 • Published May 24, 2024 • 15
A Touch, Vision, and Language Dataset for Multimodal Alignment

Paper • 2402.13232 • Published Feb 20, 2024 • 15
How Do Large Language Models Acquire Factual Knowledge During Pretraining?

Paper • 2406.11813 • Published Jun 17, 2024 • 31

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

Paper • 2405.04434 • Published May 7, 2024 • 17
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale

Paper • 2406.17557 • Published Jun 25, 2024 • 91
DataComp-LM: In search of the next generation of training sets for language models

Paper • 2406.11794 • Published Jun 17, 2024 • 50
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases

Paper • 2402.14905 • Published Feb 22, 2024 • 126

cerebras/SlimPajama-627B

Preview • Updated Jul 7, 2023 • 56.3k • 451
JeanKaddour/minipile

Viewer • Updated Jun 20, 2023 • 1.01M • 2.94k • 119
nampdn-ai/tiny-textbooks

Viewer • Updated Jul 3, 2024 • 420k • 297 • 151
open-phi/textbooks

Viewer • Updated Oct 8, 2023 • 1.8k • 732 • 85

Getting it Right: Improving Spatial Consistency in Text-to-Image Models

Paper • 2404.01197 • Published Apr 1, 2024 • 31
CosmicMan: A Text-to-Image Foundation Model for Humans

Paper • 2404.01294 • Published Apr 1, 2024 • 16
mOSCAR: A Large-scale Multilingual and Multimodal Document-level Corpus

Paper • 2406.08707 • Published Jun 13, 2024 • 15
DataComp-LM: In search of the next generation of training sets for language models

Paper • 2406.11794 • Published Jun 17, 2024 • 50

Daily paper that is inspiring (abstract is enough)

World Model on Million-Length Video And Language With RingAttention

Paper • 2402.08268 • Published Feb 13, 2024 • 38
Improving Text Embeddings with Large Language Models

Paper • 2401.00368 • Published Dec 31, 2023 • 80
Chain-of-Thought Reasoning Without Prompting

Paper • 2402.10200 • Published Feb 15, 2024 • 105
FiT: Flexible Vision Transformer for Diffusion Model

Paper • 2402.12376 • Published Feb 19, 2024 • 48

Dataset pruning/cleaning/dedup

AlpaGasus: Training A Better Alpaca with Fewer Data

Paper • 2307.08701 • Published Jul 17, 2023 • 23
The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset

Paper • 2303.03915 • Published Mar 7, 2023 • 7
MADLAD-400: A Multilingual And Document-Level Large Audited Dataset

Paper • 2309.04662 • Published Sep 9, 2023 • 23
SlimPajama-DC: Understanding Data Combinations for LLM Training

Paper • 2309.10818 • Published Sep 19, 2023 • 10

Previous
1
2
Next

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs