- Chain-of-Verification Reduces Hallucination in Large Language Models
  Paper • 2309.11495 • Published • 37
- Adapting Large Language Models via Reading Comprehension
  Paper • 2309.09530 • Published • 77
- CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages
  Paper • 2309.09400 • Published • 85
- Language Modeling Is Compression
  Paper • 2309.10668 • Published • 83
Collections
Collections including paper arxiv:2410.05258
- Flowing from Words to Pixels: A Framework for Cross-Modality Evolution
  Paper • 2412.15213 • Published • 26
- No More Adam: Learning Rate Scaling at Initialization is All You Need
  Paper • 2412.11768 • Published • 41
- Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference
  Paper • 2412.13663 • Published • 131
- Autoregressive Video Generation without Vector Quantization
  Paper • 2412.14169 • Published • 14

- CORAL: Benchmarking Multi-turn Conversational Retrieval-Augmentation Generation
  Paper • 2410.23090 • Published • 54
- SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs
  Paper • 2410.13276 • Published • 26
- Personalized Visual Instruction Tuning
  Paper • 2410.07113 • Published • 70
- Differential Transformer
  Paper • 2410.05258 • Published • 169

- Differential Transformer
  Paper • 2410.05258 • Published • 169
- Baichuan-Omni Technical Report
  Paper • 2410.08565 • Published • 85
- Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss
  Paper • 2410.17243 • Published • 89
- FrugalNeRF: Fast Convergence for Few-shot Novel View Synthesis without Learned Priors
  Paper • 2410.16271 • Published • 81

- Differential Transformer
  Paper • 2410.05258 • Published • 169
- Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA
  Paper • 2410.20672 • Published • 6
- TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters
  Paper • 2410.23168 • Published • 24

- Differential Transformer
  Paper • 2410.05258 • Published • 169
- PaliGemma 2: A Family of Versatile VLMs for Transfer
  Paper • 2412.03555 • Published • 126
- VisionZip: Longer is Better but Not Necessary in Vision Language Models
  Paper • 2412.04467 • Published • 107
- o1-Coder: an o1 Replication for Coding
  Paper • 2412.00154 • Published • 43