Collections including paper arxiv:2402.00396

- Moral Foundations of Large Language Models
Paper • 2310.15337 • Published • 1
- Specific versus General Principles for Constitutional AI
Paper • 2310.13798 • Published • 3
- Contrastive Preference Learning: Learning from Human Feedback without RL
Paper • 2310.13639 • Published • 25
- RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback
Paper • 2309.00267 • Published • 47

- Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models
Paper • 2402.13064 • Published • 48
- Textbooks Are All You Need II: phi-1.5 technical report
Paper • 2309.05463 • Published • 87
- DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows
Paper • 2402.10379 • Published • 31
- Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models
Paper • 2312.06585 • Published • 29

- OLMo: Accelerating the Science of Language Models
Paper • 2402.00838 • Published • 83
- Efficient Exploration for LLMs
Paper • 2402.00396 • Published • 22
- Can Large Language Models Understand Context?
Paper • 2402.00858 • Published • 23
- Transforming and Combining Rewards for Aligning Large Language Models
Paper • 2402.00742 • Published • 12

- HexaGen3D: StableDiffusion is just one step away from Fast and Diverse Text-to-3D Generation
Paper • 2401.07727 • Published • 10
- Efficient Exploration for LLMs
Paper • 2402.00396 • Published • 22
- ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment
Paper • 2403.05135 • Published • 42
- Transformer-Lite: High-efficiency Deployment of Large Language Models on Mobile Phone GPUs
Paper • 2403.20041 • Published • 35

- PRDP: Proximal Reward Difference Prediction for Large-Scale Reward Finetuning of Diffusion Models
Paper • 2402.08714 • Published • 12
- Data Engineering for Scaling Language Models to 128K Context
Paper • 2402.10171 • Published • 24
- RLVF: Learning from Verbal Feedback without Overgeneralization
Paper • 2402.10893 • Published • 11
- Coercing LLMs to do and reveal (almost) anything
Paper • 2402.14020 • Published • 13

- A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions
Paper • 2312.08578 • Published • 17
- ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks
Paper • 2312.08583 • Published • 9
- Vision-Language Models as a Source of Rewards
Paper • 2312.09187 • Published • 12
- StemGen: A music generation model that listens
Paper • 2312.08723 • Published • 48

- S-LoRA: Serving Thousands of Concurrent LoRA Adapters
Paper • 2311.03285 • Published • 29
- Tailoring Self-Rationalizers with Multi-Reward Distillation
Paper • 2311.02805 • Published • 4
- Ultra-Long Sequence Distributed Transformer
Paper • 2311.02382 • Published • 3
- OpenChat: Advancing Open-source Language Models with Mixed-Quality Data
Paper • 2309.11235 • Published • 15