Each Rank Could be an Expert: Single-Ranked Mixture of Experts LoRA for Multi-Task Learning Paper • 2501.15103 • Published 12 days ago
From Drafts to Answers: Unlocking LLM Potential via Aggregation Fine-Tuning Paper • 2501.11877 • Published 17 days ago
Understanding Humans in Crowded Scenes: Deep Nested Adversarial Learning and A New Benchmark for Multi-Human Parsing Paper • 1804.03287 • Published Apr 10, 2018
Process Reinforcement through Implicit Rewards Article • By ganqu and 1 other • Published Jan 3 • 22
Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback Paper • 2501.12895 • Published 15 days ago • 55
What makes your model a low-empathy or warmth person: Exploring the Origins of Personality in LLMs Paper • 2410.10863 • Published Oct 7, 2024 • 1
DLO: Dynamic Layer Operation for Efficient Vertical Scaling of LLMs Paper • 2407.11030 • Published Jul 3, 2024
LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training Paper • 2411.15708 • Published Nov 24, 2024
RoRA-VLM: Robust Retrieval-Augmented Vision Language Models Paper • 2410.08876 • Published Oct 11, 2024
Touchstone Benchmark: Are We on the Right Way for Evaluating AI Algorithms for Medical Segmentation? Paper • 2411.03670 • Published Nov 6, 2024
Diving into Self-Evolving Training for Multimodal Reasoning Paper • 2412.17451 • Published Dec 23, 2024 • 43
PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models Paper • 2501.03124 • Published Jan 6 • 14
Can MLLMs Reason in Multimodality? EMMA: An Enhanced MultiModal ReAsoning Benchmark Paper • 2501.05444 • Published 28 days ago
SURf: Teaching Large Vision-Language Models to Selectively Utilize Retrieved Information Paper • 2409.14083 • Published Sep 21, 2024
MiniMax-01: Scaling Foundation Models with Lightning Attention Paper • 2501.08313 • Published 23 days ago • 272