Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs Paper • 2501.18585 • Published 7 days ago • 49
MedXpertQA: Benchmarking Expert-Level Medical Reasoning and Understanding Paper • 2501.18362 • Published 7 days ago • 19
Large Language Models Think Too Fast To Explore Effectively Paper • 2501.18009 • Published 8 days ago • 22
DiffSplat: Repurposing Image Diffusion Models for Scalable Gaussian Splat Generation Paper • 2501.16764 • Published 9 days ago • 21
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training Paper • 2501.17161 • Published 9 days ago • 100
Multiple Choice Questions: Reasoning Makes Large Language Models (LLMs) More Self-Confident Even When They Are Wrong Paper • 2501.09775 • Published 21 days ago • 28
MiniMax-01: Scaling Foundation Models with Lightning Attention Paper • 2501.08313 • Published 23 days ago • 272
A Multi-Modal AI Copilot for Single-Cell Analysis with Instruction Following Paper • 2501.08187 • Published 23 days ago • 24
BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature Paper • 2501.07171 • Published 24 days ago • 49
PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation Paper • 2409.18964 • Published Sep 27, 2024 • 26
HumanEval-V: Evaluating Visual Understanding and Reasoning Abilities of Large Multimodal Models Through Coding Tasks Paper • 2410.12381 • Published Oct 16, 2024 • 43
LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations Paper • 2410.02707 • Published Oct 3, 2024 • 48
A Preliminary Study of o1 in Medicine: Are We Closer to an AI Doctor? Paper • 2409.15277 • Published Sep 23, 2024 • 35
MEDIC: Towards a Comprehensive Framework for Evaluating LLMs in Clinical Applications Paper • 2409.07314 • Published Sep 11, 2024 • 53
Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems Paper • 2408.16293 • Published Aug 29, 2024 • 26