Submitted by akhaliq · 185 · GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection · 6 authors · 15
Submitted by akhaliq · 62 · ShortGPT: Layers in Large Language Models are More Redundant Than You Expect · 8 authors · 21
Submitted by akhaliq · 39 · PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation · 10 authors · 1
Submitted by akhaliq · 20 · Learning to Decode Collaboratively with Multiple Language Models · 5 authors · 6
Submitted by akhaliq · 15 · Stop Regressing: Training Value Functions via Classification for Scalable Deep RL · 12 authors · 1
Submitted by akhaliq · 13 · Caduceus: Bi-Directional Equivariant Long-Range DNA Sequence Modeling · 6 authors · 1