Submitted by akhaliq 82 Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs · 8 authors
Submitted by akhaliq 33 MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators · 9 authors
Submitted by akhaliq 25 ByteEdit: Boost, Comply and Accelerate Generative Image Editing · 14 authors
Submitted by akhaliq 25 SwapAnything: Enabling Arbitrary Object Swapping in Personalized Visual Editing · 10 authors
Submitted by akhaliq 21 BeyondScene: Higher-Resolution Human-Centric Scene Generation With Pretrained Diffusion · 5 authors
Submitted by akhaliq 21 MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding · 8 authors
Submitted by akhaliq 17 PhysAvatar: Learning the Physics of Dressed 3D Avatars from Visual Observations · 11 authors
Submitted by akhaliq 14 MoMA: Multimodal LLM Adapter for Fast Personalized Image Generation · 6 authors
Submitted by akhaliq 12 Diffusion-RWKV: Scaling RWKV-Like Architectures for Diffusion Models · 5 authors