Submitted by akhaliq · Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold · 6 authors
Submitted by akhaliq · Tree of Thoughts: Deliberate Problem Solving with Large Language Models · 7 authors
Submitted by akhaliq · OpenShape: Scaling Up 3D Shape Representation Towards Open-World Understanding · 9 authors
Submitted by akhaliq · SpeechGPT: Empowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities · 7 authors
Submitted by akhaliq · CLAPSpeech: Learning Prosody from Text Context with Contrastive Language-Audio Pre-training · 8 authors
Submitted by akhaliq · VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks · 11 authors
Submitted by akhaliq · UniControl: A Unified Diffusion Model for Controllable Visual Generation In the Wild · 13 authors
Submitted by akhaliq · Discriminative Diffusion Models as Few-shot Vision and Language Learners · 9 authors
Submitted by akhaliq · mLongT5: A Multilingual and Efficient Text-To-Text Transformer for Longer Sequences · 4 authors
Submitted by akhaliq · TrueTeacher: Learning Factual Consistency Evaluation with Large Language Models · 5 authors
Submitted by akhaliq · Learning the Visualness of Text Using Large Vision-Language Models · 5 authors
Submitted by akhaliq · GETMusic: Generating Any Music Tracks with a Unified Representation and Diffusion Framework · 7 authors
Submitted by akhaliq · VideoFactory: Swap Attention in Spatiotemporal Diffusions for Text-to-Video Generation · 7 authors
Submitted by akhaliq · Preserve Your Own Correlation: A Noise Prior for Video Diffusion Models · 10 authors