-
Mesh2NeRF: Direct Mesh Supervision for Neural Radiance Field Representation and Generation
Paper • 2403.19319 • Published • 14 -
Getting it Right: Improving Spatial Consistency in Text-to-Image Models
Paper • 2404.01197 • Published • 31 -
LLaVA-Gemma: Accelerating Multimodal Foundation Models with a Compact Language Model
Paper • 2404.01331 • Published • 25 -
LVLM-Intrepret: An Interpretability Tool for Large Vision-Language Models
Paper • 2404.03118 • Published • 24
Collections
Discover the best community collections!
Collections including paper arxiv:2404.01331
-
CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows
Paper • 2107.00652 • Published • 2 -
Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering
Paper • 2403.09622 • Published • 17 -
Veagle: Advancements in Multimodal Representation Learning
Paper • 2403.08773 • Published • 9 -
mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality
Paper • 2304.14178 • Published • 3
-
Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models
Paper • 2310.04406 • Published • 8 -
Chain-of-Thought Reasoning Without Prompting
Paper • 2402.10200 • Published • 105 -
ICDPO: Effectively Borrowing Alignment Capability of Others via In-context Direct Preference Optimization
Paper • 2402.09320 • Published • 6 -
Self-Discover: Large Language Models Self-Compose Reasoning Structures
Paper • 2402.03620 • Published • 116
-
PRDP: Proximal Reward Difference Prediction for Large-Scale Reward Finetuning of Diffusion Models
Paper • 2402.08714 • Published • 12 -
Data Engineering for Scaling Language Models to 128K Context
Paper • 2402.10171 • Published • 24 -
RLVF: Learning from Verbal Feedback without Overgeneralization
Paper • 2402.10893 • Published • 11 -
Coercing LLMs to do and reveal (almost) anything
Paper • 2402.14020 • Published • 13
-
PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models
Paper • 2312.13964 • Published • 19 -
LLM in a flash: Efficient Large Language Model Inference with Limited Memory
Paper • 2312.11514 • Published • 259 -
StreamDiffusion: A Pipeline-level Solution for Real-time Interactive Generation
Paper • 2312.12491 • Published • 70 -
LLaVA-φ: Efficient Multi-Modal Assistant with Small Language Model
Paper • 2401.02330 • Published • 16