-
VideoTetris: Towards Compositional Text-to-Video Generation
Paper • 2406.04277 • Published • 24 -
SF-V: Single Forward Video Generation Model
Paper • 2406.04324 • Published • 24 -
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions
Paper • 2406.04325 • Published • 73
Collections
Discover the best community collections!
Collections including paper arxiv:2406.04325
-
LanguageBind/MoE-LLaVA-Phi2-2.7B-4e
Text Generation • Updated • 373 • 38 -
LanguageBind/LanguageBind_Video_FT
Zero-Shot Image Classification • Updated • 22.3k • 4 -
stabilityai/stable-video-diffusion-img2vid-xt
Image-to-Video • Updated • 399k • 2.85k -
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions
Paper • 2406.04325 • Published • 73
-
ShareGPT4Video/ShareGPT4Video
Viewer • Updated • 40.2k • 3.37k • 189 -
Lin-Chen/sharegpt4video-8b
Visual Question Answering • Updated • 2.32k • 43 -
Lin-Chen/ShareCaptioner-Video
Text Generation • Updated • 355 • 17 -
27
ShareGPT4Video 8B
🎬Send video and text for explanation or action
-
Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
Paper • 2405.08748 • Published • 22 -
Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection
Paper • 2405.10300 • Published • 28 -
Chameleon: Mixed-Modal Early-Fusion Foundation Models
Paper • 2405.09818 • Published • 130 -
OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework
Paper • 2405.11143 • Published • 36
-
PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning
Paper • 2404.16994 • Published • 36 -
VideoMamba: State Space Model for Efficient Video Understanding
Paper • 2403.06977 • Published • 27 -
VideoAgent: Long-form Video Understanding with Large Language Model as Agent
Paper • 2403.10517 • Published • 33 -
Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding
Paper • 2403.09626 • Published • 14
-
Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models
Paper • 2310.04406 • Published • 8 -
Tree of Thoughts: Deliberate Problem Solving with Large Language Models
Paper • 2305.10601 • Published • 11 -
Language Models as Compilers: Simulating Pseudocode Execution Improves Algorithmic Reasoning in Language Models
Paper • 2404.02575 • Published • 48 -
Voyager: An Open-Ended Embodied Agent with Large Language Models
Paper • 2305.16291 • Published • 9
-
Self-Rewarding Language Models
Paper • 2401.10020 • Published • 146 -
Self-Discover: Large Language Models Self-Compose Reasoning Structures
Paper • 2402.03620 • Published • 116 -
OS-Copilot: Towards Generalist Computer Agents with Self-Improvement
Paper • 2402.07456 • Published • 43 -
Learning From Mistakes Makes LLM Better Reasoner
Paper • 2310.20689 • Published • 29
-
BLINK: Multimodal Large Language Models Can See but Not Perceive
Paper • 2404.12390 • Published • 26 -
TextSquare: Scaling up Text-Centric Visual Instruction Tuning
Paper • 2404.12803 • Published • 30 -
Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models
Paper • 2404.13013 • Published • 31 -
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD
Paper • 2404.06512 • Published • 30
-
RegionGPT: Towards Region Understanding Vision Language Model
Paper • 2403.02330 • Published • 2 -
Grounded Language-Image Pre-training
Paper • 2112.03857 • Published • 3 -
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions
Paper • 2406.04325 • Published • 73