Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2406.04325

VideoTetris: Towards Compositional Text-to-Video Generation

Paper • 2406.04277 • Published Jun 6, 2024 • 24
SF-V: Single Forward Video Generation Model

Paper • 2406.04324 • Published Jun 6, 2024 • 24
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions

Paper • 2406.04325 • Published Jun 6, 2024 • 73

LanguageBind/MoE-LLaVA-Phi2-2.7B-4e

Text Generation • Updated Feb 1, 2024 • 373 • 38
LanguageBind/LanguageBind_Video_FT

Zero-Shot Image Classification • Updated Feb 1, 2024 • 22.3k • 4
stabilityai/stable-video-diffusion-img2vid-xt

Image-to-Video • Updated Jul 10, 2024 • 399k • 2.85k
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions

Paper • 2406.04325 • Published Jun 6, 2024 • 73

EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture

Paper • 2405.18991 • Published May 29, 2024 • 12
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions

Paper • 2406.04325 • Published Jun 6, 2024 • 73

ShareGPT4Video/ShareGPT4Video

Viewer • Updated Jul 8, 2024 • 40.2k • 3.37k • 189
Lin-Chen/sharegpt4video-8b

Visual Question Answering • Updated Jul 1, 2024 • 2.32k • 43
Lin-Chen/ShareCaptioner-Video

Text Generation • Updated Jun 11, 2024 • 355 • 17
Running on Zero

27

27

ShareGPT4Video 8B

🎬

Send video and text for explanation or action

Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding

Paper • 2405.08748 • Published May 14, 2024 • 22
Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection

Paper • 2405.10300 • Published May 16, 2024 • 28
Chameleon: Mixed-Modal Early-Fusion Foundation Models

Paper • 2405.09818 • Published May 16, 2024 • 130
OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework

Paper • 2405.11143 • Published May 20, 2024 • 36

video llm works

PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning

Paper • 2404.16994 • Published Apr 25, 2024 • 36
VideoMamba: State Space Model for Efficient Video Understanding

Paper • 2403.06977 • Published Mar 11, 2024 • 27
VideoAgent: Long-form Video Understanding with Large Language Model as Agent

Paper • 2403.10517 • Published Mar 15, 2024 • 33
Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding

Paper • 2403.09626 • Published Mar 14, 2024 • 14

LM Prompt Engineering

Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models

Paper • 2310.04406 • Published Oct 6, 2023 • 8
Tree of Thoughts: Deliberate Problem Solving with Large Language Models

Paper • 2305.10601 • Published May 17, 2023 • 11
Language Models as Compilers: Simulating Pseudocode Execution Improves Algorithmic Reasoning in Language Models

Paper • 2404.02575 • Published Apr 3, 2024 • 48
Voyager: An Open-Ended Embodied Agent with Large Language Models

Paper • 2305.16291 • Published May 25, 2023 • 9

Synthetic Data and Self-Improvement

Self-Rewarding Language Models

Paper • 2401.10020 • Published Jan 18, 2024 • 146
Self-Discover: Large Language Models Self-Compose Reasoning Structures

Paper • 2402.03620 • Published Feb 6, 2024 • 116
OS-Copilot: Towards Generalist Computer Agents with Self-Improvement

Paper • 2402.07456 • Published Feb 12, 2024 • 43
Learning From Mistakes Makes LLM Better Reasoner

Paper • 2310.20689 • Published Oct 31, 2023 • 29

Vision Language Models

BLINK: Multimodal Large Language Models Can See but Not Perceive

Paper • 2404.12390 • Published Apr 18, 2024 • 26
TextSquare: Scaling up Text-Centric Visual Instruction Tuning

Paper • 2404.12803 • Published Apr 19, 2024 • 30
Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models

Paper • 2404.13013 • Published Apr 19, 2024 • 31
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD

Paper • 2404.06512 • Published Apr 9, 2024 • 30

Papers - Image - Captioning

RegionGPT: Towards Region Understanding Vision Language Model

Paper • 2403.02330 • Published Mar 4, 2024 • 2
Grounded Language-Image Pre-training

Paper • 2112.03857 • Published Dec 7, 2021 • 3
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions

Paper • 2406.04325 • Published Jun 6, 2024 • 73

Previous
1
2
3
Next

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs