DisPose: Disentangling Pose Guidance for Controllable Human Image Animation • Paper • 2412.09349 • Published Dec 12, 2024 • 8
ROCKET-1: Master Open-World Interaction with Visual-Temporal Context Prompting • Paper • 2410.17856 • Published Oct 23, 2024 • 49
Distilling an End-to-End Voice Assistant Without Instruction Training Data • Paper • 2410.02678 • Published Oct 3, 2024 • 22
Real-time Holistic Robot Pose Estimation with Unknown States • Paper • 2402.05655 • Published Feb 8, 2024
UniTR: A Unified and Efficient Multi-Modal Transformer for Bird's-Eye-View Representation • Paper • 2308.07732 • Published Aug 15, 2023 • 2
GiT: Towards Generalist Vision Transformer through Universal Language Interface • Paper • 2403.09394 • Published Mar 14, 2024 • 26
FaceChain-SuDe: Building Derived Class to Inherit Category Attributes for One-shot Subject-Driven Generation • Paper • 2403.06775 • Published Mar 11, 2024 • 3
PandaLM: An Automatic Evaluation Benchmark for LLM Instruction Tuning Optimization • Paper • 2306.05087 • Published Jun 8, 2023 • 6