Video-P2P: Video Editing with Cross-attention Control Paper • 2303.04761 • Published Mar 8, 2023 • 2
Direct Inversion: Boosting Diffusion-based Editing with 3 Lines of Code Paper • 2310.01506 • Published Oct 2, 2023
RL-GPT: Integrating Reinforcement Learning and Code-as-policy Paper • 2402.19299 • Published Feb 29, 2024
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models Paper • 2403.18814 • Published Mar 27, 2024 • 47
Multi-modal Cooking Workflow Construction for Food Recipes Paper • 2008.09151 • Published Aug 20, 2020 • 1
VisionZip: Longer is Better but Not Necessary in Vision Language Models Paper • 2412.04467 • Published Dec 5, 2024 • 106