Submitted by akhaliq 34 CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching · 8 authors 4
Submitted by akhaliq 26 MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens · 7 authors 3
Submitted by akhaliq 25 AutoWebGLM: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent · 11 authors 3
Submitted by akhaliq 24 LVLM-Intrepret: An Interpretability Tool for Large Vision-Language Models · 10 authors 1
Submitted by akhaliq 16 CodeEditorBench: Evaluating Code Editing Capability of Large Language Models · 16 authors 1
Submitted by akhaliq 9 Red Teaming GPT-4V: Are GPT-4V Safe Against Uni/Multi-Modal Jailbreak Attacks? · 8 authors
Submitted by akhaliq 8 RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis · 11 authors