- GLIGEN: Open-Set Grounded Text-to-Image Generation
  Paper • 2301.07093 • Published • 3
- YOLO-World: Real-Time Open-Vocabulary Object Detection
  Paper • 2401.17270 • Published • 36
- DETRs Beat YOLOs on Real-time Object Detection
  Paper • 2304.08069 • Published • 13
- RT-DETRv2: Improved Baseline with Bag-of-Freebies for Real-Time Detection Transformer
  Paper • 2407.17140 • Published • 1
Collections including paper arxiv:2401.17270

- Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models
  Paper • 2404.07973 • Published • 31
- COCONut: Modernizing COCO Segmentation
  Paper • 2404.08639 • Published • 28
- GLIGEN: Open-Set Grounded Text-to-Image Generation
  Paper • 2301.07093 • Published • 3
- Grounded Language-Image Pre-training
  Paper • 2112.03857 • Published • 3

- YOLO-World: Real-Time Open-Vocabulary Object Detection
  Paper • 2401.17270 • Published • 36
- Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities
  Paper • 2401.14405 • Published • 13
- Improving fine-grained understanding in image-text pre-training
  Paper • 2401.09865 • Published • 17
- CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data
  Paper • 2404.15653 • Published • 27

- Masked Audio Generation using a Single Non-Autoregressive Transformer
  Paper • 2401.04577 • Published • 43
- YOLO-World: Real-Time Open-Vocabulary Object Detection
  Paper • 2401.17270 • Published • 36
- LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation
  Paper • 2402.05054 • Published • 26