Submitted by akhaliq · 58 upvotes · Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model · 10 authors · 3 comments
Submitted by zhangysk · 52 upvotes · TableBench: A Comprehensive and Complex Benchmark for Table Question Answering · 13 authors · 3 comments
Submitted by akhaliq · 42 upvotes · To Code, or Not To Code? Exploring Impact of Code in Pre-training · 9 authors · 2 comments
Submitted by yukimasano · 13 upvotes · NeCo: Improving DINOv2's spatial representations in 19 GPU hours with Patch Neighbor Consistency · 5 authors · 2 comments
Submitted by akhaliq · 13 upvotes · MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding · 7 authors · 3 comments
Submitted by soujanyaporia · 12 upvotes · Ferret: Faster and Effective Automated Red Teaming with Reward-Based Scoring Technique · 4 authors · 2 comments
Submitted by haoningwu · 12 upvotes · MegaFusion: Extend Diffusion Models towards Higher-resolution Image Generation without Further Tuning · 6 authors · 2 comments
Submitted by keminglu · 9 upvotes · Predicting Rewards Alongside Tokens: Non-disruptive Parameter Insertion for Efficient Inference Intervention in Large Language Model · 7 authors · 2 comments
Submitted by akhaliq · 9 upvotes · Audio Match Cutting: Finding and Creating Matching Audio Transitions in Movies and Videos · 4 authors · 2 comments
Submitted by IAMJB · 8 upvotes · PhysBERT: A Text Embedding Model for Physics Scientific Literature · 3 authors · 1 comment
Submitted by amanchadha · 7 upvotes · The Brittleness of AI-Generated Image Watermarking Techniques: Examining Their Robustness Against Visual Paraphrasing Attacks · 10 authors · 2 comments
Submitted by akhaliq · 7 upvotes · MambaEVT: Event Stream based Visual Object Tracking using State Space Model · 7 authors · 2 comments
Submitted by akhaliq · 4 upvotes · RP1M: A Large-Scale Motion Dataset for Piano Playing with Bi-Manual Dexterous Robot Hands · 8 authors · 2 comments
Submitted by akhaliq · 3 upvotes · ShapeSplat: A Large-scale Dataset of Gaussian Splats and Their Self-Supervised Pretraining · 8 authors · 2 comments
Submitted by IAMJB · 2 upvotes · Recent Surge in Public Interest in Transportation: Sentiment Analysis of Baidu Apollo Go Using Weibo Data · 9 authors · 1 comment