Document Parsing Unveiled: Techniques, Challenges, and Prospects for Structured Information Extraction • Paper • 2410.21169 • Published Oct 28, 2024
LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture • Paper • 2409.02889 • Published Sep 4, 2024
M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding • Paper • 2411.04952 • Published Nov 7, 2024
PDF-WuKong: A Large Multimodal Model for Efficient Long PDF Reading with End-to-End Sparse Sampling • Paper • 2410.05970 • Published Oct 8, 2024
READoc: A Unified Benchmark for Realistic Document Structured Extraction • Paper • 2409.05137 • Published Sep 8, 2024
M-Longdoc: A Benchmark For Multimodal Super-Long Document Understanding And A Retrieval-Aware Tuning Framework • Paper • 2411.06176 • Published Nov 9, 2024
VisDoM: Multi-Document QA with Visually Rich Elements Using Multimodal Retrieval-Augmented Generation • Paper • 2412.10704 • Published Dec 14, 2024
DoPTA: Improving Document Layout Analysis using Patch-Text Alignment • Paper • 2412.12902 • Published Dec 17, 2024
Predicting the Original Appearance of Damaged Historical Documents • Paper • 2412.11634 • Published Dec 16, 2024
SynFinTabs: A Dataset of Synthetic Financial Tables for Information and Table Extraction • Paper • 2412.04262 • Published Dec 5, 2024
TableBench: A Comprehensive and Complex Benchmark for Table Question Answering • Paper • 2408.09174 • Published Aug 17, 2024