Daily Papers

by AK and the research community

Dec 19

Submitted by

jph00

Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference

14 authors

Submitted by

Yhmeng1106

AniDoc: Animation Creation Made Easier

9 authors

Submitted by

yuexiang96

TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks

21 authors

Submitted by

CodexXiang

No More Adam: Learning Rate Scaling at Initialization is All You Need

4 authors

Submitted by

Franck-Dernoncourt

GUI Agents: A Survey

29 authors

Submitted by

ShushengYang

Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces

6 authors

Submitted by

pengxiang

Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN

3 authors

Submitted by

guozonghao96

LLaVA-UHD v2: an MLLM Integrating High-Resolution Feature Pyramid via Hierarchical Window Transformer

12 authors

Submitted by

xichenhku

FashionComposer: Compositional Fashion Image Generation

6 authors

Submitted by

Bitterdhg

Autoregressive Video Generation without Vector Quantization

9 authors

Submitted by

hpouransari

FastVLM: Efficient Vision Encoding for Vision Language Models

11 authors

Submitted by

bykang

Prompting Depth Anything for 4K Resolution Accurate Metric Depth Estimation

10 authors

Submitted by

g-astruc

AnySat: An Earth Observation Model for Any Resolutions, Scales, and Modalities

4 authors

Submitted by

mbreuss

Efficient Diffusion Transformer Policies with Mixture of Expert Denoisers for Multitask Learning

4 authors

Submitted by

OliverZhao

Learning from Massive Human Videos for Universal Humanoid Pose Control

10 authors

Submitted by

jinzhuoran

RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment

7 authors

Submitted by

lhhuang

ChatDiT: A Training-Free Baseline for Task-Agnostic Free-Form Chatting with Diffusion Transformers

10 authors

Submitted by

deeptimhe

VidTok: A Versatile and Open-Source Video Tokenizer

6 authors

Submitted by

akhaliq

Alignment faking in large language models

20 authors

Submitted by

filapro

CAD-Recode: Reverse Engineering CAD Code from Point Clouds

6 authors

Submitted by

yeungchenwa0106

Predicting the Original Appearance of Damaged Historical Documents

6 authors

Submitted by

bobxwu

AntiLeak-Bench: Preventing Data Contamination by Automatically Constructing Benchmarks with Updated Real-World Knowledge

10 authors