Daily Papers

by AK and the research community

Dec 16

Submitted by

orrzohar

Apollo: An Exploration of Video Understanding in Large Multimodal Models

·
12 authors

Submitted by

jienengchen

GenEx: Generating an Explorable World

·
11 authors

Submitted by

wzk1015

SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding

·
11 authors

Submitted by

vyokky

Large Action Models: From Inception to Implementation

·
18 authors

Submitted by

sahalshajim

BiMediX2: Bio-Medical EXpert LMM for Diverse Medical Modalities

·
11 authors

Submitted by

MoonQiu

FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free Scale Fusion

·
8 authors

Submitted by

jaywalnut310

Efficient Generative Modeling with Residual Vector Quantization-Based Tokens

·
4 authors

Submitted by

AnonMegumi

InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption

·
9 authors

Submitted by

yedid

ObjectMate: A Recurrence Prior for Object Insertion and Subject-Driven Generation

·
7 authors

Submitted by

MagicBag

FireFlow: Fast Inversion of Rectified Flow for Image Semantic Editing

·
5 authors

Submitted by

hongjiewang

LinGen: Towards High-Resolution Minute-Length Text-to-Video Generation with Linear Computational Complexity

·
13 authors

Submitted by

ydalva

FluxSpace: Disentangled Semantic Editing in Rectified Flow Transformers

·
3 authors

Submitted by

iofu728

SCBench: A KV Cache-Centric Analysis of Long-Context Methods

·
11 authors

Submitted by

JackyZhuo

Multimodal Music Generation with Explicit Bridges and Retrieval Augmentation

·
10 authors

Submitted by

sarathismg

GReaTer: Gradients over Reasoning Makes Smaller Language Models Strong Prompt Optimizers

·
6 authors

Submitted by

SultanR

SmolTulu: Higher Learning Rate to Batch Size Ratios Can Lead to Better Reasoning in SLMs

·
1 author

Submitted by

rzheng12

TraceVLA: Visual Trace Prompting Enhances Spatial-Temporal Awareness for Generalist Robotic Policies

·
8 authors

Submitted by

moein99

Prompt2Perturb (P2P): Text-Guided Diffusion-Based Adversarial Attacks on Breast Ultrasound Images

·
5 authors