Inui

Norm

https://normxu.github.io/

AI & ML interests

Video Diffusion; Large Language Model; Object Detection; OCR

Recent Activity

updated a collection 1 day ago

Image / Video Gen

upvoted a paper 1 day ago

VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models

liked a model 5 days ago

Alpha-VLLM/Lumina-Image-2.0

Norm's activity

upvoted a paper 1 day ago

VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models

Paper • 2502.02492 • Published 2 days ago • 37

upvoted a paper 14 days ago

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Paper • 2501.12948 • Published 15 days ago • 301

upvoted a paper 15 days ago

VideoWorld: Exploring Knowledge Learning from Unlabeled Videos

Paper • 2501.09781 • Published 21 days ago • 24

upvoted 2 papers 17 days ago

Do generative video models learn physical principles from watching videos?

Paper • 2501.09038 • Published 23 days ago • 31

Textoon: Generating Vivid 2D Cartoon Characters from Text Descriptions

Paper • 2501.10020 • Published 20 days ago • 22

upvoted a paper 20 days ago

Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps

Paper • 2501.09732 • Published 21 days ago • 67

upvoted 3 papers 22 days ago

upvoted a paper 24 days ago

LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs

Paper • 2501.06186 • Published 27 days ago • 60

upvoted a paper 29 days ago

Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos

Paper • 2501.04001 • Published 30 days ago • 42

upvoted a collection 30 days ago

Cosmos

Collection

The collection of Cosmos models • 31 items • Updated 20 days ago • 254

upvoted a paper about 1 month ago

Large Motion Video Autoencoding with Cross-modal Video VAE

Paper • 2412.17805 • Published Dec 23, 2024 • 24

upvoted 4 papers about 2 months ago

Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling

Paper • 2412.05271 • Published Dec 6, 2024 • 129

Qwen2.5 Technical Report

Paper • 2412.15115 • Published Dec 19, 2024 • 345

Apollo: An Exploration of Video Understanding in Large Multimodal Models

Paper • 2412.10360 • Published Dec 13, 2024 • 139

STIV: Scalable Text and Image Conditioned Video Generation

Paper • 2412.07730 • Published Dec 10, 2024 • 71

upvoted 3 papers 2 months ago

PaliGemma 2: A Family of Versatile VLMs for Transfer

Paper • 2412.03555 • Published Dec 4, 2024 • 126

Open-Sora Plan: Open-Source Large Video Generation Model

Paper • 2412.00131 • Published Nov 28, 2024 • 33

Semantic Image Inversion and Editing using Rectified Stochastic Differential Equations

Paper • 2410.10792 • Published Oct 14, 2024 • 29