Jaehyun Jun

btjhjeon

https://btjhjeon.github.io/

AI & ML interests

Multimodal

Recent Activity

updated a collection about 21 hours ago

Multimodal Benchmarks

updated a collection about 21 hours ago

Multimodal Dataset

btjhjeon's activity

upvoted a paper about 21 hours ago

SARChat-Bench-2M: A Multi-Task Vision-Language Benchmark for SAR Image Interpretation

Paper • 2502.08168 • Published 2 days ago • 10

upvoted 3 papers 2 days ago

Scaling Pre-training to One Hundred Billion Data for Vision Language Models

Paper • 2502.07617 • Published 3 days ago • 22

Competitive Programming with Large Reasoning Models

Paper • 2502.06807 • Published 11 days ago • 53

Éclair -- Extracting Content and Layout with Integrated Reading Order for Documents

Paper • 2502.04223 • Published 8 days ago • 9

upvoted 3 papers 3 days ago

EVEv2: Improved Baselines for Encoder-Free Vision-Language Models

Paper • 2502.06788 • Published 4 days ago • 11

The Hidden Life of Tokens: Reducing Hallucination of Large Vision-Language Models via Visual Information Steering

Paper • 2502.03628 • Published 9 days ago • 11

Show-o Turbo: Towards Accelerated Unified Multimodal Understanding and Generation

Paper • 2502.05415 • Published 6 days ago • 16

upvoted a paper 4 days ago

VideoRoPE: What Makes for Good Video Rotary Position Embedding?

Paper • 2502.05173 • Published 7 days ago • 60

upvoted a paper 7 days ago

Ola: Pushing the Frontiers of Omni-Modal Language Model with Progressive Modality Alignment

Paper • 2502.04328 • Published 8 days ago • 21

upvoted 2 papers 8 days ago

Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking

Paper • 2502.02339 • Published 10 days ago • 19

MetaMorph: Multimodal Understanding and Generation via Instruction Tuning

Paper • 2412.14164 • Published Dec 18, 2024 • 4

upvoted 2 papers 9 days ago

MM-IQ: Benchmarking Human-Like Abstraction and Reasoning in Multimodal Models

Paper • 2502.00698 • Published 12 days ago • 22

AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Understanding

Paper • 2502.01341 • Published 11 days ago • 33

upvoted 2 papers 12 days ago

PhysBench: Benchmarking and Enhancing Vision-Language Models for Physical World Understanding

Paper • 2501.16411 • Published 18 days ago • 17

o3-mini vs DeepSeek-R1: Which One is Safer?

Paper • 2501.18438 • Published 15 days ago • 22

upvoted a collection 17 days ago

Qwen2.5-VL

Collection

Vision-language model series based on Qwen2.5 • 3 items • Updated 18 days ago • 340

upvoted a paper 17 days ago

Baichuan-Omni-1.5 Technical Report

Paper • 2501.15368 • Published 19 days ago • 56

upvoted a collection 18 days ago

SmolVLM 256M & 500M

Collection

Collection for models & demos for even smoller SmolVLM release • 12 items • Updated 22 days ago • 68

upvoted 2 papers 20 days ago

EmbodiedEval: Evaluate Multimodal LLMs as Embodied Agents

Paper • 2501.11858 • Published 24 days ago • 5

Video-MMMU: Evaluating Knowledge Acquisition from Multi-Discipline Professional Videos

Paper • 2501.13826 • Published 22 days ago • 24