Rufy992's Collections
Articoli PHD
2.5 Years in Class: A Multimodal Textbook for Vision-Language
Pretraining
Paper
•
2501.00958
•
Published
•
99
CodeElo: Benchmarking Competition-level Code Generation of LLMs with
Human-comparable Elo Ratings
Paper
•
2501.01257
•
Published
•
48
Reconstruction vs. Generation: Taming Optimization Dilemma in Latent
Diffusion Models
Paper
•
2501.01423
•
Published
•
36
REDUCIO! Generating 1024×1024 Video within 16 Seconds using
Extremely Compressed Motion Latents
Paper
•
2411.13552
•
Published
Generative Modeling with Explicit Memory
Paper
•
2412.08781
•
Published
CLEAR: Conv-Like Linearization Revs Pre-Trained Diffusion Transformers
Up
Paper
•
2412.16112
•
Published
•
22
TinyFusion: Diffusion Transformers Learned Shallow
Paper
•
2412.01199
•
Published
•
14
Efficient Scaling of Diffusion Transformers for Text-to-Image Generation
Paper
•
2412.12391
•
Published
•
1
ASGDiffusion: Parallel High-Resolution Generation with Asynchronous
Structure Guidance
Paper
•
2412.06163
•
Published
On the Surprising Effectiveness of Attention Transfer for Vision
Transformers
Paper
•
2411.09702
•
Published
•
1
Four-Plane Factorized Video Autoencoders
Paper
•
2412.04452
•
Published
SoftVQ-VAE: Efficient 1-Dimensional Continuous Tokenizer
Paper
•
2412.10958
•
Published
•
1
Nested Diffusion Models Using Hierarchical Latent Priors
Paper
•
2412.05984
•
Published
ScaleKD: Strong Vision Transformers Could Be Excellent Teachers
Paper
•
2411.06786
•
Published
FlexDiT: Dynamic Token Density Control for Diffusion Transformer
Paper
•
2412.06028
•
Published
Paper
•
2412.08905
•
Published
•
106
TÜLU 3: Pushing Frontiers in Open Language Model Post-Training
Paper
•
2411.15124
•
Published
•
59
Training and Evaluating Language Models with Template-based Data
Generation
Paper
•
2411.18104
•
Published
•
3
Paper
•
2411.05281
•
Published
•
1
ALMA: Alignment with Minimal Annotation
Paper
•
2412.04305
•
Published
Training Data for Large Language Model
Paper
•
2411.07715
•
Published
•
1
TransformLLM: Adapting Large Language Models via LLM-Transformed Reading
Comprehension Text
Paper
•
2410.21479
•
Published
TinyLLaVA: A Framework of Small-scale Large Multimodal Models
Paper
•
2402.14289
•
Published
•
19
TinyLlama: An Open-Source Small Language Model
Paper
•
2401.02385
•
Published
•
92
TinyLLM: Learning a Small Student from Multiple Large Language Models
Paper
•
2402.04616
•
Published
TinyEmo: Scaling down Emotional Reasoning via Metric Projection
Paper
•
2410.07062
•
Published
•
4
LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation
Paper
•
2408.15881
•
Published
•
21
Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model
Paper
•
2404.04167
•
Published
•
13
Rethinking Optimization and Architecture for Tiny Language Models
Paper
•
2402.02791
•
Published
•
13
TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones
Paper
•
2312.16862
•
Published
•
31
ProgCo: Program Helps Self-Correction of Large Language Models
Paper
•
2501.01264
•
Published
•
25
GReaTer: Gradients over Reasoning Makes Smaller Language Models Strong
Prompt Optimizers
Paper
•
2412.09722
•
Published
•
5
Forest-of-Thought: Scaling Test-Time Compute for Enhancing LLM Reasoning
Paper
•
2412.09078
•
Published
AlphaVerus: Bootstrapping Formally Verified Code Generation through
Self-Improving Translation and Treefinement
Paper
•
2412.06176
•
Published
MC-NEST -- Enhancing Mathematical Reasoning in Large Language Models
with a Monte Carlo Nash Equilibrium Self-Refine Tree
Paper
•
2411.15645
•
Published
PerfCodeGen: Improving Performance of LLM Generated Code with Execution
Feedback
Paper
•
2412.03578
•
Published
•
1
Enhancing LLM Reasoning via Critique Models with Test-Time and
Training-Time Supervision
Paper
•
2411.16579
•
Published
•
2
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for
Fast, Memory Efficient, and Long Context Finetuning and Inference
Paper
•
2412.13663
•
Published
•
126
AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation
Framework
Paper
•
2308.08155
•
Published
•
6
Virgo: A Preliminary Exploration on Reproducing o1-like MLLM
Paper
•
2501.01904
•
Published
•
31
VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
Paper
•
2501.01957
•
Published
•
42
SDPO: Segment-Level Direct Preference Optimization for Social Agents
Paper
•
2501.01821
•
Published
•
18
VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning
for Image and Video Generation
Paper
•
2412.21059
•
Published
•
18
Graph Generative Pre-trained Transformer
Paper
•
2501.01073
•
Published
•
17
LUSIFER: Language Universal Space Integration for Enhanced Multilingual
Embeddings with Large Language Models
Paper
•
2501.00874
•
Published
•
13
BoxingGym: Benchmarking Progress in Automated Experimental Design and
Model Discovery
Paper
•
2501.01540
•
Published
•
6
Benchmarking Llama2, Mistral, Gemma and GPT for Factuality, Toxicity,
Bias and Propensity for Hallucinations
Paper
•
2404.09785
•
Published
Gemma 2: Improving Open Language Models at a Practical Size
Paper
•
2408.00118
•
Published
•
76
Dispider: Enabling Video LLMs with Active Real-Time Interaction via
Disentangled Perception, Decision, and Reaction
Paper
•
2501.03218
•
Published
•
35
BoostStep: Boosting mathematical capability of Large Language Models via
improved single-step reasoning
Paper
•
2501.03226
•
Published
•
37
Test-time Computing: from System-1 Thinking to System-2 Thinking
Paper
•
2501.02497
•
Published
•
41
Personalized Graph-Based Retrieval for Large Language Models
Paper
•
2501.02157
•
Published
•
28
Auto-RT: Automatic Jailbreak Strategy Exploration for Red-Teaming Large
Language Models
Paper
•
2501.01830
•
Published
•
18
ToolHop: A Query-Driven Benchmark for Evaluating Large Language Models
in Multi-Hop Tool Use
Paper
•
2501.02506
•
Published
•
11
Alpaca against Vicuna: Using LLMs to Uncover Memorization of LLMs
Paper
•
2403.04801
•
Published
Battle of the Large Language Models: Dolly vs LLaMA vs Vicuna vs Guanaco
vs Bard vs ChatGPT -- A Text-to-SQL Parsing Comparison
Paper
•
2310.10190
•
Published
MiniCPM: Unveiling the Potential of Small Language Models with Scalable
Training Strategies
Paper
•
2404.06395
•
Published
•
22
LLM Teacher-Student Framework for Text Classification With No Manually
Annotated Data: A Case Study in IPTC News Topic Classification
Paper
•
2411.19638
•
Published
•
6
Performance-Guided LLM Knowledge Distillation for Efficient Text
Classification at Scale
Paper
•
2411.05045
•
Published
Selecting Between BERT and GPT for Text Classification in Political
Science Research
Paper
•
2411.05050
•
Published
Improving Bilingual Capabilities of Language Models to Support Diverse
Linguistic Practices in Education
Paper
•
2411.04308
•
Published
CoCoP: Enhancing Text Classification with LLM through Code Completion
Prompt
Paper
•
2411.08979
•
Published
Introducing Super RAGs in Mistral 8x7B-v1
Paper
•
2404.08940
•
Published
•
2
OpenDevin: An Open Platform for AI Software Developers as Generalist
Agents
Paper
•
2407.16741
•
Published
•
70
The GAN is dead; long live the GAN! A Modern GAN Baseline
Paper
•
2501.05441
•
Published
•
87
On Computational Limits and Provably Efficient Criteria of Visual
Autoregressive Models: A Fine-Grained Complexity Analysis
Paper
•
2501.04377
•
Published
•
14
Are VLMs Ready for Autonomous Driving? An Empirical Study from the
Reliability, Data, and Metric Perspectives
Paper
•
2501.04003
•
Published
•
25
Entropy-Guided Attention for Private LLMs
Paper
•
2501.03489
•
Published
•
14
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep
Thinking
Paper
•
2501.04519
•
Published
•
253
Towards System 2 Reasoning in LLMs: Learning How to Think With Meta
Chain-of-Thought
Paper
•
2501.04682
•
Published
•
90
Agent Laboratory: Using LLM Agents as Research Assistants
Paper
•
2501.04227
•
Published
•
84
URSA: Understanding and Verifying Chain-of-thought Reasoning in
Multimodal Mathematics
Paper
•
2501.04686
•
Published
•
50
Search-o1: Agentic Search-Enhanced Large Reasoning Models
Paper
•
2501.05366
•
Published
•
90
LLM4SR: A Survey on Large Language Models for Scientific Research
Paper
•
2501.04306
•
Published
•
33
InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning
and Reflection
Paper
•
2501.04575
•
Published
•
23
GeAR: Generation Augmented Retrieval
Paper
•
2501.02772
•
Published
•
23
Multi-task retriever fine-tuning for domain-specific and efficient RAG
Paper
•
2501.04652
•
Published
•
10
DPO Kernels: A Semantically-Aware, Kernel-Enhanced, and Divergence-Rich
Paradigm for Direct Preference Optimization
Paper
•
2501.03271
•
Published
•
11
o1-Coder: an o1 Replication for Coding
Paper
•
2412.00154
•
Published
•
43
Fast & Slow Learning: Incorporating Synthetic Gradients in Neural Memory
Controllers
Paper
•
2011.05438
•
Published
LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs
Paper
•
2501.06186
•
Published
•
60
Tensor Product Attention Is All You Need
Paper
•
2501.06425
•
Published
•
80
WebWalker: Benchmarking LLMs in Web Traversal
Paper
•
2501.07572
•
Published
•
19
Transformer²: Self-adaptive LLMs
Paper
•
2501.06252
•
Published
•
53
Multiagent Finetuning: Self Improvement with Diverse Reasoning Chains
Paper
•
2501.05707
•
Published
•
19
Demystifying Domain-adaptive Post-training for Financial LLMs
Paper
•
2501.04961
•
Published
•
11
MiniMax-01: Scaling Foundation Models with Lightning Attention
Paper
•
2501.08313
•
Published
•
272
A Multi-Modal AI Copilot for Single-Cell Analysis with Instruction
Following
Paper
•
2501.08187
•
Published
•
24
Diffusion Adversarial Post-Training for One-Step Video Generation
Paper
•
2501.08316
•
Published
•
32
FramePainter: Endowing Interactive Image Editing with Video Diffusion
Priors
Paper
•
2501.08225
•
Published
•
18
OpenCSG Chinese Corpus: A Series of High-quality Chinese Datasets for
LLM Training
Paper
•
2501.08197
•
Published
•
7
Potential and Perils of Large Language Models as Judges of Unstructured
Textual Data
Paper
•
2501.08167
•
Published
•
6
AfriHate: A Multilingual Collection of Hate Speech and Abusive Language
Datasets for African Languages
Paper
•
2501.08284
•
Published
•
6
HALoGEN: Fantastic LLM Hallucinations and Where to Find Them
Paper
•
2501.08292
•
Published
•
17
MMDocIR: Benchmarking Multi-Modal Retrieval for Long Documents
Paper
•
2501.08828
•
Published
•
30
Parameter-Inverted Image Pyramid Networks for Visual Perception and
Multimodal Understanding
Paper
•
2501.07783
•
Published
•
7
Multimodal LLMs Can Reason about Aesthetics in Zero-Shot
Paper
•
2501.09012
•
Published
•
10
Ouroboros-Diffusion: Exploring Consistent Content Generation in
Tuning-free Long Video Diffusion
Paper
•
2501.09019
•
Published
•
12
OmniThink: Expanding Knowledge Boundaries in Machine Writing through
Thinking
Paper
•
2501.09751
•
Published
•
47
Inference-Time Scaling for Diffusion Models beyond Scaling Denoising
Steps
Paper
•
2501.09732
•
Published
•
67
Towards Large Reasoning Models: A Survey of Reinforced Reasoning with
Large Language Models
Paper
•
2501.09686
•
Published
•
36
Evolving Deeper LLM Thinking
Paper
•
2501.09891
•
Published
•
105
Agent-R: Training Language Model Agents to Reflect via Iterative
Self-Training
Paper
•
2501.11425
•
Published
•
90
Demons in the Detail: On Implementing Load Balancing Loss for Training
Specialized Mixture-of-Expert Models
Paper
•
2501.11873
•
Published
•
63
Reasoning Language Models: A Blueprint
Paper
•
2501.11223
•
Published
•
31
Critique Fine-Tuning: Learning to Critique is More Effective than
Learning to Imitate
Paper
•
2501.17703
•
Published
•
50
Atla Selene Mini: A General Purpose Evaluation Model
Paper
•
2501.17195
•
Published
•
30
Exploring the sustainable scaling of AI dilemma: A projective study of
corporations' AI environmental impacts
Paper
•
2501.14334
•
Published
•
17
Early External Safety Testing of OpenAI's o3-mini: Insights from the
Pre-Deployment Evaluation
Paper
•
2501.17749
•
Published
•
12
Virus: Harmful Fine-tuning Attack for Large Language Models Bypassing
Guardrail Moderation
Paper
•
2501.17433
•
Published
•
8
FastKV: KV Cache Compression for Fast Long-Context Processing with
Token-Selective Propagation
Paper
•
2502.01068
•
Published
•
14