The Next Step Forward in Multimodal LLM Alignment

[2025/02/10] 🔥 We are proud to open-source MM-RLHF, a comprehensive project for aligning Multimodal Large Language Models (MLLMs) with human preferences. This release includes:

  • A high-quality MLLM alignment dataset.
  • A strong Critique-Based MLLM reward model and its training algorithm.
  • A novel alignment algorithm, MM-DPO.
  • Two new benchmarks.

Our dataset and algorithms enable consistent performance improvements across 10 dimensions and 27 benchmarks.

Use

Intended use

The model was trained on MM-RLHF data and can interact with single images, multiple images, and videos.
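For video inputs, MLLMs typically operate on a fixed number of frames sampled uniformly from the clip. The helper below is a minimal sketch of that common convention; it is an illustration, not necessarily the exact sampling strategy used by this model:

```python
def sample_frame_indices(num_frames_total: int, num_frames_wanted: int) -> list[int]:
    """Uniformly sample frame indices from a video.

    This is the widely used uniform-sampling convention for feeding videos
    to MLLMs; the model's actual preprocessing may differ.
    """
    if num_frames_total <= num_frames_wanted:
        # Short clip: just use every frame.
        return list(range(num_frames_total))
    step = num_frames_total / num_frames_wanted
    return [int(i * step) for i in range(num_frames_wanted)]
```

For example, sampling 4 frames from a 100-frame clip yields indices `[0, 25, 50, 75]`; the selected frames are then passed to the processor like a multi-image input.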


Feel free to share your generations in the Community tab!

Generation

We provide a simple generation workflow for using our model. For more details, please refer to our GitHub repository.
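The sketch below shows one way this generation flow could look, assuming the checkpoint follows the standard `transformers` vision-language interface. The model id placeholder, processor class, and chat-message format are assumptions for illustration only; the GitHub repository has the authoritative recipe:

```python
def build_conversation(question: str) -> list:
    """One user turn with an image slot plus a text question (illustrative chat format)."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image"},
                {"type": "text", "text": question},
            ],
        }
    ]

def generate_answer(model_id: str, image_path: str, question: str,
                    max_new_tokens: int = 128) -> str:
    """Load the checkpoint and answer one question about one image.

    NOTE: `model_id` is a placeholder; the AutoModelForVision2Seq class and
    chat template are assumptions based on the generic transformers API.
    """
    import torch
    from PIL import Image
    from transformers import AutoProcessor, AutoModelForVision2Seq

    processor = AutoProcessor.from_pretrained(model_id)
    model = AutoModelForVision2Seq.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )
    # Render the conversation into the model's prompt format.
    prompt = processor.apply_chat_template(
        build_conversation(question), add_generation_prompt=True
    )
    inputs = processor(images=Image.open(image_path), text=prompt,
                       return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return processor.decode(output_ids[0], skip_special_tokens=True)
```

A call such as `generate_answer("path/to/checkpoint", "demo.jpg", "What is in this image?")` would return the decoded response string.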

Citation

If you find this work useful for your research or applications, please cite the related papers/blogs using this BibTeX:


Model size: 8.04B parameters (Safetensors, BF16)