RLHFlow

university

AI & ML interests

Workflow of Reinforcement Learning from Human Feedback (RLHF). Blog: https://rlhflow.github.io/

Recent Activity

Min-Li updated a collection 1 day ago

Decision-Tree Reward Models

Min-Li updated a dataset 1 day ago

RLHFlow/LLM-Preferences-HelpSteer2

Min-Li published a dataset 1 day ago

RLHFlow/LLM-Preferences-HelpSteer2

Collections 9

Models 21

RLHFlow/Decision-Tree-Reward-Gemma-2-27B

Text Classification • Updated 13 days ago • 62 downloads • 2 likes

RLHFlow/Decision-Tree-Reward-Llama-3.1-8B

Text Classification • Updated 13 days ago • 326 downloads • 1 like

RLHFlow/Llama3.1-8B-PRM-Mistral-Data

Text Generation • Updated Nov 9, 2024 • 297 downloads • 8 likes

RLHFlow/Llama3.1-8B-PRM-Deepseek-Data

Text Generation • Updated Nov 9, 2024 • 16.8k downloads • 33 likes

RLHFlow/Llama3.1-8B-ORM-Deepseek-Data

Text Generation • Updated Nov 9, 2024 • 672 downloads

RLHFlow/Llama3.1-8B-ORM-Mistral-Data

Text Generation • Updated Nov 9, 2024 • 130 downloads

RLHFlow/Llama3-v2-iterative-DPO-iter3

Text Generation • Updated Nov 4, 2024 • 183 downloads • 1 like

RLHFlow/Llama3-v2-iterative-DPO-iter2

Text Generation • Updated Nov 4, 2024 • 14 downloads

RLHFlow/Llama3-v2-iterative-DPO-iter1

Text Generation • Updated Nov 4, 2024 • 15 downloads

RLHFlow/LLaMA3-SFT-v2

Text Generation • Updated Nov 3, 2024 • 401 downloads • 2 likes

Datasets 65

RLHFlow/LLM-Preferences-HelpSteer2

Viewer • Updated 1 day ago • 9.13k rows • 29 downloads

RLHFlow/DS-and-Mistral-PRM-Data

Viewer • Updated Nov 10, 2024 • 526k rows • 49 downloads

RLHFlow/Deepseek-MATH500-Test

Viewer • Updated Nov 9, 2024 • 500 rows • 168 downloads

RLHFlow/Mistral-MATH500-Test

Viewer • Updated Nov 9, 2024 • 500 rows • 153 downloads

RLHFlow/Deepseek-ORM-Data

Viewer • Updated Nov 9, 2024 • 253k rows • 61 downloads • 3 likes

RLHFlow/Deepseek-PRM-Data

Viewer • Updated Nov 9, 2024 • 253k rows • 183 downloads • 12 likes

RLHFlow/Mistral-ORM-Data

Viewer • Updated Nov 9, 2024 • 273k rows • 125 downloads • 2 likes

RLHFlow/Mistral-PRM-Data

Viewer • Updated Nov 9, 2024 • 273k rows • 143 downloads • 10 likes

RLHFlow/Mistral-MATH500-Test-Result-of-Mistral-PRM

Viewer • Updated Nov 8, 2024 • 500 rows • 37 downloads

RLHFlow/Mistral-MATH500-Test-Result-of-Mistral-ORM

Viewer • Updated Nov 8, 2024 • 500 rows • 37 downloads