RLHFlow

university

AI & ML interests

Workflow of Reinforcement Learning from Human Feedback (RLHF). Blog: https://rlhflow.github.io/

RLHFlow's activity

Min-Li updated a collection 1 day ago

Decision-Tree Reward Models

Collection

4 items • Updated 1 day ago • 1

Min-Li updated a dataset 1 day ago

RLHFlow/LLM-Preferences-HelpSteer2

Viewer • Updated 1 day ago • 9.13k • 29

Min-Li published a dataset 1 day ago

RLHFlow/LLM-Preferences-HelpSteer2

Viewer • Updated 1 day ago • 9.13k • 29

hendrydong authored a paper 3 days ago

Reward-Guided Speculative Decoding for Efficient LLM Reasoning

Paper • 2501.19324 • Published 6 days ago • 32

Min-Li updated a collection 9 days ago

Decision-Tree Reward Models

Collection

4 items • Updated 1 day ago • 1

Min-Li updated 2 models 13 days ago

RLHFlow/Decision-Tree-Reward-Gemma-2-27B

Text Classification • Updated 13 days ago • 62 • 2

RLHFlow/Decision-Tree-Reward-Llama-3.1-8B

Text Classification • Updated 13 days ago • 326 • 1

Min-Li published 2 models 15 days ago

RLHFlow/Decision-Tree-Reward-Gemma-2-27B

Text Classification • Updated 13 days ago • 62 • 2

RLHFlow/Decision-Tree-Reward-Llama-3.1-8B

Text Classification • Updated 13 days ago • 326 • 1

hendrydong authored a paper about 2 months ago

Offline Reinforcement Learning for LLM Multi-Step Reasoning

Paper • 2412.16145 • Published Dec 20, 2024 • 38

hendrydong in RLHFlow/LLaMA3.2-1B-SFT 3 months ago

the training data for this model?

#1 opened 3 months ago by AIR-hl

weqweasdas updated 3 datasets 3 months ago

weqweasdas updated 2 models 3 months ago

RLHFlow/Llama3.1-8B-PRM-Mistral-Data

Text Generation • Updated Nov 9, 2024 • 297 • 8

RLHFlow/Llama3.1-8B-PRM-Deepseek-Data

Text Generation • Updated Nov 9, 2024 • 16.8k • 33

weqweasdas updated 2 datasets 3 months ago

RLHFlow/Deepseek-ORM-Data

Viewer • Updated Nov 9, 2024 • 253k • 61 • 3

RLHFlow/Deepseek-PRM-Data

Viewer • Updated Nov 9, 2024 • 253k • 183 • 12

Team members 7