AI & ML interests
Workflow of Reinforcement Learning from Human Feedback (RLHF). Blog: https://rlhflow.github.io/
Models (21)

| Model | Task | Downloads | Likes |
| --- | --- | --- | --- |
| RLHFlow/Decision-Tree-Reward-Gemma-2-27B | Text Classification | 62 | 2 |
| RLHFlow/Decision-Tree-Reward-Llama-3.1-8B | Text Classification | 326 | 1 |
| RLHFlow/Llama3.1-8B-PRM-Mistral-Data | Text Generation | 297 | 8 |
| RLHFlow/Llama3.1-8B-PRM-Deepseek-Data | Text Generation | 16.8k | 33 |
| RLHFlow/Llama3.1-8B-ORM-Deepseek-Data | Text Generation | 672 | - |
| RLHFlow/Llama3.1-8B-ORM-Mistral-Data | Text Generation | 130 | - |
| RLHFlow/Llama3-v2-iterative-DPO-iter3 | Text Generation | 183 | 1 |
| RLHFlow/Llama3-v2-iterative-DPO-iter2 | Text Generation | 14 | - |
| RLHFlow/Llama3-v2-iterative-DPO-iter1 | Text Generation | 15 | - |
| RLHFlow/LLaMA3-SFT-v2 | Text Generation | 401 | 2 |
Datasets (65)

| Dataset | Rows | Downloads | Likes |
| --- | --- | --- | --- |
| RLHFlow/LLM-Preferences-HelpSteer2 | 9.13k | 29 | - |
| RLHFlow/DS-and-Mistral-PRM-Data | 526k | 49 | - |
| RLHFlow/Deepseek-MATH500-Test | 500 | 168 | - |
| RLHFlow/Mistral-MATH500-Test | 500 | 153 | - |
| RLHFlow/Deepseek-ORM-Data | 253k | 61 | 3 |
| RLHFlow/Deepseek-PRM-Data | 253k | 183 | 12 |
| RLHFlow/Mistral-ORM-Data | 273k | 125 | 2 |
| RLHFlow/Mistral-PRM-Data | 273k | 143 | 10 |
| RLHFlow/Mistral-MATH500-Test-Result-of-Mistral-PRM | 500 | 37 | - |
| RLHFlow/Mistral-MATH500-Test-Result-of-Mistral-ORM | 500 | 37 | - |