Resources for the paper Preference Optimization for Reasoning with Pseudo Feedback (ICLR 2025)
- Preference Optimization for Reasoning with Pseudo Feedback (paper, arXiv 2411.16345)
- chitanda/mathscale4o-800k (dataset)
- Learning Planning-based Reasoning by Trajectories Collection and Process Reward Synthesizing (paper, arXiv 2402.00658)
- chitanda/code-synthetic-test-cases (dataset)