PKU-SafeRLHF Collection A safety-alignment preference dataset for Llama-family models • 4 items • Updated Jul 16, 2024 • 1
ProgressGym: Alignment with a Millennium of Moral Progress Paper • 2406.20087 • Published Jun 28, 2024 • 3
Safe RLHF: Safe Reinforcement Learning from Human Feedback Paper • 2310.12773 • Published Oct 19, 2023 • 28