pythia-helpful-epoch2 Collection: Pythia-2.8b, supervised fine-tuned and DPO fine-tuned on the helpful subset of the Anthropic-hh-rlhf dataset for a second epoch. • 6 items • Updated Mar 12, 2024