Dataset: BramVanroy/CommonCrawl-CreativeCommons-v0.1 (64.7M rows).
You can use ScandEval to run your own benchmarks. As part of the "Leesplank" project (with Michiel Buisman and Maarten Lens-FitzGerald), we recently added GPT-4-1106-preview scores to give the leaderboard a good "target".
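The cleaned Dutch configuration of the HPLT v1.2 monolingual dataset can be loaded with the `datasets` library; `streaming=True` is an optional extra if you do not want to download the full corpus up front:

```python
from datasets import load_dataset

# Cleaned Dutch configuration of the HPLT v1.2 monolingual dataset;
# pass streaming=True to iterate over examples without a full download.
ds = load_dataset("BramVanroy/hplt_mono_v1_2", "nl_cleaned")
```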
There were some useful brainstorms in that thread. I think the dataset is relatively easy for the model, leading it to overfit quickly when beta is very small, since a small beta allows the model to step further away from its initial outputs.
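For context, the DPO objective (Rafailov et al., 2023) makes the role of beta explicit: it scales the implicit KL-style anchor to the reference model, so a very small beta only weakly ties the policy to its initial behaviour:

$$
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta;\pi_{\mathrm{ref}}) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\left[\log\sigma\left(\beta\log\frac{\pi_\theta(y_w\mid x)}{\pi_{\mathrm{ref}}(y_w\mid x)} - \beta\log\frac{\pi_\theta(y_l\mid x)}{\pi_{\mathrm{ref}}(y_l\mid x)}\right)\right]
$$

where $y_w$ and $y_l$ are the chosen and rejected completions for prompt $x$.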
So despite the good performance on the DPO objective and strong scores on the validation set (no overfitting), something seems to go wrong. Perhaps the chosen and rejected outputs are too different and the task is too easy, in which case DPO is not useful. But why, then, would the model start hallucinating and repeating the same token over and over again?
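For anyone reproducing this kind of setup, here is a minimal sketch of where beta enters a DPO run with Hugging Face `trl`. The model and dataset names are hypothetical placeholders, and argument names have shifted across `trl` releases (e.g. `processing_class` replaced the older `tokenizer` argument), so treat this as a sketch rather than the exact training script discussed here:

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# Hypothetical SFT checkpoint and preference dataset (columns: prompt/chosen/rejected).
model = AutoModelForCausalLM.from_pretrained("my-org/my-sft-model")
tokenizer = AutoTokenizer.from_pretrained("my-org/my-sft-model")
train_dataset = load_dataset("my-org/my-preference-data", split="train")

# beta scales the implicit KL anchor to the (frozen) reference model;
# a very small value (e.g. 0.01) lets the policy drift far from its initial outputs.
args = DPOConfig(output_dir="dpo-out", beta=0.1)

trainer = DPOTrainer(
    model=model,  # ref_model defaults to a frozen copy of the policy when omitted
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,
)
trainer.train()
```

With beta well below the 0.1 default, the loss above is easy to drive down while the reference constraint stays weak, which matches the drift described in the thread.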