I'm sure they try, but 14.8 trillion tokens is likely too many to prune of everything considered "sensitive", and I am confident there is enough in there to put together coherent answers on many topics without hallucinating. I could be wrong, but I think R1 refuses due to mitigations, not for lack of knowledge, and abliteration claims to be able to bypass such mitigations.
The question is simple: is abliteration an effective method to uncensor DeepSeek-R1? There is some info on abliteration as it relates to 70B models and smaller, but I have not heard of anyone abliterating a 671B MoE, and due to size/compute constraints I cannot do it myself. If you are aware of such experiments, feel free to drop links.
I do not think the usual concern—that an abliterated model will hallucinate—applies to DeepSeek. It was trained on 14.8T tokens, right? Unless they have unheard-of levels of data cleaning, it seems totally infeasible to sweep all mentions of Tiananmen Square, Winnie the Pooh, Taiwan, and so on from the dataset.
I suspect that the refusal is baked into the weights, but the knowledge has also got to be in there somewhere. From there it is a matter of engineering to tinker with the weights to remove the refusal and unlock that knowledge. Perplexity may have done something like this already with R1 1776, but I am not sure whether they used an enormous system prompt, are RAG-ing it in, both, or something else.
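For anyone unfamiliar with what "tinkering with the weights" means here: abliteration is built on directional ablation. You estimate a "refusal direction" in the residual stream (typically the difference of mean activations on refused vs. accepted prompts) and then orthogonalize the weight matrices that write into the residual stream so they can no longer write along that direction. A minimal numpy sketch of just that step, with toy dimensions and random stand-in activations (a real run would collect activations from the actual model):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64  # toy hidden size; R1's is vastly larger

# Stand-ins for mean residual-stream activations on prompts the model
# refuses vs. prompts it answers (collected from the real model in practice).
mean_refused = rng.normal(size=d)
mean_accepted = rng.normal(size=d)

# 1. Refusal direction: difference of means, normalized to a unit vector.
r = mean_refused - mean_accepted
r /= np.linalg.norm(r)

# 2. Orthogonalize a weight matrix that writes into the residual stream
#    (e.g. an attention or MLP output projection), removing its component
#    along the refusal direction: W' = W - r r^T W.
W = rng.normal(size=(d, d))
W_abl = W - np.outer(r, r) @ W

# Sanity check: the ablated matrix has (near-)zero projection onto r,
# so this layer can no longer push activations toward "refuse".
print(np.abs(r @ W_abl).max())  # ~0 up to float error
```

Whether this scales cleanly to a 671B MoE is exactly my question: the math is the same per expert, but collecting activations and rewriting every projection at that size is a serious compute job.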