Hexgrad PRO

hexgrad

AI & ML interests

Favorite water is distilled

Recent Activity

Organizations

None yet

hexgrad's activity

posted an update about 17 hours ago
I wrote an article about G2P: https://hf.co/blog/hexgrad/g2p

G2P is an underrated piece of small TTS models, like offensive linemen who do a bunch of work and get no credit.

Instead of relying on explicit G2P, larger speech models implicitly learn this task by eating many thousands of hours of audio data. They often use a 500M+ parameter LLM at the front to predict latent audio tokens over a learned codebook, then decode these tokens into audio.

Kokoro instead relies on G2P preprocessing, is 82M parameters, and thus needs less audio to learn. Because of this, we can cherry-pick high-fidelity audio for training data and deliver solid speech for those voices. In turn, this excellent audio quality and lack of background noise helps explain why Kokoro is very competitive in single-voice TTS Arenas.
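
As a rough sketch of what explicit G2P preprocessing looks like in practice, the snippet below converts text to phonemes with the open-source phonemizer package (espeak-ng backend). The library choice, flags, and sample text are illustrative assumptions, not necessarily the pipeline Kokoro actually ships with.

```python
# Illustrative G2P preprocessing: text -> phonemes before anything reaches the
# acoustic model. Requires `pip install phonemizer` and the espeak-ng binary.
from phonemizer import phonemize

text = "Kokoro is an 82 million parameter TTS model."

# Convert graphemes to IPA phonemes with stress marks, keeping punctuation so
# the TTS model can still learn pausing and intonation cues from it.
phonemes = phonemize(
    text,
    language="en-us",
    backend="espeak",
    strip=True,
    preserve_punctuation=True,
    with_stress=True,
)

print(phonemes)  # roughly: "kˈoʊkəɹoʊ ɪz ɐn ˌeɪɾitˈuː mˈɪliən pɚɹˈæmɪɾɚ ..."
```

Training on phoneme sequences instead of raw text spares the model from learning English spelling-to-sound rules from audio alone, which is part of why a small model can get away with far fewer hours of data.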
  • 1 reply
published an article about 17 hours ago
New activity in hexgrad/Kokoro-TTS about 18 hours ago
replied to Keltezaa's post 2 days ago

I am considering canceling my Pro subscription because I have been banned from posting an Article for many weeks now with no explanation or recourse.

Also, the ability to Post and the Posts feed are vandalized by those AI slop posts where the OP runs all 12 reactions on their own post and uses alt accounts to do the same. And I have no ability to block these circlejerking accounts.

New activity in hexgrad/Kokoro-TTS 2 days ago

Update README.md

#18 opened 4 days ago by Meroar
reacted to Pendrokar's post with ❤️ 4 days ago
TTS: Added Kokoro v1, Parler Large, LlaSa 3B & MARS 6 TTS models to the Arena.
Pendrokar/TTS-Spaces-Arena

I also added MaskGCT, GPT-SoVITS & OuteTTS a month ago. The OuteTTS devs did say that it is too early for it to be added to TTS Arenas.

MARS 5 does have a Space with open-weight models, but inference is way too slow (2+ minutes).
  • 2 replies
reacted to fdaudens's post with ❤️ 4 days ago
🎯 Kokoro TTS just hit v1.0! 🚀

Small but mighty: 82M parameters, runs locally, speaks multiple languages. The best part? It's Apache 2.0 licensed!
This could unlock so many possibilities ✨

Check it out: hexgrad/Kokoro-82M
  • 1 reply
New activity in hexgrad/Kokoro-82M 4 days ago
replied to their post 5 days ago

I'm sure they try, but 14.8 trillion tokens is likely too many to prune everything considered "sensitive", and I am confident there is enough in there to theoretically put together a coherent answer to many topics without hallucinating. I could be wrong, but I think R1 refuses due to mitigations, not for lack of knowing, and abliteration claims to be able to bypass such mitigations.

The question is simple: Is abliteration an effective method to uncensor DeepSeek-R1? There is some info on abliteration as it relates to 70B models and smaller, but I have not heard of anyone abliterating a 671B MoE, and due to size/compute constraints I cannot do it myself. If you are aware of such experiments, feel free to drop links.

replied to their post 5 days ago

I do not think the usual concern—that an abliterated model will hallucinate—applies to DeepSeek. It was trained on 14.8T tokens, right? Unless they have unheard-of levels of data cleaning, it seems totally infeasible to sweep all mentions of Tiananmen Square, Winnie the Pooh, Taiwan, and so on from the dataset.

I suspect that the refusal is baked into the weights, but the knowledge has also got to be in there somewhere. It is a matter of science to tinker with the weights to remove the refusal and unlock that knowledge. Perplexity may have done something like this already, but I am not sure whether they used an enormous system prompt, whether they're RAG-ing it in, or both, or something else.
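
For readers unfamiliar with the technique, here is a minimal numpy sketch of the core step abliteration relies on: estimate a "refusal direction" from activation differences and project it out of a weight matrix that writes into the residual stream. All names, shapes, and the random data below are hypothetical stand-ins, not DeepSeek-R1's actual modules, and nothing here claims the method is proven at this scale.

```python
# Minimal sketch of directional ablation ("abliteration"). Shapes and data are
# toy stand-ins; a real run would use activations captured from the model on
# refusal-triggering vs. benign prompts.
import numpy as np

def refusal_direction(harmful_acts: np.ndarray, harmless_acts: np.ndarray) -> np.ndarray:
    """Difference of mean residual-stream activations, normalized to unit length."""
    direction = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
    return direction / np.linalg.norm(direction)

def orthogonalize(weight: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Remove the refusal direction from a matrix that writes into the residual stream.

    weight has shape (d_model, d_in), so weight @ x lands in the residual stream.
    """
    d = direction.reshape(-1, 1)         # (d_model, 1)
    return weight - d @ (d.T @ weight)   # subtract the rank-1 component along d

# Toy usage with random stand-ins for activations and one output projection.
rng = np.random.default_rng(0)
harmful = rng.normal(size=(64, 512))     # activations on refusal-triggering prompts
harmless = rng.normal(size=(64, 512))    # activations on benign prompts
W_out = rng.normal(size=(512, 2048))     # e.g. an MLP down-projection

r = refusal_direction(harmful, harmless)
W_ablated = orthogonalize(W_out, r)
assert np.allclose(r @ W_ablated, 0.0, atol=1e-6)  # that direction can no longer be written
```

In practice the projection is applied to every matrix that writes into the residual stream (attention output and MLP down-projections across all layers), which is why memory and compute become the bottleneck for a model with hundreds of billions of parameters.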