Hexgrad PRO

hexgrad

AI & ML interests

Favorite water is distilled

Organizations

None yet

hexgrad's activity

posted an update about 18 hours ago
I wrote an article about G2P: https://hf.co/blog/hexgrad/g2p

G2P is an underrated component of small TTS models, like offensive linemen who do a bunch of work and get no credit.

Instead of relying on explicit G2P, larger speech models implicitly learn this task by eating many thousands of hours of audio data. They often use a 500M+ parameter LLM at the front to predict latent audio tokens over a learned codebook, then decode these tokens into audio.

Kokoro instead relies on G2P preprocessing, is only 82M parameters, and thus needs less audio to learn. Because of this, we can cherry-pick high-fidelity audio for training data and deliver solid speech for those voices. In turn, this excellent audio quality & lack of background noise help explain why Kokoro is very competitive in single-voice TTS Arenas.
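
To make the preprocessing step concrete, here is a minimal G2P sketch using the phonemizer package with the espeak backend. This is illustrative only and not Kokoro's exact G2P stack; the sample text and options are my own.

```python
# Minimal G2P sketch: text -> IPA phonemes before any acoustic model runs.
# Uses the phonemizer package with the espeak backend (requires the
# espeak-ng binary to be installed on the system). Illustrative only.
from phonemizer import phonemize

text = "The quick brown fox jumps over the lazy dog."
phonemes = phonemize(
    text,
    language="en-us",
    backend="espeak",
    strip=True,
    preserve_punctuation=True,
    with_stress=True,
)
print(phonemes)  # a string of IPA phonemes for the acoustic model to consume
```

With explicit phonemes as input, the acoustic model only has to map a small phoneme vocabulary to audio, which is part of why an 82M-parameter model can get away with far less training data.
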
replied to Keltezaa's post 2 days ago

I am considering canceling my Pro subscription because I have been banned from posting an Article for many weeks now with no explanation or recourse.

Also, the ability to Post and the Posts feed are vandalized by those AI slop posts where the OP runs all 12 reactions on their own post and uses alt accounts to do the same. And I have no ability to block these circlejerking accounts.

reacted to Pendrokar's post with ❤️ 4 days ago
TTS: Added Kokoro v1, Parler Large, LlaSa 3B & MARS 6 TTS models to the Arena.
Pendrokar/TTS-Spaces-Arena

I had also added MaskGCT, GPT-SoVITS & OuteTTS a month ago. The OuteTTS devs did say that it is too early for it to be added to TTS Arenas.

MARS 5 does have a Space with open-weights models, but inference is way too slow (2+ minutes).
reacted to fdaudens's post with ❤️ 4 days ago
🎯 Kokoro TTS just hit v1.0! 🚀

Small but mighty: 82M parameters, runs locally, speaks multiple languages. The best part? It's Apache 2.0 licensed!
This could unlock so many possibilities ✨

Check it out: hexgrad/Kokoro-82M
replied to their post 5 days ago

I'm sure they try, but 14.8 trillion tokens is likely too many to prune everything considered "sensitive", and I am confident there is enough in there to theoretically put together a coherent answer to many topics without hallucinating. I could be wrong, but I think R1 refuses due to mitigations, not for lack of knowing, and abliteration claims to be able to bypass such mitigations.

The question is simple: Is abliteration an effective method to uncensor DeepSeek-R1? There is some info on abliteration as it relates to 70b models and smaller, but I have not heard of anyone abliterating a 670B MOE, and due to size/compute constraints I cannot do it myself. If you are aware of such experiments, feel free to drop links.

replied to their post 5 days ago

I do not think the usual concern (that an abliterated model will hallucinate) applies to DeepSeek. It was trained on 14.8T tokens, right? Unless they have unheard-of levels of data cleaning, it seems totally infeasible to sweep all mentions of Tiananmen Square, Winnie the Pooh, Taiwan, and so on from the dataset.

I suspect that the refusal is baked into the weights, but the knowledge has also gotta be in there somewhere. It is a matter of science to tinker with the weights to remove the refusal and unlock that knowledge. Perplexity may have done something like this already, but I am not sure if they used an enormous system prompt or they're RAG-ing it in, or both, or something else.

replied to their post 5 days ago

Can't I just hook the Inference widget up to the Kokoro-TTS Space that's already running?

posted an update 6 days ago
Technical question: Is Abliteration still an effective method for uncensoring LLMs? Generally, what are the most effective methods to uncensor LLMs?

An effective uncensoring method would ideally be low-cost, data-efficient, and above all, successfully uncensor an LLM with minimal benchmark regressions.

"Tiananmen Square", "Winnie-the-Pooh", etc and more broadly "China influence/censorship" are some common criticisms leveled at DeepSeek.

I am vaguely aware of "Abliteration", a technique coined by @failspy (apologies if that attribution is incorrect) and originally described in a mid-2024 paper titled "Refusal in Language Models Is Mediated by a Single Direction" https://arxiv.org/abs/2406.11717

Abliteration is proposed as a relatively cheap and effective way to bypass censorship in models. However, it is not without criticism: https://www.reddit.com/r/LocalLLaMA/comments/1f07b4b/abliteration_fails_to_uncensor_models_while_it/
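
To make the mechanics concrete, here is a rough sketch of the core idea as I understand it from the paper: estimate a "refusal direction" as the difference of mean activations between refusal-inducing and harmless prompts, then project that direction out of the weights that write into the residual stream (or out of the activations at inference time). This is illustrative PyTorch, not @failspy's implementation; shapes, names, and the toy tensors are my own.

```python
# Sketch of directional ablation ("abliteration") as described in
# arXiv:2406.11717. Illustrative only; not the authors' code.
import torch

def refusal_direction(h_harmful: torch.Tensor, h_harmless: torch.Tensor) -> torch.Tensor:
    """Difference-of-means direction at a chosen layer/position.
    Both inputs are [num_prompts, hidden_dim] hidden states."""
    d = h_harmful.mean(dim=0) - h_harmless.mean(dim=0)
    return d / d.norm()

def ablate_from_weights(W_out: torch.Tensor, d: torch.Tensor) -> torch.Tensor:
    """Remove the refusal direction from a matrix that writes into the
    residual stream. W_out: [hidden_dim, d_in], d: [hidden_dim].
    Returns (I - d d^T) @ W_out, so no output has a component along d."""
    return W_out - torch.outer(d, d @ W_out)

def ablate_from_activations(h: torch.Tensor, d: torch.Tensor) -> torch.Tensor:
    """Same idea applied at inference time to residual-stream
    activations h: [..., hidden_dim]."""
    return h - (h @ d).unsqueeze(-1) * d

# Toy shapes just to show the pieces fit together.
hidden, d_in = 8, 4
d = refusal_direction(torch.randn(16, hidden), torch.randn(16, hidden))
W_clean = ablate_from_weights(torch.randn(hidden, d_in), d)
assert torch.allclose(d @ W_clean, torch.zeros(d_in), atol=1e-4)
```

Whether this scales to a ~670B MoE without tanking benchmarks is exactly what I am asking about.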

Curious to hear people's takes on Abliteration or other uncensoring methods, especially as it relates to DeepSeek.
replied to their post 6 days ago
replied to their post 7 days ago

1.0 and 0.23 are different models; 1.0 continued training from 0.23.

posted an update 8 days ago
posted an update 13 days ago
IMHO, being able & willing to defeat CAPTCHA, hCaptcha, or any other reasoning puzzle is a must-have for any Web-Browsing / Computer-Using Agent (WB/CUA).

I realize it subverts the purpose of CAPTCHA, but I do not think you can claim to be building AGI/agents without smoothly passing humanity checks. It would be like getting in a self-driving car that requires human intervention over speed bumps. Claiming AGI or even "somewhat powerful AI" seems hollow if you are halted by a mere CAPTCHA.

I imagine OpenAI's Operator is *able* but *not willing* to defeat CAPTCHA. Like their non-profit status, I expect that policy to evolve over time—and if not, rival agent-builders will attack that opening to offer a better product.
replied to their post 27 days ago

@to-be There are more details at https://hf.co/hexgrad/Kokoro-82M/discussions/21 and my Discord DMs are open if you have more questions, but essentially I am looking for segmented text-audio pairs: likely .txt and .wav pairs, with each .txt being ~500 characters or less (it needs to fit inside a 512-token context hard limit) and the .wav matching the text.
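
If it helps anyone prepping a contribution, below is a quick sanity-check sketch for that layout. The same-named .txt/.wav pairing and the ~500-character cap come from this comment; the directory name and everything else are just placeholders.

```python
# Quick sanity check for segmented text-audio pairs (foo.txt + foo.wav).
# Layout and the ~500-character cap follow the comment above; the rest
# is illustrative.
import wave
from pathlib import Path

MAX_CHARS = 500  # rough proxy for the 512-token context hard limit

def check_pairs(data_dir: str) -> None:
    for txt_path in sorted(Path(data_dir).glob("*.txt")):
        wav_path = txt_path.with_suffix(".wav")
        if not wav_path.exists():
            print(f"MISSING AUDIO: {txt_path.name}")
            continue
        text = txt_path.read_text(encoding="utf-8").strip()
        with wave.open(str(wav_path), "rb") as w:
            seconds = w.getnframes() / w.getframerate()
        flag = "OVER LIMIT" if len(text) > MAX_CHARS else "ok"
        print(f"{txt_path.name}: {len(text)} chars, {seconds:.1f}s audio [{flag}]")

check_pairs("my_tts_segments/")
```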

replied to their post about 1 month ago

It's simple: what you put in is what you get out. 😄 German support in the future depends mostly on how much German data (synthetic audio + text labels) is contributed.

replied to their post about 1 month ago
posted an update about 1 month ago
📣 Looking for labeled, high-quality synthetic audio/TTS data 📣 Have you been or are you currently calling API endpoints from OpenAI, ElevenLabs, etc? Do you have labeled audio data sitting around gathering dust? Let's talk! Join https://discord.gg/QuGxSWBfQy or comment down below.

If your data exceeds quantity & quality thresholds and is approved into the next hexgrad/Kokoro-82M training mix, and you permissively DM me the data under an effective Apache license, then I will DM back the corresponding voicepacks for YOUR data if/when the next Apache-licensed Kokoro base model drops.

What does this mean? If you've been calling closed-source TTS or audio API endpoints to:
- Build voice agents
- Make long-form audio, like audiobooks or podcasts
- Handle customer support, etc.
Then YOU can contribute to the training mix and get useful artifacts in return. ❤️

More details at hexgrad/Kokoro-82M#21
replied to their post about 1 month ago
posted an update about 1 month ago
Happy New Year! 🌃 af_sky landed in Kokoro, along with an article: hexgrad/Kokoro-82M
posted an update about 1 month ago
🇬🇧 Four British voices have joined hexgrad/Kokoro-82M (Apache TTS model): bf_emma, bf_isabella, bm_george, bm_lewis
posted an update about 1 month ago