jinwon kim

whooray

AI & ML interests

None yet

Recent Activity

reacted to hexgrad's post with 👍 about 3 hours ago

I wrote an article about G2P: https://hf.co/blog/hexgrad/g2p G2P is an underrated piece of small TTS models, like offensive linemen who do a lot of the work and get no credit. Instead of relying on explicit G2P, larger speech models implicitly learn this task by consuming many thousands of hours of audio data. They often use a 500M+ parameter LLM at the front to predict latent audio tokens over a learned codebook, then decode these tokens into audio. Kokoro instead relies on G2P preprocessing, has 82M parameters, and thus needs less audio to learn. Because of this, we can cherry-pick high-fidelity audio for training data and deliver solid speech for those voices. In turn, this excellent audio quality and lack of background noise help explain why Kokoro is very competitive in single-voice TTS Arenas.

updated a model 2 months ago

ai-human-lab/EEVE-Korean-10.8B-RAFT

liked a dataset 2 months ago

HuggingFaceFV/finevideo

spaces 3

Demucs Cpu

Live Portrait

Apply the motion of a video on a portrait

No application file

Mean Iou

models 6

whooray/segformer-b0-finetuned-occludedface

Updated Oct 27, 2023

whooray/stable_diffusion_tensorrt_inpaint

Updated Jul 22, 2023

whooray/bald_hairstyle_lora

Updated Jul 16, 2023

whooray/stable_diffusion_fill_inpaint

Updated Jul 14, 2023

whooray/realistic-vision-1.3-inpainting

Text-to-Image • Updated May 9, 2023 • 3

whooray/face-parsing.PyTorch

Updated Jan 26, 2023

datasets 2

whooray/idefics2-embeddings

Updated Jul 18, 2024 • 22

whooray/ko_Ultrafeedback_binarized

Viewer • Updated Feb 19, 2024 • 62k • 29