Hi there - we recently fixed this issue and will release a new version for it soon! - Joshua
Xenova
AI & ML interests: None yet
Recent Activity
- authored a paper · about 2 hours ago: SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model
- upvoted a paper · about 2 hours ago: SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model
- updated a Space · 1 day ago: webml-community/next-client-template

Xenova's activity
replied to their post · 8 days ago

replied to their post · 20 days ago:
Hey! Oh that's awesome - great work! Feel free to adapt any code/logic of mine as you'd like!
posted an update · 21 days ago
Introducing Kokoro.js, a new JavaScript library for running Kokoro TTS, an 82 million parameter text-to-speech model, 100% locally in the browser w/ WASM. Powered by 🤗 Transformers.js. WebGPU support coming soon!
npm i kokoro-js
Try it out yourself: webml-community/kokoro-web
Link to models/samples: onnx-community/Kokoro-82M-ONNX
You can get started in just a few lines of code!

import { KokoroTTS } from "kokoro-js";

const tts = await KokoroTTS.from_pretrained(
  "onnx-community/Kokoro-82M-ONNX",
  { dtype: "q8" }, // fp32, fp16, q8, q4, q4f16
);

const text = "Life is like a box of chocolates. You never know what you're gonna get.";
const audio = await tts.generate(text, {
  voice: "af_sky", // See `tts.list_voices()`
});
audio.save("audio.wav");

Huge kudos to the Kokoro TTS community, especially taylorchu for the ONNX exports and Hexgrad for the amazing project! None of this would be possible without you all! 🤗
The model is also extremely resilient to quantization. The smallest variant is only 86 MB in size (down from the original 326 MB), with no noticeable difference in audio quality! 🤯
reacted to hexgrad's post with 🔥 · 25 days ago
📣 Looking for labeled, high-quality synthetic audio/TTS data 📣 Have you been or are you currently calling API endpoints from OpenAI, ElevenLabs, etc? Do you have labeled audio data sitting around gathering dust? Let's talk! Join https://discord.gg/QuGxSWBfQy or comment down below.
If your data exceeds quantity & quality thresholds and is approved into the next hexgrad/Kokoro-82M training mix, and you permissively DM me the data under an effective Apache license, then I will DM back the corresponding voicepacks for YOUR data if/when the next Apache-licensed Kokoro base model drops.
What does this mean? If you've been calling closed-source TTS or audio API endpoints to:
- Build voice agents
- Make long-form audio, like audiobooks or podcasts
- Handle customer support, etc
Then YOU can contribute to the training mix and get useful artifacts in return. ❤️
More details at hexgrad/Kokoro-82M#21
posted an update · about 1 month ago
First project of 2025: Vision Transformer Explorer
I built a web app to interactively explore the self-attention maps produced by ViTs. This explains what the model is focusing on when making predictions, and provides insights into its inner workings! 🤯
Try it out yourself:
webml-community/attention-visualization
Source code: https://github.com/huggingface/transformers.js-examples/tree/main/attention-visualization
replied to their post · about 1 month ago
For this demo, ~150 MB if using WebGPU and ~120 MB if using WASM.
posted an update · about 2 months ago
Introducing Moonshine Web: real-time speech recognition running 100% locally in your browser!
- Faster and more accurate than Whisper
- Privacy-focused (no data leaves your device)
- WebGPU accelerated (w/ WASM fallback)
- Powered by ONNX Runtime Web and Transformers.js
Demo: webml-community/moonshine-web
Source code: https://github.com/huggingface/transformers.js-examples/tree/main/moonshine-web
reacted to fdaudens's post · about 2 months ago
Your AI toolkit just got a major upgrade! I updated the Journalists on Hugging Face community's collection with tools for investigative work, content creation, and data analysis.
Sharing these new additions with the links in case it's helpful:
- @wendys-llc 's excellent 6-part video series on AI for investigative journalism https://www.youtube.com/playlist?list=PLewNEVDy7gq1_GPUaL0OQ31QsiHP5ncAQ
- @jeremycaplan 's curated AI Spaces on HF https://wondertools.substack.com/p/huggingface
- @Xenova 's Whisper Timestamped (with diarization!) for private, on-device transcription Xenova/whisper-speaker-diarization & Xenova/whisper-word-level-timestamps
- Flux models for image gen & LoRAs autotrain-projects/train-flux-lora-ease
- FineGrain's object cutter finegrain/finegrain-object-cutter and object eraser (this one's cool) finegrain/finegrain-object-eraser
- FineVideo: massive open-source annotated dataset + explorer HuggingFaceFV/FineVideo-Explorer
- Qwen2 chat demos, including 2.5 & multimodal versions (crushing it on handwriting recognition) Qwen/Qwen2.5 & Qwen/Qwen2-VL
- GOT-OCR integration stepfun-ai/GOT_official_online_demo
- HTML to Markdown converter maxiw/HTML-to-Markdown
- Text-to-SQL query tool by @davidberenstein1957 for HF datasets davidberenstein1957/text-to-sql-hub-datasets
There's a lot of potential here for journalism and beyond. Give these a try and let me know what you build!
You can also add your favorite ones if you're part of the community!
Check it out: https://huggingface.co/JournalistsonHF
#AIforJournalism #HuggingFace #OpenSourceAI
posted an update · about 2 months ago
Introducing TTS WebGPU: The first ever text-to-speech web app built with WebGPU acceleration! 🔥 High-quality and natural speech generation that runs 100% locally in your browser, powered by OuteTTS and Transformers.js. Try it out yourself!
Demo: webml-community/text-to-speech-webgpu
Source code: https://github.com/huggingface/transformers.js-examples/tree/main/text-to-speech-webgpu
Model: onnx-community/OuteTTS-0.2-500M (ONNX), OuteAI/OuteTTS-0.2-500M (PyTorch)
posted an update · 2 months ago
We just released Transformers.js v3.1 and you're not going to believe what's now possible in the browser w/ WebGPU! 🤯 Let's take a look:
- Janus from DeepSeek for unified multimodal understanding and generation (Text-to-Image and Image-Text-to-Text)
- Qwen2-VL from Qwen for dynamic-resolution image understanding
- JinaCLIP from Jina AI for general-purpose multilingual multimodal embeddings
- LLaVA-OneVision from ByteDance for Image-Text-to-Text generation
- ViTPose for pose estimation
- MGP-STR for optical character recognition (OCR)
- PatchTST & PatchTSMixer for time series forecasting
That's right, everything running 100% locally in your browser (no data sent to a server)! 🔥 Huge for privacy!
Check out the release notes for more information:
https://github.com/huggingface/transformers.js/releases/tag/3.1.0
Demo link (+ source code): webml-community/Janus-1.3B-WebGPU
posted an update · 3 months ago
Have you tried out 🤗 Transformers.js v3? Here are the new features:
- WebGPU support (up to 100x faster than WASM)
- New quantization formats (dtypes)
- 120 supported architectures in total
- 25 new example projects and templates
- Over 1200 pre-converted models
- Node.js (ESM + CJS), Deno, and Bun compatibility
- A new home on GitHub and NPM
Get started with:
npm i @huggingface/transformers
Learn more in our blog post: https://huggingface.co/blog/transformersjs-v3
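The new quantization formats and WebGPU support can be combined when loading a pipeline. As a hedged sketch (the "device" and "dtype" option names follow the v3 release notes; the model id, the chosen defaults, and the pickLoadOptions helper are illustrative assumptions, not part of the library):

```javascript
// Illustrative helper (not part of Transformers.js): choose pipeline load
// options depending on whether the browser exposes WebGPU.
function pickLoadOptions(hasWebGPU) {
  return hasWebGPU
    ? { device: "webgpu", dtype: "fp16" } // GPU path: half precision
    : { device: "wasm", dtype: "q8" };    // WASM fallback: 8-bit quantized weights
}

// In the browser, WebGPU can be feature-detected via navigator.gpu, e.g.:
// const opts = pickLoadOptions("gpu" in navigator);
// const pipe = await pipeline("sentiment-analysis",
//   "Xenova/distilbert-base-uncased-finetuned-sst-2-english", opts);
```

Keeping the fallback path quantized (q8) also keeps the initial download small for users without WebGPU.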
reacted to do-me's post · 5 months ago
SemanticFinder now supports WebGPU thanks to @Xenova's efforts with transformers.js v3!
Expect massive performance gains. Inferenced a whole book with 46k chunks in <5min. If your device doesn't support #WebGPU use the classic Wasm-based version:
- WebGPU: https://do-me.github.io/SemanticFinder/webgpu/
- Wasm: https://do-me.github.io/SemanticFinder/
WebGPU harnesses the full power of your hardware, no longer being restricted to just the CPU. The speedup is significant (4-60x) for all kinds of devices: consumer-grade laptops, heavy Nvidia GPU setups or Apple Silicon. Measure the difference for your device here: Xenova/webgpu-embedding-benchmark
Chrome currently works out of the box, Firefox requires some tweaking.
WebGPU + transformers.js allows you to build amazing applications and make them accessible to everyone. E.g. SemanticFinder could become a simple GUI for populating your (vector) DB of choice. See the pre-indexed community texts here: do-me/SemanticFinder
Happy to hear your ideas!
We have Transformers.js, the JavaScript/WASM/WebGPU port of the Python library, which supports ~100 different architectures.
Docs: https://huggingface.co/docs/transformers.js
Repo: http://github.com/xenova/transformers.js
Is that the kind of thing you're looking for? :)
posted an update · 6 months ago
I can't believe this... Phi-3.5-mini (3.8B) running in-browser at ~90 tokens/second on WebGPU w/ Transformers.js and ONNX Runtime Web! 🤯 Since everything runs 100% locally, no messages are sent to a server – a huge win for privacy!
- Demo: webml-community/phi-3.5-webgpu
- Source code: https://github.com/huggingface/transformers.js-examples/tree/main/phi-3.5-webgpu
posted an update · 6 months ago
I'm excited to announce that Transformers.js V3 is finally available on NPM! 🔥 State-of-the-art Machine Learning for the web, now with WebGPU support! 🤯⚡️
Install it from NPM with:
npm i @huggingface/transformers
or via CDN, for example: https://v2.scrimba.com/s0lmm0qh1q
Segment Anything demo: webml-community/segment-anything-webgpu
posted an update · 7 months ago
Introducing Whisper Diarization: Multilingual speech recognition with word-level timestamps and speaker segmentation, running 100% locally in your browser thanks to 🤗 Transformers.js!
Tested on this iconic Letterman interview w/ Grace Hopper from 1983!
- Demo: Xenova/whisper-speaker-diarization
- Source code: Xenova/whisper-speaker-diarization
posted an update · 7 months ago
Introducing Whisper Timestamped: Multilingual speech recognition with word-level timestamps, running 100% locally in your browser thanks to 🤗 Transformers.js! Check it out!
Xenova/whisper-word-level-timestamps
This unlocks a world of possibilities for in-browser video editing! 🤯 What will you build?
Source code: https://github.com/xenova/transformers.js/tree/v3/examples/whisper-word-timestamps
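As a hedged sketch of how the word-level output can be consumed for video editing (the pipeline call and the return_timestamps option follow the Transformers.js docs; the model id and the chunksToLines helper are illustrative assumptions):

```javascript
// With the Transformers.js ASR pipeline, word-level timestamps come back as "chunks":
// const transcriber = await pipeline("automatic-speech-recognition", "Xenova/whisper-tiny.en");
// const { chunks } = await transcriber(audioUrl, { return_timestamps: "word" });
// chunks looks like: [{ text: " Hello", timestamp: [0.0, 0.42] }, ...]

// Illustrative helper: turn those chunks into subtitle-style lines,
// one "[start-end] word" entry per recognized word.
function chunksToLines(chunks) {
  return chunks.map(({ text, timestamp: [start, end] }) =>
    `[${start.toFixed(2)}-${end.toFixed(2)}] ${text.trim()}`);
}
```

From there, snapping video cut points to the nearest word boundary is a simple lookup over the start/end pairs.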
replied to their post · 7 months ago
Note: Since the API is experimental, you will need to install Chrome Dev/Canary version 127 or higher, and enable a few flags to get it working (see the blog post for more detailed instructions).
posted an update · 7 months ago
Chrome's new window.ai feature is going to change the web forever! 🤯 It allows you to run Gemini Nano, a powerful 3.25B parameter LLM, 100% locally in your browser!
We've also added experimental support to 🤗 Transformers.js!
- Demo: Xenova/experimental-built-in-ai-chat
- Blog post: https://huggingface.co/blog/Xenova/run-gemini-nano-in-your-browser
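As a hedged sketch of the experimental API shape (createTextSession and prompt match the early Chrome Prompt API at the time, but the API is experimental and may have changed since; the promptGeminiNano wrapper itself is an illustrative assumption):

```javascript
// Illustrative wrapper around Chrome's experimental window.ai Prompt API.
// Feature-detects first, since the API only exists behind flags in Chrome Dev/Canary 127+.
async function promptGeminiNano(prompt) {
  const ai = typeof window !== "undefined" ? window.ai : undefined;
  if (!ai?.createTextSession) {
    return null; // API unavailable (flags disabled, other browser, or server-side)
  }
  const session = await ai.createTextSession();
  return session.prompt(prompt); // resolves to the generated text
}
```

Returning null on unsupported browsers lets an app fall back to a Transformers.js model instead of failing outright.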