Julien BLANCHON (blanchon) PRO

AI & ML interests: Math

Recent Activity
- liked a model, 1 day ago: ZhengPeng7/BiRefNet_HR
- liked a model, 1 day ago: physical-intelligence/fast
- upvoted an article, 1 day ago: π0 and π0-FAST: Vision-Language-Action Models for General Robot Control

Organizations

blanchon's activity
replied to dylanebert's post, 9 days ago:
"I really like the style of your 1 minute video. I still remember the one you did for 3DGS a long time ago."

reacted to dylanebert's post with 🔥, 9 days ago

Post
I made a 1 minute video explaining the DeepSeek situation
R1: deepseek-ai/DeepSeek-R1
Janus Pro: deepseek-ai/Janus-Pro-7B
reacted to hexgrad's post with 🔥, 27 days ago

Post
Looking for labeled, high-quality synthetic audio/TTS data! Have you been, or are you currently, calling API endpoints from OpenAI, ElevenLabs, etc.? Do you have labeled audio data sitting around gathering dust? Let's talk! Join https://discord.gg/QuGxSWBfQy or comment down below.
If your data exceeds quantity & quality thresholds and is approved into the next hexgrad/Kokoro-82M training mix, and you permissively DM me the data under an effective Apache license, then I will DM back the corresponding voicepacks for YOUR data if/when the next Apache-licensed Kokoro base model drops.
What does this mean? If you've been calling closed-source TTS or audio API endpoints to:
- Build voice agents
- Make long-form audio, like audiobooks or podcasts
- Handle customer support, etc
Then YOU can contribute to the training mix and get useful artifacts in return. ❤️
More details at hexgrad/Kokoro-82M#21
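For anyone unsure what counts as "labeled" here, the sketch below shows one way to log text/audio pairs while calling a closed-source TTS endpoint. The OpenAI client is only an example, and the file layout and metadata fields are my own assumptions rather than a format the post prescribes; exact SDK calls may differ between versions.

```python
# Hypothetical logging of labeled synthetic TTS data: each audio file is stored
# alongside the exact input text that produced it, plus provenance metadata.
import json
from pathlib import Path

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
out_dir = Path("tts_data")
out_dir.mkdir(exist_ok=True)

def synth_and_log(text: str, idx: int) -> None:
    audio_path = out_dir / f"{idx:06d}.mp3"
    resp = client.audio.speech.create(model="tts-1", voice="alloy", input=text)
    resp.stream_to_file(audio_path)  # write the synthesized audio to disk
    # The "label" is simply the input text plus provenance (model, voice).
    with open(out_dir / "metadata.jsonl", "a") as f:
        f.write(json.dumps({
            "file": audio_path.name,
            "text": text,
            "source": "openai/tts-1",
            "voice": "alloy",
        }) + "\n")

synth_and_log("Labeled synthetic speech sample number one.", 1)
```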
reacted to Xenova's post with 🔥❤️, about 2 months ago

Post
Introducing Moonshine Web: real-time speech recognition running 100% locally in your browser!
- Faster and more accurate than Whisper
- Privacy-focused (no data leaves your device)
- WebGPU accelerated (w/ WASM fallback)
- Powered by ONNX Runtime Web and Transformers.js
Demo: webml-community/moonshine-web
Source code: https://github.com/huggingface/transformers.js-examples/tree/main/moonshine-web
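The demo itself runs on Transformers.js with WebGPU/WASM, but if you want to poke at Moonshine from Python, something like the sketch below may work (the model id and pipeline support are assumptions on my part; check the demo's source for the authoritative setup):

```python
# Hedged sketch: server-side Moonshine inference via the transformers ASR pipeline.
# The browser demo above uses Transformers.js + ONNX Runtime Web instead of this.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="UsefulSensors/moonshine-tiny",  # assumed repo id
)
print(asr("sample.wav")["text"])  # transcribe a local audio file
```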
reacted to toshas's post with 🔥, about 2 months ago

Post
Introducing Marigold-DC, our training-free zero-shot approach to monocular Depth Completion with guided diffusion! If you have ever wondered how else a long denoising diffusion schedule can be useful, we have an answer for you!
Depth Completion addresses sparse, incomplete, or noisy measurements from photogrammetry or sensors like LiDAR. Sparse points aren't just hard for humans to interpret; they also hinder downstream tasks.
Traditionally, depth completion was framed as image-guided depth interpolation. We leverage Marigold, a diffusion-based monodepth model, to reframe it as sparse-depth-guided depth generation. How the turntables! Check out the paper anyway.
Website: https://marigolddepthcompletion.github.io/
Demo: prs-eth/marigold-dc
Paper: https://arxiv.org/abs/2412.13389
Code: https://github.com/prs-eth/marigold-dc
Team ETH Zürich: Massimiliano Viola ( @mviola ), Kevin Qu ( @KevinQu7 ), Nando Metzger ( @nandometzger ), Bingxin Ke ( @Bingxin ), Alexander Becker, Konrad Schindler, and Anton Obukhov ( @toshas ). We thank Hugging Face for their continuous support.
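To make "sparse-depth-guided depth generation" concrete, here is a schematic toy of the test-time guidance idea: at every denoising step the latent is nudged so the current depth estimate agrees with the sparse measurements. The denoiser below is a stand-in, not Marigold; see the paper and repo for the actual method.

```python
# Schematic toy of guided diffusion for depth completion (NOT the Marigold-DC code).
import torch

H, W, T = 64, 64, 50
sparse_mask = torch.zeros(H, W, dtype=torch.bool)
sparse_mask[torch.randint(0, H, (200,)), torch.randint(0, W, (200,))] = True
sparse_depth = torch.rand(H, W)  # toy LiDAR-style measurements

def denoiser(x_t, t):
    # Stand-in for the pretrained diffusion model's one-step clean-depth estimate.
    return x_t * (1.0 - t / T)

x = torch.randn(H, W, requires_grad=True)  # latent being denoised
opt = torch.optim.SGD([x], lr=0.1)

for t in reversed(range(T)):
    opt.zero_grad()
    depth_hat = denoiser(x, t)                                   # current clean estimate
    loss = ((depth_hat - sparse_depth)[sparse_mask] ** 2).mean()
    loss.backward()                                              # guidance: agree with sparse points
    opt.step()
    with torch.no_grad():
        x += 0.01 * (t / T) * torch.randn_like(x)                # continue the toy noise schedule
```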
reacted to Jaward's post with 🔥, about 2 months ago

Post
Implements a discrete flow matching model for code generation from first principles: trained a small 2D DFM model on two variations of binary-search code. The result was amazing; code in the comments:
Code: https://github.com/Jaykef/ai-algorithms/blob/main/dfm.ipynb
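As a rough intuition for what such a model trains on (this is my own toy, not the linked notebook): corrupt token sequences toward a mask distribution at a random time t, and train the network to predict the clean tokens with cross-entropy.

```python
# Heavily simplified discrete-flow-matching-style training step (toy, not the notebook).
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, MASK, SEQ, BATCH = 32, 0, 16, 8
model = nn.Sequential(
    nn.Embedding(VOCAB, 64),
    nn.Flatten(),
    nn.Linear(64 * SEQ, VOCAB * SEQ),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

def corrupt(x1, t):
    # Along the path: keep each data token with probability t, otherwise mask it.
    keep = torch.rand(x1.shape) < t
    return torch.where(keep, x1, torch.full_like(x1, MASK))

x1 = torch.randint(1, VOCAB, (BATCH, SEQ))   # toy "code" token sequences
t = torch.rand(BATCH, 1)                     # random time along the corruption path
xt = corrupt(x1, t)
logits = model(xt).view(BATCH, SEQ, VOCAB)
loss = F.cross_entropy(logits.reshape(-1, VOCAB), x1.reshape(-1))
loss.backward()
opt.step()                                   # model learns to recover clean tokens from x_t
```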
reacted to mikonvergence's post with ❤️, about 2 months ago

Post
New Release: Major TOM Digital Elevation Model Expansion
Dataset: Major-TOM/Core-DEM
Today with European Space Agency - ESA and Adobe Research, we release a global expansion to Major TOM with GLO-30 DEM data.
You can now instantly access nearly 2M Major TOM samples with elevation data to build your next AI model for EO.
Browse the data in our usual viewer app: Major-TOM/MajorTOM-Core-Viewer
Fantastic work championed by Paul Borne--Pons @NewtNewt
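A minimal sketch of pulling a few DEM samples with the datasets library (this assumes the repo's parquet shards load with the generic loader; column names vary, so inspect the first record rather than relying on any I might guess):

```python
# Hedged sketch: stream a couple of Major TOM DEM samples from the Hub.
from datasets import load_dataset

ds = load_dataset("Major-TOM/Core-DEM", split="train", streaming=True)
sample = next(iter(ds))
print(sample.keys())  # inspect the actual schema (grid cell id, DEM raster, etc.)
```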
reacted to mkluczek's post with 🔥, about 2 months ago

Post
First Global and Dense Open Embedding Dataset of Earth!
Introducing the Major TOM embeddings dataset, created in collaboration with CloudFerro S.A. and Φ-lab at the European Space Agency (ESA). Together with @mikonvergence and Jędrzej S. Bojanowski, we present the first open-access dataset of Copernicus embeddings, offering dense, global coverage across the full acquisition areas of Sentinel-1 and Sentinel-2 sensors.
Highlights:
- Data: Over 8 million Sentinel-1 & Sentinel-2 images processed, distilling insights from 9.368 trillion pixels of raw data.
- Models: Foundation models include SigLIP, DINOv2, and SSL4EO.
- Scale: 62 TB of raw satellite data processed into 170M+ embeddings.
This project delivers open and free vectorized expansions of Major-TOM/README datasets, setting a new standard for embedding releases and enabling lightweight, scalable ingestion of Earth Observation (EO) data for countless applications.
Explore the datasets:
Major-TOM/Core-S2L1C-SSL4EO
Major-TOM/Core-S1RTC-SSL4EO
Major-TOM/Core-S2RGB-DINOv2
Major-TOM/Core-S2RGB-SigLIP
Check the paper: Global and Dense Embeddings of Earth: Major TOM Floating in the Latent Space (arXiv:2412.05600)
Code notebook: https://github.com/ESA-PhiLab/Major-TOM/blob/main/05-Generate-Major-TOM-Embeddings.ipynb
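Since these are plain embedding vectors, one simple use is nearest-neighbour search over a slice of one of the datasets. The column name "embedding" and the streaming layout below are assumptions; the linked notebook is the authoritative reference.

```python
# Hedged sketch: cosine-similarity lookup over a small slice of the SigLIP embeddings.
import numpy as np
from datasets import load_dataset

ds = load_dataset("Major-TOM/Core-S2RGB-SigLIP", split="train", streaming=True)
rows = [r for _, r in zip(range(10_000), ds)]          # small slice for the demo
emb = np.asarray([r["embedding"] for r in rows], dtype=np.float32)
emb /= np.linalg.norm(emb, axis=1, keepdims=True)      # normalize for cosine similarity

query = emb[0]
scores = emb @ query
print("closest samples:", np.argsort(-scores)[:5])     # index 0 should rank first
```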
reacted to davidberenstein1957's post, 2 months ago

Post
The Data Is Better Together community is set to release the first Apache 2 licensed image preference dataset!
Great work and let's give this a final push :)
@aashish1904 congrats on your month of HF pro. There is more to win during this sprint!
@aashish1904 @AnyaDesdein @davidberenstein1957 @Malalatiana @beta3 @fffiloni @munish0838 @Reza2kn @bbunzeck @Creazycreator @andrei-saceleanu @jafhaponiuk @rca-etl @kf120 @burtenshaw @mmhamdy @grib0ed0v @Doopus @AnyaDes @ttkap @Xceron @Lewox @davanstrien @Azazelle @adirik @Ashish08 @AntonVic @kenantang @sdiazlor @g-ronimo @dennis-rall @prithivMLmods @girtss3 @flozi00 @WaveCut @Taylor658 @Wildminder @Sara9999 @phaelishall @sararob @dvilasuero @pgabrys @plaguss @CDS899 @timajwilliams @rudzinskimaciej @pavel-ai @aggr8 @ignacioct @MouseAI @Leeps @MaksKul @NicolasDmln @Muinez @kusht55 @caiolang @Jakub-Brand24 @loamy @Demijan @eliab96 @Viewegger @JosephCatrambone @p1atdev @mrshu @o639 @Targezed @Aviv-anthonnyolime @thliang01 @Ahmed-Amine @glards @pranaykoppula @nataliaElv @MaPirlet @alvarobartt @gabrielmbmb @zlicastro @Jaydip @Chouettecheveche @lilcheaty @ruyrdiaz @robintema @fdaudens @ggcristian @a-r-r-o-w @pates @joheras @stopsatgreen @bezo97 @chachi902 @iamyann @liamcripwell @dmb23 @korbih @anonymous7743 @akbdx18 @OVAWARE @severo @akontra @lichorosario @lhoestq @SebastianBodza @Vishnou @ameerazam08 @appoose @Mukei @mearco @joaquincabezas @Fizzarolli @thomastraum @igortopolski @OxxoCodes @patrickfleith @asoria @bn22 @sitammeur @Krodolf @bergr7f @Sbxxn @wietsevenema @sugatoray @Iamladi @MikeTrizna @feveromo @mokady @Bolero @prath @Dowwie @kfahn @decodingchris @alili2050 @RahulRaman @yzimmermann @Ameeeee @ecyht2 @MattMC001 @hemanthkumarak @Thegorgibus @akos2 @LawRun @ramithuh @SuperMuel @sjans @peterizsak @mosama @Eyel @mtr3 @cfahlgren1 @legentil @clem @Citaman @Aurelien-Morgan @AntoineBourgois @TotoB12 @Stanmey @osanseviero @multimodalart @maxiw @ariG23498 @ngk89 @femboysLover @dvs @tacohiddink @blanchon @DavidJimenez
reacted to reach-vb's post with 🔥, 4 months ago

Post
What a great day for Open Science! @AIatMeta released models, datasets, and code for many of its research artefacts! 🔥
1. Meta Segment Anything Model 2.1: An updated checkpoint with improved results on visually similar objects, small objects and occlusion handling. A new developer suite will be added to make it easier for developers to build with SAM 2.
Model checkpoints: reach-vb/sam-21-6702d40defe7611a8bafa881
2. Layer Skip: Inference code and fine-tuned checkpoints demonstrating a new method for enhancing LLM performance.
Model checkpoints: facebook/layerskip-666b25c50c8ae90e1965727a
3. SALSA: New code enables researchers to benchmark AI-based attacks to validate security for post-quantum cryptography.
Repo: https://github.com/facebookresearch/LWE-benchmarking
4. Meta Lingua: A lightweight and self-contained codebase designed to train language models at scale.
Repo: https://github.com/facebookresearch/lingua
5. Meta Open Materials: New open source models and the largest dataset to accelerate AI-driven discovery of new inorganic materials.
Model checkpoints: fairchem/OMAT24
6. MEXMA: A new research paper and code for our novel pre-trained cross-lingual sentence encoder covering 80 languages.
Model checkpoint: facebook/MEXMA
7. Self-Taught Evaluator: a new method for generating synthetic preference data to train reward models without relying on human annotations.
Model checkpoint: facebook/Self-taught-evaluator-llama3.1-70B
8. Meta Spirit LM: An open-source language model for seamless speech and text integration.
Repo: https://github.com/facebookresearch/spiritlm
reacted to AdinaY's post, 4 months ago

Post
China is advancing rapidly in AI technology while maintaining a strong focus on governance.
We've collected key AI governance documents released since 2017 and will continue updating them in this organization on the Hub: China LLMs on Hugging Face
zh-ai-community/china-ai-policy-research
Any feedback is welcome!
Amazing! About OMAT24: it could be cool to have a category/tag for materials-science datasets, because they are pretty hard to search.
reacted to reach-vb's post with 🔥, 4 months ago

Post
Multimodal Ichigo Llama 3.1 - Real-Time Voice AI 🔥
> WhisperSpeech X Llama 3.1 8B
> Trained on 50K hours of speech (7 languages)
> Continually trained on 45hrs 10x A1000s
> MLS -> WhisperVQ tokens -> Llama 3.1
> Instruction tuned on 1.89M samples
> 70% speech, 20% transcription, 10% text
> Apache 2.0 licensed ⚡
Architecture:
> WhisperSpeech/ VQ for Semantic Tokens
> Llama 3.1 8B Instruct for Text backbone
> Early fusion (Chameleon)
I'm super bullish on HomeBrew/ Jan and early fusion, audio and text, multimodal models!
(P.S. Play with the demo on Hugging Face: jan-hq/Ichigo-llama3.1-s-instruct)
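For readers new to "early fusion": the audio is first discretized (here via WhisperVQ) and those semantic tokens are interleaved with the text tokens in one sequence for the Llama backbone, rather than being fused late through a separate encoder. A toy illustration, with all ids and offsets made up:

```python
# Toy illustration of early fusion: one shared token stream for audio + text.
TEXT_VOCAB = 128_000           # e.g. size of the Llama 3.1 text vocabulary (assumed)
AUDIO_OFFSET = TEXT_VOCAB      # audio codebook ids are shifted past the text vocab

def fuse(audio_codes: list[int], text_ids: list[int]) -> list[int]:
    """Build a single early-fused sequence: audio semantic tokens, then text tokens."""
    audio_ids = [AUDIO_OFFSET + c for c in audio_codes]  # map codebook ids into the shared vocab
    return audio_ids + text_ids

# e.g. WhisperVQ codes for a spoken question followed by the text prompt tokens
print(fuse(audio_codes=[3, 15, 15, 8], text_ids=[1, 42, 7]))
```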