Evaluation on the Hub

Activity Feed

AI & ML interests

None defined yet.

autoevaluate's activity

sayakpaul
posted an update 7 days ago
We have been cooking a couple of fine-tuning runs on CogVideoX with finetrainers, smol datasets, and LoRA to generate cool video effects like crushing, dissolving, etc.

We are also releasing a utility to extract a LoRA from a fully fine-tuned checkpoint. I know that kind of thing has existed for ages, but the quality on video models is nothing short of spectacular. Below are some links (a sketch of the extraction idea follows them):

* Models and datasets: https://huggingface.co/finetrainers
* finetrainers: https://github.com/a-r-r-o-w/finetrainers
* LoRA extraction: https://github.com/huggingface/diffusers/blob/main/scripts/extract_lora_from_model.py
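
The extraction itself boils down to factorizing a weight delta. Here is a minimal sketch of that idea for a single layer, assuming plain torch tensors (`extract_lora` is an illustrative helper, not the linked script's API; the real script handles full diffusers checkpoints):

```python
import torch

def extract_lora(base_w: torch.Tensor, tuned_w: torch.Tensor, rank: int = 16):
    """Approximate (tuned_w - base_w) with a rank-`rank` product up @ down."""
    delta = (tuned_w - base_w).float()
    U, S, Vh = torch.linalg.svd(delta, full_matrices=False)
    sqrt_s = torch.sqrt(S[:rank])
    down = Vh[:rank] * sqrt_s[:, None]   # LoRA "A": (rank, in_features)
    up = U[:, :rank] * sqrt_s[None, :]   # LoRA "B": (out_features, rank)
    return down, up

# Toy check on one linear layer's weights:
base = torch.randn(512, 512)
tuned = base + 0.01 * torch.randn(512, 512)
down, up = extract_lora(base, tuned, rank=8)
print(torch.linalg.norm((tuned - base) - up @ down))  # low-rank fit residual
```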
dylanebert
posted an update 9 days ago
sayakpaul
posted an update 10 days ago
We have authored a post going over the state of video generation in the Diffusers ecosystem 🧨

We cover the supported models, the optimization knobs our users can turn, fine-tuning, and more 🔥

5-6 GB for HunyuanVideo; the sky is the limit 🌌 🤗
https://huggingface.co/blog/video_gen
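
As a rough sketch of what those knobs look like in practice, here is low-VRAM HunyuanVideo inference with offloading and VAE tiling (model ID and settings are illustrative; see the post for the exact numbers):

```python
import torch
from diffusers import HunyuanVideoPipeline

# Load in bf16, then trade speed for memory.
pipe = HunyuanVideoPipeline.from_pretrained(
    "hunyuanvideo-community/HunyuanVideo", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()  # keep only the active submodule on the GPU
pipe.vae.enable_tiling()         # decode video latents tile by tile

video = pipe(
    prompt="a cat walks on the grass, realistic style",
    num_frames=61,
    num_inference_steps=30,
).frames[0]
```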
lewtun
posted an update 12 days ago
We are reproducing the full DeepSeek R1 data and training pipeline so everybody can use their recipe. Instead of doing it in secret, we can do it together in the open!

🧪 Step 1: replicate the R1-Distill models by distilling a high-quality reasoning corpus from DeepSeek-R1 (sketched below).

🧠 Step 2: replicate the pure RL pipeline that DeepSeek used to create R1-Zero. This will involve curating new, large-scale datasets for math, reasoning, and code.

🔥 Step 3: show we can go from base model -> SFT -> RL via multi-stage training.

Follow along: https://github.com/huggingface/open-r1
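
A minimal sketch of what Step 1's distillation data generation could look like, assuming DeepSeek-R1 is served behind an OpenAI-compatible endpoint (the endpoint URL and prompt are placeholders, not the open-r1 code):

```python
from openai import OpenAI

# Placeholder: a vLLM (or similar) server hosting DeepSeek-R1.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

prompts = ["Prove that the sum of two odd integers is even."]
corpus = []
for prompt in prompts:
    completion = client.chat.completions.create(
        model="deepseek-ai/DeepSeek-R1",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.6,
    )
    # Keep the full trace (reasoning + final answer) as a distillation target.
    corpus.append(
        {"prompt": prompt, "completion": completion.choices[0].message.content}
    )
```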
dylanebert
posted an update 13 days ago
βš™οΈ Convert .ply to .splat

I've created a simple Space to convert .ply Gaussian splat files to the .splat format (conversion sketch below)

dylanebert/ply-to-splat
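
Under the hood the conversion is a straightforward repacking. A hedged sketch, assuming the common 3D Gaussian splatting PLY layout and the 32-byte-per-splat binary format used by popular web viewers (not necessarily what the Space runs):

```python
import numpy as np
from plyfile import PlyData  # pip install plyfile

SH_C0 = 0.28209479177387814  # 0th-order spherical harmonic constant

def ply_to_splat(ply_path: str, splat_path: str) -> None:
    v = PlyData.read(ply_path)["vertex"]
    pos = np.stack([v["x"], v["y"], v["z"]], axis=1).astype(np.float32)
    # Scales are stored as logs; opacity as a logit; color as the SH DC term.
    scale = np.exp(np.stack([v["scale_0"], v["scale_1"], v["scale_2"]], axis=1)).astype(np.float32)
    rgb = 0.5 + SH_C0 * np.stack([v["f_dc_0"], v["f_dc_1"], v["f_dc_2"]], axis=1)
    alpha = 1.0 / (1.0 + np.exp(-np.asarray(v["opacity"])))
    rgba = np.clip(np.concatenate([rgb, alpha[:, None]], axis=1) * 255, 0, 255).astype(np.uint8)
    rot = np.stack([v["rot_0"], v["rot_1"], v["rot_2"], v["rot_3"]], axis=1)
    rot = rot / np.linalg.norm(rot, axis=1, keepdims=True)
    rot = np.clip(rot * 128 + 128, 0, 255).astype(np.uint8)  # map [-1, 1] -> uint8
    with open(splat_path, "wb") as f:
        for p, s, c, r in zip(pos, scale, rgba, rot):
            # 32 bytes per gaussian: position, scale, RGBA, rotation.
            f.write(p.tobytes() + s.tobytes() + c.tobytes() + r.tobytes())
```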
dylanebert
posted an update 27 days ago
🟦 New Image-to-3D model from Stability AI

stabilityai/stable-point-aware-3d

Here's how it looks, with TRELLIS for comparison.
jeffboudier
posted an update about 1 month ago
NVIDIA just announced the Cosmos World Foundation Models, available on the Hub: nvidia/cosmos-6751e884dc10e013a0a0d8e6

Cosmos is a family of pre-trained models purpose-built for generating physics-aware videos and world states to advance physical AI development.
The release also includes tokenizers: nvidia/cosmos-tokenizer-672b93023add81b66a8ff8e6

Learn more in this great community article by @mingyuliutw and @PranjaliJoshi https://huggingface.co/blog/mingyuliutw/nvidia-cosmos
lewtun
posted an update about 1 month ago
I was initially pretty sceptical about Meta's Coconut paper [1] because the largest perf gains were reported on toy linguistic problems. However, these results on machine translation are pretty impressive!

https://x.com/casper_hansen_/status/1875872309996855343

Together with the recent PRIME method [2] for scaling RL, reasoning for open models is looking pretty exciting for 2025!

[1] Training Large Language Models to Reason in a Continuous Latent Space (2412.06769)
[2] https://huggingface.co/blog/ganqu/prime
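
For context, Coconut's core trick is to skip decoding and feed the model's last hidden state back in as the next input embedding. A toy sketch of that loop (not Meta's code; GPT-2 stands in because its hidden and embedding sizes match):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tok("2 + 2 =", return_tensors="pt")
embeds = model.get_input_embeddings()(inputs.input_ids)
for _ in range(3):  # k "continuous thought" steps, no tokens decoded
    out = model(inputs_embeds=embeds, output_hidden_states=True)
    thought = out.hidden_states[-1][:, -1:, :]  # last position's hidden state
    embeds = torch.cat([embeds, thought], dim=1)
# After the latent steps, ordinary token decoding can resume from `embeds`.
```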
lewtun
posted an update about 1 month ago
This paper (HuatuoGPT-o1: Towards Medical Complex Reasoning with LLMs, 2412.18925) has a really interesting recipe for inducing o1-like behaviour in Llama models:

* Iteratively sample CoTs from the model, using a mix of different search strategies. This gives you something like Stream of Search via prompting.
* Verify the correctness of each CoT using GPT-4o (needed because exact match doesn't work well in medicine, where there are lots of aliases); see the sketch below.
* Use GPT-4o to reformat the concatenated CoTs into a single stream that includes smooth transitions like "hmm, wait" etc. that one sees in o1.
* Use the resulting data for SFT & RL.
* Use sparse rewards from GPT-4o to guide RL training. They find SFT on this data already gives a strong improvement, and RL adds an average ~3-point boost across medical benchmarks.

Applying this strategy to other domains could be quite promising, provided the training data can be formulated with verifiable problems!
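
A hedged sketch of the two GPT-4o steps above (the prompts, function names, and YES/NO protocol are illustrative, not the paper's exact setup):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

def verify_cot(question: str, cot: str, reference: str) -> bool:
    """LLM-as-judge check; exact match fails on medical aliases."""
    judge = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": (
                f"Question: {question}\nReference answer: {reference}\n"
                f"Candidate reasoning: {cot}\n"
                "Does the candidate reach the reference answer? Reply YES or NO."
            ),
        }],
    )
    return judge.choices[0].message.content.strip().upper().startswith("YES")

def stitch_cots(cots: list[str]) -> str:
    """Reformat several search traces into one o1-style monologue."""
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": (
                "Rewrite these reasoning attempts as one continuous monologue, "
                "with natural transitions like 'hmm, wait' between attempts:\n\n"
                + "\n---\n".join(cots)
            ),
        }],
    )
    return resp.choices[0].message.content
```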
sayakpaul
posted an update about 1 month ago
Commits speak louder than words 🤪

* 4 new video models
* Multiple image models, including SANA & Flux Control
* New quantizers -> GGUF & TorchAO (TorchAO sketch below)
* New training scripts

Enjoy this holiday-special Diffusers release 🤗
Notes: https://github.com/huggingface/diffusers/releases/tag/v0.32.0
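
As a taste of the new quantization backends, here is a sketch of int8 weight-only quantization of the Flux transformer via TorchAO, following the release notes (quant type and model choice are illustrative):

```python
import torch
from diffusers import FluxPipeline, FluxTransformer2DModel, TorchAoConfig

# Quantize the heaviest component (the transformer) to int8 weight-only.
transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    quantization_config=TorchAoConfig("int8wo"),
    torch_dtype=torch.bfloat16,
)
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipe("a tiny astronaut hatching from an egg on the moon").images[0]
image.save("astronaut.png")
```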
sayakpaul
posted an update about 2 months ago
In the past seven days, the Diffusers team has shipped:

1. Two new video models
2. One new image model
3. Two new quantization backends
4. Three new fine-tuning scripts
5. Multiple fixes and library QoL improvements

Coffee on me if someone can guess 1-4 correctly.
lewtun
posted an update about 2 months ago
We outperform Llama 70B with Llama 3B on hard math by scaling test-time compute 🔥

How? By combining step-wise reward models with tree search algorithms :)

We show that smol models can match or exceed the performance of their much larger siblings when given enough "time to think"

We're open-sourcing the full recipe and sharing a detailed blog post.

In our blog post we cover:

📈 Compute-optimal scaling: How we implemented DeepMind's recipe to boost the mathematical capabilities of open models at test-time.

🎄 Diverse Verifier Tree Search (DVTS): An unpublished extension we developed to the verifier-guided tree search technique. This simple yet effective method improves diversity and delivers better performance, particularly at large test-time compute budgets.

🧭 Search and Learn: A lightweight toolkit for implementing search strategies with LLMs, built for speed with vLLM.

Here are the links:

- Blog post: HuggingFaceH4/blogpost-scaling-test-time-compute

- Code: https://github.com/huggingface/search-and-learn

Enjoy!
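
The simplest verifier-guided baseline in the post, weighted best-of-N with a process reward model, is easy to sketch (the PRM scores here are made-up floats; the real implementations live in search-and-learn):

```python
from collections import defaultdict

def weighted_best_of_n(candidates: list[tuple[str, float]]) -> str:
    """candidates: (final_answer, PRM score) pairs sampled from the policy.

    Sum verifier scores per distinct answer and return the best-scoring one,
    i.e. a verifier-weighted majority vote.
    """
    totals = defaultdict(float)
    for answer, score in candidates:
        totals[answer] += score
    return max(totals, key=totals.get)

# N=4 sampled solutions to one math problem (scores are illustrative):
samples = [("42", 0.91), ("41", 0.40), ("42", 0.78), ("7", 0.15)]
print(weighted_best_of_n(samples))  # -> "42"
```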
dylanebert
posted an update about 2 months ago
TRELLIS is now the highest-ranked open-source model on the 3D Arena leaderboard, surpassing InstantMesh.

dylanebert/3d-arena
lhoestq
posted an update about 2 months ago
Made an HF dataset editor à la Google Sheets here: lhoestq/dataset-spreadsheets

With Dataset Spreadsheets:
✏️ Edit datasets in the UI
🔗 Share a link with collaborators
🐍 Use locally in DuckDB or Python (sketch below)

Available for the 100,000+ Parquet datasets on HF :)
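
For the "use locally" part, DuckDB can read Hub-hosted Parquet files directly over the hf:// protocol. A minimal sketch (the dataset path is a placeholder):

```python
import duckdb

# Point the glob at any Parquet dataset on the Hub.
df = duckdb.sql(
    "SELECT * FROM 'hf://datasets/username/my_dataset/**/*.parquet' LIMIT 10"
).df()
print(df)
```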
sayakpaul
posted an update about 2 months ago
Introducing a high-quality open preference dataset to further this line of research for image generation.

Despite being such an integral component of modern image generation, open preference datasets are a rarity!

So, we decided to work on one with the community!

Check it out here:
https://huggingface.co/blog/image-preferences
thomwolf
posted an update about 2 months ago
We are proud to announce HuggingFaceFW/fineweb-2: A sparkling update to HuggingFaceFW/fineweb with 1000s of 🗣️ languages.

We applied the same data-driven approach that led to SOTA English performance in 🍷 FineWeb to thousands of languages.

🥂 FineWeb2 has 8 TB of compressed text data and outperforms other multilingual datasets in our experiments.

The dataset is released under the permissive 📜 ODC-By 1.0 license, and the 💻 code to reproduce it and our evaluations is public.

We will very soon announce a big community project, and are working on a 📝 blog post walking you through the entire dataset creation process. Stay tuned!

In the meantime, come ask us questions in our community discussion: HuggingFaceFW/discussion

H/t @guipenedo @hynky @lvwerra as well as @vsabolcec Bettina Messmer @negar-foroutan and @mjaggi
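
A quick way to poke at the data is to stream one language subset with 🤗 Datasets (the config name below assumes the ISO 639-3 + script convention from the dataset card, e.g. "fra_Latn" for French):

```python
from datasets import load_dataset

fw2 = load_dataset(
    "HuggingFaceFW/fineweb-2",
    name="fra_Latn",  # assumed config naming: ISO 639-3 code + script
    split="train",
    streaming=True,
)
for doc in fw2.take(3):
    print(doc["text"][:200])
```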