SLLHF

Activity Feed

AI & ML interests

None defined yet.

Recent Activity

victor 
posted an update 2 days ago
Hey everyone, we've given https://hf.co/spaces page a fresh update!

Smart Search: Now just type what you want to do—like "make a viral meme" or "generate music"—and our search gets it.

New Categories: Check out the cool new filter bar with icons to help you pick a category fast.

Redesigned Space Cards: Reworked a bit to really show off the app descriptions, so you know what each Space does at a glance.

Random Prompt: Need ideas? Hit the dice button for a burst of inspiration.

We’d love to hear what you think—drop us some feedback plz!
m-ric 
posted an update 2 days ago
Introducing 𝗼𝗽𝗲𝗻 𝗗𝗲𝗲𝗽-𝗥𝗲𝘀𝗲𝗮𝗿𝗰𝗵 by Hugging Face! 💥

OpenAI's latest agentic app Deep Research seems really good... But it's closed, as usual.

⏱️ So with a team of cracked colleagues, we set ourselves a 24-hour deadline to replicate and open-source Deep Research! ⏱️

➡️ We built open-Deep-Research, an entirely open agent that can: navigate the web autonomously, scroll and search through pages, download and manipulate files, run calculations on data...
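
(Not the actual open-Deep-Research code, but for a taste of the building blocks, here's a hedged mini-example of a web-capable agent in smolagents, the library it's built on; the model and task are illustrative.)

from smolagents import CodeAgent, DuckDuckGoSearchTool, VisitWebpageTool, HfApiModel

# A minimal web-capable agent: search the web, open pages, reason in code
agent = CodeAgent(
    tools=[DuckDuckGoSearchTool(), VisitWebpageTool()],
    model=HfApiModel(),  # default Hugging Face Inference API model
)
agent.run("Find the latest release version of the smolagents library.")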

We aimed for top performance: are the agent's answers actually rigorous?

On the GAIA benchmark, OpenAI's Deep Research scored 67% accuracy on the validation set.
➡️ Our open Deep Research reaches 55% (powered by o1), making it:
- the best pass@1 solution submitted
- the best open solution 💪💪

And it's only getting started! Please jump in, drop PRs, and let's bring it to the top!

Read the blog post 👉 https://huggingface.co/blog/open-deep-research
m-ric 
posted an update 6 days ago
Now you can launch a code agent directly from your terminal!
✨ 𝚜𝚖𝚘𝚕𝚊𝚐𝚎𝚗𝚝 "𝚈𝚘𝚞𝚛 𝚝𝚊𝚜𝚔" directly launches a CodeAgent
▶️ This also works with web agents (replace 𝚜𝚖𝚘𝚕𝚊𝚐𝚎𝚗𝚝 with 𝚠𝚎𝚋𝚊𝚐𝚎𝚗𝚝) thanks to @merve!

💾 Another treat from smolagents release 1.7.0:
Now agents have a memory mechanism, enabling many possibilities like replaying the last run with 𝚊𝚐𝚎𝚗𝚝.𝚛𝚎𝚙𝚕𝚊𝚢(), thank you @clefourrier!
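
A hedged mini-example of the replay feature (the model class and task are illustrative):

from smolagents import CodeAgent, HfApiModel

agent = CodeAgent(tools=[], model=HfApiModel())
agent.run("What is the 20th Fibonacci number?")

# Step through the last run again, without re-calling the model
agent.replay()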

Check the release notes here 👉 https://github.com/huggingface/smolagents/releases/tag/v1.7.0
m-ric 
posted an update 9 days ago
𝗧𝗵𝗲 𝗛𝘂𝗯 𝘄𝗲𝗹𝗰𝗼𝗺𝗲𝘀 𝗲𝘅𝘁𝗲𝗿𝗻𝗮𝗹 𝗶𝗻𝗳𝗲𝗿𝗲𝗻𝗰𝗲 𝗽𝗿𝗼𝘃𝗶𝗱𝗲𝗿𝘀!

✅ Hosting our own inference was not enough: the Hub now welcomes 4 new inference providers: fal, Replicate, SambaNova Systems, & Together AI.

Check model cards on the Hub: you can now use inference from these providers in 1 click (see the video demo).

Their inference can also be used through our Inference API client. There, you can use either your own provider key or your HF token; with the HF token, billing is handled directly on your HF account, centralizing all your expenses.
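
For instance, here's a hedged sketch with huggingface_hub's InferenceClient (the provider and model ID are illustrative):

from huggingface_hub import InferenceClient

# Route the request through an external provider; with an HF token,
# billing lands on your HF account.
client = InferenceClient(provider="together", api_key="hf_xxx")

completion = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(completion.choices[0].message.content)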

💸 Also, PRO users get $2 of inference credits per month!

Read more in the announcement 👉 https://huggingface.co/blog/inference-providers
victor 
posted an update 9 days ago
Finally, an open-source AI that turns your lyrics into full songs is here—meet YuE! Unlike other tools that only create short clips, YuE can make entire songs (up to 5 minutes) with vocals, melody, and instruments all working together. Letsss go!

m-a-p/YuE-s1-7B-anneal-en-cot
m-ric 
posted an update 13 days ago
Today we make the biggest release in smolagents so far: 𝘄𝗲 𝗲𝗻𝗮𝗯𝗹𝗲 𝘃𝗶𝘀𝗶𝗼𝗻 𝗺𝗼𝗱𝗲𝗹𝘀, 𝘄𝗵𝗶𝗰𝗵 𝗮𝗹𝗹𝗼𝘄𝘀 𝘆𝗼𝘂 𝘁𝗼 𝗯𝘂𝗶𝗹𝗱 𝗽𝗼𝘄𝗲𝗿𝗳𝘂𝗹 𝘄𝗲𝗯 𝗯𝗿𝗼𝘄𝘀𝗶𝗻𝗴 𝗮𝗴𝗲𝗻𝘁𝘀! 🥳

Our agents can now casually open up a web browser, and navigate on it by scrolling, clicking elements on the webpage, going back, just like a user would.

The demo below shows Claude-3.5-Sonnet browsing GitHub for the task: "Find how many commits the author of the current top trending repo made over the last year."
Hi @mlabonne!

Go try it out, it's the most cracked agentic stuff I've seen in a while 🤯 (well, along with OpenAI's Operator, which beat us by one day)

For more detail, read our announcement blog 👉 https://huggingface.co/blog/smolagents-can-see
The code for the web browser example is here 👉 https://github.com/huggingface/smolagents/blob/main/examples/vlm_web_browser.py
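
If you just want a taste of the new vision input without the full browser setup, here's a hedged mini-sketch (the model choice and screenshot path are illustrative):

from PIL import Image
from smolagents import CodeAgent, LiteLLMModel

model = LiteLLMModel(model_id="anthropic/claude-3-5-sonnet-latest")
agent = CodeAgent(tools=[], model=model)

# Images passed at run time are fed to the agent's vision model
screenshot = Image.open("screenshot.png")
agent.run("Describe what is shown on this webpage.", images=[screenshot])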
Xenova 
posted an update 21 days ago
Introducing Kokoro.js, a new JavaScript library for running Kokoro TTS, an 82-million-parameter text-to-speech model, 100% locally in the browser w/ WASM. Powered by 🤗 Transformers.js. WebGPU support coming soon!
👉 npm i kokoro-js 👈

Try it out yourself: webml-community/kokoro-web
Link to models/samples: onnx-community/Kokoro-82M-ONNX

You can get started in just a few lines of code!
import { KokoroTTS } from "kokoro-js";

// Load the model (q8 = 8-bit quantized weights)
const tts = await KokoroTTS.from_pretrained(
  "onnx-community/Kokoro-82M-ONNX",
  { dtype: "q8" }, // Options: "fp32", "fp16", "q8", "q4", "q4f16"
);

// Generate speech for a given voice, then save it as a WAV file
const text = "Life is like a box of chocolates. You never know what you're gonna get.";
const audio = await tts.generate(text, {
  voice: "af_sky", // See `tts.list_voices()` for all available voices
});
audio.save("audio.wav");

Huge kudos to the Kokoro TTS community, especially taylorchu for the ONNX exports and Hexgrad for the amazing project! None of this would be possible without you all! 🤗

The model is also extremely resilient to quantization. The smallest variant is only 86 MB in size (down from the original 326 MB), with no noticeable difference in audio quality! 🤯
m-ric 
posted an update 21 days ago
𝗠𝗶𝗻𝗶𝗠𝗮𝘅'𝘀 𝗻𝗲𝘄 𝗠𝗼𝗘 𝗟𝗟𝗠 𝗿𝗲𝗮𝗰𝗵𝗲𝘀 𝗖𝗹𝗮𝘂𝗱𝗲-𝗦𝗼𝗻𝗻𝗲𝘁 𝗹𝗲𝘃𝗲𝗹 𝘄𝗶𝘁𝗵 𝟰𝗠 𝘁𝗼𝗸𝗲𝗻𝘀 𝗰𝗼𝗻𝘁𝗲𝘅𝘁 𝗹𝗲𝗻𝗴𝘁𝗵 💥

This work from Chinese startup @MiniMax-AI introduces a novel architecture that achieves state-of-the-art performance while handling context windows up to 4 million tokens - roughly 20x longer than current models. The key was combining lightning attention, mixture of experts (MoE), and a careful hybrid approach.

𝗞𝗲𝘆 𝗶𝗻𝘀𝗶𝗴𝗵𝘁𝘀:

🏗️ MoE with novel hybrid attention:
‣ Mixture of Experts with 456B total parameters (45.9B activated per token)
‣ Combines Lightning attention (linear complexity) for most layers with traditional softmax attention every 8 layers (see the sketch just after this list)

🏆 Outperforms leading models across benchmarks while offering vastly longer context:
‣ Competitive with GPT-4/Claude-3.5-Sonnet on most tasks
‣ Can efficiently handle 4M token contexts (vs 256K for most other LLMs)

🔬 Technical innovations enable efficient scaling:
‣ Novel expert parallel and tensor parallel strategies cut communication overhead in half
‣ Improved linear attention sequence parallelism, multi-level padding and other optimizations achieve 75% GPU utilization (that's really high, generally utilization is around 50%)

🎯 Thorough training strategy:
‣ Careful data curation and quality control by using a smaller preliminary version of their LLM as a judge!
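
To make the hybrid attention pattern concrete, here's an illustrative stand-in (not MiniMax's code): linear-complexity lightning attention everywhere, except full softmax attention at every 8th layer.

def attention_schedule(num_layers: int = 80) -> list[str]:
    # Every 8th layer uses full softmax attention; the rest use
    # linear-complexity "lightning" attention.
    return [
        "softmax" if (i + 1) % 8 == 0 else "lightning"
        for i in range(num_layers)
    ]

print(attention_schedule(8))
# ['lightning', 'lightning', 'lightning', 'lightning', 'lightning', 'lightning', 'lightning', 'softmax']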

Overall, not only is the model impressive, but the technical paper is also really interesting! 📝
It has lots of insights including a great comparison showing how a 2B MoE (24B total) far outperforms a 7B model for the same amount of FLOPs.

Read it in full here 👉 MiniMax-01: Scaling Foundation Models with Lightning Attention (2501.08313)
Model here, commercial use allowed for under 100M monthly users 👉 MiniMaxAI/MiniMax-Text-01
m-ric 
posted an update 22 days ago
𝗪𝗲'𝘃𝗲 𝗷𝘂𝘀𝘁 𝗿𝗲𝗹𝗲𝗮𝘀𝗲𝗱 𝘀𝗺𝗼𝗹𝗮𝗴𝗲𝗻𝘁𝘀 𝘃𝟭.𝟯.𝟬 🚀, and it comes with a major feature: you can now log agent runs using OpenTelemetry to inspect them afterwards! 📊

This interactive format is, IMO, much easier for inspecting big multi-step runs than endless console logs.

The setup is very easy: just a few lines of code.
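
Here's a hedged sketch, roughly following the tutorial linked below (it assumes the openinference smolagents instrumentor and a local collector, e.g. Phoenix, on port 6006):

from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from openinference.instrumentation.smolagents import SmolagentsInstrumentor

# Export agent-run spans to the local collector
endpoint = "http://0.0.0.0:6006/v1/traces"
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter(endpoint)))

# From here on, every smolagents run is traced automatically
SmolagentsInstrumentor().instrument(tracer_provider=provider)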

Find a tutorial here 👉 https://huggingface.co/docs/smolagents/tutorials/inspect_runs
m-ric 
posted an update 25 days ago
𝗢𝗦-𝗚𝗲𝗻𝗲𝘀𝗶𝘀: 𝗻𝗲𝘄 𝗿𝗲𝘀𝗲𝗮𝗿𝗰𝗵 𝗽𝗮𝗽𝗲𝗿 𝗽𝗿𝗼𝗽𝗼𝘀𝗲𝘀 𝗮 𝗻𝗼𝘃𝗲𝗹 𝘁𝗿𝗮𝗶𝗻𝗶𝗻𝗴 𝗱𝗮𝘁𝗮 𝗴𝗲𝗻𝗲𝗿𝗮𝘁𝗶𝗼𝗻 𝗺𝗲𝘁𝗵𝗼𝗱 𝗳𝗼𝗿 𝗖𝗹𝗮𝘂𝗱𝗲-𝗖𝗼𝗺𝗽𝘂𝘁𝗲𝗿-𝗨𝘀𝗲-𝗹𝗶𝗸𝗲 𝗮𝗴𝗲𝗻𝘁𝘀, 𝘄𝗶𝘁𝗵 𝗶𝗺𝗽𝗿𝗲𝘀𝘀𝗶𝘃𝗲 𝗿𝗲𝘀𝘂𝗹𝘁𝘀! 🔥

The main bottleneck in building GUI agents is finding training data.
GUI agent trajectories are not easy to come by. Crowdsourcing trajectories and then manually annotating them could be an option, but it's hard to do at scale.

You could use synthetic data generation (ask thousands of small existing GUI agents to solve tasks, and keep only the successful runs). But then it's hard to come up with many high-level tasks.

➡️ Well, a new technique was just published that opens a promising paradigm for synthetic data generation: Shanghai AI Lab researchers propose OS-Genesis, a novel way to create training data for GUI agents that flips the traditional approach on its head. Instead of starting with predefined tasks and having humans or machines execute them, OS-Genesis first explores the interface naturally, then derives meaningful tasks from those interactions.

🔍 Exploration-driven vs task-driven approach:
‣ Instead of starting with tasks, OS-Genesis first explores GUIs by clicking and interacting
‣ It then reverse-engineers high-level tasks from successful interaction patterns
‣ This leads to more natural and diverse training data than predefined tasks

🎯 Novel reward model for trajectory quality:
‣ Rather than discarding incomplete trajectories, OS-Genesis scores them based on coherence and completion
‣ This preserves valuable partial successes that would otherwise be wasted

🏆 Superior results across environments:
‣ Nearly doubles performance on AndroidWorld (9.8% → 17.4%)

By the way, this field of GUI agents is still in its infancy, so you can still make a difference with "low-cost" setups: their paper gets SOTA results with only 8×A100 GPUs!
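
In toy code, the flipped paradigm looks roughly like this (every function is a stub standing in for the paper's components, not the authors' implementation):

import random

def explore(gui_env):
    """Interact with the GUI freely, with no predefined task."""
    return [f"click(element_{i})" for i in range(random.randint(2, 5))]

def reverse_engineer_task(trajectory):
    """Derive a high-level task description from the interactions."""
    return f"a task covering {len(trajectory)} steps"

def trajectory_reward(task, trajectory):
    """Score coherence/completion instead of discarding partial runs."""
    return random.random()

def build_dataset(gui_env, episodes=10):
    data = []
    for _ in range(episodes):
        traj = explore(gui_env)                # 1. explore first
        task = reverse_engineer_task(traj)     # 2. derive the task second
        score = trajectory_reward(task, traj)  # 3. keep scored trajectories
        data.append({"task": task, "trajectory": traj, "score": score})
    return data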

Read the paper here 👉 OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis (2412.19723)
m-ric 
posted an update about 1 month ago
Since I published it on GitHub a few days ago, Hugging Face's new agentic library 𝘀𝗺𝗼𝗹𝗮𝗴𝗲𝗻𝘁𝘀 has gathered nearly 4k stars 🤯

➡️ But we are just getting started on agents, so we are hiring an ML Engineer to join me and double down on this effort!

The plan is to build GUI agents: agents that can act on your computer with mouse & keyboard, like Claude Computer Use.

We will make it work better, and fully open. ✨

Sounds like something you'd like to do? Apply here 👉 https://apply.workable.com/huggingface/j/AF1D4E3FEB/
Xenova 
posted an update about 1 month ago