With the big hype around AI agents these days, I couldn't stop thinking about how AI agents could truly enhance real-world activities. What sort of applications could we build with those AI agents: agentic RAG? Self-correcting text-to-SQL? Nah, boring…
Passionate about the outdoors, I've always dreamed of a tool that could simplify planning mountain trips while accounting for all potential risks. That's why I built Alpine Agent, a smart assistant designed to help you plan safe and enjoyable itineraries in the French Alps and Pyrenees.
Built using Hugging Face's smolagents library, Alpine Agent combines the power of AI with trusted resources like Skitour.fr (https://skitour.fr/) and METEO FRANCE. Whether it's suggesting a route of moderate difficulty or analyzing avalanche risks and weather conditions, the agent dynamically integrates data to deliver personalized recommendations.
In my latest blog post, I share how I developed this project, from defining tools and integrating APIs to selecting the best LLMs like Qwen2.5-Coder-32B-Instruct, Llama-3.3-70B-Instruct, or GPT-4.
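For a taste of what that looks like, here is a minimal smolagents sketch; the avalanche-risk tool and its return value are hypothetical stand-ins for illustration, not the app's actual code:

```python
from smolagents import CodeAgent, HfApiModel, tool

@tool
def get_avalanche_risk(massif: str) -> str:
    """Returns the current avalanche risk level for a mountain massif.

    Args:
        massif: Name of the massif, e.g. "Chamonix-Mont-Blanc".
    """
    # Hypothetical placeholder: the real app queries live weather/avalanche data
    return f"Avalanche risk for {massif}: 3/5 (considerable)"

agent = CodeAgent(
    tools=[get_avalanche_risk],
    model=HfApiModel("Qwen/Qwen2.5-Coder-32B-Instruct"),
)
agent.run("Suggest a safe ski touring itinerary near Chamonix for tomorrow.")
```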
⛷️ Curious how AI can enhance adventure planning? Try the app and share your thoughts: florentgbelidji/alpine-agent
Want to build your own agents? Whether for cooking, sports training, or other passions, the possibilities are endless. Check out the blog post to learn more: https://huggingface.co/blog/florentgbelidji/alpine-agent
Many thanks to @m-ric for helping build this tool with smolagents!
reacted to MoritzLaurer's post with 🔥 30 days ago
Releasing a new zeroshot classifier based on ModernBERT! Some key takeaways:
- ⚡ Speed & efficiency: It's multiple times faster and uses significantly less memory than DeBERTav3. You can use larger batch sizes, and enabling bf16 (instead of fp16) gave me a ~2x speed boost as well.
- 📉 Performance tradeoff: It performs slightly worse than DeBERTav3 on average across my zeroshot classification task collection.
- 🧠 Use cases: I recommend using it for scenarios requiring speed and a larger context window (8k).
- 💡 What's next? I'm preparing a newer version trained on better + longer synthetic data to fully leverage the 8k context window and improve upon the training mix of my older zeroshot-v2.0 models. I also hope that there will be a multilingual variant in the future.
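Here's a quick way to try it with the standard transformers zero-shot pipeline; the checkpoint id below is my assumption of the release name, so swap in the actual one:

```python
import torch
from transformers import pipeline

# Checkpoint id is assumed, not confirmed by the post; replace with the released one
classifier = pipeline(
    "zero-shot-classification",
    model="MoritzLaurer/ModernBERT-large-zeroshot-v2.0",
    torch_dtype=torch.bfloat16,  # bf16 gave the author a ~2x speed boost
)

result = classifier(
    "The new GPU driver crashes my machine on boot.",
    candidate_labels=["software bug", "hardware issue", "billing question"],
)
print(result["labels"][0], result["scores"][0])  # top label and its score
```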
Since I published it on GitHub a few days ago, Hugging Face's new agentic library smolagents has gathered nearly 4k stars 🤯
➡️ But we are just getting started on agents, so we are hiring an ML Engineer to join me and double down on this effort!
The plan is to build GUI agents: agents that can act on your computer with mouse & keyboard, like Claude Computer Use.
New sampling strategy dropped in 🤗 transformers: Min P sampling 🔥
Are you tired of top_k arbitrarily discarding high-quality continuations? Or of top_p forgetting to exclude low-probability tokens, derailing your generation? Try out the new min_p flag in generate, fresh from a PR merged today! 🥬
Min P is a dynamic token filter. Top K keeps the K most likely tokens and Top P keeps the most likely tokens up to a fixed cumulative probability; both are static filters. Min P instead takes a base probability (defined in the min_p flag) and multiplies it by the probability of the most likely token in the next-token distribution; all tokens less likely than the resulting value are filtered out. What happens with this strategy?
- High-probability token present -> aggressive filter (we don't want to miss that high-probability case and risk derailing generation)
- No high-probability token present -> relaxed filter (there are many continuations that the model finds plausible)
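In code, the rule boils down to a few lines; this is a sketch of the idea, not transformers' internal implementation:

```python
import torch

def min_p_filter(logits: torch.Tensor, min_p: float = 0.05) -> torch.Tensor:
    # Illustrative sketch of the Min P rule, not the library's internal code
    probs = torch.softmax(logits, dim=-1)
    top_prob = probs.max(dim=-1, keepdim=True).values
    threshold = min_p * top_prob  # dynamic: scales with the most likely token
    # Tokens below the dynamic threshold are excluded from sampling
    return logits.masked_fill(probs < threshold, float("-inf"))
```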
You should set min_p to a low value, between 0.05 and 0.1. It behaves particularly well for creative text generation when paired with temperature > 1.
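For example (the model choice here is just a placeholder):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# "gpt2" is only an example; any causal LM supported by generate works
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(
    **inputs,
    do_sample=True,    # min_p only has an effect when sampling
    min_p=0.05,        # recommended range: 0.05-0.1
    temperature=1.2,   # min_p pairs well with temperature > 1
    max_new_tokens=50,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```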
Interesting paper: "GaLore: train 7B models on consumer-grade GPUs" 💪 It's now possible to fully pre-train a 7B model on a consumer-grade GPU with 24 GB of RAM, without any performance loss!
The memory usage of training models has always been an acute issue. For instance, full pre-training of a 7B model used to eat ~50 GB of RAM!
The common workarounds to reduce memory load are:
- separate models on multiple GPUs ("sharding")
- quantize models: encode weights on fewer bits
Another technique is to project the weight matrix to lower-rank spaces (since sometimes the weights do not really vary along all dimensions): this can save a lot of space! This low-rank projection can be done on adapters to preserve the original weights (go check out LoRA), but it still generally hurts the performance too much for pre-training.
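As a quick refresher on that adapter idea, here is a minimal LoRA-style layer; names and rank are illustrative, not any particular library's API:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen weight W plus a trainable low-rank update B @ A,
    so only r * (in + out) parameters are trained."""

    def __init__(self, in_features: int, out_features: int, r: int = 8):
        super().__init__()
        self.weight = nn.Parameter(
            torch.randn(out_features, in_features), requires_grad=False
        )
        self.A = nn.Parameter(torch.randn(r, in_features) * 0.01)  # low-rank factor
        self.B = nn.Parameter(torch.zeros(out_features, r))  # zero init: no initial change

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Effective weight is W + B @ A; only A and B receive gradients
        return x @ (self.weight + self.B @ self.A).T
```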
➡️ Enter the authors of "GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection". They gather (and prove) interesting insights:
❌ The weight matrix does not reliably converge to lower ranks during training.
✅ But the gradient matrix does!
Based on these insights, they build GaLore, which projects the gradient to lower ranks.
🗺️ Great idea: to leave the optimization free to explore more space, they periodically re-build the low-rank projection throughout the training (a nice illustration is in the paper).
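For intuition, here is a rough sketch of a GaLore-style update for one weight matrix; it is simplified from the paper's algorithm (the Adam statistics on the projected gradient are omitted, and names and hyperparameters are mine, not the authors' code):

```python
import torch

def galore_step(weight, grad, state, step, rank=128, update_gap=200, lr=1e-3):
    """One simplified GaLore-style update for a single 2D weight matrix.

    `state` carries the projection matrix P between calls; a real
    implementation would run Adam on the projected gradient."""
    # Periodically recompute the projector from the gradient's SVD,
    # letting the optimization explore new low-rank subspaces
    if step % update_gap == 0 or "P" not in state:
        U, _, _ = torch.linalg.svd(grad, full_matrices=False)
        state["P"] = U[:, :rank]  # top-r left singular vectors

    P = state["P"]
    low_rank_grad = P.T @ grad   # r x n instead of m x n: small optimizer state
    update = P @ low_rank_grad   # project back to the full weight shape
    weight -= lr * update
```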
🤝 This method can even be combined with previous ones such as 8-bit Adam (quantizing the optimizer states to 8 bits).
➡️ Results:
📉 Of course, a huge reduction in memory footprint, allowing training on a consumer-grade GPU (cf. figure).
💪 No reduction in performance: this scales well up to 7B parameters (and was independently confirmed since). This is essential: it confirms that the method is viable!
@HugoLaurencon, @Leyo & @VictorSanh are introducing HuggingFaceM4/WebSight, a multimodal dataset featuring 823,000 pairs of synthetically generated HTML/CSS code and screenshots of the corresponding rendered websites, to train GPT4-V-like models.
While crafting their upcoming foundation vision-language model, they faced the challenge of converting website screenshots into usable HTML/CSS code. Most VLMs suck at this, and there was no public dataset available for this specific task, so they decided to create their own.
They prompted existing LLMs to generate 823k HTML/CSS files for very simple websites. Through supervised fine-tuning of a vision-language model on WebSight, they were able to generate the code to reproduce a website component, given a screenshot.
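If you want to poke at the data, it loads like any Hub dataset; field and config names are whatever the dataset card specifies, so the snippet below just inspects them:

```python
from datasets import load_dataset

# Stream to avoid downloading all 823k pairs up front;
# a config name may be required depending on the dataset version
ds = load_dataset("HuggingFaceM4/WebSight", split="train", streaming=True)

sample = next(iter(ds))
print(sample.keys())  # expect the rendered screenshot and its HTML/CSS source
```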