Nathan Habib

SaylorTwift

AI & ML interests

None yet

Organizations

Hugging Face, Evaluation datasets, HuggingFaceGECLM, BigCode, Hugging Face H4, BigCode Data, Hugging Face Smol Cluster, Open LLM Leaderboard, huggingPartyParis, Qwen, gg-hf, Nanotron Research, HuggingFaceFW, HF-contamination-detection, Top Contributors: Dataset Downloads, hsramall, La Leaderboard, gg-tt, HuggingFaceEval, Novel Challenge, LLHF, SLLHF, lbhf, Lighteval testing org, Coordination Nationale pour l'IA, open-llm-leaderboard-react, Prompt Leaderboard, Open R1

SaylorTwift's activity

reacted to elliesleightholm's post with 🤗 3 months ago
posted an update 3 months ago
reacted to Symbol-LLM's post with 🔥 3 months ago
🥳 Thrilled to introduce our recent efforts on bootstrapping VLMs for multi-modal chain-of-thought reasoning!

📕 Title: Vision-Language Models Can Self-Improve Reasoning via Reflection

🔗 Link: Vision-Language Models Can Self-Improve Reasoning via Reflection (2411.00855)

😇 Takeaways:

- We found that VLMs can self-improve reasoning performance through a reflection mechanism, and importantly, this approach can scale through test-time computing.

- Evaluations on comprehensive and diverse vision-language reasoning tasks are included!
reacted to cfahlgren1's post with ❤️ 3 months ago
You can clean and format datasets entirely in the browser with a few lines of SQL.

In this post, I replicate the process @mlabonne used to clean the new microsoft/orca-agentinstruct-1M-v1 dataset.

The cleaning process consists of:
- Joining the separate splits together and adding a split column
- Converting string messages into a list of structs
- Removing empty system prompts

https://huggingface.co/blog/cfahlgren1/the-beginners-guide-to-cleaning-a-dataset

Here's his new cleaned dataset: mlabonne/orca-agentinstruct-1M-v1-cleaned
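The three cleaning steps can be sketched in plain Python (the post itself does this with DuckDB SQL in the browser; the split names and rows below are made-up stand-ins, not the real dataset):

```python
import json

# Toy rows standing in for the dataset's separate splits; in the real dataset
# the "messages" field is stored as a JSON string rather than structured data.
splits = {
    "creative_content": [
        {"messages": '[{"role": "system", "content": ""},'
                     ' {"role": "user", "content": "Write a poem"}]'},
    ],
    "text_modification": [
        {"messages": '[{"role": "system", "content": "You are an editor."},'
                     ' {"role": "user", "content": "Fix this sentence"}]'},
    ],
}

def clean(splits):
    rows = []
    for split_name, split_rows in splits.items():
        for row in split_rows:
            # 1) join the splits together, recording the origin in a "split" column
            merged = {"split": split_name}
            # 2) convert the string-encoded messages into a list of structs (dicts)
            messages = json.loads(row["messages"])
            # 3) remove empty system prompts
            messages = [
                m for m in messages
                if not (m["role"] == "system" and m["content"] == "")
            ]
            merged["messages"] = messages
            rows.append(merged)
    return rows

cleaned = clean(splits)
```

The same transformations map onto SQL as a `UNION ALL` across splits, a JSON-extraction call, and a `WHERE` filter.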
replied to rizzware's post 5 months ago

Hi! Lighteval makes it easy to compare model enhancements, such as different prompting strategies or fine-tunes. You can change the prompts for a given task, or even create a new task with a different prompt, generation size, stop words, etc.
Everything you need to create a new task is listed in the lighteval README.
Do you have a more specific use case in mind, so we can help you further?
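As a rough, hypothetical illustration of why a prompt variant is worth treating as its own task: the sketch below (plain Python, not the lighteval API; `fake_model`, the templates, and the examples are invented for the example) scores the same examples under two prompt templates:

```python
# Hypothetical sketch of comparing prompt templates on a fixed set of examples.
# None of these names come from lighteval; see its README for real task configs.
examples = [
    {"question": "2 + 2 = ?", "answer": "4"},
    {"question": "Capital of France?", "answer": "Paris"},
]

templates = {
    "bare": "{question}",
    "cot": "Think step by step, then answer.\n{question}",
}

def fake_model(prompt: str) -> str:
    # Stand-in for a real model call, so the sketch is self-contained.
    return "4" if "2 + 2" in prompt else "Paris"

def accuracy(template: str) -> float:
    hits = sum(
        fake_model(template.format(**ex)).strip() == ex["answer"]
        for ex in examples
    )
    return hits / len(examples)

# One score per prompt variant; each variant is effectively its own task.
scores = {name: accuracy(tpl) for name, tpl in templates.items()}
```

In a real setup the model call, metric, generation size, and stop words would all come from the task definition rather than being hard-coded.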

reacted to yunusserhat's post with 🚀 6 months ago
Hello everyone,

I am pleased to announce that I have founded the University of Glasgow organization on Hugging Face. If you are affiliated with the University of Glasgow, or have a relative who is, you can join through the link below.

https://huggingface.co/UniversityofGlasgow
reacted to fdaudens's post with ❤️ 7 months ago
reacted to fdaudens's post with 👍 8 months ago
Finally, a good handwriting recognition tool?

I'm impressed by Microsoft's latest vision model, Florence-2 microsoft/Florence-2-large

The results are really good, boasting a remarkably low error rate, as you can see with this letter from George W. Bush to Bill Clinton!

🚀🔒 What’s even better? You can run it locally on your device, ensuring your data stays 100% safe.

👉 Try it out here: gokaygokay/Florence-2
reacted to MrOvkill's post with 🔥 8 months ago
Hello!

I've made a little evaluation dataset for LLMs that demands advanced and convoluted logical reasoning. It's composed of 81 unique paradoxes, with admittedly a couple in the same category (absolutes). It's available here: MrOvkill/pdox

**Update**: I have upgraded the dataset to v3 (don't worry about v2, it can be forgotten...) and placed it in a separate repo here:
MrOvkill/pdox-reversed

Enjoy & Have fun!
-<3
reacted to thomwolf's post with 🚀🔥 8 months ago
[New crazy blog post alert] We are releasing an extensive blog post on the science of creating high-quality web-scale datasets, detailing all the steps and learnings from our recent 15-trillion-token 🍷FineWeb release

Inspired by the distill.pub interactive-graphics papers, we set out to write the most extensive, enjoyable, and in-depth tech report we could, so prepare for a 45-min read with interactive graphics and all.

And that's not all: in this article we also introduce 📚FineWeb-Edu, a filtered subset of Common Crawl with 1.3T tokens containing only web pages with very high educational content. To our knowledge, FineWeb-Edu outperforms all openly released web-scale datasets by a significant margin on knowledge- and reasoning-intensive benchmarks like MMLU, ARC, and OpenBookQA.

We also make a number of surprising observations on the "quality" of the internet itself, which may challenge some of the general assumptions about web data (not saying more, I'll let you draw your own conclusions ;)

HuggingFaceFW/blogpost-fineweb-v1
reacted to clem's post with 🤗 10 months ago
reacted to alozowski's post with ❤️🔥 10 months ago
Do I need to make it a tradition to post here every Friday? Well, here we are again!

This week, I'm happy to share that we have two official Mistral models on the Leaderboard! 🔥 You can check them out: mistralai/Mixtral-8x22B-Instruct-v0.1 and mistralai/Mixtral-8x22B-v0.1

The most exciting thing here? The mistralai/Mixtral-8x22B-Instruct-v0.1 model took first place among pretrained models with an impressive average score of 79.15! 🥇 Not far behind is Mixtral-8x22B-v0.1, achieving second place with an average score of 74.47! Well done, Mistral AI! 👏

Check out my screenshot here or explore it yourself at the https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard

The second piece of news is that the CohereForAI/c4ai-command-r-plus model in 4-bit quantization got a great average score of 70.08. Cool stuff, Cohere! 😎 (and I also have the screenshot for this, don't miss it)

The last piece of news might seem small but is still significant: the Leaderboard frontpage now supports Python 3.12.1, which means we're on our way to speeding up the Leaderboard's performance! 🚀

If you have any comments or suggestions, feel free to also tag me on X (Twitter), I'll try to help – [at]ailozovskaya

Have a nice weekend! ✨
reacted to clem's post with 👍 about 1 year ago
reacted to clem's post with 🤗 about 1 year ago
Is synthetic data the future of AI? 🔥🔥🔥

@HugoLaurencon @Leyo & @VictorSanh are introducing HuggingFaceM4/WebSight , a multimodal dataset featuring 823,000 pairs of synthetically generated HTML/CSS codes along with screenshots of the corresponding rendered websites to train GPT4-V-like models 🌐💻

While crafting their upcoming foundation vision language model, they faced the challenge of converting website screenshots into usable HTML/CSS codes. Most VLMs suck at this and there was no public dataset available for this specific task, so they decided to create their own.

They prompted existing LLMs to generate the HTML/CSS code for 823k very simple websites. Through supervised fine-tuning of a vision language model on WebSight, they were able to generate the code to reproduce a website component, given a screenshot.

You can explore the dataset here: HuggingFaceM4/WebSight

What do you think?
reacted to artificialguybr's post with 🤗 about 1 year ago
Cool feature! Thanks, HF, for letting me test it.