Nathan Habib

SaylorTwift

AI & ML interests

None yet

Organizations

Hugging Face, Evaluation datasets, HuggingFaceGECLM, BigCode, Hugging Face H4, BigCode Data, Hugging Face Smol Cluster, Open LLM Leaderboard, huggingPartyParis, Qwen, gg-hf, Nanotron Research, HuggingFaceFW, HF-contamination-detection, Top Contributors: Dataset Downloads, hsramall, La Leaderboard, gg-tt, HuggingFaceEval, Novel Challenge, LLHF, SLLHF, lbhf, Lighteval testing org, Coordination Nationale pour l'IA, open-llm-leaderboard-react, Prompt Leaderboard, Open R1

SaylorTwift's activity

reacted to elliesleightholm's post with 🤗 3 months ago
posted an update 3 months ago
reacted to Symbol-LLM's post with 🔥 3 months ago
🥳 Thrilled to introduce our recent efforts on bootstrapping VLMs for multi-modal chain-of-thought reasoning!

📕 Title: Vision-Language Models Can Self-Improve Reasoning via Reflection

🔗 Link: Vision-Language Models Can Self-Improve Reasoning via Reflection (2411.00855)

😇 Takeaways:

- We found that VLMs can self-improve reasoning performance through a reflection mechanism, and importantly, this approach can scale through test-time computing.

- Evaluations on comprehensive and diverse vision-language reasoning tasks are included!
reacted to cfahlgren1's post with ❤️ 3 months ago
You can clean and format datasets entirely in the browser with a few lines of SQL.

In this post, I replicate the process @mlabonne used to clean the new microsoft/orca-agentinstruct-1M-v1 dataset.

The cleaning process consists of:
- Joining the separate splits together and adding a split column
- Converting string messages into a list of structs
- Removing empty system prompts

https://huggingface.co/blog/cfahlgren1/the-beginners-guide-to-cleaning-a-dataset

Here's his new cleaned dataset: mlabonne/orca-agentinstruct-1M-v1-cleaned
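The three cleaning steps can be sketched in plain Python (the post itself does this with DuckDB SQL in the browser; the split names and rows below are made-up stand-ins, not the real dataset):

```python
import json

# Toy rows standing in for the dataset's separate splits; in the real dataset
# the "messages" field is stored as a JSON string rather than structured data.
splits = {
    "creative_content": [
        {"messages": '[{"role": "system", "content": ""},'
                     ' {"role": "user", "content": "Write a poem"}]'},
    ],
    "text_modification": [
        {"messages": '[{"role": "system", "content": "You are an editor."},'
                     ' {"role": "user", "content": "Fix this sentence"}]'},
    ],
}

def clean(splits):
    rows = []
    for split_name, split_rows in splits.items():
        for row in split_rows:
            # 1) join the splits together, recording the origin in a "split" column
            merged = {"split": split_name}
            # 2) convert the string-encoded messages into a list of structs (dicts)
            messages = json.loads(row["messages"])
            # 3) remove empty system prompts
            messages = [
                m for m in messages
                if not (m["role"] == "system" and m["content"] == "")
            ]
            merged["messages"] = messages
            rows.append(merged)
    return rows

cleaned = clean(splits)
```

The same transformations map onto SQL as a `UNION ALL` across splits, a JSON-extraction call, and a `WHERE` filter.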
replied to rizzware's post 5 months ago

Hi! Lighteval makes it easy to compare model enhancements, such as different prompting strategies or fine-tunes. You can change the prompts for a given task, or even create a new task with a different prompt, generation size, stop words, etc.
Everything you need to create a new task is listed in the lighteval README.
Do you have a more specific use case in mind, so we can help you further?
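As a rough, hypothetical illustration of why a prompt variant is worth treating as its own task: the sketch below (plain Python, not the lighteval API; `fake_model`, the templates, and the examples are invented for the example) scores the same examples under two prompt templates:

```python
# Hypothetical sketch of comparing prompt templates on a fixed set of examples.
# None of these names come from lighteval; see its README for real task configs.
examples = [
    {"question": "2 + 2 = ?", "answer": "4"},
    {"question": "Capital of France?", "answer": "Paris"},
]

templates = {
    "bare": "{question}",
    "cot": "Think step by step, then answer.\n{question}",
}

def fake_model(prompt: str) -> str:
    # Stand-in for a real model call, so the sketch is self-contained.
    return "4" if "2 + 2" in prompt else "Paris"

def accuracy(template: str) -> float:
    hits = sum(
        fake_model(template.format(**ex)).strip() == ex["answer"]
        for ex in examples
    )
    return hits / len(examples)

# One score per prompt variant; each variant is effectively its own task.
scores = {name: accuracy(tpl) for name, tpl in templates.items()}
```

In a real setup the model call, metric, generation size, and stop words would all come from the task definition rather than being hard-coded.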

reacted to yunusserhat's post with 🚀 6 months ago
Hello everyone,

I am pleased to announce that I have founded the University of Glasgow organization on Hugging Face. If you are affiliated with the University of Glasgow, or have a relative who is, you can join through the link below.

https://huggingface.co/UniversityofGlasgow
reacted to fdaudens's post with ❤️ 7 months ago
reacted to fdaudens's post with 👍 8 months ago
Finally, a good handwriting recognition tool?

I'm impressed by Microsoft's latest vision model, Florence-2 microsoft/Florence-2-large

The results are really good, boasting a remarkably low error rate, as you can see with this letter from George W. Bush to Bill Clinton!

🚀🔒 What’s even better? You can run it locally on your device, ensuring your data stays 100% safe.

👉 Try it out here: gokaygokay/Florence-2
reacted to MrOvkill's post with 🔥 8 months ago
Hello!

I've made a little evaluation dataset for LLMs that demands advanced and convoluted logical reasoning. It's composed of 81 unique paradoxes, with admittedly a couple in the same category (absolutes). It's available here: MrOvkill/pdox

**Update**: I have upgraded the dataset to v3 (don't worry about v2, it can be forgotten...) and placed it in a separate repo here:
MrOvkill/pdox-reversed

Enjoy & Have fun!
-<3
reacted to thomwolf's post with 🚀🔥 8 months ago
[New crazy blog post alert] We are releasing an extensive blog post on the science of creating high-quality web-scale datasets, detailing all the steps and learnings from our recent 15-trillion-token 🍷FineWeb release

Inspired by the distill.pub interactive-graphics papers, we set out to write the most extensive, enjoyable, and in-depth tech report we could, so prepare for a 45-min read with interactive graphics and all.

And that's not all: in this article we also introduce 📚FineWeb-Edu, a filtered subset of Common Crawl with 1.3T tokens containing only web pages with very high educational content. To our knowledge, FineWeb-Edu outperforms all openly released web-scale datasets by a significant margin on knowledge- and reasoning-intensive benchmarks like MMLU, ARC, and OpenBookQA.

We also make a number of surprising observations on the "quality" of the internet itself, which may challenge some of the general assumptions about web data (not saying more, I'll let you draw your own conclusions ;)

HuggingFaceFW/blogpost-fineweb-v1
reacted to clem's post with 🤗 10 months ago
reacted to alozowski's post with ❤️🔥 10 months ago
Do I need to make it a tradition to post here every Friday? Well, here we are again!

This week, I'm happy to share that we have two official Mistral models on the Leaderboard! 🔥 You can check them out: mistralai/Mixtral-8x22B-Instruct-v0.1 and mistralai/Mixtral-8x22B-v0.1

The most exciting thing here? The mistralai/Mixtral-8x22B-Instruct-v0.1 model took first place among pretrained models with an impressive average score of 79.15! 🥇 Not far behind is Mixtral-8x22B-v0.1, achieving second place with an average score of 74.47! Well done, Mistral AI! 👏

Check out my screenshot here or explore it yourself at the https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard

The second piece of news is that the CohereForAI/c4ai-command-r-plus model in 4-bit quantization got a great average score of 70.08. Cool stuff, Cohere! 😎 (and I also have the screenshot for this, don't miss it)

The last piece of news might seem small but is still significant: the Leaderboard frontpage now supports Python 3.12.1, which means we're on our way to speeding up the Leaderboard's performance! 🚀

If you have any comments or suggestions, feel free to also tag me on X (Twitter), I'll try to help – [at]ailozovskaya

Have a nice weekend! ✨
reacted to clem's post with 👍 about 1 year ago
reacted to clem's post with 🤗 about 1 year ago
Is synthetic data the future of AI? 🔥🔥🔥

@HugoLaurencon @Leyo & @VictorSanh are introducing HuggingFaceM4/WebSight , a multimodal dataset featuring 823,000 pairs of synthetically generated HTML/CSS codes along with screenshots of the corresponding rendered websites to train GPT4-V-like models 🌐💻

While crafting their upcoming foundation vision language model, they faced the challenge of converting website screenshots into usable HTML/CSS codes. Most VLMs suck at this and there was no public dataset available for this specific task, so they decided to create their own.

They prompted existing LLMs to generate the HTML/CSS code for 823k very simple websites. Through supervised fine-tuning of a vision language model on WebSight, they were able to generate the code to reproduce a website component, given a screenshot.

You can explore the dataset here: HuggingFaceM4/WebSight

What do you think?
reacted to artificialguybr's post with 🤗 about 1 year ago
Cool feature! Thanks, HF, for letting me test it.