
Derek Thomas

derek-thomas

AI & ML interests

None yet

Recent Activity

upvoted an article about 23 hours ago
1 Billion Classifications
published an article 1 day ago
1 Billion Classifications
updated a Space 2 days ago
derek-thomas/classification-analysis

Organizations

open spaced repetition, Hugging Face Success Team, ZamaFace, Sphere Spring 2023 Class, Finance Inc., Hugging Face Time-Series, Core42, SCB, ZeroGPU Explorers, Open Arabic LLM Leaderboard, Reddit Tools on 🤗, Social Post Explorers, MIT Critical Data, Arabic Translation Prompt Engineering, Audio Processing Exploration, Dataset Tools, Success Sandbox

derek-thomas's activity

reacted to erinys's post with ❀️ 5 months ago
We shut down XetHub today after almost 2 years. What we learned from launching our Git-scaled product from scratch:
- Don't make me change my workflow
- Data inertia is real
- ML best practices are still evolving

Closing the door on our public product lets us focus on our new goal of scaling HF Hub's storage backend to improve devX for a larger community. We'd love to hear your thoughts on what experiences we can improve!

Read the full post: https://xethub.com/blog/shutting-down-xethub-learnings-and-takeaways
Β·
reacted to thomwolf's post with πŸ‘β€οΈπŸ”₯ 6 months ago
A Little guide to building Large Language Models in 2024

This is a recording of a 75min lecture I gave two weeks ago on how to train an LLM from scratch in 2024. I tried to keep it short and comprehensive – focusing on concepts that are crucial for training a good LLM but often hidden in tech reports.

In the lecture, I introduce the students to all the important concepts/tools/techniques for training a high-performing LLM (a minimal fine-tuning sketch follows the list):
* finding, preparing, and evaluating web-scale data
* understanding model parallelism and efficient training
* fine-tuning/aligning models
* fast inference
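
To make the fine-tuning step concrete, here is a minimal sketch of supervised fine-tuning with the transformers Trainer; the model, dataset, and hyperparameters are illustrative placeholders, not the lecture's actual recipe:

```python
# Minimal supervised fine-tuning sketch (illustrative choices throughout).
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "gpt2"  # assumption: any small causal LM works for a demo
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # gpt2 ships without a pad token
model = AutoModelForCausalLM.from_pretrained(model_name)

# A tiny slice of a public corpus stands in for carefully curated data.
dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train[:1%]")
tokenized = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True, remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out",
                           per_device_train_batch_size=2,
                           num_train_epochs=1),
    train_dataset=tokenized,
    # mlm=False -> standard next-token (causal LM) objective
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```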

There are of course many things and details missing that I should have added, so don't hesitate to tell me your most frustrating omission and I'll add it in a future part. In particular, I think I'll add more focus on how to filter topics well and extensively, and maybe more practical anecdotes and details.

Now that I recorded it, I've been thinking this could be part 1 of a two-part series, with a 2nd fully hands-on video on how to run all these steps with some libraries and recipes we've released recently at HF around LLM training (which could easily be adapted to your framework of choice anyway):
* datatrove for all things web-scale data preparation (sketched below): https://github.com/huggingface/datatrove
* nanotron for lightweight 4D parallelism LLM training: https://github.com/huggingface/nanotron
* lighteval for in-training fast parallel LLM evaluations: https://github.com/huggingface/lighteval
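
As a taste of datatrove, here is a minimal local pipeline sketch based on the patterns in its README; the input path and filter predicate are made-up placeholders, and the exact API may differ between versions:

```python
# Sketch of a datatrove filtering pipeline (paths/predicates are placeholders).
from datatrove.executor import LocalPipelineExecutor
from datatrove.pipeline.filters import LambdaFilter
from datatrove.pipeline.readers import JsonlReader
from datatrove.pipeline.writers import JsonlWriter

executor = LocalPipelineExecutor(
    pipeline=[
        JsonlReader("data/raw/"),                       # read raw .jsonl shards
        LambdaFilter(lambda doc: len(doc.text) > 200),  # drop very short docs
        JsonlWriter("data/filtered/"),                  # write surviving docs
    ],
    tasks=4,  # run 4 parallel tasks locally
)
executor.run()
```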

Here is the link to watch the lecture on YouTube: https://www.youtube.com/watch?v=2-SPH9hIKT8
And here is the link to the Google slides: https://docs.google.com/presentation/d/1IkzESdOwdmwvPxIELYJi8--K3EZ98_cL6c5ZcLKSyVg/edit#slide=id.p

Enjoy, and I'm happy to hear feedback on it and on what to add, correct, or extend in a second part.
reacted to their post with 😎 6 months ago
posted an update 6 months ago
Here is an AI Puzzle!
When you solve it just use a 😎 emoji.
NO SPOILERS
A similar puzzle might have each picture hide a meaning of summer, winter, fall, or spring, and the answer would be seasons.

It's a little dated now (almost a year old), so the bottom right might be tough.

Thanks to @johko for the encouragement to post!
reacted to MohamedRashad's post with πŸ”₯ 9 months ago
reacted to abhishek's post with πŸš€πŸ”₯ 10 months ago
πŸš€πŸš€πŸš€πŸš€ Introducing AutoTrain Configs! πŸš€πŸš€πŸš€πŸš€
Now you can train models using YAML config files! 💥 These configs are easy to understand and are not at all overwhelming. So, even a person with almost zero knowledge of machine learning can train state-of-the-art models without writing any code. Check out the example configs in the config directory of the autotrain-advanced GitHub repo, and feel free to share your own configs by creating a pull request 🤗
Github repo: https://github.com/huggingface/autotrain-advanced
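
To show the flavor of such a config, here is an illustrative sketch of what an SFT config might look like; the exact keys and values below are assumptions, so check the repo's config directory for real, working examples:

```yaml
# Hypothetical AutoTrain-style config sketch; key names are illustrative.
task: llm-sft
base_model: gpt2                  # placeholder base model
project_name: my-autotrain-demo   # placeholder project name

data:
  path: timdettmers/openassistant-guanaco  # placeholder dataset
  train_split: train
  column_mapping:
    text_column: text

params:
  epochs: 1
  batch_size: 2
  lr: 2e-4
```

Such a config would then typically be launched from the CLI, e.g. `autotrain --config my_config.yaml` (again, verify the exact invocation against the repo's docs).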
reacted to andrewrreed's post with πŸ‘ 10 months ago
IMO, the "grounded generation" feature from Cohere's CommandR+ has flown under the radar...

For RAG use cases, responses directly include inline citations, making source attribution an inherent part of generation rather than an afterthought 😎

Who's working on an open dataset with this for the HF community to fine-tune with??

πŸ”—CommandR+ Docs: https://docs.cohere.com/docs/retrieval-augmented-generation-rag

πŸ”—Model on the πŸ€— Hub: CohereForAI/c4ai-command-r-plus
reacted to chiphuyen's post with πŸ‘ 12 months ago
It feels awkward to have my first post be about sharing my own stuff, but this is a weekend project that I really enjoyed working on. I'd love to meet more people interested in random ideas like this.

A hard part of building AI applications is choosing which model to use. What if we don’t have to? What if we can predict the best model for any prompt?

Predictive human preference aims to predict which model users might prefer for a specific query.

https://huyenchip.com/2024/02/28/predictive-human-preference.html

One use case is model routing. If we know in advance that for a prompt, users will prefer Claude Instant’s response over GPT-4, and Claude Instant is cheaper/faster than GPT-4, we can route this prompt to Claude Instant. Model routing has the potential to increase response quality while reducing costs and latency.
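
As a toy illustration of the routing idea, here is a hypothetical sketch; the predictor heuristic, model names, and threshold are all made up for the example:

```python
# Hypothetical preference-based router; every name/number is illustrative.
def predict_strong_win_rate(prompt: str) -> float:
    """Stand-in for a learned preference predictor: pretend longer
    prompts are harder and thus favor the stronger model."""
    return min(1.0, len(prompt.split()) / 50)

def route(prompt: str, threshold: float = 0.5) -> str:
    """Send easy prompts to the cheap/fast model, hard ones to the strong one."""
    if predict_strong_win_rate(prompt) >= threshold:
        return "strong-model"      # e.g. GPT-4 in the post's example
    return "cheap-fast-model"      # e.g. Claude Instant

print(route("hello, how are you?"))  # easy prompt -> cheap-fast-model
```

A real router would replace the heuristic with a model trained on human preference data, but the cost/quality trade-off logic stays the same.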

One pattern is that for simple prompts, weak models can do (nearly) as well as strong models. For more challenging prompts, however, users are more likely to prefer stronger models. Here's a visualization of predicted human preference for an easy prompt ("hello, how are you?") and a challenging prompt ("Explain why Planck length …").

Preference predictors make it possible to create leaderboards unique to any prompt and domain.
Β·
posted an update 12 months ago
reacted to alielfilali01's post with ❀️ 12 months ago
πŸŽ‰πŸ₯³πŸŽ‰
Today, we are thrilled to officially launch the "2A2I" Arabic Artificial Intelligence Initiative. This is a community-driven initiative founded on the philosophy of "Small team, Big work." Our goal is to elevate Arabic AI (LLMs, Diffusion Models, ASR, etc.) to the same level as English (and also Chinese 🐉).

Naturally, our focus today is primarily on datasets. We aim to provide high-quality datasets, especially for LLMs this month, to support our future efforts. In line with this, we're excited to introduce the Arabic version of H4-no_robots, which you can find here: 2A2I/H4_no_robots (and yes, we know it's not "no_robots" anymore 😄). Stay tuned for more exciting, high-quality datasets in the next couple of weeks (+ 4 million rows 🔥)

In parallel, we're also developing a model πŸͺ that we hope will set new high standards for Arabic LLMs. πŸ”₯ This model is planned for release in the coming months.

For more information, please visit our Organization card here: https://huggingface.co/2A2I

If you're interested in Arabic AI and want to help push the wheel as well, fill out this form and let us know your motivation and your exciting ideas 🔥

The form link: https://forms.gle/kZLVuynWFU2FyTm57

If you have any questions, feel free to reach out to us at the email address below.

Additionally, if you believe in this mission as we do and would like to help this community by contributing some compute resources 😉 or any other form of help you can think of, please contact us at the same email address below or reach out to me through LinkedIn 🔥

2A2I Contact Email: [email protected]
My LinkedIn: https://www.linkedin.com/in/alielfilali01/