
Aurélien-Morgan CLAUDON

Aurelien-Morgan

https://huggingface.co/retrain-pipelines

AI & ML interests

None yet

Aurelien-Morgan's activity

upvoted a paper about 6 hours ago

Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling

Paper • 2501.16975 • Published 9 days ago • 23

liked a Space about 24 hours ago

158

Chat with DeepSeek-VL2-small

🌍

upvoted a paper 3 days ago

s1: Simple test-time scaling

Paper • 2501.19393 • Published 6 days ago • 88

reacted to Kseniase's post with ❤️ 4 days ago

Post

4674

8 Free Sources on Reinforcement Learning

With DeepSeek-R1's strong reasoning capabilities, we all saw the power of RL. At its core, RL is a type of machine learning in which a model/agent learns to make decisions by interacting with an environment to maximize a reward. The agent learns through trial and error, receiving feedback in the form of rewards or penalties.

Here's a list of free sources that will help you dive into RL and how to use it:

1. "Reinforcement Learning: An Introduction" book by Richard S. Sutton and Andrew G. Barto -> https://web.stanford.edu/class/psych209/Readings/SuttonBartoIPRLBook2ndEd.pdf

2. Hugging Face Deep Reinforcement Learning Course -> https://huggingface.co/learn/deep-rl-course/unit0/introduction
You'll learn how to train agents in unique environments using the best libraries, share your results, compete in challenges, and earn a certificate.

3. OpenAI Spinning Up in Deep RL -> https://spinningup.openai.com/en/latest/index.html
A comprehensive overview of RL with many useful resources.

4. "Reinforcement Learning and Optimal Control" books, video lectures and course material by Dimitri P. Bertsekas from ASU -> https://web.mit.edu/dimitrib/www/RLbook.html
Explores approximate Dynamic Programming (DP) and RL with key concepts and methods like rollout, tree search, and neural network training for RL and more.

5. RL Course by David Silver (Google DeepMind) -> https://www.youtube.com/watch?v=2pWv7GOvuf0&list=PLqYmG7hTraZDM-OYHWgPeb
Many recommend these video lectures as a good foundation.

6. RL theory seminars -> https://sites.google.com/view/rltheoryseminars/home?authuser=0
Provides virtual seminars from different experts on RL advancements.

7. "Reinforcement Learning Specialization" (a 4-course series on Coursera) -> https://www.coursera.org/learn/fundament

8. Concepts: RLHF, RLAIF, RLEF, RLCF -> https://www.turingpost.com/p/rl-f
Our flashcards explain what these four RL approaches are and how their feedback sources differ.
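
As a minimal sketch of the trial-and-error loop the post describes (an agent acts, receives a reward, and updates its estimates), the following Python example runs an epsilon-greedy agent on a multi-armed bandit. The bandit setting, the function name `run_bandit`, the reward probabilities, and the hyperparameters are illustrative assumptions and are not taken from the post or from any of the listed sources. Rewards here are 0 or 1 only; penalties are not modeled.

```python
import random


def run_bandit(true_probs, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a Bernoulli multi-armed bandit (illustrative only).

    true_probs: hypothetical reward probability of each arm (the "environment").
    Returns the learned value estimates, the pull counts, and the total reward.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    counts = [0] * n_arms    # how many times each arm was pulled
    values = [0.0] * n_arms  # running mean reward observed per arm
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random arm.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest current estimate.
            arm = max(range(n_arms), key=lambda a: values[a])

        # The environment returns a reward of 1 with the arm's probability.
        reward = 1.0 if rng.random() < true_probs[arm] else 0.0

        # Incremental update of the running mean for the chosen arm.
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]
        total_reward += reward

    return values, counts, total_reward


if __name__ == "__main__":
    values, counts, total = run_bandit([0.2, 0.5, 0.8])
    print("estimated values:", [round(v, 2) for v in values])
    print("pull counts:", counts)
    print("total reward:", total)
```

With enough steps, the agent should pull the highest-paying arm most often, while the `epsilon` fraction of random pulls keeps its estimates of the other arms from going stale.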

commented on a paper 6 days ago (2 comments)

Streaming DiLoCo with overlapping communication: Towards a Distributed Free Lunch

Paper • 2501.18512 • Published 7 days ago • 25

upvoted a paper 6 days ago

Streaming DiLoCo with overlapping communication: Towards a Distributed Free Lunch

Paper • 2501.18512 • Published 7 days ago • 25

reacted to hexgrad's post with 👍 7 days ago

Post

8211

hexgrad/Kokoro-82M got an upgrade! ⬆️ More voices, more languages, pip install kokoro, and still 82M parameters.

GitHub: https://github.com/hexgrad/kokoro
PyPI: https://pypi.org/project/kokoro/
Space: hexgrad/Kokoro-TTS

11 replies

upvoted 2 articles 14 days ago

Mastering Long Contexts in LLMs with KVPress

Article • and 1 other • 14 days ago • 59

SmolVLM Grows Smaller – Introducing the 250M & 500M Models!

Article • 15 days ago • 119

upvoted an article 16 days ago

Fine-tune ModernBERT for RAG with Synthetic Data

Article • and 2 others • 17 days ago • 33

liked a model 17 days ago

deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B

Text Generation • Updated 5 days ago • 469k • 687

updated a Space 21 days ago

README

📈

upvoted a paper 21 days ago

Titans: Learning to Memorize at Test Time

Paper • 2501.00663 • Published Dec 31, 2024 • 14

commented on a paper 22 days ago

Hype, Sustainability, and the Price of the Bigger-is-Better Paradigm in AI

Paper • 2409.14160 • Published Sep 21, 2024 • 2

upvoted a paper 22 days ago

Hype, Sustainability, and the Price of the Bigger-is-Better Paradigm in AI

Paper • 2409.14160 • Published Sep 21, 2024 • 2

liked a model 23 days ago

hexgrad/Kokoro-82M

Text-to-Speech • Updated 5 days ago • 172k • 2.83k

liked a dataset 23 days ago

DAMO-NLP-SG/multimodal_textbook

Updated 26 days ago • 15.2k • 132

reacted to danielhanchen's post with ❤️🔥 26 days ago

Post

4634

We fixed many bugs in Phi-4 & uploaded fixed GGUF + 4-bit versions! ✨

Our fixed versions are even higher on the Open LLM Leaderboard than Microsoft's!

GGUFs: unsloth/phi-4-GGUF
Dynamic 4-bit: unsloth/phi-4-unsloth-bnb-4bit

You can also now finetune Phi-4 for free on Colab: https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Phi_4-Conversational.ipynb

Read our blogpost for more details on bug fixes etc: https://unsloth.ai/blog/phi4