Jaward Sesay's picture

Jaward Sesay

Jaward

AI & ML interests

I like to train large deep neural nets too 🧠🤖💥 | First Paper (AutoAgents: A Framework for Automatic Agent Generation) Accepted @ IJCAI 2024 | Role Model Karpathy

Recent Activity

Organizations

MLX Community's profile picture

Jaward's activity

posted an update 2 days ago
view post
Post
2726
ByteDance drops OmniHuman🔥
This is peak SOTA performance - flawless natural gestures with perfect lip sync and facial expressions. This is the second time they've released SOTA level talking-heads only this time with hands and body motion.
Project: https://omnihuman-lab.github.io/
  • 2 replies
·
posted an update 6 days ago
view post
Post
1432
The beauty in GRPO is the fact that it doesn’t care if the rewards are rule-based or learned, the hack: let the data self-normalize— trajectories in a batch compete against their mean, no value model, no extra params, just clean, efficient RL that cuts memory usage by 50%, while maintaining SOTA performance. btw it was introduced 9months prior to R1: arxiv.org/pdf/2402.03300
  • 1 reply
·
upvoted an article 6 days ago
view article
Article

Open-R1: a fully open reproduction of DeepSeek-R1

646
reacted to mlabonne's post with 🧠 21 days ago
view post
Post
4039
🆕 LLM Course 2025 edition!

I updated the LLM Scientist roadmap and added a ton of new information and references. It covers training, datasets, evaluation, quantization, and new trends like test-time compute scaling.

The LLM Course has been incredibly popular (41.3k stars!) and I've been touched to receive many, many messages about how it helped people in their careers.

I know how difficult this stuff can be, so I'm super proud of the impact it had. I want to keep updating it in 2025, especially with the LLM Engineer roadmap.

Thanks everyone, hope you'll enjoy it!

💻 LLM Course: https://huggingface.co/blog/mlabonne/llm-course
posted an update 23 days ago
posted an update 28 days ago
posted an update about 1 month ago
view post
Post
2316
damn I love nvidia's bullish stance on taking AI to the edge - from being the overlord of compute to cutting-edge physical AI with SOTA multiverse simulation engines that brings the scaling laws under your control!!

My favorite: Cosmos - fully opensourced, open-weight physics based video gen platform, what an incredible way to start off the year✨

Code: https://github.com/NVIDIA/Cosmos
Models: nvidia/cosmos-6751e884dc10e013a0a0d8e6
Paper: https://d1qx31qr3h6wln.cloudfront.net/publications/NVIDIA%20Cosmos_2.pdf
posted an update about 1 month ago
view post
Post
3024
nanoBLT: Simplified lightweight implementation of a character-level Byte Latent Transformer model (under 500 lines of code). The model is 2x4x2 (n_layers_encoder, n_layers_latent, n_layers_decoder) layer deep trained on ~1M bytes of tiny Shakespeare with a patch size of 4.

Code: https://github.com/Jaykef/ai-algorithms/blob/main/byte_latent_transformer.ipynb
replied to their post about 2 months ago
view reply

btw the background songs in the videos are actually what I listen to during implementation

posted an update about 2 months ago