Jaward Sesay

Jaward

AI & ML interests

I like to train large deep neural nets too 🧠🤖💥 | First paper (AutoAgents: A Framework for Automatic Agent Generation) accepted @ IJCAI 2024 | Role model: Karpathy

Jaward's activity

posted an update 2 days ago

ByteDance drops OmniHuman🔥
This is peak SOTA performance: flawless, natural gestures with perfect lip sync and facial expressions. This is the second time they've released SOTA-level talking heads, only this time with hands and body motion.
Project: https://omnihuman-lab.github.io/

2 replies

upvoted 2 papers 2 days ago

Process Reinforcement through Implicit Rewards

Paper • 2502.01456 • Published 3 days ago • 53

OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models

Paper • 2502.01061 • Published 3 days ago • 149

posted an update 6 days ago

The beauty of GRPO is that it doesn't care whether the rewards are rule-based or learned. The hack: let the data self-normalize. Completions sampled for the same prompt compete against their group mean: no value model, no extra params, just clean, efficient RL that cuts memory usage by 50% while maintaining SOTA performance. Btw, it was introduced almost a year before R1: arxiv.org/pdf/2402.03300
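To make the self-normalization trick concrete, here is a minimal Python sketch of the GRPO advantage computation, assuming a group of scalar rewards for completions of a single prompt and normalization by the group's population standard deviation. The function name and `eps` value are illustrative, not taken from the paper's code, and the clipped policy-gradient objective and KL penalty are left out; the sketch only shows where the value model gets replaced.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages for completions sampled from the same prompt.

    Each reward is normalized against its own group's mean and standard
    deviation, so the group itself acts as the baseline and no learned
    value model (critic) is needed.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # assumption: population std of the group
    return [(r - mean) / (std + eps) for r in rewards]


# Rewards can come from a rule (e.g. 1.0 if the final answer is correct)
# or from a learned reward model; the normalization treats both the same.
rule_based = [1.0, 0.0, 0.0, 1.0]
learned = [0.82, 0.10, 0.35, 0.64]
print(group_relative_advantages(rule_based))
print(group_relative_advantages(learned))
```

Completions above the group mean get positive advantages and those below get negative ones; if every completion scores the same, all advantages are zero and that group contributes no gradient signal.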

1 reply

upvoted an article 6 days ago

Article

Open-R1: a fully open reproduction of DeepSeek-R1

10 days ago • 646

liked a model 12 days ago

deepseek-ai/DeepSeek-R1

Text Generation • Updated 5 days ago • 1.54M • 7.21k

liked a Space 14 days ago

DeepSeek-R1 WebGPU 🧠 • 495

Next-generation reasoning model that runs locally in-browser

upvoted a paper 17 days ago

Evolving Deeper LLM Thinking

Paper • 2501.09891 • Published 21 days ago • 105

reacted to mlabonne's post with 🧠 21 days ago

🆕 LLM Course 2025 edition!

I updated the LLM Scientist roadmap and added a ton of new information and references. It covers training, datasets, evaluation, quantization, and new trends like test-time compute scaling.

The LLM Course has been incredibly popular (41.3k stars!) and I've been touched to receive many, many messages about how it helped people in their careers.

I know how difficult this stuff can be, so I'm super proud of the impact it had. I want to keep updating it in 2025, especially with the LLM Engineer roadmap.

Thanks everyone, hope you'll enjoy it!

💻 LLM Course: https://huggingface.co/blog/mlabonne/llm-course

liked a model 21 days ago

unsloth/phi-4-GGUF

Text Generation • Updated 24 days ago • 77.7k • 138

posted an update 23 days ago

Minimal single-script implementation of knowledge distillation in LLMs. In this implementation, we use GPT-2 (124M) as the student model and GPT-2 Medium (340M) as the teacher, distilling via reverse Kullback-Leibler (KL) divergence on a small chunk of OpenWebText.

Code: https://github.com/Jaykef/ai-algorithms/blob/main/llm_knowledge_distillation.ipynb
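A minimal sketch of the reverse-KL objective, written in plain Python so it runs without PyTorch. It assumes raw logits at each token position and no temperature scaling, and all helper names are hypothetical; it is not the notebook's code. Reverse KL takes the expectation under the student, KL(q_student ‖ p_teacher), whereas the classic forward KL takes it under the teacher.

```python
import math


def log_softmax(logits):
    # Numerically stable log-softmax over one vocabulary distribution.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def reverse_kl(student_logits, teacher_logits):
    # Reverse KL: KL(q || p) = sum_v q(v) * (log q(v) - log p(v)).
    # The expectation is under the student q, which makes it mode-seeking.
    log_q = log_softmax(student_logits)
    log_p = log_softmax(teacher_logits)
    return sum(math.exp(lq) * (lq - lp) for lq, lp in zip(log_q, log_p))


def forward_kl(student_logits, teacher_logits):
    # Forward KL: KL(p || q), expectation under the teacher, for comparison.
    log_q = log_softmax(student_logits)
    log_p = log_softmax(teacher_logits)
    return sum(math.exp(lp) * (lp - lq) for lq, lp in zip(log_q, log_p))


def distillation_loss(student_seq, teacher_seq):
    # Average the per-token reverse KL over all positions of a sequence.
    per_token = [reverse_kl(s, t) for s, t in zip(student_seq, teacher_seq)]
    return sum(per_token) / len(per_token)


# Toy logits over a 3-token vocabulary at 2 sequence positions.
student = [[3.0, 0.0, 0.0], [0.5, 0.5, 2.0]]
teacher = [[0.0, 0.0, 1.0], [0.4, 0.6, 2.1]]
print(distillation_loss(student, teacher))
```

Because the expectation is under the student, reverse KL heavily penalizes the student for putting mass where the teacher has little, which pushes a small student to commit to the teacher's dominant modes rather than spreading probability across everything the larger teacher covers.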

liked a model 24 days ago

deepseek-ai/DeepSeek-V3

Text Generation • Updated 13 days ago • 1.05M • 3.21k

posted an update 28 days ago

Huge AI win in medicine👏
"Large language of life model" just dropped!!
Full paper: https://www.nature.com/articles/s41586-024-08391-z

1 reply

upvoted a collection about 1 month ago

Cosmos

Collection

The collection of Cosmos models • 31 items • Updated 20 days ago • 254

posted an update about 1 month ago

Damn, I love NVIDIA's bullish stance on taking AI to the edge: from being the overlord of compute to cutting-edge physical AI, with SOTA multiverse simulation engines that bring the scaling laws under your control!!

My favorite: Cosmos, a fully open-source, open-weight, physics-based video generation platform. What an incredible way to start off the year✨

Code: https://github.com/NVIDIA/Cosmos
Models: nvidia/cosmos-6751e884dc10e013a0a0d8e6
Paper: https://d1qx31qr3h6wln.cloudfront.net/publications/NVIDIA%20Cosmos_2.pdf

liked a model about 1 month ago

Qwen/QVQ-72B-Preview

Image-Text-to-Text • Updated 26 days ago • 167k • 538

posted an update about 1 month ago

nanoBLT: a simplified, lightweight implementation of a character-level Byte Latent Transformer model (under 500 lines of code). The model is 2x4x2 layers deep (n_layers_encoder, n_layers_latent, n_layers_decoder) and was trained on ~1M bytes of Tiny Shakespeare with a patch size of 4.

Code: https://github.com/Jaykef/ai-algorithms/blob/main/byte_latent_transformer.ipynb
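A rough sketch of the two pieces the post mentions, assuming static fixed-size patching (patch size 4) with zero-byte padding on the last patch. The BLT paper also describes entropy-based dynamic patching, which is not shown here. `NanoBLTConfig` and `latent_seq_len` are hypothetical names; the layer counts simply mirror the 2x4x2 setup above.

```python
from dataclasses import dataclass


def bytes_to_patches(text, patch_size=4, pad_byte=0):
    """Split UTF-8 bytes into fixed-size patches, right-padding the last one."""
    data = list(text.encode("utf-8"))
    remainder = len(data) % patch_size
    if remainder:
        data.extend([pad_byte] * (patch_size - remainder))
    return [data[i:i + patch_size] for i in range(0, len(data), patch_size)]


def patches_to_text(patches, pad_byte=0):
    """Inverse of bytes_to_patches: flatten, strip trailing padding, decode."""
    flat = bytes(b for patch in patches for b in patch)
    return flat.rstrip(bytes([pad_byte])).decode("utf-8")


@dataclass
class NanoBLTConfig:  # hypothetical config mirroring the post's 2x4x2 setup
    n_layers_encoder: int = 2  # local encoder layers, operating on bytes
    n_layers_latent: int = 4   # global latent transformer layers, on patches
    n_layers_decoder: int = 2  # local decoder layers, back to bytes
    patch_size: int = 4

    def latent_seq_len(self, n_bytes):
        # The latent transformer sees one position per patch, not per byte.
        return -(-n_bytes // self.patch_size)  # ceiling division


patches = bytes_to_patches("To be, or not to be")
print(len(patches), patches[0])
print(patches_to_text(patches))
print(NanoBLTConfig().latent_seq_len(1_000_000))
```

With a patch size of 4, the deeper latent stack runs over roughly a quarter as many positions as there are bytes, which is where the byte-level model recovers most of its efficiency.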

liked a model about 1 month ago

deepseek-ai/DeepSeek-V3-Base

Updated 13 days ago • 28.8k • 1.52k

replied to their post about 2 months ago

btw the background songs in the videos are actually what I listen to during implementation

posted an update about 2 months ago

Implemented a discrete flow matching (DFM) model for code generation from first principles: I trained a small 2D DFM model on two variations of binary search code. The result was amazing; code in the comments:
Code: https://github.com/Jaykef/ai-algorithms/blob/main/dfm.ipynb
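A minimal sketch of discrete flow matching with a masked source distribution and the linear scheduler κ(t) = t, one common choice in the DFM literature. This is an assumption and not necessarily what the notebook uses. `MASK`, `predict_x1`, and the toy oracle are hypothetical stand-ins: a trained network would return a distribution over tokens for each position instead of the known answer.

```python
import random

MASK = -1  # hypothetical id for the mask (source) token


def sample_xt(x1, t, rng):
    # Masked mixture path with kappa(t) = t: each position keeps its data
    # token x1[i] with probability t, otherwise it is MASK.
    # t = 0 gives an all-mask sequence; t = 1 gives the clean data.
    return [tok if rng.random() < t else MASK for tok in x1]


def euler_step(xt, t, dt, predict_x1, rng):
    # One Euler step for kappa(t) = t: each masked position jumps to a token
    # drawn from the model's posterior p(x1 | xt) with probability
    # dt / (1 - t). Already-unmasked positions are left unchanged.
    posteriors = predict_x1(xt)  # one {token: prob} dict per position
    if t + dt >= 1.0 - 1e-9:
        jump_prob = 1.0  # last step: unmask everything that is left
    else:
        jump_prob = dt / (1.0 - t)
    out = list(xt)
    for i, tok in enumerate(xt):
        if tok == MASK and rng.random() < jump_prob:
            tokens, weights = zip(*posteriors[i].items())
            out[i] = rng.choices(tokens, weights=weights)[0]
    return out


def generate(length, predict_x1, steps=10, seed=0):
    # Start from pure noise (all MASK) at t = 0 and integrate up to t = 1.
    rng = random.Random(seed)
    x = [MASK] * length
    dt = 1.0 / steps
    for k in range(steps):
        x = euler_step(x, k * dt, dt, predict_x1, rng)
    return x


target = [3, 1, 4, 1, 5, 9]  # toy token ids


def oracle(xt):
    # Stand-in for a trained denoiser: always certain about the answer.
    return [{tok: 1.0} for tok in target]


print(sample_xt(target, 0.5, random.Random(0)))
print(generate(len(target), oracle))
```

Training would fit `predict_x1` with a cross-entropy loss on the masked positions of `sample_xt(x1, t, rng)` for random t; sampling then only needs the Euler loop, and the forced final jump guarantees a fully decoded sequence.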

1 reply