UnstableLlama

AI & ML interests

Local AI

Organizations

None yet

UnstableLlama's activity

updated a model 6 days ago

UnstableLlama/Llama-3.1-Nemotron-70B-Reward-exl2

Text Generation • Updated 6 days ago

published a model 6 days ago

UnstableLlama/Llama-3.1-Nemotron-70B-Reward-exl2

Text Generation • Updated 6 days ago

liked a model 9 days ago

turboderp/Qwen2.5-VL-7B-Instruct-exl2

Updated 9 days ago • 70 • 5

reacted to chansung's post with 👍 21 days ago

Post

Simple Summarization on DeepSeek-R1 from DeepSeek AI

The RL stage is very important.
↳ However, it is difficult to create a truly helpful AI for people solely through RL.
↳ So, we applied a learning pipeline consisting of four stages: providing a good starting point, reasoning RL, SFT, and safety RL, and achieved performance comparable to o1.
↳ Simply fine-tuning other open models with the data generated by R1-Zero (distillation) resulted in performance comparable to o1-mini.

Of course, this is just a brief overview and may not be of much help. All models are accessible on Hugging Face, and the paper can be read through the GitHub repository.

Model: https://huggingface.co/deepseek-ai
Paper: https://github.com/deepseek-ai/DeepSeek-R1

1 reply

liked a model 22 days ago

deepseek-ai/DeepSeek-R1

Text Generation • Updated 3 days ago • 2.94M • 8.31k

updated a model 3 months ago

UnstableLlama/Marco-o1-exl2

Text Generation • Updated Nov 26, 2024 • 14

liked a model 3 months ago

turboderp/pixtral-12b-exl2

Updated Nov 11, 2024 • 115 • 8

reacted to ezgikorkmaz's post with 🚀 3 months ago

Post

I recently wrote a survey on deep reinforcement learning. The paper is a compact guide to some of the key concepts in reinforcement learning. Find the paper below:

Paper: https://arxiv.org/pdf/2401.02349v2
Twitter: https://x.com/EzgiKorkmazAI/status/1851934161138798615