Mariusz Kurman PRO

mkurman

AI & ML interests

AI Tech Lead | MD

Recent Activity

new activity about 10 hours ago

mkurman/Llama-3.2-MedIT-SUN-2.5B-BT-GRPO:Issue with Padding

reacted to nicolay-r's post with 🔥 4 days ago

📢 The LLaMA-3.1-8B distilled version of DeepSeek R1 is now available, alongside the one based on Qwen.
📙 Notebook for using it in reasoning over series of data 🧠: https://github.com/nicolay-r/nlp-thirdgate/blob/master/tutorials/llm_deep_seek_7b_distill_llama3.ipynb
Loading it with the pipeline API of the transformers library: https://github.com/nicolay-r/nlp-thirdgate/blob/master/llm/transformers_llama.py
🟡 GPU usage: 12.3 GB (FP16/FP32 mode), which is suitable for a T4 (1.5 GB less than the Qwen-distilled version).
🐌 Performance on a T4 instance: ~0.19 tokens/sec (FP32 mode) and ~0.22-0.30 tokens/sec (FP16 mode). Should it be that slow? 🤔
Model name: https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B
⭐ Framework: https://github.com/nicolay-r/bulk-chain
🌌 Notebooks and models hub: https://github.com/nicolay-r/nlp-thirdgate

updated a model 4 days ago

mkurman/Llama-3.2-MedIT-SUN-2.5B-BT-GRPO

Organizations

mkurman's activity

upvoted a collection 3 months ago

MedIT SUN

Llama 3.2 1B upscaled to 2.5B parameters • 4 items • Updated Nov 27, 2024 • 1

upvoted an article 8 months ago

Article

Space secrets security update

May 31, 2024
