Hugging Face
40.9 TFLOPS
Mariusz Kurman
PRO
mkurman
44 followers · 28 following
mkurman88
mkurman
mariuszkurman
mkurman.bsky.social
AI & ML interests
AI Tech Lead | MD
Recent Activity
new activity about 10 hours ago
mkurman/Llama-3.2-MedIT-SUN-2.5B-BT-GRPO: Issue with Padding
reacted to nicolay-r's post with 🔥 4 days ago
📢 The LLaMA-3.1-8B distilled version of DeepSeek R1 is available alongside the one based on Qwen.
📙 Notebook for using it in reasoning over a series of data 🧠: https://github.com/nicolay-r/nlp-thirdgate/blob/master/tutorials/llm_deep_seek_7b_distill_llama3.ipynb
Loading with the pipeline API of the transformers library: https://github.com/nicolay-r/nlp-thirdgate/blob/master/llm/transformers_llama.py
🟡 GPU usage: 12.3 GB (FP16/FP32 mode), which is suitable for a T4 (1.5 GB less than the Qwen-distilled version).
🐌 Performance on a T4 instance: ~0.19 tokens/sec (FP32 mode) and ~0.22-0.30 tokens/sec (FP16 mode). Should it be that slow? 🤔
Model name: https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B
⭐ Framework: https://github.com/nicolay-r/bulk-chain
🌌 Notebooks and models hub: https://github.com/nicolay-r/nlp-thirdgate
updated a model 4 days ago
mkurman/Llama-3.2-MedIT-SUN-2.5B-BT-GRPO
Organizations
mkurman's activity
upvoted a collection 3 months ago
MedIT SUN (Collection)
Llama 3.2 1B upscaled to 2.5B parameters • 4 items • Updated Nov 27, 2024 • 1
upvoted an article 8 months ago
Article: Space secrets security update
May 31, 2024 • 50