deepseek-ai / DeepSeek-R1
Text Generation · Transformers · Safetensors · deepseek_v3 · conversational · custom_code · fp8 · arxiv:2501.12948 · License: mit
A question for the DeepSeek team: can you train a stable MoE model?
#97 · opened 5 days ago by tflchina

Discussion

tflchina · 5 days ago
What I mean is: for the same prompt, the top-k MoE experts chosen by the router stay essentially fixed. The benefit would be that a 32 GB GPU might then be able to run a 600B-parameter MoE model, since only the experts actually selected for that prompt need to be resident in GPU memory.
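The idea can be sketched numerically. Below is a minimal toy sketch (not DeepSeek-R1's actual architecture; the layer count, expert size, and shared-weight size are hypothetical assumptions) showing why stable routing matters: with stable top-k routing, the set of experts a prompt touches stays small, while unstable routing forces almost every expert into memory.

```python
# Toy sketch of the thread's idea: if top-k MoE routing is stable (the same
# experts are chosen for every token of a given prompt), only those experts
# need to live in GPU memory; the rest can stay on CPU/disk.
# All sizing numbers below are illustrative assumptions, not real config.

def resident_experts(routing: list[set[int]]) -> set[int]:
    """Union of expert ids used across all tokens of a prompt."""
    used: set[int] = set()
    for topk in routing:
        used |= topk
    return used

def vram_needed_gb(n_layers: int, expert_size_gb: float,
                   experts_resident_per_layer: int, shared_gb: float) -> float:
    """Rough VRAM estimate: shared (dense) weights + resident experts only."""
    return shared_gb + n_layers * experts_resident_per_layer * expert_size_gb

# Stable routing: every token of a 16-token prompt picks the same 8 experts.
stable = [{0, 3, 7, 12, 21, 33, 40, 55} for _ in range(16)]
print(len(resident_experts(stable)))    # -> 8 experts to keep on-GPU

# Unstable routing: each token picks a disjoint set of 8 experts.
unstable = [{i + j for j in range(8)} for i in range(0, 128, 8)]
print(len(resident_experts(unstable)))  # -> 128 experts needed

# Hypothetical sizing: 60 MoE layers, 0.05 GB per expert (fp8), 8 resident
# experts per layer, 5 GB of shared weights -> roughly 29 GB, i.e. within
# a 32 GB card's budget *only if* routing really is stable per prompt.
print(vram_needed_gb(60, 0.05, 8, 5.0))
```

The arithmetic is the crux: with per-prompt-stable routing the resident set is `k` experts per layer, but if routing drifts token by token, the resident set grows toward all experts and the 32 GB budget is blown.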
tflchina changed the discussion title from "Can you train a stable MoE model?" to "A question for the DeepSeek team: can you train a stable MoE model?" · 5 days ago