A question for the DeepSeek folks: can you train a stable MoE model?

#97

by tflchina - opened 5 days ago

5 days ago

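By "stable" I mean that, for a given prompt, the top-k experts selected by the MoE router stay essentially unchanged. The benefit is that a 32 GB GPU might then be able to run a 600B MoE model.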
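One way to make "stable" concrete is to measure how much the top-k expert choice moves from token to token within a single prompt. The sketch below is only an illustration under stated assumptions: the function names (`top_k_experts`, `routing_stability`), the sizes (64 experts, k = 4, 32 tokens), and the synthetic router logits are all hypothetical and are not taken from any DeepSeek model or code.

```python
import random

def top_k_experts(logits, k):
    """Return the set of indices of the k largest router logits."""
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    return frozenset(ranked[:k])

def routing_stability(per_token_logits, k):
    """Summarize how stable top-k routing is across the tokens of one prompt.

    Returns (working_set, mean_jaccard):
      working_set  - union of all experts selected by any token
      mean_jaccard - average Jaccard overlap of consecutive tokens' top-k sets
    """
    choices = [top_k_experts(row, k) for row in per_token_logits]
    working_set = frozenset().union(*choices)
    overlaps = [len(a & b) / len(a | b) for a, b in zip(choices, choices[1:])]
    mean_jaccard = sum(overlaps) / len(overlaps) if overlaps else 1.0
    return working_set, mean_jaccard

rng = random.Random(0)
NUM_EXPERTS, K, NUM_TOKENS = 64, 4, 32  # illustrative sizes only (assumption)

# Synthetic "stable" routing: a prompt-level bias dominates small per-token noise.
bias = [5.0 if e < K else 0.0 for e in range(NUM_EXPERTS)]
stable = [[b + rng.gauss(0, 0.1) for b in bias] for _ in range(NUM_TOKENS)]

# Synthetic "unstable" routing: logits are drawn independently for every token.
unstable = [[rng.gauss(0, 1) for _ in range(NUM_EXPERTS)] for _ in range(NUM_TOKENS)]

for name, logits in (("stable", stable), ("unstable", unstable)):
    working_set, mean_jaccard = routing_stability(logits, K)
    print(f"{name}: {len(working_set)}/{NUM_EXPERTS} experts touched, "
          f"mean Jaccard {mean_jaccard:.2f}")
```

In the synthetic stable case every token picks the same 4 experts, so the working set stays at 4 of 64 and the mean Jaccard overlap is 1.0. A small working set like that is what would, in principle, let only a fraction of the expert weights sit in GPU memory for a given prompt. Whether a real model can be trained to route this way is exactly the open question.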

tflchina changed discussion title from "Can you train a stable MoE model?" to "A question for the DeepSeek folks: can you train a stable MoE model?" 5 days ago
