Auxiliary-loss-free expert routing

#56
by qing9 - opened

https://huggingface.co/deepseek-ai/DeepSeek-V3/blob/main/modeling_deepseek.py#:~:text=group_scores%20%3D%20(,%23%20%5Bn%2C%20n_group%5D
Is the topk(2, dim=-1) at this spot a bug? Shouldn't it be topk(topk_group)?
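
To make the question concrete, here is a minimal pure-Python sketch of group-limited routing for a single token. It assumes, based on how the linked line reads, that the inner top-k scores each group by the sum of its best few experts, and that `topk_group` is applied in a separate step across groups. The function name `group_limited_topk` and the parameters `per_group_k` and `top_k` are hypothetical names for illustration; `n_group` and `topk_group` follow the identifiers in the linked code. This is not the repository's implementation.

```python
def group_limited_topk(scores, n_group, topk_group, top_k, per_group_k=2):
    """Route one token. `scores` is a flat list of per-expert scores."""
    group_size = len(scores) // n_group
    groups = [scores[g * group_size:(g + 1) * group_size] for g in range(n_group)]

    # Step 1 (assumed role of topk(2, dim=-1)): score each group by the
    # sum of its `per_group_k` highest expert scores -> one value per group.
    group_scores = [sum(sorted(g, reverse=True)[:per_group_k]) for g in groups]

    # Step 2 (where topk_group would apply): keep the best `topk_group` groups.
    kept = sorted(range(n_group), key=lambda g: group_scores[g], reverse=True)
    kept = kept[:topk_group]

    # Step 3: pick the final `top_k` experts, restricted to the kept groups.
    candidates = [g * group_size + i for g in kept for i in range(group_size)]
    chosen = sorted(candidates, key=lambda e: scores[e], reverse=True)[:top_k]
    return group_scores, sorted(kept), sorted(chosen)


# 12 experts in 3 groups of 4; integer scores keep the arithmetic exact.
scores = [9, 1, 1, 1,  5, 6, 0, 0,  4, 4, 4, 4]
group_scores, kept, chosen = group_limited_topk(
    scores, n_group=3, topk_group=2, top_k=2
)
print(group_scores, kept, chosen)
```

Under this reading the two numbers play different roles: `per_group_k` controls how each group is summarized, while `topk_group` controls how many groups survive, so they need not be equal.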

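I also noticed this in the code: modeling_deepseek.py, line 439, in forward: assert not self.training
Does this mean the model definition file provided in the repository does not support fine-tuning?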
