Model Details
This is an int2 model with group_size 64 and symmetric quantization, generated from deepseek-ai/DeepSeek-R1 by the intel/auto-round algorithm. Some layers fall back to 4 or 16 bits (the non-expert and shared-expert layers and the first three layers use 4 bits; the down_proj of layer 60 and lm_head stay at 16 bits). Refer to the section "Generate the model" for the exact mixed-bit settings.
Please follow the license of the original model. This model can NOT run on other serving frameworks.
How To Use
INT2 Inference on CUDA (4×80GB)
Please note that int2 may be slower than int4 on CUDA due to a kernel issue.
To prevent potential overflow and achieve better accuracy, we recommend using the CPU version detailed in the next section.
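The overflow risk comes from float16's limited range: its largest finite value is 65504, and anything beyond that becomes inf. A minimal illustration of why the script below registers a clamping hook on every linear layer:

import torch

x = torch.tensor([70000.0]).to(torch.float16)
print(x)                              # tensor([inf], dtype=torch.float16)
print(torch.clamp(x, -65504, 65504))  # tensor([65504.], dtype=torch.float16)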
import transformers
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRoundConfig  ## must import for auto-round format

# https://github.com/huggingface/transformers/pull/35493
def set_initialized_submodules(model, state_dict_keys):
    """
    Sets the `_is_hf_initialized` flag in all submodules of a given model when all its weights are in the loaded
    state dict.
    """
    state_dict_keys = set(state_dict_keys)
    not_initialized_submodules = {}
    for module_name, module in model.named_modules():
        if module_name == "":
            # When checking if the root module is loaded there's no need to prepend module_name.
            module_keys = set(module.state_dict())
        else:
            module_keys = {f"{module_name}.{k}" for k in module.state_dict()}
        if module_keys.issubset(state_dict_keys):
            module._is_hf_initialized = True
        else:
            not_initialized_submodules[module_name] = module
    return not_initialized_submodules

transformers.modeling_utils.set_initialized_submodules = set_initialized_submodules
import torch

quantized_model_dir = "OPEA/DeepSeek-R1-int2-mixed-sym-inc"

## directly use device_map='auto' if you have enough GPUs
device_map = {"model.norm": 0, "lm_head": 0, "model.embed_tokens": 0}
for i in range(61):
    name = "model.layers." + str(i)
    if i < 15:
        device_map[name] = 0
    elif i < 30:
        device_map[name] = 1
    elif i < 45:
        device_map[name] = 2
    else:
        device_map[name] = 3

model = AutoModelForCausalLM.from_pretrained(
    quantized_model_dir,
    torch_dtype=torch.float16,
    trust_remote_code=True,
    device_map=device_map,
)

def forward_hook(module, input, output):
    # clamp activations to the float16 range to avoid inf/-inf
    return torch.clamp(output, -65504, 65504)

def register_fp16_hooks(model):
    for name, module in model.named_modules():
        if "QuantLinear" in module.__class__.__name__ or isinstance(module, torch.nn.Linear):
            module.register_forward_hook(forward_hook)

register_fp16_hooks(model)  ## better to add this hook to avoid overflow
tokenizer = AutoTokenizer.from_pretrained(quantized_model_dir, trust_remote_code=True)
prompts = [
    "9.11和9.8哪个数字大",
    "如果你是人,你最想做什么“",
    "How many e in word deepseek",
    "There are ten birds in a tree. A hunter shoots one. How many are left in the tree?",
]

texts = []
for prompt in prompts:
    messages = [
        {"role": "user", "content": prompt}
    ]
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )
    texts.append(text)

inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
outputs = model.generate(
    input_ids=inputs["input_ids"].to(model.device),
    attention_mask=inputs["attention_mask"].to(model.device),
    max_length=512,  ## change this to align with the official usage
    num_return_sequences=1,
    do_sample=False  ## change this to align with the official usage
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(inputs["input_ids"], outputs)
]
decoded_outputs = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)

for i, prompt in enumerate(prompts):
    print(f"Prompt: {prompt}")
    print(f"Generated: {decoded_outputs[i]}")
    print("-" * 50)
"""
Prompt: 9.11和9.8哪个数字大
Generated: <think>
首先,我需要比较两个数字:9.11和9.8。
为了准确比较,我首先将它们转换为相同的小数位数。将9.8写成9.80,这样两个数字都有两位小数。
接下来,我比较整数部分。两个数字的整数部分都是9,所以它们在这一部分相等。
然后,我比较小数部分。9.11的小数部分是0.11,而9.80的小数部分是0.80。显然,0.80大于0.11。
因此,综合整数和小数部分的比较结果,9.80大于9.11。
</think>
要比较两个数字 **9.11** 和 **9.8** 的大小,我们可以按照以下步骤进行:
1. **统一小数位数**:
- 将 **9.8** 写成 **9.80**,以便与 **9.11** 进行比较。
2. **比较整数部分**:
- 两个数字的整数部分都是 **9**,因此整数部分相等。
3. **比较小数部分**:
- **9.11** 的小数部分是 **0.11**。
- **9.80** 的小数部分是 **0.80**。
- 显然,**0.80** 大于 **0.11**。
4. **综合比较结果**:
- 由于整数部分相等,小数部分 **0.80** 大于 **0.11**,因此 **9.80** 大于 **9.11**。
**最终结论:**
\[
\boxed{9.8 \text{ 大于 } 9.11}
\]
--------------------------------------------------
Prompt: 如果你是人,你最想做什么“
Generated: <think>
嗯,如果我是人,我最想做什么呢?这个问题挺有意思的。首先,我需要理解“如果我是人”这个前提。作为一个人,我有自己的思想、情感和自由意志,对吧?所以,我可以选择自己想要的生活方式和追求的目标。
首先,可能我会考虑自己的兴趣和爱好。比如,如果我喜欢艺术,可能会想成为画家或音乐家;如果对科学感兴趣,可能会投身于研究。但也许我更倾向于帮助别人,所以可能选择成为医生、教师或社会工作者。这些都是常见的职业选择,但作为一个人,可能还有更多的可能性。
然后,我需要考虑自己的价值观。如果我认为家庭很重要,可能会想建立一个幸福的家庭,花时间陪伴家人。如果更关注社会贡献,可能会参与公益活动,帮助需要帮助的人。或者,如果追求个人成就,可能会努力在事业上取得成功,获得认可和成就感。
另外,作为人,可能会有很多梦想和愿望。比如,旅行世界,体验不同的文化;学习新技能,不断自我提升;或者追求某种精神层面的满足,比如冥想、哲学探索等。这些都是可能的选项。
不过,也有可能存在挑战和困难。比如,经济压力、时间限制、社会压力等,这些都可能影响我的选择。所以,我需要权衡利弊,找到最适合自己的道路。
还有,作为人,可能会有不同的阶段。年轻时可能更注重探索和冒险,中年时可能追求稳定和家庭,老年时可能寻求平静和传承。不同阶段有不同的目标和愿望。
另外,人际关系也很重要。作为人,与朋友、家人、同事的关系会影响我的幸福感和满足感。所以,维护良好的人际关系可能也是一个重要的目标。
可能还需要考虑健康问题。保持身体健康,才能更好地追求其他目标。所以,锻炼、合理饮食、心理健康也是需要考虑的方面。
还有,教育的重要性。不断学习和成长,获取新知识,提升自己的能力,这对实现各种目标都是基础。
当然,每个人的情况不同,所以我的选择也会因人而异。但总的来说,作为人,最想做的事情可能是一个综合性的目标,结合了个人兴趣、价值观、社会贡献、家庭、健康等多个方面。
不过,可能还需要考虑现实因素。比如,经济条件允许吗?社会支持如何?有没有足够的资源和机会?这些都会影响最终的选择。
也许,如果我是人,我会追求一种平衡的生活,既满足个人发展,又能帮助他人,同时享受生活中的美好时光。这可能包括
--------------------------------------------------
Prompt: How many e in word deepseek
Generated: <think>
Okay, so I need to figure out how many times the letter 'e' appears in the word "deepseek." Let me start by writing the word out to see each letter clearly. The word is d-e-e-p-s-e-e-k. Let me count each letter one by one.
Starting with the first letter: that's a 'd'. No 'e' there. The next letter is 'e', so that's one. Then the next letter is another 'e', making it two. The third letter is 'p', which isn't an 'e'. Then comes 's', also not an 'e'. After that, another 'e', so that's three. Then another 'e', bringing the count to four. Finally, the last letter is 'k', which isn't an 'e'.
Wait, let me check again to make sure I didn't miss any. The word is spelled D-E-E-P-S-E-E-K. So breaking it down: D, E, E, P, S, E, E, K. So positions 2, 3, 6, and 7 are all 'e's. That's four 'e's in total. Hmm, I think that's correct. Let me verify by writing the letters with their positions:
1: D
2: E
3: E
4: P
5: S
6: E
7: E
8: K
Yes, positions 2, 3, 6, and 7 are all 'e's. So that's four instances of the letter 'e'. I don't think I missed any. So the answer should be 4.
</think>
The word "deepseek" contains the letter 'e' four times.
**Step-by-Step Explanation:**
1. Write out the word: D, E, E, P, S, E, E, K.
2. Identify each 'e' by its position:
- Position 2: E
- Position# 1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.
--------------------------------------------------
Prompt: There are ten birds in a tree. A hunter shoots one. How many are left in the tree?
Generated: <think>
Okay, so there's this problem here: "There are ten birds in a tree. A hunter shoots one. How many are left in the tree?" Hmm, at first glance, it seems straightforward, but I remember sometimes these kinds of questions have a trick to them. Let me think through this step by step.
Alright, starting with the basics. If there are ten birds in a tree, and a hunter shoots one, the immediate mathematical answer would be 10 minus 1, which is 9. So, nine birds left. But wait, maybe there's more to it. Sometimes these riddles play on the situation rather than just the numbers. Let me consider different angles.
First, when the hunter shoots, the sound of the gunshot might scare the other birds. So, even if only one bird is shot, the rest might fly away. If all the other birds get scared and fly off, then there would be zero birds left in the tree. But is that the case here? The problem doesn't specify whether the other birds are scared or not. It just says a hunter shoots one. So, maybe the question is testing if you consider that possibility.
But then again, maybe it's a straightforward math problem. If you take one away from ten, you get nine. But in real-life scenarios, the noise would likely cause the other birds to flee. However, the problem doesn't mention anything about the birds flying away. So, is it assuming that the other birds stay? Or is it expecting you to consider the real-world consequence?
I think this is a classic riddle where the expected answer is zero because the remaining birds would fly away after the gunshot. But I should verify if that's the common interpretation. Let me check similar riddles. For example, "There are ten birds on a fence, you shoot one, how many are left?" The answer is usually zero because the rest fly away. So, applying that logic here, even though the setting is a tree instead of a fence, the principle would be the same. The act of shooting would scare the other birds, resulting in none remaining.
But wait, maybe the question is more literal. If the hunter successfully shoots one bird, that bird would presumably fall out of the tree, leaving nine. But if the other birds don't flee, they would remain. However, in reality, birds are
"""
INT2 Inference on CPU
Requirements
pip install auto-round
pip uninstall intel-extension-for-pytorch
pip install intel-extension-for-transformers
Inference will be quite slow if the CPU does not support AVX-512.
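On Linux, a quick way to verify AVX-512 support is to look for the avx512f flag in /proc/cpuinfo (a small convenience check, not part of the requirements above):

with open("/proc/cpuinfo") as f:
    cpu_flags = f.read()
print("avx512f supported:", "avx512f" in cpu_flags)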
import transformers
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRoundConfig  ## must import for auto-round format

# https://github.com/huggingface/transformers/pull/35493
def set_initialized_submodules(model, state_dict_keys):
    """
    Sets the `_is_hf_initialized` flag in all submodules of a given model when all its weights are in the loaded
    state dict.
    """
    state_dict_keys = set(state_dict_keys)
    not_initialized_submodules = {}
    for module_name, module in model.named_modules():
        if module_name == "":
            # When checking if the root module is loaded there's no need to prepend module_name.
            module_keys = set(module.state_dict())
        else:
            module_keys = {f"{module_name}.{k}" for k in module.state_dict()}
        if module_keys.issubset(state_dict_keys):
            module._is_hf_initialized = True
        else:
            not_initialized_submodules[module_name] = module
    return not_initialized_submodules

transformers.modeling_utils.set_initialized_submodules = set_initialized_submodules
import torch

quantized_model_dir = "OPEA/DeepSeek-R1-int2-mixed-sym-inc"

quantization_config = AutoRoundConfig(
    backend="cpu",
)

model = AutoModelForCausalLM.from_pretrained(
    quantized_model_dir,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="cpu",
    quantization_config=quantization_config
)
tokenizer = AutoTokenizer.from_pretrained(quantized_model_dir, trust_remote_code=True)
prompts = [
    "9.11和9.8哪个数字大",
    "如果你是人,你最想做什么“",
    "How many e in word deepseek",
    "There are ten birds in a tree. A hunter shoots one. How many are left in the tree?",
]

texts = []
for prompt in prompts:
    messages = [
        {"role": "user", "content": prompt}
    ]
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )
    texts.append(text)

inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
outputs = model.generate(
    input_ids=inputs["input_ids"].to(model.device),
    attention_mask=inputs["attention_mask"].to(model.device),
    max_length=512,  ## change this to align with the official usage
    num_return_sequences=1,
    do_sample=False  ## change this to align with the official usage
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(inputs["input_ids"], outputs)
]
decoded_outputs = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)

for i, prompt in enumerate(prompts):
    print(f"Prompt: {prompt}")
    print(f"Generated: {decoded_outputs[i]}")
    print("-" * 50)
"""
Prompt: 9.11和9.8哪个数字大
Generated: <think>
首先,我需要比较两个数字:9.11和9.8。
首先,比较整数部分。两个数字的整数部分都是9,所以整数部分相同。
接下来,比较小数部分。9.11的小数部分是0.11,而9.8的小数部分是0.8。
由于0.8大于0.11,因此9.8的小数部分更大。
综合整数部分和小数部分的结果,可以确定9.8大于9.11。
</think>
要比较两个数字 **9.11** 和 **9.8** 的大小,可以按照以下步骤进行:
1. **比较整数部分**:
- 两个数字的整数部分都是 **9**,因此整数部分相同。
2. **比较小数部分**:
- **9.11** 的小数部分是 **0.11**
- **9.8** 的小数部分是 **0.8**(即 **0.80**)
3. **比较小数部分的大小**:
- **0.80** 大于 **0.11**,因此 **9.8** 的小数部分更大。
**结论**:由于整数部分相同且 **9.8** 的小数部分更大,因此 **9.8** 大于 **9.11**。
\boxed{9.8}
--------------------------------------------------
Prompt: 如果你是人,你最想做什么“
Generated: <think>
嗯,如果我是人,我最想做什么呢?这个问题挺有意思的。首先,我需要理解“人”在这里指的是什么。可能是指拥有自主意识、情感和自由意志的人类。那么,作为一个人,我最想做的事情可能和人类通常追求的东西有关,比如幸福、成就、人际关系、自我实现等等。
首先,幸福可能是大多数人的追求。所以,如果我是人,我可能会追求让自己感到快乐和满足的事情。这可能包括从事自己喜欢的活动,比如艺术、音乐、运动,或者帮助他人,因为帮助他人也能带来满足感。
其次,自我实现也是一个重要的方面。根据马斯洛的需求层次理论,自我实现是最高层次的需求,指的是实现个人潜能、追求个人成长和高峰体验。所以,如果我是人,我可能会追求在某个领域达到卓越,比如成为科学家、艺术家、作家,或者在其他专业领域有所建树。
另外,人际关系也是人类生活的重要组成部分。建立和维护亲密的关系,如家庭、朋友和伴侣关系,可能会是重要的目标。作为人,我可能会努力培养这些关系,寻找爱和归属感。
还有,探索和好奇心也是人类的驱动力。如果我是人,可能会对世界充满好奇,想要探索不同的文化、科学、技术,或者旅行到不同的地方,体验不同的生活方式。
不过,这些想法可能受到个人价值观、文化背景和生活经历的影响。不同的人可能有不同的优先事项。例如,有些人可能更注重物质财富,而另一些人则更重视精神层面的满足。此外,个人的兴趣和才能也会影响他们的选择。比如,一个对技术感兴趣的人可能会追求科技创新,而一个热爱艺术的人可能会投身于创作。
另外,作为人,可能还会面临各种挑战和困难,比如经济压力、健康问题、社会压力等。因此,在追求目标的过程中,也需要应对这些挑战,这可能涉及到解决问题、适应变化和克服障碍。
还有,道德和伦理也是需要考虑的因素。作为人,可能会思考自己的行为对他人和社会的影响,努力做出符合道德的选择,比如环保、社会正义等。
总结一下,如果我是人,我最想做的事情可能包括追求幸福、自我实现、建立有意义的人际关系、探索世界、应对挑战,并遵循道德准则。当然,具体的目标和愿望可能会因人而异,取决于个人的价值观、兴趣和环境。
</think>
如果我是人,我最想做的事情将围绕以下几个核心方面展开:
1. **
--------------------------------------------------
Prompt: How many e in word deepseek
Generated: <think>
Okay, so I need to figure out how many times the letter 'e' appears in the word "deepseek". Let me start by writing down the word to visualize it better. The word is D, E, E, P, S, E, E, K. Let me count each letter one by one.
Starting with the first letter: D. That's not an 'e', so I can skip that. The next letter is E. That's one 'e'. The third letter is another E, so that's two. Then comes P, which isn't an 'e'. After P is S, also not an 'e'. Then the next letter is E again, making it three. The following letter is another E, bringing the count to four. Finally, the last letter is K, which isn't an 'e'.
Wait, let me check again to make sure I didn't miss any. D, E, E, P, S, E, E, K. So positions 2, 3, 6, and 7 are all 'e's. That's four in total. Hmm, but sometimes people might get confused if there are uppercase or lowercase letters, but since the word is written as "deepseek", all letters are lowercase, so that shouldn't be an issue.
I think that's it. The letter 'e' appears four times in "deepseek".
</think>
The letter 'e' appears 4 times in the word "deepseek".
**Step-by-Step Explanation:**
1. **Write out the word:** D, E, E, P, S, E, E, K.
2. **Identify each letter and count the 'e's:**
- Position 1: D (not 'e')
- Position 2: E (count = 1)
- Position 3: E (count = 2)
- Position 4: P (not 'e')
- Position 5: S (not 'e')
- Position 6: E (count = 3)
- Position 7: E (count = 4)
- Position 8: K (not 'e')
3. **Total count of 'e's:** 4
**Answer:** There are 4 'e's in the word "deepseek".
--------------------------------------------------
Prompt: There are ten birds in a tree. A hunter shoots one. How many are left in the tree?
Generated: <think>
Okay, so there's this problem here: "There are ten birds in a tree. A hunter shoots one. How many are left in the tree?" Hmm, at first glance, it seems straightforward, but I remember sometimes these kinds of questions have a trick to them. Let me think through this step by step.
Alright, starting with the basics. There are ten birds in the tree. If a hunter shoots one, the immediate thought is to subtract one from ten, which would leave nine birds. But wait, maybe there's more to it. I've heard similar riddles where the answer isn't just a simple subtraction. For example, sometimes when a bird is shot, the other birds might fly away because of the noise. So, if the hunter shoots one bird, the rest might get scared and leave the tree. In that case, there would be zero birds left. But the problem doesn't explicitly say that the other birds fly away. It just says the hunter shoots one. So, do we assume the others stay, or do we assume they flee?
Let me consider both possibilities. If we take the problem literally, the hunter shoots one bird, so that one is either dead or injured and falls out of the tree. The remaining birds would then be ten minus one, which is nine. But if the gunshot scares the other birds, they might all fly away immediately. In that scenario, even though the hunter only shot one, the rest are startled and leave, resulting in zero birds remaining.
Which interpretation is correct? The problem doesn't specify whether the other birds are scared by the gunshot. It's a bit ambiguous. In typical riddles like this, the answer is often zero because the noise would cause the other birds to fly off. But if we're being strictly mathematical and not considering the behavior of the birds, it would be nine. However, since this is presented as a riddle, it's more likely expecting the answer that considers the behavior of the birds, leading to zero remaining.
Wait, but let me check if there's another angle. Maybe the question is testing something else. For example, if the hunter shoots one bird, but misses, then all the birds might stay. But the problem says the hunter shoots one, which implies that the shot was successful. So the bird is hit. If the hunter hits one bird, that bird is
"""
Evaluate the model
We do not have enough resources to evaluate the model.
Generate the model
About 5×80GB of GPU memory and 1.4T-1.6T of CPU memory are required.
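As a rough sanity check on the host-memory figure (assumption: DeepSeek-R1 has about 671B parameters and is loaded in bf16, i.e. 2 bytes per weight):

params = 671e9  # total parameters of DeepSeek-R1
print(f"{params * 2 / 1e12:.2f} TB")  # ~1.34 TB for the weights alone, before activations and quantization workspace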
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
import transformers

# https://github.com/huggingface/transformers/pull/35493
def set_initialized_submodules(model, state_dict_keys):
    """
    Sets the `_is_hf_initialized` flag in all submodules of a given model when all its weights are in the loaded
    state dict.
    """
    state_dict_keys = set(state_dict_keys)
    not_initialized_submodules = {}
    for module_name, module in model.named_modules():
        if module_name == "":
            # When checking if the root module is loaded there's no need to prepend module_name.
            module_keys = set(module.state_dict())
        else:
            module_keys = {f"{module_name}.{k}" for k in module.state_dict()}
        if module_keys.issubset(state_dict_keys):
            module._is_hf_initialized = True
        else:
            not_initialized_submodules[module_name] = module
    return not_initialized_submodules

transformers.modeling_utils.set_initialized_submodules = set_initialized_submodules
model_name = "opensourcerelease/DeepSeek-R1-bf16"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True, torch_dtype="auto")

block = model.model.layers
device_map = {}
for n, m in block.named_modules():
    if isinstance(m, (torch.nn.Linear, transformers.modeling_utils.Conv1D)):
        # spread the routed experts over cuda:1-4 by expert index; everything else stays on cuda:0
        if "experts" in n and ("shared_experts" not in n) and int(n.split('.')[-2]) < 63:
            device = "cuda:1"
        elif "experts" in n and ("shared_experts" not in n) and 63 <= int(n.split('.')[-2]) < 128:
            device = "cuda:2"
        elif "experts" in n and ("shared_experts" not in n) and 128 <= int(n.split('.')[-2]) < 192:
            device = "cuda:3"
        elif "experts" in n and ("shared_experts" not in n) and int(n.split('.')[-2]) >= 192:
            device = "cuda:4"
        else:
            device = "cuda:0"
        n = n[2:]
        device_map.update({n: device})
from auto_round import AutoRound

layer_config = {}
for n, m in model.named_modules():
    if not isinstance(m, (torch.nn.Linear, transformers.modeling_utils.Conv1D)):
        continue
    if "experts" not in n:
        layer_config[n] = {"bits": 4, "group_size": 128}
    if "experts" in n and "shared_experts" in n:
        layer_config[n] = {"bits": 4, "group_size": 128}

    ## handle the first 3 layers
    name_splits = n.split('.')
    if len(name_splits) >= 3 and int(name_splits[2]) < 3:
        layer_config[n] = {"bits": 4, "group_size": 128}
    if len(name_splits) >= 3 and int(name_splits[2]) == 60 and "down_proj" in n:
        layer_config[n] = {"bits": 16}

layer_config["lm_head"] = {"bits": 16}

autoround = AutoRound(model=model, tokenizer=tokenizer, device_map=device_map, bits=2, group_size=64,
                      iters=400, batch_size=4, seqlen=512, nsamples=512, enable_torch_compile=False,
                      layer_config=layer_config)
autoround.quantize()
autoround.save_quantized(format="auto_round", output_dir="tmp_autoround")
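Once quantization finishes, the saved directory can be loaded back just like the released checkpoint; a minimal smoke test (a sketch assuming the same CPU environment as the "INT2 Inference on CPU" section):

from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRoundConfig  ## must import for auto-round format

model = AutoModelForCausalLM.from_pretrained("tmp_autoround", torch_dtype="auto",
                                             trust_remote_code=True, device_map="cpu")
tokenizer = AutoTokenizer.from_pretrained("tmp_autoround", trust_remote_code=True)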
Ethical Considerations and Limitations
The model can produce factually incorrect output and should not be relied on to produce factually accurate information. Because of the limitations of the pretrained model and the finetuning datasets, it is possible that this model could generate lewd, biased or otherwise offensive outputs.
Therefore, before deploying any applications of the model, developers should perform safety testing.
Caveats and Recommendations
Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.
Here is a useful link to learn more about Intel's AI software:
- Intel Neural Compressor: https://github.com/intel/neural-compressor
Disclaimer
The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes.
Cite
@article{cheng2023optimize,
  title={Optimize weight rounding via signed gradient descent for the quantization of llms},
  author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao and Liu, Yi},
  journal={arXiv preprint arXiv:2309.05516},
  year={2023}
}