Commit 15edd3a (verified) · parent: 043857b · ZeroXClem

Update README.md

Files changed: README.md (+156 -8)

---
tags:
  # … (earlier tag entries not shown in this diff)
  - mobiuslabsgmbh/DeepSeek-R1-ReDistill-Qwen-7B-v1.1
  - Triangle104/DSR1-Distill-Qwen-7B-RP
  - deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
language:
  - en
  - zh
base_model:
  - deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
  - huihui-ai/DeepSeek-R1-Distill-Qwen-7B-abliterated-v2
  - mobiuslabsgmbh/DeepSeek-R1-ReDistill-Qwen-7B-v1.1
  - Triangle104/DSR1-Distill-Qwen-7B-RP
pipeline_tag: text-generation
library_name: transformers
---

# ZeroXClem/Qwen2.5-7B-DistilPrism

**Qwen2.5-7B-DistilPrism** is a **distillation-focused, reasoning-oriented model merge** that combines multiple DeepSeek-R1 distillations into a single **refined, high-performance language model**. Built with the **Model Stock** merge method, it aims to capture the best attributes of **DeepSeek-R1-Distill-Qwen-7B** and its improved derivatives.

## 🚀 Merged Models

This model is a weighted merge of the following:

- [**huihui-ai/DeepSeek-R1-Distill-Qwen-7B-abliterated-v2**](https://huggingface.co/huihui-ai/DeepSeek-R1-Distill-Qwen-7B-abliterated-v2): An uncensored distillation of DeepSeek-R1, optimized to remove refusals and improve usability.
- [**mobiuslabsgmbh/DeepSeek-R1-ReDistill-Qwen-7B-v1.1**](https://huggingface.co/mobiuslabsgmbh/DeepSeek-R1-ReDistill-Qwen-7B-v1.1): A refined distillation that improves accuracy and robustness across various benchmarks.
- [**Triangle104/DSR1-Distill-Qwen-7B-RP**](https://huggingface.co/Triangle104/DSR1-Distill-Qwen-7B-RP): A composite merge of several distilled DeepSeek variants, serving as an essential ingredient for performance tuning.
- [**deepseek-ai/DeepSeek-R1-Distill-Qwen-7B**](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B): The foundation of this merge, the official DeepSeek-R1 distillation optimized for efficiency and strong reasoning.

## 🧩 Merge Configuration

The following **YAML configuration** defines how these models were combined using **Model Stock**, ensuring **balanced contributions** from each source (the middle of the models list is truncated in this diff):

```yaml
# Merge configuration for ZeroXClem/Qwen2.5-7B-DistilPrism using Model Stock
name: ZeroXClem-Qwen2.5-7B-DistilPrism
merge_method: model_stock
base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
models:
  # … (entries for the other three source models not shown in this diff)
  - model: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
    parameters:
      weight: 0.25
```

### 🔑 Key Parameters

- **Normalization & Rescaling**: Keeps weight distributions balanced across all components.
- **Model Stock Merge Method**: Optimizes the contribution from each model to retain its best attributes (see the sketch below).
- **Weighted Blending**: The **abliterated** and **re-distilled** models contribute the most, refining both alignment and general usability.
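
To make the blending idea concrete, here is a minimal, hypothetical sketch of weighted parameter averaging. It is an illustration only: mergekit's `model_stock` method derives its interpolation weights from the geometry of the models' task vectors rather than taking fixed weights as input, and the state-dict names below are placeholders.

```python
import torch

def weighted_merge(state_dicts, weights):
    """Toy weighted average of model state dicts, tensor by tensor.

    Illustrative only; mergekit's model_stock computes its own
    interpolation weights instead of accepting fixed ones.
    """
    merged = {}
    for name in state_dicts[0]:
        merged[name] = sum(
            w * sd[name].float() for sd, w in zip(state_dicts, weights)
        )
    return merged

# Hypothetical usage with four already-loaded state dicts:
# merged = weighted_merge(
#     [sd_base, sd_abliterated, sd_redistilled, sd_rp],
#     weights=[0.25, 0.25, 0.25, 0.25],
# )
```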

---

## 🗣️ Inference

You can use the model for text generation as follows:

### Ollama

**[Quickstart to Ollama Guide Here](https://aidev.zeroxclem.com/blog/08-setting-up-ollama)**. Ollama is recommended as a daily driver, since it supports thinking tags.

```bash
ollama run hf.co/ZeroXClem/Qwen2.5-7B-DistilPrism

# To run a quantized variant, copy its Hugging Face URL and replace 'huggingface.co/' with 'hf.co/'.
```
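
If you'd rather call a local Ollama server from code, the following is a minimal sketch using its REST API. It assumes Ollama is running on the default port (11434) and that the model has already been pulled with the command above.

```python
# Minimal sketch: query a local Ollama server over its REST API.
import requests

response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "hf.co/ZeroXClem/Qwen2.5-7B-DistilPrism",
        "prompt": "Briefly explain what model merging is.",
        "stream": False,  # return a single JSON object instead of a stream
    },
    timeout=300,
)
print(response.json()["response"])
```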

### Transformers

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
import torch

# Define the model name
model_name = "ZeroXClem/Qwen2.5-7B-DistilPrism"

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load the model in bfloat16 and place it automatically across available devices
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Initialize the pipeline (dtype and device placement are inherited
# from the already-loaded model)
text_generator = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer
)

# Define the input prompt
prompt = "Explain the significance of artificial intelligence in modern healthcare."

# Generate the output with sampling
outputs = text_generator(
    prompt,
    max_new_tokens=150,
    do_sample=True,
    temperature=0.7,
    top_k=50,
    top_p=0.95
)

# Print the generated text
print(outputs[0]["generated_text"])
```
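
Because the source models are DeepSeek-R1 distills, the merge emits its reasoning inside `<think>` tags. The sketch below uses the tokenizer's built-in chat template to format a prompt accordingly; it assumes the merged model inherits the DeepSeek-R1 chat template.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_name = "ZeroXClem/Qwen2.5-7B-DistilPrism"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
)

# Format the conversation with the model's own chat template
messages = [{"role": "user", "content": "What is 17 * 24? Think step by step."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Generate; the reasoning appears inside <think>...</think> before the answer
output = model.generate(input_ids, max_new_tokens=512, do_sample=True, temperature=0.6)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```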

---

## 🎯 Use Cases & Applications

**Qwen2.5-7B-DistilPrism** is designed for **efficient, high-quality text generation** with strong reasoning capabilities. It is well suited for:

- **Advanced Reasoning & Problem Solving**: Excels at logic-heavy tasks and multi-step reasoning problems.
- **Conversational AI**: Optimized for **fluid, responsive dialogue**, reducing refusals and improving engagement.
- **Mathematical & Scientific Computation**: Enhanced **math and code generation** compared to standard distillations.
- **Content Creation & Summarization**: Generates coherent, **contextually rich** text suitable for a wide range of applications.

---

## 📜 License

This model is released under the **MIT License**.

---

## 📊 Benchmark Results (Coming Soon)

We are currently **quantizing and benchmarking** this model. Stay tuned for performance updates across the following tasks (a local-evaluation sketch follows the list):

- **IFEval (0-Shot)**
- **BBH (3-Shot)**
- **MATH (4-Shot)**
- **GPQA (0-Shot)**
- **MuSR (0-Shot)**
- **MMLU-PRO (5-Shot)**
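
These correspond to the Open LLM Leaderboard v2 task suite. As a hedged sketch, they can be reproduced locally with EleutherAI's lm-evaluation-harness; the `leaderboard` task-group name is an assumption about a recent harness version that bundles these tasks.

```python
# Hedged sketch: run the leaderboard task group with lm-evaluation-harness.
# Assumes `pip install lm-eval` (v0.4+) and enough GPU memory for a 7B model.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=ZeroXClem/Qwen2.5-7B-DistilPrism,dtype=bfloat16",
    tasks=["leaderboard"],  # IFEval, BBH, MATH, GPQA, MuSR, MMLU-PRO
    batch_size=8,
)
print(results["results"])
```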

---

## 💡 Tags

- `merge`
- `mergekit`
- `model_stock`
- `DeepSeek-R1`
- `Distillation`
- `abliterated`
- `re-distilled`
- `DeepSeek-R1-Distill-Qwen-7B`

---

## 🙏 Special Thanks

This project wouldn't be possible without the incredible contributions of:

- **[@huihui-ai](https://huggingface.co/huihui-ai)** – For developing **DeepSeek-R1-Distill-Qwen-7B-abliterated-v2**, a bold step towards improving model usability.
- **[@mobiuslabsgmbh](https://huggingface.co/mobiuslabsgmbh)** – For refining distillation techniques with **DeepSeek-R1-ReDistill-Qwen-7B-v1.1**.
- **[@Triangle104](https://huggingface.co/Triangle104)** – For crafting innovative merges like **DSR1-Distill-Qwen-7B-RP**, an essential component of this blend.
- **[@deepseek-ai](https://huggingface.co/deepseek-ai)** – For open-sourcing **DeepSeek-R1-Distill-Qwen-7B**, the foundation for these reasoning advancements.

And a heartfelt **thank you** to everyone in the **🤗 and open-source AI community** for their continued research, testing, and support. 💜🚀

---

## 🔗 Additional Resources

- [Hugging Face Model Card](https://huggingface.co/ZeroXClem/Qwen2.5-7B-DistilPrism)
- [MergeKit Repository](https://github.com/ZeroXClem/mergekit)
- [DeepSeek AI on Hugging Face](https://huggingface.co/deepseek-ai)
- [Open LLM Leaderboard](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)