Update README.md
README.md CHANGED

@@ -31,10 +31,11 @@ Zurich 1.5B GammaCorpus v2-10k is a fine-tune of Alibaba's **Qwen 2.5 1.5B Instr
 - **Base Model:** [Qwen/Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct)
 - **Type:** Causal Language Models
 - **Architecture:** Transformers with RoPE, SwiGLU, RMSNorm, and Attention QKV bias
-- **Number of Parameters:**
-- **Number of Paramaters (Non-Embedding)
+- **Number of Parameters:** 1.54B
+- **Number of Parameters (Non-Embedding):** 1.31B
 - **Number of Layers:** 28
-- **Number of Attention Heads (GQA):**
+- **Number of Attention Heads (GQA):** 12 for Q and 2 for KV
+
 
 ## Training Details
 
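The figures filled in by this change are mutually consistent. A rough sanity check, assuming the base model's published vocabulary size (151,936) and hidden size (1,536) and its tied input/output embeddings — none of which are stated in this README:

```python
# Rough sanity check of the parameter figures added in the diff above.
# vocab_size and hidden_size are assumptions taken from Qwen2.5-1.5B's
# published config, not from this README.
vocab_size = 151936
hidden_size = 1536

# Input and output embeddings are tied, so the embedding matrix counts once.
embedding_params = vocab_size * hidden_size      # ~0.23B
total_params = 1.54e9                            # "Number of Parameters"
non_embedding = total_params - embedding_params
print(f"{non_embedding / 1e9:.2f}B")             # → 1.31B, matching the diff

# GQA: 12 query heads share 2 key/value heads, i.e. 6 Q heads per KV head.
q_heads, kv_heads = 12, 2
print(q_heads // kv_heads)                       # → 6
```

The subtraction reproduces the 1.31B non-embedding count to two decimal places, which is why only the total and the vocabulary/hidden sizes need to be trusted.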