Update README.md
README.md CHANGED
@@ -136,15 +136,16 @@ The NT-Java-1.1B model has been trained on publicly available datasets and is of
 
 ## Model
 
-- **Architecture:** GPT-2 model with Multi-Query Attention and Fill-in-the-Middle objective
-- **Pretraining steps:**
-- **
+- **Architecture:** GPT-2 model with Multi-Query Attention and Fill-in-the-Middle objective.
+- **Pretraining steps:** 100k
+- **Context length:** 8K tokens
+- **Pretraining tokens:** 22 billion
 - **Precision:** bfloat16
 
 ## Hardware
 
 - **GPUs:** 6 NVIDIA A100 80GB
-- **Training time:**
+- **Training time:** 10 days
 
 ## Software
 