rajabmondal committed on
Commit 45dfe9e · verified · 1 Parent(s): 9b59dcc

Update README.md

Files changed (1)
  1. README.md +5 -4
README.md CHANGED
@@ -136,15 +136,16 @@ The NT-Java-1.1B model has been trained on publicly available datasets and is of
 
 ## Model
 
-- **Architecture:** GPT-2 model with Multi-Query Attention and Fill-in-the-Middle objective
-- **Pretraining steps:** 50k
-- **Pretraining tokens:** 22 Billion
+- **Architecture:** GPT-2 model with Multi-Query Attention and Fill-in-the-Middle objective.
+- **Pretraining steps:** 100k
+- **Context length:** 8K tokens
+- **Pretraining tokens:** 22 billion
 - **Precision:** bfloat16
 
 ## Hardware
 
 - **GPUs:** 6 NVIDIA A100 80GB
-- **Training time:** 4 days
+- **Training time:** 10 days
 
 ## Software
 
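
The Architecture line in this hunk mentions a Fill-in-the-Middle (FIM) objective. As a rough illustration of what that means at inference time, the sketch below arranges a Java prefix and suffix in prefix-suffix-middle order so a model can generate the missing span. The sentinel strings `<fim_prefix>`, `<fim_suffix>`, and `<fim_middle>` are an assumption (they follow the StarCoder convention and are not stated in this diff), and the helper names `build_fim_prompt` and `splice_completion` are hypothetical; check the model's tokenizer for its actual special tokens.

```python
# Sentinel strings are an ASSUMPTION (StarCoder-style FIM tokens);
# they are not confirmed by this diff. Verify against the tokenizer.
FIM_PREFIX = "<fim_prefix>"
FIM_SUFFIX = "<fim_suffix>"
FIM_MIDDLE = "<fim_middle>"


def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Arrange the code before and after the gap in prefix-suffix-middle order.

    The model is expected to continue after the middle sentinel with the
    text that belongs between `prefix` and `suffix`.
    """
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


def splice_completion(prefix: str, middle: str, suffix: str) -> str:
    """Reassemble the full source once a middle span has been generated."""
    return prefix + middle + suffix


# Hypothetical Java snippet with the return expression left as the gap.
java_prefix = "public int add(int a, int b) {\n    return "
java_suffix = ";\n}\n"

prompt = build_fim_prompt(java_prefix, java_suffix)

# Stand-in for a model's output; no model is called in this sketch.
completed = splice_completion(java_prefix, "a + b", java_suffix)
```

The prompt string would be passed to the model as ordinary input text; only the ordering of prefix and suffix differs from left-to-right completion.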