pere committed on
Commit e268c1b · 1 Parent(s): 51a6138

Saving train state of step 1000

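The commit message refers to serializing the Flax training state at step 1000. Below is a minimal sketch of how such a state is typically written to a train_state.msgpack file with Flax; it is illustrative only and not taken from this repository's run_distillation.py, and the toy parameters, optimizer, and output path are placeholder assumptions.

```python
import jax.numpy as jnp
import optax
from flax import serialization
from flax.training import train_state

# Hypothetical stand-in for the distillation student's parameters.
params = {"w": jnp.zeros((2, 2))}

state = train_state.TrainState.create(
    apply_fn=lambda p, x: x,   # placeholder apply function
    params=params,
    tx=optax.adamw(1e-4),
)

# Serialize the full train state (params, optimizer state, step counter)
# to msgpack, as presumably done by the training loop at step 1000.
with open("train_state.msgpack", "wb") as f:
    f.write(serialization.to_bytes(state))
```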
Files changed (47)
  1. added_tokens.json +1611 -0
  2. checkpoint-1000/added_tokens.json +1611 -0
  3. checkpoint-1000/config.json +288 -0
  4. checkpoint-1000/flax_model.msgpack +3 -0
  5. checkpoint-1000/generation_config.json +271 -0
  6. checkpoint-1000/merges.txt +0 -0
  7. checkpoint-1000/preprocessor_config.json +14 -0
  8. checkpoint-1000/special_tokens_map.json +139 -0
  9. checkpoint-1000/tokenizer_config.json +0 -0
  10. checkpoint-1000/train_state.msgpack +3 -0
  11. checkpoint-1000/vocab.json +0 -0
  12. config.json +288 -0
  13. create_student_model.py +226 -0
  14. distil-small-init/added_tokens.json +1609 -0
  15. distil-small-init/config.json +389 -0
  16. distil-small-init/flax_model.msgpack +3 -0
  17. distil-small-init/generation_config.json +269 -0
  18. distil-small-init/merges.txt +0 -0
  19. distil-small-init/normalizer.json +1742 -0
  20. distil-small-init/preprocessor_config.json +14 -0
  21. distil-small-init/special_tokens_map.json +139 -0
  22. distil-small-init/tokenizer_config.json +0 -0
  23. distil-small-init/vocab.json +0 -0
  24. generation_config.json +271 -0
  25. merges.txt +0 -0
  26. nb-distil-large-init/added_tokens.json +1611 -0
  27. nb-distil-large-init/config.json +288 -0
  28. nb-distil-large-init/flax_model.msgpack +3 -0
  29. nb-distil-large-init/generation_config.json +270 -0
  30. nb-distil-large-init/merges.txt +0 -0
  31. nb-distil-large-init/preprocessor_config.json +14 -0
  32. nb-distil-large-init/special_tokens_map.json +139 -0
  33. nb-distil-large-init/tokenizer_config.json +0 -0
  34. nb-distil-large-init/vocab.json +0 -0
  35. preprocessor_config.json +14 -0
  36. run_distillation.py +2156 -0
  37. run_distillation_debug.py +2162 -0
  38. run_distillation_nodes.py +2168 -0
  39. run_large_training.sh +38 -0
  40. run_large_training_debug.sh +38 -0
  41. run_large_training_lr1e4.sh +41 -0
  42. run_large_training_lr6e4.sh +41 -0
  43. run_large_training_recover.sh +41 -0
  44. special_tokens_map.json +139 -0
  45. tokenizer.json +0 -0
  46. tokenizer_config.json +0 -0
  47. vocab.json +0 -0
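For reference, the checkpoint-1000/ directory listed above bundles everything needed to reload the partially trained student: the Flax weights plus the full tokenizer and feature-extractor configuration. A minimal sketch of the standard transformers loading path follows; the local path is illustrative, and train_state.msgpack itself is only consumed by the training script when resuming.

```python
from transformers import FlaxWhisperForConditionalGeneration, WhisperProcessor

ckpt_dir = "checkpoint-1000"  # hypothetical local path to this checkpoint

# config.json + flax_model.msgpack -> model architecture and weights
model = FlaxWhisperForConditionalGeneration.from_pretrained(ckpt_dir)

# preprocessor_config.json, tokenizer_config.json, vocab.json, merges.txt,
# added_tokens.json, special_tokens_map.json -> feature extractor + tokenizer
processor = WhisperProcessor.from_pretrained(ckpt_dir)

print(type(model).__name__, len(processor.tokenizer))
```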
added_tokens.json ADDED
@@ -0,0 +1,1611 @@
+ {
+ "<|0.00|>": 50365,
+ "<|0.02|>": 50366,
+ "<|0.04|>": 50367,
[... 1,605 further mappings: the remaining timestamp tokens "<|0.06|>" through "<|30.00|>" at 0.02 s intervals (consecutive IDs 50368–51865), and the special/language tokens "<|endoftext|>" (50257), "<|startoftranscript|>" (50258), the language codes "<|en|>" (50259) through "<|yue|>" (50358), "<|translate|>" (50359), "<|transcribe|>" (50360), "<|startoflm|>" (50361), "<|startofprev|>" (50362), "<|nospeech|>" (50363), "<|notimestamps|>" (50364); keys are sorted lexicographically ...]
+ "<|zh|>": 50260
+ }
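The mapping above is regular: each timestamp token sits at a fixed offset from "<|0.00|>". A small self-contained check of that pattern (illustrative only; the base ID 50365 is read off the file contents above):

```python
def timestamp_token_id(seconds: float, base_id: int = 50365) -> int:
    """ID of the Whisper timestamp token for `seconds` in this tokenizer.

    base_id is the ID of "<|0.00|>" taken from added_tokens.json above;
    timestamps advance in 0.02 s steps with consecutive IDs.
    """
    return base_id + round(seconds / 0.02)

# Spot-checks against entries visible in the file above.
assert timestamp_token_id(0.00) == 50365
assert timestamp_token_id(1.98) == 50464
assert timestamp_token_id(10.00) == 50865
assert timestamp_token_id(30.00) == 51865
```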
checkpoint-1000/added_tokens.json ADDED
@@ -0,0 +1,1611 @@
[Same contents as the root added_tokens.json shown above.]
399
+ "<|15.94|>": 51162,
400
+ "<|15.96|>": 51163,
401
+ "<|15.98|>": 51164,
402
+ "<|16.00|>": 51165,
403
+ "<|16.02|>": 51166,
404
+ "<|16.04|>": 51167,
405
+ "<|16.06|>": 51168,
406
+ "<|16.08|>": 51169,
407
+ "<|16.10|>": 51170,
408
+ "<|16.12|>": 51171,
409
+ "<|16.14|>": 51172,
410
+ "<|16.16|>": 51173,
411
+ "<|16.18|>": 51174,
412
+ "<|16.20|>": 51175,
413
+ "<|16.22|>": 51176,
414
+ "<|16.24|>": 51177,
415
+ "<|16.26|>": 51178,
416
+ "<|16.28|>": 51179,
417
+ "<|16.30|>": 51180,
418
+ "<|16.32|>": 51181,
419
+ "<|16.34|>": 51182,
420
+ "<|16.36|>": 51183,
421
+ "<|16.38|>": 51184,
422
+ "<|16.40|>": 51185,
423
+ "<|16.42|>": 51186,
424
+ "<|16.44|>": 51187,
425
+ "<|16.46|>": 51188,
426
+ "<|16.48|>": 51189,
427
+ "<|16.50|>": 51190,
428
+ "<|16.52|>": 51191,
429
+ "<|16.54|>": 51192,
430
+ "<|16.56|>": 51193,
431
+ "<|16.58|>": 51194,
432
+ "<|16.60|>": 51195,
433
+ "<|16.62|>": 51196,
434
+ "<|16.64|>": 51197,
435
+ "<|16.66|>": 51198,
436
+ "<|16.68|>": 51199,
437
+ "<|16.70|>": 51200,
438
+ "<|16.72|>": 51201,
439
+ "<|16.74|>": 51202,
440
+ "<|16.76|>": 51203,
441
+ "<|16.78|>": 51204,
442
+ "<|16.80|>": 51205,
443
+ "<|16.82|>": 51206,
444
+ "<|16.84|>": 51207,
445
+ "<|16.86|>": 51208,
446
+ "<|16.88|>": 51209,
447
+ "<|16.90|>": 51210,
448
+ "<|16.92|>": 51211,
449
+ "<|16.94|>": 51212,
450
+ "<|16.96|>": 51213,
451
+ "<|16.98|>": 51214,
452
+ "<|17.00|>": 51215,
453
+ "<|17.02|>": 51216,
454
+ "<|17.04|>": 51217,
455
+ "<|17.06|>": 51218,
456
+ "<|17.08|>": 51219,
457
+ "<|17.10|>": 51220,
458
+ "<|17.12|>": 51221,
459
+ "<|17.14|>": 51222,
460
+ "<|17.16|>": 51223,
461
+ "<|17.18|>": 51224,
462
+ "<|17.20|>": 51225,
463
+ "<|17.22|>": 51226,
464
+ "<|17.24|>": 51227,
465
+ "<|17.26|>": 51228,
466
+ "<|17.28|>": 51229,
467
+ "<|17.30|>": 51230,
468
+ "<|17.32|>": 51231,
469
+ "<|17.34|>": 51232,
470
+ "<|17.36|>": 51233,
471
+ "<|17.38|>": 51234,
472
+ "<|17.40|>": 51235,
473
+ "<|17.42|>": 51236,
474
+ "<|17.44|>": 51237,
475
+ "<|17.46|>": 51238,
476
+ "<|17.48|>": 51239,
477
+ "<|17.50|>": 51240,
478
+ "<|17.52|>": 51241,
479
+ "<|17.54|>": 51242,
480
+ "<|17.56|>": 51243,
481
+ "<|17.58|>": 51244,
482
+ "<|17.60|>": 51245,
483
+ "<|17.62|>": 51246,
484
+ "<|17.64|>": 51247,
485
+ "<|17.66|>": 51248,
486
+ "<|17.68|>": 51249,
487
+ "<|17.70|>": 51250,
488
+ "<|17.72|>": 51251,
489
+ "<|17.74|>": 51252,
490
+ "<|17.76|>": 51253,
491
+ "<|17.78|>": 51254,
492
+ "<|17.80|>": 51255,
493
+ "<|17.82|>": 51256,
494
+ "<|17.84|>": 51257,
495
+ "<|17.86|>": 51258,
496
+ "<|17.88|>": 51259,
497
+ "<|17.90|>": 51260,
498
+ "<|17.92|>": 51261,
499
+ "<|17.94|>": 51262,
500
+ "<|17.96|>": 51263,
501
+ "<|17.98|>": 51264,
502
+ "<|18.00|>": 51265,
503
+ "<|18.02|>": 51266,
504
+ "<|18.04|>": 51267,
505
+ "<|18.06|>": 51268,
506
+ "<|18.08|>": 51269,
507
+ "<|18.10|>": 51270,
508
+ "<|18.12|>": 51271,
509
+ "<|18.14|>": 51272,
510
+ "<|18.16|>": 51273,
511
+ "<|18.18|>": 51274,
512
+ "<|18.20|>": 51275,
513
+ "<|18.22|>": 51276,
514
+ "<|18.24|>": 51277,
515
+ "<|18.26|>": 51278,
516
+ "<|18.28|>": 51279,
517
+ "<|18.30|>": 51280,
518
+ "<|18.32|>": 51281,
519
+ "<|18.34|>": 51282,
520
+ "<|18.36|>": 51283,
521
+ "<|18.38|>": 51284,
522
+ "<|18.40|>": 51285,
523
+ "<|18.42|>": 51286,
524
+ "<|18.44|>": 51287,
525
+ "<|18.46|>": 51288,
526
+ "<|18.48|>": 51289,
527
+ "<|18.50|>": 51290,
528
+ "<|18.52|>": 51291,
529
+ "<|18.54|>": 51292,
530
+ "<|18.56|>": 51293,
531
+ "<|18.58|>": 51294,
532
+ "<|18.60|>": 51295,
533
+ "<|18.62|>": 51296,
534
+ "<|18.64|>": 51297,
535
+ "<|18.66|>": 51298,
536
+ "<|18.68|>": 51299,
537
+ "<|18.70|>": 51300,
538
+ "<|18.72|>": 51301,
539
+ "<|18.74|>": 51302,
540
+ "<|18.76|>": 51303,
541
+ "<|18.78|>": 51304,
542
+ "<|18.80|>": 51305,
543
+ "<|18.82|>": 51306,
544
+ "<|18.84|>": 51307,
545
+ "<|18.86|>": 51308,
546
+ "<|18.88|>": 51309,
547
+ "<|18.90|>": 51310,
548
+ "<|18.92|>": 51311,
549
+ "<|18.94|>": 51312,
550
+ "<|18.96|>": 51313,
551
+ "<|18.98|>": 51314,
552
+ "<|19.00|>": 51315,
553
+ "<|19.02|>": 51316,
554
+ "<|19.04|>": 51317,
555
+ "<|19.06|>": 51318,
556
+ "<|19.08|>": 51319,
557
+ "<|19.10|>": 51320,
558
+ "<|19.12|>": 51321,
559
+ "<|19.14|>": 51322,
560
+ "<|19.16|>": 51323,
561
+ "<|19.18|>": 51324,
562
+ "<|19.20|>": 51325,
563
+ "<|19.22|>": 51326,
564
+ "<|19.24|>": 51327,
565
+ "<|19.26|>": 51328,
566
+ "<|19.28|>": 51329,
567
+ "<|19.30|>": 51330,
568
+ "<|19.32|>": 51331,
569
+ "<|19.34|>": 51332,
570
+ "<|19.36|>": 51333,
571
+ "<|19.38|>": 51334,
572
+ "<|19.40|>": 51335,
573
+ "<|19.42|>": 51336,
574
+ "<|19.44|>": 51337,
575
+ "<|19.46|>": 51338,
576
+ "<|19.48|>": 51339,
577
+ "<|19.50|>": 51340,
578
+ "<|19.52|>": 51341,
579
+ "<|19.54|>": 51342,
580
+ "<|19.56|>": 51343,
581
+ "<|19.58|>": 51344,
582
+ "<|19.60|>": 51345,
583
+ "<|19.62|>": 51346,
584
+ "<|19.64|>": 51347,
585
+ "<|19.66|>": 51348,
586
+ "<|19.68|>": 51349,
587
+ "<|19.70|>": 51350,
588
+ "<|19.72|>": 51351,
589
+ "<|19.74|>": 51352,
590
+ "<|19.76|>": 51353,
591
+ "<|19.78|>": 51354,
592
+ "<|19.80|>": 51355,
593
+ "<|19.82|>": 51356,
594
+ "<|19.84|>": 51357,
595
+ "<|19.86|>": 51358,
596
+ "<|19.88|>": 51359,
597
+ "<|19.90|>": 51360,
598
+ "<|19.92|>": 51361,
599
+ "<|19.94|>": 51362,
600
+ "<|19.96|>": 51363,
601
+ "<|19.98|>": 51364,
602
+ "<|2.00|>": 50465,
603
+ "<|2.02|>": 50466,
604
+ "<|2.04|>": 50467,
605
+ "<|2.06|>": 50468,
606
+ "<|2.08|>": 50469,
607
+ "<|2.10|>": 50470,
608
+ "<|2.12|>": 50471,
609
+ "<|2.14|>": 50472,
610
+ "<|2.16|>": 50473,
611
+ "<|2.18|>": 50474,
612
+ "<|2.20|>": 50475,
613
+ "<|2.22|>": 50476,
614
+ "<|2.24|>": 50477,
615
+ "<|2.26|>": 50478,
616
+ "<|2.28|>": 50479,
617
+ "<|2.30|>": 50480,
618
+ "<|2.32|>": 50481,
619
+ "<|2.34|>": 50482,
620
+ "<|2.36|>": 50483,
621
+ "<|2.38|>": 50484,
622
+ "<|2.40|>": 50485,
623
+ "<|2.42|>": 50486,
624
+ "<|2.44|>": 50487,
625
+ "<|2.46|>": 50488,
626
+ "<|2.48|>": 50489,
627
+ "<|2.50|>": 50490,
628
+ "<|2.52|>": 50491,
629
+ "<|2.54|>": 50492,
630
+ "<|2.56|>": 50493,
631
+ "<|2.58|>": 50494,
632
+ "<|2.60|>": 50495,
633
+ "<|2.62|>": 50496,
634
+ "<|2.64|>": 50497,
635
+ "<|2.66|>": 50498,
636
+ "<|2.68|>": 50499,
637
+ "<|2.70|>": 50500,
638
+ "<|2.72|>": 50501,
639
+ "<|2.74|>": 50502,
640
+ "<|2.76|>": 50503,
641
+ "<|2.78|>": 50504,
642
+ "<|2.80|>": 50505,
643
+ "<|2.82|>": 50506,
644
+ "<|2.84|>": 50507,
645
+ "<|2.86|>": 50508,
646
+ "<|2.88|>": 50509,
647
+ "<|2.90|>": 50510,
648
+ "<|2.92|>": 50511,
649
+ "<|2.94|>": 50512,
650
+ "<|2.96|>": 50513,
651
+ "<|2.98|>": 50514,
652
+ "<|20.00|>": 51365,
653
+ "<|20.02|>": 51366,
654
+ "<|20.04|>": 51367,
655
+ "<|20.06|>": 51368,
656
+ "<|20.08|>": 51369,
657
+ "<|20.10|>": 51370,
658
+ "<|20.12|>": 51371,
659
+ "<|20.14|>": 51372,
660
+ "<|20.16|>": 51373,
661
+ "<|20.18|>": 51374,
662
+ "<|20.20|>": 51375,
663
+ "<|20.22|>": 51376,
664
+ "<|20.24|>": 51377,
665
+ "<|20.26|>": 51378,
666
+ "<|20.28|>": 51379,
667
+ "<|20.30|>": 51380,
668
+ "<|20.32|>": 51381,
669
+ "<|20.34|>": 51382,
670
+ "<|20.36|>": 51383,
671
+ "<|20.38|>": 51384,
672
+ "<|20.40|>": 51385,
673
+ "<|20.42|>": 51386,
674
+ "<|20.44|>": 51387,
675
+ "<|20.46|>": 51388,
676
+ "<|20.48|>": 51389,
677
+ "<|20.50|>": 51390,
678
+ "<|20.52|>": 51391,
679
+ "<|20.54|>": 51392,
680
+ "<|20.56|>": 51393,
681
+ "<|20.58|>": 51394,
682
+ "<|20.60|>": 51395,
683
+ "<|20.62|>": 51396,
684
+ "<|20.64|>": 51397,
685
+ "<|20.66|>": 51398,
686
+ "<|20.68|>": 51399,
687
+ "<|20.70|>": 51400,
688
+ "<|20.72|>": 51401,
689
+ "<|20.74|>": 51402,
690
+ "<|20.76|>": 51403,
691
+ "<|20.78|>": 51404,
692
+ "<|20.80|>": 51405,
693
+ "<|20.82|>": 51406,
694
+ "<|20.84|>": 51407,
695
+ "<|20.86|>": 51408,
696
+ "<|20.88|>": 51409,
697
+ "<|20.90|>": 51410,
698
+ "<|20.92|>": 51411,
699
+ "<|20.94|>": 51412,
700
+ "<|20.96|>": 51413,
701
+ "<|20.98|>": 51414,
702
+ "<|21.00|>": 51415,
703
+ "<|21.02|>": 51416,
704
+ "<|21.04|>": 51417,
705
+ "<|21.06|>": 51418,
706
+ "<|21.08|>": 51419,
707
+ "<|21.10|>": 51420,
708
+ "<|21.12|>": 51421,
709
+ "<|21.14|>": 51422,
710
+ "<|21.16|>": 51423,
711
+ "<|21.18|>": 51424,
712
+ "<|21.20|>": 51425,
713
+ "<|21.22|>": 51426,
714
+ "<|21.24|>": 51427,
715
+ "<|21.26|>": 51428,
716
+ "<|21.28|>": 51429,
717
+ "<|21.30|>": 51430,
718
+ "<|21.32|>": 51431,
719
+ "<|21.34|>": 51432,
720
+ "<|21.36|>": 51433,
721
+ "<|21.38|>": 51434,
722
+ "<|21.40|>": 51435,
723
+ "<|21.42|>": 51436,
724
+ "<|21.44|>": 51437,
725
+ "<|21.46|>": 51438,
726
+ "<|21.48|>": 51439,
727
+ "<|21.50|>": 51440,
728
+ "<|21.52|>": 51441,
729
+ "<|21.54|>": 51442,
730
+ "<|21.56|>": 51443,
731
+ "<|21.58|>": 51444,
732
+ "<|21.60|>": 51445,
733
+ "<|21.62|>": 51446,
734
+ "<|21.64|>": 51447,
735
+ "<|21.66|>": 51448,
736
+ "<|21.68|>": 51449,
737
+ "<|21.70|>": 51450,
738
+ "<|21.72|>": 51451,
739
+ "<|21.74|>": 51452,
740
+ "<|21.76|>": 51453,
741
+ "<|21.78|>": 51454,
742
+ "<|21.80|>": 51455,
743
+ "<|21.82|>": 51456,
744
+ "<|21.84|>": 51457,
745
+ "<|21.86|>": 51458,
746
+ "<|21.88|>": 51459,
747
+ "<|21.90|>": 51460,
748
+ "<|21.92|>": 51461,
749
+ "<|21.94|>": 51462,
750
+ "<|21.96|>": 51463,
751
+ "<|21.98|>": 51464,
752
+ "<|22.00|>": 51465,
753
+ "<|22.02|>": 51466,
754
+ "<|22.04|>": 51467,
755
+ "<|22.06|>": 51468,
756
+ "<|22.08|>": 51469,
757
+ "<|22.10|>": 51470,
758
+ "<|22.12|>": 51471,
759
+ "<|22.14|>": 51472,
760
+ "<|22.16|>": 51473,
761
+ "<|22.18|>": 51474,
762
+ "<|22.20|>": 51475,
763
+ "<|22.22|>": 51476,
764
+ "<|22.24|>": 51477,
765
+ "<|22.26|>": 51478,
766
+ "<|22.28|>": 51479,
767
+ "<|22.30|>": 51480,
768
+ "<|22.32|>": 51481,
769
+ "<|22.34|>": 51482,
770
+ "<|22.36|>": 51483,
771
+ "<|22.38|>": 51484,
772
+ "<|22.40|>": 51485,
773
+ "<|22.42|>": 51486,
774
+ "<|22.44|>": 51487,
775
+ "<|22.46|>": 51488,
776
+ "<|22.48|>": 51489,
777
+ "<|22.50|>": 51490,
778
+ "<|22.52|>": 51491,
779
+ "<|22.54|>": 51492,
780
+ "<|22.56|>": 51493,
781
+ "<|22.58|>": 51494,
782
+ "<|22.60|>": 51495,
783
+ "<|22.62|>": 51496,
784
+ "<|22.64|>": 51497,
785
+ "<|22.66|>": 51498,
786
+ "<|22.68|>": 51499,
787
+ "<|22.70|>": 51500,
788
+ "<|22.72|>": 51501,
789
+ "<|22.74|>": 51502,
790
+ "<|22.76|>": 51503,
791
+ "<|22.78|>": 51504,
792
+ "<|22.80|>": 51505,
793
+ "<|22.82|>": 51506,
794
+ "<|22.84|>": 51507,
795
+ "<|22.86|>": 51508,
796
+ "<|22.88|>": 51509,
797
+ "<|22.90|>": 51510,
798
+ "<|22.92|>": 51511,
799
+ "<|22.94|>": 51512,
800
+ "<|22.96|>": 51513,
801
+ "<|22.98|>": 51514,
802
+ "<|23.00|>": 51515,
803
+ "<|23.02|>": 51516,
804
+ "<|23.04|>": 51517,
805
+ "<|23.06|>": 51518,
806
+ "<|23.08|>": 51519,
807
+ "<|23.10|>": 51520,
808
+ "<|23.12|>": 51521,
809
+ "<|23.14|>": 51522,
810
+ "<|23.16|>": 51523,
811
+ "<|23.18|>": 51524,
812
+ "<|23.20|>": 51525,
813
+ "<|23.22|>": 51526,
814
+ "<|23.24|>": 51527,
815
+ "<|23.26|>": 51528,
816
+ "<|23.28|>": 51529,
817
+ "<|23.30|>": 51530,
818
+ "<|23.32|>": 51531,
819
+ "<|23.34|>": 51532,
820
+ "<|23.36|>": 51533,
821
+ "<|23.38|>": 51534,
822
+ "<|23.40|>": 51535,
823
+ "<|23.42|>": 51536,
824
+ "<|23.44|>": 51537,
825
+ "<|23.46|>": 51538,
826
+ "<|23.48|>": 51539,
827
+ "<|23.50|>": 51540,
828
+ "<|23.52|>": 51541,
829
+ "<|23.54|>": 51542,
830
+ "<|23.56|>": 51543,
831
+ "<|23.58|>": 51544,
832
+ "<|23.60|>": 51545,
833
+ "<|23.62|>": 51546,
834
+ "<|23.64|>": 51547,
835
+ "<|23.66|>": 51548,
836
+ "<|23.68|>": 51549,
837
+ "<|23.70|>": 51550,
838
+ "<|23.72|>": 51551,
839
+ "<|23.74|>": 51552,
840
+ "<|23.76|>": 51553,
841
+ "<|23.78|>": 51554,
842
+ "<|23.80|>": 51555,
843
+ "<|23.82|>": 51556,
844
+ "<|23.84|>": 51557,
845
+ "<|23.86|>": 51558,
846
+ "<|23.88|>": 51559,
847
+ "<|23.90|>": 51560,
848
+ "<|23.92|>": 51561,
849
+ "<|23.94|>": 51562,
850
+ "<|23.96|>": 51563,
851
+ "<|23.98|>": 51564,
852
+ "<|24.00|>": 51565,
853
+ "<|24.02|>": 51566,
854
+ "<|24.04|>": 51567,
855
+ "<|24.06|>": 51568,
856
+ "<|24.08|>": 51569,
857
+ "<|24.10|>": 51570,
858
+ "<|24.12|>": 51571,
859
+ "<|24.14|>": 51572,
860
+ "<|24.16|>": 51573,
861
+ "<|24.18|>": 51574,
862
+ "<|24.20|>": 51575,
863
+ "<|24.22|>": 51576,
864
+ "<|24.24|>": 51577,
865
+ "<|24.26|>": 51578,
866
+ "<|24.28|>": 51579,
867
+ "<|24.30|>": 51580,
868
+ "<|24.32|>": 51581,
869
+ "<|24.34|>": 51582,
870
+ "<|24.36|>": 51583,
871
+ "<|24.38|>": 51584,
872
+ "<|24.40|>": 51585,
873
+ "<|24.42|>": 51586,
874
+ "<|24.44|>": 51587,
875
+ "<|24.46|>": 51588,
876
+ "<|24.48|>": 51589,
877
+ "<|24.50|>": 51590,
878
+ "<|24.52|>": 51591,
879
+ "<|24.54|>": 51592,
880
+ "<|24.56|>": 51593,
881
+ "<|24.58|>": 51594,
882
+ "<|24.60|>": 51595,
883
+ "<|24.62|>": 51596,
884
+ "<|24.64|>": 51597,
885
+ "<|24.66|>": 51598,
886
+ "<|24.68|>": 51599,
887
+ "<|24.70|>": 51600,
888
+ "<|24.72|>": 51601,
889
+ "<|24.74|>": 51602,
890
+ "<|24.76|>": 51603,
891
+ "<|24.78|>": 51604,
892
+ "<|24.80|>": 51605,
893
+ "<|24.82|>": 51606,
894
+ "<|24.84|>": 51607,
895
+ "<|24.86|>": 51608,
896
+ "<|24.88|>": 51609,
897
+ "<|24.90|>": 51610,
898
+ "<|24.92|>": 51611,
899
+ "<|24.94|>": 51612,
900
+ "<|24.96|>": 51613,
901
+ "<|24.98|>": 51614,
902
+ "<|25.00|>": 51615,
903
+ "<|25.02|>": 51616,
904
+ "<|25.04|>": 51617,
905
+ "<|25.06|>": 51618,
906
+ "<|25.08|>": 51619,
907
+ "<|25.10|>": 51620,
908
+ "<|25.12|>": 51621,
909
+ "<|25.14|>": 51622,
910
+ "<|25.16|>": 51623,
911
+ "<|25.18|>": 51624,
912
+ "<|25.20|>": 51625,
913
+ "<|25.22|>": 51626,
914
+ "<|25.24|>": 51627,
915
+ "<|25.26|>": 51628,
916
+ "<|25.28|>": 51629,
917
+ "<|25.30|>": 51630,
918
+ "<|25.32|>": 51631,
919
+ "<|25.34|>": 51632,
920
+ "<|25.36|>": 51633,
921
+ "<|25.38|>": 51634,
922
+ "<|25.40|>": 51635,
923
+ "<|25.42|>": 51636,
924
+ "<|25.44|>": 51637,
925
+ "<|25.46|>": 51638,
926
+ "<|25.48|>": 51639,
927
+ "<|25.50|>": 51640,
928
+ "<|25.52|>": 51641,
929
+ "<|25.54|>": 51642,
930
+ "<|25.56|>": 51643,
931
+ "<|25.58|>": 51644,
932
+ "<|25.60|>": 51645,
933
+ "<|25.62|>": 51646,
934
+ "<|25.64|>": 51647,
935
+ "<|25.66|>": 51648,
936
+ "<|25.68|>": 51649,
937
+ "<|25.70|>": 51650,
938
+ "<|25.72|>": 51651,
939
+ "<|25.74|>": 51652,
940
+ "<|25.76|>": 51653,
941
+ "<|25.78|>": 51654,
942
+ "<|25.80|>": 51655,
943
+ "<|25.82|>": 51656,
944
+ "<|25.84|>": 51657,
945
+ "<|25.86|>": 51658,
946
+ "<|25.88|>": 51659,
947
+ "<|25.90|>": 51660,
948
+ "<|25.92|>": 51661,
949
+ "<|25.94|>": 51662,
950
+ "<|25.96|>": 51663,
951
+ "<|25.98|>": 51664,
952
+ "<|26.00|>": 51665,
953
+ "<|26.02|>": 51666,
954
+ "<|26.04|>": 51667,
955
+ "<|26.06|>": 51668,
956
+ "<|26.08|>": 51669,
957
+ "<|26.10|>": 51670,
958
+ "<|26.12|>": 51671,
959
+ "<|26.14|>": 51672,
960
+ "<|26.16|>": 51673,
961
+ "<|26.18|>": 51674,
962
+ "<|26.20|>": 51675,
963
+ "<|26.22|>": 51676,
964
+ "<|26.24|>": 51677,
965
+ "<|26.26|>": 51678,
966
+ "<|26.28|>": 51679,
967
+ "<|26.30|>": 51680,
968
+ "<|26.32|>": 51681,
969
+ "<|26.34|>": 51682,
970
+ "<|26.36|>": 51683,
971
+ "<|26.38|>": 51684,
972
+ "<|26.40|>": 51685,
973
+ "<|26.42|>": 51686,
974
+ "<|26.44|>": 51687,
975
+ "<|26.46|>": 51688,
976
+ "<|26.48|>": 51689,
977
+ "<|26.50|>": 51690,
978
+ "<|26.52|>": 51691,
979
+ "<|26.54|>": 51692,
980
+ "<|26.56|>": 51693,
981
+ "<|26.58|>": 51694,
982
+ "<|26.60|>": 51695,
983
+ "<|26.62|>": 51696,
984
+ "<|26.64|>": 51697,
985
+ "<|26.66|>": 51698,
986
+ "<|26.68|>": 51699,
987
+ "<|26.70|>": 51700,
988
+ "<|26.72|>": 51701,
989
+ "<|26.74|>": 51702,
990
+ "<|26.76|>": 51703,
991
+ "<|26.78|>": 51704,
992
+ "<|26.80|>": 51705,
993
+ "<|26.82|>": 51706,
994
+ "<|26.84|>": 51707,
995
+ "<|26.86|>": 51708,
996
+ "<|26.88|>": 51709,
997
+ "<|26.90|>": 51710,
998
+ "<|26.92|>": 51711,
999
+ "<|26.94|>": 51712,
1000
+ "<|26.96|>": 51713,
1001
+ "<|26.98|>": 51714,
1002
+ "<|27.00|>": 51715,
1003
+ "<|27.02|>": 51716,
1004
+ "<|27.04|>": 51717,
1005
+ "<|27.06|>": 51718,
1006
+ "<|27.08|>": 51719,
1007
+ "<|27.10|>": 51720,
1008
+ "<|27.12|>": 51721,
1009
+ "<|27.14|>": 51722,
1010
+ "<|27.16|>": 51723,
1011
+ "<|27.18|>": 51724,
1012
+ "<|27.20|>": 51725,
1013
+ "<|27.22|>": 51726,
1014
+ "<|27.24|>": 51727,
1015
+ "<|27.26|>": 51728,
1016
+ "<|27.28|>": 51729,
1017
+ "<|27.30|>": 51730,
1018
+ "<|27.32|>": 51731,
1019
+ "<|27.34|>": 51732,
1020
+ "<|27.36|>": 51733,
1021
+ "<|27.38|>": 51734,
1022
+ "<|27.40|>": 51735,
1023
+ "<|27.42|>": 51736,
1024
+ "<|27.44|>": 51737,
1025
+ "<|27.46|>": 51738,
1026
+ "<|27.48|>": 51739,
1027
+ "<|27.50|>": 51740,
1028
+ "<|27.52|>": 51741,
1029
+ "<|27.54|>": 51742,
1030
+ "<|27.56|>": 51743,
1031
+ "<|27.58|>": 51744,
1032
+ "<|27.60|>": 51745,
1033
+ "<|27.62|>": 51746,
1034
+ "<|27.64|>": 51747,
1035
+ "<|27.66|>": 51748,
1036
+ "<|27.68|>": 51749,
1037
+ "<|27.70|>": 51750,
1038
+ "<|27.72|>": 51751,
1039
+ "<|27.74|>": 51752,
1040
+ "<|27.76|>": 51753,
1041
+ "<|27.78|>": 51754,
1042
+ "<|27.80|>": 51755,
1043
+ "<|27.82|>": 51756,
1044
+ "<|27.84|>": 51757,
1045
+ "<|27.86|>": 51758,
1046
+ "<|27.88|>": 51759,
1047
+ "<|27.90|>": 51760,
1048
+ "<|27.92|>": 51761,
1049
+ "<|27.94|>": 51762,
1050
+ "<|27.96|>": 51763,
1051
+ "<|27.98|>": 51764,
1052
+ "<|28.00|>": 51765,
1053
+ "<|28.02|>": 51766,
1054
+ "<|28.04|>": 51767,
1055
+ "<|28.06|>": 51768,
1056
+ "<|28.08|>": 51769,
1057
+ "<|28.10|>": 51770,
1058
+ "<|28.12|>": 51771,
1059
+ "<|28.14|>": 51772,
1060
+ "<|28.16|>": 51773,
1061
+ "<|28.18|>": 51774,
1062
+ "<|28.20|>": 51775,
1063
+ "<|28.22|>": 51776,
1064
+ "<|28.24|>": 51777,
1065
+ "<|28.26|>": 51778,
1066
+ "<|28.28|>": 51779,
1067
+ "<|28.30|>": 51780,
1068
+ "<|28.32|>": 51781,
1069
+ "<|28.34|>": 51782,
1070
+ "<|28.36|>": 51783,
1071
+ "<|28.38|>": 51784,
1072
+ "<|28.40|>": 51785,
1073
+ "<|28.42|>": 51786,
1074
+ "<|28.44|>": 51787,
1075
+ "<|28.46|>": 51788,
1076
+ "<|28.48|>": 51789,
1077
+ "<|28.50|>": 51790,
1078
+ "<|28.52|>": 51791,
1079
+ "<|28.54|>": 51792,
1080
+ "<|28.56|>": 51793,
1081
+ "<|28.58|>": 51794,
1082
+ "<|28.60|>": 51795,
1083
+ "<|28.62|>": 51796,
1084
+ "<|28.64|>": 51797,
1085
+ "<|28.66|>": 51798,
1086
+ "<|28.68|>": 51799,
1087
+ "<|28.70|>": 51800,
1088
+ "<|28.72|>": 51801,
1089
+ "<|28.74|>": 51802,
1090
+ "<|28.76|>": 51803,
1091
+ "<|28.78|>": 51804,
1092
+ "<|28.80|>": 51805,
1093
+ "<|28.82|>": 51806,
1094
+ "<|28.84|>": 51807,
1095
+ "<|28.86|>": 51808,
1096
+ "<|28.88|>": 51809,
1097
+ "<|28.90|>": 51810,
1098
+ "<|28.92|>": 51811,
1099
+ "<|28.94|>": 51812,
1100
+ "<|28.96|>": 51813,
1101
+ "<|28.98|>": 51814,
1102
+ "<|29.00|>": 51815,
1103
+ "<|29.02|>": 51816,
1104
+ "<|29.04|>": 51817,
1105
+ "<|29.06|>": 51818,
1106
+ "<|29.08|>": 51819,
1107
+ "<|29.10|>": 51820,
1108
+ "<|29.12|>": 51821,
1109
+ "<|29.14|>": 51822,
1110
+ "<|29.16|>": 51823,
1111
+ "<|29.18|>": 51824,
1112
+ "<|29.20|>": 51825,
1113
+ "<|29.22|>": 51826,
1114
+ "<|29.24|>": 51827,
1115
+ "<|29.26|>": 51828,
1116
+ "<|29.28|>": 51829,
1117
+ "<|29.30|>": 51830,
1118
+ "<|29.32|>": 51831,
1119
+ "<|29.34|>": 51832,
1120
+ "<|29.36|>": 51833,
1121
+ "<|29.38|>": 51834,
1122
+ "<|29.40|>": 51835,
1123
+ "<|29.42|>": 51836,
1124
+ "<|29.44|>": 51837,
1125
+ "<|29.46|>": 51838,
1126
+ "<|29.48|>": 51839,
1127
+ "<|29.50|>": 51840,
1128
+ "<|29.52|>": 51841,
1129
+ "<|29.54|>": 51842,
1130
+ "<|29.56|>": 51843,
1131
+ "<|29.58|>": 51844,
1132
+ "<|29.60|>": 51845,
1133
+ "<|29.62|>": 51846,
1134
+ "<|29.64|>": 51847,
1135
+ "<|29.66|>": 51848,
1136
+ "<|29.68|>": 51849,
1137
+ "<|29.70|>": 51850,
1138
+ "<|29.72|>": 51851,
1139
+ "<|29.74|>": 51852,
1140
+ "<|29.76|>": 51853,
1141
+ "<|29.78|>": 51854,
1142
+ "<|29.80|>": 51855,
1143
+ "<|29.82|>": 51856,
1144
+ "<|29.84|>": 51857,
1145
+ "<|29.86|>": 51858,
1146
+ "<|29.88|>": 51859,
1147
+ "<|29.90|>": 51860,
1148
+ "<|29.92|>": 51861,
1149
+ "<|29.94|>": 51862,
1150
+ "<|29.96|>": 51863,
1151
+ "<|29.98|>": 51864,
1152
+ "<|3.00|>": 50515,
1153
+ "<|3.02|>": 50516,
1154
+ "<|3.04|>": 50517,
1155
+ "<|3.06|>": 50518,
1156
+ "<|3.08|>": 50519,
1157
+ "<|3.10|>": 50520,
1158
+ "<|3.12|>": 50521,
1159
+ "<|3.14|>": 50522,
1160
+ "<|3.16|>": 50523,
1161
+ "<|3.18|>": 50524,
1162
+ "<|3.20|>": 50525,
1163
+ "<|3.22|>": 50526,
1164
+ "<|3.24|>": 50527,
1165
+ "<|3.26|>": 50528,
1166
+ "<|3.28|>": 50529,
1167
+ "<|3.30|>": 50530,
1168
+ "<|3.32|>": 50531,
1169
+ "<|3.34|>": 50532,
1170
+ "<|3.36|>": 50533,
1171
+ "<|3.38|>": 50534,
1172
+ "<|3.40|>": 50535,
1173
+ "<|3.42|>": 50536,
1174
+ "<|3.44|>": 50537,
1175
+ "<|3.46|>": 50538,
1176
+ "<|3.48|>": 50539,
1177
+ "<|3.50|>": 50540,
1178
+ "<|3.52|>": 50541,
1179
+ "<|3.54|>": 50542,
1180
+ "<|3.56|>": 50543,
1181
+ "<|3.58|>": 50544,
1182
+ "<|3.60|>": 50545,
1183
+ "<|3.62|>": 50546,
1184
+ "<|3.64|>": 50547,
1185
+ "<|3.66|>": 50548,
1186
+ "<|3.68|>": 50549,
1187
+ "<|3.70|>": 50550,
1188
+ "<|3.72|>": 50551,
1189
+ "<|3.74|>": 50552,
1190
+ "<|3.76|>": 50553,
1191
+ "<|3.78|>": 50554,
1192
+ "<|3.80|>": 50555,
1193
+ "<|3.82|>": 50556,
1194
+ "<|3.84|>": 50557,
1195
+ "<|3.86|>": 50558,
1196
+ "<|3.88|>": 50559,
1197
+ "<|3.90|>": 50560,
1198
+ "<|3.92|>": 50561,
1199
+ "<|3.94|>": 50562,
1200
+ "<|3.96|>": 50563,
1201
+ "<|3.98|>": 50564,
1202
+ "<|30.00|>": 51865,
1203
+ "<|4.00|>": 50565,
1204
+ "<|4.02|>": 50566,
1205
+ "<|4.04|>": 50567,
1206
+ "<|4.06|>": 50568,
1207
+ "<|4.08|>": 50569,
1208
+ "<|4.10|>": 50570,
1209
+ "<|4.12|>": 50571,
1210
+ "<|4.14|>": 50572,
1211
+ "<|4.16|>": 50573,
1212
+ "<|4.18|>": 50574,
1213
+ "<|4.20|>": 50575,
1214
+ "<|4.22|>": 50576,
1215
+ "<|4.24|>": 50577,
1216
+ "<|4.26|>": 50578,
1217
+ "<|4.28|>": 50579,
1218
+ "<|4.30|>": 50580,
1219
+ "<|4.32|>": 50581,
1220
+ "<|4.34|>": 50582,
1221
+ "<|4.36|>": 50583,
1222
+ "<|4.38|>": 50584,
1223
+ "<|4.40|>": 50585,
1224
+ "<|4.42|>": 50586,
1225
+ "<|4.44|>": 50587,
1226
+ "<|4.46|>": 50588,
1227
+ "<|4.48|>": 50589,
1228
+ "<|4.50|>": 50590,
1229
+ "<|4.52|>": 50591,
1230
+ "<|4.54|>": 50592,
1231
+ "<|4.56|>": 50593,
1232
+ "<|4.58|>": 50594,
1233
+ "<|4.60|>": 50595,
1234
+ "<|4.62|>": 50596,
1235
+ "<|4.64|>": 50597,
1236
+ "<|4.66|>": 50598,
1237
+ "<|4.68|>": 50599,
1238
+ "<|4.70|>": 50600,
1239
+ "<|4.72|>": 50601,
1240
+ "<|4.74|>": 50602,
1241
+ "<|4.76|>": 50603,
1242
+ "<|4.78|>": 50604,
1243
+ "<|4.80|>": 50605,
1244
+ "<|4.82|>": 50606,
1245
+ "<|4.84|>": 50607,
1246
+ "<|4.86|>": 50608,
1247
+ "<|4.88|>": 50609,
1248
+ "<|4.90|>": 50610,
1249
+ "<|4.92|>": 50611,
1250
+ "<|4.94|>": 50612,
1251
+ "<|4.96|>": 50613,
1252
+ "<|4.98|>": 50614,
1253
+ "<|5.00|>": 50615,
1254
+ "<|5.02|>": 50616,
1255
+ "<|5.04|>": 50617,
1256
+ "<|5.06|>": 50618,
1257
+ "<|5.08|>": 50619,
1258
+ "<|5.10|>": 50620,
1259
+ "<|5.12|>": 50621,
1260
+ "<|5.14|>": 50622,
1261
+ "<|5.16|>": 50623,
1262
+ "<|5.18|>": 50624,
1263
+ "<|5.20|>": 50625,
1264
+ "<|5.22|>": 50626,
1265
+ "<|5.24|>": 50627,
1266
+ "<|5.26|>": 50628,
1267
+ "<|5.28|>": 50629,
1268
+ "<|5.30|>": 50630,
1269
+ "<|5.32|>": 50631,
1270
+ "<|5.34|>": 50632,
1271
+ "<|5.36|>": 50633,
1272
+ "<|5.38|>": 50634,
1273
+ "<|5.40|>": 50635,
1274
+ "<|5.42|>": 50636,
1275
+ "<|5.44|>": 50637,
1276
+ "<|5.46|>": 50638,
1277
+ "<|5.48|>": 50639,
1278
+ "<|5.50|>": 50640,
1279
+ "<|5.52|>": 50641,
1280
+ "<|5.54|>": 50642,
1281
+ "<|5.56|>": 50643,
1282
+ "<|5.58|>": 50644,
1283
+ "<|5.60|>": 50645,
1284
+ "<|5.62|>": 50646,
1285
+ "<|5.64|>": 50647,
1286
+ "<|5.66|>": 50648,
1287
+ "<|5.68|>": 50649,
1288
+ "<|5.70|>": 50650,
1289
+ "<|5.72|>": 50651,
1290
+ "<|5.74|>": 50652,
1291
+ "<|5.76|>": 50653,
1292
+ "<|5.78|>": 50654,
1293
+ "<|5.80|>": 50655,
1294
+ "<|5.82|>": 50656,
1295
+ "<|5.84|>": 50657,
1296
+ "<|5.86|>": 50658,
1297
+ "<|5.88|>": 50659,
1298
+ "<|5.90|>": 50660,
1299
+ "<|5.92|>": 50661,
1300
+ "<|5.94|>": 50662,
1301
+ "<|5.96|>": 50663,
1302
+ "<|5.98|>": 50664,
1303
+ "<|6.00|>": 50665,
1304
+ "<|6.02|>": 50666,
1305
+ "<|6.04|>": 50667,
1306
+ "<|6.06|>": 50668,
1307
+ "<|6.08|>": 50669,
1308
+ "<|6.10|>": 50670,
1309
+ "<|6.12|>": 50671,
1310
+ "<|6.14|>": 50672,
1311
+ "<|6.16|>": 50673,
1312
+ "<|6.18|>": 50674,
1313
+ "<|6.20|>": 50675,
1314
+ "<|6.22|>": 50676,
1315
+ "<|6.24|>": 50677,
1316
+ "<|6.26|>": 50678,
1317
+ "<|6.28|>": 50679,
1318
+ "<|6.30|>": 50680,
1319
+ "<|6.32|>": 50681,
1320
+ "<|6.34|>": 50682,
1321
+ "<|6.36|>": 50683,
1322
+ "<|6.38|>": 50684,
1323
+ "<|6.40|>": 50685,
1324
+ "<|6.42|>": 50686,
1325
+ "<|6.44|>": 50687,
1326
+ "<|6.46|>": 50688,
1327
+ "<|6.48|>": 50689,
1328
+ "<|6.50|>": 50690,
1329
+ "<|6.52|>": 50691,
1330
+ "<|6.54|>": 50692,
1331
+ "<|6.56|>": 50693,
1332
+ "<|6.58|>": 50694,
1333
+ "<|6.60|>": 50695,
1334
+ "<|6.62|>": 50696,
1335
+ "<|6.64|>": 50697,
1336
+ "<|6.66|>": 50698,
1337
+ "<|6.68|>": 50699,
1338
+ "<|6.70|>": 50700,
1339
+ "<|6.72|>": 50701,
1340
+ "<|6.74|>": 50702,
1341
+ "<|6.76|>": 50703,
1342
+ "<|6.78|>": 50704,
1343
+ "<|6.80|>": 50705,
1344
+ "<|6.82|>": 50706,
1345
+ "<|6.84|>": 50707,
1346
+ "<|6.86|>": 50708,
1347
+ "<|6.88|>": 50709,
1348
+ "<|6.90|>": 50710,
1349
+ "<|6.92|>": 50711,
1350
+ "<|6.94|>": 50712,
1351
+ "<|6.96|>": 50713,
1352
+ "<|6.98|>": 50714,
1353
+ "<|7.00|>": 50715,
1354
+ "<|7.02|>": 50716,
1355
+ "<|7.04|>": 50717,
1356
+ "<|7.06|>": 50718,
1357
+ "<|7.08|>": 50719,
1358
+ "<|7.10|>": 50720,
1359
+ "<|7.12|>": 50721,
1360
+ "<|7.14|>": 50722,
1361
+ "<|7.16|>": 50723,
1362
+ "<|7.18|>": 50724,
1363
+ "<|7.20|>": 50725,
1364
+ "<|7.22|>": 50726,
1365
+ "<|7.24|>": 50727,
1366
+ "<|7.26|>": 50728,
1367
+ "<|7.28|>": 50729,
1368
+ "<|7.30|>": 50730,
1369
+ "<|7.32|>": 50731,
1370
+ "<|7.34|>": 50732,
1371
+ "<|7.36|>": 50733,
1372
+ "<|7.38|>": 50734,
1373
+ "<|7.40|>": 50735,
1374
+ "<|7.42|>": 50736,
1375
+ "<|7.44|>": 50737,
1376
+ "<|7.46|>": 50738,
1377
+ "<|7.48|>": 50739,
1378
+ "<|7.50|>": 50740,
1379
+ "<|7.52|>": 50741,
1380
+ "<|7.54|>": 50742,
1381
+ "<|7.56|>": 50743,
1382
+ "<|7.58|>": 50744,
1383
+ "<|7.60|>": 50745,
1384
+ "<|7.62|>": 50746,
1385
+ "<|7.64|>": 50747,
1386
+ "<|7.66|>": 50748,
1387
+ "<|7.68|>": 50749,
1388
+ "<|7.70|>": 50750,
1389
+ "<|7.72|>": 50751,
1390
+ "<|7.74|>": 50752,
1391
+ "<|7.76|>": 50753,
1392
+ "<|7.78|>": 50754,
1393
+ "<|7.80|>": 50755,
1394
+ "<|7.82|>": 50756,
1395
+ "<|7.84|>": 50757,
1396
+ "<|7.86|>": 50758,
1397
+ "<|7.88|>": 50759,
1398
+ "<|7.90|>": 50760,
1399
+ "<|7.92|>": 50761,
1400
+ "<|7.94|>": 50762,
1401
+ "<|7.96|>": 50763,
1402
+ "<|7.98|>": 50764,
1403
+ "<|8.00|>": 50765,
1404
+ "<|8.02|>": 50766,
1405
+ "<|8.04|>": 50767,
1406
+ "<|8.06|>": 50768,
1407
+ "<|8.08|>": 50769,
1408
+ "<|8.10|>": 50770,
1409
+ "<|8.12|>": 50771,
1410
+ "<|8.14|>": 50772,
1411
+ "<|8.16|>": 50773,
1412
+ "<|8.18|>": 50774,
1413
+ "<|8.20|>": 50775,
1414
+ "<|8.22|>": 50776,
1415
+ "<|8.24|>": 50777,
1416
+ "<|8.26|>": 50778,
1417
+ "<|8.28|>": 50779,
1418
+ "<|8.30|>": 50780,
1419
+ "<|8.32|>": 50781,
1420
+ "<|8.34|>": 50782,
1421
+ "<|8.36|>": 50783,
1422
+ "<|8.38|>": 50784,
1423
+ "<|8.40|>": 50785,
1424
+ "<|8.42|>": 50786,
1425
+ "<|8.44|>": 50787,
1426
+ "<|8.46|>": 50788,
1427
+ "<|8.48|>": 50789,
1428
+ "<|8.50|>": 50790,
1429
+ "<|8.52|>": 50791,
1430
+ "<|8.54|>": 50792,
1431
+ "<|8.56|>": 50793,
1432
+ "<|8.58|>": 50794,
1433
+ "<|8.60|>": 50795,
1434
+ "<|8.62|>": 50796,
1435
+ "<|8.64|>": 50797,
1436
+ "<|8.66|>": 50798,
1437
+ "<|8.68|>": 50799,
1438
+ "<|8.70|>": 50800,
1439
+ "<|8.72|>": 50801,
1440
+ "<|8.74|>": 50802,
1441
+ "<|8.76|>": 50803,
1442
+ "<|8.78|>": 50804,
1443
+ "<|8.80|>": 50805,
1444
+ "<|8.82|>": 50806,
1445
+ "<|8.84|>": 50807,
1446
+ "<|8.86|>": 50808,
1447
+ "<|8.88|>": 50809,
1448
+ "<|8.90|>": 50810,
1449
+ "<|8.92|>": 50811,
1450
+ "<|8.94|>": 50812,
1451
+ "<|8.96|>": 50813,
1452
+ "<|8.98|>": 50814,
1453
+ "<|9.00|>": 50815,
1454
+ "<|9.02|>": 50816,
1455
+ "<|9.04|>": 50817,
1456
+ "<|9.06|>": 50818,
1457
+ "<|9.08|>": 50819,
1458
+ "<|9.10|>": 50820,
1459
+ "<|9.12|>": 50821,
1460
+ "<|9.14|>": 50822,
1461
+ "<|9.16|>": 50823,
1462
+ "<|9.18|>": 50824,
1463
+ "<|9.20|>": 50825,
1464
+ "<|9.22|>": 50826,
1465
+ "<|9.24|>": 50827,
1466
+ "<|9.26|>": 50828,
1467
+ "<|9.28|>": 50829,
1468
+ "<|9.30|>": 50830,
1469
+ "<|9.32|>": 50831,
1470
+ "<|9.34|>": 50832,
1471
+ "<|9.36|>": 50833,
1472
+ "<|9.38|>": 50834,
1473
+ "<|9.40|>": 50835,
1474
+ "<|9.42|>": 50836,
1475
+ "<|9.44|>": 50837,
1476
+ "<|9.46|>": 50838,
1477
+ "<|9.48|>": 50839,
1478
+ "<|9.50|>": 50840,
1479
+ "<|9.52|>": 50841,
1480
+ "<|9.54|>": 50842,
1481
+ "<|9.56|>": 50843,
1482
+ "<|9.58|>": 50844,
1483
+ "<|9.60|>": 50845,
1484
+ "<|9.62|>": 50846,
1485
+ "<|9.64|>": 50847,
1486
+ "<|9.66|>": 50848,
1487
+ "<|9.68|>": 50849,
1488
+ "<|9.70|>": 50850,
1489
+ "<|9.72|>": 50851,
1490
+ "<|9.74|>": 50852,
1491
+ "<|9.76|>": 50853,
1492
+ "<|9.78|>": 50854,
1493
+ "<|9.80|>": 50855,
1494
+ "<|9.82|>": 50856,
1495
+ "<|9.84|>": 50857,
1496
+ "<|9.86|>": 50858,
1497
+ "<|9.88|>": 50859,
1498
+ "<|9.90|>": 50860,
1499
+ "<|9.92|>": 50861,
1500
+ "<|9.94|>": 50862,
1501
+ "<|9.96|>": 50863,
1502
+ "<|9.98|>": 50864,
1503
+ "<|af|>": 50327,
1504
+ "<|am|>": 50334,
1505
+ "<|ar|>": 50272,
1506
+ "<|as|>": 50350,
1507
+ "<|az|>": 50304,
1508
+ "<|ba|>": 50355,
1509
+ "<|be|>": 50330,
1510
+ "<|bg|>": 50292,
1511
+ "<|bn|>": 50302,
1512
+ "<|bo|>": 50347,
1513
+ "<|br|>": 50309,
1514
+ "<|bs|>": 50315,
1515
+ "<|ca|>": 50270,
1516
+ "<|cs|>": 50283,
1517
+ "<|cy|>": 50297,
1518
+ "<|da|>": 50285,
1519
+ "<|de|>": 50261,
1520
+ "<|el|>": 50281,
1521
+ "<|endoftext|>": 50257,
1522
+ "<|en|>": 50259,
1523
+ "<|es|>": 50262,
1524
+ "<|et|>": 50307,
1525
+ "<|eu|>": 50310,
1526
+ "<|fa|>": 50300,
1527
+ "<|fi|>": 50277,
1528
+ "<|fo|>": 50338,
1529
+ "<|fr|>": 50265,
1530
+ "<|gl|>": 50319,
1531
+ "<|gu|>": 50333,
1532
+ "<|haw|>": 50352,
1533
+ "<|ha|>": 50354,
1534
+ "<|he|>": 50279,
1535
+ "<|hi|>": 50276,
1536
+ "<|hr|>": 50291,
1537
+ "<|ht|>": 50339,
1538
+ "<|hu|>": 50286,
1539
+ "<|hy|>": 50312,
1540
+ "<|id|>": 50275,
1541
+ "<|is|>": 50311,
1542
+ "<|it|>": 50274,
1543
+ "<|ja|>": 50266,
1544
+ "<|jw|>": 50356,
1545
+ "<|ka|>": 50329,
1546
+ "<|kk|>": 50316,
1547
+ "<|km|>": 50323,
1548
+ "<|kn|>": 50306,
1549
+ "<|ko|>": 50264,
1550
+ "<|la|>": 50294,
1551
+ "<|lb|>": 50345,
1552
+ "<|ln|>": 50353,
1553
+ "<|lo|>": 50336,
1554
+ "<|lt|>": 50293,
1555
+ "<|lv|>": 50301,
1556
+ "<|mg|>": 50349,
1557
+ "<|mi|>": 50295,
1558
+ "<|mk|>": 50308,
1559
+ "<|ml|>": 50296,
1560
+ "<|mn|>": 50314,
1561
+ "<|mr|>": 50320,
1562
+ "<|ms|>": 50282,
1563
+ "<|mt|>": 50343,
1564
+ "<|my|>": 50346,
1565
+ "<|ne|>": 50313,
1566
+ "<|nl|>": 50271,
1567
+ "<|nn|>": 50342,
1568
+ "<|nospeech|>": 50363,
1569
+ "<|notimestamps|>": 50364,
1570
+ "<|no|>": 50288,
1571
+ "<|oc|>": 50328,
1572
+ "<|pa|>": 50321,
1573
+ "<|pl|>": 50269,
1574
+ "<|ps|>": 50340,
1575
+ "<|pt|>": 50267,
1576
+ "<|ro|>": 50284,
1577
+ "<|ru|>": 50263,
1578
+ "<|sa|>": 50344,
1579
+ "<|sd|>": 50332,
1580
+ "<|si|>": 50322,
1581
+ "<|sk|>": 50298,
1582
+ "<|sl|>": 50305,
1583
+ "<|sn|>": 50324,
1584
+ "<|so|>": 50326,
1585
+ "<|sq|>": 50317,
1586
+ "<|sr|>": 50303,
1587
+ "<|startoflm|>": 50361,
1588
+ "<|startofprev|>": 50362,
1589
+ "<|startoftranscript|>": 50258,
1590
+ "<|su|>": 50357,
1591
+ "<|sv|>": 50273,
1592
+ "<|sw|>": 50318,
1593
+ "<|ta|>": 50287,
1594
+ "<|te|>": 50299,
1595
+ "<|tg|>": 50331,
1596
+ "<|th|>": 50289,
1597
+ "<|tk|>": 50341,
1598
+ "<|tl|>": 50348,
1599
+ "<|transcribe|>": 50360,
1600
+ "<|translate|>": 50359,
1601
+ "<|tr|>": 50268,
1602
+ "<|tt|>": 50351,
1603
+ "<|uk|>": 50280,
1604
+ "<|ur|>": 50290,
1605
+ "<|uz|>": 50337,
1606
+ "<|vi|>": 50278,
1607
+ "<|yi|>": 50335,
1608
+ "<|yo|>": 50325,
1609
+ "<|yue|>": 50358,
1610
+ "<|zh|>": 50260
1611
+ }
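Note on the timestamp entries above: they are spaced at 0.02 s steps with consecutive ids, so (assuming `<|0.00|>` sits at 50365, one past `<|notimestamps|>` at 50364, which is consistent with the spacing shown, e.g. `<|2.00|>` = 50465 and `<|30.00|>` = 51865) the id of any timestamp token can be recomputed directly. A minimal sketch; the helper name is ours, not part of this repo:

# Hypothetical helper: reconstruct a timestamp-token id from the mapping in added_tokens.json.
def timestamp_token_id(seconds: float, base_id: int = 50365, step: float = 0.02) -> int:
    # Whisper timestamp tokens cover 0.00-30.00 s within a segment.
    if not 0.0 <= seconds <= 30.0:
        raise ValueError("timestamp outside the 0.00-30.00 s range")
    return base_id + round(seconds / step)

assert timestamp_token_id(2.00) == 50465   # matches "<|2.00|>": 50465 above
assert timestamp_token_id(10.46) == 50888  # matches "<|10.46|>": 50888 above
assert timestamp_token_id(30.00) == 51865  # matches "<|30.00|>": 51865 above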
checkpoint-1000/config.json ADDED
@@ -0,0 +1,288 @@
1
+ {
2
+ "_name_or_path": "./",
3
+ "activation_dropout": 0.0,
4
+ "activation_function": "gelu",
5
+ "alignment_heads": [
6
+ [
7
+ 7,
8
+ 0
9
+ ],
10
+ [
11
+ 10,
12
+ 17
13
+ ],
14
+ [
15
+ 12,
16
+ 18
17
+ ],
18
+ [
19
+ 13,
20
+ 12
21
+ ],
22
+ [
23
+ 16,
24
+ 1
25
+ ],
26
+ [
27
+ 17,
28
+ 14
29
+ ],
30
+ [
31
+ 19,
32
+ 11
33
+ ],
34
+ [
35
+ 21,
36
+ 4
37
+ ],
38
+ [
39
+ 24,
40
+ 1
41
+ ],
42
+ [
43
+ 25,
44
+ 6
45
+ ]
46
+ ],
47
+ "apply_spec_augment": false,
48
+ "architectures": [
49
+ "WhisperForConditionalGeneration"
50
+ ],
51
+ "attention_dropout": 0.0,
52
+ "begin_suppress_tokens": [
53
+ 220,
54
+ 50257
55
+ ],
56
+ "bos_token_id": 50257,
57
+ "classifier_proj_size": 256,
58
+ "d_model": 1280,
59
+ "decoder_attention_heads": 20,
60
+ "decoder_ffn_dim": 5120,
61
+ "decoder_layerdrop": 0,
62
+ "decoder_layers": 2,
63
+ "decoder_start_token_id": 50258,
64
+ "dropout": 0.0,
65
+ "encoder_attention_heads": 20,
66
+ "encoder_ffn_dim": 5120,
67
+ "encoder_layerdrop": 0,
68
+ "encoder_layers": 32,
69
+ "eos_token_id": 50257,
70
+ "init_std": 0.02,
71
+ "is_encoder_decoder": true,
72
+ "lang_ids": [
73
+ 50259,
74
+ 50260,
75
+ 50261,
76
+ 50262,
77
+ 50263,
78
+ 50264,
79
+ 50265,
80
+ 50266,
81
+ 50267,
82
+ 50268,
83
+ 50269,
84
+ 50270,
85
+ 50271,
86
+ 50272,
87
+ 50273,
88
+ 50274,
89
+ 50275,
90
+ 50276,
91
+ 50277,
92
+ 50278,
93
+ 50279,
94
+ 50280,
95
+ 50281,
96
+ 50282,
97
+ 50283,
98
+ 50284,
99
+ 50285,
100
+ 50286,
101
+ 50287,
102
+ 50288,
103
+ 50289,
104
+ 50290,
105
+ 50291,
106
+ 50292,
107
+ 50293,
108
+ 50294,
109
+ 50295,
110
+ 50296,
111
+ 50297,
112
+ 50298,
113
+ 50299,
114
+ 50300,
115
+ 50301,
116
+ 50302,
117
+ 50303,
118
+ 50304,
119
+ 50305,
120
+ 50306,
121
+ 50307,
122
+ 50308,
123
+ 50309,
124
+ 50310,
125
+ 50311,
126
+ 50312,
127
+ 50313,
128
+ 50314,
129
+ 50315,
130
+ 50316,
131
+ 50317,
132
+ 50318,
133
+ 50319,
134
+ 50320,
135
+ 50321,
136
+ 50322,
137
+ 50323,
138
+ 50324,
139
+ 50325,
140
+ 50326,
141
+ 50327,
142
+ 50328,
143
+ 50329,
144
+ 50330,
145
+ 50331,
146
+ 50332,
147
+ 50333,
148
+ 50334,
149
+ 50335,
150
+ 50336,
151
+ 50337,
152
+ 50338,
153
+ 50339,
154
+ 50340,
155
+ 50341,
156
+ 50342,
157
+ 50343,
158
+ 50344,
159
+ 50345,
160
+ 50346,
161
+ 50347,
162
+ 50348,
163
+ 50349,
164
+ 50350,
165
+ 50351,
166
+ 50352,
167
+ 50353,
168
+ 50354,
169
+ 50355,
170
+ 50356,
171
+ 50357,
172
+ 50358
173
+ ],
174
+ "mask_feature_length": 10,
175
+ "mask_feature_min_masks": 0,
176
+ "mask_feature_prob": 0,
177
+ "mask_time_length": 10,
178
+ "mask_time_min_masks": 2,
179
+ "mask_time_prob": 0.05,
180
+ "max_length": 448,
181
+ "max_source_positions": 1500,
182
+ "max_target_positions": 448,
183
+ "median_filter_width": 7,
184
+ "model_type": "whisper",
185
+ "num_hidden_layers": 32,
186
+ "num_mel_bins": 128,
187
+ "pad_token_id": 50256,
188
+ "scale_embedding": false,
189
+ "suppress_ids": [
190
+ 1,
191
+ 2,
192
+ 7,
193
+ 8,
194
+ 9,
195
+ 10,
196
+ 14,
197
+ 25,
198
+ 26,
199
+ 27,
200
+ 28,
201
+ 29,
202
+ 31,
203
+ 58,
204
+ 59,
205
+ 60,
206
+ 61,
207
+ 62,
208
+ 63,
209
+ 90,
210
+ 91,
211
+ 92,
212
+ 93,
213
+ 359,
214
+ 503,
215
+ 522,
216
+ 542,
217
+ 873,
218
+ 893,
219
+ 902,
220
+ 918,
221
+ 922,
222
+ 931,
223
+ 1350,
224
+ 1853,
225
+ 1982,
226
+ 2460,
227
+ 2627,
228
+ 3246,
229
+ 3253,
230
+ 3268,
231
+ 3536,
232
+ 3846,
233
+ 3961,
234
+ 4183,
235
+ 4667,
236
+ 6585,
237
+ 6647,
238
+ 7273,
239
+ 9061,
240
+ 9383,
241
+ 10428,
242
+ 10929,
243
+ 11938,
244
+ 12033,
245
+ 12331,
246
+ 12562,
247
+ 13793,
248
+ 14157,
249
+ 14635,
250
+ 15265,
251
+ 15618,
252
+ 16553,
253
+ 16604,
254
+ 18362,
255
+ 18956,
256
+ 20075,
257
+ 21675,
258
+ 22520,
259
+ 26130,
260
+ 26161,
261
+ 26435,
262
+ 28279,
263
+ 29464,
264
+ 31650,
265
+ 32302,
266
+ 32470,
267
+ 36865,
268
+ 42863,
269
+ 47425,
270
+ 49870,
271
+ 50254,
272
+ 50258,
273
+ 50359,
274
+ 50360,
275
+ 50361,
276
+ 50362,
277
+ 50363
278
+ ],
279
+ "suppress_ids_begin": [
280
+ 220,
281
+ 50257
282
+ ],
283
+ "torch_dtype": "float32",
284
+ "transformers_version": "4.46.2",
285
+ "use_cache": true,
286
+ "use_weighted_layer_sum": false,
287
+ "vocab_size": 51866
288
+ }
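The config saved with this checkpoint describes the distilled student: it keeps the teacher's full 32-layer encoder but truncates the decoder to 2 layers, with d_model 1280 and 128 mel bins. A minimal sketch for sanity-checking the saved file with transformers; the local path is a placeholder:

from transformers import WhisperConfig

# Load the student config from the checkpoint directory (placeholder path).
config = WhisperConfig.from_pretrained("./checkpoint-1000")

# Large-v3-sized encoder, 2-layer student decoder.
assert config.encoder_layers == 32
assert config.decoder_layers == 2
assert config.d_model == 1280 and config.num_mel_bins == 128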
checkpoint-1000/flax_model.msgpack ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a04f4f83603f30a2ef2cda5b555cf138fde78de8b57c0d754bf8de44e7a2e76c
+ size 1512831199
checkpoint-1000/generation_config.json ADDED
@@ -0,0 +1,271 @@
1
+ {
2
+ "alignment_heads": [
3
+ [
4
+ 7,
5
+ 0
6
+ ],
7
+ [
8
+ 10,
9
+ 17
10
+ ],
11
+ [
12
+ 12,
13
+ 18
14
+ ],
15
+ [
16
+ 13,
17
+ 12
18
+ ],
19
+ [
20
+ 16,
21
+ 1
22
+ ],
23
+ [
24
+ 17,
25
+ 14
26
+ ],
27
+ [
28
+ 19,
29
+ 11
30
+ ],
31
+ [
32
+ 21,
33
+ 4
34
+ ],
35
+ [
36
+ 24,
37
+ 1
38
+ ],
39
+ [
40
+ 25,
41
+ 6
42
+ ]
43
+ ],
44
+ "begin_suppress_tokens": [
45
+ 220,
46
+ 50257
47
+ ],
48
+ "bos_token_id": 50257,
49
+ "decoder_start_token_id": 50258,
50
+ "eos_token_id": 50257,
51
+ "forced_decoder_ids": [
52
+ [
53
+ 1,
54
+ 50259
55
+ ],
56
+ [
57
+ 2,
58
+ 50360
59
+ ],
60
+ [
61
+ 3,
62
+ 50364
63
+ ]
64
+ ],
65
+ "is_multilingual": true,
66
+ "lang_to_id": {
67
+ "<|af|>": 50327,
68
+ "<|am|>": 50334,
69
+ "<|ar|>": 50272,
70
+ "<|as|>": 50350,
71
+ "<|az|>": 50304,
72
+ "<|ba|>": 50355,
73
+ "<|be|>": 50330,
74
+ "<|bg|>": 50292,
75
+ "<|bn|>": 50302,
76
+ "<|bo|>": 50347,
77
+ "<|br|>": 50309,
78
+ "<|bs|>": 50315,
79
+ "<|ca|>": 50270,
80
+ "<|cs|>": 50283,
81
+ "<|cy|>": 50297,
82
+ "<|da|>": 50285,
83
+ "<|de|>": 50261,
84
+ "<|el|>": 50281,
85
+ "<|en|>": 50259,
86
+ "<|es|>": 50262,
87
+ "<|et|>": 50307,
88
+ "<|eu|>": 50310,
89
+ "<|fa|>": 50300,
90
+ "<|fi|>": 50277,
91
+ "<|fo|>": 50338,
92
+ "<|fr|>": 50265,
93
+ "<|gl|>": 50319,
94
+ "<|gu|>": 50333,
95
+ "<|haw|>": 50352,
96
+ "<|ha|>": 50354,
97
+ "<|he|>": 50279,
98
+ "<|hi|>": 50276,
99
+ "<|hr|>": 50291,
100
+ "<|ht|>": 50339,
101
+ "<|hu|>": 50286,
102
+ "<|hy|>": 50312,
103
+ "<|id|>": 50275,
104
+ "<|is|>": 50311,
105
+ "<|it|>": 50274,
106
+ "<|ja|>": 50266,
107
+ "<|jw|>": 50356,
108
+ "<|ka|>": 50329,
109
+ "<|kk|>": 50316,
110
+ "<|km|>": 50323,
111
+ "<|kn|>": 50306,
112
+ "<|ko|>": 50264,
113
+ "<|la|>": 50294,
114
+ "<|lb|>": 50345,
115
+ "<|ln|>": 50353,
116
+ "<|lo|>": 50336,
117
+ "<|lt|>": 50293,
118
+ "<|lv|>": 50301,
119
+ "<|mg|>": 50349,
120
+ "<|mi|>": 50295,
121
+ "<|mk|>": 50308,
122
+ "<|ml|>": 50296,
123
+ "<|mn|>": 50314,
124
+ "<|mr|>": 50320,
125
+ "<|ms|>": 50282,
126
+ "<|mt|>": 50343,
127
+ "<|my|>": 50346,
128
+ "<|ne|>": 50313,
129
+ "<|nl|>": 50271,
130
+ "<|nn|>": 50342,
131
+ "<|no|>": 50288,
132
+ "<|oc|>": 50328,
133
+ "<|pa|>": 50321,
134
+ "<|pl|>": 50269,
135
+ "<|ps|>": 50340,
136
+ "<|pt|>": 50267,
137
+ "<|ro|>": 50284,
138
+ "<|ru|>": 50263,
139
+ "<|sa|>": 50344,
140
+ "<|sd|>": 50332,
141
+ "<|si|>": 50322,
142
+ "<|sk|>": 50298,
143
+ "<|sl|>": 50305,
144
+ "<|sn|>": 50324,
145
+ "<|so|>": 50326,
146
+ "<|sq|>": 50317,
147
+ "<|sr|>": 50303,
148
+ "<|su|>": 50357,
149
+ "<|sv|>": 50273,
150
+ "<|sw|>": 50318,
151
+ "<|ta|>": 50287,
152
+ "<|te|>": 50299,
153
+ "<|tg|>": 50331,
154
+ "<|th|>": 50289,
155
+ "<|tk|>": 50341,
156
+ "<|tl|>": 50348,
157
+ "<|tr|>": 50268,
158
+ "<|tt|>": 50351,
159
+ "<|uk|>": 50280,
160
+ "<|ur|>": 50290,
161
+ "<|uz|>": 50337,
162
+ "<|vi|>": 50278,
163
+ "<|yi|>": 50335,
164
+ "<|yo|>": 50325,
165
+ "<|yue|>": 50358,
166
+ "<|zh|>": 50260
167
+ },
168
+ "language": "<|en|>",
169
+ "max_initial_timestamp_index": 1,
170
+ "max_length": 448,
171
+ "no_timestamps_token_id": 50364,
172
+ "pad_token_id": 50257,
173
+ "return_timestamps": false,
174
+ "suppress_tokens": [
175
+ 1,
176
+ 2,
177
+ 7,
178
+ 8,
179
+ 9,
180
+ 10,
181
+ 14,
182
+ 25,
183
+ 26,
184
+ 27,
185
+ 28,
186
+ 29,
187
+ 31,
188
+ 58,
189
+ 59,
190
+ 60,
191
+ 61,
192
+ 62,
193
+ 63,
194
+ 90,
195
+ 91,
196
+ 92,
197
+ 93,
198
+ 359,
199
+ 503,
200
+ 522,
201
+ 542,
202
+ 873,
203
+ 893,
204
+ 902,
205
+ 918,
206
+ 922,
207
+ 931,
208
+ 1350,
209
+ 1853,
210
+ 1982,
211
+ 2460,
212
+ 2627,
213
+ 3246,
214
+ 3253,
215
+ 3268,
216
+ 3536,
217
+ 3846,
218
+ 3961,
219
+ 4183,
220
+ 4667,
221
+ 6585,
222
+ 6647,
223
+ 7273,
224
+ 9061,
225
+ 9383,
226
+ 10428,
227
+ 10929,
228
+ 11938,
229
+ 12033,
230
+ 12331,
231
+ 12562,
232
+ 13793,
233
+ 14157,
234
+ 14635,
235
+ 15265,
236
+ 15618,
237
+ 16553,
238
+ 16604,
239
+ 18362,
240
+ 18956,
241
+ 20075,
242
+ 21675,
243
+ 22520,
244
+ 26130,
245
+ 26161,
246
+ 26435,
247
+ 28279,
248
+ 29464,
249
+ 31650,
250
+ 32302,
251
+ 32470,
252
+ 36865,
253
+ 42863,
254
+ 47425,
255
+ 49870,
256
+ 50254,
257
+ 50258,
258
+ 50359,
259
+ 50360,
260
+ 50361,
261
+ 50362,
262
+ 50363
263
+ ],
264
+ "task": "transcribe",
265
+ "task_to_id": {
266
+ "transcribe": 50360,
267
+ "translate": 50359
268
+ },
269
+ "transformers_version": "4.46.2",
270
+ "use_scan": false
271
+ }
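This generation config defaults to English transcription without timestamps: the forced_decoder_ids [[1, 50259], [2, 50360], [3, 50364]] correspond to <|en|>, <|transcribe|> and <|notimestamps|> in the lang_to_id / task_to_id maps above. A minimal sketch that checks those defaults after loading the file; the path is a placeholder:

from transformers import GenerationConfig

# Load the generation config saved with the checkpoint (placeholder path).
gen_config = GenerationConfig.from_pretrained("./checkpoint-1000")

# The forced decoder ids resolve to <|en|>, <|transcribe|>, <|notimestamps|>.
assert gen_config.lang_to_id["<|en|>"] == 50259
assert gen_config.task_to_id["transcribe"] == 50360
assert gen_config.no_timestamps_token_id == 50364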
checkpoint-1000/merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-1000/preprocessor_config.json ADDED
@@ -0,0 +1,14 @@
+ {
+ "chunk_length": 30,
+ "feature_extractor_type": "WhisperFeatureExtractor",
+ "feature_size": 128,
+ "hop_length": 160,
+ "n_fft": 400,
+ "n_samples": 480000,
+ "nb_max_frames": 3000,
+ "padding_side": "right",
+ "padding_value": 0.0,
+ "processor_class": "WhisperProcessor",
+ "return_attention_mask": false,
+ "sampling_rate": 16000
+ }
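The feature extractor settings above describe 30 s chunks of 16 kHz audio (480000 samples), converted with a hop length of 160 into 3000 frames of 128 log-mel bins. A minimal sketch of running it once; the path is a placeholder:

import numpy as np
from transformers import WhisperFeatureExtractor

# Load the extractor from preprocessor_config.json in the checkpoint dir (placeholder path).
feature_extractor = WhisperFeatureExtractor.from_pretrained("./checkpoint-1000")

# 30 s of silence at 16 kHz -> 480000 samples -> 3000 frames of 128 mel bins.
audio = np.zeros(16000 * 30, dtype=np.float32)
features = feature_extractor(audio, sampling_rate=16000, return_tensors="np")
print(features.input_features.shape)  # expected: (1, 128, 3000)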
checkpoint-1000/special_tokens_map.json ADDED
@@ -0,0 +1,139 @@
1
+ {
2
+ "additional_special_tokens": [
3
+ "<|startoftranscript|>",
4
+ "<|en|>",
5
+ "<|zh|>",
6
+ "<|de|>",
7
+ "<|es|>",
8
+ "<|ru|>",
9
+ "<|ko|>",
10
+ "<|fr|>",
11
+ "<|ja|>",
12
+ "<|pt|>",
13
+ "<|tr|>",
14
+ "<|pl|>",
15
+ "<|ca|>",
16
+ "<|nl|>",
17
+ "<|ar|>",
18
+ "<|sv|>",
19
+ "<|it|>",
20
+ "<|id|>",
21
+ "<|hi|>",
22
+ "<|fi|>",
23
+ "<|vi|>",
24
+ "<|he|>",
25
+ "<|uk|>",
26
+ "<|el|>",
27
+ "<|ms|>",
28
+ "<|cs|>",
29
+ "<|ro|>",
30
+ "<|da|>",
31
+ "<|hu|>",
32
+ "<|ta|>",
33
+ "<|no|>",
34
+ "<|th|>",
35
+ "<|ur|>",
36
+ "<|hr|>",
37
+ "<|bg|>",
38
+ "<|lt|>",
39
+ "<|la|>",
40
+ "<|mi|>",
41
+ "<|ml|>",
42
+ "<|cy|>",
43
+ "<|sk|>",
44
+ "<|te|>",
45
+ "<|fa|>",
46
+ "<|lv|>",
47
+ "<|bn|>",
48
+ "<|sr|>",
49
+ "<|az|>",
50
+ "<|sl|>",
51
+ "<|kn|>",
52
+ "<|et|>",
53
+ "<|mk|>",
54
+ "<|br|>",
55
+ "<|eu|>",
56
+ "<|is|>",
57
+ "<|hy|>",
58
+ "<|ne|>",
59
+ "<|mn|>",
60
+ "<|bs|>",
61
+ "<|kk|>",
62
+ "<|sq|>",
63
+ "<|sw|>",
64
+ "<|gl|>",
65
+ "<|mr|>",
66
+ "<|pa|>",
67
+ "<|si|>",
68
+ "<|km|>",
69
+ "<|sn|>",
70
+ "<|yo|>",
71
+ "<|so|>",
72
+ "<|af|>",
73
+ "<|oc|>",
74
+ "<|ka|>",
75
+ "<|be|>",
76
+ "<|tg|>",
77
+ "<|sd|>",
78
+ "<|gu|>",
79
+ "<|am|>",
80
+ "<|yi|>",
81
+ "<|lo|>",
82
+ "<|uz|>",
83
+ "<|fo|>",
84
+ "<|ht|>",
85
+ "<|ps|>",
86
+ "<|tk|>",
87
+ "<|nn|>",
88
+ "<|mt|>",
89
+ "<|sa|>",
90
+ "<|lb|>",
91
+ "<|my|>",
92
+ "<|bo|>",
93
+ "<|tl|>",
94
+ "<|mg|>",
95
+ "<|as|>",
96
+ "<|tt|>",
97
+ "<|haw|>",
98
+ "<|ln|>",
99
+ "<|ha|>",
100
+ "<|ba|>",
101
+ "<|jw|>",
102
+ "<|su|>",
103
+ "<|yue|>",
104
+ "<|translate|>",
105
+ "<|transcribe|>",
106
+ "<|startoflm|>",
107
+ "<|startofprev|>",
108
+ "<|nospeech|>",
109
+ "<|notimestamps|>"
110
+ ],
111
+ "bos_token": {
112
+ "content": "<|endoftext|>",
113
+ "lstrip": false,
114
+ "normalized": false,
115
+ "rstrip": false,
116
+ "single_word": false
117
+ },
118
+ "eos_token": {
119
+ "content": "<|endoftext|>",
120
+ "lstrip": false,
121
+ "normalized": false,
122
+ "rstrip": false,
123
+ "single_word": false
124
+ },
125
+ "pad_token": {
126
+ "content": "<|endoftext|>",
127
+ "lstrip": false,
128
+ "normalized": false,
129
+ "rstrip": false,
130
+ "single_word": false
131
+ },
132
+ "unk_token": {
133
+ "content": "<|endoftext|>",
134
+ "lstrip": false,
135
+ "normalized": false,
136
+ "rstrip": false,
137
+ "single_word": false
138
+ }
139
+ }
checkpoint-1000/tokenizer_config.json ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-1000/train_state.msgpack ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7c7ad83a7e9cb5ca3e7b8aba9b6aaebf06c17d0d0ccea63ca51127348a443bff
+ size 7564063736
checkpoint-1000/vocab.json ADDED
The diff for this file is too large to render. See raw diff
 
config.json ADDED
@@ -0,0 +1,288 @@
1
+ {
2
+ "_name_or_path": "./",
3
+ "activation_dropout": 0.0,
4
+ "activation_function": "gelu",
5
+ "alignment_heads": [
6
+ [
7
+ 7,
8
+ 0
9
+ ],
10
+ [
11
+ 10,
12
+ 17
13
+ ],
14
+ [
15
+ 12,
16
+ 18
17
+ ],
18
+ [
19
+ 13,
20
+ 12
21
+ ],
22
+ [
23
+ 16,
24
+ 1
25
+ ],
26
+ [
27
+ 17,
28
+ 14
29
+ ],
30
+ [
31
+ 19,
32
+ 11
33
+ ],
34
+ [
35
+ 21,
36
+ 4
37
+ ],
38
+ [
39
+ 24,
40
+ 1
41
+ ],
42
+ [
43
+ 25,
44
+ 6
45
+ ]
46
+ ],
47
+ "apply_spec_augment": false,
48
+ "architectures": [
49
+ "WhisperForConditionalGeneration"
50
+ ],
51
+ "attention_dropout": 0.0,
52
+ "begin_suppress_tokens": [
53
+ 220,
54
+ 50257
55
+ ],
56
+ "bos_token_id": 50257,
57
+ "classifier_proj_size": 256,
58
+ "d_model": 1280,
59
+ "decoder_attention_heads": 20,
60
+ "decoder_ffn_dim": 5120,
61
+ "decoder_layerdrop": 0,
62
+ "decoder_layers": 2,
63
+ "decoder_start_token_id": 50258,
64
+ "dropout": 0.0,
65
+ "encoder_attention_heads": 20,
66
+ "encoder_ffn_dim": 5120,
67
+ "encoder_layerdrop": 0,
68
+ "encoder_layers": 32,
69
+ "eos_token_id": 50257,
70
+ "init_std": 0.02,
71
+ "is_encoder_decoder": true,
72
+ "lang_ids": [
73
+ 50259,
74
+ 50260,
75
+ 50261,
76
+ 50262,
77
+ 50263,
78
+ 50264,
79
+ 50265,
80
+ 50266,
81
+ 50267,
82
+ 50268,
83
+ 50269,
84
+ 50270,
85
+ 50271,
86
+ 50272,
87
+ 50273,
88
+ 50274,
89
+ 50275,
90
+ 50276,
91
+ 50277,
92
+ 50278,
93
+ 50279,
94
+ 50280,
95
+ 50281,
96
+ 50282,
97
+ 50283,
98
+ 50284,
99
+ 50285,
100
+ 50286,
101
+ 50287,
102
+ 50288,
103
+ 50289,
104
+ 50290,
105
+ 50291,
106
+ 50292,
107
+ 50293,
108
+ 50294,
109
+ 50295,
110
+ 50296,
111
+ 50297,
112
+ 50298,
113
+ 50299,
114
+ 50300,
115
+ 50301,
116
+ 50302,
117
+ 50303,
118
+ 50304,
119
+ 50305,
120
+ 50306,
121
+ 50307,
122
+ 50308,
123
+ 50309,
124
+ 50310,
125
+ 50311,
126
+ 50312,
127
+ 50313,
128
+ 50314,
129
+ 50315,
130
+ 50316,
131
+ 50317,
132
+ 50318,
133
+ 50319,
134
+ 50320,
135
+ 50321,
136
+ 50322,
137
+ 50323,
138
+ 50324,
139
+ 50325,
140
+ 50326,
141
+ 50327,
142
+ 50328,
143
+ 50329,
144
+ 50330,
145
+ 50331,
146
+ 50332,
147
+ 50333,
148
+ 50334,
149
+ 50335,
150
+ 50336,
151
+ 50337,
152
+ 50338,
153
+ 50339,
154
+ 50340,
155
+ 50341,
156
+ 50342,
157
+ 50343,
158
+ 50344,
159
+ 50345,
160
+ 50346,
161
+ 50347,
162
+ 50348,
163
+ 50349,
164
+ 50350,
165
+ 50351,
166
+ 50352,
167
+ 50353,
168
+ 50354,
169
+ 50355,
170
+ 50356,
171
+ 50357,
172
+ 50358
173
+ ],
174
+ "mask_feature_length": 10,
175
+ "mask_feature_min_masks": 0,
176
+ "mask_feature_prob": 0,
177
+ "mask_time_length": 10,
178
+ "mask_time_min_masks": 2,
179
+ "mask_time_prob": 0.05,
180
+ "max_length": 448,
181
+ "max_source_positions": 1500,
182
+ "max_target_positions": 448,
183
+ "median_filter_width": 7,
184
+ "model_type": "whisper",
185
+ "num_hidden_layers": 32,
186
+ "num_mel_bins": 128,
187
+ "pad_token_id": 50256,
188
+ "scale_embedding": false,
189
+ "suppress_ids": [
190
+ 1,
191
+ 2,
192
+ 7,
193
+ 8,
194
+ 9,
195
+ 10,
196
+ 14,
197
+ 25,
198
+ 26,
199
+ 27,
200
+ 28,
201
+ 29,
202
+ 31,
203
+ 58,
204
+ 59,
205
+ 60,
206
+ 61,
207
+ 62,
208
+ 63,
209
+ 90,
210
+ 91,
211
+ 92,
212
+ 93,
213
+ 359,
214
+ 503,
215
+ 522,
216
+ 542,
217
+ 873,
218
+ 893,
219
+ 902,
220
+ 918,
221
+ 922,
222
+ 931,
223
+ 1350,
224
+ 1853,
225
+ 1982,
226
+ 2460,
227
+ 2627,
228
+ 3246,
229
+ 3253,
230
+ 3268,
231
+ 3536,
232
+ 3846,
233
+ 3961,
234
+ 4183,
235
+ 4667,
236
+ 6585,
237
+ 6647,
238
+ 7273,
239
+ 9061,
240
+ 9383,
241
+ 10428,
242
+ 10929,
243
+ 11938,
244
+ 12033,
245
+ 12331,
246
+ 12562,
247
+ 13793,
248
+ 14157,
249
+ 14635,
250
+ 15265,
251
+ 15618,
252
+ 16553,
253
+ 16604,
254
+ 18362,
255
+ 18956,
256
+ 20075,
257
+ 21675,
258
+ 22520,
259
+ 26130,
260
+ 26161,
261
+ 26435,
262
+ 28279,
263
+ 29464,
264
+ 31650,
265
+ 32302,
266
+ 32470,
267
+ 36865,
268
+ 42863,
269
+ 47425,
270
+ 49870,
271
+ 50254,
272
+ 50258,
273
+ 50359,
274
+ 50360,
275
+ 50361,
276
+ 50362,
277
+ 50363
278
+ ],
279
+ "suppress_ids_begin": [
280
+ 220,
281
+ 50257
282
+ ],
283
+ "torch_dtype": "float32",
284
+ "transformers_version": "4.46.2",
285
+ "use_cache": true,
286
+ "use_weighted_layer_sum": false,
287
+ "vocab_size": 51866
288
+ }
create_student_model.py ADDED
@@ -0,0 +1,226 @@
1
+ #!/usr/bin/env python
2
+ # coding=utf-8
3
+ # Copyright 2023 The HuggingFace Inc. team. All rights reserved.
4
+ #
5
+ # Licensed under the Apache License, Version 2.0 (the "License");
6
+ # you may not use this file except in compliance with the License.
7
+ # You may obtain a copy of the License at
8
+ #
9
+ # http://www.apache.org/licenses/LICENSE-2.0
10
+ #
11
+ # Unless required by applicable law or agreed to in writing, software
12
+ # distributed under the License is distributed on an "AS IS" BASIS,
13
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
14
+ # See the License for the specific language governing permissions and
15
+ # limitations under the License.
16
+ """
17
+ Initialise a student Whisper model from a pre-trained teacher model for
18
+ teacher-student distillation.
19
+ """
20
+
21
+ import argparse
22
+ import copy
23
+ import logging
24
+
25
+ import jax
26
+ import numpy as np
27
+ from flax.core import freeze, unfreeze
28
+ from transformers import GenerationConfig, WhisperFeatureExtractor, WhisperProcessor
29
+
30
+ from distil_whisper import FlaxWhisperForConditionalGeneration
31
+
32
+
33
+ logger = logging.getLogger(__name__)
34
+
35
+
36
+ def parse_args():
37
+ parser = argparse.ArgumentParser(
38
+ description="Initialise a student Whisper model from a teacher model, copying the relevant layer weights and adjusting the processor as necessary."
39
+ )
40
+ parser.add_argument(
41
+ "--teacher_checkpoint",
42
+ type=str,
43
+ required=True,
44
+ help="The HF Hub ID of the teacher checkpoint.",
45
+ )
46
+ parser.add_argument(
47
+ "--subfolder",
48
+ type=str,
49
+ default="",
50
+ help="In case the relevant teacher weights are located inside a subfolder of the model repo on huggingface.co, you "
51
+ "can specify the folder name here.",
52
+ )
53
+ parser.add_argument(
54
+ "--encoder_layers",
55
+ type=int,
56
+ default=None,
57
+ help="Number of encoder layers to use in the student model. Defaults to all layers from the teacher.",
58
+ )
59
+ parser.add_argument(
60
+ "--decoder_layers",
61
+ type=int,
62
+ default=2,
63
+ help="Number of decoder layers to use in the student model. Defaults to 2 layers.",
64
+ )
65
+ parser.add_argument(
66
+ "--max_source_positions",
67
+ type=int,
68
+ default=None,
69
+ help="The maximum sequence length of log-mel filter-bank features that this model might ever be used with. Can "
70
+ "be used to create a student model with a shorter context length than the teacher model. Defaults to the number "
71
+ "of source positions in the teacher model (1500).",
72
+ )
73
+ parser.add_argument(
74
+ "--save_dir",
75
+ type=str,
76
+ required=True,
77
+ help="Where to save the student weights and processor.",
78
+ )
79
+ parser.add_argument(
80
+ "--push_to_hub",
81
+ type=bool,
82
+ required=False,
83
+ default=False,
84
+ help="Whether to push the student weights and processor to the Hub.",
85
+ )
86
+ parser.add_argument(
87
+ "--cache_dir",
88
+ type=str,
89
+ default=None,
90
+ help="Where to store the pretrained models downloaded from huggingface.co",
91
+ )
92
+
93
+ args = parser.parse_args()
94
+ return args
95
+
96
+
97
+ def init_student_model_from_teacher(
98
+ teacher_checkpoint,
99
+ encoder_layers=None,
100
+ decoder_layers=2,
101
+ max_source_positions=None,
102
+ save_dir=None,
103
+ push_to_hub=None,
104
+ cache_dir=None,
105
+ subfolder="",
106
+ ):
107
+ teacher_model, teacher_params = FlaxWhisperForConditionalGeneration.from_pretrained(
108
+ teacher_checkpoint,
109
+ _do_init=False,
110
+ cache_dir=cache_dir,
111
+ subfolder=subfolder,
112
+ )
113
+ processor = WhisperProcessor.from_pretrained(teacher_checkpoint)
114
+ generation_config = GenerationConfig.from_pretrained(teacher_checkpoint)
115
+
116
+ teacher_config = teacher_model.config
117
+ teacher_encoder_layers = teacher_config.encoder_layers
118
+ teacher_decoder_layers = teacher_config.decoder_layers
119
+
120
+ student_config = copy.deepcopy(teacher_config)
121
+ student_config.update(
122
+ {
123
+ "encoder_layers": encoder_layers if encoder_layers is not None else teacher_encoder_layers,
124
+ "decoder_layers": decoder_layers,
125
+ "max_source_positions": (
126
+ max_source_positions if max_source_positions is not None else student_config.max_source_positions
127
+ ),
128
+ }
129
+ )
130
+
131
+ encoder_mapping = np.linspace(0, teacher_encoder_layers - 1, student_config.encoder_layers, dtype=int)
132
+ encoder_mapping[-1] = teacher_encoder_layers - 1
133
+
134
+ encoder_map = {}
135
+ for student_layer, teacher_layer in enumerate(encoder_mapping):
136
+ encoder_map[str(teacher_layer)] = str(student_layer)
137
+
138
+ decoder_mapping = np.linspace(0, teacher_decoder_layers - 1, student_config.decoder_layers, dtype=int)
139
+ decoder_mapping[-1] = teacher_decoder_layers - 1
140
+
141
+ decoder_map = {}
142
+ for student_layer, teacher_layer in enumerate(decoder_mapping):
143
+ decoder_map[str(teacher_layer)] = str(student_layer)
144
+
145
+ # init the student params from the teacher model
146
+ student_params = unfreeze(teacher_params)
147
+ student_params["model"]["decoder"]["layers"] = {}
148
+
149
+ for layer in teacher_params["model"]["decoder"]["layers"]:
150
+ if layer in decoder_map:
151
+ # re-introduce pre-defined layers from the teacher
152
+ student_params["model"]["decoder"]["layers"][decoder_map[layer]] = teacher_params["model"]["decoder"][
153
+ "layers"
154
+ ][layer]
155
+
156
+ if encoder_layers is not None:
157
+ student_params["model"]["encoder"]["layers"] = {}
158
+ for layer in teacher_params["model"]["encoder"]["layers"]:
159
+ if layer in encoder_map:
160
+ # re-introduce pre-defined layers from the teacher
161
+ student_params["model"]["encoder"]["layers"][encoder_map[layer]] = teacher_params["model"]["encoder"][
162
+ "layers"
163
+ ][layer]
164
+
165
+ if max_source_positions is not None:
166
+ # slice the first MAX_SOURCE_POSITIONS embedding weights
167
+ student_params["model"]["encoder"]["embed_positions"]["embedding"] = teacher_params["model"]["encoder"][
168
+ "embed_positions"
169
+ ]["embedding"][: student_config.max_source_positions, :]
170
+ # update the feature extractor to handle the new input length
171
+ chunk_length = int(student_config.max_source_positions * 2 / 100)
172
+ processor.feature_extractor = WhisperFeatureExtractor(chunk_length=chunk_length)
173
+
174
+ # remove the teacher params and model
175
+ del teacher_params, teacher_model
176
+
177
+ # save the converted weights and model
178
+ student_params = freeze(student_params)
179
+ student_model = FlaxWhisperForConditionalGeneration(student_config, _do_init=False)
180
+
181
+ if save_dir is not None:
182
+ student_model.save_pretrained(save_dir, params=student_params)
183
+ # we also need to correctly save the processor and generation config
184
+ processor.save_pretrained(save_dir)
185
+ generation_config.save_pretrained(save_dir)
186
+
187
+ # check we can do a forward pass with the saved model - first load the weights and processor
188
+ logger.info("Checking we can load the saved model...")
189
+ student_model, student_params = FlaxWhisperForConditionalGeneration.from_pretrained(
190
+ save_dir,
191
+ _do_init=False,
192
+ )
193
+ processor = WhisperProcessor.from_pretrained(save_dir)
194
+
195
+ # define some random inputs
196
+ input_features = processor(np.ones(16000), sampling_rate=16000, return_tensors="np").input_features
197
+ decoder_start_token_id = student_model.config.decoder_start_token_id
198
+ decoder_input_ids = np.ones((input_features.shape[0], 1)) * decoder_start_token_id
199
+
200
+ # do a forward pass - outputs will be gibberish for the initialised model so we can't check them
201
+ logger.info("Checking we can run the converted model forward...")
202
+ _ = student_model(input_features, decoder_input_ids=decoder_input_ids, params=student_params).logits
203
+ logger.info("Conversion successful!")
204
+
205
+ if push_to_hub:
206
+ student_model.push_to_hub(save_dir, params=student_params)
207
+ processor.push_to_hub(save_dir)
208
+ generation_config.push_to_hub(save_dir)
209
+
210
+
211
+ if __name__ == "__main__":
212
+ args = parse_args()
213
+
214
+ # Set the verbosity to info of the logger - we only want one process per machine to log things on the screen
215
+ logger.setLevel(logging.INFO if jax.process_index() == 0 else logging.ERROR)
216
+
217
+ init_student_model_from_teacher(
218
+ teacher_checkpoint=args.teacher_checkpoint,
219
+ encoder_layers=args.encoder_layers,
220
+ decoder_layers=args.decoder_layers,
221
+ max_source_positions=args.max_source_positions,
222
+ save_dir=args.save_dir,
223
+ push_to_hub=args.push_to_hub,
224
+ cache_dir=args.cache_dir,
225
+ subfolder=args.subfolder,
226
+ )
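Usage note (not part of the commit): the script above is normally driven through its argparse flags, but the same initialisation can be reproduced by calling init_student_model_from_teacher directly. The sketch below assumes the distil_whisper package used by the script is installed and that create_student_model.py is importable; the teacher checkpoint name is illustrative, since this commit does not name the teacher. With decoder_layers=2, the np.linspace mapping keeps only the first and last teacher decoder layers (e.g. layers 0 and 11 of a 12-layer teacher). Note also that --push_to_hub is declared with type=bool, so on the command line any non-empty value (including "False") parses as True; passing a Python bool directly, as here, avoids that pitfall.

    # Minimal sketch, assuming distil_whisper is installed and this file is on the path.
    from create_student_model import init_student_model_from_teacher

    init_student_model_from_teacher(
        teacher_checkpoint="openai/whisper-small",  # illustrative teacher, not specified in this commit
        encoder_layers=None,         # keep every encoder layer of the teacher
        decoder_layers=2,            # student keeps only the first and last teacher decoder layers
        max_source_positions=None,   # keep the teacher's 1500-frame (30 s) context
        save_dir="./distil-small-init",
        push_to_hub=False,
    )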
distil-small-init/added_tokens.json ADDED
@@ -0,0 +1,1609 @@
1
+ {
2
+ "<|0.00|>": 50364,
3
+ "<|0.02|>": 50365,
4
+ "<|0.04|>": 50366,
5
+ "<|0.06|>": 50367,
6
+ "<|0.08|>": 50368,
7
+ "<|0.10|>": 50369,
8
+ "<|0.12|>": 50370,
9
+ "<|0.14|>": 50371,
10
+ "<|0.16|>": 50372,
11
+ "<|0.18|>": 50373,
12
+ "<|0.20|>": 50374,
13
+ "<|0.22|>": 50375,
14
+ "<|0.24|>": 50376,
15
+ "<|0.26|>": 50377,
16
+ "<|0.28|>": 50378,
17
+ "<|0.30|>": 50379,
18
+ "<|0.32|>": 50380,
19
+ "<|0.34|>": 50381,
20
+ "<|0.36|>": 50382,
21
+ "<|0.38|>": 50383,
22
+ "<|0.40|>": 50384,
23
+ "<|0.42|>": 50385,
24
+ "<|0.44|>": 50386,
25
+ "<|0.46|>": 50387,
26
+ "<|0.48|>": 50388,
27
+ "<|0.50|>": 50389,
28
+ "<|0.52|>": 50390,
29
+ "<|0.54|>": 50391,
30
+ "<|0.56|>": 50392,
31
+ "<|0.58|>": 50393,
32
+ "<|0.60|>": 50394,
33
+ "<|0.62|>": 50395,
34
+ "<|0.64|>": 50396,
35
+ "<|0.66|>": 50397,
36
+ "<|0.68|>": 50398,
37
+ "<|0.70|>": 50399,
38
+ "<|0.72|>": 50400,
39
+ "<|0.74|>": 50401,
40
+ "<|0.76|>": 50402,
41
+ "<|0.78|>": 50403,
42
+ "<|0.80|>": 50404,
43
+ "<|0.82|>": 50405,
44
+ "<|0.84|>": 50406,
45
+ "<|0.86|>": 50407,
46
+ "<|0.88|>": 50408,
47
+ "<|0.90|>": 50409,
48
+ "<|0.92|>": 50410,
49
+ "<|0.94|>": 50411,
50
+ "<|0.96|>": 50412,
51
+ "<|0.98|>": 50413,
52
+ "<|1.00|>": 50414,
53
+ "<|1.02|>": 50415,
54
+ "<|1.04|>": 50416,
55
+ "<|1.06|>": 50417,
56
+ "<|1.08|>": 50418,
57
+ "<|1.10|>": 50419,
58
+ "<|1.12|>": 50420,
59
+ "<|1.14|>": 50421,
60
+ "<|1.16|>": 50422,
61
+ "<|1.18|>": 50423,
62
+ "<|1.20|>": 50424,
63
+ "<|1.22|>": 50425,
64
+ "<|1.24|>": 50426,
65
+ "<|1.26|>": 50427,
66
+ "<|1.28|>": 50428,
67
+ "<|1.30|>": 50429,
68
+ "<|1.32|>": 50430,
69
+ "<|1.34|>": 50431,
70
+ "<|1.36|>": 50432,
71
+ "<|1.38|>": 50433,
72
+ "<|1.40|>": 50434,
73
+ "<|1.42|>": 50435,
74
+ "<|1.44|>": 50436,
75
+ "<|1.46|>": 50437,
76
+ "<|1.48|>": 50438,
77
+ "<|1.50|>": 50439,
78
+ "<|1.52|>": 50440,
79
+ "<|1.54|>": 50441,
80
+ "<|1.56|>": 50442,
81
+ "<|1.58|>": 50443,
82
+ "<|1.60|>": 50444,
83
+ "<|1.62|>": 50445,
84
+ "<|1.64|>": 50446,
85
+ "<|1.66|>": 50447,
86
+ "<|1.68|>": 50448,
87
+ "<|1.70|>": 50449,
88
+ "<|1.72|>": 50450,
89
+ "<|1.74|>": 50451,
90
+ "<|1.76|>": 50452,
91
+ "<|1.78|>": 50453,
92
+ "<|1.80|>": 50454,
93
+ "<|1.82|>": 50455,
94
+ "<|1.84|>": 50456,
95
+ "<|1.86|>": 50457,
96
+ "<|1.88|>": 50458,
97
+ "<|1.90|>": 50459,
98
+ "<|1.92|>": 50460,
99
+ "<|1.94|>": 50461,
100
+ "<|1.96|>": 50462,
101
+ "<|1.98|>": 50463,
102
+ "<|10.00|>": 50864,
103
+ "<|10.02|>": 50865,
104
+ "<|10.04|>": 50866,
105
+ "<|10.06|>": 50867,
106
+ "<|10.08|>": 50868,
107
+ "<|10.10|>": 50869,
108
+ "<|10.12|>": 50870,
109
+ "<|10.14|>": 50871,
110
+ "<|10.16|>": 50872,
111
+ "<|10.18|>": 50873,
112
+ "<|10.20|>": 50874,
113
+ "<|10.22|>": 50875,
114
+ "<|10.24|>": 50876,
115
+ "<|10.26|>": 50877,
116
+ "<|10.28|>": 50878,
117
+ "<|10.30|>": 50879,
118
+ "<|10.32|>": 50880,
119
+ "<|10.34|>": 50881,
120
+ "<|10.36|>": 50882,
121
+ "<|10.38|>": 50883,
122
+ "<|10.40|>": 50884,
123
+ "<|10.42|>": 50885,
124
+ "<|10.44|>": 50886,
125
+ "<|10.46|>": 50887,
126
+ "<|10.48|>": 50888,
127
+ "<|10.50|>": 50889,
128
+ "<|10.52|>": 50890,
129
+ "<|10.54|>": 50891,
130
+ "<|10.56|>": 50892,
131
+ "<|10.58|>": 50893,
132
+ "<|10.60|>": 50894,
133
+ "<|10.62|>": 50895,
134
+ "<|10.64|>": 50896,
135
+ "<|10.66|>": 50897,
136
+ "<|10.68|>": 50898,
137
+ "<|10.70|>": 50899,
138
+ "<|10.72|>": 50900,
139
+ "<|10.74|>": 50901,
140
+ "<|10.76|>": 50902,
141
+ "<|10.78|>": 50903,
142
+ "<|10.80|>": 50904,
143
+ "<|10.82|>": 50905,
144
+ "<|10.84|>": 50906,
145
+ "<|10.86|>": 50907,
146
+ "<|10.88|>": 50908,
147
+ "<|10.90|>": 50909,
148
+ "<|10.92|>": 50910,
149
+ "<|10.94|>": 50911,
150
+ "<|10.96|>": 50912,
151
+ "<|10.98|>": 50913,
152
+ "<|11.00|>": 50914,
153
+ "<|11.02|>": 50915,
154
+ "<|11.04|>": 50916,
155
+ "<|11.06|>": 50917,
156
+ "<|11.08|>": 50918,
157
+ "<|11.10|>": 50919,
158
+ "<|11.12|>": 50920,
159
+ "<|11.14|>": 50921,
160
+ "<|11.16|>": 50922,
161
+ "<|11.18|>": 50923,
162
+ "<|11.20|>": 50924,
163
+ "<|11.22|>": 50925,
164
+ "<|11.24|>": 50926,
165
+ "<|11.26|>": 50927,
166
+ "<|11.28|>": 50928,
167
+ "<|11.30|>": 50929,
168
+ "<|11.32|>": 50930,
169
+ "<|11.34|>": 50931,
170
+ "<|11.36|>": 50932,
171
+ "<|11.38|>": 50933,
172
+ "<|11.40|>": 50934,
173
+ "<|11.42|>": 50935,
174
+ "<|11.44|>": 50936,
175
+ "<|11.46|>": 50937,
176
+ "<|11.48|>": 50938,
177
+ "<|11.50|>": 50939,
178
+ "<|11.52|>": 50940,
179
+ "<|11.54|>": 50941,
180
+ "<|11.56|>": 50942,
181
+ "<|11.58|>": 50943,
182
+ "<|11.60|>": 50944,
183
+ "<|11.62|>": 50945,
184
+ "<|11.64|>": 50946,
185
+ "<|11.66|>": 50947,
186
+ "<|11.68|>": 50948,
187
+ "<|11.70|>": 50949,
188
+ "<|11.72|>": 50950,
189
+ "<|11.74|>": 50951,
190
+ "<|11.76|>": 50952,
191
+ "<|11.78|>": 50953,
192
+ "<|11.80|>": 50954,
193
+ "<|11.82|>": 50955,
194
+ "<|11.84|>": 50956,
195
+ "<|11.86|>": 50957,
196
+ "<|11.88|>": 50958,
197
+ "<|11.90|>": 50959,
198
+ "<|11.92|>": 50960,
199
+ "<|11.94|>": 50961,
200
+ "<|11.96|>": 50962,
201
+ "<|11.98|>": 50963,
202
+ "<|12.00|>": 50964,
203
+ "<|12.02|>": 50965,
204
+ "<|12.04|>": 50966,
205
+ "<|12.06|>": 50967,
206
+ "<|12.08|>": 50968,
207
+ "<|12.10|>": 50969,
208
+ "<|12.12|>": 50970,
209
+ "<|12.14|>": 50971,
210
+ "<|12.16|>": 50972,
211
+ "<|12.18|>": 50973,
212
+ "<|12.20|>": 50974,
213
+ "<|12.22|>": 50975,
214
+ "<|12.24|>": 50976,
215
+ "<|12.26|>": 50977,
216
+ "<|12.28|>": 50978,
217
+ "<|12.30|>": 50979,
218
+ "<|12.32|>": 50980,
219
+ "<|12.34|>": 50981,
220
+ "<|12.36|>": 50982,
221
+ "<|12.38|>": 50983,
222
+ "<|12.40|>": 50984,
223
+ "<|12.42|>": 50985,
224
+ "<|12.44|>": 50986,
225
+ "<|12.46|>": 50987,
226
+ "<|12.48|>": 50988,
227
+ "<|12.50|>": 50989,
228
+ "<|12.52|>": 50990,
229
+ "<|12.54|>": 50991,
230
+ "<|12.56|>": 50992,
231
+ "<|12.58|>": 50993,
232
+ "<|12.60|>": 50994,
233
+ "<|12.62|>": 50995,
234
+ "<|12.64|>": 50996,
235
+ "<|12.66|>": 50997,
236
+ "<|12.68|>": 50998,
237
+ "<|12.70|>": 50999,
238
+ "<|12.72|>": 51000,
239
+ "<|12.74|>": 51001,
240
+ "<|12.76|>": 51002,
241
+ "<|12.78|>": 51003,
242
+ "<|12.80|>": 51004,
243
+ "<|12.82|>": 51005,
244
+ "<|12.84|>": 51006,
245
+ "<|12.86|>": 51007,
246
+ "<|12.88|>": 51008,
247
+ "<|12.90|>": 51009,
248
+ "<|12.92|>": 51010,
249
+ "<|12.94|>": 51011,
250
+ "<|12.96|>": 51012,
251
+ "<|12.98|>": 51013,
252
+ "<|13.00|>": 51014,
253
+ "<|13.02|>": 51015,
254
+ "<|13.04|>": 51016,
255
+ "<|13.06|>": 51017,
256
+ "<|13.08|>": 51018,
257
+ "<|13.10|>": 51019,
258
+ "<|13.12|>": 51020,
259
+ "<|13.14|>": 51021,
260
+ "<|13.16|>": 51022,
261
+ "<|13.18|>": 51023,
262
+ "<|13.20|>": 51024,
263
+ "<|13.22|>": 51025,
264
+ "<|13.24|>": 51026,
265
+ "<|13.26|>": 51027,
266
+ "<|13.28|>": 51028,
267
+ "<|13.30|>": 51029,
268
+ "<|13.32|>": 51030,
269
+ "<|13.34|>": 51031,
270
+ "<|13.36|>": 51032,
271
+ "<|13.38|>": 51033,
272
+ "<|13.40|>": 51034,
273
+ "<|13.42|>": 51035,
274
+ "<|13.44|>": 51036,
275
+ "<|13.46|>": 51037,
276
+ "<|13.48|>": 51038,
277
+ "<|13.50|>": 51039,
278
+ "<|13.52|>": 51040,
279
+ "<|13.54|>": 51041,
280
+ "<|13.56|>": 51042,
281
+ "<|13.58|>": 51043,
282
+ "<|13.60|>": 51044,
283
+ "<|13.62|>": 51045,
284
+ "<|13.64|>": 51046,
285
+ "<|13.66|>": 51047,
286
+ "<|13.68|>": 51048,
287
+ "<|13.70|>": 51049,
288
+ "<|13.72|>": 51050,
289
+ "<|13.74|>": 51051,
290
+ "<|13.76|>": 51052,
291
+ "<|13.78|>": 51053,
292
+ "<|13.80|>": 51054,
293
+ "<|13.82|>": 51055,
294
+ "<|13.84|>": 51056,
295
+ "<|13.86|>": 51057,
296
+ "<|13.88|>": 51058,
297
+ "<|13.90|>": 51059,
298
+ "<|13.92|>": 51060,
299
+ "<|13.94|>": 51061,
300
+ "<|13.96|>": 51062,
301
+ "<|13.98|>": 51063,
302
+ "<|14.00|>": 51064,
303
+ "<|14.02|>": 51065,
304
+ "<|14.04|>": 51066,
305
+ "<|14.06|>": 51067,
306
+ "<|14.08|>": 51068,
307
+ "<|14.10|>": 51069,
308
+ "<|14.12|>": 51070,
309
+ "<|14.14|>": 51071,
310
+ "<|14.16|>": 51072,
311
+ "<|14.18|>": 51073,
312
+ "<|14.20|>": 51074,
313
+ "<|14.22|>": 51075,
314
+ "<|14.24|>": 51076,
315
+ "<|14.26|>": 51077,
316
+ "<|14.28|>": 51078,
317
+ "<|14.30|>": 51079,
318
+ "<|14.32|>": 51080,
319
+ "<|14.34|>": 51081,
320
+ "<|14.36|>": 51082,
321
+ "<|14.38|>": 51083,
322
+ "<|14.40|>": 51084,
323
+ "<|14.42|>": 51085,
324
+ "<|14.44|>": 51086,
325
+ "<|14.46|>": 51087,
326
+ "<|14.48|>": 51088,
327
+ "<|14.50|>": 51089,
328
+ "<|14.52|>": 51090,
329
+ "<|14.54|>": 51091,
330
+ "<|14.56|>": 51092,
331
+ "<|14.58|>": 51093,
332
+ "<|14.60|>": 51094,
333
+ "<|14.62|>": 51095,
334
+ "<|14.64|>": 51096,
335
+ "<|14.66|>": 51097,
336
+ "<|14.68|>": 51098,
337
+ "<|14.70|>": 51099,
338
+ "<|14.72|>": 51100,
339
+ "<|14.74|>": 51101,
340
+ "<|14.76|>": 51102,
341
+ "<|14.78|>": 51103,
342
+ "<|14.80|>": 51104,
343
+ "<|14.82|>": 51105,
344
+ "<|14.84|>": 51106,
345
+ "<|14.86|>": 51107,
346
+ "<|14.88|>": 51108,
347
+ "<|14.90|>": 51109,
348
+ "<|14.92|>": 51110,
349
+ "<|14.94|>": 51111,
350
+ "<|14.96|>": 51112,
351
+ "<|14.98|>": 51113,
352
+ "<|15.00|>": 51114,
353
+ "<|15.02|>": 51115,
354
+ "<|15.04|>": 51116,
355
+ "<|15.06|>": 51117,
356
+ "<|15.08|>": 51118,
357
+ "<|15.10|>": 51119,
358
+ "<|15.12|>": 51120,
359
+ "<|15.14|>": 51121,
360
+ "<|15.16|>": 51122,
361
+ "<|15.18|>": 51123,
362
+ "<|15.20|>": 51124,
363
+ "<|15.22|>": 51125,
364
+ "<|15.24|>": 51126,
365
+ "<|15.26|>": 51127,
366
+ "<|15.28|>": 51128,
367
+ "<|15.30|>": 51129,
368
+ "<|15.32|>": 51130,
369
+ "<|15.34|>": 51131,
370
+ "<|15.36|>": 51132,
371
+ "<|15.38|>": 51133,
372
+ "<|15.40|>": 51134,
373
+ "<|15.42|>": 51135,
374
+ "<|15.44|>": 51136,
375
+ "<|15.46|>": 51137,
376
+ "<|15.48|>": 51138,
377
+ "<|15.50|>": 51139,
378
+ "<|15.52|>": 51140,
379
+ "<|15.54|>": 51141,
380
+ "<|15.56|>": 51142,
381
+ "<|15.58|>": 51143,
382
+ "<|15.60|>": 51144,
383
+ "<|15.62|>": 51145,
384
+ "<|15.64|>": 51146,
385
+ "<|15.66|>": 51147,
386
+ "<|15.68|>": 51148,
387
+ "<|15.70|>": 51149,
388
+ "<|15.72|>": 51150,
389
+ "<|15.74|>": 51151,
390
+ "<|15.76|>": 51152,
391
+ "<|15.78|>": 51153,
392
+ "<|15.80|>": 51154,
393
+ "<|15.82|>": 51155,
394
+ "<|15.84|>": 51156,
395
+ "<|15.86|>": 51157,
396
+ "<|15.88|>": 51158,
397
+ "<|15.90|>": 51159,
398
+ "<|15.92|>": 51160,
399
+ "<|15.94|>": 51161,
400
+ "<|15.96|>": 51162,
401
+ "<|15.98|>": 51163,
402
+ "<|16.00|>": 51164,
403
+ "<|16.02|>": 51165,
404
+ "<|16.04|>": 51166,
405
+ "<|16.06|>": 51167,
406
+ "<|16.08|>": 51168,
407
+ "<|16.10|>": 51169,
408
+ "<|16.12|>": 51170,
409
+ "<|16.14|>": 51171,
410
+ "<|16.16|>": 51172,
411
+ "<|16.18|>": 51173,
412
+ "<|16.20|>": 51174,
413
+ "<|16.22|>": 51175,
414
+ "<|16.24|>": 51176,
415
+ "<|16.26|>": 51177,
416
+ "<|16.28|>": 51178,
417
+ "<|16.30|>": 51179,
418
+ "<|16.32|>": 51180,
419
+ "<|16.34|>": 51181,
420
+ "<|16.36|>": 51182,
421
+ "<|16.38|>": 51183,
422
+ "<|16.40|>": 51184,
423
+ "<|16.42|>": 51185,
424
+ "<|16.44|>": 51186,
425
+ "<|16.46|>": 51187,
426
+ "<|16.48|>": 51188,
427
+ "<|16.50|>": 51189,
428
+ "<|16.52|>": 51190,
429
+ "<|16.54|>": 51191,
430
+ "<|16.56|>": 51192,
431
+ "<|16.58|>": 51193,
432
+ "<|16.60|>": 51194,
433
+ "<|16.62|>": 51195,
434
+ "<|16.64|>": 51196,
435
+ "<|16.66|>": 51197,
436
+ "<|16.68|>": 51198,
437
+ "<|16.70|>": 51199,
438
+ "<|16.72|>": 51200,
439
+ "<|16.74|>": 51201,
440
+ "<|16.76|>": 51202,
441
+ "<|16.78|>": 51203,
442
+ "<|16.80|>": 51204,
443
+ "<|16.82|>": 51205,
444
+ "<|16.84|>": 51206,
445
+ "<|16.86|>": 51207,
446
+ "<|16.88|>": 51208,
447
+ "<|16.90|>": 51209,
448
+ "<|16.92|>": 51210,
449
+ "<|16.94|>": 51211,
450
+ "<|16.96|>": 51212,
451
+ "<|16.98|>": 51213,
452
+ "<|17.00|>": 51214,
453
+ "<|17.02|>": 51215,
454
+ "<|17.04|>": 51216,
455
+ "<|17.06|>": 51217,
456
+ "<|17.08|>": 51218,
457
+ "<|17.10|>": 51219,
458
+ "<|17.12|>": 51220,
459
+ "<|17.14|>": 51221,
460
+ "<|17.16|>": 51222,
461
+ "<|17.18|>": 51223,
462
+ "<|17.20|>": 51224,
463
+ "<|17.22|>": 51225,
464
+ "<|17.24|>": 51226,
465
+ "<|17.26|>": 51227,
466
+ "<|17.28|>": 51228,
467
+ "<|17.30|>": 51229,
468
+ "<|17.32|>": 51230,
469
+ "<|17.34|>": 51231,
470
+ "<|17.36|>": 51232,
471
+ "<|17.38|>": 51233,
472
+ "<|17.40|>": 51234,
473
+ "<|17.42|>": 51235,
474
+ "<|17.44|>": 51236,
475
+ "<|17.46|>": 51237,
476
+ "<|17.48|>": 51238,
477
+ "<|17.50|>": 51239,
478
+ "<|17.52|>": 51240,
479
+ "<|17.54|>": 51241,
480
+ "<|17.56|>": 51242,
481
+ "<|17.58|>": 51243,
482
+ "<|17.60|>": 51244,
483
+ "<|17.62|>": 51245,
484
+ "<|17.64|>": 51246,
485
+ "<|17.66|>": 51247,
486
+ "<|17.68|>": 51248,
487
+ "<|17.70|>": 51249,
488
+ "<|17.72|>": 51250,
489
+ "<|17.74|>": 51251,
490
+ "<|17.76|>": 51252,
491
+ "<|17.78|>": 51253,
492
+ "<|17.80|>": 51254,
493
+ "<|17.82|>": 51255,
494
+ "<|17.84|>": 51256,
495
+ "<|17.86|>": 51257,
496
+ "<|17.88|>": 51258,
497
+ "<|17.90|>": 51259,
498
+ "<|17.92|>": 51260,
499
+ "<|17.94|>": 51261,
500
+ "<|17.96|>": 51262,
501
+ "<|17.98|>": 51263,
502
+ "<|18.00|>": 51264,
503
+ "<|18.02|>": 51265,
504
+ "<|18.04|>": 51266,
505
+ "<|18.06|>": 51267,
506
+ "<|18.08|>": 51268,
507
+ "<|18.10|>": 51269,
508
+ "<|18.12|>": 51270,
509
+ "<|18.14|>": 51271,
510
+ "<|18.16|>": 51272,
511
+ "<|18.18|>": 51273,
512
+ "<|18.20|>": 51274,
513
+ "<|18.22|>": 51275,
514
+ "<|18.24|>": 51276,
515
+ "<|18.26|>": 51277,
516
+ "<|18.28|>": 51278,
517
+ "<|18.30|>": 51279,
518
+ "<|18.32|>": 51280,
519
+ "<|18.34|>": 51281,
520
+ "<|18.36|>": 51282,
521
+ "<|18.38|>": 51283,
522
+ "<|18.40|>": 51284,
523
+ "<|18.42|>": 51285,
524
+ "<|18.44|>": 51286,
525
+ "<|18.46|>": 51287,
526
+ "<|18.48|>": 51288,
527
+ "<|18.50|>": 51289,
528
+ "<|18.52|>": 51290,
529
+ "<|18.54|>": 51291,
530
+ "<|18.56|>": 51292,
531
+ "<|18.58|>": 51293,
532
+ "<|18.60|>": 51294,
533
+ "<|18.62|>": 51295,
534
+ "<|18.64|>": 51296,
535
+ "<|18.66|>": 51297,
536
+ "<|18.68|>": 51298,
537
+ "<|18.70|>": 51299,
538
+ "<|18.72|>": 51300,
539
+ "<|18.74|>": 51301,
540
+ "<|18.76|>": 51302,
541
+ "<|18.78|>": 51303,
542
+ "<|18.80|>": 51304,
543
+ "<|18.82|>": 51305,
544
+ "<|18.84|>": 51306,
545
+ "<|18.86|>": 51307,
546
+ "<|18.88|>": 51308,
547
+ "<|18.90|>": 51309,
548
+ "<|18.92|>": 51310,
549
+ "<|18.94|>": 51311,
550
+ "<|18.96|>": 51312,
551
+ "<|18.98|>": 51313,
552
+ "<|19.00|>": 51314,
553
+ "<|19.02|>": 51315,
554
+ "<|19.04|>": 51316,
555
+ "<|19.06|>": 51317,
556
+ "<|19.08|>": 51318,
557
+ "<|19.10|>": 51319,
558
+ "<|19.12|>": 51320,
559
+ "<|19.14|>": 51321,
560
+ "<|19.16|>": 51322,
561
+ "<|19.18|>": 51323,
562
+ "<|19.20|>": 51324,
563
+ "<|19.22|>": 51325,
564
+ "<|19.24|>": 51326,
565
+ "<|19.26|>": 51327,
566
+ "<|19.28|>": 51328,
567
+ "<|19.30|>": 51329,
568
+ "<|19.32|>": 51330,
569
+ "<|19.34|>": 51331,
570
+ "<|19.36|>": 51332,
571
+ "<|19.38|>": 51333,
572
+ "<|19.40|>": 51334,
573
+ "<|19.42|>": 51335,
574
+ "<|19.44|>": 51336,
575
+ "<|19.46|>": 51337,
576
+ "<|19.48|>": 51338,
577
+ "<|19.50|>": 51339,
578
+ "<|19.52|>": 51340,
579
+ "<|19.54|>": 51341,
580
+ "<|19.56|>": 51342,
581
+ "<|19.58|>": 51343,
582
+ "<|19.60|>": 51344,
583
+ "<|19.62|>": 51345,
584
+ "<|19.64|>": 51346,
585
+ "<|19.66|>": 51347,
586
+ "<|19.68|>": 51348,
587
+ "<|19.70|>": 51349,
588
+ "<|19.72|>": 51350,
589
+ "<|19.74|>": 51351,
590
+ "<|19.76|>": 51352,
591
+ "<|19.78|>": 51353,
592
+ "<|19.80|>": 51354,
593
+ "<|19.82|>": 51355,
594
+ "<|19.84|>": 51356,
595
+ "<|19.86|>": 51357,
596
+ "<|19.88|>": 51358,
597
+ "<|19.90|>": 51359,
598
+ "<|19.92|>": 51360,
599
+ "<|19.94|>": 51361,
600
+ "<|19.96|>": 51362,
601
+ "<|19.98|>": 51363,
602
+ "<|2.00|>": 50464,
603
+ "<|2.02|>": 50465,
604
+ "<|2.04|>": 50466,
605
+ "<|2.06|>": 50467,
606
+ "<|2.08|>": 50468,
607
+ "<|2.10|>": 50469,
608
+ "<|2.12|>": 50470,
609
+ "<|2.14|>": 50471,
610
+ "<|2.16|>": 50472,
611
+ "<|2.18|>": 50473,
612
+ "<|2.20|>": 50474,
613
+ "<|2.22|>": 50475,
614
+ "<|2.24|>": 50476,
615
+ "<|2.26|>": 50477,
616
+ "<|2.28|>": 50478,
617
+ "<|2.30|>": 50479,
618
+ "<|2.32|>": 50480,
619
+ "<|2.34|>": 50481,
620
+ "<|2.36|>": 50482,
621
+ "<|2.38|>": 50483,
622
+ "<|2.40|>": 50484,
623
+ "<|2.42|>": 50485,
624
+ "<|2.44|>": 50486,
625
+ "<|2.46|>": 50487,
626
+ "<|2.48|>": 50488,
627
+ "<|2.50|>": 50489,
628
+ "<|2.52|>": 50490,
629
+ "<|2.54|>": 50491,
630
+ "<|2.56|>": 50492,
631
+ "<|2.58|>": 50493,
632
+ "<|2.60|>": 50494,
633
+ "<|2.62|>": 50495,
634
+ "<|2.64|>": 50496,
635
+ "<|2.66|>": 50497,
636
+ "<|2.68|>": 50498,
637
+ "<|2.70|>": 50499,
638
+ "<|2.72|>": 50500,
639
+ "<|2.74|>": 50501,
640
+ "<|2.76|>": 50502,
641
+ "<|2.78|>": 50503,
642
+ "<|2.80|>": 50504,
643
+ "<|2.82|>": 50505,
644
+ "<|2.84|>": 50506,
645
+ "<|2.86|>": 50507,
646
+ "<|2.88|>": 50508,
647
+ "<|2.90|>": 50509,
648
+ "<|2.92|>": 50510,
649
+ "<|2.94|>": 50511,
650
+ "<|2.96|>": 50512,
651
+ "<|2.98|>": 50513,
652
+ "<|20.00|>": 51364,
653
+ "<|20.02|>": 51365,
654
+ "<|20.04|>": 51366,
655
+ "<|20.06|>": 51367,
656
+ "<|20.08|>": 51368,
657
+ "<|20.10|>": 51369,
658
+ "<|20.12|>": 51370,
659
+ "<|20.14|>": 51371,
660
+ "<|20.16|>": 51372,
661
+ "<|20.18|>": 51373,
662
+ "<|20.20|>": 51374,
663
+ "<|20.22|>": 51375,
664
+ "<|20.24|>": 51376,
665
+ "<|20.26|>": 51377,
666
+ "<|20.28|>": 51378,
667
+ "<|20.30|>": 51379,
668
+ "<|20.32|>": 51380,
669
+ "<|20.34|>": 51381,
670
+ "<|20.36|>": 51382,
671
+ "<|20.38|>": 51383,
672
+ "<|20.40|>": 51384,
673
+ "<|20.42|>": 51385,
674
+ "<|20.44|>": 51386,
675
+ "<|20.46|>": 51387,
676
+ "<|20.48|>": 51388,
677
+ "<|20.50|>": 51389,
678
+ "<|20.52|>": 51390,
679
+ "<|20.54|>": 51391,
680
+ "<|20.56|>": 51392,
681
+ "<|20.58|>": 51393,
682
+ "<|20.60|>": 51394,
683
+ "<|20.62|>": 51395,
684
+ "<|20.64|>": 51396,
685
+ "<|20.66|>": 51397,
686
+ "<|20.68|>": 51398,
687
+ "<|20.70|>": 51399,
688
+ "<|20.72|>": 51400,
689
+ "<|20.74|>": 51401,
690
+ "<|20.76|>": 51402,
691
+ "<|20.78|>": 51403,
692
+ "<|20.80|>": 51404,
693
+ "<|20.82|>": 51405,
694
+ "<|20.84|>": 51406,
695
+ "<|20.86|>": 51407,
696
+ "<|20.88|>": 51408,
697
+ "<|20.90|>": 51409,
698
+ "<|20.92|>": 51410,
699
+ "<|20.94|>": 51411,
700
+ "<|20.96|>": 51412,
701
+ "<|20.98|>": 51413,
702
+ "<|21.00|>": 51414,
703
+ "<|21.02|>": 51415,
704
+ "<|21.04|>": 51416,
705
+ "<|21.06|>": 51417,
706
+ "<|21.08|>": 51418,
707
+ "<|21.10|>": 51419,
708
+ "<|21.12|>": 51420,
709
+ "<|21.14|>": 51421,
710
+ "<|21.16|>": 51422,
711
+ "<|21.18|>": 51423,
712
+ "<|21.20|>": 51424,
713
+ "<|21.22|>": 51425,
714
+ "<|21.24|>": 51426,
715
+ "<|21.26|>": 51427,
716
+ "<|21.28|>": 51428,
717
+ "<|21.30|>": 51429,
718
+ "<|21.32|>": 51430,
719
+ "<|21.34|>": 51431,
720
+ "<|21.36|>": 51432,
721
+ "<|21.38|>": 51433,
722
+ "<|21.40|>": 51434,
723
+ "<|21.42|>": 51435,
724
+ "<|21.44|>": 51436,
725
+ "<|21.46|>": 51437,
726
+ "<|21.48|>": 51438,
727
+ "<|21.50|>": 51439,
728
+ "<|21.52|>": 51440,
729
+ "<|21.54|>": 51441,
730
+ "<|21.56|>": 51442,
731
+ "<|21.58|>": 51443,
732
+ "<|21.60|>": 51444,
733
+ "<|21.62|>": 51445,
734
+ "<|21.64|>": 51446,
735
+ "<|21.66|>": 51447,
736
+ "<|21.68|>": 51448,
737
+ "<|21.70|>": 51449,
738
+ "<|21.72|>": 51450,
739
+ "<|21.74|>": 51451,
740
+ "<|21.76|>": 51452,
741
+ "<|21.78|>": 51453,
742
+ "<|21.80|>": 51454,
743
+ "<|21.82|>": 51455,
744
+ "<|21.84|>": 51456,
745
+ "<|21.86|>": 51457,
746
+ "<|21.88|>": 51458,
747
+ "<|21.90|>": 51459,
748
+ "<|21.92|>": 51460,
749
+ "<|21.94|>": 51461,
750
+ "<|21.96|>": 51462,
751
+ "<|21.98|>": 51463,
752
+ "<|22.00|>": 51464,
753
+ "<|22.02|>": 51465,
754
+ "<|22.04|>": 51466,
755
+ "<|22.06|>": 51467,
756
+ "<|22.08|>": 51468,
757
+ "<|22.10|>": 51469,
758
+ "<|22.12|>": 51470,
759
+ "<|22.14|>": 51471,
760
+ "<|22.16|>": 51472,
761
+ "<|22.18|>": 51473,
762
+ "<|22.20|>": 51474,
763
+ "<|22.22|>": 51475,
764
+ "<|22.24|>": 51476,
765
+ "<|22.26|>": 51477,
766
+ "<|22.28|>": 51478,
767
+ "<|22.30|>": 51479,
768
+ "<|22.32|>": 51480,
769
+ "<|22.34|>": 51481,
770
+ "<|22.36|>": 51482,
771
+ "<|22.38|>": 51483,
772
+ "<|22.40|>": 51484,
773
+ "<|22.42|>": 51485,
774
+ "<|22.44|>": 51486,
775
+ "<|22.46|>": 51487,
776
+ "<|22.48|>": 51488,
777
+ "<|22.50|>": 51489,
778
+ "<|22.52|>": 51490,
779
+ "<|22.54|>": 51491,
780
+ "<|22.56|>": 51492,
781
+ "<|22.58|>": 51493,
782
+ "<|22.60|>": 51494,
783
+ "<|22.62|>": 51495,
784
+ "<|22.64|>": 51496,
785
+ "<|22.66|>": 51497,
786
+ "<|22.68|>": 51498,
787
+ "<|22.70|>": 51499,
788
+ "<|22.72|>": 51500,
789
+ "<|22.74|>": 51501,
790
+ "<|22.76|>": 51502,
791
+ "<|22.78|>": 51503,
792
+ "<|22.80|>": 51504,
793
+ "<|22.82|>": 51505,
794
+ "<|22.84|>": 51506,
795
+ "<|22.86|>": 51507,
796
+ "<|22.88|>": 51508,
797
+ "<|22.90|>": 51509,
798
+ "<|22.92|>": 51510,
799
+ "<|22.94|>": 51511,
800
+ "<|22.96|>": 51512,
801
+ "<|22.98|>": 51513,
802
+ "<|23.00|>": 51514,
803
+ "<|23.02|>": 51515,
804
+ "<|23.04|>": 51516,
805
+ "<|23.06|>": 51517,
806
+ "<|23.08|>": 51518,
807
+ "<|23.10|>": 51519,
808
+ "<|23.12|>": 51520,
809
+ "<|23.14|>": 51521,
810
+ "<|23.16|>": 51522,
811
+ "<|23.18|>": 51523,
812
+ "<|23.20|>": 51524,
813
+ "<|23.22|>": 51525,
814
+ "<|23.24|>": 51526,
815
+ "<|23.26|>": 51527,
816
+ "<|23.28|>": 51528,
817
+ "<|23.30|>": 51529,
818
+ "<|23.32|>": 51530,
819
+ "<|23.34|>": 51531,
820
+ "<|23.36|>": 51532,
821
+ "<|23.38|>": 51533,
822
+ "<|23.40|>": 51534,
823
+ "<|23.42|>": 51535,
824
+ "<|23.44|>": 51536,
825
+ "<|23.46|>": 51537,
826
+ "<|23.48|>": 51538,
827
+ "<|23.50|>": 51539,
828
+ "<|23.52|>": 51540,
829
+ "<|23.54|>": 51541,
830
+ "<|23.56|>": 51542,
831
+ "<|23.58|>": 51543,
832
+ "<|23.60|>": 51544,
833
+ "<|23.62|>": 51545,
834
+ "<|23.64|>": 51546,
835
+ "<|23.66|>": 51547,
836
+ "<|23.68|>": 51548,
837
+ "<|23.70|>": 51549,
838
+ "<|23.72|>": 51550,
839
+ "<|23.74|>": 51551,
840
+ "<|23.76|>": 51552,
841
+ "<|23.78|>": 51553,
842
+ "<|23.80|>": 51554,
843
+ "<|23.82|>": 51555,
844
+ "<|23.84|>": 51556,
845
+ "<|23.86|>": 51557,
846
+ "<|23.88|>": 51558,
847
+ "<|23.90|>": 51559,
848
+ "<|23.92|>": 51560,
849
+ "<|23.94|>": 51561,
850
+ "<|23.96|>": 51562,
851
+ "<|23.98|>": 51563,
852
+ "<|24.00|>": 51564,
853
+ "<|24.02|>": 51565,
854
+ "<|24.04|>": 51566,
855
+ "<|24.06|>": 51567,
856
+ "<|24.08|>": 51568,
857
+ "<|24.10|>": 51569,
858
+ "<|24.12|>": 51570,
859
+ "<|24.14|>": 51571,
860
+ "<|24.16|>": 51572,
861
+ "<|24.18|>": 51573,
862
+ "<|24.20|>": 51574,
863
+ "<|24.22|>": 51575,
864
+ "<|24.24|>": 51576,
865
+ "<|24.26|>": 51577,
866
+ "<|24.28|>": 51578,
867
+ "<|24.30|>": 51579,
868
+ "<|24.32|>": 51580,
869
+ "<|24.34|>": 51581,
870
+ "<|24.36|>": 51582,
871
+ "<|24.38|>": 51583,
872
+ "<|24.40|>": 51584,
873
+ "<|24.42|>": 51585,
874
+ "<|24.44|>": 51586,
875
+ "<|24.46|>": 51587,
876
+ "<|24.48|>": 51588,
877
+ "<|24.50|>": 51589,
878
+ "<|24.52|>": 51590,
879
+ "<|24.54|>": 51591,
880
+ "<|24.56|>": 51592,
881
+ "<|24.58|>": 51593,
882
+ "<|24.60|>": 51594,
883
+ "<|24.62|>": 51595,
884
+ "<|24.64|>": 51596,
885
+ "<|24.66|>": 51597,
886
+ "<|24.68|>": 51598,
887
+ "<|24.70|>": 51599,
888
+ "<|24.72|>": 51600,
889
+ "<|24.74|>": 51601,
890
+ "<|24.76|>": 51602,
891
+ "<|24.78|>": 51603,
892
+ "<|24.80|>": 51604,
893
+ "<|24.82|>": 51605,
894
+ "<|24.84|>": 51606,
895
+ "<|24.86|>": 51607,
896
+ "<|24.88|>": 51608,
897
+ "<|24.90|>": 51609,
898
+ "<|24.92|>": 51610,
899
+ "<|24.94|>": 51611,
900
+ "<|24.96|>": 51612,
901
+ "<|24.98|>": 51613,
902
+ "<|25.00|>": 51614,
903
+ "<|25.02|>": 51615,
904
+ "<|25.04|>": 51616,
905
+ "<|25.06|>": 51617,
906
+ "<|25.08|>": 51618,
907
+ "<|25.10|>": 51619,
908
+ "<|25.12|>": 51620,
909
+ "<|25.14|>": 51621,
910
+ "<|25.16|>": 51622,
911
+ "<|25.18|>": 51623,
912
+ "<|25.20|>": 51624,
913
+ "<|25.22|>": 51625,
914
+ "<|25.24|>": 51626,
915
+ "<|25.26|>": 51627,
916
+ "<|25.28|>": 51628,
917
+ "<|25.30|>": 51629,
918
+ "<|25.32|>": 51630,
919
+ "<|25.34|>": 51631,
920
+ "<|25.36|>": 51632,
921
+ "<|25.38|>": 51633,
922
+ "<|25.40|>": 51634,
923
+ "<|25.42|>": 51635,
924
+ "<|25.44|>": 51636,
925
+ "<|25.46|>": 51637,
926
+ "<|25.48|>": 51638,
927
+ "<|25.50|>": 51639,
928
+ "<|25.52|>": 51640,
929
+ "<|25.54|>": 51641,
930
+ "<|25.56|>": 51642,
931
+ "<|25.58|>": 51643,
932
+ "<|25.60|>": 51644,
933
+ "<|25.62|>": 51645,
934
+ "<|25.64|>": 51646,
935
+ "<|25.66|>": 51647,
936
+ "<|25.68|>": 51648,
937
+ "<|25.70|>": 51649,
938
+ "<|25.72|>": 51650,
939
+ "<|25.74|>": 51651,
940
+ "<|25.76|>": 51652,
941
+ "<|25.78|>": 51653,
942
+ "<|25.80|>": 51654,
943
+ "<|25.82|>": 51655,
944
+ "<|25.84|>": 51656,
945
+ "<|25.86|>": 51657,
946
+ "<|25.88|>": 51658,
947
+ "<|25.90|>": 51659,
948
+ "<|25.92|>": 51660,
949
+ "<|25.94|>": 51661,
950
+ "<|25.96|>": 51662,
951
+ "<|25.98|>": 51663,
952
+ "<|26.00|>": 51664,
953
+ "<|26.02|>": 51665,
954
+ "<|26.04|>": 51666,
955
+ "<|26.06|>": 51667,
956
+ "<|26.08|>": 51668,
957
+ "<|26.10|>": 51669,
958
+ "<|26.12|>": 51670,
959
+ "<|26.14|>": 51671,
960
+ "<|26.16|>": 51672,
961
+ "<|26.18|>": 51673,
962
+ "<|26.20|>": 51674,
963
+ "<|26.22|>": 51675,
964
+ "<|26.24|>": 51676,
965
+ "<|26.26|>": 51677,
966
+ "<|26.28|>": 51678,
967
+ "<|26.30|>": 51679,
968
+ "<|26.32|>": 51680,
969
+ "<|26.34|>": 51681,
970
+ "<|26.36|>": 51682,
971
+ "<|26.38|>": 51683,
972
+ "<|26.40|>": 51684,
973
+ "<|26.42|>": 51685,
974
+ "<|26.44|>": 51686,
975
+ "<|26.46|>": 51687,
976
+ "<|26.48|>": 51688,
977
+ "<|26.50|>": 51689,
978
+ "<|26.52|>": 51690,
979
+ "<|26.54|>": 51691,
980
+ "<|26.56|>": 51692,
981
+ "<|26.58|>": 51693,
982
+ "<|26.60|>": 51694,
983
+ "<|26.62|>": 51695,
984
+ "<|26.64|>": 51696,
985
+ "<|26.66|>": 51697,
986
+ "<|26.68|>": 51698,
987
+ "<|26.70|>": 51699,
988
+ "<|26.72|>": 51700,
989
+ "<|26.74|>": 51701,
990
+ "<|26.76|>": 51702,
991
+ "<|26.78|>": 51703,
992
+ "<|26.80|>": 51704,
993
+ "<|26.82|>": 51705,
994
+ "<|26.84|>": 51706,
995
+ "<|26.86|>": 51707,
996
+ "<|26.88|>": 51708,
997
+ "<|26.90|>": 51709,
998
+ "<|26.92|>": 51710,
999
+ "<|26.94|>": 51711,
1000
+ "<|26.96|>": 51712,
1001
+ "<|26.98|>": 51713,
1002
+ "<|27.00|>": 51714,
1003
+ "<|27.02|>": 51715,
1004
+ "<|27.04|>": 51716,
1005
+ "<|27.06|>": 51717,
1006
+ "<|27.08|>": 51718,
1007
+ "<|27.10|>": 51719,
1008
+ "<|27.12|>": 51720,
1009
+ "<|27.14|>": 51721,
1010
+ "<|27.16|>": 51722,
1011
+ "<|27.18|>": 51723,
1012
+ "<|27.20|>": 51724,
1013
+ "<|27.22|>": 51725,
1014
+ "<|27.24|>": 51726,
1015
+ "<|27.26|>": 51727,
1016
+ "<|27.28|>": 51728,
1017
+ "<|27.30|>": 51729,
1018
+ "<|27.32|>": 51730,
1019
+ "<|27.34|>": 51731,
1020
+ "<|27.36|>": 51732,
1021
+ "<|27.38|>": 51733,
1022
+ "<|27.40|>": 51734,
1023
+ "<|27.42|>": 51735,
1024
+ "<|27.44|>": 51736,
1025
+ "<|27.46|>": 51737,
1026
+ "<|27.48|>": 51738,
1027
+ "<|27.50|>": 51739,
1028
+ "<|27.52|>": 51740,
1029
+ "<|27.54|>": 51741,
1030
+ "<|27.56|>": 51742,
1031
+ "<|27.58|>": 51743,
1032
+ "<|27.60|>": 51744,
1033
+ "<|27.62|>": 51745,
1034
+ "<|27.64|>": 51746,
1035
+ "<|27.66|>": 51747,
1036
+ "<|27.68|>": 51748,
1037
+ "<|27.70|>": 51749,
1038
+ "<|27.72|>": 51750,
1039
+ "<|27.74|>": 51751,
1040
+ "<|27.76|>": 51752,
1041
+ "<|27.78|>": 51753,
1042
+ "<|27.80|>": 51754,
1043
+ "<|27.82|>": 51755,
1044
+ "<|27.84|>": 51756,
1045
+ "<|27.86|>": 51757,
1046
+ "<|27.88|>": 51758,
1047
+ "<|27.90|>": 51759,
1048
+ "<|27.92|>": 51760,
1049
+ "<|27.94|>": 51761,
1050
+ "<|27.96|>": 51762,
1051
+ "<|27.98|>": 51763,
1052
+ "<|28.00|>": 51764,
1053
+ "<|28.02|>": 51765,
1054
+ "<|28.04|>": 51766,
1055
+ "<|28.06|>": 51767,
1056
+ "<|28.08|>": 51768,
1057
+ "<|28.10|>": 51769,
1058
+ "<|28.12|>": 51770,
1059
+ "<|28.14|>": 51771,
1060
+ "<|28.16|>": 51772,
1061
+ "<|28.18|>": 51773,
1062
+ "<|28.20|>": 51774,
1063
+ "<|28.22|>": 51775,
1064
+ "<|28.24|>": 51776,
1065
+ "<|28.26|>": 51777,
1066
+ "<|28.28|>": 51778,
1067
+ "<|28.30|>": 51779,
1068
+ "<|28.32|>": 51780,
1069
+ "<|28.34|>": 51781,
1070
+ "<|28.36|>": 51782,
1071
+ "<|28.38|>": 51783,
1072
+ "<|28.40|>": 51784,
1073
+ "<|28.42|>": 51785,
1074
+ "<|28.44|>": 51786,
1075
+ "<|28.46|>": 51787,
1076
+ "<|28.48|>": 51788,
1077
+ "<|28.50|>": 51789,
1078
+ "<|28.52|>": 51790,
1079
+ "<|28.54|>": 51791,
1080
+ "<|28.56|>": 51792,
1081
+ "<|28.58|>": 51793,
1082
+ "<|28.60|>": 51794,
1083
+ "<|28.62|>": 51795,
1084
+ "<|28.64|>": 51796,
1085
+ "<|28.66|>": 51797,
1086
+ "<|28.68|>": 51798,
1087
+ "<|28.70|>": 51799,
1088
+ "<|28.72|>": 51800,
1089
+ "<|28.74|>": 51801,
1090
+ "<|28.76|>": 51802,
1091
+ "<|28.78|>": 51803,
1092
+ "<|28.80|>": 51804,
1093
+ "<|28.82|>": 51805,
1094
+ "<|28.84|>": 51806,
1095
+ "<|28.86|>": 51807,
1096
+ "<|28.88|>": 51808,
1097
+ "<|28.90|>": 51809,
1098
+ "<|28.92|>": 51810,
1099
+ "<|28.94|>": 51811,
1100
+ "<|28.96|>": 51812,
1101
+ "<|28.98|>": 51813,
1102
+ "<|29.00|>": 51814,
1103
+ "<|29.02|>": 51815,
1104
+ "<|29.04|>": 51816,
1105
+ "<|29.06|>": 51817,
1106
+ "<|29.08|>": 51818,
1107
+ "<|29.10|>": 51819,
1108
+ "<|29.12|>": 51820,
1109
+ "<|29.14|>": 51821,
1110
+ "<|29.16|>": 51822,
1111
+ "<|29.18|>": 51823,
1112
+ "<|29.20|>": 51824,
1113
+ "<|29.22|>": 51825,
1114
+ "<|29.24|>": 51826,
1115
+ "<|29.26|>": 51827,
1116
+ "<|29.28|>": 51828,
1117
+ "<|29.30|>": 51829,
1118
+ "<|29.32|>": 51830,
1119
+ "<|29.34|>": 51831,
1120
+ "<|29.36|>": 51832,
1121
+ "<|29.38|>": 51833,
1122
+ "<|29.40|>": 51834,
1123
+ "<|29.42|>": 51835,
1124
+ "<|29.44|>": 51836,
1125
+ "<|29.46|>": 51837,
1126
+ "<|29.48|>": 51838,
1127
+ "<|29.50|>": 51839,
1128
+ "<|29.52|>": 51840,
1129
+ "<|29.54|>": 51841,
1130
+ "<|29.56|>": 51842,
1131
+ "<|29.58|>": 51843,
1132
+ "<|29.60|>": 51844,
1133
+ "<|29.62|>": 51845,
1134
+ "<|29.64|>": 51846,
1135
+ "<|29.66|>": 51847,
1136
+ "<|29.68|>": 51848,
1137
+ "<|29.70|>": 51849,
1138
+ "<|29.72|>": 51850,
1139
+ "<|29.74|>": 51851,
1140
+ "<|29.76|>": 51852,
1141
+ "<|29.78|>": 51853,
1142
+ "<|29.80|>": 51854,
1143
+ "<|29.82|>": 51855,
1144
+ "<|29.84|>": 51856,
1145
+ "<|29.86|>": 51857,
1146
+ "<|29.88|>": 51858,
1147
+ "<|29.90|>": 51859,
1148
+ "<|29.92|>": 51860,
1149
+ "<|29.94|>": 51861,
1150
+ "<|29.96|>": 51862,
1151
+ "<|29.98|>": 51863,
1152
+ "<|3.00|>": 50514,
1153
+ "<|3.02|>": 50515,
1154
+ "<|3.04|>": 50516,
1155
+ "<|3.06|>": 50517,
1156
+ "<|3.08|>": 50518,
1157
+ "<|3.10|>": 50519,
1158
+ "<|3.12|>": 50520,
1159
+ "<|3.14|>": 50521,
1160
+ "<|3.16|>": 50522,
1161
+ "<|3.18|>": 50523,
1162
+ "<|3.20|>": 50524,
1163
+ "<|3.22|>": 50525,
1164
+ "<|3.24|>": 50526,
1165
+ "<|3.26|>": 50527,
1166
+ "<|3.28|>": 50528,
1167
+ "<|3.30|>": 50529,
1168
+ "<|3.32|>": 50530,
1169
+ "<|3.34|>": 50531,
1170
+ "<|3.36|>": 50532,
1171
+ "<|3.38|>": 50533,
1172
+ "<|3.40|>": 50534,
1173
+ "<|3.42|>": 50535,
1174
+ "<|3.44|>": 50536,
1175
+ "<|3.46|>": 50537,
1176
+ "<|3.48|>": 50538,
1177
+ "<|3.50|>": 50539,
1178
+ "<|3.52|>": 50540,
1179
+ "<|3.54|>": 50541,
1180
+ "<|3.56|>": 50542,
1181
+ "<|3.58|>": 50543,
1182
+ "<|3.60|>": 50544,
1183
+ "<|3.62|>": 50545,
1184
+ "<|3.64|>": 50546,
1185
+ "<|3.66|>": 50547,
1186
+ "<|3.68|>": 50548,
1187
+ "<|3.70|>": 50549,
1188
+ "<|3.72|>": 50550,
1189
+ "<|3.74|>": 50551,
1190
+ "<|3.76|>": 50552,
1191
+ "<|3.78|>": 50553,
1192
+ "<|3.80|>": 50554,
1193
+ "<|3.82|>": 50555,
1194
+ "<|3.84|>": 50556,
1195
+ "<|3.86|>": 50557,
1196
+ "<|3.88|>": 50558,
1197
+ "<|3.90|>": 50559,
1198
+ "<|3.92|>": 50560,
1199
+ "<|3.94|>": 50561,
1200
+ "<|3.96|>": 50562,
1201
+ "<|3.98|>": 50563,
1202
+ "<|30.00|>": 51864,
1203
+ "<|4.00|>": 50564,
1204
+ "<|4.02|>": 50565,
1205
+ "<|4.04|>": 50566,
1206
+ "<|4.06|>": 50567,
1207
+ "<|4.08|>": 50568,
1208
+ "<|4.10|>": 50569,
1209
+ "<|4.12|>": 50570,
1210
+ "<|4.14|>": 50571,
1211
+ "<|4.16|>": 50572,
1212
+ "<|4.18|>": 50573,
1213
+ "<|4.20|>": 50574,
1214
+ "<|4.22|>": 50575,
1215
+ "<|4.24|>": 50576,
1216
+ "<|4.26|>": 50577,
1217
+ "<|4.28|>": 50578,
1218
+ "<|4.30|>": 50579,
1219
+ "<|4.32|>": 50580,
1220
+ "<|4.34|>": 50581,
1221
+ "<|4.36|>": 50582,
1222
+ "<|4.38|>": 50583,
1223
+ "<|4.40|>": 50584,
1224
+ "<|4.42|>": 50585,
1225
+ "<|4.44|>": 50586,
1226
+ "<|4.46|>": 50587,
1227
+ "<|4.48|>": 50588,
1228
+ "<|4.50|>": 50589,
1229
+ "<|4.52|>": 50590,
1230
+ "<|4.54|>": 50591,
1231
+ "<|4.56|>": 50592,
1232
+ "<|4.58|>": 50593,
1233
+ "<|4.60|>": 50594,
1234
+ "<|4.62|>": 50595,
1235
+ "<|4.64|>": 50596,
1236
+ "<|4.66|>": 50597,
1237
+ "<|4.68|>": 50598,
1238
+ "<|4.70|>": 50599,
1239
+ "<|4.72|>": 50600,
1240
+ "<|4.74|>": 50601,
1241
+ "<|4.76|>": 50602,
1242
+ "<|4.78|>": 50603,
1243
+ "<|4.80|>": 50604,
1244
+ "<|4.82|>": 50605,
1245
+ "<|4.84|>": 50606,
1246
+ "<|4.86|>": 50607,
1247
+ "<|4.88|>": 50608,
1248
+ "<|4.90|>": 50609,
1249
+ "<|4.92|>": 50610,
1250
+ "<|4.94|>": 50611,
1251
+ "<|4.96|>": 50612,
1252
+ "<|4.98|>": 50613,
1253
+ "<|5.00|>": 50614,
1254
+ "<|5.02|>": 50615,
1255
+ "<|5.04|>": 50616,
1256
+ "<|5.06|>": 50617,
1257
+ "<|5.08|>": 50618,
1258
+ "<|5.10|>": 50619,
1259
+ "<|5.12|>": 50620,
1260
+ "<|5.14|>": 50621,
1261
+ "<|5.16|>": 50622,
1262
+ "<|5.18|>": 50623,
1263
+ "<|5.20|>": 50624,
1264
+ "<|5.22|>": 50625,
1265
+ "<|5.24|>": 50626,
1266
+ "<|5.26|>": 50627,
1267
+ "<|5.28|>": 50628,
1268
+ "<|5.30|>": 50629,
1269
+ "<|5.32|>": 50630,
1270
+ "<|5.34|>": 50631,
1271
+ "<|5.36|>": 50632,
1272
+ "<|5.38|>": 50633,
1273
+ "<|5.40|>": 50634,
1274
+ "<|5.42|>": 50635,
1275
+ "<|5.44|>": 50636,
1276
+ "<|5.46|>": 50637,
1277
+ "<|5.48|>": 50638,
1278
+ "<|5.50|>": 50639,
1279
+ "<|5.52|>": 50640,
1280
+ "<|5.54|>": 50641,
1281
+ "<|5.56|>": 50642,
1282
+ "<|5.58|>": 50643,
1283
+ "<|5.60|>": 50644,
1284
+ "<|5.62|>": 50645,
1285
+ "<|5.64|>": 50646,
1286
+ "<|5.66|>": 50647,
1287
+ "<|5.68|>": 50648,
1288
+ "<|5.70|>": 50649,
1289
+ "<|5.72|>": 50650,
1290
+ "<|5.74|>": 50651,
1291
+ "<|5.76|>": 50652,
1292
+ "<|5.78|>": 50653,
1293
+ "<|5.80|>": 50654,
1294
+ "<|5.82|>": 50655,
1295
+ "<|5.84|>": 50656,
1296
+ "<|5.86|>": 50657,
1297
+ "<|5.88|>": 50658,
1298
+ "<|5.90|>": 50659,
1299
+ "<|5.92|>": 50660,
1300
+ "<|5.94|>": 50661,
1301
+ "<|5.96|>": 50662,
1302
+ "<|5.98|>": 50663,
1303
+ "<|6.00|>": 50664,
1304
+ "<|6.02|>": 50665,
1305
+ "<|6.04|>": 50666,
1306
+ "<|6.06|>": 50667,
1307
+ "<|6.08|>": 50668,
1308
+ "<|6.10|>": 50669,
1309
+ "<|6.12|>": 50670,
1310
+ "<|6.14|>": 50671,
1311
+ "<|6.16|>": 50672,
1312
+ "<|6.18|>": 50673,
1313
+ "<|6.20|>": 50674,
1314
+ "<|6.22|>": 50675,
1315
+ "<|6.24|>": 50676,
1316
+ "<|6.26|>": 50677,
1317
+ "<|6.28|>": 50678,
1318
+ "<|6.30|>": 50679,
1319
+ "<|6.32|>": 50680,
1320
+ "<|6.34|>": 50681,
1321
+ "<|6.36|>": 50682,
1322
+ "<|6.38|>": 50683,
1323
+ "<|6.40|>": 50684,
1324
+ "<|6.42|>": 50685,
1325
+ "<|6.44|>": 50686,
1326
+ "<|6.46|>": 50687,
1327
+ "<|6.48|>": 50688,
1328
+ "<|6.50|>": 50689,
1329
+ "<|6.52|>": 50690,
1330
+ "<|6.54|>": 50691,
1331
+ "<|6.56|>": 50692,
1332
+ "<|6.58|>": 50693,
1333
+ "<|6.60|>": 50694,
1334
+ "<|6.62|>": 50695,
1335
+ "<|6.64|>": 50696,
1336
+ "<|6.66|>": 50697,
1337
+ "<|6.68|>": 50698,
1338
+ "<|6.70|>": 50699,
1339
+ "<|6.72|>": 50700,
1340
+ "<|6.74|>": 50701,
1341
+ "<|6.76|>": 50702,
1342
+ "<|6.78|>": 50703,
1343
+ "<|6.80|>": 50704,
1344
+ "<|6.82|>": 50705,
1345
+ "<|6.84|>": 50706,
1346
+ "<|6.86|>": 50707,
1347
+ "<|6.88|>": 50708,
1348
+ "<|6.90|>": 50709,
1349
+ "<|6.92|>": 50710,
1350
+ "<|6.94|>": 50711,
1351
+ "<|6.96|>": 50712,
1352
+ "<|6.98|>": 50713,
1353
+ "<|7.00|>": 50714,
1354
+ "<|7.02|>": 50715,
1355
+ "<|7.04|>": 50716,
1356
+ "<|7.06|>": 50717,
1357
+ "<|7.08|>": 50718,
1358
+ "<|7.10|>": 50719,
1359
+ "<|7.12|>": 50720,
1360
+ "<|7.14|>": 50721,
1361
+ "<|7.16|>": 50722,
1362
+ "<|7.18|>": 50723,
1363
+ "<|7.20|>": 50724,
1364
+ "<|7.22|>": 50725,
1365
+ "<|7.24|>": 50726,
1366
+ "<|7.26|>": 50727,
1367
+ "<|7.28|>": 50728,
1368
+ "<|7.30|>": 50729,
1369
+ "<|7.32|>": 50730,
1370
+ "<|7.34|>": 50731,
1371
+ "<|7.36|>": 50732,
1372
+ "<|7.38|>": 50733,
1373
+ "<|7.40|>": 50734,
1374
+ "<|7.42|>": 50735,
1375
+ "<|7.44|>": 50736,
1376
+ "<|7.46|>": 50737,
1377
+ "<|7.48|>": 50738,
1378
+ "<|7.50|>": 50739,
1379
+ "<|7.52|>": 50740,
1380
+ "<|7.54|>": 50741,
1381
+ "<|7.56|>": 50742,
1382
+ "<|7.58|>": 50743,
1383
+ "<|7.60|>": 50744,
1384
+ "<|7.62|>": 50745,
1385
+ "<|7.64|>": 50746,
1386
+ "<|7.66|>": 50747,
1387
+ "<|7.68|>": 50748,
1388
+ "<|7.70|>": 50749,
1389
+ "<|7.72|>": 50750,
1390
+ "<|7.74|>": 50751,
1391
+ "<|7.76|>": 50752,
1392
+ "<|7.78|>": 50753,
1393
+ "<|7.80|>": 50754,
1394
+ "<|7.82|>": 50755,
1395
+ "<|7.84|>": 50756,
1396
+ "<|7.86|>": 50757,
1397
+ "<|7.88|>": 50758,
1398
+ "<|7.90|>": 50759,
1399
+ "<|7.92|>": 50760,
1400
+ "<|7.94|>": 50761,
1401
+ "<|7.96|>": 50762,
1402
+ "<|7.98|>": 50763,
1403
+ "<|8.00|>": 50764,
1404
+ "<|8.02|>": 50765,
1405
+ "<|8.04|>": 50766,
1406
+ "<|8.06|>": 50767,
1407
+ "<|8.08|>": 50768,
1408
+ "<|8.10|>": 50769,
1409
+ "<|8.12|>": 50770,
1410
+ "<|8.14|>": 50771,
1411
+ "<|8.16|>": 50772,
1412
+ "<|8.18|>": 50773,
1413
+ "<|8.20|>": 50774,
1414
+ "<|8.22|>": 50775,
1415
+ "<|8.24|>": 50776,
1416
+ "<|8.26|>": 50777,
1417
+ "<|8.28|>": 50778,
1418
+ "<|8.30|>": 50779,
1419
+ "<|8.32|>": 50780,
1420
+ "<|8.34|>": 50781,
1421
+ "<|8.36|>": 50782,
1422
+ "<|8.38|>": 50783,
1423
+ "<|8.40|>": 50784,
1424
+ "<|8.42|>": 50785,
1425
+ "<|8.44|>": 50786,
1426
+ "<|8.46|>": 50787,
1427
+ "<|8.48|>": 50788,
1428
+ "<|8.50|>": 50789,
1429
+ "<|8.52|>": 50790,
1430
+ "<|8.54|>": 50791,
1431
+ "<|8.56|>": 50792,
1432
+ "<|8.58|>": 50793,
1433
+ "<|8.60|>": 50794,
1434
+ "<|8.62|>": 50795,
1435
+ "<|8.64|>": 50796,
1436
+ "<|8.66|>": 50797,
1437
+ "<|8.68|>": 50798,
1438
+ "<|8.70|>": 50799,
1439
+ "<|8.72|>": 50800,
1440
+ "<|8.74|>": 50801,
1441
+ "<|8.76|>": 50802,
1442
+ "<|8.78|>": 50803,
1443
+ "<|8.80|>": 50804,
1444
+ "<|8.82|>": 50805,
1445
+ "<|8.84|>": 50806,
1446
+ "<|8.86|>": 50807,
1447
+ "<|8.88|>": 50808,
1448
+ "<|8.90|>": 50809,
1449
+ "<|8.92|>": 50810,
1450
+ "<|8.94|>": 50811,
1451
+ "<|8.96|>": 50812,
1452
+ "<|8.98|>": 50813,
1453
+ "<|9.00|>": 50814,
1454
+ "<|9.02|>": 50815,
1455
+ "<|9.04|>": 50816,
1456
+ "<|9.06|>": 50817,
1457
+ "<|9.08|>": 50818,
1458
+ "<|9.10|>": 50819,
1459
+ "<|9.12|>": 50820,
1460
+ "<|9.14|>": 50821,
1461
+ "<|9.16|>": 50822,
1462
+ "<|9.18|>": 50823,
1463
+ "<|9.20|>": 50824,
1464
+ "<|9.22|>": 50825,
1465
+ "<|9.24|>": 50826,
1466
+ "<|9.26|>": 50827,
1467
+ "<|9.28|>": 50828,
1468
+ "<|9.30|>": 50829,
1469
+ "<|9.32|>": 50830,
1470
+ "<|9.34|>": 50831,
1471
+ "<|9.36|>": 50832,
1472
+ "<|9.38|>": 50833,
1473
+ "<|9.40|>": 50834,
1474
+ "<|9.42|>": 50835,
1475
+ "<|9.44|>": 50836,
1476
+ "<|9.46|>": 50837,
1477
+ "<|9.48|>": 50838,
1478
+ "<|9.50|>": 50839,
1479
+ "<|9.52|>": 50840,
1480
+ "<|9.54|>": 50841,
1481
+ "<|9.56|>": 50842,
1482
+ "<|9.58|>": 50843,
1483
+ "<|9.60|>": 50844,
1484
+ "<|9.62|>": 50845,
1485
+ "<|9.64|>": 50846,
1486
+ "<|9.66|>": 50847,
1487
+ "<|9.68|>": 50848,
1488
+ "<|9.70|>": 50849,
1489
+ "<|9.72|>": 50850,
1490
+ "<|9.74|>": 50851,
1491
+ "<|9.76|>": 50852,
1492
+ "<|9.78|>": 50853,
1493
+ "<|9.80|>": 50854,
1494
+ "<|9.82|>": 50855,
1495
+ "<|9.84|>": 50856,
1496
+ "<|9.86|>": 50857,
1497
+ "<|9.88|>": 50858,
1498
+ "<|9.90|>": 50859,
1499
+ "<|9.92|>": 50860,
1500
+ "<|9.94|>": 50861,
1501
+ "<|9.96|>": 50862,
1502
+ "<|9.98|>": 50863,
1503
+ "<|af|>": 50327,
1504
+ "<|am|>": 50334,
1505
+ "<|ar|>": 50272,
1506
+ "<|as|>": 50350,
1507
+ "<|az|>": 50304,
1508
+ "<|ba|>": 50355,
1509
+ "<|be|>": 50330,
1510
+ "<|bg|>": 50292,
1511
+ "<|bn|>": 50302,
1512
+ "<|bo|>": 50347,
1513
+ "<|br|>": 50309,
1514
+ "<|bs|>": 50315,
1515
+ "<|ca|>": 50270,
1516
+ "<|cs|>": 50283,
1517
+ "<|cy|>": 50297,
1518
+ "<|da|>": 50285,
1519
+ "<|de|>": 50261,
1520
+ "<|el|>": 50281,
1521
+ "<|en|>": 50259,
1522
+ "<|es|>": 50262,
1523
+ "<|et|>": 50307,
1524
+ "<|eu|>": 50310,
1525
+ "<|fa|>": 50300,
1526
+ "<|fi|>": 50277,
1527
+ "<|fo|>": 50338,
1528
+ "<|fr|>": 50265,
1529
+ "<|gl|>": 50319,
1530
+ "<|gu|>": 50333,
1531
+ "<|haw|>": 50352,
1532
+ "<|ha|>": 50354,
1533
+ "<|he|>": 50279,
1534
+ "<|hi|>": 50276,
1535
+ "<|hr|>": 50291,
1536
+ "<|ht|>": 50339,
1537
+ "<|hu|>": 50286,
1538
+ "<|hy|>": 50312,
1539
+ "<|id|>": 50275,
1540
+ "<|is|>": 50311,
1541
+ "<|it|>": 50274,
1542
+ "<|ja|>": 50266,
1543
+ "<|jw|>": 50356,
1544
+ "<|ka|>": 50329,
1545
+ "<|kk|>": 50316,
1546
+ "<|km|>": 50323,
1547
+ "<|kn|>": 50306,
1548
+ "<|ko|>": 50264,
1549
+ "<|la|>": 50294,
1550
+ "<|lb|>": 50345,
1551
+ "<|ln|>": 50353,
1552
+ "<|lo|>": 50336,
1553
+ "<|lt|>": 50293,
1554
+ "<|lv|>": 50301,
1555
+ "<|mg|>": 50349,
1556
+ "<|mi|>": 50295,
1557
+ "<|mk|>": 50308,
1558
+ "<|ml|>": 50296,
1559
+ "<|mn|>": 50314,
1560
+ "<|mr|>": 50320,
1561
+ "<|ms|>": 50282,
1562
+ "<|mt|>": 50343,
1563
+ "<|my|>": 50346,
1564
+ "<|ne|>": 50313,
1565
+ "<|nl|>": 50271,
1566
+ "<|nn|>": 50342,
1567
+ "<|nocaptions|>": 50362,
1568
+ "<|notimestamps|>": 50363,
1569
+ "<|no|>": 50288,
1570
+ "<|oc|>": 50328,
1571
+ "<|pa|>": 50321,
1572
+ "<|pl|>": 50269,
1573
+ "<|ps|>": 50340,
1574
+ "<|pt|>": 50267,
1575
+ "<|ro|>": 50284,
1576
+ "<|ru|>": 50263,
1577
+ "<|sa|>": 50344,
1578
+ "<|sd|>": 50332,
1579
+ "<|si|>": 50322,
1580
+ "<|sk|>": 50298,
1581
+ "<|sl|>": 50305,
1582
+ "<|sn|>": 50324,
1583
+ "<|so|>": 50326,
1584
+ "<|sq|>": 50317,
1585
+ "<|sr|>": 50303,
1586
+ "<|startoflm|>": 50360,
1587
+ "<|startofprev|>": 50361,
1588
+ "<|startoftranscript|>": 50258,
1589
+ "<|su|>": 50357,
1590
+ "<|sv|>": 50273,
1591
+ "<|sw|>": 50318,
1592
+ "<|ta|>": 50287,
1593
+ "<|te|>": 50299,
1594
+ "<|tg|>": 50331,
1595
+ "<|th|>": 50289,
1596
+ "<|tk|>": 50341,
1597
+ "<|tl|>": 50348,
1598
+ "<|transcribe|>": 50359,
1599
+ "<|translate|>": 50358,
1600
+ "<|tr|>": 50268,
1601
+ "<|tt|>": 50351,
1602
+ "<|uk|>": 50280,
1603
+ "<|ur|>": 50290,
1604
+ "<|uz|>": 50337,
1605
+ "<|vi|>": 50278,
1606
+ "<|yi|>": 50335,
1607
+ "<|yo|>": 50325,
1608
+ "<|zh|>": 50260
1609
+ }
distil-small-init/config.json ADDED
@@ -0,0 +1,389 @@
1
+ {
2
+ "_name_or_path": "./",
3
+ "activation_dropout": 0.1,
4
+ "activation_function": "gelu",
5
+ "alignment_heads": [
6
+ [
7
+ 5,
8
+ 3
9
+ ],
10
+ [
11
+ 5,
12
+ 9
13
+ ],
14
+ [
15
+ 8,
16
+ 0
17
+ ],
18
+ [
19
+ 8,
20
+ 4
21
+ ],
22
+ [
23
+ 8,
24
+ 7
25
+ ],
26
+ [
27
+ 8,
28
+ 8
29
+ ],
30
+ [
31
+ 9,
32
+ 0
33
+ ],
34
+ [
35
+ 9,
36
+ 7
37
+ ],
38
+ [
39
+ 9,
40
+ 9
41
+ ],
42
+ [
43
+ 10,
44
+ 5
45
+ ]
46
+ ],
47
+ "apply_spec_augment": false,
48
+ "architectures": [
49
+ "WhisperForConditionalGeneration"
50
+ ],
51
+ "attention_dropout": 0,
52
+ "begin_suppress_tokens": [
53
+ 220,
54
+ 50257
55
+ ],
56
+ "bos_token_id": 50257,
57
+ "classifier_proj_size": 256,
58
+ "d_model": 768,
59
+ "decoder_attention_heads": 12,
60
+ "decoder_ffn_dim": 3072,
61
+ "decoder_layerdrop": 0,
62
+ "decoder_layers": 2,
63
+ "decoder_start_token_id": 50258,
64
+ "dropout": 0,
65
+ "encoder_attention_heads": 12,
66
+ "encoder_ffn_dim": 3072,
67
+ "encoder_layerdrop": 0,
68
+ "encoder_layers": 12,
69
+ "eos_token_id": 50257,
70
+ "forced_decoder_ids": [
71
+ [
72
+ 1,
73
+ 50259
74
+ ],
75
+ [
76
+ 2,
77
+ 50359
78
+ ],
79
+ [
80
+ 3,
81
+ 50363
82
+ ]
83
+ ],
84
+ "init_std": 0.02,
85
+ "is_encoder_decoder": true,
86
+ "lang_ids": [
87
+ 50259,
88
+ 50260,
89
+ 50261,
90
+ 50262,
91
+ 50263,
92
+ 50264,
93
+ 50265,
94
+ 50266,
95
+ 50267,
96
+ 50268,
97
+ 50269,
98
+ 50270,
99
+ 50271,
100
+ 50272,
101
+ 50273,
102
+ 50274,
103
+ 50275,
104
+ 50276,
105
+ 50277,
106
+ 50278,
107
+ 50279,
108
+ 50280,
109
+ 50281,
110
+ 50282,
111
+ 50283,
112
+ 50284,
113
+ 50285,
114
+ 50286,
115
+ 50287,
116
+ 50288,
117
+ 50289,
118
+ 50290,
119
+ 50291,
120
+ 50292,
121
+ 50293,
122
+ 50294,
123
+ 50295,
124
+ 50296,
125
+ 50297,
126
+ 50298,
127
+ 50299,
128
+ 50300,
129
+ 50301,
130
+ 50302,
131
+ 50303,
132
+ 50304,
133
+ 50305,
134
+ 50306,
135
+ 50307,
136
+ 50308,
137
+ 50309,
138
+ 50310,
139
+ 50311,
140
+ 50312,
141
+ 50313,
142
+ 50314,
143
+ 50315,
144
+ 50316,
145
+ 50317,
146
+ 50318,
147
+ 50319,
148
+ 50320,
149
+ 50321,
150
+ 50322,
151
+ 50323,
152
+ 50324,
153
+ 50325,
154
+ 50326,
155
+ 50327,
156
+ 50328,
157
+ 50329,
158
+ 50330,
159
+ 50331,
160
+ 50332,
161
+ 50333,
162
+ 50334,
163
+ 50335,
164
+ 50336,
165
+ 50337,
166
+ 50338,
167
+ 50339,
168
+ 50340,
169
+ 50341,
170
+ 50342,
171
+ 50343,
172
+ 50344,
173
+ 50345,
174
+ 50346,
175
+ 50347,
176
+ 50348,
177
+ 50349,
178
+ 50350,
179
+ 50351,
180
+ 50352,
181
+ 50353,
182
+ 50354,
183
+ 50355,
184
+ 50356,
185
+ 50357
186
+ ],
187
+ "mask_feature_length": 10,
188
+ "mask_feature_min_masks": 0,
189
+ "mask_feature_prob": 0,
190
+ "mask_time_length": 10,
191
+ "mask_time_min_masks": 2,
192
+ "mask_time_prob": 0.05,
193
+ "max_length": 448,
194
+ "max_source_positions": 1500,
195
+ "max_target_positions": 448,
196
+ "median_filter_width": 7,
197
+ "model_type": "whisper",
198
+ "num_hidden_layers": 12,
199
+ "num_mel_bins": 80,
200
+ "pad_token_id": 50257,
201
+ "scale_embedding": false,
202
+ "suppress_ids": [
203
+ 1,
204
+ 2,
205
+ 7,
206
+ 8,
207
+ 9,
208
+ 10,
209
+ 14,
210
+ 25,
211
+ 26,
212
+ 27,
213
+ 28,
214
+ 29,
215
+ 31,
216
+ 58,
217
+ 59,
218
+ 60,
219
+ 61,
220
+ 62,
221
+ 63,
222
+ 90,
223
+ 91,
224
+ 92,
225
+ 93,
226
+ 359,
227
+ 503,
228
+ 522,
229
+ 542,
230
+ 873,
231
+ 893,
232
+ 902,
233
+ 918,
234
+ 922,
235
+ 931,
236
+ 1350,
237
+ 1853,
238
+ 1982,
239
+ 2460,
240
+ 2627,
241
+ 3246,
242
+ 3253,
243
+ 3268,
244
+ 3536,
245
+ 3846,
246
+ 3961,
247
+ 4183,
248
+ 4667,
249
+ 6585,
250
+ 6647,
251
+ 7273,
252
+ 9061,
253
+ 9383,
254
+ 10428,
255
+ 10929,
256
+ 11938,
257
+ 12033,
258
+ 12331,
259
+ 12562,
260
+ 13793,
261
+ 14157,
262
+ 14635,
263
+ 15265,
264
+ 15618,
265
+ 16553,
266
+ 16604,
267
+ 18362,
268
+ 18956,
269
+ 20075,
270
+ 21675,
271
+ 22520,
272
+ 26130,
273
+ 26161,
274
+ 26435,
275
+ 28279,
276
+ 29464,
277
+ 31650,
278
+ 32302,
279
+ 32470,
280
+ 36865,
281
+ 42863,
282
+ 47425,
283
+ 49870,
284
+ 50254,
285
+ 50258,
286
+ 50358,
287
+ 50359,
288
+ 50360,
289
+ 50361,
290
+ 50362
291
+ ],
292
+ "suppress_ids_begin": [
293
+ 220,
294
+ 50257
295
+ ],
296
+ "suppress_tokens": [
297
+ 1,
298
+ 2,
299
+ 7,
300
+ 8,
301
+ 9,
302
+ 10,
303
+ 14,
304
+ 25,
305
+ 26,
306
+ 27,
307
+ 28,
308
+ 29,
309
+ 31,
310
+ 58,
311
+ 59,
312
+ 60,
313
+ 61,
314
+ 62,
315
+ 63,
316
+ 90,
317
+ 91,
318
+ 92,
319
+ 93,
320
+ 359,
321
+ 503,
322
+ 522,
323
+ 542,
324
+ 873,
325
+ 893,
326
+ 902,
327
+ 918,
328
+ 922,
329
+ 931,
330
+ 1350,
331
+ 1853,
332
+ 1982,
333
+ 2460,
334
+ 2627,
335
+ 3246,
336
+ 3253,
337
+ 3268,
338
+ 3536,
339
+ 3846,
340
+ 3961,
341
+ 4183,
342
+ 4667,
343
+ 6585,
344
+ 6647,
345
+ 7273,
346
+ 9061,
347
+ 9383,
348
+ 10428,
349
+ 10929,
350
+ 11938,
351
+ 12033,
352
+ 12331,
353
+ 12562,
354
+ 13793,
355
+ 14157,
356
+ 14635,
357
+ 15265,
358
+ 15618,
359
+ 16553,
360
+ 16604,
361
+ 18362,
362
+ 18956,
363
+ 20075,
364
+ 21675,
365
+ 22520,
366
+ 26130,
367
+ 26161,
368
+ 26435,
369
+ 28279,
370
+ 29464,
371
+ 31650,
372
+ 32302,
373
+ 32470,
374
+ 36865,
375
+ 42863,
376
+ 47425,
377
+ 49870,
378
+ 50254,
379
+ 50258,
380
+ 50360,
381
+ 50361,
382
+ 50362
383
+ ],
384
+ "torch_dtype": "float32",
385
+ "transformers_version": "4.46.2",
386
+ "use_cache": true,
387
+ "use_weighted_layer_sum": false,
388
+ "vocab_size": 51865
389
+ }
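The JSON above is the tail of the Whisper configuration for the initialised student model (config.json under distil-small-init). As a quick sanity check, the file round-trips through the transformers WhisperConfig class; the sketch below is a minimal illustration, assuming only that the transformers package is installed and that the distil-small-init directory from this commit is available locally. The values in the comments are simply the ones visible in the diff.

# Minimal sketch: load the student config added above and inspect a few fields.
from transformers import WhisperConfig

config = WhisperConfig.from_pretrained("distil-small-init")

print(config.model_type)            # "whisper"
print(config.num_mel_bins)          # 80
print(config.max_source_positions)  # 1500 encoder positions
print(config.max_target_positions)  # 448 decoder positions
print(config.vocab_size)            # 51865
print(len(config.suppress_tokens))  # length of the suppress_tokens list above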
distil-small-init/flax_model.msgpack ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c5f42290a7b0a45166859ab69095d7499dd170d328c00133bd435d6a5c69cc1a
3
+ size 588938585
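This entry is a Git LFS pointer rather than the weights themselves: the repository records only the object's sha256 and its size (588,938,585 bytes). Once the real file has been fetched (for example with git lfs pull), it can be checked against the pointer; a minimal Python sketch, assuming the file has been downloaded to the path shown:

# Minimal sketch: verify the downloaded flax_model.msgpack against the LFS pointer above.
import hashlib

EXPECTED_OID = "c5f42290a7b0a45166859ab69095d7499dd170d328c00133bd435d6a5c69cc1a"
EXPECTED_SIZE = 588938585

sha256 = hashlib.sha256()
total = 0
with open("distil-small-init/flax_model.msgpack", "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):  # read in 1 MiB chunks
        sha256.update(chunk)
        total += len(chunk)

assert total == EXPECTED_SIZE, f"unexpected size: {total}"
assert sha256.hexdigest() == EXPECTED_OID, "sha256 does not match the pointer"
print("flax_model.msgpack matches its LFS pointer")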
distil-small-init/generation_config.json ADDED
@@ -0,0 +1,269 @@
1
+ {
2
+ "alignment_heads": [
3
+ [
4
+ 5,
5
+ 3
6
+ ],
7
+ [
8
+ 5,
9
+ 9
10
+ ],
11
+ [
12
+ 8,
13
+ 0
14
+ ],
15
+ [
16
+ 8,
17
+ 4
18
+ ],
19
+ [
20
+ 8,
21
+ 7
22
+ ],
23
+ [
24
+ 8,
25
+ 8
26
+ ],
27
+ [
28
+ 9,
29
+ 0
30
+ ],
31
+ [
32
+ 9,
33
+ 7
34
+ ],
35
+ [
36
+ 9,
37
+ 9
38
+ ],
39
+ [
40
+ 10,
41
+ 5
42
+ ]
43
+ ],
44
+ "begin_suppress_tokens": [
45
+ 220,
46
+ 50257
47
+ ],
48
+ "bos_token_id": 50257,
49
+ "decoder_start_token_id": 50258,
50
+ "eos_token_id": 50257,
51
+ "forced_decoder_ids": [
52
+ [
53
+ 1,
54
+ 50288
55
+ ],
56
+ [
57
+ 2,
58
+ 50359
59
+ ],
60
+ [
61
+ 3,
62
+ 50363
63
+ ]
64
+ ],
65
+ "is_multilingual": true,
66
+ "lang_to_id": {
67
+ "<|af|>": 50327,
68
+ "<|am|>": 50334,
69
+ "<|ar|>": 50272,
70
+ "<|as|>": 50350,
71
+ "<|az|>": 50304,
72
+ "<|ba|>": 50355,
73
+ "<|be|>": 50330,
74
+ "<|bg|>": 50292,
75
+ "<|bn|>": 50302,
76
+ "<|bo|>": 50347,
77
+ "<|br|>": 50309,
78
+ "<|bs|>": 50315,
79
+ "<|ca|>": 50270,
80
+ "<|cs|>": 50283,
81
+ "<|cy|>": 50297,
82
+ "<|da|>": 50285,
83
+ "<|de|>": 50261,
84
+ "<|el|>": 50281,
85
+ "<|en|>": 50259,
86
+ "<|es|>": 50262,
87
+ "<|et|>": 50307,
88
+ "<|eu|>": 50310,
89
+ "<|fa|>": 50300,
90
+ "<|fi|>": 50277,
91
+ "<|fo|>": 50338,
92
+ "<|fr|>": 50265,
93
+ "<|gl|>": 50319,
94
+ "<|gu|>": 50333,
95
+ "<|haw|>": 50352,
96
+ "<|ha|>": 50354,
97
+ "<|he|>": 50279,
98
+ "<|hi|>": 50276,
99
+ "<|hr|>": 50291,
100
+ "<|ht|>": 50339,
101
+ "<|hu|>": 50286,
102
+ "<|hy|>": 50312,
103
+ "<|id|>": 50275,
104
+ "<|is|>": 50311,
105
+ "<|it|>": 50274,
106
+ "<|ja|>": 50266,
107
+ "<|jw|>": 50356,
108
+ "<|ka|>": 50329,
109
+ "<|kk|>": 50316,
110
+ "<|km|>": 50323,
111
+ "<|kn|>": 50306,
112
+ "<|ko|>": 50264,
113
+ "<|la|>": 50294,
114
+ "<|lb|>": 50345,
115
+ "<|ln|>": 50353,
116
+ "<|lo|>": 50336,
117
+ "<|lt|>": 50293,
118
+ "<|lv|>": 50301,
119
+ "<|mg|>": 50349,
120
+ "<|mi|>": 50295,
121
+ "<|mk|>": 50308,
122
+ "<|ml|>": 50296,
123
+ "<|mn|>": 50314,
124
+ "<|mr|>": 50320,
125
+ "<|ms|>": 50282,
126
+ "<|mt|>": 50343,
127
+ "<|my|>": 50346,
128
+ "<|ne|>": 50313,
129
+ "<|nl|>": 50271,
130
+ "<|nn|>": 50342,
131
+ "<|no|>": 50288,
132
+ "<|oc|>": 50328,
133
+ "<|pa|>": 50321,
134
+ "<|pl|>": 50269,
135
+ "<|ps|>": 50340,
136
+ "<|pt|>": 50267,
137
+ "<|ro|>": 50284,
138
+ "<|ru|>": 50263,
139
+ "<|sa|>": 50344,
140
+ "<|sd|>": 50332,
141
+ "<|si|>": 50322,
142
+ "<|sk|>": 50298,
143
+ "<|sl|>": 50305,
144
+ "<|sn|>": 50324,
145
+ "<|so|>": 50326,
146
+ "<|sq|>": 50317,
147
+ "<|sr|>": 50303,
148
+ "<|su|>": 50357,
149
+ "<|sv|>": 50273,
150
+ "<|sw|>": 50318,
151
+ "<|ta|>": 50287,
152
+ "<|te|>": 50299,
153
+ "<|tg|>": 50331,
154
+ "<|th|>": 50289,
155
+ "<|tk|>": 50341,
156
+ "<|tl|>": 50348,
157
+ "<|tr|>": 50268,
158
+ "<|tt|>": 50351,
159
+ "<|uk|>": 50280,
160
+ "<|ur|>": 50290,
161
+ "<|uz|>": 50337,
162
+ "<|vi|>": 50278,
163
+ "<|yi|>": 50335,
164
+ "<|yo|>": 50325,
165
+ "<|zh|>": 50260
166
+ },
167
+ "language": "<|no|>",
168
+ "max_initial_timestamp_index": 1,
169
+ "max_length": 448,
170
+ "no_timestamps_token_id": 50363,
171
+ "pad_token_id": 50257,
172
+ "return_timestamps": false,
173
+ "suppress_tokens": [
174
+ 1,
175
+ 2,
176
+ 7,
177
+ 8,
178
+ 9,
179
+ 10,
180
+ 14,
181
+ 25,
182
+ 26,
183
+ 27,
184
+ 28,
185
+ 29,
186
+ 31,
187
+ 58,
188
+ 59,
189
+ 60,
190
+ 61,
191
+ 62,
192
+ 63,
193
+ 90,
194
+ 91,
195
+ 92,
196
+ 93,
197
+ 359,
198
+ 503,
199
+ 522,
200
+ 542,
201
+ 873,
202
+ 893,
203
+ 902,
204
+ 918,
205
+ 922,
206
+ 931,
207
+ 1350,
208
+ 1853,
209
+ 1982,
210
+ 2460,
211
+ 2627,
212
+ 3246,
213
+ 3253,
214
+ 3268,
215
+ 3536,
216
+ 3846,
217
+ 3961,
218
+ 4183,
219
+ 4667,
220
+ 6585,
221
+ 6647,
222
+ 7273,
223
+ 9061,
224
+ 9383,
225
+ 10428,
226
+ 10929,
227
+ 11938,
228
+ 12033,
229
+ 12331,
230
+ 12562,
231
+ 13793,
232
+ 14157,
233
+ 14635,
234
+ 15265,
235
+ 15618,
236
+ 16553,
237
+ 16604,
238
+ 18362,
239
+ 18956,
240
+ 20075,
241
+ 21675,
242
+ 22520,
243
+ 26130,
244
+ 26161,
245
+ 26435,
246
+ 28279,
247
+ 29464,
248
+ 31650,
249
+ 32302,
250
+ 32470,
251
+ 36865,
252
+ 42863,
253
+ 47425,
254
+ 49870,
255
+ 50254,
256
+ 50258,
257
+ 50358,
258
+ 50359,
259
+ 50360,
260
+ 50361,
261
+ 50362
262
+ ],
263
+ "task": "transcribe",
264
+ "task_to_id": {
265
+ "transcribe": 50359,
266
+ "translate": 50358
267
+ },
268
+ "transformers_version": "4.46.2"
269
+ }
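The generation config above fixes the decoding defaults for the student: forced_decoder_ids forces the Norwegian language token (<|no|>, id 50288), the transcribe task token (50359) and the no-timestamps token (50363) at the first decoder steps, max_length is 448, and the usual Whisper suppress-token list is carried over. Below is a minimal sketch for inspecting these defaults with transformers' GenerationConfig; the only assumptions are that transformers is installed and that the distil-small-init directory from this commit is available locally.

# Minimal sketch: reload the generation defaults written in the diff above.
from transformers import GenerationConfig

gen_config = GenerationConfig.from_pretrained("distil-small-init")

print(gen_config.decoder_start_token_id)    # 50258
print(gen_config.max_length)                # 448
print(gen_config.no_timestamps_token_id)    # 50363
print(gen_config.lang_to_id["<|no|>"])      # 50288 (Norwegian)
print(gen_config.task_to_id["transcribe"])  # 50359
print(gen_config.forced_decoder_ids)        # [[1, 50288], [2, 50359], [3, 50363]]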
distil-small-init/merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
distil-small-init/normalizer.json ADDED
@@ -0,0 +1,1742 @@
1
+ {
2
+ "accessorise": "accessorize",
3
+ "accessorised": "accessorized",
4
+ "accessorises": "accessorizes",
5
+ "accessorising": "accessorizing",
6
+ "acclimatisation": "acclimatization",
7
+ "acclimatise": "acclimatize",
8
+ "acclimatised": "acclimatized",
9
+ "acclimatises": "acclimatizes",
10
+ "acclimatising": "acclimatizing",
11
+ "accoutrements": "accouterments",
12
+ "aeon": "eon",
13
+ "aeons": "eons",
14
+ "aerogramme": "aerogram",
15
+ "aerogrammes": "aerograms",
16
+ "aeroplane": "airplane",
17
+ "aeroplanes": "airplanes",
18
+ "aesthete": "esthete",
19
+ "aesthetes": "esthetes",
20
+ "aesthetic": "esthetic",
21
+ "aesthetically": "esthetically",
22
+ "aesthetics": "esthetics",
23
+ "aetiology": "etiology",
24
+ "ageing": "aging",
25
+ "aggrandisement": "aggrandizement",
26
+ "agonise": "agonize",
27
+ "agonised": "agonized",
28
+ "agonises": "agonizes",
29
+ "agonising": "agonizing",
30
+ "agonisingly": "agonizingly",
31
+ "almanack": "almanac",
32
+ "almanacks": "almanacs",
33
+ "aluminium": "aluminum",
34
+ "amortisable": "amortizable",
35
+ "amortisation": "amortization",
36
+ "amortisations": "amortizations",
37
+ "amortise": "amortize",
38
+ "amortised": "amortized",
39
+ "amortises": "amortizes",
40
+ "amortising": "amortizing",
41
+ "amphitheatre": "amphitheater",
42
+ "amphitheatres": "amphitheaters",
43
+ "anaemia": "anemia",
44
+ "anaemic": "anemic",
45
+ "anaesthesia": "anesthesia",
46
+ "anaesthetic": "anesthetic",
47
+ "anaesthetics": "anesthetics",
48
+ "anaesthetise": "anesthetize",
49
+ "anaesthetised": "anesthetized",
50
+ "anaesthetises": "anesthetizes",
51
+ "anaesthetising": "anesthetizing",
52
+ "anaesthetist": "anesthetist",
53
+ "anaesthetists": "anesthetists",
54
+ "anaesthetize": "anesthetize",
55
+ "anaesthetized": "anesthetized",
56
+ "anaesthetizes": "anesthetizes",
57
+ "anaesthetizing": "anesthetizing",
58
+ "analogue": "analog",
59
+ "analogues": "analogs",
60
+ "analyse": "analyze",
61
+ "analysed": "analyzed",
62
+ "analyses": "analyzes",
63
+ "analysing": "analyzing",
64
+ "anglicise": "anglicize",
65
+ "anglicised": "anglicized",
66
+ "anglicises": "anglicizes",
67
+ "anglicising": "anglicizing",
68
+ "annualised": "annualized",
69
+ "antagonise": "antagonize",
70
+ "antagonised": "antagonized",
71
+ "antagonises": "antagonizes",
72
+ "antagonising": "antagonizing",
73
+ "apologise": "apologize",
74
+ "apologised": "apologized",
75
+ "apologises": "apologizes",
76
+ "apologising": "apologizing",
77
+ "appal": "appall",
78
+ "appals": "appalls",
79
+ "appetiser": "appetizer",
80
+ "appetisers": "appetizers",
81
+ "appetising": "appetizing",
82
+ "appetisingly": "appetizingly",
83
+ "arbour": "arbor",
84
+ "arbours": "arbors",
85
+ "archaeologically": "archeologically",
86
+ "archaeologist": "archeologist",
87
+ "archaeologists": "archeologists",
88
+ "archaeology": "archeology",
89
+ "archeological": "archaeological",
90
+ "ardour": "ardor",
91
+ "armour": "armor",
92
+ "armoured": "armored",
93
+ "armourer": "armorer",
94
+ "armourers": "armorers",
95
+ "armouries": "armories",
96
+ "armoury": "armory",
97
+ "artefact": "artifact",
98
+ "artefacts": "artifacts",
99
+ "authorise": "authorize",
100
+ "authorised": "authorized",
101
+ "authorises": "authorizes",
102
+ "authorising": "authorizing",
103
+ "axe": "ax",
104
+ "backpedalled": "backpedaled",
105
+ "backpedalling": "backpedaling",
106
+ "bannister": "banister",
107
+ "bannisters": "banisters",
108
+ "baptise": "baptize",
109
+ "baptised": "baptized",
110
+ "baptises": "baptizes",
111
+ "baptising": "baptizing",
112
+ "bastardise": "bastardize",
113
+ "bastardised": "bastardized",
114
+ "bastardises": "bastardizes",
115
+ "bastardising": "bastardizing",
116
+ "battleax": "battleaxe",
117
+ "baulk": "balk",
118
+ "baulked": "balked",
119
+ "baulking": "balking",
120
+ "baulks": "balks",
121
+ "bedevilled": "bedeviled",
122
+ "bedevilling": "bedeviling",
123
+ "behaviour": "behavior",
124
+ "behavioural": "behavioral",
125
+ "behaviourism": "behaviorism",
126
+ "behaviourist": "behaviorist",
127
+ "behaviourists": "behaviorists",
128
+ "behaviours": "behaviors",
129
+ "behove": "behoove",
130
+ "behoved": "behooved",
131
+ "behoves": "behooves",
132
+ "bejewelled": "bejeweled",
133
+ "belabour": "belabor",
134
+ "belaboured": "belabored",
135
+ "belabouring": "belaboring",
136
+ "belabours": "belabors",
137
+ "bevelled": "beveled",
138
+ "bevvies": "bevies",
139
+ "bevvy": "bevy",
140
+ "biassed": "biased",
141
+ "biassing": "biasing",
142
+ "bingeing": "binging",
143
+ "bougainvillaea": "bougainvillea",
144
+ "bougainvillaeas": "bougainvilleas",
145
+ "bowdlerise": "bowdlerize",
146
+ "bowdlerised": "bowdlerized",
147
+ "bowdlerises": "bowdlerizes",
148
+ "bowdlerising": "bowdlerizing",
149
+ "breathalyse": "breathalyze",
150
+ "breathalysed": "breathalyzed",
151
+ "breathalyser": "breathalyzer",
152
+ "breathalysers": "breathalyzers",
153
+ "breathalyses": "breathalyzes",
154
+ "breathalysing": "breathalyzing",
155
+ "brutalise": "brutalize",
156
+ "brutalised": "brutalized",
157
+ "brutalises": "brutalizes",
158
+ "brutalising": "brutalizing",
159
+ "busses": "buses",
160
+ "bussing": "busing",
161
+ "caesarean": "cesarean",
162
+ "caesareans": "cesareans",
163
+ "calibre": "caliber",
164
+ "calibres": "calibers",
165
+ "calliper": "caliper",
166
+ "callipers": "calipers",
167
+ "callisthenics": "calisthenics",
168
+ "canalise": "canalize",
169
+ "canalised": "canalized",
170
+ "canalises": "canalizes",
171
+ "canalising": "canalizing",
172
+ "cancelation": "cancellation",
173
+ "cancelations": "cancellations",
174
+ "cancelled": "canceled",
175
+ "cancelling": "canceling",
176
+ "candour": "candor",
177
+ "cannibalise": "cannibalize",
178
+ "cannibalised": "cannibalized",
179
+ "cannibalises": "cannibalizes",
180
+ "cannibalising": "cannibalizing",
181
+ "canonise": "canonize",
182
+ "canonised": "canonized",
183
+ "canonises": "canonizes",
184
+ "canonising": "canonizing",
185
+ "capitalise": "capitalize",
186
+ "capitalised": "capitalized",
187
+ "capitalises": "capitalizes",
188
+ "capitalising": "capitalizing",
189
+ "caramelise": "caramelize",
190
+ "caramelised": "caramelized",
191
+ "caramelises": "caramelizes",
192
+ "caramelising": "caramelizing",
193
+ "carbonise": "carbonize",
194
+ "carbonised": "carbonized",
195
+ "carbonises": "carbonizes",
196
+ "carbonising": "carbonizing",
197
+ "carolled": "caroled",
198
+ "carolling": "caroling",
199
+ "catalogue": "catalog",
200
+ "catalogued": "cataloged",
201
+ "catalogues": "catalogs",
202
+ "cataloguing": "cataloging",
203
+ "catalyse": "catalyze",
204
+ "catalysed": "catalyzed",
205
+ "catalyses": "catalyzes",
206
+ "catalysing": "catalyzing",
207
+ "categorise": "categorize",
208
+ "categorised": "categorized",
209
+ "categorises": "categorizes",
210
+ "categorising": "categorizing",
211
+ "cauterise": "cauterize",
212
+ "cauterised": "cauterized",
213
+ "cauterises": "cauterizes",
214
+ "cauterising": "cauterizing",
215
+ "cavilled": "caviled",
216
+ "cavilling": "caviling",
217
+ "centigramme": "centigram",
218
+ "centigrammes": "centigrams",
219
+ "centilitre": "centiliter",
220
+ "centilitres": "centiliters",
221
+ "centimetre": "centimeter",
222
+ "centimetres": "centimeters",
223
+ "centralise": "centralize",
224
+ "centralised": "centralized",
225
+ "centralises": "centralizes",
226
+ "centralising": "centralizing",
227
+ "centre": "center",
228
+ "centred": "centered",
229
+ "centrefold": "centerfold",
230
+ "centrefolds": "centerfolds",
231
+ "centrepiece": "centerpiece",
232
+ "centrepieces": "centerpieces",
233
+ "centres": "centers",
234
+ "channelled": "channeled",
235
+ "channelling": "channeling",
236
+ "characterise": "characterize",
237
+ "characterised": "characterized",
238
+ "characterises": "characterizes",
239
+ "characterising": "characterizing",
240
+ "cheque": "check",
241
+ "chequebook": "checkbook",
242
+ "chequebooks": "checkbooks",
243
+ "chequered": "checkered",
244
+ "cheques": "checks",
245
+ "chilli": "chili",
246
+ "chimaera": "chimera",
247
+ "chimaeras": "chimeras",
248
+ "chiselled": "chiseled",
249
+ "chiselling": "chiseling",
250
+ "circularise": "circularize",
251
+ "circularised": "circularized",
252
+ "circularises": "circularizes",
253
+ "circularising": "circularizing",
254
+ "civilise": "civilize",
255
+ "civilised": "civilized",
256
+ "civilises": "civilizes",
257
+ "civilising": "civilizing",
258
+ "clamour": "clamor",
259
+ "clamoured": "clamored",
260
+ "clamouring": "clamoring",
261
+ "clamours": "clamors",
262
+ "clangour": "clangor",
263
+ "clarinettist": "clarinetist",
264
+ "clarinettists": "clarinetists",
265
+ "collectivise": "collectivize",
266
+ "collectivised": "collectivized",
267
+ "collectivises": "collectivizes",
268
+ "collectivising": "collectivizing",
269
+ "colonisation": "colonization",
270
+ "colonise": "colonize",
271
+ "colonised": "colonized",
272
+ "coloniser": "colonizer",
273
+ "colonisers": "colonizers",
274
+ "colonises": "colonizes",
275
+ "colonising": "colonizing",
276
+ "colour": "color",
277
+ "colourant": "colorant",
278
+ "colourants": "colorants",
279
+ "coloured": "colored",
280
+ "coloureds": "coloreds",
281
+ "colourful": "colorful",
282
+ "colourfully": "colorfully",
283
+ "colouring": "coloring",
284
+ "colourize": "colorize",
285
+ "colourized": "colorized",
286
+ "colourizes": "colorizes",
287
+ "colourizing": "colorizing",
288
+ "colourless": "colorless",
289
+ "colours": "colors",
290
+ "commercialise": "commercialize",
291
+ "commercialised": "commercialized",
292
+ "commercialises": "commercializes",
293
+ "commercialising": "commercializing",
294
+ "compartmentalise": "compartmentalize",
295
+ "compartmentalised": "compartmentalized",
296
+ "compartmentalises": "compartmentalizes",
297
+ "compartmentalising": "compartmentalizing",
298
+ "computerise": "computerize",
299
+ "computerised": "computerized",
300
+ "computerises": "computerizes",
301
+ "computerising": "computerizing",
302
+ "conceptualise": "conceptualize",
303
+ "conceptualised": "conceptualized",
304
+ "conceptualises": "conceptualizes",
305
+ "conceptualising": "conceptualizing",
306
+ "connexion": "connection",
307
+ "connexions": "connections",
308
+ "contextualise": "contextualize",
309
+ "contextualised": "contextualized",
310
+ "contextualises": "contextualizes",
311
+ "contextualising": "contextualizing",
312
+ "cosier": "cozier",
313
+ "cosies": "cozies",
314
+ "cosiest": "coziest",
315
+ "cosily": "cozily",
316
+ "cosiness": "coziness",
317
+ "cosy": "cozy",
318
+ "councillor": "councilor",
319
+ "councillors": "councilors",
320
+ "counselled": "counseled",
321
+ "counselling": "counseling",
322
+ "counsellor": "counselor",
323
+ "counsellors": "counselors",
324
+ "crenelated": "crenellated",
325
+ "criminalise": "criminalize",
326
+ "criminalised": "criminalized",
327
+ "criminalises": "criminalizes",
328
+ "criminalising": "criminalizing",
329
+ "criticise": "criticize",
330
+ "criticised": "criticized",
331
+ "criticises": "criticizes",
332
+ "criticising": "criticizing",
333
+ "crueller": "crueler",
334
+ "cruellest": "cruelest",
335
+ "crystallisation": "crystallization",
336
+ "crystallise": "crystallize",
337
+ "crystallised": "crystallized",
338
+ "crystallises": "crystallizes",
339
+ "crystallising": "crystallizing",
340
+ "cudgelled": "cudgeled",
341
+ "cudgelling": "cudgeling",
342
+ "customise": "customize",
343
+ "customised": "customized",
344
+ "customises": "customizes",
345
+ "customising": "customizing",
346
+ "cypher": "cipher",
347
+ "cyphers": "ciphers",
348
+ "decentralisation": "decentralization",
349
+ "decentralise": "decentralize",
350
+ "decentralised": "decentralized",
351
+ "decentralises": "decentralizes",
352
+ "decentralising": "decentralizing",
353
+ "decriminalisation": "decriminalization",
354
+ "decriminalise": "decriminalize",
355
+ "decriminalised": "decriminalized",
356
+ "decriminalises": "decriminalizes",
357
+ "decriminalising": "decriminalizing",
358
+ "defence": "defense",
359
+ "defenceless": "defenseless",
360
+ "defences": "defenses",
361
+ "dehumanisation": "dehumanization",
362
+ "dehumanise": "dehumanize",
363
+ "dehumanised": "dehumanized",
364
+ "dehumanises": "dehumanizes",
365
+ "dehumanising": "dehumanizing",
366
+ "demeanour": "demeanor",
367
+ "demilitarisation": "demilitarization",
368
+ "demilitarise": "demilitarize",
369
+ "demilitarised": "demilitarized",
370
+ "demilitarises": "demilitarizes",
371
+ "demilitarising": "demilitarizing",
372
+ "demobilisation": "demobilization",
373
+ "demobilise": "demobilize",
374
+ "demobilised": "demobilized",
375
+ "demobilises": "demobilizes",
376
+ "demobilising": "demobilizing",
377
+ "democratisation": "democratization",
378
+ "democratise": "democratize",
379
+ "democratised": "democratized",
380
+ "democratises": "democratizes",
381
+ "democratising": "democratizing",
382
+ "demonise": "demonize",
383
+ "demonised": "demonized",
384
+ "demonises": "demonizes",
385
+ "demonising": "demonizing",
386
+ "demoralisation": "demoralization",
387
+ "demoralise": "demoralize",
388
+ "demoralised": "demoralized",
389
+ "demoralises": "demoralizes",
390
+ "demoralising": "demoralizing",
391
+ "denationalisation": "denationalization",
392
+ "denationalise": "denationalize",
393
+ "denationalised": "denationalized",
394
+ "denationalises": "denationalizes",
395
+ "denationalising": "denationalizing",
396
+ "deodorise": "deodorize",
397
+ "deodorised": "deodorized",
398
+ "deodorises": "deodorizes",
399
+ "deodorising": "deodorizing",
400
+ "depersonalise": "depersonalize",
401
+ "depersonalised": "depersonalized",
402
+ "depersonalises": "depersonalizes",
403
+ "depersonalising": "depersonalizing",
404
+ "deputise": "deputize",
405
+ "deputised": "deputized",
406
+ "deputises": "deputizes",
407
+ "deputising": "deputizing",
408
+ "desensitisation": "desensitization",
409
+ "desensitise": "desensitize",
410
+ "desensitised": "desensitized",
411
+ "desensitises": "desensitizes",
412
+ "desensitising": "desensitizing",
413
+ "destabilisation": "destabilization",
414
+ "destabilise": "destabilize",
415
+ "destabilised": "destabilized",
416
+ "destabilises": "destabilizes",
417
+ "destabilising": "destabilizing",
418
+ "dialled": "dialed",
419
+ "dialling": "dialing",
420
+ "dialogue": "dialog",
421
+ "dialogues": "dialogs",
422
+ "diarrhoea": "diarrhea",
423
+ "digitise": "digitize",
424
+ "digitised": "digitized",
425
+ "digitises": "digitizes",
426
+ "digitising": "digitizing",
427
+ "disc": "disk",
428
+ "discolour": "discolor",
429
+ "discoloured": "discolored",
430
+ "discolouring": "discoloring",
431
+ "discolours": "discolors",
432
+ "discs": "disks",
433
+ "disembowelled": "disemboweled",
434
+ "disembowelling": "disemboweling",
435
+ "disfavour": "disfavor",
436
+ "dishevelled": "disheveled",
437
+ "dishonour": "dishonor",
438
+ "dishonourable": "dishonorable",
439
+ "dishonourably": "dishonorably",
440
+ "dishonoured": "dishonored",
441
+ "dishonouring": "dishonoring",
442
+ "dishonours": "dishonors",
443
+ "disorganisation": "disorganization",
444
+ "disorganised": "disorganized",
445
+ "distil": "distill",
446
+ "distils": "distills",
447
+ "dramatisation": "dramatization",
448
+ "dramatisations": "dramatizations",
449
+ "dramatise": "dramatize",
450
+ "dramatised": "dramatized",
451
+ "dramatises": "dramatizes",
452
+ "dramatising": "dramatizing",
453
+ "draught": "draft",
454
+ "draughtboard": "draftboard",
455
+ "draughtboards": "draftboards",
456
+ "draughtier": "draftier",
457
+ "draughtiest": "draftiest",
458
+ "draughts": "drafts",
459
+ "draughtsman": "draftsman",
460
+ "draughtsmanship": "draftsmanship",
461
+ "draughtsmen": "draftsmen",
462
+ "draughtswoman": "draftswoman",
463
+ "draughtswomen": "draftswomen",
464
+ "draughty": "drafty",
465
+ "drivelled": "driveled",
466
+ "drivelling": "driveling",
467
+ "duelled": "dueled",
468
+ "duelling": "dueling",
469
+ "economise": "economize",
470
+ "economised": "economized",
471
+ "economises": "economizes",
472
+ "economising": "economizing",
473
+ "editorialise": "editorialize",
474
+ "editorialised": "editorialized",
475
+ "editorialises": "editorializes",
476
+ "editorialising": "editorializing",
477
+ "edoema": "edema",
478
+ "empathise": "empathize",
479
+ "empathised": "empathized",
480
+ "empathises": "empathizes",
481
+ "empathising": "empathizing",
482
+ "emphasise": "emphasize",
483
+ "emphasised": "emphasized",
484
+ "emphasises": "emphasizes",
485
+ "emphasising": "emphasizing",
486
+ "enamelled": "enameled",
487
+ "enamelling": "enameling",
488
+ "enamoured": "enamored",
489
+ "encyclopaedia": "encyclopedia",
490
+ "encyclopaedias": "encyclopedias",
491
+ "encyclopaedic": "encyclopedic",
492
+ "endeavour": "endeavor",
493
+ "endeavoured": "endeavored",
494
+ "endeavouring": "endeavoring",
495
+ "endeavours": "endeavors",
496
+ "energise": "energize",
497
+ "energised": "energized",
498
+ "energises": "energizes",
499
+ "energising": "energizing",
500
+ "enrol": "enroll",
501
+ "enrols": "enrolls",
502
+ "enthral": "enthrall",
503
+ "enthrals": "enthralls",
504
+ "epaulette": "epaulet",
505
+ "epaulettes": "epaulets",
506
+ "epicentre": "epicenter",
507
+ "epicentres": "epicenters",
508
+ "epilogue": "epilog",
509
+ "epilogues": "epilogs",
510
+ "epitomise": "epitomize",
511
+ "epitomised": "epitomized",
512
+ "epitomises": "epitomizes",
513
+ "epitomising": "epitomizing",
514
+ "equalisation": "equalization",
515
+ "equalise": "equalize",
516
+ "equalised": "equalized",
517
+ "equaliser": "equalizer",
518
+ "equalisers": "equalizers",
519
+ "equalises": "equalizes",
520
+ "equalising": "equalizing",
521
+ "eulogise": "eulogize",
522
+ "eulogised": "eulogized",
523
+ "eulogises": "eulogizes",
524
+ "eulogising": "eulogizing",
525
+ "evangelise": "evangelize",
526
+ "evangelised": "evangelized",
527
+ "evangelises": "evangelizes",
528
+ "evangelising": "evangelizing",
529
+ "exorcise": "exorcize",
530
+ "exorcised": "exorcized",
531
+ "exorcises": "exorcizes",
532
+ "exorcising": "exorcizing",
533
+ "extemporisation": "extemporization",
534
+ "extemporise": "extemporize",
535
+ "extemporised": "extemporized",
536
+ "extemporises": "extemporizes",
537
+ "extemporising": "extemporizing",
538
+ "externalisation": "externalization",
539
+ "externalisations": "externalizations",
540
+ "externalise": "externalize",
541
+ "externalised": "externalized",
542
+ "externalises": "externalizes",
543
+ "externalising": "externalizing",
544
+ "factorise": "factorize",
545
+ "factorised": "factorized",
546
+ "factorises": "factorizes",
547
+ "factorising": "factorizing",
548
+ "faecal": "fecal",
549
+ "faeces": "feces",
550
+ "familiarisation": "familiarization",
551
+ "familiarise": "familiarize",
552
+ "familiarised": "familiarized",
553
+ "familiarises": "familiarizes",
554
+ "familiarising": "familiarizing",
555
+ "fantasise": "fantasize",
556
+ "fantasised": "fantasized",
557
+ "fantasises": "fantasizes",
558
+ "fantasising": "fantasizing",
559
+ "favour": "favor",
560
+ "favourable": "favorable",
561
+ "favourably": "favorably",
562
+ "favoured": "favored",
563
+ "favouring": "favoring",
564
+ "favourite": "favorite",
565
+ "favourites": "favorites",
566
+ "favouritism": "favoritism",
567
+ "favours": "favors",
568
+ "feminise": "feminize",
569
+ "feminised": "feminized",
570
+ "feminises": "feminizes",
571
+ "feminising": "feminizing",
572
+ "fertilisation": "fertilization",
573
+ "fertilise": "fertilize",
574
+ "fertilised": "fertilized",
575
+ "fertiliser": "fertilizer",
576
+ "fertilisers": "fertilizers",
577
+ "fertilises": "fertilizes",
578
+ "fertilising": "fertilizing",
579
+ "fervour": "fervor",
580
+ "fibre": "fiber",
581
+ "fibreglass": "fiberglass",
582
+ "fibres": "fibers",
583
+ "fictionalisation": "fictionalization",
584
+ "fictionalisations": "fictionalizations",
585
+ "fictionalise": "fictionalize",
586
+ "fictionalised": "fictionalized",
587
+ "fictionalises": "fictionalizes",
588
+ "fictionalising": "fictionalizing",
589
+ "fillet": "filet",
590
+ "filleted": "fileted",
591
+ "filleting": "fileting",
592
+ "fillets": "filets",
593
+ "finalisation": "finalization",
594
+ "finalise": "finalize",
595
+ "finalised": "finalized",
596
+ "finalises": "finalizes",
597
+ "finalising": "finalizing",
598
+ "flautist": "flutist",
599
+ "flautists": "flutists",
600
+ "flavour": "flavor",
601
+ "flavoured": "flavored",
602
+ "flavouring": "flavoring",
603
+ "flavourings": "flavorings",
604
+ "flavourless": "flavorless",
605
+ "flavours": "flavors",
606
+ "flavoursome": "flavorsome",
607
+ "flyer / flier": "flier / flyer",
608
+ "foetal": "fetal",
609
+ "foetid": "fetid",
610
+ "foetus": "fetus",
611
+ "foetuses": "fetuses",
612
+ "formalisation": "formalization",
613
+ "formalise": "formalize",
614
+ "formalised": "formalized",
615
+ "formalises": "formalizes",
616
+ "formalising": "formalizing",
617
+ "fossilisation": "fossilization",
618
+ "fossilise": "fossilize",
619
+ "fossilised": "fossilized",
620
+ "fossilises": "fossilizes",
621
+ "fossilising": "fossilizing",
622
+ "fraternisation": "fraternization",
623
+ "fraternise": "fraternize",
624
+ "fraternised": "fraternized",
625
+ "fraternises": "fraternizes",
626
+ "fraternising": "fraternizing",
627
+ "fulfil": "fulfill",
628
+ "fulfilment": "fulfillment",
629
+ "fulfils": "fulfills",
630
+ "funnelled": "funneled",
631
+ "funnelling": "funneling",
632
+ "gage": "gauge",
633
+ "gaged": "gauged",
634
+ "gages": "gauges",
635
+ "gaging": "gauging",
636
+ "galvanise": "galvanize",
637
+ "galvanised": "galvanized",
638
+ "galvanises": "galvanizes",
639
+ "galvanising": "galvanizing",
640
+ "gambolled": "gamboled",
641
+ "gambolling": "gamboling",
642
+ "gaol": "jail",
643
+ "gaolbird": "jailbird",
644
+ "gaolbirds": "jailbirds",
645
+ "gaolbreak": "jailbreak",
646
+ "gaolbreaks": "jailbreaks",
647
+ "gaoled": "jailed",
648
+ "gaoler": "jailer",
649
+ "gaolers": "jailers",
650
+ "gaoling": "jailing",
651
+ "gaols": "jails",
652
+ "gasses": "gases",
653
+ "generalisation": "generalization",
654
+ "generalisations": "generalizations",
655
+ "generalise": "generalize",
656
+ "generalised": "generalized",
657
+ "generalises": "generalizes",
658
+ "generalising": "generalizing",
659
+ "ghettoise": "ghettoize",
660
+ "ghettoised": "ghettoized",
661
+ "ghettoises": "ghettoizes",
662
+ "ghettoising": "ghettoizing",
663
+ "gipsies": "gypsies",
664
+ "glamor": "glamour",
665
+ "glamorise": "glamorize",
666
+ "glamorised": "glamorized",
667
+ "glamorises": "glamorizes",
668
+ "glamorising": "glamorizing",
669
+ "globalisation": "globalization",
670
+ "globalise": "globalize",
671
+ "globalised": "globalized",
672
+ "globalises": "globalizes",
673
+ "globalising": "globalizing",
674
+ "glueing": "gluing",
675
+ "goitre": "goiter",
676
+ "goitres": "goiters",
677
+ "gonorrhoea": "gonorrhea",
678
+ "gramme": "gram",
679
+ "grammes": "grams",
680
+ "gravelled": "graveled",
681
+ "grey": "gray",
682
+ "greyed": "grayed",
683
+ "greying": "graying",
684
+ "greyish": "grayish",
685
+ "greyness": "grayness",
686
+ "greys": "grays",
687
+ "grovelled": "groveled",
688
+ "grovelling": "groveling",
689
+ "groyne": "groin",
690
+ "groynes": "groins",
691
+ "gruelling": "grueling",
692
+ "gruellingly": "gruelingly",
693
+ "gryphon": "griffin",
694
+ "gryphons": "griffins",
695
+ "gynaecological": "gynecological",
696
+ "gynaecologist": "gynecologist",
697
+ "gynaecologists": "gynecologists",
698
+ "gynaecology": "gynecology",
699
+ "haematological": "hematological",
700
+ "haematologist": "hematologist",
701
+ "haematologists": "hematologists",
702
+ "haematology": "hematology",
703
+ "haemoglobin": "hemoglobin",
704
+ "haemophilia": "hemophilia",
705
+ "haemophiliac": "hemophiliac",
706
+ "haemophiliacs": "hemophiliacs",
707
+ "haemorrhage": "hemorrhage",
708
+ "haemorrhaged": "hemorrhaged",
709
+ "haemorrhages": "hemorrhages",
710
+ "haemorrhaging": "hemorrhaging",
711
+ "haemorrhoids": "hemorrhoids",
712
+ "harbour": "harbor",
713
+ "harboured": "harbored",
714
+ "harbouring": "harboring",
715
+ "harbours": "harbors",
716
+ "harmonisation": "harmonization",
717
+ "harmonise": "harmonize",
718
+ "harmonised": "harmonized",
719
+ "harmonises": "harmonizes",
720
+ "harmonising": "harmonizing",
721
+ "homoeopath": "homeopath",
722
+ "homoeopathic": "homeopathic",
723
+ "homoeopaths": "homeopaths",
724
+ "homoeopathy": "homeopathy",
725
+ "homogenise": "homogenize",
726
+ "homogenised": "homogenized",
727
+ "homogenises": "homogenizes",
728
+ "homogenising": "homogenizing",
729
+ "honour": "honor",
730
+ "honourable": "honorable",
731
+ "honourably": "honorably",
732
+ "honoured": "honored",
733
+ "honouring": "honoring",
734
+ "honours": "honors",
735
+ "hospitalisation": "hospitalization",
736
+ "hospitalise": "hospitalize",
737
+ "hospitalised": "hospitalized",
738
+ "hospitalises": "hospitalizes",
739
+ "hospitalising": "hospitalizing",
740
+ "humanise": "humanize",
741
+ "humanised": "humanized",
742
+ "humanises": "humanizes",
743
+ "humanising": "humanizing",
744
+ "humour": "humor",
745
+ "humoured": "humored",
746
+ "humouring": "humoring",
747
+ "humourless": "humorless",
748
+ "humours": "humors",
749
+ "hybridise": "hybridize",
750
+ "hybridised": "hybridized",
751
+ "hybridises": "hybridizes",
752
+ "hybridising": "hybridizing",
753
+ "hypnotise": "hypnotize",
754
+ "hypnotised": "hypnotized",
755
+ "hypnotises": "hypnotizes",
756
+ "hypnotising": "hypnotizing",
757
+ "hypothesise": "hypothesize",
758
+ "hypothesised": "hypothesized",
759
+ "hypothesises": "hypothesizes",
760
+ "hypothesising": "hypothesizing",
761
+ "idealisation": "idealization",
762
+ "idealise": "idealize",
763
+ "idealised": "idealized",
764
+ "idealises": "idealizes",
765
+ "idealising": "idealizing",
766
+ "idolise": "idolize",
767
+ "idolised": "idolized",
768
+ "idolises": "idolizes",
769
+ "idolising": "idolizing",
770
+ "immobilisation": "immobilization",
771
+ "immobilise": "immobilize",
772
+ "immobilised": "immobilized",
773
+ "immobiliser": "immobilizer",
774
+ "immobilisers": "immobilizers",
775
+ "immobilises": "immobilizes",
776
+ "immobilising": "immobilizing",
777
+ "immortalise": "immortalize",
778
+ "immortalised": "immortalized",
779
+ "immortalises": "immortalizes",
780
+ "immortalising": "immortalizing",
781
+ "immunisation": "immunization",
782
+ "immunise": "immunize",
783
+ "immunised": "immunized",
784
+ "immunises": "immunizes",
785
+ "immunising": "immunizing",
786
+ "impanelled": "impaneled",
787
+ "impanelling": "impaneling",
788
+ "imperilled": "imperiled",
789
+ "imperilling": "imperiling",
790
+ "individualise": "individualize",
791
+ "individualised": "individualized",
792
+ "individualises": "individualizes",
793
+ "individualising": "individualizing",
794
+ "industrialise": "industrialize",
795
+ "industrialised": "industrialized",
796
+ "industrialises": "industrializes",
797
+ "industrialising": "industrializing",
798
+ "inflexion": "inflection",
799
+ "inflexions": "inflections",
800
+ "initialise": "initialize",
801
+ "initialised": "initialized",
802
+ "initialises": "initializes",
803
+ "initialising": "initializing",
804
+ "initialled": "initialed",
805
+ "initialling": "initialing",
806
+ "instal": "install",
807
+ "instalment": "installment",
808
+ "instalments": "installments",
809
+ "instals": "installs",
810
+ "instil": "instill",
811
+ "instils": "instills",
812
+ "institutionalisation": "institutionalization",
813
+ "institutionalise": "institutionalize",
814
+ "institutionalised": "institutionalized",
815
+ "institutionalises": "institutionalizes",
816
+ "institutionalising": "institutionalizing",
817
+ "intellectualise": "intellectualize",
818
+ "intellectualised": "intellectualized",
819
+ "intellectualises": "intellectualizes",
820
+ "intellectualising": "intellectualizing",
821
+ "internalisation": "internalization",
822
+ "internalise": "internalize",
823
+ "internalised": "internalized",
824
+ "internalises": "internalizes",
825
+ "internalising": "internalizing",
826
+ "internationalisation": "internationalization",
827
+ "internationalise": "internationalize",
828
+ "internationalised": "internationalized",
829
+ "internationalises": "internationalizes",
830
+ "internationalising": "internationalizing",
831
+ "ionisation": "ionization",
832
+ "ionise": "ionize",
833
+ "ionised": "ionized",
834
+ "ioniser": "ionizer",
835
+ "ionisers": "ionizers",
836
+ "ionises": "ionizes",
837
+ "ionising": "ionizing",
838
+ "italicise": "italicize",
839
+ "italicised": "italicized",
840
+ "italicises": "italicizes",
841
+ "italicising": "italicizing",
842
+ "itemise": "itemize",
843
+ "itemised": "itemized",
844
+ "itemises": "itemizes",
845
+ "itemising": "itemizing",
846
+ "jeopardise": "jeopardize",
847
+ "jeopardised": "jeopardized",
848
+ "jeopardises": "jeopardizes",
849
+ "jeopardising": "jeopardizing",
850
+ "jewelled": "jeweled",
851
+ "jeweller": "jeweler",
852
+ "jewellers": "jewelers",
853
+ "jewellery": "jewelry",
854
+ "judgement": "judgment",
855
+ "kilogramme": "kilogram",
856
+ "kilogrammes": "kilograms",
857
+ "kilometre": "kilometer",
858
+ "kilometres": "kilometers",
859
+ "labelled": "labeled",
860
+ "labelling": "labeling",
861
+ "labour": "labor",
862
+ "laboured": "labored",
863
+ "labourer": "laborer",
864
+ "labourers": "laborers",
865
+ "labouring": "laboring",
866
+ "labours": "labors",
867
+ "lacklustre": "lackluster",
868
+ "legalisation": "legalization",
869
+ "legalise": "legalize",
870
+ "legalised": "legalized",
871
+ "legalises": "legalizes",
872
+ "legalising": "legalizing",
873
+ "legitimise": "legitimize",
874
+ "legitimised": "legitimized",
875
+ "legitimises": "legitimizes",
876
+ "legitimising": "legitimizing",
877
+ "leukaemia": "leukemia",
878
+ "levelled": "leveled",
879
+ "leveller": "leveler",
880
+ "levellers": "levelers",
881
+ "levelling": "leveling",
882
+ "libelled": "libeled",
883
+ "libelling": "libeling",
884
+ "libellous": "libelous",
885
+ "liberalisation": "liberalization",
886
+ "liberalise": "liberalize",
887
+ "liberalised": "liberalized",
888
+ "liberalises": "liberalizes",
889
+ "liberalising": "liberalizing",
890
+ "licence": "license",
891
+ "licenced": "licensed",
892
+ "licences": "licenses",
893
+ "licencing": "licensing",
894
+ "likeable": "likable",
895
+ "lionisation": "lionization",
896
+ "lionise": "lionize",
897
+ "lionised": "lionized",
898
+ "lionises": "lionizes",
899
+ "lionising": "lionizing",
900
+ "liquidise": "liquidize",
901
+ "liquidised": "liquidized",
902
+ "liquidiser": "liquidizer",
903
+ "liquidisers": "liquidizers",
904
+ "liquidises": "liquidizes",
905
+ "liquidising": "liquidizing",
906
+ "litre": "liter",
907
+ "litres": "liters",
908
+ "localise": "localize",
909
+ "localised": "localized",
910
+ "localises": "localizes",
911
+ "localising": "localizing",
912
+ "louvre": "louver",
913
+ "louvred": "louvered",
914
+ "louvres": "louvers",
915
+ "lustre": "luster",
916
+ "magnetise": "magnetize",
917
+ "magnetised": "magnetized",
918
+ "magnetises": "magnetizes",
919
+ "magnetising": "magnetizing",
920
+ "manoeuvrability": "maneuverability",
921
+ "manoeuvrable": "maneuverable",
922
+ "manoeuvre": "maneuver",
923
+ "manoeuvred": "maneuvered",
924
+ "manoeuvres": "maneuvers",
925
+ "manoeuvring": "maneuvering",
926
+ "manoeuvrings": "maneuverings",
927
+ "marginalisation": "marginalization",
928
+ "marginalise": "marginalize",
929
+ "marginalised": "marginalized",
930
+ "marginalises": "marginalizes",
931
+ "marginalising": "marginalizing",
932
+ "marshalled": "marshaled",
933
+ "marshalling": "marshaling",
934
+ "marvelled": "marveled",
935
+ "marvelling": "marveling",
936
+ "marvellous": "marvelous",
937
+ "marvellously": "marvelously",
938
+ "materialisation": "materialization",
939
+ "materialise": "materialize",
940
+ "materialised": "materialized",
941
+ "materialises": "materializes",
942
+ "materialising": "materializing",
943
+ "maximisation": "maximization",
944
+ "maximise": "maximize",
945
+ "maximised": "maximized",
946
+ "maximises": "maximizes",
947
+ "maximising": "maximizing",
948
+ "meagre": "meager",
949
+ "mechanisation": "mechanization",
950
+ "mechanise": "mechanize",
951
+ "mechanised": "mechanized",
952
+ "mechanises": "mechanizes",
953
+ "mechanising": "mechanizing",
954
+ "mediaeval": "medieval",
955
+ "memorialise": "memorialize",
956
+ "memorialised": "memorialized",
957
+ "memorialises": "memorializes",
958
+ "memorialising": "memorializing",
959
+ "memorise": "memorize",
960
+ "memorised": "memorized",
961
+ "memorises": "memorizes",
962
+ "memorising": "memorizing",
963
+ "mesmerise": "mesmerize",
964
+ "mesmerised": "mesmerized",
965
+ "mesmerises": "mesmerizes",
966
+ "mesmerising": "mesmerizing",
967
+ "metabolise": "metabolize",
968
+ "metabolised": "metabolized",
969
+ "metabolises": "metabolizes",
970
+ "metabolising": "metabolizing",
971
+ "metre": "meter",
972
+ "metres": "meters",
973
+ "mhm": "hmm",
974
+ "micrometre": "micrometer",
975
+ "micrometres": "micrometers",
976
+ "militarise": "militarize",
977
+ "militarised": "militarized",
978
+ "militarises": "militarizes",
979
+ "militarising": "militarizing",
980
+ "milligramme": "milligram",
981
+ "milligrammes": "milligrams",
982
+ "millilitre": "milliliter",
983
+ "millilitres": "milliliters",
984
+ "millimetre": "millimeter",
985
+ "millimetres": "millimeters",
986
+ "miniaturisation": "miniaturization",
987
+ "miniaturise": "miniaturize",
988
+ "miniaturised": "miniaturized",
989
+ "miniaturises": "miniaturizes",
990
+ "miniaturising": "miniaturizing",
991
+ "minibusses": "minibuses",
992
+ "minimise": "minimize",
993
+ "minimised": "minimized",
994
+ "minimises": "minimizes",
995
+ "minimising": "minimizing",
996
+ "misbehaviour": "misbehavior",
997
+ "misdemeanour": "misdemeanor",
998
+ "misdemeanours": "misdemeanors",
999
+ "misspelt": "misspelled",
1000
+ "mitre": "miter",
1001
+ "mitres": "miters",
1002
+ "mm": "hmm",
1003
+ "mmm": "hmm",
1004
+ "mobilisation": "mobilization",
1005
+ "mobilise": "mobilize",
1006
+ "mobilised": "mobilized",
1007
+ "mobilises": "mobilizes",
1008
+ "mobilising": "mobilizing",
1009
+ "modelled": "modeled",
1010
+ "modeller": "modeler",
1011
+ "modellers": "modelers",
1012
+ "modelling": "modeling",
1013
+ "modernise": "modernize",
1014
+ "modernised": "modernized",
1015
+ "modernises": "modernizes",
1016
+ "modernising": "modernizing",
1017
+ "moisturise": "moisturize",
1018
+ "moisturised": "moisturized",
1019
+ "moisturiser": "moisturizer",
1020
+ "moisturisers": "moisturizers",
1021
+ "moisturises": "moisturizes",
1022
+ "moisturising": "moisturizing",
1023
+ "monologue": "monolog",
1024
+ "monologues": "monologs",
1025
+ "monopolisation": "monopolization",
1026
+ "monopolise": "monopolize",
1027
+ "monopolised": "monopolized",
1028
+ "monopolises": "monopolizes",
1029
+ "monopolising": "monopolizing",
1030
+ "moralise": "moralize",
1031
+ "moralised": "moralized",
1032
+ "moralises": "moralizes",
1033
+ "moralising": "moralizing",
1034
+ "motorised": "motorized",
1035
+ "mould": "mold",
1036
+ "moulded": "molded",
1037
+ "moulder": "molder",
1038
+ "mouldered": "moldered",
1039
+ "mouldering": "moldering",
1040
+ "moulders": "molders",
1041
+ "mouldier": "moldier",
1042
+ "mouldiest": "moldiest",
1043
+ "moulding": "molding",
1044
+ "mouldings": "moldings",
1045
+ "moulds": "molds",
1046
+ "mouldy": "moldy",
1047
+ "moult": "molt",
1048
+ "moulted": "molted",
1049
+ "moulting": "molting",
1050
+ "moults": "molts",
1051
+ "moustache": "mustache",
1052
+ "moustached": "mustached",
1053
+ "moustaches": "mustaches",
1054
+ "moustachioed": "mustachioed",
1055
+ "multicoloured": "multicolored",
1056
+ "nationalisation": "nationalization",
1057
+ "nationalisations": "nationalizations",
1058
+ "nationalise": "nationalize",
1059
+ "nationalised": "nationalized",
1060
+ "nationalises": "nationalizes",
1061
+ "nationalising": "nationalizing",
1062
+ "naturalisation": "naturalization",
1063
+ "naturalise": "naturalize",
1064
+ "naturalised": "naturalized",
1065
+ "naturalises": "naturalizes",
1066
+ "naturalising": "naturalizing",
1067
+ "neighbour": "neighbor",
1068
+ "neighbourhood": "neighborhood",
1069
+ "neighbourhoods": "neighborhoods",
1070
+ "neighbouring": "neighboring",
1071
+ "neighbourliness": "neighborliness",
1072
+ "neighbourly": "neighborly",
1073
+ "neighbours": "neighbors",
1074
+ "neutralisation": "neutralization",
1075
+ "neutralise": "neutralize",
1076
+ "neutralised": "neutralized",
1077
+ "neutralises": "neutralizes",
1078
+ "neutralising": "neutralizing",
1079
+ "normalisation": "normalization",
1080
+ "normalise": "normalize",
1081
+ "normalised": "normalized",
1082
+ "normalises": "normalizes",
1083
+ "normalising": "normalizing",
1084
+ "odour": "odor",
1085
+ "odourless": "odorless",
1086
+ "odours": "odors",
1087
+ "oesophagus": "esophagus",
1088
+ "oesophaguses": "esophaguses",
1089
+ "oestrogen": "estrogen",
1090
+ "offence": "offense",
1091
+ "offences": "offenses",
1092
+ "omelette": "omelet",
1093
+ "omelettes": "omelets",
1094
+ "optimise": "optimize",
1095
+ "optimised": "optimized",
1096
+ "optimises": "optimizes",
1097
+ "optimising": "optimizing",
1098
+ "organisation": "organization",
1099
+ "organisational": "organizational",
1100
+ "organisations": "organizations",
1101
+ "organise": "organize",
1102
+ "organised": "organized",
1103
+ "organiser": "organizer",
1104
+ "organisers": "organizers",
1105
+ "organises": "organizes",
1106
+ "organising": "organizing",
1107
+ "orthopaedic": "orthopedic",
1108
+ "orthopaedics": "orthopedics",
1109
+ "ostracise": "ostracize",
1110
+ "ostracised": "ostracized",
1111
+ "ostracises": "ostracizes",
1112
+ "ostracising": "ostracizing",
1113
+ "outmanoeuvre": "outmaneuver",
1114
+ "outmanoeuvred": "outmaneuvered",
1115
+ "outmanoeuvres": "outmaneuvers",
1116
+ "outmanoeuvring": "outmaneuvering",
1117
+ "overemphasise": "overemphasize",
1118
+ "overemphasised": "overemphasized",
1119
+ "overemphasises": "overemphasizes",
1120
+ "overemphasising": "overemphasizing",
1121
+ "oxidisation": "oxidization",
1122
+ "oxidise": "oxidize",
1123
+ "oxidised": "oxidized",
1124
+ "oxidises": "oxidizes",
1125
+ "oxidising": "oxidizing",
1126
+ "paederast": "pederast",
1127
+ "paederasts": "pederasts",
1128
+ "paediatric": "pediatric",
1129
+ "paediatrician": "pediatrician",
1130
+ "paediatricians": "pediatricians",
1131
+ "paediatrics": "pediatrics",
1132
+ "paedophile": "pedophile",
1133
+ "paedophiles": "pedophiles",
1134
+ "paedophilia": "pedophilia",
1135
+ "palaeolithic": "paleolithic",
1136
+ "palaeontologist": "paleontologist",
1137
+ "palaeontologists": "paleontologists",
1138
+ "palaeontology": "paleontology",
1139
+ "panelled": "paneled",
1140
+ "panelling": "paneling",
1141
+ "panellist": "panelist",
1142
+ "panellists": "panelists",
1143
+ "paralyse": "paralyze",
1144
+ "paralysed": "paralyzed",
1145
+ "paralyses": "paralyzes",
1146
+ "paralysing": "paralyzing",
1147
+ "parcelled": "parceled",
1148
+ "parcelling": "parceling",
1149
+ "parlour": "parlor",
1150
+ "parlours": "parlors",
1151
+ "particularise": "particularize",
1152
+ "particularised": "particularized",
1153
+ "particularises": "particularizes",
1154
+ "particularising": "particularizing",
1155
+ "passivisation": "passivization",
1156
+ "passivise": "passivize",
1157
+ "passivised": "passivized",
1158
+ "passivises": "passivizes",
1159
+ "passivising": "passivizing",
1160
+ "pasteurisation": "pasteurization",
1161
+ "pasteurise": "pasteurize",
1162
+ "pasteurised": "pasteurized",
1163
+ "pasteurises": "pasteurizes",
1164
+ "pasteurising": "pasteurizing",
1165
+ "patronise": "patronize",
1166
+ "patronised": "patronized",
1167
+ "patronises": "patronizes",
1168
+ "patronising": "patronizing",
1169
+ "patronisingly": "patronizingly",
1170
+ "pedalled": "pedaled",
1171
+ "pedalling": "pedaling",
1172
+ "pedestrianisation": "pedestrianization",
1173
+ "pedestrianise": "pedestrianize",
1174
+ "pedestrianised": "pedestrianized",
1175
+ "pedestrianises": "pedestrianizes",
1176
+ "pedestrianising": "pedestrianizing",
1177
+ "penalise": "penalize",
1178
+ "penalised": "penalized",
1179
+ "penalises": "penalizes",
1180
+ "penalising": "penalizing",
1181
+ "pencilled": "penciled",
1182
+ "pencilling": "penciling",
1183
+ "personalise": "personalize",
1184
+ "personalised": "personalized",
1185
+ "personalises": "personalizes",
1186
+ "personalising": "personalizing",
1187
+ "pharmacopoeia": "pharmacopeia",
1188
+ "pharmacopoeias": "pharmacopeias",
1189
+ "philosophise": "philosophize",
1190
+ "philosophised": "philosophized",
1191
+ "philosophises": "philosophizes",
1192
+ "philosophising": "philosophizing",
1193
+ "philtre": "filter",
1194
+ "philtres": "filters",
1195
+ "phoney": "phony",
1196
+ "plagiarise": "plagiarize",
1197
+ "plagiarised": "plagiarized",
1198
+ "plagiarises": "plagiarizes",
1199
+ "plagiarising": "plagiarizing",
1200
+ "plough": "plow",
1201
+ "ploughed": "plowed",
1202
+ "ploughing": "plowing",
1203
+ "ploughman": "plowman",
1204
+ "ploughmen": "plowmen",
1205
+ "ploughs": "plows",
1206
+ "ploughshare": "plowshare",
1207
+ "ploughshares": "plowshares",
1208
+ "polarisation": "polarization",
1209
+ "polarise": "polarize",
1210
+ "polarised": "polarized",
1211
+ "polarises": "polarizes",
1212
+ "polarising": "polarizing",
1213
+ "politicisation": "politicization",
1214
+ "politicise": "politicize",
1215
+ "politicised": "politicized",
1216
+ "politicises": "politicizes",
1217
+ "politicising": "politicizing",
1218
+ "popularisation": "popularization",
1219
+ "popularise": "popularize",
1220
+ "popularised": "popularized",
1221
+ "popularises": "popularizes",
1222
+ "popularising": "popularizing",
1223
+ "pouffe": "pouf",
1224
+ "pouffes": "poufs",
1225
+ "practise": "practice",
1226
+ "practised": "practiced",
1227
+ "practises": "practices",
1228
+ "practising": "practicing",
1229
+ "praesidium": "presidium",
1230
+ "praesidiums": "presidiums",
1231
+ "pressurisation": "pressurization",
1232
+ "pressurise": "pressurize",
1233
+ "pressurised": "pressurized",
1234
+ "pressurises": "pressurizes",
1235
+ "pressurising": "pressurizing",
1236
+ "pretence": "pretense",
1237
+ "pretences": "pretenses",
1238
+ "primaeval": "primeval",
1239
+ "prioritisation": "prioritization",
1240
+ "prioritise": "prioritize",
1241
+ "prioritised": "prioritized",
1242
+ "prioritises": "prioritizes",
1243
+ "prioritising": "prioritizing",
1244
+ "privatisation": "privatization",
1245
+ "privatisations": "privatizations",
1246
+ "privatise": "privatize",
1247
+ "privatised": "privatized",
1248
+ "privatises": "privatizes",
1249
+ "privatising": "privatizing",
1250
+ "professionalisation": "professionalization",
1251
+ "professionalise": "professionalize",
1252
+ "professionalised": "professionalized",
1253
+ "professionalises": "professionalizes",
1254
+ "professionalising": "professionalizing",
1255
+ "programme": "program",
1256
+ "programmes": "programs",
1257
+ "prologue": "prolog",
1258
+ "prologues": "prologs",
1259
+ "propagandise": "propagandize",
1260
+ "propagandised": "propagandized",
1261
+ "propagandises": "propagandizes",
1262
+ "propagandising": "propagandizing",
1263
+ "proselytise": "proselytize",
1264
+ "proselytised": "proselytized",
1265
+ "proselytiser": "proselytizer",
1266
+ "proselytisers": "proselytizers",
1267
+ "proselytises": "proselytizes",
1268
+ "proselytising": "proselytizing",
1269
+ "psychoanalyse": "psychoanalyze",
1270
+ "psychoanalysed": "psychoanalyzed",
1271
+ "psychoanalyses": "psychoanalyzes",
1272
+ "psychoanalysing": "psychoanalyzing",
1273
+ "publicise": "publicize",
1274
+ "publicised": "publicized",
1275
+ "publicises": "publicizes",
1276
+ "publicising": "publicizing",
1277
+ "pulverisation": "pulverization",
1278
+ "pulverise": "pulverize",
1279
+ "pulverised": "pulverized",
1280
+ "pulverises": "pulverizes",
1281
+ "pulverising": "pulverizing",
1282
+ "pummelled": "pummel",
1283
+ "pummelling": "pummeled",
1284
+ "pyjama": "pajama",
1285
+ "pyjamas": "pajamas",
1286
+ "pzazz": "pizzazz",
1287
+ "quarrelled": "quarreled",
1288
+ "quarrelling": "quarreling",
1289
+ "radicalise": "radicalize",
1290
+ "radicalised": "radicalized",
1291
+ "radicalises": "radicalizes",
1292
+ "radicalising": "radicalizing",
1293
+ "rancour": "rancor",
1294
+ "randomise": "randomize",
1295
+ "randomised": "randomized",
1296
+ "randomises": "randomizes",
1297
+ "randomising": "randomizing",
1298
+ "rationalisation": "rationalization",
1299
+ "rationalisations": "rationalizations",
1300
+ "rationalise": "rationalize",
1301
+ "rationalised": "rationalized",
1302
+ "rationalises": "rationalizes",
1303
+ "rationalising": "rationalizing",
1304
+ "ravelled": "raveled",
1305
+ "ravelling": "raveling",
1306
+ "realisable": "realizable",
1307
+ "realisation": "realization",
1308
+ "realisations": "realizations",
1309
+ "realise": "realize",
1310
+ "realised": "realized",
1311
+ "realises": "realizes",
1312
+ "realising": "realizing",
1313
+ "recognisable": "recognizable",
1314
+ "recognisably": "recognizably",
1315
+ "recognisance": "recognizance",
1316
+ "recognise": "recognize",
1317
+ "recognised": "recognized",
1318
+ "recognises": "recognizes",
1319
+ "recognising": "recognizing",
1320
+ "reconnoitre": "reconnoiter",
1321
+ "reconnoitred": "reconnoitered",
1322
+ "reconnoitres": "reconnoiters",
1323
+ "reconnoitring": "reconnoitering",
1324
+ "refuelled": "refueled",
1325
+ "refuelling": "refueling",
1326
+ "regularisation": "regularization",
1327
+ "regularise": "regularize",
1328
+ "regularised": "regularized",
1329
+ "regularises": "regularizes",
1330
+ "regularising": "regularizing",
1331
+ "remodelled": "remodeled",
1332
+ "remodelling": "remodeling",
1333
+ "remould": "remold",
1334
+ "remoulded": "remolded",
1335
+ "remoulding": "remolding",
1336
+ "remoulds": "remolds",
1337
+ "reorganisation": "reorganization",
1338
+ "reorganisations": "reorganizations",
1339
+ "reorganise": "reorganize",
1340
+ "reorganised": "reorganized",
1341
+ "reorganises": "reorganizes",
1342
+ "reorganising": "reorganizing",
1343
+ "revelled": "reveled",
1344
+ "reveller": "reveler",
1345
+ "revellers": "revelers",
1346
+ "revelling": "reveling",
1347
+ "revitalise": "revitalize",
1348
+ "revitalised": "revitalized",
1349
+ "revitalises": "revitalizes",
1350
+ "revitalising": "revitalizing",
1351
+ "revolutionise": "revolutionize",
1352
+ "revolutionised": "revolutionized",
1353
+ "revolutionises": "revolutionizes",
1354
+ "revolutionising": "revolutionizing",
1355
+ "rhapsodise": "rhapsodize",
1356
+ "rhapsodised": "rhapsodized",
1357
+ "rhapsodises": "rhapsodizes",
1358
+ "rhapsodising": "rhapsodizing",
1359
+ "rigour": "rigor",
1360
+ "rigours": "rigors",
1361
+ "ritualised": "ritualized",
1362
+ "rivalled": "rivaled",
1363
+ "rivalling": "rivaling",
1364
+ "romanticise": "romanticize",
1365
+ "romanticised": "romanticized",
1366
+ "romanticises": "romanticizes",
1367
+ "romanticising": "romanticizing",
1368
+ "rumour": "rumor",
1369
+ "rumoured": "rumored",
1370
+ "rumours": "rumors",
1371
+ "sabre": "saber",
1372
+ "sabres": "sabers",
1373
+ "saltpetre": "saltpeter",
1374
+ "sanitise": "sanitize",
1375
+ "sanitised": "sanitized",
1376
+ "sanitises": "sanitizes",
1377
+ "sanitising": "sanitizing",
1378
+ "satirise": "satirize",
1379
+ "satirised": "satirized",
1380
+ "satirises": "satirizes",
1381
+ "satirising": "satirizing",
1382
+ "saviour": "savior",
1383
+ "saviours": "saviors",
1384
+ "savour": "savor",
1385
+ "savoured": "savored",
1386
+ "savouries": "savories",
1387
+ "savouring": "savoring",
1388
+ "savours": "savors",
1389
+ "savoury": "savory",
1390
+ "scandalise": "scandalize",
1391
+ "scandalised": "scandalized",
1392
+ "scandalises": "scandalizes",
1393
+ "scandalising": "scandalizing",
1394
+ "sceptic": "skeptic",
1395
+ "sceptical": "skeptical",
1396
+ "sceptically": "skeptically",
1397
+ "scepticism": "skepticism",
1398
+ "sceptics": "skeptics",
1399
+ "sceptre": "scepter",
1400
+ "sceptres": "scepters",
1401
+ "scrutinise": "scrutinize",
1402
+ "scrutinised": "scrutinized",
1403
+ "scrutinises": "scrutinizes",
1404
+ "scrutinising": "scrutinizing",
1405
+ "secularisation": "secularization",
1406
+ "secularise": "secularize",
1407
+ "secularised": "secularized",
1408
+ "secularises": "secularizes",
1409
+ "secularising": "secularizing",
1410
+ "sensationalise": "sensationalize",
1411
+ "sensationalised": "sensationalized",
1412
+ "sensationalises": "sensationalizes",
1413
+ "sensationalising": "sensationalizing",
1414
+ "sensitise": "sensitize",
1415
+ "sensitised": "sensitized",
1416
+ "sensitises": "sensitizes",
1417
+ "sensitising": "sensitizing",
1418
+ "sentimentalise": "sentimentalize",
1419
+ "sentimentalised": "sentimentalized",
1420
+ "sentimentalises": "sentimentalizes",
1421
+ "sentimentalising": "sentimentalizing",
1422
+ "sepulchre": "sepulcher",
1423
+ "sepulchres": "sepulchers",
1424
+ "serialisation": "serialization",
1425
+ "serialisations": "serializations",
1426
+ "serialise": "serialize",
1427
+ "serialised": "serialized",
1428
+ "serialises": "serializes",
1429
+ "serialising": "serializing",
1430
+ "sermonise": "sermonize",
1431
+ "sermonised": "sermonized",
1432
+ "sermonises": "sermonizes",
1433
+ "sermonising": "sermonizing",
1434
+ "sheikh": "sheik",
1435
+ "shovelled": "shoveled",
1436
+ "shovelling": "shoveling",
1437
+ "shrivelled": "shriveled",
1438
+ "shrivelling": "shriveling",
1439
+ "signalise": "signalize",
1440
+ "signalised": "signalized",
1441
+ "signalises": "signalizes",
1442
+ "signalising": "signalizing",
1443
+ "signalled": "signaled",
1444
+ "signalling": "signaling",
1445
+ "smoulder": "smolder",
1446
+ "smouldered": "smoldered",
1447
+ "smouldering": "smoldering",
1448
+ "smoulders": "smolders",
1449
+ "snivelled": "sniveled",
1450
+ "snivelling": "sniveling",
1451
+ "snorkelled": "snorkeled",
1452
+ "snorkelling": "snorkeling",
1453
+ "snowplough": "snowplow",
1454
+ "snowploughs": "snowplow",
1455
+ "socialisation": "socialization",
1456
+ "socialise": "socialize",
1457
+ "socialised": "socialized",
1458
+ "socialises": "socializes",
1459
+ "socialising": "socializing",
1460
+ "sodomise": "sodomize",
1461
+ "sodomised": "sodomized",
1462
+ "sodomises": "sodomizes",
1463
+ "sodomising": "sodomizing",
1464
+ "solemnise": "solemnize",
1465
+ "solemnised": "solemnized",
1466
+ "solemnises": "solemnizes",
1467
+ "solemnising": "solemnizing",
1468
+ "sombre": "somber",
1469
+ "specialisation": "specialization",
1470
+ "specialisations": "specializations",
1471
+ "specialise": "specialize",
1472
+ "specialised": "specialized",
1473
+ "specialises": "specializes",
1474
+ "specialising": "specializing",
1475
+ "spectre": "specter",
1476
+ "spectres": "specters",
1477
+ "spiralled": "spiraled",
1478
+ "spiralling": "spiraling",
1479
+ "splendour": "splendor",
1480
+ "splendours": "splendors",
1481
+ "squirrelled": "squirreled",
1482
+ "squirrelling": "squirreling",
1483
+ "stabilisation": "stabilization",
1484
+ "stabilise": "stabilize",
1485
+ "stabilised": "stabilized",
1486
+ "stabiliser": "stabilizer",
1487
+ "stabilisers": "stabilizers",
1488
+ "stabilises": "stabilizes",
1489
+ "stabilising": "stabilizing",
1490
+ "standardisation": "standardization",
1491
+ "standardise": "standardize",
1492
+ "standardised": "standardized",
1493
+ "standardises": "standardizes",
1494
+ "standardising": "standardizing",
1495
+ "stencilled": "stenciled",
1496
+ "stencilling": "stenciling",
1497
+ "sterilisation": "sterilization",
1498
+ "sterilisations": "sterilizations",
1499
+ "sterilise": "sterilize",
1500
+ "sterilised": "sterilized",
1501
+ "steriliser": "sterilizer",
1502
+ "sterilisers": "sterilizers",
1503
+ "sterilises": "sterilizes",
1504
+ "sterilising": "sterilizing",
1505
+ "stigmatisation": "stigmatization",
1506
+ "stigmatise": "stigmatize",
1507
+ "stigmatised": "stigmatized",
1508
+ "stigmatises": "stigmatizes",
1509
+ "stigmatising": "stigmatizing",
1510
+ "storey": "story",
1511
+ "storeys": "stories",
1512
+ "subsidisation": "subsidization",
1513
+ "subsidise": "subsidize",
1514
+ "subsidised": "subsidized",
1515
+ "subsidiser": "subsidizer",
1516
+ "subsidisers": "subsidizers",
1517
+ "subsidises": "subsidizes",
1518
+ "subsidising": "subsidizing",
1519
+ "succour": "succor",
1520
+ "succoured": "succored",
1521
+ "succouring": "succoring",
1522
+ "succours": "succors",
1523
+ "sulphate": "sulfate",
1524
+ "sulphates": "sulfates",
1525
+ "sulphide": "sulfide",
1526
+ "sulphides": "sulfides",
1527
+ "sulphur": "sulfur",
1528
+ "sulphurous": "sulfurous",
1529
+ "summarise": "summarize",
1530
+ "summarised": "summarized",
1531
+ "summarises": "summarizes",
1532
+ "summarising": "summarizing",
1533
+ "swivelled": "swiveled",
1534
+ "swivelling": "swiveling",
1535
+ "symbolise": "symbolize",
1536
+ "symbolised": "symbolized",
1537
+ "symbolises": "symbolizes",
1538
+ "symbolising": "symbolizing",
1539
+ "sympathise": "sympathize",
1540
+ "sympathised": "sympathized",
1541
+ "sympathiser": "sympathizer",
1542
+ "sympathisers": "sympathizers",
1543
+ "sympathises": "sympathizes",
1544
+ "sympathising": "sympathizing",
1545
+ "synchronisation": "synchronization",
1546
+ "synchronise": "synchronize",
1547
+ "synchronised": "synchronized",
1548
+ "synchronises": "synchronizes",
1549
+ "synchronising": "synchronizing",
1550
+ "synthesise": "synthesize",
1551
+ "synthesised": "synthesized",
1552
+ "synthesiser": "synthesizer",
1553
+ "synthesisers": "synthesizers",
1554
+ "synthesises": "synthesizes",
1555
+ "synthesising": "synthesizing",
1556
+ "syphon": "siphon",
1557
+ "syphoned": "siphoned",
1558
+ "syphoning": "siphoning",
1559
+ "syphons": "siphons",
1560
+ "systematisation": "systematization",
1561
+ "systematise": "systematize",
1562
+ "systematised": "systematized",
1563
+ "systematises": "systematizes",
1564
+ "systematising": "systematizing",
1565
+ "tantalise": "tantalize",
1566
+ "tantalised": "tantalized",
1567
+ "tantalises": "tantalizes",
1568
+ "tantalising": "tantalizing",
1569
+ "tantalisingly": "tantalizingly",
1570
+ "tasselled": "tasseled",
1571
+ "technicolour": "technicolor",
1572
+ "temporise": "temporize",
1573
+ "temporised": "temporized",
1574
+ "temporises": "temporizes",
1575
+ "temporising": "temporizing",
1576
+ "tenderise": "tenderize",
1577
+ "tenderised": "tenderized",
1578
+ "tenderises": "tenderizes",
1579
+ "tenderising": "tenderizing",
1580
+ "terrorise": "terrorize",
1581
+ "terrorised": "terrorized",
1582
+ "terrorises": "terrorizes",
1583
+ "terrorising": "terrorizing",
1584
+ "theatre": "theater",
1585
+ "theatregoer": "theatergoer",
1586
+ "theatregoers": "theatergoers",
1587
+ "theatres": "theaters",
1588
+ "theorise": "theorize",
1589
+ "theorised": "theorized",
1590
+ "theorises": "theorizes",
1591
+ "theorising": "theorizing",
1592
+ "tonne": "ton",
1593
+ "tonnes": "tons",
1594
+ "towelled": "toweled",
1595
+ "towelling": "toweling",
1596
+ "toxaemia": "toxemia",
1597
+ "tranquillise": "tranquilize",
1598
+ "tranquillised": "tranquilized",
1599
+ "tranquilliser": "tranquilizer",
1600
+ "tranquillisers": "tranquilizers",
1601
+ "tranquillises": "tranquilizes",
1602
+ "tranquillising": "tranquilizing",
1603
+ "tranquillity": "tranquility",
1604
+ "tranquillize": "tranquilize",
1605
+ "tranquillized": "tranquilized",
1606
+ "tranquillizer": "tranquilizer",
1607
+ "tranquillizers": "tranquilizers",
1608
+ "tranquillizes": "tranquilizes",
1609
+ "tranquillizing": "tranquilizing",
1610
+ "tranquilly": "tranquility",
1611
+ "transistorised": "transistorized",
1612
+ "traumatise": "traumatize",
1613
+ "traumatised": "traumatized",
1614
+ "traumatises": "traumatizes",
1615
+ "traumatising": "traumatizing",
1616
+ "travelled": "traveled",
1617
+ "traveller": "traveler",
1618
+ "travellers": "travelers",
1619
+ "travelling": "traveling",
1620
+ "travelog": "travelogue",
1621
+ "travelogs": "travelogues",
1622
+ "trialled": "trialed",
1623
+ "trialling": "trialing",
1624
+ "tricolour": "tricolor",
1625
+ "tricolours": "tricolors",
1626
+ "trivialise": "trivialize",
1627
+ "trivialised": "trivialized",
1628
+ "trivialises": "trivializes",
1629
+ "trivialising": "trivializing",
1630
+ "tumour": "tumor",
1631
+ "tumours": "tumors",
1632
+ "tunnelled": "tunneled",
1633
+ "tunnelling": "tunneling",
1634
+ "tyrannise": "tyrannize",
1635
+ "tyrannised": "tyrannized",
1636
+ "tyrannises": "tyrannizes",
1637
+ "tyrannising": "tyrannizing",
1638
+ "tyre": "tire",
1639
+ "tyres": "tires",
1640
+ "unauthorised": "unauthorized",
1641
+ "uncivilised": "uncivilized",
1642
+ "underutilised": "underutilized",
1643
+ "unequalled": "unequaled",
1644
+ "unfavourable": "unfavorable",
1645
+ "unfavourably": "unfavorably",
1646
+ "unionisation": "unionization",
1647
+ "unionise": "unionize",
1648
+ "unionised": "unionized",
1649
+ "unionises": "unionizes",
1650
+ "unionising": "unionizing",
1651
+ "unorganised": "unorganized",
1652
+ "unravelled": "unraveled",
1653
+ "unravelling": "unraveling",
1654
+ "unrecognisable": "unrecognizable",
1655
+ "unrecognised": "unrecognized",
1656
+ "unrivalled": "unrivaled",
1657
+ "unsavoury": "unsavory",
1658
+ "untrammelled": "untrammeled",
1659
+ "urbanisation": "urbanization",
1660
+ "urbanise": "urbanize",
1661
+ "urbanised": "urbanized",
1662
+ "urbanises": "urbanizes",
1663
+ "urbanising": "urbanizing",
1664
+ "utilisable": "utilizable",
1665
+ "utilisation": "utilization",
1666
+ "utilise": "utilize",
1667
+ "utilised": "utilized",
1668
+ "utilises": "utilizes",
1669
+ "utilising": "utilizing",
1670
+ "valour": "valor",
1671
+ "vandalise": "vandalize",
1672
+ "vandalised": "vandalized",
1673
+ "vandalises": "vandalizes",
1674
+ "vandalising": "vandalizing",
1675
+ "vaporisation": "vaporization",
1676
+ "vaporise": "vaporize",
1677
+ "vaporised": "vaporized",
1678
+ "vaporises": "vaporizes",
1679
+ "vaporising": "vaporizing",
1680
+ "vapour": "vapor",
1681
+ "vapours": "vapors",
1682
+ "verbalise": "verbalize",
1683
+ "verbalised": "verbalized",
1684
+ "verbalises": "verbalizes",
1685
+ "verbalising": "verbalizing",
1686
+ "victimisation": "victimization",
1687
+ "victimise": "victimize",
1688
+ "victimised": "victimized",
1689
+ "victimises": "victimizes",
1690
+ "victimising": "victimizing",
1691
+ "videodisc": "videodisk",
1692
+ "videodiscs": "videodisks",
1693
+ "vigour": "vigor",
1694
+ "visualisation": "visualization",
1695
+ "visualisations": "visualizations",
1696
+ "visualise": "visualize",
1697
+ "visualised": "visualized",
1698
+ "visualises": "visualizes",
1699
+ "visualising": "visualizing",
1700
+ "vocalisation": "vocalization",
1701
+ "vocalisations": "vocalizations",
1702
+ "vocalise": "vocalize",
1703
+ "vocalised": "vocalized",
1704
+ "vocalises": "vocalizes",
1705
+ "vocalising": "vocalizing",
1706
+ "vulcanised": "vulcanized",
1707
+ "vulgarisation": "vulgarization",
1708
+ "vulgarise": "vulgarize",
1709
+ "vulgarised": "vulgarized",
1710
+ "vulgarises": "vulgarizes",
1711
+ "vulgarising": "vulgarizing",
1712
+ "waggon": "wagon",
1713
+ "waggons": "wagons",
1714
+ "watercolour": "watercolor",
1715
+ "watercolours": "watercolors",
1716
+ "weaselled": "weaseled",
1717
+ "weaselling": "weaseling",
1718
+ "westernisation": "westernization",
1719
+ "westernise": "westernize",
1720
+ "westernised": "westernized",
1721
+ "westernises": "westernizes",
1722
+ "westernising": "westernizing",
1723
+ "womanise": "womanize",
1724
+ "womanised": "womanized",
1725
+ "womaniser": "womanizer",
1726
+ "womanisers": "womanizers",
1727
+ "womanises": "womanizes",
1728
+ "womanising": "womanizing",
1729
+ "woollen": "woolen",
1730
+ "woollens": "woolens",
1731
+ "woollies": "woolies",
1732
+ "woolly": "wooly",
1733
+ "worshipped": "worshiped",
1734
+ "worshipper": "worshiper",
1735
+ "worshipping": "worshiping",
1736
+ "yodelled": "yodeled",
1737
+ "yodelling": "yodeling",
1738
+ "yoghourt": "yogurt",
1739
+ "yoghourts": "yogurts",
1740
+ "yoghurt": "yogurt",
1741
+ "yoghurts": "yogurts"
1742
+ }
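
The block above is the tail of the British-to-American spelling table in distil-small-init/normalizer.json, the word-level mapping used for Whisper-style English text normalization before scoring. As a rough illustration (not code from this repository), here is a minimal sketch of applying such a table to a transcript, assuming the file path from this commit and a simple word tokenization:

```python
import json
import re

# Minimal sketch (not part of this repo): apply the British -> American
# spelling table from normalizer.json to a transcript before computing WER.
with open("distil-small-init/normalizer.json", "r", encoding="utf-8") as f:
    spelling_map = json.load(f)

def normalize_spelling(text: str) -> str:
    # Lowercase, split into words/punctuation, and replace whole words
    # that appear as keys in the mapping.
    tokens = re.findall(r"[\w']+|[^\w\s]", text.lower())
    return " ".join(spelling_map.get(tok, tok) for tok in tokens)

print(normalize_spelling("The traveller recognised the rumour at the theatre"))
# -> "the traveler recognized the rumor at the theater"
```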
distil-small-init/preprocessor_config.json ADDED
@@ -0,0 +1,14 @@
1
+ {
2
+ "chunk_length": 30,
3
+ "feature_extractor_type": "WhisperFeatureExtractor",
4
+ "feature_size": 80,
5
+ "hop_length": 160,
6
+ "n_fft": 400,
7
+ "n_samples": 480000,
8
+ "nb_max_frames": 3000,
9
+ "padding_side": "right",
10
+ "padding_value": 0.0,
11
+ "processor_class": "WhisperProcessor",
12
+ "return_attention_mask": false,
13
+ "sampling_rate": 16000
14
+ }
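
The preprocessor settings above describe 30-second chunks of 16 kHz audio (n_samples = 480000) converted to 80-channel log-Mel features with a 400-sample FFT window and a 160-sample hop, giving nb_max_frames = 3000 frames per chunk. A minimal sketch of loading and using this feature extractor with transformers, assuming the local folder from this commit:

```python
import numpy as np
from transformers import WhisperFeatureExtractor

# Load the feature extractor defined by the preprocessor_config.json above.
feature_extractor = WhisperFeatureExtractor.from_pretrained("distil-small-init")

# One second of silence at 16 kHz; the extractor pads to 30 s (480000 samples)
# and returns an 80 x 3000 log-Mel spectrogram per example.
audio = np.zeros(16000, dtype=np.float32)
features = feature_extractor(audio, sampling_rate=16000, return_tensors="np")
print(features.input_features.shape)  # (1, 80, 3000)
```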
distil-small-init/special_tokens_map.json ADDED
@@ -0,0 +1,139 @@
1
+ {
2
+ "additional_special_tokens": [
3
+ "<|endoftext|>",
4
+ "<|startoftranscript|>",
5
+ "<|en|>",
6
+ "<|zh|>",
7
+ "<|de|>",
8
+ "<|es|>",
9
+ "<|ru|>",
10
+ "<|ko|>",
11
+ "<|fr|>",
12
+ "<|ja|>",
13
+ "<|pt|>",
14
+ "<|tr|>",
15
+ "<|pl|>",
16
+ "<|ca|>",
17
+ "<|nl|>",
18
+ "<|ar|>",
19
+ "<|sv|>",
20
+ "<|it|>",
21
+ "<|id|>",
22
+ "<|hi|>",
23
+ "<|fi|>",
24
+ "<|vi|>",
25
+ "<|he|>",
26
+ "<|uk|>",
27
+ "<|el|>",
28
+ "<|ms|>",
29
+ "<|cs|>",
30
+ "<|ro|>",
31
+ "<|da|>",
32
+ "<|hu|>",
33
+ "<|ta|>",
34
+ "<|no|>",
35
+ "<|th|>",
36
+ "<|ur|>",
37
+ "<|hr|>",
38
+ "<|bg|>",
39
+ "<|lt|>",
40
+ "<|la|>",
41
+ "<|mi|>",
42
+ "<|ml|>",
43
+ "<|cy|>",
44
+ "<|sk|>",
45
+ "<|te|>",
46
+ "<|fa|>",
47
+ "<|lv|>",
48
+ "<|bn|>",
49
+ "<|sr|>",
50
+ "<|az|>",
51
+ "<|sl|>",
52
+ "<|kn|>",
53
+ "<|et|>",
54
+ "<|mk|>",
55
+ "<|br|>",
56
+ "<|eu|>",
57
+ "<|is|>",
58
+ "<|hy|>",
59
+ "<|ne|>",
60
+ "<|mn|>",
61
+ "<|bs|>",
62
+ "<|kk|>",
63
+ "<|sq|>",
64
+ "<|sw|>",
65
+ "<|gl|>",
66
+ "<|mr|>",
67
+ "<|pa|>",
68
+ "<|si|>",
69
+ "<|km|>",
70
+ "<|sn|>",
71
+ "<|yo|>",
72
+ "<|so|>",
73
+ "<|af|>",
74
+ "<|oc|>",
75
+ "<|ka|>",
76
+ "<|be|>",
77
+ "<|tg|>",
78
+ "<|sd|>",
79
+ "<|gu|>",
80
+ "<|am|>",
81
+ "<|yi|>",
82
+ "<|lo|>",
83
+ "<|uz|>",
84
+ "<|fo|>",
85
+ "<|ht|>",
86
+ "<|ps|>",
87
+ "<|tk|>",
88
+ "<|nn|>",
89
+ "<|mt|>",
90
+ "<|sa|>",
91
+ "<|lb|>",
92
+ "<|my|>",
93
+ "<|bo|>",
94
+ "<|tl|>",
95
+ "<|mg|>",
96
+ "<|as|>",
97
+ "<|tt|>",
98
+ "<|haw|>",
99
+ "<|ln|>",
100
+ "<|ha|>",
101
+ "<|ba|>",
102
+ "<|jw|>",
103
+ "<|su|>",
104
+ "<|translate|>",
105
+ "<|transcribe|>",
106
+ "<|startoflm|>",
107
+ "<|startofprev|>",
108
+ "<|nocaptions|>",
109
+ "<|notimestamps|>"
110
+ ],
111
+ "bos_token": {
112
+ "content": "<|endoftext|>",
113
+ "lstrip": false,
114
+ "normalized": true,
115
+ "rstrip": false,
116
+ "single_word": false
117
+ },
118
+ "eos_token": {
119
+ "content": "<|endoftext|>",
120
+ "lstrip": false,
121
+ "normalized": true,
122
+ "rstrip": false,
123
+ "single_word": false
124
+ },
125
+ "pad_token": {
126
+ "content": "<|endoftext|>",
127
+ "lstrip": false,
128
+ "normalized": true,
129
+ "rstrip": false,
130
+ "single_word": false
131
+ },
132
+ "unk_token": {
133
+ "content": "<|endoftext|>",
134
+ "lstrip": false,
135
+ "normalized": true,
136
+ "rstrip": false,
137
+ "single_word": false
138
+ }
139
+ }
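
In the special-token map above, bos, eos, pad and unk all reuse <|endoftext|>, while the language and task control tokens are registered as additional special tokens so the BPE never splits them. A minimal sketch of inspecting this through the tokenizer, assuming the same local folder:

```python
from transformers import WhisperTokenizer

# Load the tokenizer whose special tokens are defined above.
tokenizer = WhisperTokenizer.from_pretrained("distil-small-init")

# bos/eos/pad/unk all resolve to the same <|endoftext|> string.
print(tokenizer.bos_token, tokenizer.eos_token, tokenizer.pad_token, tokenizer.unk_token)

# The control tokens are kept whole as additional special tokens.
print(tokenizer.additional_special_tokens[:4])
# first entries of the list above: <|endoftext|>, <|startoftranscript|>, <|en|>, <|zh|>
```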
distil-small-init/tokenizer_config.json ADDED
The diff for this file is too large to render. See raw diff
 
distil-small-init/vocab.json ADDED
The diff for this file is too large to render. See raw diff
 
generation_config.json ADDED
@@ -0,0 +1,271 @@
1
+ {
2
+ "alignment_heads": [
3
+ [
4
+ 7,
5
+ 0
6
+ ],
7
+ [
8
+ 10,
9
+ 17
10
+ ],
11
+ [
12
+ 12,
13
+ 18
14
+ ],
15
+ [
16
+ 13,
17
+ 12
18
+ ],
19
+ [
20
+ 16,
21
+ 1
22
+ ],
23
+ [
24
+ 17,
25
+ 14
26
+ ],
27
+ [
28
+ 19,
29
+ 11
30
+ ],
31
+ [
32
+ 21,
33
+ 4
34
+ ],
35
+ [
36
+ 24,
37
+ 1
38
+ ],
39
+ [
40
+ 25,
41
+ 6
42
+ ]
43
+ ],
44
+ "begin_suppress_tokens": [
45
+ 220,
46
+ 50257
47
+ ],
48
+ "bos_token_id": 50257,
49
+ "decoder_start_token_id": 50258,
50
+ "eos_token_id": 50257,
51
+ "forced_decoder_ids": [
52
+ [
53
+ 1,
54
+ 50288
55
+ ],
56
+ [
57
+ 2,
58
+ 50360
59
+ ],
60
+ [
61
+ 3,
62
+ 50364
63
+ ]
64
+ ],
65
+ "is_multilingual": true,
66
+ "lang_to_id": {
67
+ "<|af|>": 50327,
68
+ "<|am|>": 50334,
69
+ "<|ar|>": 50272,
70
+ "<|as|>": 50350,
71
+ "<|az|>": 50304,
72
+ "<|ba|>": 50355,
73
+ "<|be|>": 50330,
74
+ "<|bg|>": 50292,
75
+ "<|bn|>": 50302,
76
+ "<|bo|>": 50347,
77
+ "<|br|>": 50309,
78
+ "<|bs|>": 50315,
79
+ "<|ca|>": 50270,
80
+ "<|cs|>": 50283,
81
+ "<|cy|>": 50297,
82
+ "<|da|>": 50285,
83
+ "<|de|>": 50261,
84
+ "<|el|>": 50281,
85
+ "<|en|>": 50259,
86
+ "<|es|>": 50262,
87
+ "<|et|>": 50307,
88
+ "<|eu|>": 50310,
89
+ "<|fa|>": 50300,
90
+ "<|fi|>": 50277,
91
+ "<|fo|>": 50338,
92
+ "<|fr|>": 50265,
93
+ "<|gl|>": 50319,
94
+ "<|gu|>": 50333,
95
+ "<|haw|>": 50352,
96
+ "<|ha|>": 50354,
97
+ "<|he|>": 50279,
98
+ "<|hi|>": 50276,
99
+ "<|hr|>": 50291,
100
+ "<|ht|>": 50339,
101
+ "<|hu|>": 50286,
102
+ "<|hy|>": 50312,
103
+ "<|id|>": 50275,
104
+ "<|is|>": 50311,
105
+ "<|it|>": 50274,
106
+ "<|ja|>": 50266,
107
+ "<|jw|>": 50356,
108
+ "<|ka|>": 50329,
109
+ "<|kk|>": 50316,
110
+ "<|km|>": 50323,
111
+ "<|kn|>": 50306,
112
+ "<|ko|>": 50264,
113
+ "<|la|>": 50294,
114
+ "<|lb|>": 50345,
115
+ "<|ln|>": 50353,
116
+ "<|lo|>": 50336,
117
+ "<|lt|>": 50293,
118
+ "<|lv|>": 50301,
119
+ "<|mg|>": 50349,
120
+ "<|mi|>": 50295,
121
+ "<|mk|>": 50308,
122
+ "<|ml|>": 50296,
123
+ "<|mn|>": 50314,
124
+ "<|mr|>": 50320,
125
+ "<|ms|>": 50282,
126
+ "<|mt|>": 50343,
127
+ "<|my|>": 50346,
128
+ "<|ne|>": 50313,
129
+ "<|nl|>": 50271,
130
+ "<|nn|>": 50342,
131
+ "<|no|>": 50288,
132
+ "<|oc|>": 50328,
133
+ "<|pa|>": 50321,
134
+ "<|pl|>": 50269,
135
+ "<|ps|>": 50340,
136
+ "<|pt|>": 50267,
137
+ "<|ro|>": 50284,
138
+ "<|ru|>": 50263,
139
+ "<|sa|>": 50344,
140
+ "<|sd|>": 50332,
141
+ "<|si|>": 50322,
142
+ "<|sk|>": 50298,
143
+ "<|sl|>": 50305,
144
+ "<|sn|>": 50324,
145
+ "<|so|>": 50326,
146
+ "<|sq|>": 50317,
147
+ "<|sr|>": 50303,
148
+ "<|su|>": 50357,
149
+ "<|sv|>": 50273,
150
+ "<|sw|>": 50318,
151
+ "<|ta|>": 50287,
152
+ "<|te|>": 50299,
153
+ "<|tg|>": 50331,
154
+ "<|th|>": 50289,
155
+ "<|tk|>": 50341,
156
+ "<|tl|>": 50348,
157
+ "<|tr|>": 50268,
158
+ "<|tt|>": 50351,
159
+ "<|uk|>": 50280,
160
+ "<|ur|>": 50290,
161
+ "<|uz|>": 50337,
162
+ "<|vi|>": 50278,
163
+ "<|yi|>": 50335,
164
+ "<|yo|>": 50325,
165
+ "<|yue|>": 50358,
166
+ "<|zh|>": 50260
167
+ },
168
+ "language": "<|no|>",
169
+ "max_initial_timestamp_index": 1,
170
+ "max_length": 448,
171
+ "no_timestamps_token_id": 50364,
172
+ "pad_token_id": 50257,
173
+ "return_timestamps": false,
174
+ "suppress_tokens": [
175
+ 1,
176
+ 2,
177
+ 7,
178
+ 8,
179
+ 9,
180
+ 10,
181
+ 14,
182
+ 25,
183
+ 26,
184
+ 27,
185
+ 28,
186
+ 29,
187
+ 31,
188
+ 58,
189
+ 59,
190
+ 60,
191
+ 61,
192
+ 62,
193
+ 63,
194
+ 90,
195
+ 91,
196
+ 92,
197
+ 93,
198
+ 359,
199
+ 503,
200
+ 522,
201
+ 542,
202
+ 873,
203
+ 893,
204
+ 902,
205
+ 918,
206
+ 922,
207
+ 931,
208
+ 1350,
209
+ 1853,
210
+ 1982,
211
+ 2460,
212
+ 2627,
213
+ 3246,
214
+ 3253,
215
+ 3268,
216
+ 3536,
217
+ 3846,
218
+ 3961,
219
+ 4183,
220
+ 4667,
221
+ 6585,
222
+ 6647,
223
+ 7273,
224
+ 9061,
225
+ 9383,
226
+ 10428,
227
+ 10929,
228
+ 11938,
229
+ 12033,
230
+ 12331,
231
+ 12562,
232
+ 13793,
233
+ 14157,
234
+ 14635,
235
+ 15265,
236
+ 15618,
237
+ 16553,
238
+ 16604,
239
+ 18362,
240
+ 18956,
241
+ 20075,
242
+ 21675,
243
+ 22520,
244
+ 26130,
245
+ 26161,
246
+ 26435,
247
+ 28279,
248
+ 29464,
249
+ 31650,
250
+ 32302,
251
+ 32470,
252
+ 36865,
253
+ 42863,
254
+ 47425,
255
+ 49870,
256
+ 50254,
257
+ 50258,
258
+ 50359,
259
+ 50360,
260
+ 50361,
261
+ 50362,
262
+ 50363
263
+ ],
264
+ "task": "transcribe",
265
+ "task_to_id": {
266
+ "transcribe": 50360,
267
+ "translate": 50359
268
+ },
269
+ "transformers_version": "4.46.2",
270
+ "use_scan": false
271
+ }
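
In the generation config above, forced_decoder_ids [[1, 50288], [2, 50360], [3, 50364]] pin decoder positions 1-3 (after <|startoftranscript|>, id 50258) to <|no|>, <|transcribe|> and <|notimestamps|>, consistent with the lang_to_id / task_to_id tables and "language": "<|no|>". A minimal sketch of transcription driven by this config, with a placeholder checkpoint path and dummy audio rather than anything taken from this commit:

```python
import numpy as np
from transformers import WhisperForConditionalGeneration, WhisperProcessor

# Placeholder checkpoint path and dummy audio, for illustration only.
model = WhisperForConditionalGeneration.from_pretrained("distil-small-init")
processor = WhisperProcessor.from_pretrained("distil-small-init")

audio = np.zeros(16000, dtype=np.float32)  # stand-in for real 16 kHz Norwegian speech
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")

# language="no" and task="transcribe" resolve to the forced tokens above:
# <|no|> -> 50288 and <|transcribe|> -> 50360.
generated = model.generate(inputs.input_features, language="no", task="transcribe")
print(processor.batch_decode(generated, skip_special_tokens=True))
```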
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
nb-distil-large-init/added_tokens.json ADDED
@@ -0,0 +1,1611 @@
1
+ {
2
+ "<|0.00|>": 50365,
3
+ "<|0.02|>": 50366,
4
+ "<|0.04|>": 50367,
5
+ "<|0.06|>": 50368,
6
+ "<|0.08|>": 50369,
7
+ "<|0.10|>": 50370,
8
+ "<|0.12|>": 50371,
9
+ "<|0.14|>": 50372,
10
+ "<|0.16|>": 50373,
11
+ "<|0.18|>": 50374,
12
+ "<|0.20|>": 50375,
13
+ "<|0.22|>": 50376,
14
+ "<|0.24|>": 50377,
15
+ "<|0.26|>": 50378,
16
+ "<|0.28|>": 50379,
17
+ "<|0.30|>": 50380,
18
+ "<|0.32|>": 50381,
19
+ "<|0.34|>": 50382,
20
+ "<|0.36|>": 50383,
21
+ "<|0.38|>": 50384,
22
+ "<|0.40|>": 50385,
23
+ "<|0.42|>": 50386,
24
+ "<|0.44|>": 50387,
25
+ "<|0.46|>": 50388,
26
+ "<|0.48|>": 50389,
27
+ "<|0.50|>": 50390,
28
+ "<|0.52|>": 50391,
29
+ "<|0.54|>": 50392,
30
+ "<|0.56|>": 50393,
31
+ "<|0.58|>": 50394,
32
+ "<|0.60|>": 50395,
33
+ "<|0.62|>": 50396,
34
+ "<|0.64|>": 50397,
35
+ "<|0.66|>": 50398,
36
+ "<|0.68|>": 50399,
37
+ "<|0.70|>": 50400,
38
+ "<|0.72|>": 50401,
39
+ "<|0.74|>": 50402,
40
+ "<|0.76|>": 50403,
41
+ "<|0.78|>": 50404,
42
+ "<|0.80|>": 50405,
43
+ "<|0.82|>": 50406,
44
+ "<|0.84|>": 50407,
45
+ "<|0.86|>": 50408,
46
+ "<|0.88|>": 50409,
47
+ "<|0.90|>": 50410,
48
+ "<|0.92|>": 50411,
49
+ "<|0.94|>": 50412,
50
+ "<|0.96|>": 50413,
51
+ "<|0.98|>": 50414,
52
+ "<|1.00|>": 50415,
53
+ "<|1.02|>": 50416,
54
+ "<|1.04|>": 50417,
55
+ "<|1.06|>": 50418,
56
+ "<|1.08|>": 50419,
57
+ "<|1.10|>": 50420,
58
+ "<|1.12|>": 50421,
59
+ "<|1.14|>": 50422,
60
+ "<|1.16|>": 50423,
61
+ "<|1.18|>": 50424,
62
+ "<|1.20|>": 50425,
63
+ "<|1.22|>": 50426,
64
+ "<|1.24|>": 50427,
65
+ "<|1.26|>": 50428,
66
+ "<|1.28|>": 50429,
67
+ "<|1.30|>": 50430,
68
+ "<|1.32|>": 50431,
69
+ "<|1.34|>": 50432,
70
+ "<|1.36|>": 50433,
71
+ "<|1.38|>": 50434,
72
+ "<|1.40|>": 50435,
73
+ "<|1.42|>": 50436,
74
+ "<|1.44|>": 50437,
75
+ "<|1.46|>": 50438,
76
+ "<|1.48|>": 50439,
77
+ "<|1.50|>": 50440,
78
+ "<|1.52|>": 50441,
79
+ "<|1.54|>": 50442,
80
+ "<|1.56|>": 50443,
81
+ "<|1.58|>": 50444,
82
+ "<|1.60|>": 50445,
83
+ "<|1.62|>": 50446,
84
+ "<|1.64|>": 50447,
85
+ "<|1.66|>": 50448,
86
+ "<|1.68|>": 50449,
87
+ "<|1.70|>": 50450,
88
+ "<|1.72|>": 50451,
89
+ "<|1.74|>": 50452,
90
+ "<|1.76|>": 50453,
91
+ "<|1.78|>": 50454,
92
+ "<|1.80|>": 50455,
93
+ "<|1.82|>": 50456,
94
+ "<|1.84|>": 50457,
95
+ "<|1.86|>": 50458,
96
+ "<|1.88|>": 50459,
97
+ "<|1.90|>": 50460,
98
+ "<|1.92|>": 50461,
99
+ "<|1.94|>": 50462,
100
+ "<|1.96|>": 50463,
101
+ "<|1.98|>": 50464,
102
+ "<|10.00|>": 50865,
103
+ "<|10.02|>": 50866,
104
+ "<|10.04|>": 50867,
105
+ "<|10.06|>": 50868,
106
+ "<|10.08|>": 50869,
107
+ "<|10.10|>": 50870,
108
+ "<|10.12|>": 50871,
109
+ "<|10.14|>": 50872,
110
+ "<|10.16|>": 50873,
111
+ "<|10.18|>": 50874,
112
+ "<|10.20|>": 50875,
113
+ "<|10.22|>": 50876,
114
+ "<|10.24|>": 50877,
115
+ "<|10.26|>": 50878,
116
+ "<|10.28|>": 50879,
117
+ "<|10.30|>": 50880,
118
+ "<|10.32|>": 50881,
119
+ "<|10.34|>": 50882,
120
+ "<|10.36|>": 50883,
121
+ "<|10.38|>": 50884,
122
+ "<|10.40|>": 50885,
123
+ "<|10.42|>": 50886,
124
+ "<|10.44|>": 50887,
125
+ "<|10.46|>": 50888,
126
+ "<|10.48|>": 50889,
127
+ "<|10.50|>": 50890,
128
+ "<|10.52|>": 50891,
129
+ "<|10.54|>": 50892,
130
+ "<|10.56|>": 50893,
131
+ "<|10.58|>": 50894,
132
+ "<|10.60|>": 50895,
133
+ "<|10.62|>": 50896,
134
+ "<|10.64|>": 50897,
135
+ "<|10.66|>": 50898,
136
+ "<|10.68|>": 50899,
137
+ "<|10.70|>": 50900,
138
+ "<|10.72|>": 50901,
139
+ "<|10.74|>": 50902,
140
+ "<|10.76|>": 50903,
141
+ "<|10.78|>": 50904,
142
+ "<|10.80|>": 50905,
143
+ "<|10.82|>": 50906,
144
+ "<|10.84|>": 50907,
145
+ "<|10.86|>": 50908,
146
+ "<|10.88|>": 50909,
147
+ "<|10.90|>": 50910,
148
+ "<|10.92|>": 50911,
149
+ "<|10.94|>": 50912,
150
+ "<|10.96|>": 50913,
151
+ "<|10.98|>": 50914,
152
+ "<|11.00|>": 50915,
153
+ "<|11.02|>": 50916,
154
+ "<|11.04|>": 50917,
155
+ "<|11.06|>": 50918,
156
+ "<|11.08|>": 50919,
157
+ "<|11.10|>": 50920,
158
+ "<|11.12|>": 50921,
159
+ "<|11.14|>": 50922,
160
+ "<|11.16|>": 50923,
161
+ "<|11.18|>": 50924,
162
+ "<|11.20|>": 50925,
163
+ "<|11.22|>": 50926,
164
+ "<|11.24|>": 50927,
165
+ "<|11.26|>": 50928,
166
+ "<|11.28|>": 50929,
167
+ "<|11.30|>": 50930,
168
+ "<|11.32|>": 50931,
169
+ "<|11.34|>": 50932,
170
+ "<|11.36|>": 50933,
171
+ "<|11.38|>": 50934,
172
+ "<|11.40|>": 50935,
173
+ "<|11.42|>": 50936,
174
+ "<|11.44|>": 50937,
175
+ "<|11.46|>": 50938,
176
+ "<|11.48|>": 50939,
177
+ "<|11.50|>": 50940,
178
+ "<|11.52|>": 50941,
179
+ "<|11.54|>": 50942,
180
+ "<|11.56|>": 50943,
181
+ "<|11.58|>": 50944,
182
+ "<|11.60|>": 50945,
183
+ "<|11.62|>": 50946,
184
+ "<|11.64|>": 50947,
185
+ "<|11.66|>": 50948,
186
+ "<|11.68|>": 50949,
187
+ "<|11.70|>": 50950,
188
+ "<|11.72|>": 50951,
189
+ "<|11.74|>": 50952,
190
+ "<|11.76|>": 50953,
191
+ "<|11.78|>": 50954,
192
+ "<|11.80|>": 50955,
193
+ "<|11.82|>": 50956,
194
+ "<|11.84|>": 50957,
195
+ "<|11.86|>": 50958,
196
+ "<|11.88|>": 50959,
197
+ "<|11.90|>": 50960,
198
+ "<|11.92|>": 50961,
199
+ "<|11.94|>": 50962,
200
+ "<|11.96|>": 50963,
201
+ "<|11.98|>": 50964,
202
+ "<|12.00|>": 50965,
203
+ "<|12.02|>": 50966,
204
+ "<|12.04|>": 50967,
205
+ "<|12.06|>": 50968,
206
+ "<|12.08|>": 50969,
207
+ "<|12.10|>": 50970,
208
+ "<|12.12|>": 50971,
209
+ "<|12.14|>": 50972,
210
+ "<|12.16|>": 50973,
211
+ "<|12.18|>": 50974,
212
+ "<|12.20|>": 50975,
213
+ "<|12.22|>": 50976,
214
+ "<|12.24|>": 50977,
215
+ "<|12.26|>": 50978,
216
+ "<|12.28|>": 50979,
217
+ "<|12.30|>": 50980,
218
+ "<|12.32|>": 50981,
219
+ "<|12.34|>": 50982,
220
+ "<|12.36|>": 50983,
221
+ "<|12.38|>": 50984,
222
+ "<|12.40|>": 50985,
223
+ "<|12.42|>": 50986,
224
+ "<|12.44|>": 50987,
225
+ "<|12.46|>": 50988,
226
+ "<|12.48|>": 50989,
227
+ "<|12.50|>": 50990,
228
+ "<|12.52|>": 50991,
229
+ "<|12.54|>": 50992,
230
+ "<|12.56|>": 50993,
231
+ "<|12.58|>": 50994,
232
+ "<|12.60|>": 50995,
233
+ "<|12.62|>": 50996,
234
+ "<|12.64|>": 50997,
235
+ "<|12.66|>": 50998,
236
+ "<|12.68|>": 50999,
237
+ "<|12.70|>": 51000,
238
+ "<|12.72|>": 51001,
239
+ "<|12.74|>": 51002,
240
+ "<|12.76|>": 51003,
241
+ "<|12.78|>": 51004,
242
+ "<|12.80|>": 51005,
243
+ "<|12.82|>": 51006,
244
+ "<|12.84|>": 51007,
245
+ "<|12.86|>": 51008,
246
+ "<|12.88|>": 51009,
247
+ "<|12.90|>": 51010,
248
+ "<|12.92|>": 51011,
249
+ "<|12.94|>": 51012,
250
+ "<|12.96|>": 51013,
251
+ "<|12.98|>": 51014,
252
+ "<|13.00|>": 51015,
253
+ "<|13.02|>": 51016,
254
+ "<|13.04|>": 51017,
255
+ "<|13.06|>": 51018,
256
+ "<|13.08|>": 51019,
257
+ "<|13.10|>": 51020,
258
+ "<|13.12|>": 51021,
259
+ "<|13.14|>": 51022,
260
+ "<|13.16|>": 51023,
261
+ "<|13.18|>": 51024,
262
+ "<|13.20|>": 51025,
263
+ "<|13.22|>": 51026,
264
+ "<|13.24|>": 51027,
265
+ "<|13.26|>": 51028,
266
+ "<|13.28|>": 51029,
267
+ "<|13.30|>": 51030,
268
+ "<|13.32|>": 51031,
269
+ "<|13.34|>": 51032,
270
+ "<|13.36|>": 51033,
271
+ "<|13.38|>": 51034,
272
+ "<|13.40|>": 51035,
273
+ "<|13.42|>": 51036,
274
+ "<|13.44|>": 51037,
275
+ "<|13.46|>": 51038,
276
+ "<|13.48|>": 51039,
277
+ "<|13.50|>": 51040,
278
+ "<|13.52|>": 51041,
279
+ "<|13.54|>": 51042,
280
+ "<|13.56|>": 51043,
281
+ "<|13.58|>": 51044,
282
+ "<|13.60|>": 51045,
283
+ "<|13.62|>": 51046,
284
+ "<|13.64|>": 51047,
285
+ "<|13.66|>": 51048,
286
+ "<|13.68|>": 51049,
287
+ "<|13.70|>": 51050,
288
+ "<|13.72|>": 51051,
289
+ "<|13.74|>": 51052,
290
+ "<|13.76|>": 51053,
291
+ "<|13.78|>": 51054,
292
+ "<|13.80|>": 51055,
293
+ "<|13.82|>": 51056,
294
+ "<|13.84|>": 51057,
295
+ "<|13.86|>": 51058,
296
+ "<|13.88|>": 51059,
297
+ "<|13.90|>": 51060,
298
+ "<|13.92|>": 51061,
299
+ "<|13.94|>": 51062,
300
+ "<|13.96|>": 51063,
301
+ "<|13.98|>": 51064,
302
+ "<|14.00|>": 51065,
303
+ "<|14.02|>": 51066,
304
+ "<|14.04|>": 51067,
305
+ "<|14.06|>": 51068,
306
+ "<|14.08|>": 51069,
307
+ "<|14.10|>": 51070,
308
+ "<|14.12|>": 51071,
309
+ "<|14.14|>": 51072,
310
+ "<|14.16|>": 51073,
311
+ "<|14.18|>": 51074,
312
+ "<|14.20|>": 51075,
313
+ "<|14.22|>": 51076,
314
+ "<|14.24|>": 51077,
315
+ "<|14.26|>": 51078,
316
+ "<|14.28|>": 51079,
317
+ "<|14.30|>": 51080,
318
+ "<|14.32|>": 51081,
319
+ "<|14.34|>": 51082,
320
+ "<|14.36|>": 51083,
321
+ "<|14.38|>": 51084,
322
+ "<|14.40|>": 51085,
323
+ "<|14.42|>": 51086,
324
+ "<|14.44|>": 51087,
325
+ "<|14.46|>": 51088,
326
+ "<|14.48|>": 51089,
327
+ "<|14.50|>": 51090,
328
+ "<|14.52|>": 51091,
329
+ "<|14.54|>": 51092,
330
+ "<|14.56|>": 51093,
331
+ "<|14.58|>": 51094,
332
+ "<|14.60|>": 51095,
333
+ "<|14.62|>": 51096,
334
+ "<|14.64|>": 51097,
335
+ "<|14.66|>": 51098,
336
+ "<|14.68|>": 51099,
337
+ "<|14.70|>": 51100,
338
+ "<|14.72|>": 51101,
339
+ "<|14.74|>": 51102,
340
+ "<|14.76|>": 51103,
341
+ "<|14.78|>": 51104,
342
+ "<|14.80|>": 51105,
343
+ "<|14.82|>": 51106,
344
+ "<|14.84|>": 51107,
345
+ "<|14.86|>": 51108,
346
+ "<|14.88|>": 51109,
347
+ "<|14.90|>": 51110,
348
+ "<|14.92|>": 51111,
349
+ "<|14.94|>": 51112,
350
+ "<|14.96|>": 51113,
351
+ "<|14.98|>": 51114,
352
+ "<|15.00|>": 51115,
353
+ "<|15.02|>": 51116,
354
+ "<|15.04|>": 51117,
355
+ "<|15.06|>": 51118,
356
+ "<|15.08|>": 51119,
357
+ "<|15.10|>": 51120,
358
+ "<|15.12|>": 51121,
359
+ "<|15.14|>": 51122,
360
+ "<|15.16|>": 51123,
361
+ "<|15.18|>": 51124,
362
+ "<|15.20|>": 51125,
363
+ "<|15.22|>": 51126,
364
+ "<|15.24|>": 51127,
365
+ "<|15.26|>": 51128,
366
+ "<|15.28|>": 51129,
367
+ "<|15.30|>": 51130,
368
+ "<|15.32|>": 51131,
369
+ "<|15.34|>": 51132,
370
+ "<|15.36|>": 51133,
371
+ "<|15.38|>": 51134,
372
+ "<|15.40|>": 51135,
373
+ "<|15.42|>": 51136,
374
+ "<|15.44|>": 51137,
375
+ "<|15.46|>": 51138,
376
+ "<|15.48|>": 51139,
377
+ "<|15.50|>": 51140,
378
+ "<|15.52|>": 51141,
379
+ "<|15.54|>": 51142,
380
+ "<|15.56|>": 51143,
381
+ "<|15.58|>": 51144,
382
+ "<|15.60|>": 51145,
383
+ "<|15.62|>": 51146,
384
+ "<|15.64|>": 51147,
385
+ "<|15.66|>": 51148,
386
+ "<|15.68|>": 51149,
387
+ "<|15.70|>": 51150,
388
+ "<|15.72|>": 51151,
389
+ "<|15.74|>": 51152,
390
+ "<|15.76|>": 51153,
391
+ "<|15.78|>": 51154,
392
+ "<|15.80|>": 51155,
393
+ "<|15.82|>": 51156,
394
+ "<|15.84|>": 51157,
395
+ "<|15.86|>": 51158,
396
+ "<|15.88|>": 51159,
397
+ "<|15.90|>": 51160,
398
+ "<|15.92|>": 51161,
399
+ "<|15.94|>": 51162,
400
+ "<|15.96|>": 51163,
401
+ "<|15.98|>": 51164,
402
+ "<|16.00|>": 51165,
403
+ "<|16.02|>": 51166,
404
+ "<|16.04|>": 51167,
405
+ "<|16.06|>": 51168,
406
+ "<|16.08|>": 51169,
407
+ "<|16.10|>": 51170,
408
+ "<|16.12|>": 51171,
409
+ "<|16.14|>": 51172,
410
+ "<|16.16|>": 51173,
411
+ "<|16.18|>": 51174,
412
+ "<|16.20|>": 51175,
413
+ "<|16.22|>": 51176,
414
+ "<|16.24|>": 51177,
415
+ "<|16.26|>": 51178,
416
+ "<|16.28|>": 51179,
417
+ "<|16.30|>": 51180,
418
+ "<|16.32|>": 51181,
419
+ "<|16.34|>": 51182,
420
+ "<|16.36|>": 51183,
421
+ "<|16.38|>": 51184,
422
+ "<|16.40|>": 51185,
423
+ "<|16.42|>": 51186,
424
+ "<|16.44|>": 51187,
425
+ "<|16.46|>": 51188,
426
+ "<|16.48|>": 51189,
427
+ "<|16.50|>": 51190,
428
+ "<|16.52|>": 51191,
429
+ "<|16.54|>": 51192,
430
+ "<|16.56|>": 51193,
431
+ "<|16.58|>": 51194,
432
+ "<|16.60|>": 51195,
433
+ "<|16.62|>": 51196,
434
+ "<|16.64|>": 51197,
435
+ "<|16.66|>": 51198,
436
+ "<|16.68|>": 51199,
437
+ "<|16.70|>": 51200,
438
+ "<|16.72|>": 51201,
439
+ "<|16.74|>": 51202,
440
+ "<|16.76|>": 51203,
441
+ "<|16.78|>": 51204,
442
+ "<|16.80|>": 51205,
443
+ "<|16.82|>": 51206,
444
+ "<|16.84|>": 51207,
445
+ "<|16.86|>": 51208,
446
+ "<|16.88|>": 51209,
447
+ "<|16.90|>": 51210,
448
+ "<|16.92|>": 51211,
449
+ "<|16.94|>": 51212,
450
+ "<|16.96|>": 51213,
451
+ "<|16.98|>": 51214,
452
+ "<|17.00|>": 51215,
453
+ "<|17.02|>": 51216,
454
+ "<|17.04|>": 51217,
455
+ "<|17.06|>": 51218,
456
+ "<|17.08|>": 51219,
457
+ "<|17.10|>": 51220,
458
+ "<|17.12|>": 51221,
459
+ "<|17.14|>": 51222,
460
+ "<|17.16|>": 51223,
461
+ "<|17.18|>": 51224,
462
+ "<|17.20|>": 51225,
463
+ "<|17.22|>": 51226,
464
+ "<|17.24|>": 51227,
465
+ "<|17.26|>": 51228,
466
+ "<|17.28|>": 51229,
467
+ "<|17.30|>": 51230,
468
+ "<|17.32|>": 51231,
469
+ "<|17.34|>": 51232,
470
+ "<|17.36|>": 51233,
471
+ "<|17.38|>": 51234,
472
+ "<|17.40|>": 51235,
473
+ "<|17.42|>": 51236,
474
+ "<|17.44|>": 51237,
475
+ "<|17.46|>": 51238,
476
+ "<|17.48|>": 51239,
477
+ "<|17.50|>": 51240,
478
+ "<|17.52|>": 51241,
479
+ "<|17.54|>": 51242,
480
+ "<|17.56|>": 51243,
481
+ "<|17.58|>": 51244,
482
+ "<|17.60|>": 51245,
483
+ "<|17.62|>": 51246,
484
+ "<|17.64|>": 51247,
485
+ "<|17.66|>": 51248,
486
+ "<|17.68|>": 51249,
487
+ "<|17.70|>": 51250,
488
+ "<|17.72|>": 51251,
489
+ "<|17.74|>": 51252,
490
+ "<|17.76|>": 51253,
491
+ "<|17.78|>": 51254,
492
+ "<|17.80|>": 51255,
493
+ "<|17.82|>": 51256,
494
+ "<|17.84|>": 51257,
495
+ "<|17.86|>": 51258,
496
+ "<|17.88|>": 51259,
497
+ "<|17.90|>": 51260,
498
+ "<|17.92|>": 51261,
499
+ "<|17.94|>": 51262,
500
+ "<|17.96|>": 51263,
501
+ "<|17.98|>": 51264,
502
+ "<|18.00|>": 51265,
503
+ "<|18.02|>": 51266,
504
+ "<|18.04|>": 51267,
505
+ "<|18.06|>": 51268,
506
+ "<|18.08|>": 51269,
507
+ "<|18.10|>": 51270,
508
+ "<|18.12|>": 51271,
509
+ "<|18.14|>": 51272,
510
+ "<|18.16|>": 51273,
511
+ "<|18.18|>": 51274,
512
+ "<|18.20|>": 51275,
513
+ "<|18.22|>": 51276,
514
+ "<|18.24|>": 51277,
515
+ "<|18.26|>": 51278,
516
+ "<|18.28|>": 51279,
517
+ "<|18.30|>": 51280,
518
+ "<|18.32|>": 51281,
519
+ "<|18.34|>": 51282,
520
+ "<|18.36|>": 51283,
521
+ "<|18.38|>": 51284,
522
+ "<|18.40|>": 51285,
523
+ "<|18.42|>": 51286,
524
+ "<|18.44|>": 51287,
525
+ "<|18.46|>": 51288,
526
+ "<|18.48|>": 51289,
527
+ "<|18.50|>": 51290,
528
+ "<|18.52|>": 51291,
529
+ "<|18.54|>": 51292,
530
+ "<|18.56|>": 51293,
531
+ "<|18.58|>": 51294,
532
+ "<|18.60|>": 51295,
533
+ "<|18.62|>": 51296,
534
+ "<|18.64|>": 51297,
535
+ "<|18.66|>": 51298,
536
+ "<|18.68|>": 51299,
537
+ "<|18.70|>": 51300,
538
+ "<|18.72|>": 51301,
539
+ "<|18.74|>": 51302,
540
+ "<|18.76|>": 51303,
541
+ "<|18.78|>": 51304,
542
+ "<|18.80|>": 51305,
543
+ "<|18.82|>": 51306,
544
+ "<|18.84|>": 51307,
545
+ "<|18.86|>": 51308,
546
+ "<|18.88|>": 51309,
547
+ "<|18.90|>": 51310,
548
+ "<|18.92|>": 51311,
549
+ "<|18.94|>": 51312,
550
+ "<|18.96|>": 51313,
551
+ "<|18.98|>": 51314,
552
+ "<|19.00|>": 51315,
553
+ "<|19.02|>": 51316,
554
+ "<|19.04|>": 51317,
555
+ "<|19.06|>": 51318,
556
+ "<|19.08|>": 51319,
557
+ "<|19.10|>": 51320,
558
+ "<|19.12|>": 51321,
559
+ "<|19.14|>": 51322,
560
+ "<|19.16|>": 51323,
561
+ "<|19.18|>": 51324,
562
+ "<|19.20|>": 51325,
563
+ "<|19.22|>": 51326,
564
+ "<|19.24|>": 51327,
565
+ "<|19.26|>": 51328,
566
+ "<|19.28|>": 51329,
567
+ "<|19.30|>": 51330,
568
+ "<|19.32|>": 51331,
569
+ "<|19.34|>": 51332,
570
+ "<|19.36|>": 51333,
571
+ "<|19.38|>": 51334,
572
+ "<|19.40|>": 51335,
573
+ "<|19.42|>": 51336,
574
+ "<|19.44|>": 51337,
575
+ "<|19.46|>": 51338,
576
+ "<|19.48|>": 51339,
577
+ "<|19.50|>": 51340,
578
+ "<|19.52|>": 51341,
579
+ "<|19.54|>": 51342,
580
+ "<|19.56|>": 51343,
581
+ "<|19.58|>": 51344,
582
+ "<|19.60|>": 51345,
583
+ "<|19.62|>": 51346,
584
+ "<|19.64|>": 51347,
585
+ "<|19.66|>": 51348,
586
+ "<|19.68|>": 51349,
587
+ "<|19.70|>": 51350,
588
+ "<|19.72|>": 51351,
589
+ "<|19.74|>": 51352,
590
+ "<|19.76|>": 51353,
591
+ "<|19.78|>": 51354,
592
+ "<|19.80|>": 51355,
593
+ "<|19.82|>": 51356,
594
+ "<|19.84|>": 51357,
595
+ "<|19.86|>": 51358,
596
+ "<|19.88|>": 51359,
597
+ "<|19.90|>": 51360,
598
+ "<|19.92|>": 51361,
599
+ "<|19.94|>": 51362,
600
+ "<|19.96|>": 51363,
601
+ "<|19.98|>": 51364,
602
+ "<|2.00|>": 50465,
603
+ "<|2.02|>": 50466,
604
+ "<|2.04|>": 50467,
605
+ "<|2.06|>": 50468,
606
+ "<|2.08|>": 50469,
607
+ "<|2.10|>": 50470,
608
+ "<|2.12|>": 50471,
609
+ "<|2.14|>": 50472,
610
+ "<|2.16|>": 50473,
611
+ "<|2.18|>": 50474,
612
+ "<|2.20|>": 50475,
613
+ "<|2.22|>": 50476,
614
+ "<|2.24|>": 50477,
615
+ "<|2.26|>": 50478,
616
+ "<|2.28|>": 50479,
617
+ "<|2.30|>": 50480,
618
+ "<|2.32|>": 50481,
619
+ "<|2.34|>": 50482,
620
+ "<|2.36|>": 50483,
621
+ "<|2.38|>": 50484,
622
+ "<|2.40|>": 50485,
623
+ "<|2.42|>": 50486,
624
+ "<|2.44|>": 50487,
625
+ "<|2.46|>": 50488,
626
+ "<|2.48|>": 50489,
627
+ "<|2.50|>": 50490,
628
+ "<|2.52|>": 50491,
629
+ "<|2.54|>": 50492,
630
+ "<|2.56|>": 50493,
631
+ "<|2.58|>": 50494,
632
+ "<|2.60|>": 50495,
633
+ "<|2.62|>": 50496,
634
+ "<|2.64|>": 50497,
635
+ "<|2.66|>": 50498,
636
+ "<|2.68|>": 50499,
637
+ "<|2.70|>": 50500,
638
+ "<|2.72|>": 50501,
639
+ "<|2.74|>": 50502,
640
+ "<|2.76|>": 50503,
641
+ "<|2.78|>": 50504,
642
+ "<|2.80|>": 50505,
643
+ "<|2.82|>": 50506,
644
+ "<|2.84|>": 50507,
645
+ "<|2.86|>": 50508,
646
+ "<|2.88|>": 50509,
647
+ "<|2.90|>": 50510,
648
+ "<|2.92|>": 50511,
649
+ "<|2.94|>": 50512,
650
+ "<|2.96|>": 50513,
651
+ "<|2.98|>": 50514,
652
+ "<|20.00|>": 51365,
653
+ "<|20.02|>": 51366,
654
+ "<|20.04|>": 51367,
655
+ "<|20.06|>": 51368,
656
+ "<|20.08|>": 51369,
657
+ "<|20.10|>": 51370,
658
+ "<|20.12|>": 51371,
659
+ "<|20.14|>": 51372,
660
+ "<|20.16|>": 51373,
661
+ "<|20.18|>": 51374,
662
+ "<|20.20|>": 51375,
663
+ "<|20.22|>": 51376,
664
+ "<|20.24|>": 51377,
665
+ "<|20.26|>": 51378,
666
+ "<|20.28|>": 51379,
667
+ "<|20.30|>": 51380,
668
+ "<|20.32|>": 51381,
669
+ "<|20.34|>": 51382,
670
+ "<|20.36|>": 51383,
671
+ "<|20.38|>": 51384,
672
+ "<|20.40|>": 51385,
673
+ "<|20.42|>": 51386,
674
+ "<|20.44|>": 51387,
675
+ "<|20.46|>": 51388,
676
+ "<|20.48|>": 51389,
677
+ "<|20.50|>": 51390,
678
+ "<|20.52|>": 51391,
679
+ "<|20.54|>": 51392,
680
+ "<|20.56|>": 51393,
681
+ "<|20.58|>": 51394,
682
+ "<|20.60|>": 51395,
683
+ "<|20.62|>": 51396,
684
+ "<|20.64|>": 51397,
685
+ "<|20.66|>": 51398,
686
+ "<|20.68|>": 51399,
687
+ "<|20.70|>": 51400,
688
+ "<|20.72|>": 51401,
689
+ "<|20.74|>": 51402,
690
+ "<|20.76|>": 51403,
691
+ "<|20.78|>": 51404,
692
+ "<|20.80|>": 51405,
693
+ "<|20.82|>": 51406,
694
+ "<|20.84|>": 51407,
695
+ "<|20.86|>": 51408,
696
+ "<|20.88|>": 51409,
697
+ "<|20.90|>": 51410,
698
+ "<|20.92|>": 51411,
699
+ "<|20.94|>": 51412,
700
+ "<|20.96|>": 51413,
701
+ "<|20.98|>": 51414,
702
+ "<|21.00|>": 51415,
703
+ "<|21.02|>": 51416,
704
+ "<|21.04|>": 51417,
705
+ "<|21.06|>": 51418,
706
+ "<|21.08|>": 51419,
707
+ "<|21.10|>": 51420,
708
+ "<|21.12|>": 51421,
709
+ "<|21.14|>": 51422,
710
+ "<|21.16|>": 51423,
711
+ "<|21.18|>": 51424,
712
+ "<|21.20|>": 51425,
713
+ "<|21.22|>": 51426,
714
+ "<|21.24|>": 51427,
715
+ "<|21.26|>": 51428,
716
+ "<|21.28|>": 51429,
717
+ "<|21.30|>": 51430,
718
+ "<|21.32|>": 51431,
719
+ "<|21.34|>": 51432,
720
+ "<|21.36|>": 51433,
721
+ "<|21.38|>": 51434,
722
+ "<|21.40|>": 51435,
723
+ "<|21.42|>": 51436,
724
+ "<|21.44|>": 51437,
725
+ "<|21.46|>": 51438,
726
+ "<|21.48|>": 51439,
727
+ "<|21.50|>": 51440,
728
+ "<|21.52|>": 51441,
729
+ "<|21.54|>": 51442,
730
+ "<|21.56|>": 51443,
731
+ "<|21.58|>": 51444,
732
+ "<|21.60|>": 51445,
733
+ "<|21.62|>": 51446,
734
+ "<|21.64|>": 51447,
735
+ "<|21.66|>": 51448,
736
+ "<|21.68|>": 51449,
737
+ "<|21.70|>": 51450,
738
+ "<|21.72|>": 51451,
739
+ "<|21.74|>": 51452,
740
+ "<|21.76|>": 51453,
741
+ "<|21.78|>": 51454,
742
+ "<|21.80|>": 51455,
743
+ "<|21.82|>": 51456,
744
+ "<|21.84|>": 51457,
745
+ "<|21.86|>": 51458,
746
+ "<|21.88|>": 51459,
747
+ "<|21.90|>": 51460,
748
+ "<|21.92|>": 51461,
749
+ "<|21.94|>": 51462,
750
+ "<|21.96|>": 51463,
751
+ "<|21.98|>": 51464,
752
+ "<|22.00|>": 51465,
753
+ "<|22.02|>": 51466,
754
+ "<|22.04|>": 51467,
755
+ "<|22.06|>": 51468,
756
+ "<|22.08|>": 51469,
757
+ "<|22.10|>": 51470,
758
+ "<|22.12|>": 51471,
759
+ "<|22.14|>": 51472,
760
+ "<|22.16|>": 51473,
761
+ "<|22.18|>": 51474,
762
+ "<|22.20|>": 51475,
763
+ "<|22.22|>": 51476,
764
+ "<|22.24|>": 51477,
765
+ "<|22.26|>": 51478,
766
+ "<|22.28|>": 51479,
767
+ "<|22.30|>": 51480,
768
+ "<|22.32|>": 51481,
769
+ "<|22.34|>": 51482,
770
+ "<|22.36|>": 51483,
771
+ "<|22.38|>": 51484,
772
+ "<|22.40|>": 51485,
773
+ "<|22.42|>": 51486,
774
+ "<|22.44|>": 51487,
775
+ "<|22.46|>": 51488,
776
+ "<|22.48|>": 51489,
777
+ "<|22.50|>": 51490,
778
+ "<|22.52|>": 51491,
779
+ "<|22.54|>": 51492,
780
+ "<|22.56|>": 51493,
781
+ "<|22.58|>": 51494,
782
+ "<|22.60|>": 51495,
783
+ "<|22.62|>": 51496,
784
+ "<|22.64|>": 51497,
785
+ "<|22.66|>": 51498,
786
+ "<|22.68|>": 51499,
787
+ "<|22.70|>": 51500,
788
+ "<|22.72|>": 51501,
789
+ "<|22.74|>": 51502,
790
+ "<|22.76|>": 51503,
791
+ "<|22.78|>": 51504,
792
+ "<|22.80|>": 51505,
793
+ "<|22.82|>": 51506,
794
+ "<|22.84|>": 51507,
795
+ "<|22.86|>": 51508,
796
+ "<|22.88|>": 51509,
797
+ "<|22.90|>": 51510,
798
+ "<|22.92|>": 51511,
799
+ "<|22.94|>": 51512,
800
+ "<|22.96|>": 51513,
801
+ "<|22.98|>": 51514,
802
+ "<|23.00|>": 51515,
803
+ "<|23.02|>": 51516,
804
+ "<|23.04|>": 51517,
805
+ "<|23.06|>": 51518,
806
+ "<|23.08|>": 51519,
807
+ "<|23.10|>": 51520,
808
+ "<|23.12|>": 51521,
809
+ "<|23.14|>": 51522,
810
+ "<|23.16|>": 51523,
811
+ "<|23.18|>": 51524,
812
+ "<|23.20|>": 51525,
813
+ "<|23.22|>": 51526,
814
+ "<|23.24|>": 51527,
815
+ "<|23.26|>": 51528,
816
+ "<|23.28|>": 51529,
817
+ "<|23.30|>": 51530,
818
+ "<|23.32|>": 51531,
819
+ "<|23.34|>": 51532,
820
+ "<|23.36|>": 51533,
821
+ "<|23.38|>": 51534,
822
+ "<|23.40|>": 51535,
823
+ "<|23.42|>": 51536,
824
+ "<|23.44|>": 51537,
825
+ "<|23.46|>": 51538,
826
+ "<|23.48|>": 51539,
827
+ "<|23.50|>": 51540,
828
+ "<|23.52|>": 51541,
829
+ "<|23.54|>": 51542,
830
+ "<|23.56|>": 51543,
831
+ "<|23.58|>": 51544,
832
+ "<|23.60|>": 51545,
833
+ "<|23.62|>": 51546,
834
+ "<|23.64|>": 51547,
835
+ "<|23.66|>": 51548,
836
+ "<|23.68|>": 51549,
837
+ "<|23.70|>": 51550,
838
+ "<|23.72|>": 51551,
839
+ "<|23.74|>": 51552,
840
+ "<|23.76|>": 51553,
841
+ "<|23.78|>": 51554,
842
+ "<|23.80|>": 51555,
843
+ "<|23.82|>": 51556,
844
+ "<|23.84|>": 51557,
845
+ "<|23.86|>": 51558,
846
+ "<|23.88|>": 51559,
847
+ "<|23.90|>": 51560,
848
+ "<|23.92|>": 51561,
849
+ "<|23.94|>": 51562,
850
+ "<|23.96|>": 51563,
851
+ "<|23.98|>": 51564,
852
+ "<|24.00|>": 51565,
853
+ "<|24.02|>": 51566,
854
+ "<|24.04|>": 51567,
855
+ "<|24.06|>": 51568,
856
+ "<|24.08|>": 51569,
857
+ "<|24.10|>": 51570,
858
+ "<|24.12|>": 51571,
859
+ "<|24.14|>": 51572,
860
+ "<|24.16|>": 51573,
861
+ "<|24.18|>": 51574,
862
+ "<|24.20|>": 51575,
863
+ "<|24.22|>": 51576,
864
+ "<|24.24|>": 51577,
865
+ "<|24.26|>": 51578,
866
+ "<|24.28|>": 51579,
867
+ "<|24.30|>": 51580,
868
+ "<|24.32|>": 51581,
869
+ "<|24.34|>": 51582,
870
+ "<|24.36|>": 51583,
871
+ "<|24.38|>": 51584,
872
+ "<|24.40|>": 51585,
873
+ "<|24.42|>": 51586,
874
+ "<|24.44|>": 51587,
875
+ "<|24.46|>": 51588,
876
+ "<|24.48|>": 51589,
877
+ "<|24.50|>": 51590,
878
+ "<|24.52|>": 51591,
879
+ "<|24.54|>": 51592,
880
+ "<|24.56|>": 51593,
881
+ "<|24.58|>": 51594,
882
+ "<|24.60|>": 51595,
883
+ "<|24.62|>": 51596,
884
+ "<|24.64|>": 51597,
885
+ "<|24.66|>": 51598,
886
+ "<|24.68|>": 51599,
887
+ "<|24.70|>": 51600,
888
+ "<|24.72|>": 51601,
889
+ "<|24.74|>": 51602,
890
+ "<|24.76|>": 51603,
891
+ "<|24.78|>": 51604,
892
+ "<|24.80|>": 51605,
893
+ "<|24.82|>": 51606,
894
+ "<|24.84|>": 51607,
895
+ "<|24.86|>": 51608,
896
+ "<|24.88|>": 51609,
897
+ "<|24.90|>": 51610,
898
+ "<|24.92|>": 51611,
899
+ "<|24.94|>": 51612,
900
+ "<|24.96|>": 51613,
901
+ "<|24.98|>": 51614,
902
+ "<|25.00|>": 51615,
903
+ "<|25.02|>": 51616,
904
+ "<|25.04|>": 51617,
905
+ "<|25.06|>": 51618,
906
+ "<|25.08|>": 51619,
907
+ "<|25.10|>": 51620,
908
+ "<|25.12|>": 51621,
909
+ "<|25.14|>": 51622,
910
+ "<|25.16|>": 51623,
911
+ "<|25.18|>": 51624,
912
+ "<|25.20|>": 51625,
913
+ "<|25.22|>": 51626,
914
+ "<|25.24|>": 51627,
915
+ "<|25.26|>": 51628,
916
+ "<|25.28|>": 51629,
917
+ "<|25.30|>": 51630,
918
+ "<|25.32|>": 51631,
919
+ "<|25.34|>": 51632,
920
+ "<|25.36|>": 51633,
921
+ "<|25.38|>": 51634,
922
+ "<|25.40|>": 51635,
923
+ "<|25.42|>": 51636,
924
+ "<|25.44|>": 51637,
925
+ "<|25.46|>": 51638,
926
+ "<|25.48|>": 51639,
927
+ "<|25.50|>": 51640,
928
+ "<|25.52|>": 51641,
929
+ "<|25.54|>": 51642,
930
+ "<|25.56|>": 51643,
931
+ "<|25.58|>": 51644,
932
+ "<|25.60|>": 51645,
933
+ "<|25.62|>": 51646,
934
+ "<|25.64|>": 51647,
935
+ "<|25.66|>": 51648,
936
+ "<|25.68|>": 51649,
937
+ "<|25.70|>": 51650,
938
+ "<|25.72|>": 51651,
939
+ "<|25.74|>": 51652,
940
+ "<|25.76|>": 51653,
941
+ "<|25.78|>": 51654,
942
+ "<|25.80|>": 51655,
943
+ "<|25.82|>": 51656,
944
+ "<|25.84|>": 51657,
945
+ "<|25.86|>": 51658,
946
+ "<|25.88|>": 51659,
947
+ "<|25.90|>": 51660,
948
+ "<|25.92|>": 51661,
949
+ "<|25.94|>": 51662,
950
+ "<|25.96|>": 51663,
951
+ "<|25.98|>": 51664,
952
+ "<|26.00|>": 51665,
953
+ "<|26.02|>": 51666,
954
+ "<|26.04|>": 51667,
955
+ "<|26.06|>": 51668,
956
+ "<|26.08|>": 51669,
957
+ "<|26.10|>": 51670,
958
+ "<|26.12|>": 51671,
959
+ "<|26.14|>": 51672,
960
+ "<|26.16|>": 51673,
961
+ "<|26.18|>": 51674,
962
+ "<|26.20|>": 51675,
963
+ "<|26.22|>": 51676,
964
+ "<|26.24|>": 51677,
965
+ "<|26.26|>": 51678,
966
+ "<|26.28|>": 51679,
967
+ "<|26.30|>": 51680,
968
+ "<|26.32|>": 51681,
969
+ "<|26.34|>": 51682,
970
+ "<|26.36|>": 51683,
971
+ "<|26.38|>": 51684,
972
+ "<|26.40|>": 51685,
973
+ "<|26.42|>": 51686,
974
+ "<|26.44|>": 51687,
975
+ "<|26.46|>": 51688,
976
+ "<|26.48|>": 51689,
977
+ "<|26.50|>": 51690,
978
+ "<|26.52|>": 51691,
979
+ "<|26.54|>": 51692,
980
+ "<|26.56|>": 51693,
981
+ "<|26.58|>": 51694,
982
+ "<|26.60|>": 51695,
983
+ "<|26.62|>": 51696,
984
+ "<|26.64|>": 51697,
985
+ "<|26.66|>": 51698,
986
+ "<|26.68|>": 51699,
987
+ "<|26.70|>": 51700,
988
+ "<|26.72|>": 51701,
989
+ "<|26.74|>": 51702,
990
+ "<|26.76|>": 51703,
991
+ "<|26.78|>": 51704,
992
+ "<|26.80|>": 51705,
993
+ "<|26.82|>": 51706,
994
+ "<|26.84|>": 51707,
995
+ "<|26.86|>": 51708,
996
+ "<|26.88|>": 51709,
997
+ "<|26.90|>": 51710,
998
+ "<|26.92|>": 51711,
999
+ "<|26.94|>": 51712,
1000
+ "<|26.96|>": 51713,
1001
+ "<|26.98|>": 51714,
1002
+ "<|27.00|>": 51715,
1003
+ "<|27.02|>": 51716,
1004
+ "<|27.04|>": 51717,
1005
+ "<|27.06|>": 51718,
1006
+ "<|27.08|>": 51719,
1007
+ "<|27.10|>": 51720,
1008
+ "<|27.12|>": 51721,
1009
+ "<|27.14|>": 51722,
1010
+ "<|27.16|>": 51723,
1011
+ "<|27.18|>": 51724,
1012
+ "<|27.20|>": 51725,
1013
+ "<|27.22|>": 51726,
1014
+ "<|27.24|>": 51727,
1015
+ "<|27.26|>": 51728,
1016
+ "<|27.28|>": 51729,
1017
+ "<|27.30|>": 51730,
1018
+ "<|27.32|>": 51731,
1019
+ "<|27.34|>": 51732,
1020
+ "<|27.36|>": 51733,
1021
+ "<|27.38|>": 51734,
1022
+ "<|27.40|>": 51735,
1023
+ "<|27.42|>": 51736,
1024
+ "<|27.44|>": 51737,
1025
+ "<|27.46|>": 51738,
1026
+ "<|27.48|>": 51739,
1027
+ "<|27.50|>": 51740,
1028
+ "<|27.52|>": 51741,
1029
+ "<|27.54|>": 51742,
1030
+ "<|27.56|>": 51743,
1031
+ "<|27.58|>": 51744,
1032
+ "<|27.60|>": 51745,
1033
+ "<|27.62|>": 51746,
1034
+ "<|27.64|>": 51747,
1035
+ "<|27.66|>": 51748,
1036
+ "<|27.68|>": 51749,
1037
+ "<|27.70|>": 51750,
1038
+ "<|27.72|>": 51751,
1039
+ "<|27.74|>": 51752,
1040
+ "<|27.76|>": 51753,
1041
+ "<|27.78|>": 51754,
1042
+ "<|27.80|>": 51755,
1043
+ "<|27.82|>": 51756,
1044
+ "<|27.84|>": 51757,
1045
+ "<|27.86|>": 51758,
1046
+ "<|27.88|>": 51759,
1047
+ "<|27.90|>": 51760,
1048
+ "<|27.92|>": 51761,
1049
+ "<|27.94|>": 51762,
1050
+ "<|27.96|>": 51763,
1051
+ "<|27.98|>": 51764,
1052
+ "<|28.00|>": 51765,
1053
+ "<|28.02|>": 51766,
1054
+ "<|28.04|>": 51767,
1055
+ "<|28.06|>": 51768,
1056
+ "<|28.08|>": 51769,
1057
+ "<|28.10|>": 51770,
1058
+ "<|28.12|>": 51771,
1059
+ "<|28.14|>": 51772,
1060
+ "<|28.16|>": 51773,
1061
+ "<|28.18|>": 51774,
1062
+ "<|28.20|>": 51775,
1063
+ "<|28.22|>": 51776,
1064
+ "<|28.24|>": 51777,
1065
+ "<|28.26|>": 51778,
1066
+ "<|28.28|>": 51779,
1067
+ "<|28.30|>": 51780,
1068
+ "<|28.32|>": 51781,
1069
+ "<|28.34|>": 51782,
1070
+ "<|28.36|>": 51783,
1071
+ "<|28.38|>": 51784,
1072
+ "<|28.40|>": 51785,
1073
+ "<|28.42|>": 51786,
1074
+ "<|28.44|>": 51787,
1075
+ "<|28.46|>": 51788,
1076
+ "<|28.48|>": 51789,
1077
+ "<|28.50|>": 51790,
1078
+ "<|28.52|>": 51791,
1079
+ "<|28.54|>": 51792,
1080
+ "<|28.56|>": 51793,
1081
+ "<|28.58|>": 51794,
1082
+ "<|28.60|>": 51795,
1083
+ "<|28.62|>": 51796,
1084
+ "<|28.64|>": 51797,
1085
+ "<|28.66|>": 51798,
1086
+ "<|28.68|>": 51799,
1087
+ "<|28.70|>": 51800,
1088
+ "<|28.72|>": 51801,
1089
+ "<|28.74|>": 51802,
1090
+ "<|28.76|>": 51803,
1091
+ "<|28.78|>": 51804,
1092
+ "<|28.80|>": 51805,
1093
+ "<|28.82|>": 51806,
1094
+ "<|28.84|>": 51807,
1095
+ "<|28.86|>": 51808,
1096
+ "<|28.88|>": 51809,
1097
+ "<|28.90|>": 51810,
1098
+ "<|28.92|>": 51811,
1099
+ "<|28.94|>": 51812,
1100
+ "<|28.96|>": 51813,
1101
+ "<|28.98|>": 51814,
1102
+ "<|29.00|>": 51815,
1103
+ "<|29.02|>": 51816,
1104
+ "<|29.04|>": 51817,
1105
+ "<|29.06|>": 51818,
1106
+ "<|29.08|>": 51819,
1107
+ "<|29.10|>": 51820,
1108
+ "<|29.12|>": 51821,
1109
+ "<|29.14|>": 51822,
1110
+ "<|29.16|>": 51823,
1111
+ "<|29.18|>": 51824,
1112
+ "<|29.20|>": 51825,
1113
+ "<|29.22|>": 51826,
1114
+ "<|29.24|>": 51827,
1115
+ "<|29.26|>": 51828,
1116
+ "<|29.28|>": 51829,
1117
+ "<|29.30|>": 51830,
1118
+ "<|29.32|>": 51831,
1119
+ "<|29.34|>": 51832,
1120
+ "<|29.36|>": 51833,
1121
+ "<|29.38|>": 51834,
1122
+ "<|29.40|>": 51835,
1123
+ "<|29.42|>": 51836,
1124
+ "<|29.44|>": 51837,
1125
+ "<|29.46|>": 51838,
1126
+ "<|29.48|>": 51839,
1127
+ "<|29.50|>": 51840,
1128
+ "<|29.52|>": 51841,
1129
+ "<|29.54|>": 51842,
1130
+ "<|29.56|>": 51843,
1131
+ "<|29.58|>": 51844,
1132
+ "<|29.60|>": 51845,
1133
+ "<|29.62|>": 51846,
1134
+ "<|29.64|>": 51847,
1135
+ "<|29.66|>": 51848,
1136
+ "<|29.68|>": 51849,
1137
+ "<|29.70|>": 51850,
1138
+ "<|29.72|>": 51851,
1139
+ "<|29.74|>": 51852,
1140
+ "<|29.76|>": 51853,
1141
+ "<|29.78|>": 51854,
1142
+ "<|29.80|>": 51855,
1143
+ "<|29.82|>": 51856,
1144
+ "<|29.84|>": 51857,
1145
+ "<|29.86|>": 51858,
1146
+ "<|29.88|>": 51859,
1147
+ "<|29.90|>": 51860,
1148
+ "<|29.92|>": 51861,
1149
+ "<|29.94|>": 51862,
1150
+ "<|29.96|>": 51863,
1151
+ "<|29.98|>": 51864,
1152
+ "<|3.00|>": 50515,
1153
+ "<|3.02|>": 50516,
1154
+ "<|3.04|>": 50517,
1155
+ "<|3.06|>": 50518,
1156
+ "<|3.08|>": 50519,
1157
+ "<|3.10|>": 50520,
1158
+ "<|3.12|>": 50521,
1159
+ "<|3.14|>": 50522,
1160
+ "<|3.16|>": 50523,
1161
+ "<|3.18|>": 50524,
1162
+ "<|3.20|>": 50525,
1163
+ "<|3.22|>": 50526,
1164
+ "<|3.24|>": 50527,
1165
+ "<|3.26|>": 50528,
1166
+ "<|3.28|>": 50529,
1167
+ "<|3.30|>": 50530,
1168
+ "<|3.32|>": 50531,
1169
+ "<|3.34|>": 50532,
1170
+ "<|3.36|>": 50533,
1171
+ "<|3.38|>": 50534,
1172
+ "<|3.40|>": 50535,
1173
+ "<|3.42|>": 50536,
1174
+ "<|3.44|>": 50537,
1175
+ "<|3.46|>": 50538,
1176
+ "<|3.48|>": 50539,
1177
+ "<|3.50|>": 50540,
1178
+ "<|3.52|>": 50541,
1179
+ "<|3.54|>": 50542,
1180
+ "<|3.56|>": 50543,
1181
+ "<|3.58|>": 50544,
1182
+ "<|3.60|>": 50545,
1183
+ "<|3.62|>": 50546,
1184
+ "<|3.64|>": 50547,
1185
+ "<|3.66|>": 50548,
1186
+ "<|3.68|>": 50549,
1187
+ "<|3.70|>": 50550,
1188
+ "<|3.72|>": 50551,
1189
+ "<|3.74|>": 50552,
1190
+ "<|3.76|>": 50553,
1191
+ "<|3.78|>": 50554,
1192
+ "<|3.80|>": 50555,
1193
+ "<|3.82|>": 50556,
1194
+ "<|3.84|>": 50557,
1195
+ "<|3.86|>": 50558,
1196
+ "<|3.88|>": 50559,
1197
+ "<|3.90|>": 50560,
1198
+ "<|3.92|>": 50561,
1199
+ "<|3.94|>": 50562,
1200
+ "<|3.96|>": 50563,
1201
+ "<|3.98|>": 50564,
1202
+ "<|30.00|>": 51865,
1203
+ "<|4.00|>": 50565,
1204
+ "<|4.02|>": 50566,
1205
+ "<|4.04|>": 50567,
1206
+ "<|4.06|>": 50568,
1207
+ "<|4.08|>": 50569,
1208
+ "<|4.10|>": 50570,
1209
+ "<|4.12|>": 50571,
1210
+ "<|4.14|>": 50572,
1211
+ "<|4.16|>": 50573,
1212
+ "<|4.18|>": 50574,
1213
+ "<|4.20|>": 50575,
1214
+ "<|4.22|>": 50576,
1215
+ "<|4.24|>": 50577,
1216
+ "<|4.26|>": 50578,
1217
+ "<|4.28|>": 50579,
1218
+ "<|4.30|>": 50580,
1219
+ "<|4.32|>": 50581,
1220
+ "<|4.34|>": 50582,
1221
+ "<|4.36|>": 50583,
1222
+ "<|4.38|>": 50584,
1223
+ "<|4.40|>": 50585,
1224
+ "<|4.42|>": 50586,
1225
+ "<|4.44|>": 50587,
1226
+ "<|4.46|>": 50588,
1227
+ "<|4.48|>": 50589,
1228
+ "<|4.50|>": 50590,
1229
+ "<|4.52|>": 50591,
1230
+ "<|4.54|>": 50592,
1231
+ "<|4.56|>": 50593,
1232
+ "<|4.58|>": 50594,
1233
+ "<|4.60|>": 50595,
1234
+ "<|4.62|>": 50596,
1235
+ "<|4.64|>": 50597,
1236
+ "<|4.66|>": 50598,
1237
+ "<|4.68|>": 50599,
1238
+ "<|4.70|>": 50600,
1239
+ "<|4.72|>": 50601,
1240
+ "<|4.74|>": 50602,
1241
+ "<|4.76|>": 50603,
1242
+ "<|4.78|>": 50604,
1243
+ "<|4.80|>": 50605,
1244
+ "<|4.82|>": 50606,
1245
+ "<|4.84|>": 50607,
1246
+ "<|4.86|>": 50608,
1247
+ "<|4.88|>": 50609,
1248
+ "<|4.90|>": 50610,
1249
+ "<|4.92|>": 50611,
1250
+ "<|4.94|>": 50612,
1251
+ "<|4.96|>": 50613,
1252
+ "<|4.98|>": 50614,
1253
+ "<|5.00|>": 50615,
1254
+ "<|5.02|>": 50616,
1255
+ "<|5.04|>": 50617,
1256
+ "<|5.06|>": 50618,
1257
+ "<|5.08|>": 50619,
1258
+ "<|5.10|>": 50620,
1259
+ "<|5.12|>": 50621,
1260
+ "<|5.14|>": 50622,
1261
+ "<|5.16|>": 50623,
1262
+ "<|5.18|>": 50624,
1263
+ "<|5.20|>": 50625,
1264
+ "<|5.22|>": 50626,
1265
+ "<|5.24|>": 50627,
1266
+ "<|5.26|>": 50628,
1267
+ "<|5.28|>": 50629,
1268
+ "<|5.30|>": 50630,
1269
+ "<|5.32|>": 50631,
1270
+ "<|5.34|>": 50632,
1271
+ "<|5.36|>": 50633,
1272
+ "<|5.38|>": 50634,
1273
+ "<|5.40|>": 50635,
1274
+ "<|5.42|>": 50636,
1275
+ "<|5.44|>": 50637,
1276
+ "<|5.46|>": 50638,
1277
+ "<|5.48|>": 50639,
1278
+ "<|5.50|>": 50640,
1279
+ "<|5.52|>": 50641,
1280
+ "<|5.54|>": 50642,
1281
+ "<|5.56|>": 50643,
1282
+ "<|5.58|>": 50644,
1283
+ "<|5.60|>": 50645,
1284
+ "<|5.62|>": 50646,
1285
+ "<|5.64|>": 50647,
1286
+ "<|5.66|>": 50648,
1287
+ "<|5.68|>": 50649,
1288
+ "<|5.70|>": 50650,
1289
+ "<|5.72|>": 50651,
1290
+ "<|5.74|>": 50652,
1291
+ "<|5.76|>": 50653,
1292
+ "<|5.78|>": 50654,
1293
+ "<|5.80|>": 50655,
1294
+ "<|5.82|>": 50656,
1295
+ "<|5.84|>": 50657,
1296
+ "<|5.86|>": 50658,
1297
+ "<|5.88|>": 50659,
1298
+ "<|5.90|>": 50660,
1299
+ "<|5.92|>": 50661,
1300
+ "<|5.94|>": 50662,
1301
+ "<|5.96|>": 50663,
1302
+ "<|5.98|>": 50664,
1303
+ "<|6.00|>": 50665,
1304
+ "<|6.02|>": 50666,
1305
+ "<|6.04|>": 50667,
1306
+ "<|6.06|>": 50668,
1307
+ "<|6.08|>": 50669,
1308
+ "<|6.10|>": 50670,
1309
+ "<|6.12|>": 50671,
1310
+ "<|6.14|>": 50672,
1311
+ "<|6.16|>": 50673,
1312
+ "<|6.18|>": 50674,
1313
+ "<|6.20|>": 50675,
1314
+ "<|6.22|>": 50676,
1315
+ "<|6.24|>": 50677,
1316
+ "<|6.26|>": 50678,
1317
+ "<|6.28|>": 50679,
1318
+ "<|6.30|>": 50680,
1319
+ "<|6.32|>": 50681,
1320
+ "<|6.34|>": 50682,
1321
+ "<|6.36|>": 50683,
1322
+ "<|6.38|>": 50684,
1323
+ "<|6.40|>": 50685,
1324
+ "<|6.42|>": 50686,
1325
+ "<|6.44|>": 50687,
1326
+ "<|6.46|>": 50688,
1327
+ "<|6.48|>": 50689,
1328
+ "<|6.50|>": 50690,
1329
+ "<|6.52|>": 50691,
1330
+ "<|6.54|>": 50692,
1331
+ "<|6.56|>": 50693,
1332
+ "<|6.58|>": 50694,
1333
+ "<|6.60|>": 50695,
1334
+ "<|6.62|>": 50696,
1335
+ "<|6.64|>": 50697,
1336
+ "<|6.66|>": 50698,
1337
+ "<|6.68|>": 50699,
1338
+ "<|6.70|>": 50700,
1339
+ "<|6.72|>": 50701,
1340
+ "<|6.74|>": 50702,
1341
+ "<|6.76|>": 50703,
1342
+ "<|6.78|>": 50704,
1343
+ "<|6.80|>": 50705,
1344
+ "<|6.82|>": 50706,
1345
+ "<|6.84|>": 50707,
1346
+ "<|6.86|>": 50708,
1347
+ "<|6.88|>": 50709,
1348
+ "<|6.90|>": 50710,
1349
+ "<|6.92|>": 50711,
1350
+ "<|6.94|>": 50712,
1351
+ "<|6.96|>": 50713,
1352
+ "<|6.98|>": 50714,
1353
+ "<|7.00|>": 50715,
1354
+ "<|7.02|>": 50716,
1355
+ "<|7.04|>": 50717,
1356
+ "<|7.06|>": 50718,
1357
+ "<|7.08|>": 50719,
1358
+ "<|7.10|>": 50720,
1359
+ "<|7.12|>": 50721,
1360
+ "<|7.14|>": 50722,
1361
+ "<|7.16|>": 50723,
1362
+ "<|7.18|>": 50724,
1363
+ "<|7.20|>": 50725,
1364
+ "<|7.22|>": 50726,
1365
+ "<|7.24|>": 50727,
1366
+ "<|7.26|>": 50728,
1367
+ "<|7.28|>": 50729,
1368
+ "<|7.30|>": 50730,
1369
+ "<|7.32|>": 50731,
1370
+ "<|7.34|>": 50732,
1371
+ "<|7.36|>": 50733,
1372
+ "<|7.38|>": 50734,
1373
+ "<|7.40|>": 50735,
1374
+ "<|7.42|>": 50736,
1375
+ "<|7.44|>": 50737,
1376
+ "<|7.46|>": 50738,
1377
+ "<|7.48|>": 50739,
1378
+ "<|7.50|>": 50740,
1379
+ "<|7.52|>": 50741,
1380
+ "<|7.54|>": 50742,
1381
+ "<|7.56|>": 50743,
1382
+ "<|7.58|>": 50744,
1383
+ "<|7.60|>": 50745,
1384
+ "<|7.62|>": 50746,
1385
+ "<|7.64|>": 50747,
1386
+ "<|7.66|>": 50748,
1387
+ "<|7.68|>": 50749,
1388
+ "<|7.70|>": 50750,
1389
+ "<|7.72|>": 50751,
1390
+ "<|7.74|>": 50752,
1391
+ "<|7.76|>": 50753,
1392
+ "<|7.78|>": 50754,
1393
+ "<|7.80|>": 50755,
1394
+ "<|7.82|>": 50756,
1395
+ "<|7.84|>": 50757,
1396
+ "<|7.86|>": 50758,
1397
+ "<|7.88|>": 50759,
1398
+ "<|7.90|>": 50760,
1399
+ "<|7.92|>": 50761,
1400
+ "<|7.94|>": 50762,
1401
+ "<|7.96|>": 50763,
1402
+ "<|7.98|>": 50764,
1403
+ "<|8.00|>": 50765,
1404
+ "<|8.02|>": 50766,
1405
+ "<|8.04|>": 50767,
1406
+ "<|8.06|>": 50768,
1407
+ "<|8.08|>": 50769,
1408
+ "<|8.10|>": 50770,
1409
+ "<|8.12|>": 50771,
1410
+ "<|8.14|>": 50772,
1411
+ "<|8.16|>": 50773,
1412
+ "<|8.18|>": 50774,
1413
+ "<|8.20|>": 50775,
1414
+ "<|8.22|>": 50776,
1415
+ "<|8.24|>": 50777,
1416
+ "<|8.26|>": 50778,
1417
+ "<|8.28|>": 50779,
1418
+ "<|8.30|>": 50780,
1419
+ "<|8.32|>": 50781,
1420
+ "<|8.34|>": 50782,
1421
+ "<|8.36|>": 50783,
1422
+ "<|8.38|>": 50784,
1423
+ "<|8.40|>": 50785,
1424
+ "<|8.42|>": 50786,
1425
+ "<|8.44|>": 50787,
1426
+ "<|8.46|>": 50788,
1427
+ "<|8.48|>": 50789,
1428
+ "<|8.50|>": 50790,
1429
+ "<|8.52|>": 50791,
1430
+ "<|8.54|>": 50792,
1431
+ "<|8.56|>": 50793,
1432
+ "<|8.58|>": 50794,
1433
+ "<|8.60|>": 50795,
1434
+ "<|8.62|>": 50796,
1435
+ "<|8.64|>": 50797,
1436
+ "<|8.66|>": 50798,
1437
+ "<|8.68|>": 50799,
1438
+ "<|8.70|>": 50800,
1439
+ "<|8.72|>": 50801,
1440
+ "<|8.74|>": 50802,
1441
+ "<|8.76|>": 50803,
1442
+ "<|8.78|>": 50804,
1443
+ "<|8.80|>": 50805,
1444
+ "<|8.82|>": 50806,
1445
+ "<|8.84|>": 50807,
1446
+ "<|8.86|>": 50808,
1447
+ "<|8.88|>": 50809,
1448
+ "<|8.90|>": 50810,
1449
+ "<|8.92|>": 50811,
1450
+ "<|8.94|>": 50812,
1451
+ "<|8.96|>": 50813,
1452
+ "<|8.98|>": 50814,
1453
+ "<|9.00|>": 50815,
1454
+ "<|9.02|>": 50816,
1455
+ "<|9.04|>": 50817,
1456
+ "<|9.06|>": 50818,
1457
+ "<|9.08|>": 50819,
1458
+ "<|9.10|>": 50820,
1459
+ "<|9.12|>": 50821,
1460
+ "<|9.14|>": 50822,
1461
+ "<|9.16|>": 50823,
1462
+ "<|9.18|>": 50824,
1463
+ "<|9.20|>": 50825,
1464
+ "<|9.22|>": 50826,
1465
+ "<|9.24|>": 50827,
1466
+ "<|9.26|>": 50828,
1467
+ "<|9.28|>": 50829,
1468
+ "<|9.30|>": 50830,
1469
+ "<|9.32|>": 50831,
1470
+ "<|9.34|>": 50832,
1471
+ "<|9.36|>": 50833,
1472
+ "<|9.38|>": 50834,
1473
+ "<|9.40|>": 50835,
1474
+ "<|9.42|>": 50836,
1475
+ "<|9.44|>": 50837,
1476
+ "<|9.46|>": 50838,
1477
+ "<|9.48|>": 50839,
1478
+ "<|9.50|>": 50840,
1479
+ "<|9.52|>": 50841,
1480
+ "<|9.54|>": 50842,
1481
+ "<|9.56|>": 50843,
1482
+ "<|9.58|>": 50844,
1483
+ "<|9.60|>": 50845,
1484
+ "<|9.62|>": 50846,
1485
+ "<|9.64|>": 50847,
1486
+ "<|9.66|>": 50848,
1487
+ "<|9.68|>": 50849,
1488
+ "<|9.70|>": 50850,
1489
+ "<|9.72|>": 50851,
1490
+ "<|9.74|>": 50852,
1491
+ "<|9.76|>": 50853,
1492
+ "<|9.78|>": 50854,
1493
+ "<|9.80|>": 50855,
1494
+ "<|9.82|>": 50856,
1495
+ "<|9.84|>": 50857,
1496
+ "<|9.86|>": 50858,
1497
+ "<|9.88|>": 50859,
1498
+ "<|9.90|>": 50860,
1499
+ "<|9.92|>": 50861,
1500
+ "<|9.94|>": 50862,
1501
+ "<|9.96|>": 50863,
1502
+ "<|9.98|>": 50864,
1503
+ "<|af|>": 50327,
1504
+ "<|am|>": 50334,
1505
+ "<|ar|>": 50272,
1506
+ "<|as|>": 50350,
1507
+ "<|az|>": 50304,
1508
+ "<|ba|>": 50355,
1509
+ "<|be|>": 50330,
1510
+ "<|bg|>": 50292,
1511
+ "<|bn|>": 50302,
1512
+ "<|bo|>": 50347,
1513
+ "<|br|>": 50309,
1514
+ "<|bs|>": 50315,
1515
+ "<|ca|>": 50270,
1516
+ "<|cs|>": 50283,
1517
+ "<|cy|>": 50297,
1518
+ "<|da|>": 50285,
1519
+ "<|de|>": 50261,
1520
+ "<|el|>": 50281,
1521
+ "<|endoftext|>": 50257,
1522
+ "<|en|>": 50259,
1523
+ "<|es|>": 50262,
1524
+ "<|et|>": 50307,
1525
+ "<|eu|>": 50310,
1526
+ "<|fa|>": 50300,
1527
+ "<|fi|>": 50277,
1528
+ "<|fo|>": 50338,
1529
+ "<|fr|>": 50265,
1530
+ "<|gl|>": 50319,
1531
+ "<|gu|>": 50333,
1532
+ "<|haw|>": 50352,
1533
+ "<|ha|>": 50354,
1534
+ "<|he|>": 50279,
1535
+ "<|hi|>": 50276,
1536
+ "<|hr|>": 50291,
1537
+ "<|ht|>": 50339,
1538
+ "<|hu|>": 50286,
1539
+ "<|hy|>": 50312,
1540
+ "<|id|>": 50275,
1541
+ "<|is|>": 50311,
1542
+ "<|it|>": 50274,
1543
+ "<|ja|>": 50266,
1544
+ "<|jw|>": 50356,
1545
+ "<|ka|>": 50329,
1546
+ "<|kk|>": 50316,
1547
+ "<|km|>": 50323,
1548
+ "<|kn|>": 50306,
1549
+ "<|ko|>": 50264,
1550
+ "<|la|>": 50294,
1551
+ "<|lb|>": 50345,
1552
+ "<|ln|>": 50353,
1553
+ "<|lo|>": 50336,
1554
+ "<|lt|>": 50293,
1555
+ "<|lv|>": 50301,
1556
+ "<|mg|>": 50349,
1557
+ "<|mi|>": 50295,
1558
+ "<|mk|>": 50308,
1559
+ "<|ml|>": 50296,
1560
+ "<|mn|>": 50314,
1561
+ "<|mr|>": 50320,
1562
+ "<|ms|>": 50282,
1563
+ "<|mt|>": 50343,
1564
+ "<|my|>": 50346,
1565
+ "<|ne|>": 50313,
1566
+ "<|nl|>": 50271,
1567
+ "<|nn|>": 50342,
1568
+ "<|nospeech|>": 50363,
1569
+ "<|notimestamps|>": 50364,
1570
+ "<|no|>": 50288,
1571
+ "<|oc|>": 50328,
1572
+ "<|pa|>": 50321,
1573
+ "<|pl|>": 50269,
1574
+ "<|ps|>": 50340,
1575
+ "<|pt|>": 50267,
1576
+ "<|ro|>": 50284,
1577
+ "<|ru|>": 50263,
1578
+ "<|sa|>": 50344,
1579
+ "<|sd|>": 50332,
1580
+ "<|si|>": 50322,
1581
+ "<|sk|>": 50298,
1582
+ "<|sl|>": 50305,
1583
+ "<|sn|>": 50324,
1584
+ "<|so|>": 50326,
1585
+ "<|sq|>": 50317,
1586
+ "<|sr|>": 50303,
1587
+ "<|startoflm|>": 50361,
1588
+ "<|startofprev|>": 50362,
1589
+ "<|startoftranscript|>": 50258,
1590
+ "<|su|>": 50357,
1591
+ "<|sv|>": 50273,
1592
+ "<|sw|>": 50318,
1593
+ "<|ta|>": 50287,
1594
+ "<|te|>": 50299,
1595
+ "<|tg|>": 50331,
1596
+ "<|th|>": 50289,
1597
+ "<|tk|>": 50341,
1598
+ "<|tl|>": 50348,
1599
+ "<|transcribe|>": 50360,
1600
+ "<|translate|>": 50359,
1601
+ "<|tr|>": 50268,
1602
+ "<|tt|>": 50351,
1603
+ "<|uk|>": 50280,
1604
+ "<|ur|>": 50290,
1605
+ "<|uz|>": 50337,
1606
+ "<|vi|>": 50278,
1607
+ "<|yi|>": 50335,
1608
+ "<|yo|>": 50325,
1609
+ "<|yue|>": 50358,
1610
+ "<|zh|>": 50260
1611
+ }
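For reference, a minimal sketch of the timestamp-token layout implied by the entries above: timestamps run from <|0.00|> to <|30.00|> in 0.02 s steps with contiguous ids, so the id of <|t|> is a fixed base id plus t / 0.02. The base id 50365 is inferred from the entries shown (e.g. "<|3.00|>": 50515 and "<|30.00|>": 51865); it is not stated explicitly in this file.

# Illustrative sketch only (not part of the commit); the base id is inferred as noted above.
TIMESTAMP_BASE_ID = 50365  # assumed id of <|0.00|>, consistent with "<|3.00|>": 50515

def timestamp_token_id(seconds: float) -> int:
    """Return the added-token id for a timestamp token such as <|29.64|>."""
    return TIMESTAMP_BASE_ID + round(seconds / 0.02)

assert timestamp_token_id(3.00) == 50515
assert timestamp_token_id(29.64) == 51847
assert timestamp_token_id(30.00) == 51865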
nb-distil-large-init/config.json ADDED
@@ -0,0 +1,288 @@
1
+ {
2
+ "_name_or_path": "./",
3
+ "activation_dropout": 0.1,
4
+ "activation_function": "gelu",
5
+ "alignment_heads": [
6
+ [
7
+ 7,
8
+ 0
9
+ ],
10
+ [
11
+ 10,
12
+ 17
13
+ ],
14
+ [
15
+ 12,
16
+ 18
17
+ ],
18
+ [
19
+ 13,
20
+ 12
21
+ ],
22
+ [
23
+ 16,
24
+ 1
25
+ ],
26
+ [
27
+ 17,
28
+ 14
29
+ ],
30
+ [
31
+ 19,
32
+ 11
33
+ ],
34
+ [
35
+ 21,
36
+ 4
37
+ ],
38
+ [
39
+ 24,
40
+ 1
41
+ ],
42
+ [
43
+ 25,
44
+ 6
45
+ ]
46
+ ],
47
+ "apply_spec_augment": false,
48
+ "architectures": [
49
+ "WhisperForConditionalGeneration"
50
+ ],
51
+ "attention_dropout": 0,
52
+ "begin_suppress_tokens": [
53
+ 220,
54
+ 50257
55
+ ],
56
+ "bos_token_id": 50257,
57
+ "classifier_proj_size": 256,
58
+ "d_model": 1280,
59
+ "decoder_attention_heads": 20,
60
+ "decoder_ffn_dim": 5120,
61
+ "decoder_layerdrop": 0,
62
+ "decoder_layers": 2,
63
+ "decoder_start_token_id": 50258,
64
+ "dropout": 0,
65
+ "encoder_attention_heads": 20,
66
+ "encoder_ffn_dim": 5120,
67
+ "encoder_layerdrop": 0,
68
+ "encoder_layers": 32,
69
+ "eos_token_id": 50257,
70
+ "init_std": 0.02,
71
+ "is_encoder_decoder": true,
72
+ "lang_ids": [
73
+ 50259,
74
+ 50260,
75
+ 50261,
76
+ 50262,
77
+ 50263,
78
+ 50264,
79
+ 50265,
80
+ 50266,
81
+ 50267,
82
+ 50268,
83
+ 50269,
84
+ 50270,
85
+ 50271,
86
+ 50272,
87
+ 50273,
88
+ 50274,
89
+ 50275,
90
+ 50276,
91
+ 50277,
92
+ 50278,
93
+ 50279,
94
+ 50280,
95
+ 50281,
96
+ 50282,
97
+ 50283,
98
+ 50284,
99
+ 50285,
100
+ 50286,
101
+ 50287,
102
+ 50288,
103
+ 50289,
104
+ 50290,
105
+ 50291,
106
+ 50292,
107
+ 50293,
108
+ 50294,
109
+ 50295,
110
+ 50296,
111
+ 50297,
112
+ 50298,
113
+ 50299,
114
+ 50300,
115
+ 50301,
116
+ 50302,
117
+ 50303,
118
+ 50304,
119
+ 50305,
120
+ 50306,
121
+ 50307,
122
+ 50308,
123
+ 50309,
124
+ 50310,
125
+ 50311,
126
+ 50312,
127
+ 50313,
128
+ 50314,
129
+ 50315,
130
+ 50316,
131
+ 50317,
132
+ 50318,
133
+ 50319,
134
+ 50320,
135
+ 50321,
136
+ 50322,
137
+ 50323,
138
+ 50324,
139
+ 50325,
140
+ 50326,
141
+ 50327,
142
+ 50328,
143
+ 50329,
144
+ 50330,
145
+ 50331,
146
+ 50332,
147
+ 50333,
148
+ 50334,
149
+ 50335,
150
+ 50336,
151
+ 50337,
152
+ 50338,
153
+ 50339,
154
+ 50340,
155
+ 50341,
156
+ 50342,
157
+ 50343,
158
+ 50344,
159
+ 50345,
160
+ 50346,
161
+ 50347,
162
+ 50348,
163
+ 50349,
164
+ 50350,
165
+ 50351,
166
+ 50352,
167
+ 50353,
168
+ 50354,
169
+ 50355,
170
+ 50356,
171
+ 50357,
172
+ 50358
173
+ ],
174
+ "mask_feature_length": 10,
175
+ "mask_feature_min_masks": 0,
176
+ "mask_feature_prob": 0,
177
+ "mask_time_length": 10,
178
+ "mask_time_min_masks": 2,
179
+ "mask_time_prob": 0.05,
180
+ "max_length": 448,
181
+ "max_source_positions": 1500,
182
+ "max_target_positions": 448,
183
+ "median_filter_width": 7,
184
+ "model_type": "whisper",
185
+ "num_hidden_layers": 32,
186
+ "num_mel_bins": 128,
187
+ "pad_token_id": 50256,
188
+ "scale_embedding": false,
189
+ "suppress_ids": [
190
+ 1,
191
+ 2,
192
+ 7,
193
+ 8,
194
+ 9,
195
+ 10,
196
+ 14,
197
+ 25,
198
+ 26,
199
+ 27,
200
+ 28,
201
+ 29,
202
+ 31,
203
+ 58,
204
+ 59,
205
+ 60,
206
+ 61,
207
+ 62,
208
+ 63,
209
+ 90,
210
+ 91,
211
+ 92,
212
+ 93,
213
+ 359,
214
+ 503,
215
+ 522,
216
+ 542,
217
+ 873,
218
+ 893,
219
+ 902,
220
+ 918,
221
+ 922,
222
+ 931,
223
+ 1350,
224
+ 1853,
225
+ 1982,
226
+ 2460,
227
+ 2627,
228
+ 3246,
229
+ 3253,
230
+ 3268,
231
+ 3536,
232
+ 3846,
233
+ 3961,
234
+ 4183,
235
+ 4667,
236
+ 6585,
237
+ 6647,
238
+ 7273,
239
+ 9061,
240
+ 9383,
241
+ 10428,
242
+ 10929,
243
+ 11938,
244
+ 12033,
245
+ 12331,
246
+ 12562,
247
+ 13793,
248
+ 14157,
249
+ 14635,
250
+ 15265,
251
+ 15618,
252
+ 16553,
253
+ 16604,
254
+ 18362,
255
+ 18956,
256
+ 20075,
257
+ 21675,
258
+ 22520,
259
+ 26130,
260
+ 26161,
261
+ 26435,
262
+ 28279,
263
+ 29464,
264
+ 31650,
265
+ 32302,
266
+ 32470,
267
+ 36865,
268
+ 42863,
269
+ 47425,
270
+ 49870,
271
+ 50254,
272
+ 50258,
273
+ 50359,
274
+ 50360,
275
+ 50361,
276
+ 50362,
277
+ 50363
278
+ ],
279
+ "suppress_ids_begin": [
280
+ 220,
281
+ 50257
282
+ ],
283
+ "torch_dtype": "float32",
284
+ "transformers_version": "4.46.2",
285
+ "use_cache": true,
286
+ "use_weighted_layer_sum": false,
287
+ "vocab_size": 51866
288
+ }
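The config above describes the distilled student: a large-v3-sized encoder (32 layers, d_model 1280, 128 mel bins) paired with a 2-layer decoder. A minimal sketch for inspecting it (assumes the transformers library and a local copy of the nb-distil-large-init folder):

from transformers import WhisperConfig

config = WhisperConfig.from_pretrained("nb-distil-large-init")
print(config.encoder_layers)   # 32 - encoder kept at teacher depth
print(config.decoder_layers)   # 2  - shallow student decoder
print(config.d_model)          # 1280
print(config.vocab_size)       # 51866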
nb-distil-large-init/flax_model.msgpack ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:60f608eb7887b643bfb0d6b11d3ad8564c648c296a90c1e558aa61075b1f2839
3
+ size 1512831199
nb-distil-large-init/generation_config.json ADDED
@@ -0,0 +1,270 @@
1
+ {
2
+ "alignment_heads": [
3
+ [
4
+ 7,
5
+ 0
6
+ ],
7
+ [
8
+ 10,
9
+ 17
10
+ ],
11
+ [
12
+ 12,
13
+ 18
14
+ ],
15
+ [
16
+ 13,
17
+ 12
18
+ ],
19
+ [
20
+ 16,
21
+ 1
22
+ ],
23
+ [
24
+ 17,
25
+ 14
26
+ ],
27
+ [
28
+ 19,
29
+ 11
30
+ ],
31
+ [
32
+ 21,
33
+ 4
34
+ ],
35
+ [
36
+ 24,
37
+ 1
38
+ ],
39
+ [
40
+ 25,
41
+ 6
42
+ ]
43
+ ],
44
+ "begin_suppress_tokens": [
45
+ 220,
46
+ 50257
47
+ ],
48
+ "bos_token_id": 50257,
49
+ "decoder_start_token_id": 50258,
50
+ "eos_token_id": 50257,
51
+ "forced_decoder_ids": [
52
+ [
53
+ 1,
54
+ 50288
55
+ ],
56
+ [
57
+ 2,
58
+ 50360
59
+ ],
60
+ [
61
+ 3,
62
+ 50364
63
+ ]
64
+ ],
65
+ "is_multilingual": true,
66
+ "lang_to_id": {
67
+ "<|af|>": 50327,
68
+ "<|am|>": 50334,
69
+ "<|ar|>": 50272,
70
+ "<|as|>": 50350,
71
+ "<|az|>": 50304,
72
+ "<|ba|>": 50355,
73
+ "<|be|>": 50330,
74
+ "<|bg|>": 50292,
75
+ "<|bn|>": 50302,
76
+ "<|bo|>": 50347,
77
+ "<|br|>": 50309,
78
+ "<|bs|>": 50315,
79
+ "<|ca|>": 50270,
80
+ "<|cs|>": 50283,
81
+ "<|cy|>": 50297,
82
+ "<|da|>": 50285,
83
+ "<|de|>": 50261,
84
+ "<|el|>": 50281,
85
+ "<|en|>": 50259,
86
+ "<|es|>": 50262,
87
+ "<|et|>": 50307,
88
+ "<|eu|>": 50310,
89
+ "<|fa|>": 50300,
90
+ "<|fi|>": 50277,
91
+ "<|fo|>": 50338,
92
+ "<|fr|>": 50265,
93
+ "<|gl|>": 50319,
94
+ "<|gu|>": 50333,
95
+ "<|haw|>": 50352,
96
+ "<|ha|>": 50354,
97
+ "<|he|>": 50279,
98
+ "<|hi|>": 50276,
99
+ "<|hr|>": 50291,
100
+ "<|ht|>": 50339,
101
+ "<|hu|>": 50286,
102
+ "<|hy|>": 50312,
103
+ "<|id|>": 50275,
104
+ "<|is|>": 50311,
105
+ "<|it|>": 50274,
106
+ "<|ja|>": 50266,
107
+ "<|jw|>": 50356,
108
+ "<|ka|>": 50329,
109
+ "<|kk|>": 50316,
110
+ "<|km|>": 50323,
111
+ "<|kn|>": 50306,
112
+ "<|ko|>": 50264,
113
+ "<|la|>": 50294,
114
+ "<|lb|>": 50345,
115
+ "<|ln|>": 50353,
116
+ "<|lo|>": 50336,
117
+ "<|lt|>": 50293,
118
+ "<|lv|>": 50301,
119
+ "<|mg|>": 50349,
120
+ "<|mi|>": 50295,
121
+ "<|mk|>": 50308,
122
+ "<|ml|>": 50296,
123
+ "<|mn|>": 50314,
124
+ "<|mr|>": 50320,
125
+ "<|ms|>": 50282,
126
+ "<|mt|>": 50343,
127
+ "<|my|>": 50346,
128
+ "<|ne|>": 50313,
129
+ "<|nl|>": 50271,
130
+ "<|nn|>": 50342,
131
+ "<|no|>": 50288,
132
+ "<|oc|>": 50328,
133
+ "<|pa|>": 50321,
134
+ "<|pl|>": 50269,
135
+ "<|ps|>": 50340,
136
+ "<|pt|>": 50267,
137
+ "<|ro|>": 50284,
138
+ "<|ru|>": 50263,
139
+ "<|sa|>": 50344,
140
+ "<|sd|>": 50332,
141
+ "<|si|>": 50322,
142
+ "<|sk|>": 50298,
143
+ "<|sl|>": 50305,
144
+ "<|sn|>": 50324,
145
+ "<|so|>": 50326,
146
+ "<|sq|>": 50317,
147
+ "<|sr|>": 50303,
148
+ "<|su|>": 50357,
149
+ "<|sv|>": 50273,
150
+ "<|sw|>": 50318,
151
+ "<|ta|>": 50287,
152
+ "<|te|>": 50299,
153
+ "<|tg|>": 50331,
154
+ "<|th|>": 50289,
155
+ "<|tk|>": 50341,
156
+ "<|tl|>": 50348,
157
+ "<|tr|>": 50268,
158
+ "<|tt|>": 50351,
159
+ "<|uk|>": 50280,
160
+ "<|ur|>": 50290,
161
+ "<|uz|>": 50337,
162
+ "<|vi|>": 50278,
163
+ "<|yi|>": 50335,
164
+ "<|yo|>": 50325,
165
+ "<|yue|>": 50358,
166
+ "<|zh|>": 50260
167
+ },
168
+ "language": "<|no|>",
169
+ "max_initial_timestamp_index": 1,
170
+ "max_length": 448,
171
+ "no_timestamps_token_id": 50364,
172
+ "pad_token_id": 50257,
173
+ "return_timestamps": false,
174
+ "suppress_tokens": [
175
+ 1,
176
+ 2,
177
+ 7,
178
+ 8,
179
+ 9,
180
+ 10,
181
+ 14,
182
+ 25,
183
+ 26,
184
+ 27,
185
+ 28,
186
+ 29,
187
+ 31,
188
+ 58,
189
+ 59,
190
+ 60,
191
+ 61,
192
+ 62,
193
+ 63,
194
+ 90,
195
+ 91,
196
+ 92,
197
+ 93,
198
+ 359,
199
+ 503,
200
+ 522,
201
+ 542,
202
+ 873,
203
+ 893,
204
+ 902,
205
+ 918,
206
+ 922,
207
+ 931,
208
+ 1350,
209
+ 1853,
210
+ 1982,
211
+ 2460,
212
+ 2627,
213
+ 3246,
214
+ 3253,
215
+ 3268,
216
+ 3536,
217
+ 3846,
218
+ 3961,
219
+ 4183,
220
+ 4667,
221
+ 6585,
222
+ 6647,
223
+ 7273,
224
+ 9061,
225
+ 9383,
226
+ 10428,
227
+ 10929,
228
+ 11938,
229
+ 12033,
230
+ 12331,
231
+ 12562,
232
+ 13793,
233
+ 14157,
234
+ 14635,
235
+ 15265,
236
+ 15618,
237
+ 16553,
238
+ 16604,
239
+ 18362,
240
+ 18956,
241
+ 20075,
242
+ 21675,
243
+ 22520,
244
+ 26130,
245
+ 26161,
246
+ 26435,
247
+ 28279,
248
+ 29464,
249
+ 31650,
250
+ 32302,
251
+ 32470,
252
+ 36865,
253
+ 42863,
254
+ 47425,
255
+ 49870,
256
+ 50254,
257
+ 50258,
258
+ 50359,
259
+ 50360,
260
+ 50361,
261
+ 50362,
262
+ 50363
263
+ ],
264
+ "task": "transcribe",
265
+ "task_to_id": {
266
+ "transcribe": 50360,
267
+ "translate": 50359
268
+ },
269
+ "transformers_version": "4.46.2"
270
+ }
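In this generation config, the forced_decoder_ids [[1, 50288], [2, 50360], [3, 50364]] pin decoding to Norwegian transcription without timestamps, since the maps in the same file give 50288 = <|no|>, 50360 = <|transcribe|> and 50364 = <|notimestamps|>. A minimal sketch (assumes a local copy of the folder and a recent transformers version):

from transformers import GenerationConfig

gen_config = GenerationConfig.from_pretrained("nb-distil-large-init")
assert gen_config.lang_to_id["<|no|>"] == 50288
assert gen_config.task_to_id["transcribe"] == 50360
# gen_config.forced_decoder_ids holds [[1, 50288], [2, 50360], [3, 50364]] as above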
nb-distil-large-init/merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
nb-distil-large-init/preprocessor_config.json ADDED
@@ -0,0 +1,14 @@
1
+ {
2
+ "chunk_length": 30,
3
+ "feature_extractor_type": "WhisperFeatureExtractor",
4
+ "feature_size": 128,
5
+ "hop_length": 160,
6
+ "n_fft": 400,
7
+ "n_samples": 480000,
8
+ "nb_max_frames": 3000,
9
+ "padding_side": "right",
10
+ "padding_value": 0.0,
11
+ "processor_class": "WhisperProcessor",
12
+ "return_attention_mask": false,
13
+ "sampling_rate": 16000
14
+ }
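These feature-extractor settings correspond to 30 s windows of 16 kHz audio (480000 samples) converted to 128-bin log-mel features with a 160-sample hop, i.e. 3000 frames per window. A minimal sketch reproducing the derived values (assumes the transformers library):

from transformers import WhisperFeatureExtractor

feature_extractor = WhisperFeatureExtractor(
    feature_size=128, sampling_rate=16000, hop_length=160, chunk_length=30, n_fft=400
)
assert feature_extractor.n_samples == 30 * 16000          # 480000
assert feature_extractor.nb_max_frames == 480000 // 160   # 3000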
nb-distil-large-init/special_tokens_map.json ADDED
@@ -0,0 +1,139 @@
1
+ {
2
+ "additional_special_tokens": [
3
+ "<|startoftranscript|>",
4
+ "<|en|>",
5
+ "<|zh|>",
6
+ "<|de|>",
7
+ "<|es|>",
8
+ "<|ru|>",
9
+ "<|ko|>",
10
+ "<|fr|>",
11
+ "<|ja|>",
12
+ "<|pt|>",
13
+ "<|tr|>",
14
+ "<|pl|>",
15
+ "<|ca|>",
16
+ "<|nl|>",
17
+ "<|ar|>",
18
+ "<|sv|>",
19
+ "<|it|>",
20
+ "<|id|>",
21
+ "<|hi|>",
22
+ "<|fi|>",
23
+ "<|vi|>",
24
+ "<|he|>",
25
+ "<|uk|>",
26
+ "<|el|>",
27
+ "<|ms|>",
28
+ "<|cs|>",
29
+ "<|ro|>",
30
+ "<|da|>",
31
+ "<|hu|>",
32
+ "<|ta|>",
33
+ "<|no|>",
34
+ "<|th|>",
35
+ "<|ur|>",
36
+ "<|hr|>",
37
+ "<|bg|>",
38
+ "<|lt|>",
39
+ "<|la|>",
40
+ "<|mi|>",
41
+ "<|ml|>",
42
+ "<|cy|>",
43
+ "<|sk|>",
44
+ "<|te|>",
45
+ "<|fa|>",
46
+ "<|lv|>",
47
+ "<|bn|>",
48
+ "<|sr|>",
49
+ "<|az|>",
50
+ "<|sl|>",
51
+ "<|kn|>",
52
+ "<|et|>",
53
+ "<|mk|>",
54
+ "<|br|>",
55
+ "<|eu|>",
56
+ "<|is|>",
57
+ "<|hy|>",
58
+ "<|ne|>",
59
+ "<|mn|>",
60
+ "<|bs|>",
61
+ "<|kk|>",
62
+ "<|sq|>",
63
+ "<|sw|>",
64
+ "<|gl|>",
65
+ "<|mr|>",
66
+ "<|pa|>",
67
+ "<|si|>",
68
+ "<|km|>",
69
+ "<|sn|>",
70
+ "<|yo|>",
71
+ "<|so|>",
72
+ "<|af|>",
73
+ "<|oc|>",
74
+ "<|ka|>",
75
+ "<|be|>",
76
+ "<|tg|>",
77
+ "<|sd|>",
78
+ "<|gu|>",
79
+ "<|am|>",
80
+ "<|yi|>",
81
+ "<|lo|>",
82
+ "<|uz|>",
83
+ "<|fo|>",
84
+ "<|ht|>",
85
+ "<|ps|>",
86
+ "<|tk|>",
87
+ "<|nn|>",
88
+ "<|mt|>",
89
+ "<|sa|>",
90
+ "<|lb|>",
91
+ "<|my|>",
92
+ "<|bo|>",
93
+ "<|tl|>",
94
+ "<|mg|>",
95
+ "<|as|>",
96
+ "<|tt|>",
97
+ "<|haw|>",
98
+ "<|ln|>",
99
+ "<|ha|>",
100
+ "<|ba|>",
101
+ "<|jw|>",
102
+ "<|su|>",
103
+ "<|yue|>",
104
+ "<|translate|>",
105
+ "<|transcribe|>",
106
+ "<|startoflm|>",
107
+ "<|startofprev|>",
108
+ "<|nospeech|>",
109
+ "<|notimestamps|>"
110
+ ],
111
+ "bos_token": {
112
+ "content": "<|endoftext|>",
113
+ "lstrip": false,
114
+ "normalized": false,
115
+ "rstrip": false,
116
+ "single_word": false
117
+ },
118
+ "eos_token": {
119
+ "content": "<|endoftext|>",
120
+ "lstrip": false,
121
+ "normalized": false,
122
+ "rstrip": false,
123
+ "single_word": false
124
+ },
125
+ "pad_token": {
126
+ "content": "<|endoftext|>",
127
+ "lstrip": false,
128
+ "normalized": false,
129
+ "rstrip": false,
130
+ "single_word": false
131
+ },
132
+ "unk_token": {
133
+ "content": "<|endoftext|>",
134
+ "lstrip": false,
135
+ "normalized": false,
136
+ "rstrip": false,
137
+ "single_word": false
138
+ }
139
+ }
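All four special-token roles above (bos, eos, pad, unk) point at "<|endoftext|>", while the language and task markers are registered as additional special tokens so the tokenizer keeps them as single ids. A minimal sketch (assumes the folder can be loaded locally with the tokenizer files in this commit):

from transformers import WhisperTokenizerFast

tokenizer = WhisperTokenizerFast.from_pretrained("nb-distil-large-init")
assert tokenizer.bos_token == tokenizer.eos_token == tokenizer.pad_token == "<|endoftext|>"
assert tokenizer.convert_tokens_to_ids("<|no|>") == 50288  # matches added_tokens.json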
nb-distil-large-init/tokenizer_config.json ADDED
The diff for this file is too large to render. See raw diff
 
nb-distil-large-init/vocab.json ADDED
The diff for this file is too large to render. See raw diff
 
preprocessor_config.json ADDED
@@ -0,0 +1,14 @@
1
+ {
2
+ "chunk_length": 30,
3
+ "feature_extractor_type": "FlaxWhisperFeatureExtractor",
4
+ "feature_size": 128,
5
+ "hop_length": 160,
6
+ "n_fft": 400,
7
+ "n_samples": 480000,
8
+ "nb_max_frames": 3000,
9
+ "padding_side": "right",
10
+ "padding_value": 0.0,
11
+ "processor_class": "WhisperProcessor",
12
+ "return_attention_mask": false,
13
+ "sampling_rate": 16000
14
+ }
run_distillation.py ADDED
@@ -0,0 +1,2156 @@
1
+ #!/usr/bin/env python
2
+ # coding=utf-8
3
+ # Copyright 2023 The HuggingFace Inc. team. All rights reserved.
4
+ #
5
+ # Licensed under the Apache License, Version 2.0 (the "License");
6
+ # you may not use this file except in compliance with the License.
7
+ # You may obtain a copy of the License at
8
+ #
9
+ # http://www.apache.org/licenses/LICENSE-2.0
10
+ #
11
+ # Unless required by applicable law or agreed to in writing, software
12
+ # distributed under the License is distributed on an "AS IS" BASIS,
13
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
14
+ # See the License for the specific language governing permissions and
15
+ # limitations under the License.
16
+ """
17
+ Training the Whisper model for sequence to sequence speech recognition via teacher-student distillation.
18
+ """
19
+ # You can also adapt this script for your own distillation tasks. Pointers for this are left as comments.
20
+
21
+ import logging
22
+ import os
23
+ import re
24
+ import shutil
25
+ import string
26
+ import sys
27
+ import time
28
+ from dataclasses import dataclass, field
29
+ from functools import partial
30
+ from pathlib import Path
31
+ from typing import Any, Callable, Dict, List, Optional, Union
32
+
33
+ import datasets
34
+ import evaluate
35
+ import flax
36
+ import jax
37
+ import jax.numpy as jnp
38
+ import numpy as np
39
+ import optax
40
+ import torch
41
+ import transformers
42
+ from datasets import (
43
+ DatasetDict,
44
+ IterableDataset,
45
+ IterableDatasetDict,
46
+ concatenate_datasets,
47
+ interleave_datasets,
48
+ load_dataset,
49
+ )
50
+ from flax import jax_utils, traverse_util
51
+ from flax.jax_utils import pad_shard_unpad, unreplicate
52
+ from flax.serialization import from_bytes, to_bytes
53
+ from flax.training import train_state
54
+ from flax.training.common_utils import get_metrics, onehot, shard, shard_prng_key
55
+ from huggingface_hub import Repository, create_repo
56
+ from jax.experimental.compilation_cache import compilation_cache as cc
57
+ from optax._src import linear_algebra
58
+ from torch.utils.data import DataLoader
59
+ from torchdata.datapipes.iter import IterableWrapper
60
+ from tqdm import tqdm
61
+ from transformers import (
62
+ AddedToken,
63
+ HfArgumentParser,
64
+ Seq2SeqTrainingArguments,
65
+ WhisperConfig,
66
+ WhisperFeatureExtractor,
67
+ WhisperProcessor,
68
+ WhisperTokenizerFast,
69
+ is_tensorboard_available,
70
+ is_wandb_available,
71
+ set_seed,
72
+ )
73
+ from transformers.file_utils import get_full_repo_name
74
+ from transformers.modeling_flax_outputs import FlaxBaseModelOutput
75
+ from transformers.models.whisper.english_normalizer import BasicTextNormalizer, EnglishTextNormalizer
76
+ from transformers.utils import check_min_version, send_example_telemetry
77
+ from transformers.utils.versions import require_version
78
+
79
+ from distil_whisper import FlaxWhisperForConditionalGeneration
80
+
81
+
82
+ # Will error if the minimal version of Transformers is not installed. Remove at your own risks.
83
+ check_min_version("4.27.0.dev0")
84
+
85
+ require_version(
86
+ "datasets>=1.18.0",
87
+ "To fix: pip install -r examples/flax/speech-recogintion/requirements.txt",
88
+ )
89
+
90
+ logger = logging.getLogger(__name__)
91
+
92
+
93
+ @flax.struct.dataclass
94
+ class ModelArguments:
95
+ """
96
+ Arguments pertaining to which model/config/tokenizer we are going to fine-tune from.
97
+ """
98
+
99
+ model_name_or_path: str = field(
100
+ metadata={"help": ("Path to pretrained student model or model identifier from huggingface.co/models")}
101
+ )
102
+ teacher_model_name_or_path: str = field(
103
+ metadata={"help": ("Path to pretrained teacher model or model identifier from huggingface.co/models")}
104
+ )
105
+ config_name: Optional[str] = field(
106
+ default=None,
107
+ metadata={"help": "Pretrained config name or path if not the same as model_name"},
108
+ )
109
+ tokenizer_name: Optional[str] = field(
110
+ default=None,
111
+ metadata={"help": "Pretrained tokenizer name or path if not the same as model_name"},
112
+ )
113
+ feature_extractor_name: Optional[str] = field(
114
+ default=None,
115
+ metadata={"help": "feature extractor name or path if not the same as model_name"},
116
+ )
117
+ cache_dir: Optional[str] = field(
118
+ default=None,
119
+ metadata={"help": ("Where to store the pretrained models downloaded from huggingface.co")},
120
+ )
121
+ use_fast_tokenizer: bool = field(
122
+ default=True,
123
+ metadata={"help": ("Whether to use one of the fast tokenizer (backed by the tokenizers library) or not.")},
124
+ )
125
+ model_revision: str = field(
126
+ default="main",
127
+ metadata={"help": ("The specific model version to use (can be a branch name, tag name or commit id).")},
128
+ )
129
+ subfolder: str = field(
130
+ default="",
131
+ metadata={
132
+ "help": "In case the relevant files are located inside a subfolder of the model repo on huggingface.co, you can"
133
+ "specify the folder name here."
134
+ },
135
+ )
136
+ use_auth_token: bool = field(
137
+ default=False,
138
+ metadata={
139
+ "help": (
140
+ "Will use the token generated when running `transformers-cli login`"
141
+ " (necessary to use this script with private models)."
142
+ )
143
+ },
144
+ )
145
+ dtype: Optional[str] = field(
146
+ default="float32",
147
+ metadata={
148
+ "help": (
149
+ "Floating-point format in which the model weights should be initialized"
150
+ " and trained. Choose one of `[float32, float16, bfloat16]`."
151
+ )
152
+ },
153
+ )
154
+ load_with_scan_weights: bool = field(
155
+ default=False,
156
+ metadata={
157
+ "help": "Whether the pre-trained checkpoint has its weights stored in scan format. Set to True for scanned "
158
+ "weights, defaults to False for non-scan (unrolled) weights."
159
+ },
160
+ )
161
+ activation_dropout: float = field(
162
+ default=0.0,
163
+ metadata={"help": "The dropout ratio for activations inside the fully connected layer."},
164
+ )
165
+ attention_dropout: float = field(
166
+ default=0.0,
167
+ metadata={"help": "The dropout ratio for the attention probabilities."},
168
+ )
169
+ dropout: float = field(
170
+ default=0.0,
171
+ metadata={
172
+ "help": "The dropout probability for all fully connected layers in the embeddings, encoder, and pooler."
173
+ },
174
+ )
175
+
176
+
177
+ @flax.struct.dataclass
178
+ class DataTrainingArguments:
179
+ """
180
+ Arguments pertaining to what data we are going to input our model for training and eval.
181
+ """
182
+
183
+ train_dataset_name: str = field(
184
+ default=None,
185
+ metadata={
186
+ "help": "The name of the training dataset to use (via the datasets library). Load and combine "
187
+ "multiple datasets by separating dataset ids by a '+' symbol. For example, to load and combine "
188
+ " librispeech and common voice, set `train_dataset_name='librispeech_asr+common_voice'`."
189
+ },
190
+ )
191
+ train_dataset_config_name: Optional[str] = field(
192
+ default=None,
193
+ metadata={
194
+ "help": "The configuration name of the training dataset to use (via the datasets library). Load and combine "
195
+ "multiple datasets by separating dataset configs by a '+' symbol."
196
+ },
197
+ )
198
+ train_dataset_samples: str = field(
199
+ default=None,
200
+ metadata={
201
+ "help": "Number of samples in the training data. Load and combine "
202
+ "multiple datasets by separating dataset samples by a '+' symbol."
203
+ },
204
+ )
205
+ eval_dataset_name: str = field(
206
+ default=None,
207
+ metadata={
208
+ "help": "The name of the evaluation dataset to use (via the datasets library). Defaults to the training dataset name if unspecified."
209
+ },
210
+ )
211
+ eval_dataset_config_name: Optional[str] = field(
212
+ default=None,
213
+ metadata={
214
+ "help": "The configuration name of the evaluation dataset to use (via the datasets library). Defaults to the training dataset config name if unspecified"
215
+ },
216
+ )
217
+ dataset_cache_dir: Optional[str] = field(
218
+ default=None,
219
+ metadata={"help": "Path to cache directory for saving and loading datasets"},
220
+ )
221
+ overwrite_cache: bool = field(
222
+ default=False,
223
+ metadata={"help": "Overwrite the cached training and evaluation sets"},
224
+ )
225
+ preprocessing_num_workers: Optional[int] = field(
226
+ default=None,
227
+ metadata={"help": "The number of processes to use for the preprocessing."},
228
+ )
229
+ max_train_samples: Optional[int] = field(
230
+ default=None,
231
+ metadata={
232
+ "help": (
233
+ "For debugging purposes or quicker training, truncate the number of"
234
+ " training examples to this value if set."
235
+ )
236
+ },
237
+ )
238
+ max_eval_samples: Optional[int] = field(
239
+ default=None,
240
+ metadata={
241
+ "help": (
242
+ "For debugging purposes or quicker training, truncate the number of"
243
+ " evaluation examples to this value if set."
244
+ )
245
+ },
246
+ )
247
+ audio_column_name: str = field(
248
+ default="audio",
249
+ metadata={"help": ("The name of the dataset column containing the audio data. Defaults to 'audio'")},
250
+ )
251
+ train_text_column_name: str = field(
252
+ default="whisper_transcript",
253
+ metadata={
254
+ "help": (
255
+ "The name of the dataset column containing the text data. Defaults to"
256
+ " 'whisper_transcript'which is the pseudo-labelled Whisper"
257
+ " transcription data."
258
+ )
259
+ },
260
+ )
261
+ eval_text_column_name: str = field(
262
+ default="text",
263
+ metadata={
264
+ "help": (
265
+ "The name of the dataset column containing the text data. Defaults to"
266
+ " 'text', which is the original text data"
267
+ )
268
+ },
269
+ )
270
+ max_duration_in_seconds: float = field(
271
+ default=30.0,
272
+ metadata={"help": ("Filter audio files that are longer than `max_duration_in_seconds` seconds")},
273
+ )
274
+ min_duration_in_seconds: float = field(
275
+ default=0.0,
276
+ metadata={"help": ("Filter audio files that are shorter than `min_duration_in_seconds` seconds")},
277
+ )
278
+ max_label_length: int = field(
279
+ default=128,
280
+ metadata={"help": "Truncate transcriptions that are longer `max_label_length` tokens."},
281
+ )
282
+ pad_target_to_multiple_of: Optional[int] = field(
283
+ default=None,
284
+ metadata={
285
+ "help": (
286
+ "If set will pad the target sequence to a multiple of the provided"
287
+ " value. This is important to avoid triggering recompilations on TPU."
288
+ " If unspecified, will default to padding the targets to max length."
289
+ )
290
+ },
291
+ )
292
+ preprocessing_only: bool = field(
293
+ default=False,
294
+ metadata={
295
+ "help": (
296
+ "Whether to only do data preprocessing and skip training. This is"
297
+ " especially useful when data preprocessing errors out in distributed"
298
+ " training due to timeout. In this case, one should run the"
299
+ " preprocessing in a non-distributed setup with"
300
+ " `preprocessing_only=True` so that the cached datasets can"
301
+ " consequently be loaded in distributed training"
302
+ )
303
+ },
304
+ )
305
+ train_split_name: str = field(
306
+ default="train",
307
+ metadata={
308
+ "help": ("The name of the training data set split to use (via the datasets library). Defaults to 'train'")
309
+ },
310
+ )
311
+ eval_split_name: str = field(
312
+ default="validation",
313
+ metadata={
314
+ "help": (
315
+ "The name of the evaluation data set split to use (via the datasets"
316
+ " library). Defaults to 'validation'"
317
+ )
318
+ },
319
+ )
320
+ wandb_project: str = field(
321
+ default="distil-whisper",
322
+ metadata={"help": "The name of the wandb project."},
323
+ )
324
+ wandb_name: str = field(
325
+ default=None,
326
+ metadata={"help": "The name of the wandb run."},
327
+ )
328
+ wandb_job_type: str = field(
329
+ default="distil-whisper",
330
+ metadata={"help": "The name of the wandb job type."},
331
+ )
332
+ wandb_dir: str = field(
333
+ default=None,
334
+ metadata={"help": "The absolute path to save the wandb logs."},
335
+ )
336
+ save_code_to_wandb: bool = field(
337
+ default=False,
338
+ metadata={
339
+ "help": (
340
+ "Whether to save main script to wandb. This is valuable for improving"
341
+ " experiment reproducibility and to diff code across experiments in"
342
+ " the UI."
343
+ )
344
+ },
345
+ )
346
+ streaming: bool = field(
347
+ default=True,
348
+ metadata={"help": "Whether to use Datasets' streaming mode to load and the data."},
349
+ )
350
+ wer_threshold: float = field(
351
+ default=None,
352
+ metadata={
353
+ "help": "Filter training data with Whisper transcriptions that have greater than `wer_threshold` "
354
+ "WER with the normalised transcriptions."
355
+ },
356
+ )
357
+ prefetch_size: int = field(
358
+ default=0,
359
+ metadata={"help": "Number of samples to pre-fetch if using an iterable dataset."},
360
+ )
361
+ timestamp_probability: float = field(
362
+ default=0.5, metadata={"help": "Probability for training on timestamped tokens if the data contains it."}
363
+ )
364
+ return_timestamps: bool = field(
365
+ default=False, metadata={"help": "Whether or not to predict timestamps in the generation step."}
366
+ )
367
+ round_timestamps: bool = field(
368
+ default=False,
369
+ metadata={
370
+ "help": "Whether or not to round the timestamp tokens to the nearest tenth of a second."
371
+ "By default, Whisper predicts timestamps to the nearest hundredth of a second."
372
+ "Reducing the timestamp precision to one tenth of a second simplifies the timestamp"
373
+ "prediction task, at the expense of timestamp granularity."
374
+ },
375
+ )
376
+
377
+
378
+ @dataclass
379
+ class FlaxSeq2SeqTrainingArguments(Seq2SeqTrainingArguments):
380
+ use_scan: Optional[bool] = field(
381
+ default=True,
382
+ metadata={
383
+ "help": (
384
+ "Whether or not to use `scan_with_axes` over the encoder and decoder blocks. Using scan results "
385
+ "in faster compile times and more efficient memory use during training, since all of the layers "
386
+ "in the encoder/decoder are stacked, and we perform a lax.scan over the stacked block to index "
387
+ "each layer. However, it results in slower inference time due to the overhead of stacking the "
388
+ "layers this way. Thus, we **always** default to disabling scan for the inference step."
389
+ )
390
+ },
391
+ )
392
+ freeze_encoder: Optional[bool] = field(
393
+ default=False,
394
+ metadata={
395
+ "help": (
396
+ "Whether to freeze the entire encoder model. Only recommended when the entire encoder has been "
397
+ "copied from the teacher model."
398
+ )
399
+ },
400
+ )
401
+ temperature: Optional[float] = field(
402
+ default=2.0, metadata={"help": "Temperature to anneal the logits when computing the softmax."}
403
+ )
404
+ kl_weight: Optional[float] = field(
405
+ default=1.0,
406
+ metadata={
407
+ "help": (
408
+ "Weighting assigned to the MSE loss in the KD formulation. MSE loss is "
409
+ "computed between the teacher-student hidden states and attentions."
410
+ )
411
+ },
412
+ )
413
+ mse_weight: Optional[float] = field(
414
+ default=0.0,
415
+ metadata={
416
+ "help": (
417
+ "Weighting assigned to the MSE loss in the KD formulation. MSE loss is "
418
+ "computed between the teacher-student hidden states and attentions."
419
+ )
420
+ },
421
+ )
422
+ precision: Optional[str] = field(
423
+ default="half_mixed",
424
+ metadata={
425
+ "help": (
426
+ "Precision with which run training, Can be one of `full`, `half_mixed` or `full_mixed`, the latter two"
427
+ "of which enable *mixed-precision* training. **Note that this only specifies the dtype of the computation "
428
+ "and optimizer state. It does not influence the dtype of model parameters.** An explanation of the three "
429
+ "settings is provided below:"
430
+ " 1. Full precision: forward pass, backward pass and optimiser states all in float32."
431
+ " 2. Half mixed precision: forward pass in bfloat16, backward pass and optimiser states in float32. This "
432
+ " corresponds to setting the dtype argument to bfloat16 when instantiating the model."
433
+ " 3. Full mixed precision: forward pass, backward pass and optimiser states all in bfloat16. The dtype "
434
+ " argument is set to bfloat16 for the forward pass, and the gradients computed with respect to the bfloat16 "
435
+ " parameters in the backward pass (giving bfloat16 gradients). The new optimiser states and parameter "
436
+ " updates are computed in float32 by upcasting the bfloat16 gradients and optimiser states to float32 "
437
+ " prior to the optimiser update step. The optimiser states are returned in float32 (but not saved to "
438
+ " memory) and then downcasted to bfloat16 (saved to memory) for the subsequent train step."
439
+ "For further details, refer to https://github.com/deepmind/optax/discussions/336"
440
+ )
441
+ },
442
+ )
443
+ compilation_cache: Optional[bool] = field(
444
+ default=False,
445
+ metadata={
446
+ "help": (
447
+ "Whether to enable the JAX (experimental) compilation cache. The compilation step is *cached* the "
448
+ "first time it is run. Successive compilation steps for the same function utilise the cache to reduce"
449
+ "the compilation time."
450
+ )
451
+ },
452
+ )
453
+ save_train_state: Optional[bool] = field(
454
+ default=False,
455
+ metadata={
456
+ "help": "Whether or not to save the Flax Train State on each `save_steps` steps. Required if you intend"
457
+ "to resume training from partial training runs. If False, only the model weights will be saved."
458
+ "If True, both the model weights and Flax Train state will be saved."
459
+ },
460
+ )
461
+
462
+
463
+ def shift_tokens_right(label_ids: np.array, decoder_start_token_id: int) -> np.ndarray:
464
+ """
465
+ Shift label ids one token to the right.
466
+ """
467
+ shifted_label_ids = np.zeros_like(label_ids)
468
+ shifted_label_ids[:, 1:] = label_ids[:, :-1]
469
+ shifted_label_ids[:, 0] = decoder_start_token_id
470
+
471
+ return shifted_label_ids
472
+
473
+
474
+ @flax.struct.dataclass
475
+ class FlaxDataCollatorSpeechSeq2SeqWithPadding:
476
+ """
477
+ Data collator that will dynamically pad the inputs received.
478
+ Args:
479
+ processor ([`WhisperProcessor`])
480
+ The processor used for processing the data.
481
+ decoder_start_token_id (:obj: `int`)
482
+ The start-of-sequence token id of the decoder.
483
+ decoder_prev_token_id (:obj: `int`)
484
+ The start-of-prompt token id of the decoder
485
+ input_padding (:obj:`bool`, :obj:`str` or :class:`~transformers.tokenization_utils_base.PaddingStrategy`, `optional`, defaults to :obj:`True`):
486
+ Select a strategy to pad the returned input sequences (according to the model's padding side and padding index)
487
+ among:
488
+ * :obj:`True` or :obj:`'longest'`: Pad to the longest sequence in the batch (or no padding if only a single
489
+ sequence if provided).
490
+ * :obj:`'max_length'`: Pad to a maximum length specified with the argument :obj:`max_length` or to the
491
+ maximum acceptable input length for the model if that argument is not provided.
492
+ * :obj:`False` or :obj:`'do_not_pad'` (default): No padding (i.e., can output a batch with sequences of
493
+ different lengths).
494
+ target_padding (:obj:`bool`, :obj:`str` or :class:`~transformers.tokenization_utils_base.PaddingStrategy`, `optional`, defaults to :obj:`True`):
495
+ Select a strategy to pad the returned target sequences (according to the model's padding side and padding index).
496
+ See above for details.
497
+ max_target_length (:obj:`int`, `optional`):
498
+ Maximum length of the ``labels`` of the returned list and optionally padding length (see above).
499
+ """
500
+
501
+ processor: Any
502
+ decoder_start_token_id: int
503
+ decoder_prev_token_id: int
504
+ input_padding: Union[bool, str] = "max_length"
505
+ target_padding: Union[bool, str] = "max_length"
506
+ max_target_length: Optional[int] = None
507
+
508
+ def __call__(self, features: List[Dict[str, Union[List[int], np.ndarray]]]) -> Dict[str, np.ndarray]:
509
+ # split inputs and labels since they have to be of different lengths and need
510
+ # different padding methods
511
+ model_input_name = self.processor.model_input_names[0]
512
+
513
+ # dataloader returns a list of features which we convert to a dict
514
+ input_features = {model_input_name: [feature[model_input_name] for feature in features]}
515
+ label_features = {"input_ids": [feature["labels"] for feature in features]}
516
+
517
+ # reformat list to dict and set to pytorch format
518
+ batch = self.processor.feature_extractor.pad(
519
+ input_features,
520
+ padding=self.input_padding,
521
+ return_tensors="np",
522
+ )
523
+
524
+ labels_batch = self.processor.tokenizer.pad(
525
+ label_features,
526
+ max_length=self.max_target_length,
527
+ padding=self.target_padding,
528
+ return_tensors="np",
529
+ )
530
+
531
+ # if bos token is appended in previous tokenization step,
532
+ # cut bos token here as it's appended later anyway
533
+ labels = labels_batch["input_ids"]
534
+ if set(np.unique(labels[:, 0])).issubset({self.decoder_start_token_id, self.decoder_prev_token_id}):
535
+ decoder_input_ids = labels[:, :-1]
536
+ labels = labels[:, 1:]
537
+ labels_batch.attention_mask = labels_batch.attention_mask[:, 1:]
538
+ else:
539
+ decoder_input_ids = shift_tokens_right(labels, self.decoder_start_token_id)
540
+
541
+ # replace padding with -100 to ignore correctly when computing the loss
542
+ labels = np.ma.array(labels, mask=np.not_equal(labels_batch.attention_mask, 1))
543
+ labels = labels.filled(fill_value=-100)
544
+
545
+ # replace initial prompt tokens with -100 to ignore correctly when computing the loss
546
+ bos_index = np.argmax(labels == self.decoder_start_token_id, axis=1)
547
+ prompt_mask = np.arange(labels.shape[1]) < bos_index[:, None]
548
+ labels = np.where(prompt_mask, -100, labels)
549
+
550
+ batch["labels"] = labels
551
+ batch["decoder_input_ids"] = decoder_input_ids
552
+
553
+ return batch
554
+
555
+
556
+ def get_data_loader(
557
+ seed: int,
558
+ dataset: IterableDataset,
559
+ batch_size: int,
560
+ data_collator: FlaxDataCollatorSpeechSeq2SeqWithPadding,
561
+ shuffle: bool = False,
562
+ drop_last: bool = True,
563
+ dataloader_num_workers: int = 0,
564
+ skip_batches: int = 0,
565
+ pin_memory: bool = True,
566
+ prefetch_size: int = 0,
567
+ ) -> DataLoader:
568
+ """
569
+ Returns batches of size `batch_size` from `dataset`. If `drop_last` is set to `False`, the final batch may be incomplete,
570
+ and range in size from 1 to `batch_size`. Shuffle batches if `shuffle` is `True`.
571
+
572
+ Args:
573
+ seed (int): Numpy seed for generating pseudo random numbers. Used if shuffling the dataset.
574
+ dataset (IterableDataset): streaming dataset from which to load the data.
575
+ batch_size (int): how many samples per batch to load.
576
+ data_collator (FlaxDataCollatorSpeechSeq2SeqWithPadding, optional): merges a list of samples to form a
577
+ mini-batch of Tensor(s). Used when using batched loading from a map-style dataset.
578
+ shuffle (bool, optional): set to `True` to have the batches reshuffled.
579
+ drop_last (bool, optional): set to ``True`` to drop the last incomplete batch,
580
+ if the dataset size is not divisible by the batch size. If ``False`` and
581
+ the size of dataset is not divisible by the batch size, then the last batch
582
+ will be smaller. (default: ``False``)
583
+ dataloader_num_workers (int, optional): how many subprocesses to use for data
584
+ loading. ``0`` means that the data will be loaded in the main process.
585
+ (default: ``0``)
586
+ skip_batches (int, optional): Efficiently skip the first `skip_batches`.
587
+ pin_memory (bool, optional): If ``True``, the data loader will copy Tensors
588
+ into device/CUDA pinned memory before returning them. If your data elements
589
+ are a custom type, or your :attr:`collate_fn` returns a batch that is a custom type,
590
+ see the example below.
591
+
592
+ """
593
+ if shuffle:
594
+ dataset = dataset.shuffle(seed)
595
+
596
+ if skip_batches > 0:
597
+ dataset = dataset.skip(skip_batches * batch_size)
598
+
599
+ if prefetch_size > 0:
600
+ dataset = IterableWrapper(dataset)
601
+ dataset = dataset.prefetch(prefetch_size)
602
+
603
+ data_loader = DataLoader(
604
+ dataset,
605
+ batch_size=batch_size,
606
+ drop_last=drop_last,
607
+ pin_memory=pin_memory,
608
+ collate_fn=data_collator,
609
+ num_workers=dataloader_num_workers,
610
+ )
611
+
612
+ return data_loader
613
+
614
+
615
+ def sorted_checkpoints(output_dir=None, checkpoint_prefix="checkpoint", use_mtime=False) -> List[str]:
616
+ ordering_and_checkpoint_path = []
617
+
618
+ glob_checkpoints = [str(x) for x in Path(output_dir).glob(f"{checkpoint_prefix}-*") if os.path.isdir(x)]
619
+
620
+ for path in glob_checkpoints:
621
+ if use_mtime:
622
+ ordering_and_checkpoint_path.append((os.path.getmtime(path), path))
623
+ else:
624
+ regex_match = re.match(f".*{checkpoint_prefix}-([0-9]+)", path)
625
+ if regex_match is not None and regex_match.groups() is not None:
626
+ ordering_and_checkpoint_path.append((int(regex_match.groups()[0]), path))
627
+
628
+ checkpoints_sorted = sorted(ordering_and_checkpoint_path)
629
+ checkpoints_sorted = [checkpoint[1] for checkpoint in checkpoints_sorted]
630
+ return checkpoints_sorted
631
+
632
+
633
+ def rotate_checkpoints(
634
+ save_total_limit=None, use_mtime=False, output_dir=None, checkpoint_prefix="checkpoint"
635
+ ) -> None:
636
+ if save_total_limit is None or save_total_limit <= 0:
637
+ return
638
+
639
+ # Check if we should delete older checkpoint(s)
640
+ checkpoints_sorted = sorted_checkpoints(
641
+ use_mtime=use_mtime, output_dir=output_dir, checkpoint_prefix=checkpoint_prefix
642
+ )
643
+ if len(checkpoints_sorted) <= save_total_limit:
644
+ return
645
+
646
+ number_of_checkpoints_to_delete = max(0, len(checkpoints_sorted) - save_total_limit)
647
+ checkpoints_to_be_deleted = checkpoints_sorted[:number_of_checkpoints_to_delete]
648
+ for checkpoint in checkpoints_to_be_deleted:
649
+ logger.info(f"Deleting older checkpoint [{checkpoint}] due to args.save_total_limit")
650
+ shutil.rmtree(checkpoint, ignore_errors=True)
651
+
652
+
653
+ def to_fp32(t):
654
+ return jax.tree_map(lambda x: x.astype(jnp.float32) if x.dtype == jnp.bfloat16 else x, t)
655
+
656
+
657
+ def to_bf16(t):
658
+ return jax.tree_map(lambda x: x.astype(jnp.bfloat16) if x.dtype == jnp.float32 else x, t)
659
+
660
+
661
+ class TrainState(train_state.TrainState):
662
+ dropout_rng: jnp.ndarray
663
+ max_grad_norm: float
664
+
665
+ def apply_gradients(self, *, grads, to_dtype: to_fp32, **kwargs):
666
+ """Updates `step`, `params`, `opt_state` and `**kwargs` in return value, clipping the
667
+ gradients by the maximum grad norm.
668
+
669
+ Note that internally this function calls `.tx.update()` followed by a call
670
+ to `optax.apply_updates()` to update `params` and `opt_state`.
671
+
672
+ Args:
673
+ grads: Gradients that have the same pytree structure as `.params`.
674
+ **kwargs: Additional dataclass attributes that should be `.replace()`-ed.
675
+
676
+ Returns:
677
+ An updated instance of `self` with `step` incremented by one, `params`
678
+ and `opt_state` updated by applying `grads`, and additional attributes
679
+ replaced as specified by `kwargs`.
680
+ """
681
+ # clip gradients by global l2 norm
682
+ casted_max_grad_norm = to_dtype(self.max_grad_norm)
683
+ g_norm = linear_algebra.global_norm(grads)
684
+ g_norm = jnp.maximum(casted_max_grad_norm, g_norm)
685
+ grads = jax.tree_map(lambda t: (t / g_norm) * casted_max_grad_norm, grads)
686
+
687
+ # perform update step in fp32 and subsequently downcast optimizer states if mixed precision training
688
+ # grads and opt_state in bf16 (need to upcast), params in fp32 (leave as is)
689
+ updates, new_opt_state = self.tx.update(to_fp32(grads), to_fp32(self.opt_state), self.params)
690
+
691
+ new_params = optax.apply_updates(self.params, updates)
692
+
693
+ return self.replace(
694
+ step=self.step + 1,
695
+ params=new_params,
696
+ opt_state=to_dtype(new_opt_state),
697
+ **kwargs,
698
+ )
699
+
700
+ @classmethod
701
+ def create(cls, *, apply_fn, params, tx, to_dtype: to_fp32, **kwargs):
702
+ """Creates a new instance with `step=0` and initialized `opt_state`."""
703
+ # downcast optimizer state to bf16 if mixed-precision training
704
+ opt_state = tx.init(to_dtype(params))
705
+ return cls(
706
+ step=0,
707
+ apply_fn=apply_fn,
708
+ params=params,
709
+ tx=tx,
710
+ opt_state=opt_state,
711
+ **kwargs,
712
+ )
713
+
714
+ def replicate(self):
715
+ return jax_utils.replicate(self).replace(dropout_rng=shard_prng_key(self.dropout_rng))
716
+
717
+ def unreplicate(self):
718
+ return jax_utils.unreplicate(self)
719
+
720
+ def save_state(self, output_dir, save_total_limit=None, checkpoint_prefix="checkpoint"):
721
+ step = int(jax.device_get(unreplicate(self.step)))
722
+ serialized_state = to_bytes(self.unreplicate())
723
+
724
+ output_file = Path(os.path.join(output_dir, f"{checkpoint_prefix}-{step}", "train_state.msgpack"))
725
+ output_file.parent.mkdir(exist_ok=True, parents=True)
726
+
727
+ with output_file.open("wb") as f:
728
+ f.write(serialized_state)
729
+
730
+ logger.info(f"Flax train state saved in {output_file}")
731
+ rotate_checkpoints(
732
+ save_total_limit=save_total_limit, output_dir=output_dir, checkpoint_prefix=checkpoint_prefix
733
+ )
734
+
735
+
736
+ def save_hf_weights(
737
+ student_state: TrainState,
738
+ student_model: FlaxWhisperForConditionalGeneration,
739
+ processor: WhisperProcessor,
740
+ output_dir: str,
741
+ cur_step: int,
742
+ total_train_steps: int,
743
+ use_scan: bool = True,
744
+ checkpoint_prefix: str = "checkpoint",
745
+ ) -> None:
746
+ # always disable scan in the params / model so that we can load from PyTorch directly - this is a no-op if we're not using scan for training
747
+ student_state_params = unreplicate(student_state.params)
748
+ student_state_params = student_model.convert_scan_to_unroll(student_state_params)
749
+ student_params = jax.device_get(student_state_params)
750
+ student_model.disable_scan()
751
+
752
+ if cur_step != total_train_steps:
753
+ output_dir = os.path.join(output_dir, f"{checkpoint_prefix}-{cur_step}")
754
+ os.makedirs(output_dir, exist_ok=True)
755
+
756
+ student_model.save_pretrained(output_dir, params=student_params)
757
+ processor.save_pretrained(output_dir)
758
+
759
+ # re-enable scan only if required for training
760
+ if use_scan:
761
+ student_model.enable_scan()
762
+
763
+
764
+ def write_train_metric(summary_writer, train_metrics, train_time, step, logging_steps):
765
+ summary_writer.scalar("train/time", train_time, step)
766
+
767
+ train_metrics = get_metrics(train_metrics)
768
+ for key, vals in train_metrics.items():
769
+ steps_arr = np.arange(0, step, logging_steps)[-len(vals) :]
770
+ tag = f"train/{key}"
771
+ for i, val in enumerate(vals):
772
+ summary_writer.scalar(tag, val, steps_arr[i])
773
+
774
+
775
+ def write_eval_metric(summary_writer, eval_metrics, step, prefix="eval"):
776
+ for metric_name, value in eval_metrics.items():
777
+ summary_writer.scalar(f"{prefix}/{metric_name}", value, step)
778
+
779
+
780
+ def write_wandb_metric(wandb_logger, metrics, train_time, step, epoch, prefix="train"):
781
+ log_metrics = {}
782
+ for k, v in metrics.items():
783
+ log_metrics[f"{prefix}/{k}"] = v
784
+ log_metrics[f"{prefix}/time"] = train_time
785
+ log_metrics[f"{prefix}/epoch"] = epoch
786
+ wandb_logger.log(log_metrics, step)
787
+
788
+
789
+ def write_wandb_pred(
790
+ wandb_logger, pred_str, label_str, norm_pred_str, norm_label_str, cur_step, prefix="eval", num_lines=200000
791
+ ):
792
+ # pretty name for current step: step 50000 -> step 50k
793
+ cur_step_pretty = f"{int(cur_step // 1000)}k" if cur_step > 1000 else cur_step
794
+ # convert str data to a wandb compatible format
795
+ str_data = [[label_str[i], pred_str[i], norm_label_str[i], norm_pred_str[i]] for i in range(len(pred_str))]
796
+ # log as a table with the appropriate headers
797
+ wandb_logger.log(
798
+ {
799
+ f"predictions/{prefix.replace('/', '-')}-step-{cur_step_pretty}": wandb_logger.Table(
800
+ columns=["Target", "Pred", "Norm Target", "Norm Pred"], data=str_data[:num_lines]
801
+ )
802
+ },
803
+ cur_step,
804
+ )
805
+ # log incorrect normalised predictions
806
+ str_data = np.asarray(str_data)
807
+ str_data_incorrect = str_data[str_data[:, -2] != str_data[:, -1]]
808
+ # log as a table with the appropriate headers
809
+ wandb_logger.log(
810
+ {
811
+ f"incorrect_predictions/{prefix.replace('/', '-')}-step-{cur_step_pretty}": wandb_logger.Table(
812
+ columns=["Target", "Pred", "Norm Target", "Norm Pred"], data=str_data_incorrect[:num_lines]
813
+ )
814
+ },
815
+ cur_step,
816
+ )
817
+
818
+
819
+ def create_learning_rate_fn(
820
+ num_train_steps: int, lr_scheduler_type: str, num_warmup_steps: int, learning_rate: float
821
+ ) -> Callable[[int], jnp.array]:
822
+ """Returns a linear warmup, linear_decay learning rate function."""
823
+ lr_scheduler_types = ("linear", "constant_with_warmup")
824
+
825
+ if lr_scheduler_type not in lr_scheduler_types:
826
+ raise ValueError(
827
+ f"lr_scheduler_type of type {lr_scheduler_type} not supported, choose from {lr_scheduler_types}."
828
+ )
829
+
830
+ warmup_fn = optax.linear_schedule(init_value=0.0, end_value=learning_rate, transition_steps=num_warmup_steps)
831
+ decay_fn = optax.linear_schedule(
832
+ init_value=learning_rate,
833
+ end_value=0 if lr_scheduler_type == "linear" else learning_rate,
834
+ transition_steps=num_train_steps - num_warmup_steps,
835
+ )
836
+ schedule_fn = optax.join_schedules(schedules=[warmup_fn, decay_fn], boundaries=[num_warmup_steps])
837
+ return schedule_fn
838
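+ # Illustrative example (hypothetical values): create_learning_rate_fn(10_000, "linear", 500, 1e-4) ramps the
+ # learning rate from 0 to 1e-4 over the first 500 steps, then decays it linearly back to 0 over the remaining
+ # 9_500 steps; with "constant_with_warmup" it instead stays at 1e-4 after warmup.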
+
839
+
840
+ def convert_dataset_str_to_list(
841
+ dataset_names,
842
+ dataset_config_names,
843
+ splits=None,
844
+ text_column_names=None,
845
+ dataset_samples=None,
846
+ default_split="train",
847
+ ):
848
+ if isinstance(dataset_names, str):
849
+ dataset_names = dataset_names.split("+")
850
+
851
+ # we assume that all the datasets we're using derive from the distil-whisper org on the Hub - prepend the org name if necessary
852
+ for i in range(len(dataset_names)):
853
+ ds_name = dataset_names[i]
854
+ dataset_names[i] = f"distil-whisper/{ds_name}" if "/" not in ds_name else ds_name
855
+
856
+ dataset_config_names = dataset_config_names.split("+")
857
+ splits = splits.split("+") if splits is not None else None
858
+ text_column_names = text_column_names.split("+") if text_column_names is not None else None
859
+ dataset_samples = dataset_samples.split("+") if dataset_samples is not None else None
860
+
861
+ # basic checks to ensure we've got the right number of datasets/configs/splits/columns/probs
862
+ if len(dataset_names) != len(dataset_config_names):
863
+ raise ValueError(
864
+ f"Ensure one config is passed for each dataset, got {len(dataset_names)} datasets and"
865
+ f" {len(dataset_config_names)} configs."
866
+ )
867
+
868
+ if splits is not None and len(splits) != len(dataset_names):
869
+ raise ValueError(
870
+ f"Ensure one split is passed for each dataset, got {len(dataset_names)} datasets and {len(splits)} splits."
871
+ )
872
+
873
+ if text_column_names is not None and len(text_column_names) != len(dataset_names):
874
+ raise ValueError(
875
+ f"Ensure one text column name is passed for each dataset, got {len(dataset_names)} datasets and"
876
+ f" {len(text_column_names)} text column names."
877
+ )
878
+
879
+ if dataset_samples is not None:
880
+ if len(dataset_samples) != len(dataset_names):
881
+ raise ValueError(
882
+ f"Ensure one sample is passed for each dataset, got {len(dataset_names)} datasets and "
883
+ f"{len(dataset_samples)} samples."
884
+ )
885
+ dataset_samples = [float(ds_sample) for ds_sample in dataset_samples]
886
+ else:
887
+ dataset_samples = [None] * len(dataset_names)
888
+
889
+ text_column_names = (
890
+ text_column_names if text_column_names is not None else ["text" for _ in range(len(dataset_names))]
891
+ )
892
+ splits = splits if splits is not None else [default_split for _ in range(len(dataset_names))]
893
+
894
+ dataset_names_dict = []
895
+ for i, ds_name in enumerate(dataset_names):
896
+ dataset_names_dict.append(
897
+ {
898
+ "name": ds_name,
899
+ "config": dataset_config_names[i],
900
+ "split": splits[i],
901
+ "text_column_name": text_column_names[i],
902
+ "samples": dataset_samples[i],
903
+ }
904
+ )
905
+ return dataset_names_dict
906
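+ # Illustrative example (dataset names are hypothetical):
+ # convert_dataset_str_to_list("librispeech_asr+common_voice", "all+nb-NO", splits="train+train") returns
+ # [{"name": "distil-whisper/librispeech_asr", "config": "all", "split": "train", "text_column_name": "text", "samples": None},
+ #  {"name": "distil-whisper/common_voice", "config": "nb-NO", "split": "train", "text_column_name": "text", "samples": None}]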
+
907
+
908
+ def load_multiple_datasets(
909
+ dataset_names: Union[List, str],
910
+ dataset_config_names: Union[List, str],
911
+ splits: Optional[Union[List, str]] = None,
912
+ text_column_names: Optional[List] = None,
913
+ sampling_rate: Optional[int] = 16000,
914
+ stopping_strategy: Optional[str] = "first_exhausted",
915
+ dataset_samples: Optional[Union[List, np.array]] = None,
916
+ streaming: bool = True,
917
+ seed: int = None,
918
+ **kwargs,
919
+ ) -> IterableDataset:
920
+ dataset_names_dict = convert_dataset_str_to_list(
921
+ dataset_names, dataset_config_names, splits, text_column_names, dataset_samples
922
+ )
923
+
924
+ if dataset_samples is not None:
925
+ dataset_samples = [ds_dict["samples"] for ds_dict in dataset_names_dict]
926
+ probabilities = np.array(dataset_samples) / np.sum(dataset_samples)
927
+ else:
928
+ probabilities = None
929
+
930
+ if len(dataset_names_dict) == 1:
931
+ dataset_dict = dataset_names_dict[0]
932
+ # we have a single dataset so just return it as is
933
+ return load_dataset(
934
+ dataset_dict["name"],
935
+ dataset_dict["config"],
936
+ split=dataset_dict["split"],
937
+ streaming=streaming,
938
+ **kwargs,
939
+ )
940
+
941
+ all_datasets = []
942
+ # iterate over the datasets we want to interleave
943
+ for dataset_dict in tqdm(dataset_names_dict, desc="Combining datasets..."):
944
+ dataset = load_dataset(
945
+ dataset_dict["name"],
946
+ dataset_dict["config"],
947
+ split=dataset_dict["split"],
948
+ streaming=streaming,
949
+ **kwargs,
950
+ )
951
+ # resample to specified sampling rate
952
+ dataset = dataset.cast_column("audio", datasets.features.Audio(sampling_rate))
953
+ dataset = dataset.remove_columns(
954
+ set(dataset.features.keys()) - {"audio", dataset_dict["text_column_name"], "whisper_transcript"}
955
+ )
956
+ all_datasets.append(dataset)
957
+
958
+ if streaming:
959
+ interleaved_dataset = interleave_datasets(
960
+ all_datasets,
961
+ stopping_strategy=stopping_strategy,
962
+ probabilities=probabilities,
963
+ seed=seed,
964
+ )
965
+ else:
966
+ interleaved_dataset = concatenate_datasets(all_datasets)
967
+
968
+ return interleaved_dataset
969
+
970
+
971
+ def get_layers_to_supervise(student_layers: int, teacher_layers: int) -> dict:
972
+ """Helper function to map the student layer i to the teacher layer j whose output we'd like them to emulate. Used
973
+ for MSE loss terms in distillation (hidden-states and activations). Student layers are paired with teacher layers
974
+ in equal increments, e.g. for a 12-layer model distilled to a 3-layer model, student layer 0 emulates teacher layer
975
+ 3 (such that it behaves like the first 4 teacher layers), student layer 1 emulates teacher layer 7, and student layer
976
+ 2 emulates teacher layer 11. This mapping is summarised by the dictionary: {0: 3, 1: 7, 2: 11}, which is precisely
977
+ the output of this function for the arguments (student_layers=3, teacher_layers=12)."""
978
+ layer_intervals = np.linspace(teacher_layers // student_layers - 1, teacher_layers - 1, student_layers, dtype=int)
979
+ layer_intervals[-1] = teacher_layers - 1
980
+ layer_map = {}
981
+
982
+ for student_layer, teacher_layer in enumerate(layer_intervals):
983
+ layer_map[student_layer] = teacher_layer
984
+
985
+ return layer_map
986
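+ # Illustrative example: get_layers_to_supervise(student_layers=2, teacher_layers=32) returns {0: 15, 1: 31},
+ # i.e. each student layer is supervised by the teacher layer that closes its share of the teacher stack.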
+
987
+
988
+ class FlaxWhisperFeatureExtractor(WhisperFeatureExtractor):
989
+ def _np_extract_fbank_features(self, waveform: np.array) -> np.ndarray:
990
+ """
991
+ Compute the log-mel spectrogram of the provided audio using torch filters. Using the torch implementation
992
+ computes stft filter banks approx 5x faster than its numpy counterpart, which is the native implementation
993
+ in transformers, and matches to within 1e-5 abs tolerance.
994
+ """
995
+ waveform = torch.from_numpy(waveform).type(torch.float32)
996
+
997
+ window = torch.hann_window(self.n_fft)
998
+ stft = torch.stft(waveform, self.n_fft, self.hop_length, window=window, return_complex=True)
999
+ magnitudes = stft[..., :-1].abs() ** 2
1000
+
1001
+ mel_filters = torch.from_numpy(self.mel_filters).type(torch.float32)
1002
+ mel_spec = mel_filters.T @ magnitudes
1003
+
1004
+ log_spec = torch.clamp(mel_spec, min=1e-10).log10()
1005
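+ # the following clamp / (max - 8.0) / (x + 4.0) / 4.0 steps mirror OpenAI Whisper's reference preprocessing:
+ # the dynamic range is limited to 8 log10 units (~80 dB) below the peak and rescaled to roughly [-1, 1]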
+ log_spec = torch.maximum(log_spec, log_spec.max() - 8.0)
1006
+ log_spec = (log_spec + 4.0) / 4.0
1007
+ return log_spec.numpy()
1008
+
1009
+
1010
+ def main():
1011
+ # 1. Parse input arguments
1012
+ # See all possible arguments in src/transformers/training_args.py
1013
+ # or by passing the --help flag to this script.
1014
+ # We now keep distinct sets of args, for a cleaner separation of concerns.
1015
+ parser = HfArgumentParser((ModelArguments, DataTrainingArguments, FlaxSeq2SeqTrainingArguments))
1016
+
1017
+ if len(sys.argv) == 2 and sys.argv[1].endswith(".json"):
1018
+ # If we pass only one argument to the script and it's the path to a json file,
1019
+ # let's parse it to get our arguments.
1020
+ model_args, data_args, training_args = parser.parse_json_file(json_file=os.path.abspath(sys.argv[1]))
1021
+ else:
1022
+ model_args, data_args, training_args = parser.parse_args_into_dataclasses()
1023
+
1024
+ # Sending telemetry. Tracking the example usage helps us better allocate resources to maintain them. The
1025
+ # information sent is the one passed as arguments along with your JAX/Flax versions.
1026
+ send_example_telemetry("run_flax_speech_recognition_seq2seq", model_args, data_args, framework="flax")
1027
+
1028
+ # 2. Define remote logging - do this early so that we get the full traceback on our remote logs
1029
+ # Enable tensorboard only on the master node
1030
+ has_tensorboard = is_tensorboard_available()
1031
+ if has_tensorboard:
1032
+ if jax.process_index() == 0:
1033
+ try:
1034
+ from flax.metrics.tensorboard import SummaryWriter
1035
+
1036
+ summary_writer = SummaryWriter(log_dir=os.path.join(Path(training_args.output_dir), "runs"))
1037
+ except ImportError as ie:
1038
+ has_tensorboard = False
1039
+ logger.warning(
1040
+ "Unable to display metrics through TensorBoard because some package" f" are not installed: {ie}"
1041
+ )
1042
+ else:
1043
+ logger.warning(
1044
+ "Unable to display metrics through TensorBoard because the package is not"
1045
+ " installed: Please run `pip install tensorboard` to enable."
1046
+ )
1047
+
1048
+ # Enable wandb only on the master node
1049
+ has_wandb = is_wandb_available()
1050
+ if has_wandb:
1051
+ import wandb as wandb_logger
1052
+
1053
+ # Set up wandb run
1054
+ if jax.process_index() == 0:
1055
+ wandb_logger.init(
1056
+ project=data_args.wandb_project,
1057
+ name=data_args.wandb_name,
1058
+ job_type=data_args.wandb_job_type,
1059
+ dir=data_args.wandb_dir,
1060
+ save_code=data_args.save_code_to_wandb,
1061
+ )
1062
+ else:
1063
+ logger.warning("Wandb logging requires wandb to be installed. Run `pip install wandb` to enable.")
1064
+
1065
+ # 3. Setup local logging
1066
+ # Make one log on every process with the configuration for debugging.
1067
+ logging.basicConfig(
1068
+ format="%(asctime)s - %(levelname)s - %(name)s - %(message)s",
1069
+ datefmt="%m/%d/%Y %H:%M:%S",
1070
+ handlers=[logging.StreamHandler(sys.stdout)],
1071
+ )
1072
+ # Set the verbosity to info of the Transformers logger.
1073
+ # We only want one process per machine to log things on the screen.
1074
+ logger.setLevel(logging.INFO if jax.process_index() == 0 else logging.ERROR)
1075
+ if jax.process_index() == 0:
1076
+ datasets.utils.logging.set_verbosity_warning()
1077
+ transformers.utils.logging.set_verbosity_info()
1078
+ else:
1079
+ datasets.utils.logging.set_verbosity_error()
1080
+ transformers.utils.logging.set_verbosity_error()
1081
+
1082
+ logger.info("Training/evaluation parameters %s", training_args)
1083
+
1084
+ # Check the output dir is valid
1085
+ if (
1086
+ os.path.exists(training_args.output_dir)
1087
+ and os.listdir(training_args.output_dir)
1088
+ and training_args.do_train
1089
+ and not training_args.overwrite_output_dir
1090
+ ):
1091
+ raise ValueError(
1092
+ f"Output directory ({training_args.output_dir}) already exists and is not"
1093
+ " empty. Use `--overwrite_output_dir` to overcome."
1094
+ )
1095
+
1096
+ # 4. Handle the repository creation
1097
+ if training_args.push_to_hub:
1098
+ if training_args.hub_model_id is None:
1099
+ repo_name = get_full_repo_name(
1100
+ Path(training_args.output_dir).absolute().name,
1101
+ token=training_args.hub_token,
1102
+ )
1103
+ else:
1104
+ repo_name = training_args.hub_model_id
1105
+ create_repo(repo_name, exist_ok=True, token=training_args.hub_token)
1106
+ repo = Repository(
1107
+ training_args.output_dir,
1108
+ clone_from=repo_name,
1109
+ token=training_args.hub_token,
1110
+ )
1111
+
1112
+ if training_args.compilation_cache:
1113
+ cc.initialize_cache(os.path.join(model_args.cache_dir, "jax_cache"))
1114
+
1115
+ # 5. Load dataset
1116
+ raw_datasets = IterableDatasetDict() if data_args.streaming else DatasetDict()
1117
+
1118
+ # set seed for determinism
1119
+ set_seed(training_args.seed)
1120
+
1121
+ if training_args.do_train:
1122
+ raw_datasets["train"] = load_multiple_datasets(
1123
+ data_args.train_dataset_name,
1124
+ data_args.train_dataset_config_name,
1125
+ splits=data_args.train_split_name,
1126
+ streaming=data_args.streaming,
1127
+ dataset_samples=data_args.train_dataset_samples,
1128
+ seed=training_args.seed,
1129
+ cache_dir=data_args.dataset_cache_dir,
1130
+ token=True if model_args.use_auth_token else None,
1131
+ )
1132
+
1133
+ if training_args.do_eval:
1134
+ dataset_names_dict = convert_dataset_str_to_list(
1135
+ data_args.eval_dataset_name if data_args.eval_dataset_name else data_args.train_dataset_name,
1136
+ (
1137
+ data_args.eval_dataset_config_name
1138
+ if data_args.eval_dataset_config_name
1139
+ else data_args.train_dataset_config_name
1140
+ ),
1141
+ splits=data_args.eval_split_name,
1142
+ text_column_names=data_args.eval_text_column_name,
1143
+ )
1144
+ all_eval_splits = []
1145
+ if len(dataset_names_dict) == 1:
1146
+ # load a single eval set
1147
+ dataset_dict = dataset_names_dict[0]
1148
+ all_eval_splits.append("eval")
1149
+ raw_datasets["eval"] = load_dataset(
1150
+ dataset_dict["name"],
1151
+ dataset_dict["config"],
1152
+ split=dataset_dict["split"],
1153
+ cache_dir=data_args.dataset_cache_dir,
1154
+ token=True if model_args.use_auth_token else None,
1155
+ streaming=data_args.streaming,
1156
+ )
1157
+ else:
1158
+ # load multiple eval sets
1159
+ for dataset_dict in dataset_names_dict:
1160
+ if dataset_dict["name"] == "esb/diagnostic-dataset":
1161
+ # for the ESB diagnostic dataset, the dataset name is effectively the config
1162
+ pretty_name = f"{dataset_dict['config']}-diagnostic/{dataset_dict['split']}"
1163
+ else:
1164
+ pretty_name = f"{dataset_dict['name'].split('/')[-1]}/{dataset_dict['split'].replace('.', '-')}"
1165
+ all_eval_splits.append(pretty_name)
1166
+ raw_datasets[pretty_name] = load_dataset(
1167
+ dataset_dict["name"],
1168
+ dataset_dict["config"],
1169
+ split=dataset_dict["split"],
1170
+ cache_dir=data_args.dataset_cache_dir,
1171
+ token=True if model_args.use_auth_token else None,
1172
+ streaming=data_args.streaming,
1173
+ )
1174
+ features = raw_datasets[pretty_name].features.keys()
1175
+ if "text" not in features:
1176
+ raw_datasets[pretty_name] = raw_datasets[pretty_name].rename_column(
1177
+ dataset_dict["text_column_name"], "text"
1178
+ )
1179
+ raw_datasets[pretty_name] = raw_datasets[pretty_name].remove_columns(
1180
+ set(raw_datasets[pretty_name].features.keys()) - {"audio", "text"}
1181
+ )
1182
+
1183
+ if not training_args.do_train and not training_args.do_eval:
1184
+ raise ValueError(
1185
+ "Cannot not train and not do evaluation. At least one of training or evaluation has to be performed."
1186
+ )
1187
+
1188
+ raw_datasets_train_features = list(raw_datasets["train"].features.keys())
1189
+
1190
+ if data_args.audio_column_name not in raw_datasets_train_features:
1191
+ raise ValueError(
1192
+ f"--audio_column_name '{data_args.audio_column_name}' not found in dataset"
1193
+ f" '{data_args.dataset_name}'. Make sure to set `--audio_column_name` to"
1194
+ " the correct audio column - one of"
1195
+ f" {', '.join(raw_datasets_train_features)}."
1196
+ )
1197
+
1198
+ if data_args.train_text_column_name not in raw_datasets_train_features:
1199
+ raise ValueError(
1200
+ f"--train_text_column_name {data_args.train_text_column_name} not found in dataset"
1201
+ f" '{data_args.dataset_name}'. Make sure to set `--train_text_column_name` to the"
1202
+ " correct text column - one of"
1203
+ f" {', '.join(raw_datasets_train_features)}."
1204
+ )
1205
+
1206
+ # 6. Load pretrained model, tokenizer, and feature extractor
1207
+ config = WhisperConfig.from_pretrained(
1208
+ (model_args.config_name if model_args.config_name else model_args.model_name_or_path),
1209
+ cache_dir=model_args.cache_dir,
1210
+ revision=model_args.model_revision,
1211
+ token=True if model_args.use_auth_token else None,
1212
+ )
1213
+ feature_extractor = FlaxWhisperFeatureExtractor.from_pretrained(
1214
+ (model_args.feature_extractor_name if model_args.feature_extractor_name else model_args.model_name_or_path),
1215
+ cache_dir=model_args.cache_dir,
1216
+ revision=model_args.model_revision,
1217
+ token=True if model_args.use_auth_token else None,
1218
+ )
1219
+ tokenizer = WhisperTokenizerFast.from_pretrained(
1220
+ (model_args.tokenizer_name if model_args.tokenizer_name else model_args.model_name_or_path),
1221
+ cache_dir=model_args.cache_dir,
1222
+ use_fast=model_args.use_fast_tokenizer,
1223
+ revision=model_args.model_revision,
1224
+ token=True if model_args.use_auth_token else None,
1225
+ )
1226
+
1227
+ # override timestamp tokens until tokenizer issues are fixed in transformers
1228
+ timestamps = [AddedToken("<|%.2f|>" % (i * 0.02), lstrip=False, rstrip=False) for i in range(1500 + 1)]
1229
+ tokenizer.add_tokens(timestamps)
1230
+
1231
+ config.update(
1232
+ {
1233
+ "activation_dropout": model_args.activation_dropout,
1234
+ "attention_dropout": model_args.attention_dropout,
1235
+ "dropout": model_args.dropout,
1236
+ }
1237
+ )
1238
+
1239
+ if training_args.precision == "full_mixed":
1240
+ # forward pass, backward pass and optimiser states in bf16
1241
+ dtype = jnp.bfloat16
1242
+ to_dtype = to_bf16
1243
+ elif training_args.precision == "half_mixed" or model_args.dtype == "bfloat16":
1244
+ # forward pass in bf16, backward pass and optimiser states in fp32
1245
+ dtype = jnp.bfloat16
1246
+ to_dtype = to_fp32
1247
+ else:
1248
+ if training_args.precision != "full":
1249
+ raise ValueError(
1250
+ f"`precision` should be one of: `full`, `half_mixed` or `full_mixed`, got {training_args.precision}"
1251
+ )
1252
+ # forward pass, backward pass and optimiser states in fp32
1253
+ dtype = jnp.float32
1254
+ to_dtype = to_fp32
1255
+
1256
+ student_model, student_params = FlaxWhisperForConditionalGeneration.from_pretrained(
1257
+ model_args.model_name_or_path,
1258
+ config=config,
1259
+ dtype=dtype,
1260
+ cache_dir=model_args.cache_dir,
1261
+ revision=model_args.model_revision,
1262
+ subfolder=model_args.subfolder,
1263
+ token=True if model_args.use_auth_token else None,
1264
+ _do_init=False,
1265
+ use_scan=model_args.load_with_scan_weights,
1266
+ )
1267
+
1268
+ teacher_model, teacher_params = FlaxWhisperForConditionalGeneration.from_pretrained(
1269
+ model_args.teacher_model_name_or_path,
1270
+ # config=config,
1271
+ dtype=dtype,
1272
+ cache_dir=model_args.cache_dir,
1273
+ # revision=model_args.model_revision,
1274
+ token=True if model_args.use_auth_token else None,
1275
+ _do_init=False,
1276
+ )
1277
+
1278
+ if student_model.config.decoder_start_token_id is None or teacher_model.config.decoder_start_token_id is None:
1279
+ raise ValueError(
1280
+ f"Make sure that `config.decoder_start_token_id` is correctly defined for both the "
1281
+ f"student and teacher model. Got {student_model.config.decoder_start_token_id} for the "
1282
+ f"student and {teacher_model.config.decoder_start_token_id} for the teacher."
1283
+ )
1284
+
1285
+ # enable scan / gradient checkpointing if necessary
1286
+ if training_args.use_scan:
1287
+ student_model.enable_scan() # to enable scan in the nn.Module
1288
+ student_params = student_model.convert_unroll_to_scan(student_params) # to convert the unrolled params to scan
1289
+
1290
+ teacher_model.enable_scan() # faster compile time (even though we don't train the teacher)
1291
+ teacher_params = teacher_model.convert_unroll_to_scan(teacher_params)
1292
+
1293
+ if training_args.gradient_checkpointing:
1294
+ student_model.enable_gradient_checkpointing() # to enable checkpointing in the nn.Module, there is no change to the params structure
1295
+ teacher_model.enable_gradient_checkpointing()
1296
+
1297
+ if hasattr(teacher_model.generation_config, "is_multilingual") and teacher_model.generation_config.is_multilingual:
1298
+ # We need to set the language and task ids for previously multilingual checkpoints - for now we hardcode this to Norwegian
1299
+ tokenizer.set_prefix_tokens(language="Norwegian", task="transcribe", predict_timestamps=False)
1300
+ student_model.generation_config.update(
1301
+ **{
1302
+ "language": "<|no|>",
1303
+ "task": "transcribe",
1304
+ }
1305
+ )
1306
+
1307
+ # 7. Resample speech dataset: `datasets` takes care of automatically loading and resampling the audio,
1308
+ # so we just need to set the correct target sampling rate.
1309
+ raw_datasets = raw_datasets.cast_column(
1310
+ data_args.audio_column_name,
1311
+ datasets.features.Audio(sampling_rate=feature_extractor.sampling_rate),
1312
+ )
1313
+
1314
+ # 8. Preprocessing the datasets.
1315
+ # We need to read the audio files as arrays and tokenize the targets.
1316
+ max_input_length = int(data_args.max_duration_in_seconds * feature_extractor.sampling_rate)
1317
+ min_input_length = int(data_args.min_duration_in_seconds * feature_extractor.sampling_rate)
1318
+ max_label_length = (
1319
+ data_args.max_label_length if data_args.max_label_length is not None else student_model.config.max_length
1320
+ )
1321
+ audio_column_name = data_args.audio_column_name
1322
+ num_workers = data_args.preprocessing_num_workers
1323
+ dataloader_num_workers = training_args.dataloader_num_workers
1324
+ dataloader_prefetch_size = data_args.prefetch_size
1325
+ train_text_column_name = data_args.train_text_column_name
1326
+ eval_text_column_name = "text"
1327
+ model_input_name = feature_extractor.model_input_names[0]
1328
+ normalizer = BasicTextNormalizer(tokenizer.english_spelling_normalizer)
1329
+ wer_threshold = data_args.wer_threshold
1330
+ round_timestamps = data_args.round_timestamps
1331
+
1332
+ if training_args.do_train and data_args.max_train_samples is not None:
1333
+ raw_datasets["train"] = (
1334
+ raw_datasets["train"].take(data_args.max_train_samples)
1335
+ if data_args.streaming
1336
+ else raw_datasets["train"].select(range(data_args.max_train_samples))
1337
+ )
1338
+
1339
+ if training_args.do_eval and data_args.max_eval_samples is not None:
1340
+ for eval_split in all_eval_splits:
1341
+ raw_datasets[eval_split] = (
1342
+ raw_datasets[eval_split].take(data_args.max_eval_samples)
1343
+ if data_args.streaming
1344
+ else raw_datasets[eval_split].select(range(data_args.max_eval_samples))
1345
+ )
1346
+
1347
+ # 10.3: filter training data based on WER threshold -> this is KEY to good distillation performance
1348
+ def is_wer_in_range(ground_truth, whisper_transcript):
1349
+ norm_ground_truth = normalizer(ground_truth)
1350
+ if whisper_transcript is not None and whisper_transcript.upper() == whisper_transcript:
1351
+ # filter entirely upper-case transcriptions: these are erroneous generations from large-v3
1352
+ return False
1353
+ elif whisper_transcript is not None and len(norm_ground_truth) == 0 and len(normalizer(whisper_transcript)) == 0:
1354
+ return True
1355
+ elif len(norm_ground_truth.strip()) > 0 and whisper_transcript is not None and len(normalizer(whisper_transcript).strip()) > 0:
1356
+ norm_whisper_transcript = normalizer(whisper_transcript)
1357
+ wer = 100 * metric.compute(predictions=[norm_whisper_transcript], references=[norm_ground_truth])
1358
+ return wer < wer_threshold
1359
+ else:
1360
+ # filter automatically since we can't know the WER
1361
+ return False
1362
+
1363
+
1364
+ filter_by_wer_threshold = partial(
1365
+ raw_datasets["train"].filter,
1366
+ function=is_wer_in_range,
1367
+ input_columns=[eval_text_column_name, train_text_column_name],
1368
+ )
1369
+
1370
+ if wer_threshold is not None:
1371
+ raw_datasets["train"] = (
1372
+ filter_by_wer_threshold(num_proc=num_workers, desc="filtering train dataset by wer")
1373
+ if not data_args.streaming
1374
+ else filter_by_wer_threshold()
1375
+ )
1376
+
1377
+ def has_timestamp_tokens(input_str):
1378
+ """
1379
+ Identify whether the input string contains timestamp tokens, of the form <|0.00|>, by searching for
1380
+ pairs of left and right-angle brackets.
1381
+ """
1382
+ return bool(re.search(r"<[^>]*>", input_str))
1383
+
1384
+ def round_timestamp_tokens(input_str: str, ndigits: int = 1):
1385
+ timestamps = re.findall(r"<[^>]*>", input_str, re.DOTALL)
1386
+ for token in timestamps:
1387
+ # extract time digits from timestamp token, e.g. <|6.24|> to 6.24
1388
+ time_digit = token[2:-2]
1389
+ # round to specified number of digits, e.g. 6.24 to 6.2
1390
+ time_digit = round(float(time_digit), ndigits=ndigits)
1391
+ # replace in original string with the same precision, e.g. <|6.24|> to <|6.20|>
1392
+ input_str = input_str.replace(token, "<|{:.2f}|>".format(time_digit))
1393
+ return input_str
1394
+
1395
+ def prepare_train_dataset(batch):
1396
+ # process audio input
1397
+ sample = batch[audio_column_name]
1398
+ inputs = feature_extractor(sample["array"], sampling_rate=sample["sampling_rate"])
1399
+ batch[model_input_name] = inputs.get(model_input_name)[0]
1400
+ batch["input_length"] = len(sample["array"])
1401
+
1402
+ # process text targets
1403
+ input_str = batch[train_text_column_name]
1404
+
1405
+ # prompt & timestamp processing: for now, we only do one or the other
1406
+ if input_str.startswith("<|startoftranscript|>") or input_str.startswith("<|startofprev|>"):
1407
+ # prompted target text already has special ids added, so don't add them here
1408
+ batch["labels"] = tokenizer(input_str, add_special_tokens=False).input_ids
1409
+ return batch
1410
+
1411
+ has_timestamps = has_timestamp_tokens(input_str)
1412
+
1413
+ if has_timestamps:
1414
+ predict_timestamps = bool(np.random.binomial(1, data_args.timestamp_probability))
1415
+ if not predict_timestamps:
1416
+ # filter timestamp token ids if not part of the prediction task
1417
+ input_str = tokenizer._filter_timestamp_ids(input_str)
1418
+ elif round_timestamps:
1419
+ input_str = round_timestamp_tokens(input_str)
1420
+ else:
1421
+ predict_timestamps = False
1422
+
1423
+ tokenizer.set_prefix_tokens(language="Norwegian", task="transcribe", predict_timestamps=predict_timestamps)
1424
+ input_ids = tokenizer(input_str).input_ids
1425
+ batch["labels"] = input_ids
1426
+ return batch
1427
+
1428
+ def prepare_eval_dataset(batch):
1429
+ # process audio
1430
+ sample = batch[audio_column_name]
1431
+ inputs = feature_extractor(sample["array"], sampling_rate=sample["sampling_rate"])
1432
+ # process audio length
1433
+ batch[model_input_name] = inputs.get(model_input_name)[0]
1434
+ batch["input_length"] = len(sample["array"])
1435
+
1436
+ # process targets
1437
+ input_str = batch[eval_text_column_name]
1438
+ batch["labels"] = tokenizer(input_str).input_ids
1439
+ return batch
1440
+
1441
+ vectorized_datasets = IterableDatasetDict() if data_args.streaming else DatasetDict()
1442
+ if training_args.do_train:
1443
+ map_fn_train = partial(
1444
+ raw_datasets["train"].map, function=prepare_train_dataset, remove_columns=raw_datasets_train_features
1445
+ )
1446
+ vectorized_datasets["train"] = (
1447
+ map_fn_train(num_proc=num_workers, desc="preprocess train dataset")
1448
+ if not data_args.streaming
1449
+ else map_fn_train()
1450
+ )
1451
+ if training_args.do_eval:
1452
+ for eval_split in all_eval_splits:
1453
+ raw_datasets_eval_features = list(raw_datasets[eval_split].features.keys())
1454
+ map_fn_eval = partial(
1455
+ raw_datasets[eval_split].map, function=prepare_eval_dataset, remove_columns=raw_datasets_eval_features
1456
+ )
1457
+ vectorized_datasets[eval_split] = (
1458
+ map_fn_eval(num_proc=num_workers, desc="preprocess eval dataset")
1459
+ if not data_args.streaming
1460
+ else map_fn_eval()
1461
+ )
1462
+
1463
+ # filter training data with inputs longer than max_input_length
1464
+ def is_audio_in_length_range(length):
1465
+ return min_input_length < length < max_input_length
1466
+
1467
+ filter_by_audio_fn = partial(
1468
+ vectorized_datasets.filter, function=is_audio_in_length_range, input_columns=["input_length"]
1469
+ )
1470
+ vectorized_datasets = (
1471
+ filter_by_audio_fn(num_proc=num_workers, desc="filtering train dataset by audio length")
1472
+ if not data_args.streaming
1473
+ else filter_by_audio_fn()
1474
+ )
1475
+
1476
+ # filter training data with labels longer than max_label_length
1477
+ def is_labels_in_length_range(labels):
1478
+ return 0 < len(labels) < max_label_length
1479
+
1480
+ filter_by_labels_fn = partial(
1481
+ vectorized_datasets.filter, function=is_labels_in_length_range, input_columns=["labels"]
1482
+ )
1483
+ vectorized_datasets = (
1484
+ filter_by_labels_fn(num_proc=num_workers, desc="filtering train dataset")
1485
+ if not data_args.streaming
1486
+ else filter_by_labels_fn()
1487
+ )
1488
+
1489
+ # for large datasets it is advised to run the preprocessing on a
1490
+ # single machine first with `args.preprocessing_only` since there will most likely
1491
+ # be a timeout when running the script in distributed mode.
1492
+ # In a second step `args.preprocessing_only` can then be set to `False` to load the
1493
+ # cached dataset
1494
+ if data_args.preprocessing_only:
1495
+ cache = {k: v.cache_files for k, v in vectorized_datasets.items()}
1496
+ logger.info(f"Data preprocessing finished. Files cached at {cache}.")
1497
+ return
1498
+
1499
+ # 8. Load Metric
1500
+ metric = evaluate.load("wer")
1501
+ # convention is that we space all punctuation *except* apostrophes
1502
+ all_punctuation = list(string.punctuation.replace("'", ""))
1503
+ return_timestamps = data_args.return_timestamps if data_args.timestamp_probability > 0 else False
1504
+
1505
+ def compute_metrics(preds, labels):
1506
+ # replace padded labels by the padding token
1507
+ for idx in range(len(labels)):
1508
+ labels[idx][labels[idx] == -100] = tokenizer.pad_token_id
1509
+
1510
+ pred_str = tokenizer.batch_decode(preds, skip_special_tokens=True, decode_with_timestamps=return_timestamps)
1511
+ # we do not want to group tokens when computing the metrics
1512
+ label_str = tokenizer.batch_decode(labels, skip_special_tokens=True)
1513
+
1514
+ # space punctuation for orthographic WER (c.f. ESB paper https://arxiv.org/abs/2210.13352)
1515
+ spaced_pred_str = [
1516
+ pred_str[i].replace(punctuation, f" {punctuation} ")
1517
+ for punctuation in all_punctuation
1518
+ for i in range(len(pred_str))
1519
+ ]
1520
+ spaced_label_str = [
1521
+ label_str[i].replace(punctuation, f" {punctuation} ")
1522
+ for punctuation in all_punctuation
1523
+ for i in range(len(label_str))
1524
+ ]
1525
+ wer_ortho = 100 * metric.compute(predictions=spaced_pred_str, references=spaced_label_str)
1526
+
1527
+ norm_pred_str, norm_label_str = [], []
1528
+
1529
+ # Iterate through all predictions and labels
1530
+ for pred, label in zip(pred_str, label_str):
1531
+ # Normalize the prediction and label
1532
+ normalized_pred = normalizer(pred)
1533
+ normalized_label = normalizer(label)
1534
+
1535
+ # If either normalized string is empty after normalization, replace with "<|nospeech|>"
1536
+ if not normalized_pred.strip():
1537
+ normalized_pred = "<|nospeech|>"
1538
+ if not normalized_label.strip():
1539
+ normalized_label = "<|nospeech|>"
1540
+
1541
+ norm_pred_str.append(normalized_pred)
1542
+ norm_label_str.append(normalized_label)
1543
+
1544
+ # Replace original strings with "<|nospeech|>" where necessary for consistency
1545
+ pred_str = [pred if len(pred.strip()) > 0 else "<|nospeech|>" for pred in pred_str]
1546
+ label_str = [label if len(label.strip()) > 0 else "<|nospeech|>" for label in label_str]
1547
+
1548
+ # Compute WER using all entries, including those replaced with "<|nospeech|>"
1549
+ wer = 100 * metric.compute(predictions=norm_pred_str, references=norm_label_str)
1550
+ return {"wer": wer, "wer_ortho": wer_ortho}, pred_str, label_str, norm_pred_str, norm_label_str
1551
+
1552
+
1553
+ # 9. Save feature extractor, tokenizer, config and generation config
1554
+ feature_extractor.save_pretrained(training_args.output_dir)
1555
+ tokenizer.save_pretrained(training_args.output_dir)
1556
+ config.save_pretrained(training_args.output_dir)
1557
+ student_model.generation_config.save_pretrained(
1558
+ training_args.output_dir
1559
+ ) # generation config stays bound to model to make it easy to jit
1560
+
1561
+ processor = WhisperProcessor.from_pretrained(training_args.output_dir)
1562
+
1563
+ data_collator = FlaxDataCollatorSpeechSeq2SeqWithPadding(
1564
+ processor=processor,
1565
+ decoder_start_token_id=student_model.config.decoder_start_token_id, # <|startoftranscript|>
1566
+ decoder_prev_token_id=tokenizer.all_special_ids[-3], # <|startofprev|>
1567
+ input_padding="longest",
1568
+ target_padding="max_length",
1569
+ max_target_length=max_label_length,
1570
+ )
1571
+
1572
+ # Initialize our training
1573
+ rng = jax.random.PRNGKey(training_args.seed)
1574
+ rng, dropout_rng = jax.random.split(rng)
1575
+
1576
+ # Store some constants
1577
+ train_batch_size = int(training_args.per_device_train_batch_size) * jax.device_count()
1578
+ gradient_accumulation_steps = int(training_args.gradient_accumulation_steps)
1579
+ per_device_eval_batch_size = int(training_args.per_device_eval_batch_size)
1580
+ eval_batch_size = per_device_eval_batch_size * jax.device_count()
1581
+
1582
+ if not data_args.streaming and training_args.max_steps < 0:
1583
+ num_epochs = int(training_args.num_train_epochs)
1584
+ steps_per_epoch = len(vectorized_datasets["train"]) // train_batch_size
1585
+ total_train_steps = steps_per_epoch * num_epochs
1586
+ elif training_args.max_steps > 0:
1587
+ logger.info("max_steps is given, it will override any value given in num_train_epochs")
1588
+ total_train_steps = int(training_args.max_steps)
1589
+ # Setting a very large number of epochs so we go as many times as necessary over the iterator.
1590
+ num_epochs = sys.maxsize
1591
+ steps_per_epoch = total_train_steps
1592
+ else:
1593
+ raise ValueError("max_steps must be specified when training with a streaming (iterable) dataset")
1594
+
1595
+ if training_args.eval_steps is None:
1596
+ logger.info(
1597
+ f"eval_steps is not set, evaluating at the end of {'each epoch' if not data_args.streaming else 'training'}"
1598
+ )
1599
+ eval_steps = steps_per_epoch
1600
+ else:
1601
+ eval_steps = training_args.eval_steps
1602
+
1603
+ # Create learning rate schedule
1604
+ linear_decay_lr_schedule_fn = create_learning_rate_fn(
1605
+ total_train_steps * gradient_accumulation_steps,
1606
+ training_args.lr_scheduler_type,
1607
+ training_args.warmup_steps * gradient_accumulation_steps,
1608
+ training_args.learning_rate,
1609
+ )
1610
+
1611
+ # We use Optax's "masking" functionality to not apply weight decay
1612
+ # to bias and LayerNorm scale parameters. decay_mask_fn returns a
1613
+ # mask boolean with the same structure as the parameters.
1614
+ # The mask is True for parameters that should be decayed.
1615
+ def decay_mask_fn(params):
1616
+ flat_params = traverse_util.flatten_dict(params)
1617
+ # find out all LayerNorm parameters
1618
+ layer_norm_candidates = [
1619
+ "layer_norm",
1620
+ "self_attn_layer_norm",
1621
+ "final_layer_norm",
1622
+ "encoder_attn_layer_norm",
1623
+ ]
1624
+ layer_norm_named_params = {
1625
+ layer[-2:]
1626
+ for layer_norm_name in layer_norm_candidates
1627
+ for layer in flat_params.keys()
1628
+ if layer_norm_name in "".join(layer).lower()
1629
+ }
1630
+ flat_mask = {path: path[-1] != "bias" and path[-2:] not in layer_norm_named_params for path in flat_params}
1631
+ return traverse_util.unflatten_dict(flat_mask)
1632
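+ # e.g. any parameter path ending in "bias", or in ("...layer_norm", "scale"), maps to False (no weight decay),
+ # while attention and feed-forward kernels map to True and receive weight decay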
+
1633
+ # create adam optimizer
1634
+ adamw = optax.adamw(
1635
+ learning_rate=linear_decay_lr_schedule_fn,
1636
+ b1=training_args.adam_beta1,
1637
+ b2=training_args.adam_beta2,
1638
+ eps=training_args.adam_epsilon,
1639
+ weight_decay=training_args.weight_decay,
1640
+ mask=decay_mask_fn,
1641
+ )
1642
+
1643
+ if gradient_accumulation_steps > 1:
1644
+ # accumulate gradients and apply once every k steps
1645
+ adamw = optax.MultiSteps(adamw, every_k_schedule=gradient_accumulation_steps)
1646
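+ # optax.MultiSteps accumulates k mini-batch gradients and only applies the inner adamw update on every k-th
+ # call, which is why the learning-rate schedule above is stretched by gradient_accumulation_steps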
+
1647
+ share_hidden_states = training_args.freeze_encoder and student_model.config.d_model == teacher_model.config.d_model
1648
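+ # when the encoder is frozen and the student/teacher hidden sizes match, the teacher's encoder output can be
+ # re-used directly by the student decoder (see train_step below), skipping the student encoder forward pass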
+ encoder_layer_mapping = get_layers_to_supervise(
1649
+ student_model.config.encoder_layers, teacher_model.config.encoder_layers
1650
+ )
1651
+ decoder_layer_mapping = get_layers_to_supervise(
1652
+ student_model.config.decoder_layers, teacher_model.config.decoder_layers
1653
+ )
1654
+
1655
+ # Setup train state
1656
+ student_state = TrainState.create(
1657
+ apply_fn=student_model.decode if share_hidden_states else student_model.__call__,
1658
+ params=student_params,
1659
+ tx=adamw,
1660
+ to_dtype=to_dtype,
1661
+ dropout_rng=dropout_rng,
1662
+ max_grad_norm=training_args.max_grad_norm,
1663
+ )
1664
+
1665
+ if training_args.resume_from_checkpoint is not None:
1666
+ if os.path.isfile(os.path.join(training_args.resume_from_checkpoint, "train_state.msgpack")):
1667
+ logger.info(
1668
+ f"Checkpoint detected, resuming training at {training_args.resume_from_checkpoint}. To avoid "
1669
+ "this behavior, omit the resume_from_checkpoint argument."
1670
+ )
1671
+ with Path(os.path.join(training_args.resume_from_checkpoint, "train_state.msgpack")).open("rb") as f:
1672
+ student_state = from_bytes(student_state, f.read())
1673
+ else:
1674
+ logger.warning(
1675
+ f"Checkpoint {training_args.resume_from_checkpoint} not detected, training from scratch. Ensure "
1676
+ f"you pass the path to a folder with a valid checkpoint for your model."
1677
+ )
1678
+
1679
+ def cross_entropy_loss(logits, labels):
1680
+ vocab_size = logits.shape[-1]
1681
+ # optax onehot always returns a float32 device array, need to downcast if performing mixed precision training
1682
+ onehot_targets = to_dtype(onehot(labels, vocab_size))
1683
+ loss = optax.softmax_cross_entropy(logits, onehot_targets)
1684
+ # mask out padded tokens (labels set to -100) so they do not contribute to the loss
1685
+ padding = labels >= 0
1686
+ loss = loss * padding
1687
+ loss = loss.sum()
1688
+ num_labels = padding.sum()
1689
+ return loss, num_labels
1690
+
1691
+ # temperature smoothed kl-divergence
1692
+ def kl_divergence(target_distribution, log_predicted_distribution, labels, eps=1e-20):
1693
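+ # per-token KL(teacher || student) = sum_v p_teacher * (log p_teacher - log p_student); eps guards against log(0)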
+ divergence = -target_distribution * (log_predicted_distribution - jnp.log(target_distribution + eps))
1694
+ # mask out padded tokens (labels set to -100) so they do not contribute to the divergence
1695
+ padding_mask = labels >= 0
1696
+ padding_mask = jnp.expand_dims(padding_mask, axis=-1)
1697
+ divergence = (divergence * padding_mask).sum()
1698
+ return to_dtype(divergence) # respect the dtype of the backprop
1699
+
1700
+ def mean_square_error_loss(student_outputs, teacher_outputs):
1701
+ mse = dtype(0.0)
1702
+
1703
+ # tie encoder embeddings
1704
+ mse += jnp.mean(
1705
+ jnp.square(teacher_outputs.encoder_hidden_states[0] - student_outputs.encoder_hidden_states[0])
1706
+ )
1707
+
1708
+ for student_layer_id, teacher_layer_id in encoder_layer_mapping.items():
1709
+ # offset the hidden-state layer ids by 1 to account for the extra embedding hidden-state
1710
+ student_hidden_state = student_outputs.encoder_hidden_states[student_layer_id + 1]
1711
+ teacher_hidden_state = teacher_outputs.encoder_hidden_states[teacher_layer_id + 1]
1712
+ mse += jnp.mean(jnp.square(teacher_hidden_state - student_hidden_state))
1713
+
1714
+ # student_attention = student_outputs.encoder_attentions[student_layer_id]
1715
+ # teacher_attention = teacher_outputs.encoder_attentions[teacher_layer_id]
1716
+ # mse += jnp.mean(jnp.square(student_attention - teacher_attention))
1717
+
1718
+ # tie decoder embeddings
1719
+ mse += jnp.mean(
1720
+ jnp.square(teacher_outputs.decoder_hidden_states[0] - student_outputs.decoder_hidden_states[0])
1721
+ )
1722
+
1723
+ for student_layer_id, teacher_layer_id in decoder_layer_mapping.items():
1724
+ # offset the hidden-state layer ids by 1 to account for the extra embedding hidden-state
1725
+ student_hidden_state = student_outputs.decoder_hidden_states[student_layer_id + 1]
1726
+ teacher_hidden_state = teacher_outputs.decoder_hidden_states[teacher_layer_id + 1]
1727
+ mse += jnp.mean(jnp.square(teacher_hidden_state - student_hidden_state))
1728
+
1729
+ # student_attention = student_outputs.decoder_attentions[student_layer_id]
1730
+ # teacher_attention = teacher_outputs.decoder_attentions[teacher_layer_id]
1731
+ # mse += jnp.mean(jnp.square(student_attention - teacher_attention))
1732
+
1733
+ # student_cross_attention = student_outputs.cross_attentions[student_layer_id]
1734
+ # teacher_cross_attention = teacher_outputs.cross_attentions[teacher_layer_id]
1735
+ # mse += jnp.mean(jnp.square(student_cross_attention - teacher_cross_attention))
1736
+
1737
+ return to_dtype(mse) # respect the dtype of the backprop
1738
+
1739
+ # Define gradient update step fn
1740
+ def train_step(
1741
+ student_state,
1742
+ teacher_params,
1743
+ batch,
1744
+ freeze_encoder,
1745
+ share_hidden_states,
1746
+ temperature=2.0,
1747
+ ):
1748
+ dropout_rng, new_dropout_rng = jax.random.split(student_state.dropout_rng)
1749
+
1750
+ def compute_loss(student_params):
1751
+ labels = batch.pop("labels")
1752
+ output_hidden_states = not share_hidden_states and training_args.mse_weight > 0.0
1753
+
1754
+ teacher_outputs = teacher_model(
1755
+ **batch,
1756
+ params=teacher_params,
1757
+ freeze_encoder=True,
1758
+ output_hidden_states=output_hidden_states,
1759
+ train=False,
1760
+ )
1761
+
1762
+ if share_hidden_states:
1763
+ # if the student and teacher share the same frozen encoder then we don't have to recompute the
1764
+ # encoder hidden-states for the student model, we can just re-use from the teacher
1765
+ encoder_hidden_states = jax.lax.stop_gradient(teacher_outputs.encoder_last_hidden_state)
1766
+ encoder_outputs = FlaxBaseModelOutput(last_hidden_state=encoder_hidden_states)
1767
+
1768
+ student_outputs = student_state.apply_fn(
1769
+ decoder_input_ids=batch["decoder_input_ids"],
1770
+ encoder_outputs=encoder_outputs,
1771
+ params=student_params,
1772
+ dropout_rng=dropout_rng,
1773
+ train=True,
1774
+ )
1775
+ else:
1776
+ # do the full forward pass for the student model (encoder + decoder)
1777
+ student_outputs = student_state.apply_fn(
1778
+ **batch,
1779
+ params=student_params,
1780
+ dropout_rng=dropout_rng,
1781
+ freeze_encoder=freeze_encoder,
1782
+ output_hidden_states=output_hidden_states,
1783
+ train=True,
1784
+ )
1785
+
1786
+ # CE (data) loss
1787
+ ce_loss, num_labels = cross_entropy_loss(student_outputs.logits, labels)
1788
+
1789
+ # rescale by temperature to ensure gradients scale correctly
1790
+ teacher_distribution = jax.nn.softmax(teacher_outputs.logits / temperature, axis=-1)
1791
+ # ensure no information flow backwards through teacher
1792
+ teacher_distribution = jax.lax.stop_gradient(teacher_distribution)
1793
+ # log softmax of student predictions for numerical stability
1794
+ student_distribution = jax.nn.log_softmax(student_outputs.logits / temperature, axis=-1)
1795
+ # KL-divergence loss (scaled by temperature)
1796
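+ # the temperature**2 factor follows Hinton et al. (2015): gradients through the softened logits scale as
+ # 1/T**2, so multiplying back keeps the soft-target term on the same scale as the hard-target CE loss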
+ kl_loss = kl_divergence(teacher_distribution, student_distribution, labels) * temperature**2
1797
+
1798
+ # MSE loss between enc-dec hidden-states and attentions
1799
+ mse_loss = (
1800
+ mean_square_error_loss(student_outputs, teacher_outputs)
1801
+ if output_hidden_states
1802
+ else jnp.zeros_like(kl_loss)
1803
+ )
1804
+
1805
+ # use DistilBart formulation - only tune the MSE weight and take remaining HPs from DistilBERT
1806
+ ce_weight = 0.8 if training_args.kl_weight > 0 else 1.0
1807
+ loss = ce_weight * ce_loss + training_args.kl_weight * kl_loss + training_args.mse_weight * mse_loss
1808
+
1809
+ return loss, (
1810
+ ce_loss,
1811
+ kl_loss,
1812
+ mse_loss,
1813
+ num_labels,
1814
+ )
1815
+
1816
+ grad_fn = jax.value_and_grad(compute_loss, has_aux=True)
1817
+ (loss, (ce_loss, kl_loss, mse_loss, num_labels)), grad = grad_fn(to_dtype(student_state.params))
1818
+
1819
+ # true loss = total loss / total samples
1820
+ loss = jax.lax.psum(loss, "batch")
1821
+ num_labels = jax.lax.psum(num_labels, "batch")
1822
+ loss = jax.tree_util.tree_map(lambda x: x / num_labels, loss)
1823
+
1824
+ # true grad = total grad / total samples
1825
+ grad = jax.lax.psum(grad, "batch")
1826
+ grad = jax.tree_util.tree_map(lambda x: x / num_labels, grad)
1827
+ new_state = student_state.apply_gradients(grads=grad, dropout_rng=new_dropout_rng, to_dtype=to_dtype)
1828
+
1829
+ # CE/KL/MSE losses for logging
1830
+ ce_loss = jax.lax.psum(ce_loss, "batch")
1831
+ ce_loss = jax.tree_util.tree_map(lambda x: x / num_labels, ce_loss)
1832
+
1833
+ kl_loss = jax.lax.psum(kl_loss, "batch")
1834
+ kl_loss = jax.tree_util.tree_map(lambda x: x / num_labels, kl_loss)
1835
+
1836
+ mse_loss = jax.lax.psum(mse_loss, "batch")
1837
+ mse_loss = jax.tree_util.tree_map(lambda x: x / num_labels, mse_loss)
1838
+
1839
+ metrics = {
1840
+ "loss": loss,
1841
+ "learning_rate": linear_decay_lr_schedule_fn(student_state.step),
1842
+ "ce_loss": ce_loss,
1843
+ "kl_loss": kl_loss,
1844
+ "mse_loss": mse_loss,
1845
+ }
1846
+ return new_state, metrics
1847
+
1848
+ # Define eval fn
1849
+ def eval_step(student_params, teacher_params, batch):
1850
+ labels = batch.pop("labels")
1851
+ output_hidden_states = not share_hidden_states and training_args.mse_weight > 0
1852
+
1853
+ student_outputs = student_model(
1854
+ **batch,
1855
+ params=student_params,
1856
+ output_hidden_states=output_hidden_states,
1857
+ train=False,
1858
+ )
1859
+ student_distribution = jax.nn.log_softmax(student_outputs.logits, axis=-1)
1860
+ ce_loss, num_labels = cross_entropy_loss(student_outputs.logits, labels)
1861
+
1862
+ teacher_outputs = teacher_model(
1863
+ **batch,
1864
+ params=teacher_params,
1865
+ output_hidden_states=output_hidden_states,
1866
+ train=False,
1867
+ )
1868
+ teacher_distribution = jax.nn.softmax(teacher_outputs.logits, axis=-1)
1869
+ # temperature is always 1 for eval
1870
+ kl_loss = kl_divergence(teacher_distribution, student_distribution, labels)
1871
+
1872
+ mse_loss = (
1873
+ mean_square_error_loss(student_outputs, teacher_outputs)
1874
+ if output_hidden_states
1875
+ else jnp.zeros_like(kl_loss)
1876
+ )
1877
+
1878
+ ce_weight = 0.8 if training_args.kl_weight > 0 else 1.0
1879
+ loss = ce_weight * ce_loss + training_args.kl_weight * kl_loss + training_args.mse_weight * mse_loss
1880
+ # true loss = total loss / total samples
1881
+ loss = jax.lax.psum(loss, "batch")
1882
+ num_labels = jax.lax.psum(num_labels, "batch")
1883
+ loss = jax.tree_util.tree_map(lambda x: x / num_labels, loss)
1884
+
1885
+ # CE/KL/MSE losses for logging
1886
+ ce_loss = jax.lax.psum(ce_loss, "batch")
1887
+ ce_loss = jax.tree_util.tree_map(lambda x: x / num_labels, ce_loss)
1888
+
1889
+ kl_loss = jax.lax.psum(kl_loss, "batch")
1890
+ kl_loss = jax.tree_util.tree_map(lambda x: x / num_labels, kl_loss)
1891
+
1892
+ mse_loss = jax.lax.psum(mse_loss, "batch")
1893
+ mse_loss = jax.tree_util.tree_map(lambda x: x / num_labels, mse_loss)
1894
+
1895
+ metrics = {"loss": loss, "ce_loss": ce_loss, "kl_loss": kl_loss, "mse_loss": mse_loss}
1896
+ return metrics
1897
+
1898
+ # Define generation function
1899
+ num_beams = (
1900
+ training_args.generation_num_beams
1901
+ if training_args.generation_num_beams is not None
1902
+ else student_model.config.num_beams
1903
+ )
1904
+
1905
+ # forcing the language and task tokens helps the model in its generations
1906
+ gen_kwargs = {
1907
+ "max_length": max_label_length,
1908
+ "num_beams": num_beams,
1909
+ "language": "<|en|>",
1910
+ "task": "transcribe",
1911
+ "return_timestamps": return_timestamps,
1912
+ }
1913
+
1914
+ def generate_step(student_params, batch):
1915
+ output_ids = student_model.generate(
1916
+ batch[model_input_name],
1917
+ attention_mask=batch.get("attention_mask"),
1918
+ params=student_params,
1919
+ **gen_kwargs,
1920
+ )
1921
+ return output_ids.sequences
1922
+
1923
+ # Replicate the train state on each device
1924
+ student_state = student_state.replicate()
1925
+
1926
+ # Replicate the teacher params on each device
1927
+ teacher_params = jax_utils.replicate(teacher_params)
1928
+
1929
+ # Create parallel version of the train and eval step
1930
+ p_train_step = jax.pmap(
1931
+ train_step,
1932
+ "batch",
1933
+ in_axes=(0, 0, 0, None, None, None),
1934
+ donate_argnums=(0,),
1935
+ static_broadcasted_argnums=(
1936
+ 3,
1937
+ 4,
1938
+ ),
1939
+ )
1940
+ p_eval_step = jax.pmap(eval_step, "batch")
1941
+ p_generate_step = jax.pmap(generate_step, "batch")
1942
+
1943
+ logger.info("***** Running training *****")
1944
+ logger.info(f" Num examples = {total_train_steps * train_batch_size * gradient_accumulation_steps}")
1945
+ logger.info(" Instantaneous batch size per device =" f" {training_args.per_device_train_batch_size}")
1946
+ logger.info(" Gradient accumulation steps =" f" {gradient_accumulation_steps}")
1947
+ logger.info(
1948
+ f" Total train batch size (w. parallel & distributed) = {train_batch_size * gradient_accumulation_steps}"
1949
+ )
1950
+ logger.info(f" Total optimization steps = {total_train_steps}")
1951
+
1952
+ # ======================== Training ================================
1953
+ train_time = 0
1954
+ train_start = time.time()
1955
+ train_metrics = []
1956
+ batches_to_skip = jax.device_get(unreplicate(student_state.step))
1957
+ cur_step = int(batches_to_skip) # will be zero if starting from scratch
1958
+ epochs_trained = batches_to_skip // steps_per_epoch
1959
+ steps_trained_progress_bar = tqdm(range(total_train_steps), desc="Train steps ... ", position=0)
1960
+ steps_trained_progress_bar.update(batches_to_skip)
1961
+ continue_training = True
1962
+ minibatch_steps = 0
1963
+
1964
+ if batches_to_skip > 0:
1965
+ logger.info(" Continuing training from checkpoint, will skip to saved global_step")
1966
+ logger.info(f" Continuing training from epoch {epochs_trained}")
1967
+ logger.info(f" Continuing training from global step {batches_to_skip}")
1968
+
1969
+ # Generate a training data loader by shuffling sampling indices from the train dataset
1970
+ train_loader = get_data_loader(
1971
+ training_args.seed,
1972
+ vectorized_datasets["train"],
1973
+ batch_size=train_batch_size,
1974
+ data_collator=data_collator,
1975
+ dataloader_num_workers=dataloader_num_workers,
1976
+ skip_batches=batches_to_skip,
1977
+ prefetch_size=dataloader_prefetch_size,
1978
+ )
1979
+
1980
+ for epoch in range(epochs_trained, num_epochs):
1981
+ if hasattr(train_loader, "dataset") and isinstance(train_loader.dataset, IterableDataset):
1982
+ train_loader.dataset.set_epoch(epoch)
1983
+
1984
+ for batch in train_loader:
1985
+ minibatch_steps += 1
1986
+ update_step = minibatch_steps == gradient_accumulation_steps
1987
+
1988
+ if update_step:
1989
+ steps_trained_progress_bar.update(1)
1990
+ cur_step += 1
1991
+ minibatch_steps = 0
1992
+
1993
+ batch = shard(batch.data)
1994
+ student_state, train_metric = p_train_step(
1995
+ student_state,
1996
+ teacher_params,
1997
+ batch,
1998
+ training_args.freeze_encoder,
1999
+ share_hidden_states,
2000
+ training_args.temperature,
2001
+ )
2002
+
2003
+ if cur_step % training_args.logging_steps == 0 and update_step:
2004
+ train_metrics.append(train_metric)
2005
+ train_metric_to_write = unreplicate(train_metric)
2006
+ steps_trained_progress_bar.write(
2007
+ f"Step... ({cur_step} / {total_train_steps} | Loss:"
2008
+ f" {train_metric_to_write['loss']}, Learning Rate:"
2009
+ f" {train_metric_to_write['learning_rate']})"
2010
+ )
2011
+ if has_wandb and jax.process_index() == 0:
2012
+ write_wandb_metric(
2013
+ wandb_logger,
2014
+ train_metric_to_write,
2015
+ train_time + time.time() - train_start,
2016
+ cur_step,
2017
+ epoch,
2018
+ prefix="train",
2019
+ )
2020
+
2021
+ # save checkpoint and weights after each save_steps and at the end of training
2022
+ if (cur_step % training_args.save_steps == 0 and update_step) or cur_step == total_train_steps:
2023
+ if jax.process_index() == 0:
2024
+ save_hf_weights(
2025
+ student_state,
2026
+ student_model,
2027
+ processor,
2028
+ training_args.output_dir,
2029
+ cur_step,
2030
+ total_train_steps,
2031
+ use_scan=training_args.use_scan,
2032
+ )
2033
+ if training_args.save_train_state:
2034
+ student_state.save_state(
2035
+ training_args.output_dir, save_total_limit=training_args.save_total_limit
2036
+ )
2037
+ if training_args.push_to_hub:
2038
+ repo.push_to_hub(
2039
+ commit_message=f"Saving train state of step {cur_step}",
2040
+ blocking=False,
2041
+ )
2042
+
2043
+ if training_args.do_eval and (
2044
+ (cur_step % eval_steps == 0 and update_step) or cur_step == total_train_steps
2045
+ ):
2046
+ train_time += time.time() - train_start
2047
+ # ======================== Evaluating ==============================
2048
+ for eval_split in all_eval_splits:
2049
+ eval_metrics = []
2050
+ eval_preds = []
2051
+ eval_labels = []
2052
+ eval_start = time.time()
2053
+
2054
+ eval_loader = get_data_loader(
2055
+ training_args.seed,
2056
+ vectorized_datasets[eval_split],
2057
+ batch_size=eval_batch_size,
2058
+ data_collator=data_collator,
2059
+ shuffle=False,
2060
+ drop_last=False,
2061
+ dataloader_num_workers=dataloader_num_workers,
2062
+ )
2063
+ for batch in tqdm(eval_loader, desc=f"Evaluating {eval_split}...", position=2):
2064
+ # Model forward
2065
+ labels = batch["labels"]
2066
+
2067
+ metrics = pad_shard_unpad(
2068
+ p_eval_step,
2069
+ static_argnums=(
2070
+ 0,
2071
+ 1,
2072
+ ),
2073
+ static_return=True,
2074
+ )(
2075
+ student_state.params,
2076
+ teacher_params,
2077
+ batch.data,
2078
+ min_device_batch=per_device_eval_batch_size,
2079
+ )
2080
+ eval_metrics.append(metrics)
2081
+
2082
+ # generation
2083
+ if training_args.predict_with_generate:
2084
+ generated_ids = pad_shard_unpad(p_generate_step)(
2085
+ student_state.params, batch.data, min_device_batch=per_device_eval_batch_size
2086
+ )
2087
+ eval_preds.extend(jax.device_get(generated_ids.reshape(-1, gen_kwargs["max_length"])))
2088
+ eval_labels.extend(labels)
2089
+
2090
+ eval_time = time.time() - eval_start
2091
+
2092
+ # normalize eval metrics
2093
+ eval_metrics = get_metrics(eval_metrics)
2094
+ eval_metrics = jax.tree_util.tree_map(jnp.mean, eval_metrics)
2095
+
2096
+ # compute WER metric
2097
+ wer_desc = ""
2098
+ if training_args.predict_with_generate:
2099
+ wer_metric, pred_str, label_str, norm_pred_str, norm_label_str = compute_metrics(
2100
+ eval_preds, eval_labels
2101
+ )
2102
+ eval_metrics.update(wer_metric)
2103
+ wer_desc = " ".join([f"Eval {key}: {value} |" for key, value in wer_metric.items()])
2104
+
2105
+ # Print metrics and update progress bar
2106
+ steps_trained_progress_bar.write(
2107
+ f"Eval results for step ({cur_step} / {total_train_steps} | Eval Loss: {eval_metrics['loss']} |"
2108
+ f" {wer_desc})"
2109
+ )
2110
+
2111
+ if has_tensorboard and jax.process_index() == 0:
2112
+ write_eval_metric(
2113
+ summary_writer,
2114
+ eval_metrics,
2115
+ cur_step,
2116
+ prefix=eval_split,
2117
+ )
2118
+
2119
+ if has_wandb and jax.process_index() == 0:
2120
+ write_wandb_metric(wandb_logger, eval_metrics, eval_time, cur_step, epoch, prefix=eval_split)
2121
+ if training_args.predict_with_generate:
2122
+ write_wandb_pred(
2123
+ wandb_logger,
2124
+ pred_str,
2125
+ label_str,
2126
+ norm_pred_str,
2127
+ norm_label_str,
2128
+ cur_step,
2129
+ prefix=eval_split,
2130
+ )
2131
+
2132
+ if has_tensorboard and jax.process_index() == 0:
2133
+ # we only write the accumulated train metrics to tensorboard every eval_steps steps
2134
+ write_train_metric(
2135
+ summary_writer,
2136
+ train_metrics,
2137
+ train_time,
2138
+ cur_step,
2139
+ training_args.logging_steps,
2140
+ )
2141
+
2142
+ # flush the train metrics
2143
+ train_start = time.time()
2144
+ train_metrics = []
2145
+
2146
+ # break condition
2147
+ if cur_step == total_train_steps:
2148
+ continue_training = False
2149
+ break
2150
+
2151
+ if not continue_training:
2152
+ break
2153
+
2154
+
2155
+ if __name__ == "__main__":
2156
+ main()
run_distillation_debug.py ADDED
@@ -0,0 +1,2162 @@
1
+ #!/usr/bin/env python
2
+ # coding=utf-8
3
+ # Copyright 2023 The HuggingFace Inc. team. All rights reserved.
4
+ #
5
+ # Licensed under the Apache License, Version 2.0 (the "License");
6
+ # you may not use this file except in compliance with the License.
7
+ # You may obtain a copy of the License at
8
+ #
9
+ # http://www.apache.org/licenses/LICENSE-2.0
10
+ #
11
+ # Unless required by applicable law or agreed to in writing, software
12
+ # distributed under the License is distributed on an "AS IS" BASIS,
13
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
14
+ # See the License for the specific language governing permissions and
15
+ # limitations under the License.
16
+ """
17
+ Training the Whisper model for sequence to sequence speech recognition via teacher-student distillation.
18
+ """
19
+ # You can also adapt this script for your own distillation tasks. Pointers for this are left as comments.
20
+
21
+ import logging
22
+ import os
23
+ import re
24
+ import shutil
25
+ import string
26
+ import sys
27
+ import time
28
+ from dataclasses import dataclass, field
29
+ from functools import partial
30
+ from pathlib import Path
31
+ from typing import Any, Callable, Dict, List, Optional, Union
32
+
33
+ import datasets
34
+ import evaluate
35
+ import flax
36
+ import jax
37
+ import jax.numpy as jnp
38
+ import numpy as np
39
+ import optax
40
+ import torch
41
+ import transformers
42
+ from datasets import (
43
+ DatasetDict,
44
+ IterableDataset,
45
+ IterableDatasetDict,
46
+ concatenate_datasets,
47
+ interleave_datasets,
48
+ load_dataset,
49
+ )
50
+ from flax import jax_utils, traverse_util
51
+ from flax.jax_utils import pad_shard_unpad, unreplicate
52
+ from flax.serialization import from_bytes, to_bytes
53
+ from flax.training import train_state
54
+ from flax.training.common_utils import get_metrics, onehot, shard, shard_prng_key
55
+ from huggingface_hub import Repository, create_repo
56
+ from jax.experimental.compilation_cache import compilation_cache as cc
57
+ from optax._src import linear_algebra
58
+ from torch.utils.data import DataLoader
59
+ from torchdata.datapipes.iter import IterableWrapper
60
+ from tqdm import tqdm
61
+ from transformers import (
62
+ AddedToken,
63
+ HfArgumentParser,
64
+ Seq2SeqTrainingArguments,
65
+ WhisperConfig,
66
+ WhisperFeatureExtractor,
67
+ WhisperProcessor,
68
+ WhisperTokenizerFast,
69
+ is_tensorboard_available,
70
+ is_wandb_available,
71
+ set_seed,
72
+ )
73
+ from transformers.file_utils import get_full_repo_name
74
+ from transformers.modeling_flax_outputs import FlaxBaseModelOutput
75
+ from transformers.models.whisper.english_normalizer import BasicTextNormalizer, EnglishTextNormalizer
76
+ from transformers.utils import check_min_version, send_example_telemetry
77
+ from transformers.utils.versions import require_version
78
+
79
+ from distil_whisper import FlaxWhisperForConditionalGeneration
80
+
81
+
82
+ # Will error if the minimal version of Transformers is not installed. Remove at your own risks.
83
+ check_min_version("4.27.0.dev0")
84
+
85
+ require_version(
86
+ "datasets>=1.18.0",
87
+ "To fix: pip install -r examples/flax/speech-recogintion/requirements.txt",
88
+ )
89
+
90
+ logger = logging.getLogger(__name__)
91
+
92
+
93
+ @flax.struct.dataclass
94
+ class ModelArguments:
95
+ """
96
+ Arguments pertaining to which model/config/tokenizer we are going to fine-tune from.
97
+ """
98
+
99
+ model_name_or_path: str = field(
100
+ metadata={"help": ("Path to pretrained student model or model identifier from huggingface.co/models")}
101
+ )
102
+ teacher_model_name_or_path: str = field(
103
+ metadata={"help": ("Path to pretrained teacher model or model identifier from huggingface.co/models")}
104
+ )
105
+ config_name: Optional[str] = field(
106
+ default=None,
107
+ metadata={"help": "Pretrained config name or path if not the same as model_name"},
108
+ )
109
+ tokenizer_name: Optional[str] = field(
110
+ default=None,
111
+ metadata={"help": "Pretrained tokenizer name or path if not the same as model_name"},
112
+ )
113
+ feature_extractor_name: Optional[str] = field(
114
+ default=None,
115
+ metadata={"help": "feature extractor name or path if not the same as model_name"},
116
+ )
117
+ cache_dir: Optional[str] = field(
118
+ default=None,
119
+ metadata={"help": ("Where to store the pretrained models downloaded from huggingface.co")},
120
+ )
121
+ use_fast_tokenizer: bool = field(
122
+ default=True,
123
+ metadata={"help": ("Whether to use one of the fast tokenizer (backed by the tokenizers library) or not.")},
124
+ )
125
+ model_revision: str = field(
126
+ default="main",
127
+ metadata={"help": ("The specific model version to use (can be a branch name, tag name or commit id).")},
128
+ )
129
+ subfolder: str = field(
130
+ default="",
131
+ metadata={
132
+ "help": "In case the relevant files are located inside a subfolder of the model repo on huggingface.co, you can"
133
+ "specify the folder name here."
134
+ },
135
+ )
136
+ use_auth_token: bool = field(
137
+ default=False,
138
+ metadata={
139
+ "help": (
140
+ "Will use the token generated when running `transformers-cli login`"
141
+ " (necessary to use this script with private models)."
142
+ )
143
+ },
144
+ )
145
+ dtype: Optional[str] = field(
146
+ default="float32",
147
+ metadata={
148
+ "help": (
149
+ "Floating-point format in which the model weights should be initialized"
150
+ " and trained. Choose one of `[float32, float16, bfloat16]`."
151
+ )
152
+ },
153
+ )
154
+ load_with_scan_weights: bool = field(
155
+ default=False,
156
+ metadata={
157
+ "help": "Whether the pre-trained checkpoint has its weights stored in scan format. Set to True for scanned "
158
+ "weights, defaults to False for non-scan (unrolled) weights."
159
+ },
160
+ )
161
+ activation_dropout: float = field(
162
+ default=0.0,
163
+ metadata={"help": "The dropout ratio for activations inside the fully connected layer."},
164
+ )
165
+ attention_dropout: float = field(
166
+ default=0.0,
167
+ metadata={"help": "The dropout ratio for the attention probabilities."},
168
+ )
169
+ dropout: float = field(
170
+ default=0.0,
171
+ metadata={
172
+ "help": "The dropout probability for all fully connected layers in the embeddings, encoder, and pooler."
173
+ },
174
+ )
175
+
176
+
177
+ @flax.struct.dataclass
178
+ class DataTrainingArguments:
179
+ """
180
+ Arguments pertaining to what data we are going to input our model for training and eval.
181
+ """
182
+
183
+ train_dataset_name: str = field(
184
+ default=None,
185
+ metadata={
186
+ "help": "The name of the training dataset to use (via the datasets library). Load and combine "
187
+ "multiple datasets by separating dataset ids by a '+' symbol. For example, to load and combine "
188
+ " librispeech and common voice, set `train_dataset_name='librispeech_asr+common_voice'`."
189
+ },
190
+ )
191
+ train_dataset_config_name: Optional[str] = field(
192
+ default=None,
193
+ metadata={
194
+ "help": "The configuration name of the training dataset to use (via the datasets library). Load and combine "
195
+ "multiple datasets by separating dataset configs by a '+' symbol."
196
+ },
197
+ )
198
+ train_dataset_samples: str = field(
199
+ default=None,
200
+ metadata={
201
+ "help": "Number of samples in the training data. Load and combine "
202
+ "multiple datasets by separating dataset samples by a '+' symbol."
203
+ },
204
+ )
205
+ eval_dataset_name: str = field(
206
+ default=None,
207
+ metadata={
208
+ "help": "The name of the evaluation dataset to use (via the datasets library). Defaults to the training dataset name if unspecified."
209
+ },
210
+ )
211
+ eval_dataset_config_name: Optional[str] = field(
212
+ default=None,
213
+ metadata={
214
+ "help": "The configuration name of the evaluation dataset to use (via the datasets library). Defaults to the training dataset config name if unspecified"
215
+ },
216
+ )
217
+ dataset_cache_dir: Optional[str] = field(
218
+ default=None,
219
+ metadata={"help": "Path to cache directory for saving and loading datasets"},
220
+ )
221
+ overwrite_cache: bool = field(
222
+ default=False,
223
+ metadata={"help": "Overwrite the cached training and evaluation sets"},
224
+ )
225
+ preprocessing_num_workers: Optional[int] = field(
226
+ default=None,
227
+ metadata={"help": "The number of processes to use for the preprocessing."},
228
+ )
229
+ max_train_samples: Optional[int] = field(
230
+ default=None,
231
+ metadata={
232
+ "help": (
233
+ "For debugging purposes or quicker training, truncate the number of"
234
+ " training examples to this value if set."
235
+ )
236
+ },
237
+ )
238
+ max_eval_samples: Optional[int] = field(
239
+ default=None,
240
+ metadata={
241
+ "help": (
242
+ "For debugging purposes or quicker training, truncate the number of"
243
+ " evaluation examples to this value if set."
244
+ )
245
+ },
246
+ )
247
+ audio_column_name: str = field(
248
+ default="audio",
249
+ metadata={"help": ("The name of the dataset column containing the audio data. Defaults to 'audio'")},
250
+ )
251
+ train_text_column_name: str = field(
252
+ default="whisper_transcript",
253
+ metadata={
254
+ "help": (
255
+ "The name of the dataset column containing the text data. Defaults to"
256
+ " 'whisper_transcript'which is the pseudo-labelled Whisper"
257
+ " transcription data."
258
+ )
259
+ },
260
+ )
261
+ eval_text_column_name: str = field(
262
+ default="text",
263
+ metadata={
264
+ "help": (
265
+ "The name of the dataset column containing the text data. Defaults to"
266
+ " 'text', which is the original text data"
267
+ )
268
+ },
269
+ )
270
+ max_duration_in_seconds: float = field(
271
+ default=30.0,
272
+ metadata={"help": ("Filter audio files that are longer than `max_duration_in_seconds` seconds")},
273
+ )
274
+ min_duration_in_seconds: float = field(
275
+ default=0.0,
276
+ metadata={"help": ("Filter audio files that are shorter than `min_duration_in_seconds` seconds")},
277
+ )
278
+ max_label_length: int = field(
279
+ default=128,
280
+ metadata={"help": "Truncate transcriptions that are longer `max_label_length` tokens."},
281
+ )
282
+ pad_target_to_multiple_of: Optional[int] = field(
283
+ default=None,
284
+ metadata={
285
+ "help": (
286
+ "If set will pad the target sequence to a multiple of the provided"
287
+ " value. This is important to avoid triggering recompilations on TPU."
288
+ " If unspecified, will default to padding the targets to max length."
289
+ )
290
+ },
291
+ )
292
+ preprocessing_only: bool = field(
293
+ default=False,
294
+ metadata={
295
+ "help": (
296
+ "Whether to only do data preprocessing and skip training. This is"
297
+ " especially useful when data preprocessing errors out in distributed"
298
+ " training due to timeout. In this case, one should run the"
299
+ " preprocessing in a non-distributed setup with"
300
+ " `preprocessing_only=True` so that the cached datasets can"
301
+ " consequently be loaded in distributed training"
302
+ )
303
+ },
304
+ )
305
+ train_split_name: str = field(
306
+ default="train",
307
+ metadata={
308
+ "help": ("The name of the training data set split to use (via the datasets library). Defaults to 'train'")
309
+ },
310
+ )
311
+ eval_split_name: str = field(
312
+ default="validation",
313
+ metadata={
314
+ "help": (
315
+ "The name of the evaluation data set split to use (via the datasets"
316
+ " library). Defaults to 'validation'"
317
+ )
318
+ },
319
+ )
320
+ wandb_project: str = field(
321
+ default="distil-whisper",
322
+ metadata={"help": "The name of the wandb project."},
323
+ )
324
+ wandb_name: str = field(
325
+ default=None,
326
+ metadata={"help": "The name of the wandb run."},
327
+ )
328
+ wandb_job_type: str = field(
329
+ default="distil-whisper",
330
+ metadata={"help": "The name of the wandb job type."},
331
+ )
332
+ wandb_dir: str = field(
333
+ default=None,
334
+ metadata={"help": "The absolute path to save the wandb logs."},
335
+ )
336
+ save_code_to_wandb: bool = field(
337
+ default=False,
338
+ metadata={
339
+ "help": (
340
+ "Whether to save main script to wandb. This is valuable for improving"
341
+ " experiment reproducibility and to diff code across experiments in"
342
+ " the UI."
343
+ )
344
+ },
345
+ )
346
+ streaming: bool = field(
347
+ default=True,
348
+ metadata={"help": "Whether to use Datasets' streaming mode to load and the data."},
349
+ )
350
+ wer_threshold: float = field(
351
+ default=None,
352
+ metadata={
353
+ "help": "Filter training data with Whisper transcriptions that have greater than `wer_threshold` "
354
+ "WER with the normalised transcriptions."
355
+ },
356
+ )
357
+ prefetch_size: int = field(
358
+ default=0,
359
+ metadata={"help": "Number of samples to pre-fetch if using an iterable dataset."},
360
+ )
361
+ timestamp_probability: float = field(
362
+ default=0.5, metadata={"help": "Probability for training on timestamped tokens if the data contains it."}
363
+ )
364
+ return_timestamps: bool = field(
365
+ default=False, metadata={"help": "Whether or not to predict timestamps in the generation step."}
366
+ )
367
+ round_timestamps: bool = field(
368
+ default=False,
369
+ metadata={
370
+ "help": "Whether or not to round the timestamp tokens to the nearest tenth of a second."
371
+ "By default, Whisper predicts timestamps to the nearest hundredth of a second."
372
+ "Reducing the timestamp precision to one tenth of a second simplifies the timestamp"
373
+ "prediction task, at the expense of timestamp granularity."
374
+ },
375
+ )
376
+
377
+
378
+ @dataclass
379
+ class FlaxSeq2SeqTrainingArguments(Seq2SeqTrainingArguments):
380
+ use_scan: Optional[bool] = field(
381
+ default=True,
382
+ metadata={
383
+ "help": (
384
+ "Whether or not to use `scan_with_axes` over the encoder and decoder blocks. Using scan results "
385
+ "in faster compile times and more efficient memory use during training, since all of the layers "
386
+ "in the encoder/decoder are stacked, and we perform a lax.scan over the stacked block to index "
387
+ "each layer. However, it results in slower inference time due to the overhead of stacking the "
388
+ "layers this way. Thus, we **always** default to disabling scan for the inference step."
389
+ )
390
+ },
391
+ )
392
+ freeze_encoder: Optional[bool] = field(
393
+ default=False,
394
+ metadata={
395
+ "help": (
396
+ "Whether to freeze the entire encoder model. Only recommended when the entire encoder has been "
397
+ "copied from the teacher model."
398
+ )
399
+ },
400
+ )
401
+ temperature: Optional[float] = field(
402
+ default=2.0, metadata={"help": "Temperature to anneal the logits when computing the softmax."}
403
+ )
404
+ kl_weight: Optional[float] = field(
405
+ default=1.0,
406
+ metadata={
407
+ "help": (
408
+ "Weighting assigned to the MSE loss in the KD formulation. MSE loss is "
409
+ "computed between the teacher-student hidden states and attentions."
410
+ )
411
+ },
412
+ )
413
+ mse_weight: Optional[float] = field(
414
+ default=0.0,
415
+ metadata={
416
+ "help": (
417
+ "Weighting assigned to the MSE loss in the KD formulation. MSE loss is "
418
+ "computed between the teacher-student hidden states and attentions."
419
+ )
420
+ },
421
+ )
422
+ precision: Optional[str] = field(
423
+ default="half_mixed",
424
+ metadata={
425
+ "help": (
426
+ "Precision with which run training, Can be one of `full`, `half_mixed` or `full_mixed`, the latter two"
427
+ "of which enable *mixed-precision* training. **Note that this only specifies the dtype of the computation "
428
+ "and optimizer state. It does not influence the dtype of model parameters.** An explanation of the three "
429
+ "settings is provided below:"
430
+ " 1. Full precision: forward pass, backward pass and optimiser states all in float32."
431
+ " 2. Half mixed precision: forward pass in bfloat16, backward pass and optimiser states in float32. This "
432
+ " corresponds to setting the dtype argument to bfloat16 when instantiating the model."
433
+ " 3. Full mixed precision: forward pass, backward pass and optimiser states all in bfloat16. The dtype "
434
+ " argument is set to bfloat16 for the forward pass, and the gradients computed with respect to the bfloat16 "
435
+ " parameters in the backward pass (giving bfloat16 gradients). The new optimiser states and parameter "
436
+ " updates are computed in float32 by upcasting the bfloat16 gradients and optimiser states to float32 "
437
+ " prior to the optimiser update step. The optimiser states are returned in float32 (but not saved to "
438
+ " memory) and then downcasted to bfloat16 (saved to memory) for the subsequent train step."
439
+ "For further details, refer to https://github.com/deepmind/optax/discussions/336"
440
+ )
441
+ },
442
+ )
443
+ compilation_cache: Optional[bool] = field(
444
+ default=False,
445
+ metadata={
446
+ "help": (
447
+ "Whether to enable the JAX (experimental) compilation cache. The compilation step is *cached* the "
448
+ "first time it is run. Successive compilation steps for the same function utilise the cache to reduce"
449
+ "the compilation time."
450
+ )
451
+ },
452
+ )
453
+ save_train_state: Optional[bool] = field(
454
+ default=False,
455
+ metadata={
456
+ "help": "Whether or not to save the Flax Train State on each `save_steps` steps. Required if you intend"
457
+ "to resume training from partial training runs. If False, only the model weights will be saved."
458
+ "If True, both the model weights and Flax Train state will be saved."
459
+ },
460
+ )
461
+
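+ # Rough sketch of how the arguments above typically enter the distillation objective (generic KD
+ # formulation; the exact loss is defined in the train step further down this script): the KL term
+ # compares softmax(teacher_logits / temperature) against softmax(student_logits / temperature), and
+ # the total loss is a weighted sum of the cross-entropy, KL and MSE terms, with `kl_weight` and
+ # `mse_weight` as the respective coefficients.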
462
+
463
+ def shift_tokens_right(label_ids: np.array, decoder_start_token_id: int) -> np.ndarray:
464
+ """
465
+ Shift label ids one token to the right.
466
+ """
467
+ shifted_label_ids = np.zeros_like(label_ids)
468
+ shifted_label_ids[:, 1:] = label_ids[:, :-1]
469
+ shifted_label_ids[:, 0] = decoder_start_token_id
470
+
471
+ return shifted_label_ids
472
+
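+ # Minimal worked sketch (hypothetical token ids, not used by the script): with decoder_start_token_id=50258,
+ # shift_tokens_right(np.array([[50359, 50363, 2221]]), 50258) returns [[50258, 50359, 50363]], i.e. each
+ # label sequence is prepended with the start token and truncated by one position on the right.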
473
+
474
+ @flax.struct.dataclass
475
+ class FlaxDataCollatorSpeechSeq2SeqWithPadding:
476
+ """
477
+ Data collator that will dynamically pad the inputs received.
478
+ Args:
479
+ processor ([`WhisperProcessor`])
480
+ The processor used for processing the data.
481
+ decoder_start_token_id (:obj: `int`)
482
+ The start-of-sequence token id of the decoder.
483
+ decoder_prev_token_id (:obj: `int`)
484
+ The start-of-prompt token id of the decoder
485
+ input_padding (:obj:`bool`, :obj:`str` or :class:`~transformers.tokenization_utils_base.PaddingStrategy`, `optional`, defaults to :obj:`True`):
486
+ Select a strategy to pad the returned input sequences (according to the model's padding side and padding index)
487
+ among:
488
+ * :obj:`True` or :obj:`'longest'`: Pad to the longest sequence in the batch (or no padding if only a single
489
+ sequence is provided).
490
+ * :obj:`'max_length'`: Pad to a maximum length specified with the argument :obj:`max_length` or to the
491
+ maximum acceptable input length for the model if that argument is not provided.
492
+ * :obj:`False` or :obj:`'do_not_pad'` (default): No padding (i.e., can output a batch with sequences of
493
+ different lengths).
494
+ target_padding (:obj:`bool`, :obj:`str` or :class:`~transformers.tokenization_utils_base.PaddingStrategy`, `optional`, defaults to :obj:`True`):
495
+ Select a strategy to pad the returned target sequences (according to the model's padding side and padding index).
496
+ See above for details.
497
+ max_target_length (:obj:`int`, `optional`):
498
+ Maximum length of the ``labels`` of the returned list and optionally padding length (see above).
499
+ """
500
+
501
+ processor: Any
502
+ decoder_start_token_id: int
503
+ decoder_prev_token_id: int
504
+ input_padding: Union[bool, str] = "max_length"
505
+ target_padding: Union[bool, str] = "max_length"
506
+ max_target_length: Optional[int] = None
507
+
508
+ def __call__(self, features: List[Dict[str, Union[List[int], np.ndarray]]]) -> Dict[str, np.ndarray]:
509
+ # split inputs and labels since they have to be of different lengths and need
510
+ # different padding methods
511
+ model_input_name = self.processor.model_input_names[0]
512
+
513
+ # dataloader returns a list of features which we convert to a dict
514
+ input_features = {model_input_name: [feature[model_input_name] for feature in features]}
515
+ label_features = {"input_ids": [feature["labels"] for feature in features]}
516
+
517
+ # reformat list to dict and set to pytorch format
518
+ batch = self.processor.feature_extractor.pad(
519
+ input_features,
520
+ padding=self.input_padding,
521
+ return_tensors="np",
522
+ )
523
+
524
+ labels_batch = self.processor.tokenizer.pad(
525
+ label_features,
526
+ max_length=self.max_target_length,
527
+ padding=self.target_padding,
528
+ return_tensors="np",
529
+ )
530
+
531
+ # if bos token is appended in previous tokenization step,
532
+ # cut bos token here as it's appended later anyway
533
+ labels = labels_batch["input_ids"]
534
+ if set(np.unique(labels[:, 0])).issubset({self.decoder_start_token_id, self.decoder_prev_token_id}):
535
+ decoder_input_ids = labels[:, :-1]
536
+ labels = labels[:, 1:]
537
+ labels_batch.attention_mask = labels_batch.attention_mask[:, 1:]
538
+ else:
539
+ decoder_input_ids = shift_tokens_right(labels, self.decoder_start_token_id)
540
+
541
+ # replace padding with -100 to ignore correctly when computing the loss
542
+ labels = np.ma.array(labels, mask=np.not_equal(labels_batch.attention_mask, 1))
543
+ labels = labels.filled(fill_value=-100)
544
+
545
+ # replace initial prompt tokens with -100 to ignore correctly when computing the loss
546
+ bos_index = np.argmax(labels == self.decoder_start_token_id, axis=1)
547
+ prompt_mask = np.arange(labels.shape[1]) < bos_index[:, None]
548
+ labels = np.where(prompt_mask, -100, labels)
549
+
550
+ batch["labels"] = labels
551
+ batch["decoder_input_ids"] = decoder_input_ids
552
+
553
+ return batch
554
+
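+ # Sketch of the masking performed above (hypothetical batch): for a padded label sequence
+ # [<|startofprev|>, prompt tokens, <|startoftranscript|>, text tokens, padding], the collator builds
+ # decoder_input_ids from the un-shifted sequence and sets every padding position and every position
+ # before the <|startoftranscript|> token in `labels` to -100, so prompts and padding are ignored by the loss.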
555
+
556
+ def get_data_loader(
557
+ seed: int,
558
+ dataset: IterableDataset,
559
+ batch_size: int,
560
+ data_collator: FlaxDataCollatorSpeechSeq2SeqWithPadding,
561
+ shuffle: bool = False,
562
+ drop_last: bool = True,
563
+ dataloader_num_workers: int = 0,
564
+ skip_batches: int = 0,
565
+ pin_memory: bool = True,
566
+ prefetch_size: int = 0,
567
+ ) -> DataLoader:
568
+ """
569
+ Returns batches of size `batch_size` from `dataset`. If `drop_last` is set to `False`, the final batch may be incomplete,
570
+ and range in size from 1 to `batch_size`. Shuffle batches if `shuffle` is `True`.
571
+
572
+ Args:
573
+ seed (int): Numpy seed for generating pseudo random numbers. Used if shuffling the dataset.
574
+ dataset (IterableDataset): streaming dataset from which to load the data.
575
+ batch_size (int): how many samples per batch to load.
576
+ data_collator (FlaxDataCollatorSpeechSeq2SeqWithPadding, optional): merges a list of samples to form a
577
+ mini-batch of Tensor(s). Used when using batched loading from a map-style dataset.
578
+ shuffle (bool, optional): set to `True` to have the batches reshuffled.
579
+ drop_last (bool, optional): set to ``True`` to drop the last incomplete batch,
580
+ if the dataset size is not divisible by the batch size. If ``False`` and
581
+ the size of dataset is not divisible by the batch size, then the last batch
582
+ will be smaller. (default: ``True``)
583
+ dataloader_num_workers (int, optional): how many subprocesses to use for data
584
+ loading. ``0`` means that the data will be loaded in the main process.
585
+ (default: ``0``)
586
+ skip_batches (int, optional): Efficiently skip the first `skip_batches` batches.
587
+ pin_memory (bool, optional): If ``True``, the data loader will copy Tensors
588
+ into device/CUDA pinned memory before returning them. If your data elements
589
+ are a custom type, or your :attr:`collate_fn` returns a batch that is a custom type,
590
+ see the example below.
591
+
592
+ """
593
+ if shuffle:
594
+ dataset = dataset.shuffle(seed)
595
+
596
+ if skip_batches > 0:
597
+ dataset = dataset.skip(skip_batches * batch_size)
598
+
599
+ if prefetch_size > 0:
600
+ dataset = IterableWrapper(dataset)
601
+ dataset = dataset.prefetch(prefetch_size)
602
+
603
+ data_loader = DataLoader(
604
+ dataset,
605
+ batch_size=batch_size,
606
+ drop_last=drop_last,
607
+ pin_memory=pin_memory,
608
+ collate_fn=data_collator,
609
+ num_workers=dataloader_num_workers,
610
+ )
611
+
612
+ return data_loader
613
+
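+ # Resumption note (sketch, assuming a streaming dataset): skip_batches=N skips the first N * batch_size
+ # examples before batching, e.g. get_data_loader(seed, ds, batch_size=64, data_collator=collator,
+ # skip_batches=1000) resumes iteration from example 64000 onwards.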
614
+
615
+ def sorted_checkpoints(output_dir=None, checkpoint_prefix="checkpoint", use_mtime=False) -> List[str]:
616
+ ordering_and_checkpoint_path = []
617
+
618
+ glob_checkpoints = [str(x) for x in Path(output_dir).glob(f"{checkpoint_prefix}-*") if os.path.isdir(x)]
619
+
620
+ for path in glob_checkpoints:
621
+ if use_mtime:
622
+ ordering_and_checkpoint_path.append((os.path.getmtime(path), path))
623
+ else:
624
+ regex_match = re.match(f".*{checkpoint_prefix}-([0-9]+)", path)
625
+ if regex_match is not None and regex_match.groups() is not None:
626
+ ordering_and_checkpoint_path.append((int(regex_match.groups()[0]), path))
627
+
628
+ checkpoints_sorted = sorted(ordering_and_checkpoint_path)
629
+ checkpoints_sorted = [checkpoint[1] for checkpoint in checkpoints_sorted]
630
+ return checkpoints_sorted
631
+
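+ # Sketch (hypothetical paths): with an output_dir containing "checkpoint-1500", "checkpoint-500" and
+ # "checkpoint-1000", sorted_checkpoints(output_dir) returns the paths ordered by step number:
+ # [".../checkpoint-500", ".../checkpoint-1000", ".../checkpoint-1500"].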
632
+
633
+ def rotate_checkpoints(
634
+ save_total_limit=None, use_mtime=False, output_dir=None, checkpoint_prefix="checkpoint"
635
+ ) -> None:
636
+ if save_total_limit is None or save_total_limit <= 0:
637
+ return
638
+
639
+ # Check if we should delete older checkpoint(s)
640
+ checkpoints_sorted = sorted_checkpoints(
641
+ use_mtime=use_mtime, output_dir=output_dir, checkpoint_prefix=checkpoint_prefix
642
+ )
643
+ if len(checkpoints_sorted) <= save_total_limit:
644
+ return
645
+
646
+ number_of_checkpoints_to_delete = max(0, len(checkpoints_sorted) - save_total_limit)
647
+ checkpoints_to_be_deleted = checkpoints_sorted[:number_of_checkpoints_to_delete]
648
+ for checkpoint in checkpoints_to_be_deleted:
649
+ logger.info(f"Deleting older checkpoint [{checkpoint}] due to args.save_total_limit")
650
+ shutil.rmtree(checkpoint, ignore_errors=True)
651
+
652
+
653
+ def to_fp32(t):
654
+ return jax.tree_map(lambda x: x.astype(jnp.float32) if x.dtype == jnp.bfloat16 else x, t)
655
+
656
+
657
+ def to_bf16(t):
658
+ return jax.tree_map(lambda x: x.astype(jnp.bfloat16) if x.dtype == jnp.float32 else x, t)
659
+
660
+
661
+ class TrainState(train_state.TrainState):
662
+ dropout_rng: jnp.ndarray
663
+ max_grad_norm: float
664
+
665
+ def apply_gradients(self, *, grads, to_dtype: Callable, **kwargs):
666
+ """Updates `step`, `params`, `opt_state` and `**kwargs` in return value, clipping the
667
+ gradients by the maximum grad norm.
668
+
669
+ Note that internally this function calls `.tx.update()` followed by a call
670
+ to `optax.apply_updates()` to update `params` and `opt_state`.
671
+
672
+ Args:
673
+ grads: Gradients that have the same pytree structure as `.params`.
674
+ **kwargs: Additional dataclass attributes that should be `.replace()`-ed.
675
+
676
+ Returns:
677
+ An updated instance of `self` with `step` incremented by one, `params`
678
+ and `opt_state` updated by applying `grads`, and additional attributes
679
+ replaced as specified by `kwargs`.
680
+ """
681
+ # clip gradients by global l2 norm
682
+ casted_max_grad_norm = to_dtype(self.max_grad_norm)
683
+ g_norm = linear_algebra.global_norm(grads)
684
+ g_norm = jnp.maximum(casted_max_grad_norm, g_norm)
685
+ grads = jax.tree_map(lambda t: (t / g_norm) * casted_max_grad_norm, grads)
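+ # e.g. with max_grad_norm=1.0 (illustrative value): a gradient pytree of global norm 4.0 is rescaled
+ # by 1/4, while one of global norm 0.5 is left unchanged, since g_norm = max(max_grad_norm, ||grads||)
+ # and the scaling factor max_grad_norm / g_norm is therefore at most 1.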
686
+
687
+ # perform update step in fp32 and subsequently downcast optimizer states if mixed precision training
688
+ # grads and opt_state in bf16 (need to upcast), params in fp32 (leave as is)
689
+ updates, new_opt_state = self.tx.update(to_fp32(grads), to_fp32(self.opt_state), self.params)
690
+
691
+ new_params = optax.apply_updates(self.params, updates)
692
+
693
+ return self.replace(
694
+ step=self.step + 1,
695
+ params=new_params,
696
+ opt_state=to_dtype(new_opt_state),
697
+ **kwargs,
698
+ )
699
+
700
+ @classmethod
701
+ def create(cls, *, apply_fn, params, tx, to_dtype: Callable, **kwargs):
702
+ """Creates a new instance with `step=0` and initialized `opt_state`."""
703
+ # downcast optimizer state to bf16 if mixed-precision training
704
+ opt_state = tx.init(to_dtype(params))
705
+ return cls(
706
+ step=0,
707
+ apply_fn=apply_fn,
708
+ params=params,
709
+ tx=tx,
710
+ opt_state=opt_state,
711
+ **kwargs,
712
+ )
713
+
714
+ def replicate(self):
715
+ return jax_utils.replicate(self).replace(dropout_rng=shard_prng_key(self.dropout_rng))
716
+
717
+ def unreplicate(self):
718
+ return jax_utils.unreplicate(self)
719
+
720
+ def save_state(self, output_dir, save_total_limit=None, checkpoint_prefix="checkpoint"):
721
+ step = int(jax.device_get(unreplicate(self.step)))
722
+ serialized_state = to_bytes(self.unreplicate())
723
+
724
+ output_file = Path(os.path.join(output_dir, f"{checkpoint_prefix}-{step}", "train_state.msgpack"))
725
+ output_file.parent.mkdir(exist_ok=True, parents=True)
726
+
727
+ with output_file.open("wb") as f:
728
+ f.write(serialized_state)
729
+
730
+ logger.info(f"Flax train state saved in {output_file}")
731
+ rotate_checkpoints(
732
+ save_total_limit=save_total_limit, output_dir=output_dir, checkpoint_prefix=checkpoint_prefix
733
+ )
734
+
735
+
736
+ def save_hf_weights(
737
+ student_state: TrainState,
738
+ student_model: FlaxWhisperForConditionalGeneration,
739
+ processor: WhisperProcessor,
740
+ output_dir: str,
741
+ cur_step: int,
742
+ total_train_steps: int,
743
+ use_scan: bool = True,
744
+ checkpoint_prefix: str = "checkpoint",
745
+ ) -> None:
746
+ # always disable scan in the params / model so that we can load from PyTorch directly - this is a no-op if we're not using scan for training
747
+ student_state_params = unreplicate(student_state.params)
748
+ student_state_params = student_model.convert_scan_to_unroll(student_state_params)
749
+ student_params = jax.device_get(student_state_params)
750
+ student_model.disable_scan()
751
+
752
+ if cur_step != total_train_steps:
753
+ output_dir = os.path.join(output_dir, f"{checkpoint_prefix}-{cur_step}")
754
+ os.makedirs(output_dir, exist_ok=True)
755
+
756
+ student_model.save_pretrained(output_dir, params=student_params)
757
+ processor.save_pretrained(output_dir)
758
+
759
+ # re-enable scan only if required for training
760
+ if use_scan:
761
+ student_model.enable_scan()
762
+
763
+
764
+ def write_train_metric(summary_writer, train_metrics, train_time, step, logging_steps):
765
+ summary_writer.scalar("train/time", train_time, step)
766
+
767
+ train_metrics = get_metrics(train_metrics)
768
+ for key, vals in train_metrics.items():
769
+ steps_arr = np.arange(0, step, logging_steps)[-len(vals) :]
770
+ tag = f"train/{key}"
771
+ for i, val in enumerate(vals):
772
+ summary_writer.scalar(tag, val, steps_arr[i])
773
+
774
+
775
+ def write_eval_metric(summary_writer, eval_metrics, step, prefix="eval"):
776
+ for metric_name, value in eval_metrics.items():
777
+ summary_writer.scalar(f"{prefix}/{metric_name}", value, step)
778
+
779
+
780
+ def write_wandb_metric(wandb_logger, metrics, train_time, step, epoch, prefix="train"):
781
+ log_metrics = {}
782
+ for k, v in metrics.items():
783
+ log_metrics[f"{prefix}/{k}"] = v
784
+ log_metrics[f"{prefix}/time"] = train_time
785
+ log_metrics[f"{prefix}/epoch"] = epoch
786
+ wandb_logger.log(log_metrics, step)
787
+
788
+
789
+ def write_wandb_pred(
790
+ wandb_logger, pred_str, label_str, norm_pred_str, norm_label_str, cur_step, prefix="eval", num_lines=200000
791
+ ):
792
+ # pretty name for current step: step 50000 -> step 50k
793
+ cur_step_pretty = f"{int(cur_step // 1000)}k" if cur_step > 1000 else cur_step
794
+ # convert str data to a wandb compatible format
795
+ str_data = [[label_str[i], pred_str[i], norm_label_str[i], norm_pred_str[i]] for i in range(len(pred_str))]
796
+ # log as a table with the appropriate headers
797
+ wandb_logger.log(
798
+ {
799
+ f"predictions/{prefix.replace('/', '-')}-step-{cur_step_pretty}": wandb_logger.Table(
800
+ columns=["Target", "Pred", "Norm Target", "Norm Pred"], data=str_data[:num_lines]
801
+ )
802
+ },
803
+ cur_step,
804
+ )
805
+ # log incorrect normalised predictions
806
+ str_data = np.asarray(str_data)
807
+ str_data_incorrect = str_data[str_data[:, -2] != str_data[:, -1]]
808
+ # log as a table with the appropriate headers
809
+ wandb_logger.log(
810
+ {
811
+ f"incorrect_predictions/{prefix.replace('/', '-')}-step-{cur_step_pretty}": wandb_logger.Table(
812
+ columns=["Target", "Pred", "Norm Target", "Norm Pred"], data=str_data_incorrect[:num_lines]
813
+ )
814
+ },
815
+ cur_step,
816
+ )
817
+
818
+
819
+ def create_learning_rate_fn(
820
+ num_train_steps: int, lr_scheduler_type: str, num_warmup_steps: int, learning_rate: float
821
+ ) -> Callable[[int], jnp.array]:
822
+ """Returns a linear warmup, linear_decay learning rate function."""
823
+ lr_scheduler_types = ("linear", "constant_with_warmup")
824
+
825
+ if lr_scheduler_type not in lr_scheduler_types:
826
+ raise ValueError(
827
+ f"lr_scheduler_type of type {lr_scheduler_type} not supported, choose from {lr_scheduler_types}."
828
+ )
829
+
830
+ warmup_fn = optax.linear_schedule(init_value=0.0, end_value=learning_rate, transition_steps=num_warmup_steps)
831
+ decay_fn = optax.linear_schedule(
832
+ init_value=learning_rate,
833
+ end_value=0 if lr_scheduler_type == "linear" else learning_rate,
834
+ transition_steps=num_train_steps - num_warmup_steps,
835
+ )
836
+ schedule_fn = optax.join_schedules(schedules=[warmup_fn, decay_fn], boundaries=[num_warmup_steps])
837
+ return schedule_fn
838
+
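+ # Schedule sketch (hypothetical hyper-parameters): create_learning_rate_fn(10_000, "linear", 500, 1e-4)
+ # rises linearly from 0 to 1e-4 over the first 500 steps, then decays linearly back to 0 at step 10_000;
+ # with "constant_with_warmup" the rate instead stays at 1e-4 after the warmup.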
839
+
840
+ def convert_dataset_str_to_list(
841
+ dataset_names,
842
+ dataset_config_names,
843
+ splits=None,
844
+ text_column_names=None,
845
+ dataset_samples=None,
846
+ default_split="train",
847
+ ):
848
+ if isinstance(dataset_names, str):
849
+ dataset_names = dataset_names.split("+")
850
+
851
+ # we assume that all the datasets we're using derive from the distil-whisper org on the Hub - prepend the org name if necessary
852
+ for i in range(len(dataset_names)):
853
+ ds_name = dataset_names[i]
854
+ dataset_names[i] = f"distil-whisper/{ds_name}" if "/" not in ds_name else ds_name
855
+
856
+ dataset_config_names = dataset_config_names.split("+")
857
+ splits = splits.split("+") if splits is not None else None
858
+ text_column_names = text_column_names.split("+") if text_column_names is not None else None
859
+ dataset_samples = dataset_samples.split("+") if dataset_samples is not None else None
860
+
861
+ # basic checks to ensure we've got the right number of datasets/configs/splits/columns/probs
862
+ if len(dataset_names) != len(dataset_config_names):
863
+ raise ValueError(
864
+ f"Ensure one config is passed for each dataset, got {len(dataset_names)} datasets and"
865
+ f" {len(dataset_config_names)} configs."
866
+ )
867
+
868
+ if splits is not None and len(splits) != len(dataset_names):
869
+ raise ValueError(
870
+ f"Ensure one split is passed for each dataset, got {len(dataset_names)} datasets and {len(splits)} splits."
871
+ )
872
+
873
+ if text_column_names is not None and len(text_column_names) != len(dataset_names):
874
+ raise ValueError(
875
+ f"Ensure one text column name is passed for each dataset, got {len(dataset_names)} datasets and"
876
+ f" {len(text_column_names)} text column names."
877
+ )
878
+
879
+ if dataset_samples is not None:
880
+ if len(dataset_samples) != len(dataset_names):
881
+ raise ValueError(
882
+ f"Ensure one sample is passed for each dataset, got {len(dataset_names)} datasets and "
883
+ f"{len(dataset_samples)} samples."
884
+ )
885
+ dataset_samples = [float(ds_sample) for ds_sample in dataset_samples]
886
+ else:
887
+ dataset_samples = [None] * len(dataset_names)
888
+
889
+ text_column_names = (
890
+ text_column_names if text_column_names is not None else ["text" for _ in range(len(dataset_names))]
891
+ )
892
+ splits = splits if splits is not None else [default_split for _ in range(len(dataset_names))]
893
+
894
+ dataset_names_dict = []
895
+ for i, ds_name in enumerate(dataset_names):
896
+ dataset_names_dict.append(
897
+ {
898
+ "name": ds_name,
899
+ "config": dataset_config_names[i],
900
+ "split": splits[i],
901
+ "text_column_name": text_column_names[i],
902
+ "samples": dataset_samples[i],
903
+ }
904
+ )
905
+ return dataset_names_dict
906
+
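+ # Sketch (hypothetical arguments): convert_dataset_str_to_list("librispeech_asr+common_voice", "all+en")
+ # returns [{"name": "distil-whisper/librispeech_asr", "config": "all", "split": "train",
+ # "text_column_name": "text", "samples": None}, {"name": "distil-whisper/common_voice", "config": "en", ...}],
+ # i.e. dataset ids without an org prefix are resolved under the distil-whisper org on the Hub.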
907
+
908
+ def load_multiple_datasets(
909
+ dataset_names: Union[List, str],
910
+ dataset_config_names: Union[List, str],
911
+ splits: Optional[Union[List, str]] = None,
912
+ text_column_names: Optional[List] = None,
913
+ sampling_rate: Optional[int] = 16000,
914
+ stopping_strategy: Optional[str] = "first_exhausted",
915
+ dataset_samples: Optional[Union[List, np.array]] = None,
916
+ streaming: bool = True,
917
+ seed: int = None,
918
+ **kwargs,
919
+ ) -> IterableDataset:
920
+ dataset_names_dict = convert_dataset_str_to_list(
921
+ dataset_names, dataset_config_names, splits, text_column_names, dataset_samples
922
+ )
923
+
924
+ if dataset_samples is not None:
925
+ dataset_samples = [ds_dict["samples"] for ds_dict in dataset_names_dict]
926
+ probabilities = np.array(dataset_samples) / np.sum(dataset_samples)
927
+ else:
928
+ probabilities = None
929
+
930
+ if len(dataset_names_dict) == 1:
931
+ dataset_dict = dataset_names_dict[0]
932
+ # we have a single dataset so just return it as is
933
+ return load_dataset(
934
+ dataset_dict["name"],
935
+ dataset_dict["config"],
936
+ split=dataset_dict["split"],
937
+ streaming=streaming,
938
+ **kwargs,
939
+ )
940
+
941
+ all_datasets = []
942
+ # iterate over the datasets we want to interleave
943
+ for dataset_dict in tqdm(dataset_names_dict, desc="Combining datasets..."):
944
+ dataset = load_dataset(
945
+ dataset_dict["name"],
946
+ dataset_dict["config"],
947
+ split=dataset_dict["split"],
948
+ streaming=streaming,
949
+ **kwargs,
950
+ )
951
+ # resample to specified sampling rate
952
+ dataset = dataset.cast_column("audio", datasets.features.Audio(sampling_rate))
953
+ dataset = dataset.remove_columns(
954
+ set(dataset.features.keys()) - {"audio", dataset_dict["text_column_name"], "whisper_transcript"}
955
+ )
956
+ all_datasets.append(dataset)
957
+
958
+ if streaming:
959
+ interleaved_dataset = interleave_datasets(
960
+ all_datasets,
961
+ stopping_strategy=stopping_strategy,
962
+ probabilities=probabilities,
963
+ seed=seed,
964
+ )
965
+ else:
966
+ interleaved_dataset = concatenate_datasets(all_datasets)
967
+
968
+ return interleaved_dataset
969
+
970
+
971
+ def get_layers_to_supervise(student_layers: int, teacher_layers: int) -> dict:
972
+ """Helper function to map the student layer i to the teacher layer j whose output we'd like them to emulate. Used
973
+ for MSE loss terms in distillation (hidden-states and activations). Student layers are paired with teacher layers
974
+ in equal increments, e.g. for a 12-layer model distilled to a 3-layer model, student layer 0 emulates teacher layer
975
+ 3 (such that it behaves like the first 4 teacher layers), student layer 1 emulates teacher layer 7, and student layer
976
+ 2 emulates teacher layer 11. This mapping is summarised by the dictionary: {0: 3, 1: 7, 2: 11}, which is precisely
977
+ the output of this function for the arguments (student_layers=3, teacher_layers=12)."""
978
+ layer_intervals = np.linspace(teacher_layers // student_layers - 1, teacher_layers - 1, student_layers, dtype=int)
979
+ layer_intervals[-1] = teacher_layers - 1
980
+ layer_map = {}
981
+
982
+ for student_layer, teacher_layer in enumerate(layer_intervals):
983
+ layer_map[student_layer] = teacher_layer
984
+
985
+ return layer_map
986
+
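+ # Further sketch: get_layers_to_supervise(student_layers=2, teacher_layers=32) gives {0: 15, 1: 31},
+ # i.e. the two student layers are supervised by teacher layers 15 and 31 respectively.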
987
+
988
+ class FlaxWhisperFeatureExtractor(WhisperFeatureExtractor):
989
+ def _np_extract_fbank_features(self, waveform: np.array) -> np.ndarray:
990
+ """
991
+ Compute the log-mel spectrogram of the provided audio using torch filters. Using the torch implementation
992
+ computes stft filter banks approx 5x faster than its numpy counterpart, which is the native implementation
993
+ in transformers, and matches to within 1e-5 abs tolerance.
994
+ """
995
+ waveform = torch.from_numpy(waveform).type(torch.float32)
996
+
997
+ window = torch.hann_window(self.n_fft)
998
+ stft = torch.stft(waveform, self.n_fft, self.hop_length, window=window, return_complex=True)
999
+ magnitudes = stft[..., :-1].abs() ** 2
1000
+
1001
+ mel_filters = torch.from_numpy(self.mel_filters).type(torch.float32)
1002
+ mel_spec = mel_filters.T @ magnitudes
1003
+
1004
+ log_spec = torch.clamp(mel_spec, min=1e-10).log10()
1005
+ log_spec = torch.maximum(log_spec, log_spec.max() - 8.0)
1006
+ log_spec = (log_spec + 4.0) / 4.0
1007
+ return log_spec.numpy()
1008
+
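+ # Shape sketch (assuming the default Whisper feature extractor settings: n_fft=400, hop_length=160,
+ # 80 mel bins): a 30 s clip at 16 kHz (480000 samples) yields a log-mel spectrogram of shape (80, 3000).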
1009
+
1010
+ def main():
1011
+ # 1. Parse input arguments
1012
+ # See all possible arguments in src/transformers/training_args.py
1013
+ # or by passing the --help flag to this script.
1014
+ # We now keep distinct sets of args, for a cleaner separation of concerns.
1015
+ parser = HfArgumentParser((ModelArguments, DataTrainingArguments, FlaxSeq2SeqTrainingArguments))
1016
+
1017
+ if len(sys.argv) == 2 and sys.argv[1].endswith(".json"):
1018
+ # If we pass only one argument to the script and it's the path to a json file,
1019
+ # let's parse it to get our arguments.
1020
+ model_args, data_args, training_args = parser.parse_json_file(json_file=os.path.abspath(sys.argv[1]))
1021
+ else:
1022
+ model_args, data_args, training_args = parser.parse_args_into_dataclasses()
1023
+
1024
+ # Sending telemetry. Tracking the example usage helps us better allocate resources to maintain them. The
1025
+ # information sent is the one passed as arguments along with your JAX/Flax versions.
1026
+ send_example_telemetry("run_flax_speech_recognition_seq2seq", model_args, data_args, framework="flax")
1027
+
1028
+ # 2. Define remote logging - do this early so that we get the full traceback on our remote logs
1029
+ # Enable tensorboard only on the master node
1030
+ has_tensorboard = is_tensorboard_available()
1031
+ if has_tensorboard:
1032
+ if jax.process_index() == 0:
1033
+ try:
1034
+ from flax.metrics.tensorboard import SummaryWriter
1035
+
1036
+ summary_writer = SummaryWriter(log_dir=os.path.join(Path(training_args.output_dir), "runs"))
1037
+ except ImportError as ie:
1038
+ has_tensorboard = False
1039
+ logger.warning(
1040
+ "Unable to display metrics through TensorBoard because some package" f" are not installed: {ie}"
1041
+ )
1042
+ else:
1043
+ logger.warning(
1044
+ "Unable to display metrics through TensorBoard because the package is not"
1045
+ " installed: Please run `pip install tensorboard` to enable."
1046
+ )
1047
+
1048
+ # Enable wandb only on the master node
1049
+ has_wandb = is_wandb_available()
1050
+ if has_wandb:
1051
+ import wandb as wandb_logger
1052
+
1053
+ # Set up wandb run
1054
+ if jax.process_index() == 0:
1055
+ wandb_logger.init(
1056
+ project=data_args.wandb_project,
1057
+ name=data_args.wandb_name,
1058
+ job_type=data_args.wandb_job_type,
1059
+ dir=data_args.wandb_dir,
1060
+ save_code=data_args.save_code_to_wandb,
1061
+ )
1062
+ else:
1063
+ logger.warning("Wandb logging requires wandb to be installed. Run `pip install wandb` to enable.")
1064
+
1065
+ # 3. Setup local logging
1066
+ # Make one log on every process with the configuration for debugging.
1067
+ logging.basicConfig(
1068
+ format="%(asctime)s - %(levelname)s - %(name)s - %(message)s",
1069
+ datefmt="%m/%d/%Y %H:%M:%S",
1070
+ handlers=[logging.StreamHandler(sys.stdout)],
1071
+ )
1072
+ # Set the verbosity to info of the Transformers logger.
1073
+ # We only want one process per machine to log things on the screen.
1074
+ logger.setLevel(logging.INFO if jax.process_index() == 0 else logging.ERROR)
1075
+ if jax.process_index() == 0:
1076
+ datasets.utils.logging.set_verbosity_warning()
1077
+ transformers.utils.logging.set_verbosity_info()
1078
+ else:
1079
+ datasets.utils.logging.set_verbosity_error()
1080
+ transformers.utils.logging.set_verbosity_error()
1081
+
1082
+ logger.info("Training/evaluation parameters %s", training_args)
1083
+
1084
+ # Check the output dir is valid
1085
+ if (
1086
+ os.path.exists(training_args.output_dir)
1087
+ and os.listdir(training_args.output_dir)
1088
+ and training_args.do_train
1089
+ and not training_args.overwrite_output_dir
1090
+ ):
1091
+ raise ValueError(
1092
+ f"Output directory ({training_args.output_dir}) already exists and is not"
1093
+ " empty. Use `--overwrite_output_dir` to overcome."
1094
+ )
1095
+
1096
+ # 4. Handle the repository creation
1097
+ if training_args.push_to_hub:
1098
+ if training_args.hub_model_id is None:
1099
+ repo_name = get_full_repo_name(
1100
+ Path(training_args.output_dir).absolute().name,
1101
+ token=training_args.hub_token,
1102
+ )
1103
+ else:
1104
+ repo_name = training_args.hub_model_id
1105
+ create_repo(repo_name, exist_ok=True, token=training_args.hub_token)
1106
+ repo = Repository(
1107
+ training_args.output_dir,
1108
+ clone_from=repo_name,
1109
+ token=training_args.hub_token,
1110
+ )
1111
+
1112
+ if training_args.compilation_cache:
1113
+ cc.initialize_cache(os.path.join(model_args.cache_dir, "jax_cache"))
1114
+
1115
+ # 5. Load dataset
1116
+ raw_datasets = IterableDatasetDict() if data_args.streaming else DatasetDict()
1117
+
1118
+ # set seed for determinism
1119
+ set_seed(training_args.seed)
1120
+
1121
+ if training_args.do_train:
1122
+ print("loading raw")
1123
+ raw_datasets["train"] = load_multiple_datasets(
1124
+ data_args.train_dataset_name,
1125
+ data_args.train_dataset_config_name,
1126
+ splits=data_args.train_split_name,
1127
+ streaming=data_args.streaming,
1128
+ dataset_samples=data_args.train_dataset_samples,
1129
+ seed=training_args.seed,
1130
+ cache_dir=data_args.dataset_cache_dir,
1131
+ token=True if model_args.use_auth_token else None,
1132
+ )
1133
+
1134
+ if training_args.do_eval:
1135
+ dataset_names_dict = convert_dataset_str_to_list(
1136
+ data_args.eval_dataset_name if data_args.eval_dataset_name else data_args.train_dataset_name,
1137
+ (
1138
+ data_args.eval_dataset_config_name
1139
+ if data_args.eval_dataset_config_name
1140
+ else data_args.train_dataset_config_name
1141
+ ),
1142
+ splits=data_args.eval_split_name,
1143
+ text_column_names=data_args.eval_text_column_name,
1144
+ )
1145
+ all_eval_splits = []
1146
+ if len(dataset_names_dict) == 1:
1147
+ # load a single eval set
1148
+ dataset_dict = dataset_names_dict[0]
1149
+ all_eval_splits.append("eval")
1150
+ raw_datasets["eval"] = load_dataset(
1151
+ dataset_dict["name"],
1152
+ dataset_dict["config"],
1153
+ split=dataset_dict["split"],
1154
+ cache_dir=data_args.dataset_cache_dir,
1155
+ token=True if model_args.use_auth_token else None,
1156
+ streaming=data_args.streaming,
1157
+ )
1158
+ else:
1159
+ # load multiple eval sets
1160
+ for dataset_dict in dataset_names_dict:
1161
+ if dataset_dict["name"] == "esb/diagnostic-dataset":
1162
+ # for the ESB diagnostic dataset, the dataset name is effectively the config
1163
+ pretty_name = f"{dataset_dict['config']}-diagnostic/{dataset_dict['split']}"
1164
+ else:
1165
+ pretty_name = f"{dataset_dict['name'].split('/')[-1]}/{dataset_dict['split'].replace('.', '-')}"
1166
+ all_eval_splits.append(pretty_name)
1167
+ raw_datasets[pretty_name] = load_dataset(
1168
+ dataset_dict["name"],
1169
+ dataset_dict["config"],
1170
+ split=dataset_dict["split"],
1171
+ cache_dir=data_args.dataset_cache_dir,
1172
+ token=True if model_args.use_auth_token else None,
1173
+ streaming=data_args.streaming,
1174
+ )
1175
+ features = raw_datasets[pretty_name].features.keys()
1176
+ if "text" not in features:
1177
+ raw_datasets[pretty_name] = raw_datasets[pretty_name].rename_column(
1178
+ dataset_dict["text_column_name"], "text"
1179
+ )
1180
+ raw_datasets[pretty_name] = raw_datasets[pretty_name].remove_columns(
1181
+ set(raw_datasets[pretty_name].features.keys()) - {"audio", "text"}
1182
+ )
1183
+
1184
+ if not training_args.do_train and not training_args.do_eval:
1185
+ raise ValueError(
1186
+ "Cannot not train and not do evaluation. At least one of training or evaluation has to be performed."
1187
+ )
1188
+
1189
+ raw_datasets_train_features = list(raw_datasets["train"].features.keys())
1190
+ print("debug 1")
1191
+
1192
+ if data_args.audio_column_name not in raw_datasets_train_features:
1193
+ raise ValueError(
1194
+ f"--audio_column_name '{data_args.audio_column_name}' not found in dataset"
1195
+ f" '{data_args.dataset_name}'. Make sure to set `--audio_column_name` to"
1196
+ " the correct audio column - one of"
1197
+ f" {', '.join(raw_datasets_train_features)}."
1198
+ )
1199
+
1200
+ if data_args.train_text_column_name not in raw_datasets_train_features:
1201
+ raise ValueError(
1202
+ f"--train_text_column_name {data_args.train_text_column_name} not found in dataset"
1203
+ f" '{data_args.dataset_name}'. Make sure to set `--train_text_column_name` to the"
1204
+ " correct text column - one of"
1205
+ f" {', '.join(raw_datasets_train_features)}."
1206
+ )
1207
+
1208
+ # 6. Load pretrained model, tokenizer, and feature extractor
1209
+ config = WhisperConfig.from_pretrained(
1210
+ (model_args.config_name if model_args.config_name else model_args.model_name_or_path),
1211
+ cache_dir=model_args.cache_dir,
1212
+ revision=model_args.model_revision,
1213
+ token=True if model_args.use_auth_token else None,
1214
+ )
1215
+ feature_extractor = FlaxWhisperFeatureExtractor.from_pretrained(
1216
+ (model_args.feature_extractor_name if model_args.feature_extractor_name else model_args.model_name_or_path),
1217
+ cache_dir=model_args.cache_dir,
1218
+ revision=model_args.model_revision,
1219
+ token=True if model_args.use_auth_token else None,
1220
+ )
1221
+ tokenizer = WhisperTokenizerFast.from_pretrained(
1222
+ (model_args.tokenizer_name if model_args.tokenizer_name else model_args.model_name_or_path),
1223
+ cache_dir=model_args.cache_dir,
1224
+ use_fast=model_args.use_fast_tokenizer,
1225
+ revision=model_args.model_revision,
1226
+ token=True if model_args.use_auth_token else None,
1227
+ )
1228
+ print("debug2")
1229
+ # override timestamp tokens until tokenizer issues are fixed in transformers
1230
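+ # Whisper uses 1501 timestamp tokens, <|0.00|> through <|30.00|>, in 0.02 second increments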
+ timestamps = [AddedToken("<|%.2f|>" % (i * 0.02), lstrip=False, rstrip=False) for i in range(1500 + 1)]
1231
+ tokenizer.add_tokens(timestamps)
1232
+
1233
+ config.update(
1234
+ {
1235
+ "activation_dropout": model_args.activation_dropout,
1236
+ "attention_dropout": model_args.attention_dropout,
1237
+ "dropout": model_args.dropout,
1238
+ }
1239
+ )
1240
+
1241
+ if training_args.precision == "full_mixed":
1242
+ # forward pass, backward pass and optimiser states in bf16
1243
+ dtype = jnp.bfloat16
1244
+ to_dtype = to_bf16
1245
+ elif training_args.precision == "half_mixed" or model_args.dtype == "bfloat16":
1246
+ # forward pass in bf16, backward pass and optimiser states in fp32
1247
+ dtype = jnp.bfloat16
1248
+ to_dtype = to_fp32
1249
+ else:
1250
+ if training_args.precision != "full":
1251
+ raise ValueError(
1252
+ f"`precision` should be one of: `full`, `half_mixed` or `full_mixed`, got {training_args.precision}"
1253
+ )
1254
+ # forward pass, backward pass and optimiser states in fp32
1255
+ dtype = jnp.float32
1256
+ to_dtype = to_fp32
1257
+
1258
+ student_model, student_params = FlaxWhisperForConditionalGeneration.from_pretrained(
1259
+ model_args.model_name_or_path,
1260
+ config=config,
1261
+ dtype=dtype,
1262
+ cache_dir=model_args.cache_dir,
1263
+ revision=model_args.model_revision,
1264
+ subfolder=model_args.subfolder,
1265
+ token=True if model_args.use_auth_token else None,
1266
+ _do_init=False,
1267
+ use_scan=model_args.load_with_scan_weights,
1268
+ )
1269
+
1270
+ teacher_model, teacher_params = FlaxWhisperForConditionalGeneration.from_pretrained(
1271
+ model_args.teacher_model_name_or_path,
1272
+ # config=config,
1273
+ dtype=dtype,
1274
+ cache_dir=model_args.cache_dir,
1275
+ # revision=model_args.model_revision,
1276
+ token=True if model_args.use_auth_token else None,
1277
+ _do_init=False,
1278
+ )
1279
+ print("debug 3")
1280
+ if student_model.config.decoder_start_token_id is None or teacher_model.config.decoder_start_token_id is None:
1281
+ raise ValueError(
1282
+ f"Make sure that `config.decoder_start_token_id` is correctly defined for both the "
1283
+ f"student and teacher model. Got {student_model.config.decoder_start_token_id} for the "
1284
+ f"student and {teacher_model.config.decoder_start_token_id} for the teacher."
1285
+ )
1286
+
1287
+ # enable scan / gradient checkpointing if necessary
1288
+ if training_args.use_scan:
1289
+ student_model.enable_scan() # to enable scan in the nn.Module
1290
+ student_params = student_model.convert_unroll_to_scan(student_params) # to convert the unrolled params to scan
1291
+
1292
+ teacher_model.enable_scan() # faster compile time (even though we don't train the teacher)
1293
+ teacher_params = teacher_model.convert_unroll_to_scan(teacher_params)
1294
+
1295
+ if training_args.gradient_checkpointing:
1296
+ student_model.enable_gradient_checkpointing() # to enable checkpointing in the nn.Module, there is no change to the params structure
1297
+ teacher_model.enable_gradient_checkpointing()
1298
+ print("debug 4")
1299
+ if hasattr(teacher_model.generation_config, "is_multilingual") and teacher_model.generation_config.is_multilingual:
1300
+ # We need to set the language and task ids for previously multilingual checkpoints - for now we hardcode this to Norwegian
1301
+ tokenizer.set_prefix_tokens(language="Norwegian", task="transcribe", predict_timestamps=False)
1302
+ student_model.generation_config.update(
1303
+ **{
1304
+ "language": "<|no|>",
1305
+ "task": "transcribe",
1306
+ }
1307
+ )
1308
+ print("debug 5")
1309
+ # 7. Resample speech dataset: `datasets` takes care of automatically loading and resampling the audio,
1310
+ # so we just need to set the correct target sampling rate.
1311
+ raw_datasets = raw_datasets.cast_column(
1312
+ data_args.audio_column_name,
1313
+ datasets.features.Audio(sampling_rate=feature_extractor.sampling_rate),
1314
+ )
1315
+
1316
+ # 8. Preprocessing the datasets.
1317
+ # We need to read the audio files as arrays and tokenize the targets.
1318
+ max_input_length = int(data_args.max_duration_in_seconds * feature_extractor.sampling_rate)
1319
+ min_input_length = int(data_args.min_duration_in_seconds * feature_extractor.sampling_rate)
1320
+ max_label_length = (
1321
+ data_args.max_label_length if data_args.max_label_length is not None else student_model.config.max_length
1322
+ )
1323
+ audio_column_name = data_args.audio_column_name
1324
+ num_workers = data_args.preprocessing_num_workers
1325
+ dataloader_num_workers = training_args.dataloader_num_workers
1326
+ dataloader_prefetch_size = data_args.prefetch_size
1327
+ train_text_column_name = data_args.train_text_column_name
1328
+ eval_text_column_name = "text"
1329
+ model_input_name = feature_extractor.model_input_names[0]
1330
+ normalizer = BasicTextNormalizer(tokenizer.english_spelling_normalizer)
1331
+ wer_threshold = data_args.wer_threshold
1332
+ round_timestamps = data_args.round_timestamps
1333
+ print("debug 6")
1334
+ if training_args.do_train and data_args.max_train_samples is not None:
1335
+ raw_datasets["train"] = (
1336
+ raw_datasets["train"].take(data_args.max_train_samples)
1337
+ if data_args.streaming
1338
+ else raw_datasets["train"].select(range(data_args.max_train_samples))
1339
+ )
1340
+
1341
+ if training_args.do_eval and data_args.max_eval_samples is not None:
1342
+ for eval_split in all_eval_splits:
1343
+ raw_datasets[eval_split] = (
1344
+ raw_datasets[eval_split].take(data_args.max_eval_samples)
1345
+ if data_args.streaming
1346
+ else raw_datasets[eval_split].select(range(data_args.max_eval_samples))
1347
+ )
1348
+ print("debug 7")
1349
+ # 10.3: filter training data based on WER threshold -> this is KEY to good distillation performance
+ # load the WER metric here so it is already defined if the filter below is executed eagerly (non-streaming mode)
+ metric = evaluate.load("wer")
1350
+ def is_wer_in_range(ground_truth, whisper_transcript):
1351
+ norm_ground_truth = normalizer(ground_truth)
1352
+ if whisper_transcript is not None and whisper_transcript.upper() == whisper_transcript:
1353
+ # filter entirely upper-case transcriptions: these are erroneous generations from large-v3
1354
+ return False
1355
+ elif whisper_transcript is not None and len(norm_ground_truth) == 0 and len(normalizer(whisper_transcript)) == 0:
1356
+ return True
1357
+ elif len(norm_ground_truth.strip()) > 0 and whisper_transcript is not None and len(normalizer(whisper_transcript).strip()) > 0:
1358
+ norm_whisper_transcript = normalizer(whisper_transcript)
1359
+ wer = 100 * metric.compute(predictions=[norm_whisper_transcript], references=[norm_ground_truth])
1360
+ return wer < wer_threshold
1361
+ else:
1362
+ # filter automatically since we can't know the WER
1363
+ return False
1364
+
1365
+
1366
+ filter_by_wer_threshold = partial(
1367
+ raw_datasets["train"].filter,
1368
+ function=is_wer_in_range,
1369
+ input_columns=[eval_text_column_name, train_text_column_name],
1370
+ )
1371
+
1372
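+ # only apply the filter when a threshold is given, e.g. with wer_threshold=10 a pseudo-label whose
+ # normalised WER against the ground truth is 12 is discarded, while one with WER 8 is kept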
+ if wer_threshold is not None:
1373
+ raw_datasets["train"] = (
1374
+ filter_by_wer_threshold(num_proc=num_workers, desc="filtering train dataset by wer")
1375
+ if not data_args.streaming
1376
+ else filter_by_wer_threshold()
1377
+ )
1378
+
1379
+ def has_timestamp_tokens(input_str):
1380
+ """
1381
+ Identify whether the input string contains timestamp tokens, of the form <|0.00|>, by searching for
1382
+ pairs of left and right-angle brackets.
1383
+ """
1384
+ return bool(re.search(r"\<[^\>]*\>", input_str))
1385
+
1386
+ def round_timestamp_tokens(input_str: str, ndigits: int = 1):
1387
+ timestamps = re.findall(r"\<[^\>]*\>", input_str, re.DOTALL)
1388
+ for token in timestamps:
1389
+ # extract time digits from timestamp token, e.g. <|6.24|> to 6.24
1390
+ time_digit = token[2:-2]
1391
+ # round to specified number of digits, e.g. 6.24 to 6.2
1392
+ time_digit = round(float(time_digit), ndigits=ndigits)
1393
+ # replace in original string with the same precision, e.g. <|6.24|> to <|6.20|>
1394
+ input_str = input_str.replace(token, "<|{:.2f}|>".format(time_digit))
1395
+ return input_str
1396
+
1397
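+ # convert the raw audio to log-mel input features and tokenize the target text, handling prompted
+ # and timestamped pseudo-labels as special cases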
+ def prepare_train_dataset(batch):
1398
+ # process audio input
1399
+ sample = batch[audio_column_name]
1400
+ inputs = feature_extractor(sample["array"], sampling_rate=sample["sampling_rate"])
1401
+ batch[model_input_name] = inputs.get(model_input_name)[0]
1402
+ batch["input_length"] = len(sample["array"])
1403
+
1404
+ # process text targets
1405
+ input_str = batch[train_text_column_name]
1406
+
1407
+ # prompt & timestamp processing: for now, we only do one or the other
1408
+ if input_str.startswith("<|startoftranscript|>") or input_str.startswith("<|startofprev|>"):
1409
+ # prompted target text already has special ids added, so don't add them here
1410
+ batch["labels"] = tokenizer(input_str, add_special_tokens=False).input_ids
1411
+ return batch
1412
+
1413
+ has_timestamps = has_timestamp_tokens(input_str)
1414
+
1415
+ if has_timestamps:
1416
+ predict_timestamps = bool(np.random.binomial(1, data_args.timestamp_probability))
1417
+ if not predict_timestamps:
1418
+ # filter timestamp token ids if not part of the prediction task
1419
+ input_str = tokenizer._filter_timestamp_ids(input_str)
1420
+ elif round_timestamps:
1421
+ input_str = round_timestamp_tokens(input_str)
1422
+ else:
1423
+ predict_timestamps = False
1424
+
1425
+ tokenizer.set_prefix_tokens(language="Norwegian", task="transcribe", predict_timestamps=predict_timestamps)
1426
+ input_ids = tokenizer(input_str).input_ids
1427
+ batch["labels"] = input_ids
1428
+ return batch
1429
+
1430
+ def prepare_eval_dataset(batch):
1431
+ # process audio
1432
+ sample = batch[audio_column_name]
1433
+ inputs = feature_extractor(sample["array"], sampling_rate=sample["sampling_rate"])
1434
+ # process audio length
1435
+ batch[model_input_name] = inputs.get(model_input_name)[0]
1436
+ batch["input_length"] = len(sample["array"])
1437
+
1438
+ # process targets
1439
+ input_str = batch[eval_text_column_name]
1440
+ batch["labels"] = tokenizer(input_str).input_ids
1441
+ return batch
1442
+
1443
+ vectorized_datasets = IterableDatasetDict() if data_args.streaming else DatasetDict()
1444
+ if training_args.do_train:
1445
+ map_fn_train = partial(
1446
+ raw_datasets["train"].map, function=prepare_train_dataset, remove_columns=raw_datasets_train_features
1447
+ )
1448
+ vectorized_datasets["train"] = (
1449
+ map_fn_train(num_proc=num_workers, desc="preprocess train dataset")
1450
+ if not data_args.streaming
1451
+ else map_fn_train()
1452
+ )
1453
+ if training_args.do_eval:
1454
+ for eval_split in all_eval_splits:
1455
+ raw_datasets_eval_features = list(raw_datasets[eval_split].features.keys())
1456
+ map_fn_eval = partial(
1457
+ raw_datasets[eval_split].map, function=prepare_eval_dataset, remove_columns=raw_datasets_eval_features
1458
+ )
1459
+ vectorized_datasets[eval_split] = (
1460
+ map_fn_eval(num_proc=num_workers, desc="preprocess eval dataset")
1461
+ if not data_args.streaming
1462
+ else map_fn_eval()
1463
+ )
1464
+
1465
+ # filter training data with inputs longer than max_input_length
1466
+ def is_audio_in_length_range(length):
1467
+ return min_input_length < length < max_input_length
1468
+
1469
+ filter_by_audio_fn = partial(
1470
+ vectorized_datasets.filter, function=is_audio_in_length_range, input_columns=["input_length"]
1471
+ )
1472
+ vectorized_datasets = (
1473
+ filter_by_audio_fn(num_proc=num_workers, desc="filtering train dataset by audio length")
1474
+ if not data_args.streaming
1475
+ else filter_by_audio_fn()
1476
+ )
1477
+
1478
+ # filter training data with labels longer than max_label_length
1479
+ def is_labels_in_length_range(labels):
1480
+ return 0 < len(labels) < max_label_length
1481
+
1482
+ filter_by_labels_fn = partial(
1483
+ vectorized_datasets.filter, function=is_labels_in_length_range, input_columns=["labels"]
1484
+ )
1485
+ vectorized_datasets = (
1486
+ filter_by_labels_fn(num_proc=num_workers, desc="filtering train dataset")
1487
+ if not data_args.streaming
1488
+ else filter_by_labels_fn()
1489
+ )
1490
+
1491
+ # for large datasets it is advised to run the preprocessing on a
1492
+ # single machine first with `args.preprocessing_only` since there will most likely
1493
+ # be a timeout when running the script in distributed mode.
1494
+ # In a second step `args.preprocessing_only` can then be set to `False` to load the
1495
+ # cached dataset
1496
+ if data_args.preprocessing_only:
1497
+ cache = {k: v.cache_files for k, v in vectorized_datasets.items()}
1498
+ logger.info(f"Data preprocessing finished. Files cached at {cache}.")
1499
+ return
1500
+
1501
+ # 8. Load Metric
1502
+ metric = evaluate.load("wer")
1503
+ # convention is that we space all punctuation *except* apostrophes
1504
+ all_punctuation = list(string.punctuation.replace("'", ""))
1505
+ return_timestamps = data_args.return_timestamps if data_args.timestamp_probability > 0 else False
1506
+
1507
+ def compute_metrics(preds, labels):
1508
+ # replace padded labels by the padding token
1509
+ for idx in range(len(labels)):
1510
+ labels[idx][labels[idx] == -100] = tokenizer.pad_token_id
1511
+
1512
+ pred_str = tokenizer.batch_decode(preds, skip_special_tokens=True, decode_with_timestamps=return_timestamps)
1513
+ # we do not want to group tokens when computing the metrics
1514
+ label_str = tokenizer.batch_decode(labels, skip_special_tokens=True)
1515
+
1516
+ # space punctuation for orthographic WER (c.f. ESB paper https://arxiv.org/abs/2210.13352)
1517
+ spaced_pred_str = [
1518
+ pred_str[i].replace(punctuation, f" {punctuation} ")
1519
+ for punctuation in all_punctuation
1520
+ for i in range(len(pred_str))
1521
+ ]
1522
+ spaced_label_str = [
1523
+ label_str[i].replace(punctuation, f" {punctuation} ")
1524
+ for punctuation in all_punctuation
1525
+ for i in range(len(label_str))
1526
+ ]
1527
+ wer_ortho = 100 * metric.compute(predictions=spaced_pred_str, references=spaced_label_str)
1528
+
1529
+ norm_pred_str, norm_label_str = [], []
+ # Iterate through all predictions and labels
1530
+ for pred, label in zip(pred_str, label_str):
1531
+ # Normalize the prediction and label
1532
+ normalized_pred = normalizer(pred)
1533
+ normalized_label = normalizer(label)
1534
+
1535
+ # If either normalized string is empty after normalization, replace with "<|nospeech|>"
1536
+ if not normalized_pred.strip():
1537
+ normalized_pred = "<|nospeech|>"
1538
+ if not normalized_label.strip():
1539
+ normalized_label = "<|nospeech|>"
1540
+
1541
+ norm_pred_str.append(normalized_pred)
1542
+ norm_label_str.append(normalized_label)
1543
+
1544
+ # Replace empty original strings with "<|nospeech|>" for consistency with the normalised strings
1545
+ pred_str = [pred if len(pred.strip()) > 0 else "<|nospeech|>" for pred in pred_str]
1546
+ label_str = [label if len(label.strip()) > 0 else "<|nospeech|>" for label in label_str]
1547
+
1548
+ # Compute the normalised WER using all entries, including those replaced with "<|nospeech|>"
1549
+ wer = 100 * metric.compute(predictions=norm_pred_str, references=norm_label_str)
1550
+ return {"wer": wer, "wer_ortho": wer_ortho}, pred_str, label_str, norm_pred_str, norm_label_str
1551
+
1552
+
1553
+ # 9. Save feature extractor, tokenizer, config and generation config
1554
+ feature_extractor.save_pretrained(training_args.output_dir)
1555
+ tokenizer.save_pretrained(training_args.output_dir)
1556
+ config.save_pretrained(training_args.output_dir)
1557
+ student_model.generation_config.save_pretrained(
1558
+ training_args.output_dir
1559
+ ) # generation config stays bound to model to make it easy to jit
1560
+
1561
+ processor = WhisperProcessor.from_pretrained(training_args.output_dir)
1562
+
1563
+ data_collator = FlaxDataCollatorSpeechSeq2SeqWithPadding(
1564
+ processor=processor,
1565
+ decoder_start_token_id=student_model.config.decoder_start_token_id, # <|startoftranscript|>
1566
+ decoder_prev_token_id=tokenizer.all_special_ids[-3], # <|startofprev|>
1567
+ input_padding="longest",
1568
+ target_padding="max_length",
1569
+ max_target_length=max_label_length,
1570
+ )
1571
+
1572
+ # Initialize our training
1573
+ rng = jax.random.PRNGKey(training_args.seed)
1574
+ rng, dropout_rng = jax.random.split(rng)
1575
+
1576
+ # Store some constants
1577
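+ # effective batch sizes are the per-device sizes multiplied by the number of JAX devices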
+ train_batch_size = int(training_args.per_device_train_batch_size) * jax.device_count()
1578
+ gradient_accumulation_steps = int(training_args.gradient_accumulation_steps)
1579
+ per_device_eval_batch_size = int(training_args.per_device_eval_batch_size)
1580
+ eval_batch_size = per_device_eval_batch_size * jax.device_count()
1581
+
1582
+ if not data_args.streaming and training_args.max_steps < 0:
1583
+ num_epochs = int(training_args.num_train_epochs)
1584
+ steps_per_epoch = len(vectorized_datasets["train"]) // train_batch_size
1585
+ total_train_steps = steps_per_epoch * num_epochs
1586
+ elif training_args.max_steps > 0:
1587
+ logger.info("max_steps is given, it will override any value given in num_train_epochs")
1588
+ total_train_steps = int(training_args.max_steps)
1589
+ # Setting a very large number of epochs so we go as many times as necessary over the iterator.
1590
+ num_epochs = sys.maxsize
1591
+ steps_per_epoch = total_train_steps
1592
+ else:
1593
+ raise ValueError("max_steps must be specified when training with a streaming (iterable) dataset")
1594
+
1595
+ if training_args.eval_steps is None:
1596
+ logger.info(
1597
+ f"eval_steps is not set, evaluating at the end of {'each epoch' if not data_args.streaming else 'training'}"
1598
+ )
1599
+ eval_steps = steps_per_epoch
1600
+ else:
1601
+ eval_steps = training_args.eval_steps
1602
+
1603
+ # Create learning rate schedule
1604
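+ # total and warmup steps are multiplied by gradient_accumulation_steps so that the schedule is
+ # defined over mini-batch steps rather than accumulated optimizer steps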
+ linear_decay_lr_schedule_fn = create_learning_rate_fn(
1605
+ total_train_steps * gradient_accumulation_steps,
1606
+ training_args.lr_scheduler_type,
1607
+ training_args.warmup_steps * gradient_accumulation_steps,
1608
+ training_args.learning_rate,
1609
+ )
1610
+
1611
+ # We use Optax's "masking" functionality to not apply weight decay
1612
+ # to bias and LayerNorm scale parameters. decay_mask_fn returns a
1613
+ # mask boolean with the same structure as the parameters.
1614
+ # The mask is True for parameters that should be decayed.
1615
+ def decay_mask_fn(params):
1616
+ flat_params = traverse_util.flatten_dict(params)
1617
+ # find out all LayerNorm parameters
1618
+ layer_norm_candidates = [
1619
+ "layer_norm",
1620
+ "self_attn_layer_norm",
1621
+ "final_layer_norm",
1622
+ "encoder_attn_layer_norm",
1623
+ ]
1624
+ layer_norm_named_params = {
1625
+ layer[-2:]
1626
+ for layer_norm_name in layer_norm_candidates
1627
+ for layer in flat_params.keys()
1628
+ if layer_norm_name in "".join(layer).lower()
1629
+ }
1630
+ flat_mask = {path: path[-1] != "bias" and path[-2:] not in layer_norm_named_params for path in flat_params}
1631
+ return traverse_util.unflatten_dict(flat_mask)
1632
+
1633
+ # create adam optimizer
1634
+ adamw = optax.adamw(
1635
+ learning_rate=linear_decay_lr_schedule_fn,
1636
+ b1=training_args.adam_beta1,
1637
+ b2=training_args.adam_beta2,
1638
+ eps=training_args.adam_epsilon,
1639
+ weight_decay=training_args.weight_decay,
1640
+ mask=decay_mask_fn,
1641
+ )
1642
+
1643
+ if gradient_accumulation_steps > 1:
1644
+ # accumulate gradients and apply once every k steps
1645
+ adamw = optax.MultiSteps(adamw, every_k_schedule=gradient_accumulation_steps)
1646
+
1647
+ share_hidden_states = training_args.freeze_encoder and student_model.config.d_model == teacher_model.config.d_model
1648
+ encoder_layer_mapping = get_layers_to_supervise(
1649
+ student_model.config.encoder_layers, teacher_model.config.encoder_layers
1650
+ )
1651
+ decoder_layer_mapping = get_layers_to_supervise(
1652
+ student_model.config.decoder_layers, teacher_model.config.decoder_layers
1653
+ )
1654
+
1655
+ # Setup train state
1656
+ student_state = TrainState.create(
1657
+ apply_fn=student_model.decode if share_hidden_states else student_model.__call__,
1658
+ params=student_params,
1659
+ tx=adamw,
1660
+ to_dtype=to_dtype,
1661
+ dropout_rng=dropout_rng,
1662
+ max_grad_norm=training_args.max_grad_norm,
1663
+ )
1664
+
1665
+ if training_args.resume_from_checkpoint is not None:
1666
+ if os.path.isfile(os.path.join(training_args.resume_from_checkpoint, "train_state.msgpack")):
1667
+ logger.info(
1668
+ f"Checkpoint detected, resuming training at {training_args.resume_from_checkpoint}. To avoid "
1669
+ "this behavior, omit the resume_from_checkpoint argument."
1670
+ )
1671
+ with Path(os.path.join(training_args.resume_from_checkpoint, "train_state.msgpack")).open("rb") as f:
1672
+ student_state = from_bytes(student_state, f.read())
1673
+ else:
1674
+ logger.warning(
1675
+ f"Checkpoint {training_args.resume_from_checkpoint} not detected, training from scratch. Ensure "
1676
+ f"you pass the path to a folder with a valid checkpoint for your model."
1677
+ )
1678
+
1679
+ def cross_entropy_loss(logits, labels):
1680
+ vocab_size = logits.shape[-1]
1681
+ # optax onehot always returns a float32 device array, need to downcast if performing mixed precision training
1682
+ onehot_targets = to_dtype(onehot(labels, vocab_size))
1683
+ loss = optax.softmax_cross_entropy(logits, onehot_targets)
1684
+ # ignore padded tokens in the loss, i.e. positions where the label is set to -100
1685
+ padding = labels >= 0
1686
+ loss = loss * padding
1687
+ loss = loss.sum()
1688
+ num_labels = padding.sum()
1689
+ return loss, num_labels
1690
+
1691
+ # temperature smoothed kl-divergence
1692
+ def kl_divergence(target_distribution, log_predicted_distribution, labels, eps=1e-20):
1693
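+ # KL(p || q) = sum_i p_i * (log p_i - log q_i), with eps guarding against log(0) for zero-probability targets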
+ divergence = -target_distribution * (log_predicted_distribution - jnp.log(target_distribution + eps))
1694
+ # ignore padded tokens in the divergence, i.e. positions where the label is set to -100
1695
+ padding_mask = labels >= 0
1696
+ padding_mask = jnp.expand_dims(padding_mask, axis=-1)
1697
+ divergence = (divergence * padding_mask).sum()
1698
+ return to_dtype(divergence) # respect the dtype of the backprop
1699
+
1700
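+ # layer-wise feature-matching loss: MSE between the embedding outputs and the mapped
+ # student/teacher hidden states of the encoder and decoder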
+ def mean_square_error_loss(student_outputs, teacher_outputs):
1701
+ mse = dtype(0.0)
1702
+
1703
+ # tie encoder embeddings
1704
+ mse += jnp.mean(
1705
+ jnp.square(teacher_outputs.encoder_hidden_states[0] - student_outputs.encoder_hidden_states[0])
1706
+ )
1707
+
1708
+ for student_layer_id, teacher_layer_id in encoder_layer_mapping.items():
1709
+ # offset the hidden-state layer ids by 1 to account for the extra embedding hidden-state
1710
+ student_hidden_state = student_outputs.encoder_hidden_states[student_layer_id + 1]
1711
+ teacher_hidden_state = teacher_outputs.encoder_hidden_states[teacher_layer_id + 1]
1712
+ mse += jnp.mean(jnp.square(teacher_hidden_state - student_hidden_state))
1713
+
1714
+ # student_attention = student_outputs.encoder_attentions[student_layer_id]
1715
+ # teacher_attention = teacher_outputs.encoder_attentions[teacher_layer_id]
1716
+ # mse += jnp.mean(jnp.square(student_attention - teacher_attention))
1717
+
1718
+ # tie decoder embeddings
1719
+ mse += jnp.mean(
1720
+ jnp.square(teacher_outputs.decoder_hidden_states[0] - student_outputs.decoder_hidden_states[0])
1721
+ )
1722
+
1723
+ for student_layer_id, teacher_layer_id in decoder_layer_mapping.items():
1724
+ # offset the hidden-state layer ids by 1 to account for the extra embedding hidden-state
1725
+ student_hidden_state = student_outputs.decoder_hidden_states[student_layer_id + 1]
1726
+ teacher_hidden_state = teacher_outputs.decoder_hidden_states[teacher_layer_id + 1]
1727
+ mse += jnp.mean(jnp.square(teacher_hidden_state - student_hidden_state))
1728
+
1729
+ # student_attention = student_outputs.decoder_attentions[student_layer_id]
1730
+ # teacher_attention = teacher_outputs.decoder_attentions[teacher_layer_id]
1731
+ # mse += jnp.mean(jnp.square(student_attention - teacher_attention))
1732
+
1733
+ # student_cross_attention = student_outputs.cross_attentions[student_layer_id]
1734
+ # teacher_cross_attention = teacher_outputs.cross_attentions[teacher_layer_id]
1735
+ # mse += jnp.mean(jnp.square(student_cross_attention - teacher_cross_attention))
1736
+
1737
+ return to_dtype(mse) # respect the dtype of the backprop
1738
+
1739
+ # Define gradient update step fn
1740
+ def train_step(
1741
+ student_state,
1742
+ teacher_params,
1743
+ batch,
1744
+ freeze_encoder,
1745
+ share_hidden_states,
1746
+ temperature=2.0,
1747
+ ):
1748
+ dropout_rng, new_dropout_rng = jax.random.split(student_state.dropout_rng)
1749
+
1750
+ def compute_loss(student_params):
1751
+ labels = batch.pop("labels")
1752
+ output_hidden_states = not share_hidden_states and training_args.mse_weight > 0.0
1753
+
1754
+ teacher_outputs = teacher_model(
1755
+ **batch,
1756
+ params=teacher_params,
1757
+ freeze_encoder=True,
1758
+ output_hidden_states=output_hidden_states,
1759
+ train=False,
1760
+ )
1761
+
1762
+ if share_hidden_states:
1763
+ # if the student and teacher share the same frozen encoder then we don't have to recompute the
1764
+ # encoder hidden-states for the student model, we can just re-use from the teacher
1765
+ encoder_hidden_states = jax.lax.stop_gradient(teacher_outputs.encoder_last_hidden_state)
1766
+ encoder_outputs = FlaxBaseModelOutput(last_hidden_state=encoder_hidden_states)
1767
+
1768
+ student_outputs = student_state.apply_fn(
1769
+ decoder_input_ids=batch["decoder_input_ids"],
1770
+ encoder_outputs=encoder_outputs,
1771
+ params=student_params,
1772
+ dropout_rng=dropout_rng,
1773
+ train=True,
1774
+ )
1775
+ else:
1776
+ # do the full forward pass for the student model (encoder + decoder)
1777
+ student_outputs = student_state.apply_fn(
1778
+ **batch,
1779
+ params=student_params,
1780
+ dropout_rng=dropout_rng,
1781
+ freeze_encoder=freeze_encoder,
1782
+ output_hidden_states=output_hidden_states,
1783
+ train=True,
1784
+ )
1785
+
1786
+ # CE (data) loss
1787
+ ce_loss, num_labels = cross_entropy_loss(student_outputs.logits, labels)
1788
+
1789
+ # rescale by temperature to ensure gradients scale correctly
1790
+ teacher_distribution = jax.nn.softmax(teacher_outputs.logits / temperature, axis=-1)
1791
+ # ensure no information flow backwards through teacher
1792
+ teacher_distribution = jax.lax.stop_gradient(teacher_distribution)
1793
+ # log softmax of student predictions for numerical stability
1794
+ student_distribution = jax.nn.log_softmax(student_outputs.logits / temperature, axis=-1)
1795
+ # KL-divergence loss (scaled by temperature)
1796
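+ # the temperature**2 factor keeps the KL gradients on the same scale as the CE gradients, since
+ # softening the logits by 1/T scales the soft-target gradients by 1/T**2 (Hinton et al., 2015)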
+ kl_loss = kl_divergence(teacher_distribution, student_distribution, labels) * temperature**2
1797
+
1798
+ # MSE loss between enc-dec hidden-states and attentions
1799
+ mse_loss = (
1800
+ mean_square_error_loss(student_outputs, teacher_outputs)
1801
+ if output_hidden_states
1802
+ else jnp.zeros_like(kl_loss)
1803
+ )
1804
+
1805
+ # use DistilBart formulation - only tune the MSE weight and take remaining HPs from DistilBERT
1806
+ ce_weight = 0.8 if training_args.kl_weight > 0 else 1.0
1807
+ loss = ce_weight * ce_loss + training_args.kl_weight * kl_loss + training_args.mse_weight * mse_loss
1808
+
1809
+ return loss, (
1810
+ ce_loss,
1811
+ kl_loss,
1812
+ mse_loss,
1813
+ num_labels,
1814
+ )
1815
+
1816
+ grad_fn = jax.value_and_grad(compute_loss, has_aux=True)
1817
+ (loss, (ce_loss, kl_loss, mse_loss, num_labels)), grad = grad_fn(to_dtype(student_state.params))
1818
+
1819
+ # true loss = total loss / total samples
1820
+ loss = jax.lax.psum(loss, "batch")
1821
+ num_labels = jax.lax.psum(num_labels, "batch")
1822
+ loss = jax.tree_util.tree_map(lambda x: x / num_labels, loss)
1823
+
1824
+ # true grad = total grad / total samples
1825
+ grad = jax.lax.psum(grad, "batch")
1826
+ grad = jax.tree_util.tree_map(lambda x: x / num_labels, grad)
1827
+ new_state = student_state.apply_gradients(grads=grad, dropout_rng=new_dropout_rng, to_dtype=to_dtype)
1828
+
1829
+ # CE/KL/MSE losses for logging
1830
+ ce_loss = jax.lax.psum(ce_loss, "batch")
1831
+ ce_loss = jax.tree_util.tree_map(lambda x: x / num_labels, ce_loss)
1832
+
1833
+ kl_loss = jax.lax.psum(kl_loss, "batch")
1834
+ kl_loss = jax.tree_util.tree_map(lambda x: x / num_labels, kl_loss)
1835
+
1836
+ mse_loss = jax.lax.psum(mse_loss, "batch")
1837
+ mse_loss = jax.tree_util.tree_map(lambda x: x / num_labels, mse_loss)
1838
+
1839
+ metrics = {
1840
+ "loss": loss,
1841
+ "learning_rate": linear_decay_lr_schedule_fn(student_state.step),
1842
+ "ce_loss": ce_loss,
1843
+ "kl_loss": kl_loss,
1844
+ "mse_loss": mse_loss,
1845
+ }
1846
+ return new_state, metrics
1847
+
1848
+ # Define eval fn
1849
+ def eval_step(student_params, teacher_params, batch):
1850
+ labels = batch.pop("labels")
1851
+ output_hidden_states = not share_hidden_states and training_args.mse_weight > 0
1852
+
1853
+ student_outputs = student_model(
1854
+ **batch,
1855
+ params=student_params,
1856
+ output_hidden_states=output_hidden_states,
1857
+ train=False,
1858
+ )
1859
+ student_distribution = jax.nn.log_softmax(student_outputs.logits, axis=-1)
1860
+ ce_loss, num_labels = cross_entropy_loss(student_outputs.logits, labels)
1861
+
1862
+ teacher_outputs = teacher_model(
1863
+ **batch,
1864
+ params=teacher_params,
1865
+ output_hidden_states=output_hidden_states,
1866
+ train=False,
1867
+ )
1868
+ teacher_distribution = jax.nn.softmax(teacher_outputs.logits, axis=-1)
1869
+ # temperature is always 1 for eval
1870
+ kl_loss = kl_divergence(teacher_distribution, student_distribution, labels)
1871
+
1872
+ mse_loss = (
1873
+ mean_square_error_loss(student_outputs, teacher_outputs)
1874
+ if output_hidden_states
1875
+ else jnp.zeros_like(kl_loss)
1876
+ )
1877
+
1878
+ ce_weight = 0.8 if training_args.kl_weight > 0 else 1.0
1879
+ loss = ce_weight * ce_loss + training_args.kl_weight * kl_loss + training_args.mse_weight * mse_loss
1880
+ # true loss = total loss / total samples
1881
+ loss = jax.lax.psum(loss, "batch")
1882
+ num_labels = jax.lax.psum(num_labels, "batch")
1883
+ loss = jax.tree_util.tree_map(lambda x: x / num_labels, loss)
1884
+
1885
+ # CE/KL/MSE losses for logging
1886
+ ce_loss = jax.lax.psum(ce_loss, "batch")
1887
+ ce_loss = jax.tree_util.tree_map(lambda x: x / num_labels, ce_loss)
1888
+
1889
+ kl_loss = jax.lax.psum(kl_loss, "batch")
1890
+ kl_loss = jax.tree_util.tree_map(lambda x: x / num_labels, kl_loss)
1891
+
1892
+ mse_loss = jax.lax.psum(mse_loss, "batch")
1893
+ mse_loss = jax.tree_util.tree_map(lambda x: x / num_labels, mse_loss)
1894
+
1895
+ metrics = {"loss": loss, "ce_loss": ce_loss, "kl_loss": kl_loss, "mse_loss": mse_loss}
1896
+ return metrics
1897
+
1898
+ # Define generation function
1899
+ num_beams = (
1900
+ training_args.generation_num_beams
1901
+ if training_args.generation_num_beams is not None
1902
+ else student_model.config.num_beams
1903
+ )
1904
+
1905
+ # forcing the language and task tokens helps the model in its generations
1906
+ gen_kwargs = {
1907
+ "max_length": max_label_length,
1908
+ "num_beams": num_beams,
1909
+ "language": "<|en|>",
1910
+ "task": "transcribe",
1911
+ "return_timestamps": return_timestamps,
1912
+ }
1913
+
1914
+ def generate_step(student_params, batch):
1915
+ output_ids = student_model.generate(
1916
+ batch[model_input_name],
1917
+ attention_mask=batch.get("attention_mask"),
1918
+ params=student_params,
1919
+ **gen_kwargs,
1920
+ )
1921
+ return output_ids.sequences
1922
+
1923
+ # Replicate the train state on each device
1924
+ student_state = student_state.replicate()
1925
+
1926
+ # Replicate the teacher params on each device
1927
+ teacher_params = jax_utils.replicate(teacher_params)
1928
+
1929
+ # Create parallel version of the train and eval step
1930
+ p_train_step = jax.pmap(
1931
+ train_step,
1932
+ "batch",
1933
+ in_axes=(0, 0, 0, None, None, None),
1934
+ donate_argnums=(0,),
1935
+ static_broadcasted_argnums=(
1936
+ 3,
1937
+ 4,
1938
+ ),
1939
+ )
1940
+ p_eval_step = jax.pmap(eval_step, "batch")
1941
+ p_generate_step = jax.pmap(generate_step, "batch")
1942
+
1943
+ logger.info("***** Running training *****")
1944
+ logger.info(f" Num examples = {total_train_steps * train_batch_size * gradient_accumulation_steps}")
1945
+ logger.info(" Instantaneous batch size per device =" f" {training_args.per_device_train_batch_size}")
1946
+ logger.info(" Gradient accumulation steps =" f" {gradient_accumulation_steps}")
1947
+ logger.info(
1948
+ f" Total train batch size (w. parallel & distributed) = {train_batch_size * gradient_accumulation_steps}"
1949
+ )
1950
+ logger.info(f" Total optimization steps = {total_train_steps}")
1951
+
1952
+ # ======================== Training ================================
1953
+ train_time = 0
1954
+ train_start = time.time()
1955
+ train_metrics = []
1956
+ batches_to_skip = jax.device_get(unreplicate(student_state.step))
1957
+ cur_step = int(batches_to_skip) # will be zero if starting from scratch
1958
+ epochs_trained = batches_to_skip // steps_per_epoch
1959
+ steps_trained_progress_bar = tqdm(range(total_train_steps), desc="Train steps ... ", position=0)
1960
+ steps_trained_progress_bar.update(batches_to_skip)
1961
+ continue_training = True
1962
+ minibatch_steps = 0
1963
+ print("Debug 8")
1964
+ if batches_to_skip > 0:
1965
+ logger.info(" Continuing training from checkpoint, will skip to saved global_step")
1966
+ logger.info(f" Continuing training from epoch {epochs_trained}")
1967
+ logger.info(f" Continuing training from global step {batches_to_skip}")
1968
+ print("debug 9")
1969
+ # Generate a training data loader by shuffling sampling indices from the train dataset
1970
+ train_loader = get_data_loader(
1971
+ training_args.seed,
1972
+ vectorized_datasets["train"],
1973
+ batch_size=train_batch_size,
1974
+ data_collator=data_collator,
1975
+ dataloader_num_workers=dataloader_num_workers,
1976
+ skip_batches=batches_to_skip,
1977
+ prefetch_size=dataloader_prefetch_size,
1978
+ )
1979
+ print("debug 10")
1980
+
1981
+ for epoch in range(epochs_trained, num_epochs):
1982
+ print("Debug 11")
1983
+ if hasattr(train_loader, "dataset") and isinstance(train_loader.dataset, IterableDataset):
1984
+ print("Debug 11B")
1985
+ train_loader.dataset.set_epoch(epoch)
1986
+ breakpoint()
1987
+ print("debug 12")
1988
+ for batch in train_loader:
1989
+ print("debug 13")
1990
+ minibatch_steps += 1
1991
+ update_step = minibatch_steps == gradient_accumulation_steps
1992
+
1993
+ if update_step:
1994
+ steps_trained_progress_bar.update(1)
1995
+ cur_step += 1
1996
+ minibatch_steps = 0
1997
+ print("debug 14")
1998
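+ # reshape the batch to (num_devices, per_device_batch_size, ...) so it can be consumed by pmap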
+ batch = shard(batch.data)
1999
+ student_state, train_metric = p_train_step(
2000
+ student_state,
2001
+ teacher_params,
2002
+ batch,
2003
+ training_args.freeze_encoder,
2004
+ share_hidden_states,
2005
+ training_args.temperature,
2006
+ )
2007
+ print("debug 15")
2008
+ if cur_step % training_args.logging_steps == 0 and update_step:
2009
+ train_metrics.append(train_metric)
2010
+ train_metric_to_write = unreplicate(train_metric)
2011
+ steps_trained_progress_bar.write(
2012
+ f"Step... ({cur_step} / {total_train_steps} | Loss:"
2013
+ f" {train_metric_to_write['loss']}, Learning Rate:"
2014
+ f" {train_metric_to_write['learning_rate']})"
2015
+ )
2016
+ print("debug 16")
2017
+ if has_wandb and jax.process_index() == 0:
2018
+ write_wandb_metric(
2019
+ wandb_logger,
2020
+ train_metric_to_write,
2021
+ train_time + time.time() - train_start,
2022
+ cur_step,
2023
+ epoch,
2024
+ prefix="train",
2025
+ )
2026
+ print("debug 17")
2027
+ # save checkpoint and weights after each save_steps and at the end of training
2028
+ if (cur_step % training_args.save_steps == 0 and update_step) or cur_step == total_train_steps:
2029
+ if jax.process_index() == 0:
2030
+ save_hf_weights(
2031
+ student_state,
2032
+ student_model,
2033
+ processor,
2034
+ training_args.output_dir,
2035
+ cur_step,
2036
+ total_train_steps,
2037
+ use_scan=training_args.use_scan,
2038
+ )
2039
+ if training_args.save_train_state:
2040
+ student_state.save_state(
2041
+ training_args.output_dir, save_total_limit=training_args.save_total_limit
2042
+ )
2043
+ if training_args.push_to_hub:
2044
+ repo.push_to_hub(
2045
+ commit_message=f"Saving train state of step {cur_step}",
2046
+ blocking=False,
2047
+ )
2048
+
2049
+ if training_args.do_eval and (
2050
+ (cur_step % eval_steps == 0 and update_step) or cur_step == total_train_steps
2051
+ ):
2052
+ train_time += time.time() - train_start
2053
+ # ======================== Evaluating ==============================
2054
+ for eval_split in all_eval_splits:
2055
+ eval_metrics = []
2056
+ eval_preds = []
2057
+ eval_labels = []
2058
+ eval_start = time.time()
2059
+
2060
+ eval_loader = get_data_loader(
2061
+ training_args.seed,
2062
+ vectorized_datasets[eval_split],
2063
+ batch_size=eval_batch_size,
2064
+ data_collator=data_collator,
2065
+ shuffle=False,
2066
+ drop_last=False,
2067
+ dataloader_num_workers=dataloader_num_workers,
2068
+ )
2069
+ for batch in tqdm(eval_loader, desc=f"Evaluating {eval_split}...", position=2):
2070
+ # Model forward
2071
+ labels = batch["labels"]
2072
+
2073
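+ # pad the batch so it splits evenly across devices, shard it, and run the pmapped eval step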
+ metrics = pad_shard_unpad(
2074
+ p_eval_step,
2075
+ static_argnums=(
2076
+ 0,
2077
+ 1,
2078
+ ),
2079
+ static_return=True,
2080
+ )(
2081
+ student_state.params,
2082
+ teacher_params,
2083
+ batch.data,
2084
+ min_device_batch=per_device_eval_batch_size,
2085
+ )
2086
+ eval_metrics.append(metrics)
2087
+
2088
+ # generation
2089
+ if training_args.predict_with_generate:
2090
+ generated_ids = pad_shard_unpad(p_generate_step)(
2091
+ student_state.params, batch.data, min_device_batch=per_device_eval_batch_size
2092
+ )
2093
+ eval_preds.extend(jax.device_get(generated_ids.reshape(-1, gen_kwargs["max_length"])))
2094
+ eval_labels.extend(labels)
2095
+
2096
+ eval_time = time.time() - eval_start
2097
+
2098
+ # normalize eval metrics
2099
+ eval_metrics = get_metrics(eval_metrics)
2100
+ eval_metrics = jax.tree_util.tree_map(jnp.mean, eval_metrics)
2101
+
2102
+ # compute WER metric
2103
+ wer_desc = ""
2104
+ if training_args.predict_with_generate:
2105
+ wer_metric, pred_str, label_str, norm_pred_str, norm_label_str = compute_metrics(
2106
+ eval_preds, eval_labels
2107
+ )
2108
+ eval_metrics.update(wer_metric)
2109
+ wer_desc = " ".join([f"Eval {key}: {value} |" for key, value in wer_metric.items()])
2110
+
2111
+ # Print metrics and update progress bar
2112
+ steps_trained_progress_bar.write(
2113
+ f"Eval results for step ({cur_step} / {total_train_steps} | Eval Loss: {eval_metrics['loss']} |"
2114
+ f" {wer_desc})"
2115
+ )
2116
+
2117
+ if has_tensorboard and jax.process_index() == 0:
2118
+ write_eval_metric(
2119
+ summary_writer,
2120
+ eval_metrics,
2121
+ cur_step,
2122
+ prefix=eval_split,
2123
+ )
2124
+
2125
+ if has_wandb and jax.process_index() == 0:
2126
+ write_wandb_metric(wandb_logger, eval_metrics, eval_time, cur_step, epoch, prefix=eval_split)
2127
+ if training_args.predict_with_generate:
2128
+ write_wandb_pred(
2129
+ wandb_logger,
2130
+ pred_str,
2131
+ label_str,
2132
+ norm_pred_str,
2133
+ norm_label_str,
2134
+ cur_step,
2135
+ prefix=eval_split,
2136
+ )
2137
+
2138
+ if has_tensorboard and jax.process_index() == 0:
2139
+ # we'll only log to tensorboard every eval steps
2140
+ write_train_metric(
2141
+ summary_writer,
2142
+ train_metrics,
2143
+ train_time,
2144
+ cur_step,
2145
+ training_args.logging_steps,
2146
+ )
2147
+
2148
+ # flush the train metrics
2149
+ train_start = time.time()
2150
+ train_metrics = []
2151
+
2152
+ # break condition
2153
+ if cur_step == total_train_steps:
2154
+ continue_training = False
2155
+ break
2156
+
2157
+ if not continue_training:
2158
+ break
2159
+
2160
+
2161
+ if __name__ == "__main__":
2162
+ main()
run_distillation_nodes.py ADDED
@@ -0,0 +1,2168 @@
1
+ #!/usr/bin/env python
2
+ # coding=utf-8
3
+ # Copyright 2023 The HuggingFace Inc. team. All rights reserved.
4
+ #
5
+ # Licensed under the Apache License, Version 2.0 (the "License");
6
+ # you may not use this file except in compliance with the License.
7
+ # You may obtain a copy of the License at
8
+ #
9
+ # http://www.apache.org/licenses/LICENSE-2.0
10
+ #
11
+ # Unless required by applicable law or agreed to in writing, software
12
+ # distributed under the License is distributed on an "AS IS" BASIS,
13
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
14
+ # See the License for the specific language governing permissions and
15
+ # limitations under the License.
16
+ """
17
+ Training the Whisper model for sequence to sequence speech recognition via teacher-student distillation.
18
+ """
19
+ # You can also adapt this script for your own distillation tasks. Pointers for this are left as comments.
20
+
21
+ import logging
22
+ import os
23
+ import re
24
+ import shutil
25
+ import string
26
+ import sys
27
+ import time
28
+ from dataclasses import dataclass, field
29
+ from functools import partial
30
+ from pathlib import Path
31
+ from typing import Any, Callable, Dict, List, Optional, Union
32
+
33
+ import datasets
34
+ import evaluate
35
+ import flax
36
+ import jax
37
+ import jax.numpy as jnp
38
+ import numpy as np
39
+ import optax
40
+ import torch
41
+ import transformers
42
+ from datasets import (
43
+ DatasetDict,
44
+ IterableDataset,
45
+ IterableDatasetDict,
46
+ concatenate_datasets,
47
+ interleave_datasets,
48
+ load_dataset,
49
+ )
50
+ from datasets.distributed import split_dataset_by_node
51
+ from flax import jax_utils, traverse_util
52
+ from flax.jax_utils import pad_shard_unpad, unreplicate
53
+ from flax.serialization import from_bytes, to_bytes
54
+ from flax.training import train_state
55
+ from flax.training.common_utils import get_metrics, onehot, shard, shard_prng_key
56
+ from huggingface_hub import Repository, create_repo
57
+ from jax.experimental.compilation_cache import compilation_cache as cc
58
+ from optax._src import linear_algebra
59
+ from torch.utils.data import DataLoader
60
+ from torchdata.datapipes.iter import IterableWrapper
61
+ from tqdm import tqdm
62
+ from transformers import (
63
+ AddedToken,
64
+ HfArgumentParser,
65
+ Seq2SeqTrainingArguments,
66
+ WhisperConfig,
67
+ WhisperFeatureExtractor,
68
+ WhisperProcessor,
69
+ WhisperTokenizerFast,
70
+ is_tensorboard_available,
71
+ is_wandb_available,
72
+ set_seed,
73
+ )
74
+ from transformers.file_utils import get_full_repo_name
75
+ from transformers.modeling_flax_outputs import FlaxBaseModelOutput
76
+ from transformers.models.whisper.english_normalizer import BasicTextNormalizer, EnglishTextNormalizer
77
+ from transformers.utils import check_min_version, send_example_telemetry
78
+ from transformers.utils.versions import require_version
79
+
80
+ from distil_whisper import FlaxWhisperForConditionalGeneration
81
+
82
+
83
+ # Will error if the minimal version of Transformers is not installed. Remove at your own risks.
84
+ check_min_version("4.27.0.dev0")
85
+
86
+ require_version(
87
+ "datasets>=1.18.0",
88
+ "To fix: pip install -r examples/flax/speech-recogintion/requirements.txt",
89
+ )
90
+
91
+ logger = logging.getLogger(__name__)
92
+
93
+
94
+ @flax.struct.dataclass
95
+ class ModelArguments:
96
+ """
97
+ Arguments pertaining to which model/config/tokenizer we are going to fine-tune from.
98
+ """
99
+
100
+ model_name_or_path: str = field(
101
+ metadata={"help": ("Path to pretrained student model or model identifier from huggingface.co/models")}
102
+ )
103
+ teacher_model_name_or_path: str = field(
104
+ metadata={"help": ("Path to pretrained teacher model or model identifier from huggingface.co/models")}
105
+ )
106
+ config_name: Optional[str] = field(
107
+ default=None,
108
+ metadata={"help": "Pretrained config name or path if not the same as model_name"},
109
+ )
110
+ tokenizer_name: Optional[str] = field(
111
+ default=None,
112
+ metadata={"help": "Pretrained tokenizer name or path if not the same as model_name"},
113
+ )
114
+ feature_extractor_name: Optional[str] = field(
115
+ default=None,
116
+ metadata={"help": "feature extractor name or path if not the same as model_name"},
117
+ )
118
+ cache_dir: Optional[str] = field(
119
+ default=None,
120
+ metadata={"help": ("Where to store the pretrained models downloaded from huggingface.co")},
121
+ )
122
+ use_fast_tokenizer: bool = field(
123
+ default=True,
124
+ metadata={"help": ("Whether to use one of the fast tokenizer (backed by the tokenizers library) or not.")},
125
+ )
126
+ model_revision: str = field(
127
+ default="main",
128
+ metadata={"help": ("The specific model version to use (can be a branch name, tag name or commit id).")},
129
+ )
130
+ subfolder: str = field(
131
+ default="",
132
+ metadata={
133
+ "help": "In case the relevant files are located inside a subfolder of the model repo on huggingface.co, you can"
134
+ "specify the folder name here."
135
+ },
136
+ )
137
+ use_auth_token: bool = field(
138
+ default=False,
139
+ metadata={
140
+ "help": (
141
+ "Will use the token generated when running `transformers-cli login`"
142
+ " (necessary to use this script with private models)."
143
+ )
144
+ },
145
+ )
146
+ dtype: Optional[str] = field(
147
+ default="float32",
148
+ metadata={
149
+ "help": (
150
+ "Floating-point format in which the model weights should be initialized"
151
+ " and trained. Choose one of `[float32, float16, bfloat16]`."
152
+ )
153
+ },
154
+ )
155
+ load_with_scan_weights: bool = field(
156
+ default=False,
157
+ metadata={
158
+ "help": "Whether the pre-trained checkpoint has its weights stored in scan format. Set to True for scanned "
159
+ "weights, defaults to False for non-scan (unrolled) weights."
160
+ },
161
+ )
162
+ activation_dropout: float = field(
163
+ default=0.0,
164
+ metadata={"help": "The dropout ratio for activations inside the fully connected layer."},
165
+ )
166
+ attention_dropout: float = field(
167
+ default=0.0,
168
+ metadata={"help": "The dropout ratio for the attention probabilities."},
169
+ )
170
+ dropout: float = field(
171
+ default=0.0,
172
+ metadata={
173
+ "help": "The dropout probability for all fully connected layers in the embeddings, encoder, and pooler."
174
+ },
175
+ )
176
+
177
+
178
+ @flax.struct.dataclass
179
+ class DataTrainingArguments:
180
+ """
181
+ Arguments pertaining to what data we are going to input our model for training and eval.
182
+ """
183
+
184
+ train_dataset_name: str = field(
185
+ default=None,
186
+ metadata={
187
+ "help": "The name of the training dataset to use (via the datasets library). Load and combine "
188
+ "multiple datasets by separating dataset ids by a '+' symbol. For example, to load and combine "
189
+ " librispeech and common voice, set `train_dataset_name='librispeech_asr+common_voice'`."
190
+ },
191
+ )
192
+ train_dataset_config_name: Optional[str] = field(
193
+ default=None,
194
+ metadata={
195
+ "help": "The configuration name of the training dataset to use (via the datasets library). Load and combine "
196
+ "multiple datasets by separating dataset configs by a '+' symbol."
197
+ },
198
+ )
199
+ train_dataset_samples: str = field(
200
+ default=None,
201
+ metadata={
202
+ "help": "Number of samples in the training data. Load and combine "
203
+ "multiple datasets by separating dataset samples by a '+' symbol."
204
+ },
205
+ )
206
+ eval_dataset_name: str = field(
207
+ default=None,
208
+ metadata={
209
+ "help": "The name of the evaluation dataset to use (via the datasets library). Defaults to the training dataset name if unspecified."
210
+ },
211
+ )
212
+ eval_dataset_config_name: Optional[str] = field(
213
+ default=None,
214
+ metadata={
215
+ "help": "The configuration name of the evaluation dataset to use (via the datasets library). Defaults to the training dataset config name if unspecified"
216
+ },
217
+ )
218
+ dataset_cache_dir: Optional[str] = field(
219
+ default=None,
220
+ metadata={"help": "Path to cache directory for saving and loading datasets"},
221
+ )
222
+ overwrite_cache: bool = field(
223
+ default=False,
224
+ metadata={"help": "Overwrite the cached training and evaluation sets"},
225
+ )
226
+ preprocessing_num_workers: Optional[int] = field(
227
+ default=None,
228
+ metadata={"help": "The number of processes to use for the preprocessing."},
229
+ )
230
+ max_train_samples: Optional[int] = field(
231
+ default=None,
232
+ metadata={
233
+ "help": (
234
+ "For debugging purposes or quicker training, truncate the number of"
235
+ " training examples to this value if set."
236
+ )
237
+ },
238
+ )
239
+ max_eval_samples: Optional[int] = field(
240
+ default=None,
241
+ metadata={
242
+ "help": (
243
+ "For debugging purposes or quicker training, truncate the number of"
244
+ " evaluation examples to this value if set."
245
+ )
246
+ },
247
+ )
248
+ audio_column_name: str = field(
249
+ default="audio",
250
+ metadata={"help": ("The name of the dataset column containing the audio data. Defaults to 'audio'")},
251
+ )
252
+ train_text_column_name: str = field(
253
+ default="whisper_transcript",
254
+ metadata={
255
+ "help": (
256
+ "The name of the dataset column containing the text data. Defaults to"
257
+ " 'whisper_transcript'which is the pseudo-labelled Whisper"
258
+ " transcription data."
259
+ )
260
+ },
261
+ )
262
+ eval_text_column_name: str = field(
263
+ default="text",
264
+ metadata={
265
+ "help": (
266
+ "The name of the dataset column containing the text data. Defaults to"
267
+ " 'text', which is the original text data"
268
+ )
269
+ },
270
+ )
271
+ max_duration_in_seconds: float = field(
272
+ default=30.0,
273
+ metadata={"help": ("Filter audio files that are longer than `max_duration_in_seconds` seconds")},
274
+ )
275
+ min_duration_in_seconds: float = field(
276
+ default=0.0,
277
+ metadata={"help": ("Filter audio files that are shorter than `min_duration_in_seconds` seconds")},
278
+ )
279
+ max_label_length: int = field(
280
+ default=128,
281
+ metadata={"help": "Truncate transcriptions that are longer `max_label_length` tokens."},
282
+ )
283
+ pad_target_to_multiple_of: Optional[int] = field(
284
+ default=None,
285
+ metadata={
286
+ "help": (
287
+ "If set will pad the target sequence to a multiple of the provided"
288
+ " value. This is important to avoid triggering recompilations on TPU."
289
+ " If unspecified, will default to padding the targets to max length."
290
+ )
291
+ },
292
+ )
293
+ preprocessing_only: bool = field(
294
+ default=False,
295
+ metadata={
296
+ "help": (
297
+ "Whether to only do data preprocessing and skip training. This is"
298
+ " especially useful when data preprocessing errors out in distributed"
299
+ " training due to timeout. In this case, one should run the"
300
+ " preprocessing in a non-distributed setup with"
301
+ " `preprocessing_only=True` so that the cached datasets can"
302
+ " consequently be loaded in distributed training"
303
+ )
304
+ },
305
+ )
306
+ train_split_name: str = field(
307
+ default="train",
308
+ metadata={
309
+ "help": ("The name of the training data set split to use (via the datasets library). Defaults to 'train'")
310
+ },
311
+ )
312
+ eval_split_name: str = field(
313
+ default="validation",
314
+ metadata={
315
+ "help": (
316
+ "The name of the evaluation data set split to use (via the datasets"
317
+ " library). Defaults to 'validation'"
318
+ )
319
+ },
320
+ )
321
+ wandb_project: str = field(
322
+ default="distil-whisper",
323
+ metadata={"help": "The name of the wandb project."},
324
+ )
325
+ wandb_name: str = field(
326
+ default=None,
327
+ metadata={"help": "The name of the wandb run."},
328
+ )
329
+ wandb_job_type: str = field(
330
+ default="distil-whisper",
331
+ metadata={"help": "The name of the wandb job type."},
332
+ )
333
+ wandb_dir: str = field(
334
+ default=None,
335
+ metadata={"help": "The absolute path to save the wandb logs."},
336
+ )
337
+ save_code_to_wandb: bool = field(
338
+ default=False,
339
+ metadata={
340
+ "help": (
341
+ "Whether to save main script to wandb. This is valuable for improving"
342
+ " experiment reproducibility and to diff code across experiments in"
343
+ " the UI."
344
+ )
345
+ },
346
+ )
347
+ streaming: bool = field(
348
+ default=True,
349
+ metadata={"help": "Whether to use Datasets' streaming mode to load and the data."},
350
+ )
351
+ wer_threshold: float = field(
352
+ default=None,
353
+ metadata={
354
+ "help": "Filter training data with Whisper transcriptions that have greater than `wer_threshold` "
355
+ "WER with the normalised transcriptions."
356
+ },
357
+ )
358
+ prefetch_size: int = field(
359
+ default=0,
360
+ metadata={"help": "Number of samples to pre-fetch if using an iterable dataset."},
361
+ )
362
+ timestamp_probability: float = field(
363
+ default=0.5, metadata={"help": "Probability for training on timestamped tokens if the data contains it."}
364
+ )
365
+ return_timestamps: bool = field(
366
+ default=False, metadata={"help": "Whether or not to predict timestamps in the generation step."}
367
+ )
368
+ round_timestamps: bool = field(
369
+ default=False,
370
+ metadata={
371
+ "help": "Whether or not to round the timestamp tokens to the nearest tenth of a second."
372
+ "By default, Whisper predicts timestamps to the nearest hundredth of a second."
373
+ "Reducing the timestamp precision to one tenth of a second simplifies the timestamp"
374
+ "prediction task, at the expense of timestamp granularity."
375
+ },
376
+ )
377
+
378
+
379
+ @dataclass
380
+ class FlaxSeq2SeqTrainingArguments(Seq2SeqTrainingArguments):
381
+ use_scan: Optional[bool] = field(
382
+ default=True,
383
+ metadata={
384
+ "help": (
385
+ "Whether or not to use `scan_with_axes` over the encoder and decoder blocks. Using scan results "
386
+ "in faster compile times and more efficient memory use during training, since all of the layers "
387
+ "in the encoder/decoder are stacked, and we perform a lax.scan over the stacked block to index "
388
+ "each layer. However, it results in slower inference time due to the overhead of stacking the "
389
+ "layers this way. Thus, we **always** default to disabling scan for the inference step."
390
+ )
391
+ },
392
+ )
393
+ freeze_encoder: Optional[bool] = field(
394
+ default=False,
395
+ metadata={
396
+ "help": (
397
+ "Whether to freeze the entire encoder model. Only recommended when the entire encoder has been "
398
+ "copied from the teacher model."
399
+ )
400
+ },
401
+ )
402
+ temperature: Optional[float] = field(
403
+ default=2.0, metadata={"help": "Temperature to anneal the logits when computing the softmax."}
404
+ )
405
+ kl_weight: Optional[float] = field(
406
+ default=1.0,
407
+ metadata={
408
+ "help": (
409
+ "Weighting assigned to the MSE loss in the KD formulation. MSE loss is "
410
+ "computed between the teacher-student hidden states and attentions."
411
+ )
412
+ },
413
+ )
414
+ mse_weight: Optional[float] = field(
415
+ default=0.0,
416
+ metadata={
417
+ "help": (
418
+ "Weighting assigned to the MSE loss in the KD formulation. MSE loss is "
419
+ "computed between the teacher-student hidden states and attentions."
420
+ )
421
+ },
422
+ )
423
+ precision: Optional[str] = field(
424
+ default="half_mixed",
425
+ metadata={
426
+ "help": (
427
+ "Precision with which run training, Can be one of `full`, `half_mixed` or `full_mixed`, the latter two"
428
+ "of which enable *mixed-precision* training. **Note that this only specifies the dtype of the computation "
429
+ "and optimizer state. It does not influence the dtype of model parameters.** An explanation of the three "
430
+ "settings is provided below:"
431
+ " 1. Full precision: forward pass, backward pass and optimiser states all in float32."
432
+ " 2. Half mixed precision: forward pass in bfloat16, backward pass and optimiser states in float32. This "
433
+ " corresponds to setting the dtype argument to bfloat16 when instantiating the model."
434
+ " 3. Full mixed precision: forward pass, backward pass and optimiser states all in bfloat16. The dtype "
435
+ " argument is set to bfloat16 for the forward pass, and the gradients computed with respect to the bfloat16 "
436
+ " parameters in the backward pass (giving bfloat16 gradients). The new optimiser states and parameter "
437
+ " updates are computed in float32 by upcasting the bfloat16 gradients and optimiser states to float32 "
438
+ " prior to the optimiser update step. The optimiser states are returned in float32 (but not saved to "
439
+ " memory) and then downcasted to bfloat16 (saved to memory) for the subsequent train step."
440
+ "For further details, refer to https://github.com/deepmind/optax/discussions/336"
441
+ )
442
+ },
443
+ )
444
+ compilation_cache: Optional[bool] = field(
445
+ default=False,
446
+ metadata={
447
+ "help": (
448
+ "Whether to enable the JAX (experimental) compilation cache. The compilation step is *cached* the "
449
+ "first time it is run. Successive compilation steps for the same function utilise the cache to reduce"
450
+ "the compilation time."
451
+ )
452
+ },
453
+ )
454
+ save_train_state: Optional[bool] = field(
455
+ default=False,
456
+ metadata={
457
+ "help": "Whether or not to save the Flax Train State on each `save_steps` steps. Required if you intend"
458
+ "to resume training from partial training runs. If False, only the model weights will be saved."
459
+ "If True, both the model weights and Flax Train state will be saved."
460
+ },
461
+ )
462
+
463
+
464
+ def shift_tokens_right(label_ids: np.ndarray, decoder_start_token_id: int) -> np.ndarray:
465
+ """
466
+ Shift label ids one token to the right.
467
+ """
468
+ shifted_label_ids = np.zeros_like(label_ids)
469
+ shifted_label_ids[:, 1:] = label_ids[:, :-1]
470
+ shifted_label_ids[:, 0] = decoder_start_token_id
471
+
472
+ return shifted_label_ids
473
+
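+ # Worked example (illustrative, not from the original script): with
+ # decoder_start_token_id = 50258, shift_tokens_right(np.array([[5, 6, 7]]), 50258)
+ # returns [[50258, 5, 6]]: the labels are shifted one position to the right and the
+ # decoder start token is prepended, giving the decoder inputs for teacher forcing.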
474
+
475
+ @flax.struct.dataclass
476
+ class FlaxDataCollatorSpeechSeq2SeqWithPadding:
477
+ """
478
+ Data collator that will dynamically pad the inputs received.
479
+ Args:
480
+ processor ([`WhisperProcessor`])
481
+ The processor used for processing the data.
482
+ decoder_start_token_id (:obj: `int`)
483
+ The start-of-sequence token id of the decoder.
484
+ decoder_prev_token_id (:obj: `int`)
485
+ The start-of-prompt token id of the decoder
486
+ input_padding (:obj:`bool`, :obj:`str` or :class:`~transformers.tokenization_utils_base.PaddingStrategy`, `optional`, defaults to :obj:`True`):
487
+ Select a strategy to pad the returned input sequences (according to the model's padding side and padding index)
488
+ among:
489
+ * :obj:`True` or :obj:`'longest'`: Pad to the longest sequence in the batch (or no padding if only a single
490
+ sequence is provided).
491
+ * :obj:`'max_length'`: Pad to a maximum length specified with the argument :obj:`max_length` or to the
492
+ maximum acceptable input length for the model if that argument is not provided.
493
+ * :obj:`False` or :obj:`'do_not_pad'` (default): No padding (i.e., can output a batch with sequences of
494
+ different lengths).
495
+ target_padding (:obj:`bool`, :obj:`str` or :class:`~transformers.tokenization_utils_base.PaddingStrategy`, `optional`, defaults to :obj:`True`):
496
+ Select a strategy to pad the returned target sequences (according to the model's padding side and padding index).
497
+ See above for details.
498
+ max_target_length (:obj:`int`, `optional`):
499
+ Maximum length of the ``labels`` of the returned list and optionally padding length (see above).
500
+ """
501
+
502
+ processor: Any
503
+ decoder_start_token_id: int
504
+ decoder_prev_token_id: int
505
+ input_padding: Union[bool, str] = "max_length"
506
+ target_padding: Union[bool, str] = "max_length"
507
+ max_target_length: Optional[int] = None
508
+
509
+ def __call__(self, features: List[Dict[str, Union[List[int], np.ndarray]]]) -> Dict[str, np.ndarray]:
510
+ # split inputs and labels since they have to be of different lengths and need
511
+ # different padding methods
512
+ model_input_name = self.processor.model_input_names[0]
513
+
514
+ # dataloader returns a list of features which we convert to a dict
515
+ input_features = {model_input_name: [feature[model_input_name] for feature in features]}
516
+ label_features = {"input_ids": [feature["labels"] for feature in features]}
517
+
518
+ # reformat list to dict and set to numpy format
519
+ batch = self.processor.feature_extractor.pad(
520
+ input_features,
521
+ padding=self.input_padding,
522
+ return_tensors="np",
523
+ )
524
+
525
+ labels_batch = self.processor.tokenizer.pad(
526
+ label_features,
527
+ max_length=self.max_target_length,
528
+ padding=self.target_padding,
529
+ return_tensors="np",
530
+ )
531
+
532
+ # if bos token is appended in previous tokenization step,
533
+ # cut bos token here as it's appended later anyway
534
+ labels = labels_batch["input_ids"]
535
+ if set(np.unique(labels[:, 0])).issubset({self.decoder_start_token_id, self.decoder_prev_token_id}):
536
+ decoder_input_ids = labels[:, :-1]
537
+ labels = labels[:, 1:]
538
+ labels_batch.attention_mask = labels_batch.attention_mask[:, 1:]
539
+ else:
540
+ decoder_input_ids = shift_tokens_right(labels, self.decoder_start_token_id)
541
+
542
+ # replace padding with -100 to ignore correctly when computing the loss
543
+ labels = np.ma.array(labels, mask=np.not_equal(labels_batch.attention_mask, 1))
544
+ labels = labels.filled(fill_value=-100)
545
+
546
+ # replace initial prompt tokens with -100 to ignore correctly when computing the loss
547
+ bos_index = np.argmax(labels == self.decoder_start_token_id, axis=1)
548
+ prompt_mask = np.arange(labels.shape[1]) < bos_index[:, None]
549
+ labels = np.where(prompt_mask, -100, labels)
550
+
551
+ batch["labels"] = labels
552
+ batch["decoder_input_ids"] = decoder_input_ids
553
+
554
+ return batch
555
+
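+ # Usage sketch (hedged example; the processor/tokenizer handles and token lookups are
+ # assumptions, the actual script derives the prev-token id differently):
+ #     data_collator = FlaxDataCollatorSpeechSeq2SeqWithPadding(
+ #         processor=processor,
+ #         decoder_start_token_id=tokenizer.convert_tokens_to_ids("<|startoftranscript|>"),
+ #         decoder_prev_token_id=tokenizer.convert_tokens_to_ids("<|startofprev|>"),
+ #         max_target_length=128,
+ #     )
+ # Calling it on a list of {"input_features": ..., "labels": ...} dicts returns a padded
+ # numpy batch whose labels are set to -100 at padding and prompt positions.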
556
+
557
+ def get_data_loader(
558
+ seed: int,
559
+ dataset: IterableDataset,
560
+ batch_size: int,
561
+ data_collator: FlaxDataCollatorSpeechSeq2SeqWithPadding,
562
+ shuffle: bool = False,
563
+ drop_last: bool = True,
564
+ dataloader_num_workers: int = 0,
565
+ skip_batches: int = 0,
566
+ pin_memory: bool = True,
567
+ prefetch_size: int = 0,
568
+ ) -> DataLoader:
569
+ """
570
+ Returns batches of size `batch_size` from `dataset`. If `drop_last` is set to `False`, the final batch may be incomplete,
571
+ and may range in size from 1 to `batch_size`. Batches are reshuffled if `shuffle` is `True`.
572
+
573
+ Args:
574
+ seed (int): Numpy seed for generating pseudo random numbers. Used if shuffling the dataset.
575
+ dataset (IterableDataset): streaming dataset from which to load the data.
576
+ batch_size (int): how many samples per batch to load.
577
+ data_collator (FlaxDataCollatorSpeechSeq2SeqWithPadding, optional): merges a list of samples to form a
578
+ mini-batch of Tensor(s). Used when using batched loading from a map-style dataset.
579
+ shuffle (bool, optional): set to `True` to have the batches reshuffled.
580
+ drop_last (bool, optional): set to ``True`` to drop the last incomplete batch,
581
+ if the dataset size is not divisible by the batch size. If ``False`` and
582
+ the size of dataset is not divisible by the batch size, then the last batch
583
+ will be smaller. (default: ``True``)
584
+ dataloader_num_workers (int, optional): how many subprocesses to use for data
585
+ loading. ``0`` means that the data will be loaded in the main process.
586
+ (default: ``0``)
587
+ skip_batches (int, optional): Efficiently skip the first `skip_batches` batches.
588
+ pin_memory (bool, optional): If ``True``, the data loader will copy Tensors
589
+ into device/CUDA pinned memory before returning them. If your data elements
590
+ are a custom type, or your :attr:`collate_fn` returns a batch that is a custom type,
591
+ refer to the PyTorch ``DataLoader`` documentation.
592
+
593
+ """
594
+ if shuffle:
595
+ dataset = dataset.shuffle(seed)
596
+
597
+ if skip_batches > 0:
598
+ dataset = dataset.skip(skip_batches * batch_size)
599
+
600
+ if prefetch_size > 0:
601
+ dataset = IterableWrapper(dataset)
602
+ dataset = dataset.prefetch(prefetch_size)
603
+
604
+ num_of_hosts = jax.process_count()
605
+ dataset = split_dataset_by_node(dataset, rank=jax.process_index(), world_size=num_of_hosts)
606
+
607
+ assert batch_size % num_of_hosts == 0, "Batch size must be divisible by the number of hosts."
608
+ if dataset.n_shards < dataloader_num_workers:
609
+ dataloader_num_workers = dataset.n_shards
610
+
611
+ data_loader = DataLoader(
612
+ dataset,
613
+ batch_size=batch_size // num_of_hosts,
614
+ drop_last=drop_last,
615
+ pin_memory=pin_memory,
616
+ collate_fn=data_collator,
617
+ num_workers=dataloader_num_workers,
618
+ )
619
+
620
+ return data_loader
621
+
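+ # Illustrative usage (assumed values): with 2 hosts and a global batch size of 64,
+ # each host iterates over its shard of the streaming dataset with a per-host batch
+ # size of 64 // 2 = 32, e.g.
+ #     train_loader = get_data_loader(seed=42, dataset=vectorized_datasets["train"],
+ #                                    batch_size=64, data_collator=data_collator,
+ #                                    shuffle=True, dataloader_num_workers=4)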
622
+
623
+ def sorted_checkpoints(output_dir=None, checkpoint_prefix="checkpoint", use_mtime=False) -> List[str]:
624
+ ordering_and_checkpoint_path = []
625
+
626
+ glob_checkpoints = [str(x) for x in Path(output_dir).glob(f"{checkpoint_prefix}-*") if os.path.isdir(x)]
627
+
628
+ for path in glob_checkpoints:
629
+ if use_mtime:
630
+ ordering_and_checkpoint_path.append((os.path.getmtime(path), path))
631
+ else:
632
+ regex_match = re.match(f".*{checkpoint_prefix}-([0-9]+)", path)
633
+ if regex_match is not None and regex_match.groups() is not None:
634
+ ordering_and_checkpoint_path.append((int(regex_match.groups()[0]), path))
635
+
636
+ checkpoints_sorted = sorted(ordering_and_checkpoint_path)
637
+ checkpoints_sorted = [checkpoint[1] for checkpoint in checkpoints_sorted]
638
+ return checkpoints_sorted
639
+
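+ # Example (illustrative): for an output_dir containing checkpoint-500, checkpoint-1000
+ # and checkpoint-2000, this returns the paths ordered by step number, so
+ # rotate_checkpoints(save_total_limit=2, output_dir=output_dir) below deletes
+ # checkpoint-500 first and keeps the two most recent checkpoints.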
640
+
641
+ def rotate_checkpoints(
642
+ save_total_limit=None, use_mtime=False, output_dir=None, checkpoint_prefix="checkpoint"
643
+ ) -> None:
644
+ if save_total_limit is None or save_total_limit <= 0:
645
+ return
646
+
647
+ # Check if we should delete older checkpoint(s)
648
+ checkpoints_sorted = sorted_checkpoints(
649
+ use_mtime=use_mtime, output_dir=output_dir, checkpoint_prefix=checkpoint_prefix
650
+ )
651
+ if len(checkpoints_sorted) <= save_total_limit:
652
+ return
653
+
654
+ number_of_checkpoints_to_delete = max(0, len(checkpoints_sorted) - save_total_limit)
655
+ checkpoints_to_be_deleted = checkpoints_sorted[:number_of_checkpoints_to_delete]
656
+ for checkpoint in checkpoints_to_be_deleted:
657
+ logger.info(f"Deleting older checkpoint [{checkpoint}] due to args.save_total_limit")
658
+ shutil.rmtree(checkpoint, ignore_errors=True)
659
+
660
+
661
+ def to_fp32(t):
662
+ return jax.tree_map(lambda x: x.astype(jnp.float32) if x.dtype == jnp.bfloat16 else x, t)
663
+
664
+
665
+ def to_bf16(t):
666
+ return jax.tree_map(lambda x: x.astype(jnp.bfloat16) if x.dtype == jnp.float32 else x, t)
667
+
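+ # These helpers cast every leaf of a parameter / optimizer-state pytree. Sketch with
+ # an assumed single-leaf tree:
+ #     params = {"w": jnp.ones((2, 2), dtype=jnp.bfloat16)}
+ #     to_fp32(params)["w"].dtype            # float32
+ #     to_bf16(to_fp32(params))["w"].dtype   # bfloat16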
668
+
669
+ class TrainState(train_state.TrainState):
670
+ dropout_rng: jnp.ndarray
671
+ max_grad_norm: float
672
+
673
+ def apply_gradients(self, *, grads, to_dtype: to_fp32, **kwargs):
674
+ """Updates `step`, `params`, `opt_state` and `**kwargs` in return value, clipping the
675
+ gradients by the maximum grad norm.
676
+
677
+ Note that internally this function calls `.tx.update()` followed by a call
678
+ to `optax.apply_updates()` to update `params` and `opt_state`.
679
+
680
+ Args:
681
+ grads: Gradients that have the same pytree structure as `.params`.
682
+ **kwargs: Additional dataclass attributes that should be `.replace()`-ed.
683
+
684
+ Returns:
685
+ An updated instance of `self` with `step` incremented by one, `params`
686
+ and `opt_state` updated by applying `grads`, and additional attributes
687
+ replaced as specified by `kwargs`.
688
+ """
689
+ # clip gradients by global l2 norm
690
+ casted_max_grad_norm = to_dtype(self.max_grad_norm)
691
+ g_norm = linear_algebra.global_norm(grads)
692
+ g_norm = jnp.maximum(casted_max_grad_norm, g_norm)
693
+ grads = jax.tree_map(lambda t: (t / g_norm) * casted_max_grad_norm, grads)
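+ # i.e. each leaf is scaled by max_grad_norm / max(max_grad_norm, ||grads||_2): updates
+ # are unchanged when the global norm is below the threshold and rescaled to have norm
+ # max_grad_norm otherwise (sketch: ||grads|| = 10, max_grad_norm = 1 -> grads / 10).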
694
+
695
+ # perform update step in fp32 and subsequently downcast optimizer states if mixed precision training
696
+ # grads and opt_state in bf16 (need to upcast), params in fp32 (leave as is)
697
+ updates, new_opt_state = self.tx.update(to_fp32(grads), to_fp32(self.opt_state), self.params)
698
+
699
+ new_params = optax.apply_updates(self.params, updates)
700
+
701
+ return self.replace(
702
+ step=self.step + 1,
703
+ params=new_params,
704
+ opt_state=to_dtype(new_opt_state),
705
+ **kwargs,
706
+ )
707
+
708
+ @classmethod
709
+ def create(cls, *, apply_fn, params, tx, to_dtype: to_fp32, **kwargs):
710
+ """Creates a new instance with `step=0` and initialized `opt_state`."""
711
+ # downcast optimizer state to bf16 if mixed-precision training
712
+ opt_state = tx.init(to_dtype(params))
713
+ return cls(
714
+ step=0,
715
+ apply_fn=apply_fn,
716
+ params=params,
717
+ tx=tx,
718
+ opt_state=opt_state,
719
+ **kwargs,
720
+ )
721
+
722
+ def replicate(self):
723
+ return jax_utils.replicate(self).replace(dropout_rng=shard_prng_key(self.dropout_rng))
724
+
725
+ def unreplicate(self):
726
+ return jax_utils.unreplicate(self)
727
+
728
+ def save_state(self, output_dir, save_total_limit=None, checkpoint_prefix="checkpoint"):
729
+ step = int(jax.device_get(unreplicate(self.step)))
730
+ serialized_state = to_bytes(self.unreplicate())
731
+
732
+ output_file = Path(os.path.join(output_dir, f"{checkpoint_prefix}-{step}", "train_state.msgpack"))
733
+ output_file.parent.mkdir(exist_ok=True, parents=True)
734
+
735
+ with output_file.open("wb") as f:
736
+ f.write(serialized_state)
737
+
738
+ logger.info(f"Flax train state saved in {output_file}")
739
+ rotate_checkpoints(
740
+ save_total_limit=save_total_limit, output_dir=output_dir, checkpoint_prefix=checkpoint_prefix
741
+ )
742
+
743
+
744
+ def save_hf_weights(
745
+ student_state: TrainState,
746
+ student_model: FlaxWhisperForConditionalGeneration,
747
+ processor: WhisperProcessor,
748
+ output_dir: str,
749
+ cur_step: int,
750
+ total_train_steps: int,
751
+ use_scan: bool = True,
752
+ checkpoint_prefix: str = "checkpoint",
753
+ ) -> None:
754
+ # always disable scan in the params / model so that we can load from PyTorch directly - this is a no-op if we're not using scan for training
755
+ student_state_params = unreplicate(student_state.params)
756
+ student_state_params = student_model.convert_scan_to_unroll(student_state_params)
757
+ student_params = jax.device_get(student_state_params)
758
+ student_model.disable_scan()
759
+
760
+ if cur_step != total_train_steps:
761
+ output_dir = os.path.join(output_dir, f"{checkpoint_prefix}-{cur_step}")
762
+ os.makedirs(output_dir, exist_ok=True)
763
+
764
+ student_model.save_pretrained(output_dir, params=student_params)
765
+ processor.save_pretrained(output_dir)
766
+
767
+ # re-enable scan only if required for training
768
+ if use_scan:
769
+ student_model.enable_scan()
770
+
771
+
772
+ def write_train_metric(summary_writer, train_metrics, train_time, step, logging_steps):
773
+ summary_writer.scalar("train/time", train_time, step)
774
+ # Check if train_metrics is empty
775
+ if not train_metrics:
776
+ print("DEBUG: train_metrics is empty; This is probably a bug that needs fixing.")
777
+ return # Early exit if train_metrics is empty to avoid further processing
778
+
779
+ train_metrics = get_metrics(train_metrics)
780
+ for key, vals in train_metrics.items():
781
+ steps_arr = np.arange(0, step, logging_steps)[-len(vals) :]
782
+ tag = f"train/{key}"
783
+ for i, val in enumerate(vals):
784
+ summary_writer.scalar(tag, val, steps_arr[i])
785
+
786
+
787
+ def write_eval_metric(summary_writer, eval_metrics, step, prefix="eval"):
788
+ for metric_name, value in eval_metrics.items():
789
+ summary_writer.scalar(f"{prefix}/{metric_name}", value, step)
790
+
791
+
792
+ def write_wandb_metric(wandb_logger, metrics, train_time, step, epoch, prefix="train"):
793
+ log_metrics = {}
794
+ for k, v in metrics.items():
795
+ log_metrics[f"{prefix}/{k}"] = v
796
+ log_metrics[f"{prefix}/time"] = train_time
797
+ log_metrics[f"{prefix}/epoch"] = epoch
798
+ wandb_logger.log(log_metrics, step)
799
+
800
+
801
+ def write_wandb_pred(
802
+ wandb_logger, pred_str, label_str, norm_pred_str, norm_label_str, cur_step, prefix="eval", num_lines=200000
803
+ ):
804
+ # pretty name for current step: step 50000 -> step 50k
805
+ cur_step_pretty = f"{int(cur_step // 1000)}k" if cur_step > 1000 else cur_step
806
+ # convert str data to a wandb compatible format
807
+ str_data = [[label_str[i], pred_str[i], norm_label_str[i], norm_pred_str[i]] for i in range(len(pred_str))]
808
+ # log as a table with the appropriate headers
809
+ wandb_logger.log(
810
+ {
811
+ f"predictions/{prefix.replace('/', '-')}-step-{cur_step_pretty}": wandb_logger.Table(
812
+ columns=["Target", "Pred", "Norm Target", "Norm Pred"], data=str_data[:num_lines]
813
+ )
814
+ },
815
+ cur_step,
816
+ )
817
+ # log incorrect normalised predictions
818
+ str_data = np.asarray(str_data)
819
+ str_data_incorrect = str_data[str_data[:, -2] != str_data[:, -1]]
820
+ # log as a table with the appropriate headers
821
+ wandb_logger.log(
822
+ {
823
+ f"incorrect_predictions/{prefix.replace('/', '-')}-step-{cur_step_pretty}": wandb_logger.Table(
824
+ columns=["Target", "Pred", "Norm Target", "Norm Pred"], data=str_data_incorrect[:num_lines]
825
+ )
826
+ },
827
+ cur_step,
828
+ )
829
+
830
+
831
+ def create_learning_rate_fn(
832
+ num_train_steps: int, lr_scheduler_type: str, num_warmup_steps: int, learning_rate: float
833
+ ) -> Callable[[int], jnp.array]:
834
+ """Returns a linear warmup, linear_decay learning rate function."""
835
+ lr_scheduler_types = ("linear", "constant_with_warmup")
836
+
837
+ if lr_scheduler_type not in lr_scheduler_types:
838
+ raise ValueError(
839
+ f"lr_scheduler_type of type {lr_scheduler_type} not supported, choose from {lr_scheduler_types}."
840
+ )
841
+
842
+ warmup_fn = optax.linear_schedule(init_value=0.0, end_value=learning_rate, transition_steps=num_warmup_steps)
843
+ decay_fn = optax.linear_schedule(
844
+ init_value=learning_rate,
845
+ end_value=0 if lr_scheduler_type == "linear" else learning_rate,
846
+ transition_steps=num_train_steps - num_warmup_steps,
847
+ )
848
+ schedule_fn = optax.join_schedules(schedules=[warmup_fn, decay_fn], boundaries=[num_warmup_steps])
849
+ return schedule_fn
850
+
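+ # Worked example (illustrative values): create_learning_rate_fn(10000, "linear", 500, 1e-4)
+ # warms up linearly from 0 to 1e-4 over the first 500 steps, then decays linearly back
+ # to 0 over the remaining 9500 steps; with "constant_with_warmup" the rate instead stays
+ # at 1e-4 after warmup.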
851
+
852
+ def convert_dataset_str_to_list(
853
+ dataset_names,
854
+ dataset_config_names,
855
+ splits=None,
856
+ text_column_names=None,
857
+ dataset_samples=None,
858
+ default_split="train",
859
+ ):
860
+ if isinstance(dataset_names, str):
861
+ dataset_names = dataset_names.split("+")
862
+
863
+ # we assume that all the datasets we're using derive from the distil-whisper org on the Hub - prepend the org name if necessary
864
+ for i in range(len(dataset_names)):
865
+ ds_name = dataset_names[i]
866
+ dataset_names[i] = f"distil-whisper/{ds_name}" if "/" not in ds_name else ds_name
867
+
868
+ dataset_config_names = dataset_config_names.split("+")
869
+ splits = splits.split("+") if splits is not None else None
870
+ text_column_names = text_column_names.split("+") if text_column_names is not None else None
871
+ dataset_samples = dataset_samples.split("+") if dataset_samples is not None else None
872
+
873
+ # basic checks to ensure we've got the right number of datasets/configs/splits/columns/probs
874
+ if len(dataset_names) != len(dataset_config_names):
875
+ raise ValueError(
876
+ f"Ensure one config is passed for each dataset, got {len(dataset_names)} datasets and"
877
+ f" {len(dataset_config_names)} configs."
878
+ )
879
+
880
+ if splits is not None and len(splits) != len(dataset_names):
881
+ raise ValueError(
882
+ f"Ensure one split is passed for each dataset, got {len(dataset_names)} datasets and {len(splits)} splits."
883
+ )
884
+
885
+ if text_column_names is not None and len(text_column_names) != len(dataset_names):
886
+ raise ValueError(
887
+ f"Ensure one text column name is passed for each dataset, got {len(dataset_names)} datasets and"
888
+ f" {len(text_column_names)} text column names."
889
+ )
890
+
891
+ if dataset_samples is not None:
892
+ if len(dataset_samples) != len(dataset_names):
893
+ raise ValueError(
894
+ f"Ensure one sample is passed for each dataset, got {len(dataset_names)} datasets and "
895
+ f"{len(dataset_samples)} samples."
896
+ )
897
+ dataset_samples = [float(ds_sample) for ds_sample in dataset_samples]
898
+ else:
899
+ dataset_samples = [None] * len(dataset_names)
900
+
901
+ text_column_names = (
902
+ text_column_names if text_column_names is not None else ["text" for _ in range(len(dataset_names))]
903
+ )
904
+ splits = splits if splits is not None else [default_split for _ in range(len(dataset_names))]
905
+
906
+ dataset_names_dict = []
907
+ for i, ds_name in enumerate(dataset_names):
908
+ dataset_names_dict.append(
909
+ {
910
+ "name": ds_name,
911
+ "config": dataset_config_names[i],
912
+ "split": splits[i],
913
+ "text_column_name": text_column_names[i],
914
+ "samples": dataset_samples[i],
915
+ }
916
+ )
917
+ return dataset_names_dict
918
+
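+ # Example (illustrative dataset names): convert_dataset_str_to_list(
+ #     "NbAiLab/ncc_speech+NbAiLab/nst", "no+no", splits="train+train", dataset_samples="100+200")
+ # returns one dict per dataset with matching config/split/samples entries; names without
+ # an org prefix would be prepended with "distil-whisper/", and the sample counts are later
+ # normalised into interleaving probabilities [1/3, 2/3] in load_multiple_datasets.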
919
+
920
+ def load_multiple_datasets(
921
+ dataset_names: Union[List, str],
922
+ dataset_config_names: Union[List, str],
923
+ splits: Optional[Union[List, str]] = None,
924
+ text_column_names: Optional[List] = None,
925
+ sampling_rate: Optional[int] = 16000,
926
+ stopping_strategy: Optional[str] = "first_exhausted",
927
+ dataset_samples: Optional[Union[List, np.array]] = None,
928
+ streaming: bool = True,
929
+ seed: int = None,
930
+ **kwargs,
931
+ ) -> IterableDataset:
932
+ dataset_names_dict = convert_dataset_str_to_list(
933
+ dataset_names, dataset_config_names, splits, text_column_names, dataset_samples
934
+ )
935
+
936
+ if dataset_samples is not None:
937
+ dataset_samples = [ds_dict["samples"] for ds_dict in dataset_names_dict]
938
+ probabilities = np.array(dataset_samples) / np.sum(dataset_samples)
939
+ else:
940
+ probabilities = None
941
+
942
+ if len(dataset_names_dict) == 1:
943
+ dataset_dict = dataset_names_dict[0]
944
+ # we have a single dataset so just return it as is
945
+ return load_dataset(
946
+ dataset_dict["name"],
947
+ dataset_dict["config"],
948
+ split=dataset_dict["split"],
949
+ streaming=streaming,
950
+ **kwargs,
951
+ )
952
+
953
+ all_datasets = []
954
+ # iterate over the datasets we want to interleave
955
+ for dataset_dict in tqdm(dataset_names_dict, desc="Combining datasets..."):
956
+ dataset = load_dataset(
957
+ dataset_dict["name"],
958
+ dataset_dict["config"],
959
+ split=dataset_dict["split"],
960
+ streaming=streaming,
961
+ **kwargs,
962
+ )
963
+ # resample to specified sampling rate
964
+ dataset = dataset.cast_column("audio", datasets.features.Audio(sampling_rate))
965
+ dataset = dataset.remove_columns(
966
+ set(dataset.features.keys()) - {"audio", dataset_dict["text_column_name"], "whisper_transcript"}
967
+ )
968
+ all_datasets.append(dataset)
969
+
970
+ if streaming:
971
+ interleaved_dataset = interleave_datasets(
972
+ all_datasets,
973
+ stopping_strategy=stopping_strategy,
974
+ probabilities=probabilities,
975
+ seed=seed,
976
+ )
977
+ else:
978
+ interleaved_dataset = concatenate_datasets(all_datasets)
979
+
980
+ return interleaved_dataset
981
+
982
+
983
+ def get_layers_to_supervise(student_layers: int, teacher_layers: int) -> dict:
984
+ """Helper function to map the student layer i to the teacher layer j whose output we'd like them to emulate. Used
985
+ for MSE loss terms in distillation (hidden-states and activations). Student layers are paired with teacher layers
986
+ in equal increments, e.g. for a 12-layer model distilled to a 3-layer model, student layer 0 emulates teacher layer
987
+ 3 (such that it behaves like the first 4 teacher layers), student layer 1 emulates teacher layer 7, and student layer
988
+ 2 emulates teacher layer 11. This mapping is summarised by the dictionary: {0: 3, 1: 7, 2: 11}, which is precisely
989
+ the output of this function for the arguments (student_layers=3, teacher_layers=12)."""
990
+ layer_intervals = np.linspace(teacher_layers // student_layers - 1, teacher_layers - 1, student_layers, dtype=int)
991
+ layer_intervals[-1] = teacher_layers - 1
992
+ layer_map = {}
993
+
994
+ for student_layer, teacher_layer in enumerate(layer_intervals):
995
+ layer_map[student_layer] = teacher_layer
996
+
997
+ return layer_map
998
+
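+ # Example (illustrative): get_layers_to_supervise(student_layers=2, teacher_layers=32)
+ # returns {0: 15, 1: 31}, i.e. the first student layer is supervised by the output of
+ # teacher layer 15 and the second by the final teacher layer.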
999
+
1000
+ class FlaxWhisperFeatureExtractor(WhisperFeatureExtractor):
1001
+ def _np_extract_fbank_features(self, waveform: np.array) -> np.ndarray:
1002
+ """
1003
+ Compute the log-mel spectrogram of the provided audio using torch filters. The torch implementation
1004
+ computes the stft filter banks approx 5x faster than its numpy counterpart, which is the native implementation
1005
+ in transformers, and matches it to within 1e-5 abs tolerance.
1006
+ """
1007
+ waveform = torch.from_numpy(waveform).type(torch.float32)
1008
+
1009
+ window = torch.hann_window(self.n_fft)
1010
+ stft = torch.stft(waveform, self.n_fft, self.hop_length, window=window, return_complex=True)
1011
+ magnitudes = stft[..., :-1].abs() ** 2
1012
+
1013
+ mel_filters = torch.from_numpy(self.mel_filters).type(torch.float32)
1014
+ mel_spec = mel_filters.T @ magnitudes
1015
+
1016
+ log_spec = torch.clamp(mel_spec, min=1e-10).log10()
1017
+ log_spec = torch.maximum(log_spec, log_spec.max() - 8.0)
1018
+ log_spec = (log_spec + 4.0) / 4.0
1019
+ return log_spec.numpy()
1020
+
1021
+
1022
+ def main():
1023
+ # 1. Parse input arguments
1024
+ # See all possible arguments in src/transformers/training_args.py
1025
+ # or by passing the --help flag to this script.
1026
+ # We now keep distinct sets of args, for a cleaner separation of concerns.
1027
+ parser = HfArgumentParser((ModelArguments, DataTrainingArguments, FlaxSeq2SeqTrainingArguments))
1028
+
1029
+ if len(sys.argv) == 2 and sys.argv[1].endswith(".json"):
1030
+ # If we pass only one argument to the script and it's the path to a json file,
1031
+ # let's parse it to get our arguments.
1032
+ model_args, data_args, training_args = parser.parse_json_file(json_file=os.path.abspath(sys.argv[1]))
1033
+ else:
1034
+ model_args, data_args, training_args = parser.parse_args_into_dataclasses()
1035
+
1036
+ # Sending telemetry. Tracking the example usage helps us better allocate resources to maintain them. The
1037
+ # information sent is the one passed as arguments along with your JAX/Flax versions.
1038
+ send_example_telemetry("run_flax_speech_recognition_seq2seq", model_args, data_args, framework="flax")
1039
+
1040
+ # 2. Define remote logging - do this early so that we get the full traceback on our remote logs
1041
+ # Enable tensorboard only on the master node
1042
+ has_tensorboard = is_tensorboard_available()
1043
+ if has_tensorboard:
1044
+ if jax.process_index() == 0:
1045
+ try:
1046
+ from flax.metrics.tensorboard import SummaryWriter
1047
+
1048
+ summary_writer = SummaryWriter(log_dir=os.path.join(Path(training_args.output_dir), "runs"))
1049
+ except ImportError as ie:
1050
+ has_tensorboard = False
1051
+ logger.warning(
1052
+ "Unable to display metrics through TensorBoard because some package" f" are not installed: {ie}"
1053
+ )
1054
+ else:
1055
+ logger.warning(
1056
+ "Unable to display metrics through TensorBoard because the package is not"
1057
+ " installed: Please run `pip install tensorboard` to enable."
1058
+ )
1059
+
1060
+ # Enable wandb only on the master node
1061
+ has_wandb = is_wandb_available()
1062
+ if has_wandb:
1063
+ import wandb as wandb_logger
1064
+
1065
+ # Set up wandb run
1066
+ if jax.process_index() == 0:
1067
+ wandb_logger.init(
1068
+ project=data_args.wandb_project,
1069
+ name=data_args.wandb_name,
1070
+ job_type=data_args.wandb_job_type,
1071
+ dir=data_args.wandb_dir,
1072
+ save_code=data_args.save_code_to_wandb,
1073
+ )
1074
+ else:
1075
+ logger.warning("Wandb logging requires wandb to be installed. Run `pip install wandb` to enable.")
1076
+
1077
+ # 3. Setup local logging
1078
+ # Make one log on every process with the configuration for debugging.
1079
+ logging.basicConfig(
1080
+ format="%(asctime)s - %(levelname)s - %(name)s - %(message)s",
1081
+ datefmt="%m/%d/%Y %H:%M:%S",
1082
+ handlers=[logging.StreamHandler(sys.stdout)],
1083
+ )
1084
+ # Set the verbosity to info of the Transformers logger.
1085
+ # We only want one process per machine to log things on the screen.
1086
+ logger.setLevel(logging.INFO if jax.process_index() == 0 else logging.ERROR)
1087
+ if jax.process_index() == 0:
1088
+ datasets.utils.logging.set_verbosity_warning()
1089
+ transformers.utils.logging.set_verbosity_info()
1090
+ else:
1091
+ datasets.utils.logging.set_verbosity_error()
1092
+ transformers.utils.logging.set_verbosity_error()
1093
+
1094
+ logger.info("Training/evaluation parameters %s", training_args)
1095
+
1096
+ # Check the output dir is valid
1097
+ if (
1098
+ os.path.exists(training_args.output_dir)
1099
+ and os.listdir(training_args.output_dir)
1100
+ and training_args.do_train
1101
+ and not training_args.overwrite_output_dir
1102
+ ):
1103
+ raise ValueError(
1104
+ f"Output directory ({training_args.output_dir}) already exists and is not"
1105
+ " empty. Use `--overwrite_output_dir` to overcome."
1106
+ )
1107
+
1108
+ # 4. Handle the repository creation
1109
+ if training_args.push_to_hub:
1110
+ if training_args.hub_model_id is None:
1111
+ repo_name = get_full_repo_name(
1112
+ Path(training_args.output_dir).absolute().name,
1113
+ token=training_args.hub_token,
1114
+ )
1115
+ else:
1116
+ repo_name = training_args.hub_model_id
1117
+ create_repo(repo_name, exist_ok=True, token=training_args.hub_token)
1118
+ repo = Repository(
1119
+ training_args.output_dir,
1120
+ clone_from=repo_name,
1121
+ token=training_args.hub_token,
1122
+ )
1123
+
1124
+ if training_args.compilation_cache:
1125
+ cc.initialize_cache(os.path.join(model_args.cache_dir, "jax_cache"))
1126
+
1127
+ # 5. Load dataset
1128
+ raw_datasets = IterableDatasetDict() if data_args.streaming else DatasetDict()
1129
+
1130
+ # set seed for determinism
1131
+ set_seed(training_args.seed)
1132
+
1133
+ if training_args.do_train:
1134
+ raw_datasets["train"] = load_multiple_datasets(
1135
+ data_args.train_dataset_name,
1136
+ data_args.train_dataset_config_name,
1137
+ splits=data_args.train_split_name,
1138
+ streaming=data_args.streaming,
1139
+ dataset_samples=data_args.train_dataset_samples,
1140
+ seed=training_args.seed,
1141
+ cache_dir=data_args.dataset_cache_dir,
1142
+ token=True if model_args.use_auth_token else None,
1143
+ )
1144
+
1145
+ if training_args.do_eval:
1146
+ dataset_names_dict = convert_dataset_str_to_list(
1147
+ data_args.eval_dataset_name if data_args.eval_dataset_name else data_args.train_dataset_name,
1148
+ (
1149
+ data_args.eval_dataset_config_name
1150
+ if data_args.eval_dataset_config_name
1151
+ else data_args.train_dataset_config_name
1152
+ ),
1153
+ splits=data_args.eval_split_name,
1154
+ text_column_names=data_args.eval_text_column_name,
1155
+ )
1156
+ all_eval_splits = []
1157
+ if len(dataset_names_dict) == 1:
1158
+ # load a single eval set
1159
+ dataset_dict = dataset_names_dict[0]
1160
+ all_eval_splits.append("eval")
1161
+ raw_datasets["eval"] = load_dataset(
1162
+ dataset_dict["name"],
1163
+ dataset_dict["config"],
1164
+ split=dataset_dict["split"],
1165
+ cache_dir=data_args.dataset_cache_dir,
1166
+ token=True if model_args.use_auth_token else None,
1167
+ streaming=data_args.streaming,
1168
+ )
1169
+ else:
1170
+ # load multiple eval sets
1171
+ for dataset_dict in dataset_names_dict:
1172
+ if dataset_dict["name"] == "esb/diagnostic-dataset":
1173
+ # for the ESB diagnostic dataset, the dataset name is effectively the config
1174
+ pretty_name = f"{dataset_dict['config']}-diagnostic/{dataset_dict['split']}"
1175
+ else:
1176
+ pretty_name = f"{dataset_dict['name'].split('/')[-1]}/{dataset_dict['split'].replace('.', '-')}"
1177
+ all_eval_splits.append(pretty_name)
1178
+ raw_datasets[pretty_name] = load_dataset(
1179
+ dataset_dict["name"],
1180
+ dataset_dict["config"],
1181
+ split=dataset_dict["split"],
1182
+ cache_dir=data_args.dataset_cache_dir,
1183
+ token=True if model_args.use_auth_token else None,
1184
+ streaming=data_args.streaming,
1185
+ )
1186
+ features = raw_datasets[pretty_name].features.keys()
1187
+ if "text" not in features:
1188
+ raw_datasets[pretty_name] = raw_datasets[pretty_name].rename_column(
1189
+ dataset_dict["text_column_name"], "text"
1190
+ )
1191
+ raw_datasets[pretty_name] = raw_datasets[pretty_name].remove_columns(
1192
+ set(raw_datasets[pretty_name].features.keys()) - {"audio", "text"}
1193
+ )
1194
+
1195
+ if not training_args.do_train and not training_args.do_eval:
1196
+ raise ValueError(
1197
+ "Cannot not train and not do evaluation. At least one of training or evaluation has to be performed."
1198
+ )
1199
+
1200
+ raw_datasets_train_features = list(raw_datasets["train"].features.keys())
1201
+
1202
+ if data_args.audio_column_name not in raw_datasets_train_features:
1203
+ raise ValueError(
1204
+ f"--audio_column_name '{data_args.audio_column_name}' not found in dataset"
1205
+ f" '{data_args.dataset_name}'. Make sure to set `--audio_column_name` to"
1206
+ " the correct audio column - one of"
1207
+ f" {', '.join(raw_datasets_train_features)}."
1208
+ )
1209
+
1210
+ if data_args.train_text_column_name not in raw_datasets_train_features:
1211
+ raise ValueError(
1212
+ f"--train_text_column_name {data_args.train_text_column_name} not found in dataset"
1213
+ f" '{data_args.dataset_name}'. Make sure to set `--train_text_column_name` to the"
1214
+ " correct text column - one of"
1215
+ f" {', '.join(raw_datasets_train_features)}."
1216
+ )
1217
+
1218
+ # 6. Load pretrained model, tokenizer, and feature extractor
1219
+ config = WhisperConfig.from_pretrained(
1220
+ (model_args.config_name if model_args.config_name else model_args.model_name_or_path),
1221
+ cache_dir=model_args.cache_dir,
1222
+ revision=model_args.model_revision,
1223
+ token=True if model_args.use_auth_token else None,
1224
+ )
1225
+ feature_extractor = FlaxWhisperFeatureExtractor.from_pretrained(
1226
+ (model_args.feature_extractor_name if model_args.feature_extractor_name else model_args.model_name_or_path),
1227
+ cache_dir=model_args.cache_dir,
1228
+ revision=model_args.model_revision,
1229
+ token=True if model_args.use_auth_token else None,
1230
+ )
1231
+ tokenizer = WhisperTokenizerFast.from_pretrained(
1232
+ (model_args.tokenizer_name if model_args.tokenizer_name else model_args.model_name_or_path),
1233
+ cache_dir=model_args.cache_dir,
1234
+ use_fast=model_args.use_fast_tokenizer,
1235
+ revision=model_args.model_revision,
1236
+ token=True if model_args.use_auth_token else None,
1237
+ )
1238
+
1239
+ # override timestamp tokens until tokenizer issues are fixed in transformers
1240
+ timestamps = [AddedToken("<|%.2f|>" % (i * 0.02), lstrip=False, rstrip=False) for i in range(1500 + 1)]
1241
+ tokenizer.add_tokens(timestamps)
1242
+
1243
+ config.update(
1244
+ {
1245
+ "activation_dropout": model_args.activation_dropout,
1246
+ "attention_dropout": model_args.attention_dropout,
1247
+ "dropout": model_args.dropout,
1248
+ }
1249
+ )
1250
+
1251
+ if training_args.precision == "full_mixed":
1252
+ # forward pass, backward pass and optimiser states in bf16
1253
+ dtype = jnp.bfloat16
1254
+ to_dtype = to_bf16
1255
+ elif training_args.precision == "half_mixed" or model_args.dtype == "bfloat16":
1256
+ # forward pass in bf16, backward pass and optimiser states in fp32
1257
+ dtype = jnp.bfloat16
1258
+ to_dtype = to_fp32
1259
+ else:
1260
+ if training_args.precision != "full":
1261
+ raise ValueError(
1262
+ f"`precision` should be one of: `full`, `half_mixed` or `full_mixed`, got {training_args.precision}"
1263
+ )
1264
+ # forward pass, backward pass and optimiser states in fp32
1265
+ dtype = jnp.float32
1266
+ to_dtype = to_fp32
1267
+
1268
+ student_model, student_params = FlaxWhisperForConditionalGeneration.from_pretrained(
1269
+ model_args.model_name_or_path,
1270
+ config=config,
1271
+ dtype=dtype,
1272
+ cache_dir=model_args.cache_dir,
1273
+ revision=model_args.model_revision,
1274
+ subfolder=model_args.subfolder,
1275
+ token=True if model_args.use_auth_token else None,
1276
+ _do_init=False,
1277
+ use_scan=model_args.load_with_scan_weights,
1278
+ )
1279
+
1280
+ teacher_model, teacher_params = FlaxWhisperForConditionalGeneration.from_pretrained(
1281
+ model_args.teacher_model_name_or_path,
1282
+ # config=config,
1283
+ dtype=dtype,
1284
+ cache_dir=model_args.cache_dir,
1285
+ # revision=model_args.model_revision,
1286
+ token=True if model_args.use_auth_token else None,
1287
+ _do_init=False,
1288
+ )
1289
+
1290
+ if student_model.config.decoder_start_token_id is None or teacher_model.config.decoder_start_token_id is None:
1291
+ raise ValueError(
1292
+ f"Make sure that `config.decoder_start_token_id` is correctly defined for both the "
1293
+ f"student and teacher model. Got {student_model.config.decoder_start_token_id} for the "
1294
+ f"student and {teacher_model.config.decoder_start_token_id} for the teacher."
1295
+ )
1296
+
1297
+ # enable scan / gradient checkpointing if necessary
1298
+ if training_args.use_scan:
1299
+ student_model.enable_scan() # to enable scan in the nn.Module
1300
+ student_params = student_model.convert_unroll_to_scan(student_params) # to convert the unrolled params to scan
1301
+
1302
+ teacher_model.enable_scan() # faster compile time (even though we don't train the teacher)
1303
+ teacher_params = teacher_model.convert_unroll_to_scan(teacher_params)
1304
+
1305
+ if training_args.gradient_checkpointing:
1306
+ student_model.enable_gradient_checkpointing() # to enable checkpointing in the nn.Module, there is no change to the params structure
1307
+ teacher_model.enable_gradient_checkpointing()
1308
+
1309
+ if hasattr(teacher_model.generation_config, "is_multilingual") and teacher_model.generation_config.is_multilingual:
1310
+ # We need to set the language and task ids for previously multilingual checkpoints - for now we hardcode this to Norwegian
1311
+ tokenizer.set_prefix_tokens(language="Norwegian", task="transcribe", predict_timestamps=False)
1312
+ student_model.generation_config.update(
1313
+ **{
1314
+ "language": "<|no|>",
1315
+ "task": "transcribe",
1316
+ }
1317
+ )
1318
+
1319
+ # 7. Resample speech dataset: `datasets` takes care of automatically loading and resampling the audio,
1320
+ # so we just need to set the correct target sampling rate.
1321
+ raw_datasets = raw_datasets.cast_column(
1322
+ data_args.audio_column_name,
1323
+ datasets.features.Audio(sampling_rate=feature_extractor.sampling_rate),
1324
+ )
1325
+
1326
+ # 8. Preprocessing the datasets.
1327
+ # We need to read the audio files as arrays and tokenize the targets.
1328
+ max_input_length = int(data_args.max_duration_in_seconds * feature_extractor.sampling_rate)
1329
+ min_input_length = int(data_args.min_duration_in_seconds * feature_extractor.sampling_rate)
1330
+ max_label_length = (
1331
+ data_args.max_label_length if data_args.max_label_length is not None else student_model.config.max_length
1332
+ )
1333
+ audio_column_name = data_args.audio_column_name
1334
+ num_workers = data_args.preprocessing_num_workers
1335
+ dataloader_num_workers = training_args.dataloader_num_workers
1336
+ dataloader_prefetch_size = data_args.prefetch_size
1337
+ train_text_column_name = data_args.train_text_column_name
1338
+ eval_text_column_name = "text"
1339
+ model_input_name = feature_extractor.model_input_names[0]
1340
+ normalizer = BasicTextNormalizer(tokenizer.english_spelling_normalizer)
1341
+ wer_threshold = data_args.wer_threshold
1342
+ round_timestamps = data_args.round_timestamps
1343
+
1344
+ if training_args.do_train and data_args.max_train_samples is not None:
1345
+ raw_datasets["train"] = (
1346
+ raw_datasets["train"].take(data_args.max_train_samples)
1347
+ if data_args.streaming
1348
+ else raw_datasets["train"].select(range(data_args.max_train_samples))
1349
+ )
1350
+
1351
+ if training_args.do_eval and data_args.max_eval_samples is not None:
1352
+ for eval_split in all_eval_splits:
1353
+ raw_datasets[eval_split] = (
1354
+ raw_datasets[eval_split].take(data_args.max_eval_samples)
1355
+ if data_args.streaming
1356
+ else raw_datasets[eval_split].select(range(data_args.max_eval_samples))
1357
+ )
1358
+
1359
+ # 10.3: filter training data based on WER threshold -> this is KEY to good distillation performance
1360
+ def is_wer_in_range(ground_truth, whisper_transcript):
1361
+ norm_ground_truth = normalizer(ground_truth)
1362
+ if whisper_transcript is not None and whisper_transcript.upper() == whisper_transcript:
1363
+ # filter entirely upper-case transcriptions: these are erroneous generations from large-v3
1364
+ return False
1365
+ elif len(norm_ground_truth) == 0 and len(normalizer(whisper_transcript)) == 0:
1366
+ return True
1367
+ elif len(norm_ground_truth.strip()) > 0 and whisper_transcript is not None and len(normalizer(whisper_transcript).strip()) > 0:
1368
+ norm_whisper_transcript = normalizer(whisper_transcript)
1369
+ wer = 100 * metric.compute(predictions=[norm_whisper_transcript], references=[norm_ground_truth])
1370
+ return wer < wer_threshold
1371
+ else:
1372
+ # filter automatically since we can't compute the WER
1373
+ return False
1374
+
1375
+
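+ # Illustrative example (assumed values): with wer_threshold = 10, a ground truth of
+ # "hei på deg" and a pseudo-label of "hei på dere" give a normalised WER of ~33%, so the
+ # example is dropped; an exact match gives 0% and is kept. Fully upper-case pseudo-labels
+ # are always dropped as suspected large-v3 failure cases.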
1376
+ filter_by_wer_threshold = partial(
1377
+ raw_datasets["train"].filter,
1378
+ function=is_wer_in_range,
1379
+ input_columns=[eval_text_column_name, train_text_column_name],
1380
+ )
1381
+
1382
+ if wer_threshold is not None:
1383
+ raw_datasets["train"] = (
1384
+ filter_by_wer_threshold(num_proc=num_workers, desc="filtering train dataset by wer")
1385
+ if not data_args.streaming
1386
+ else filter_by_wer_threshold()
1387
+ )
1388
+
1389
+ def has_timestamp_tokens(input_str):
1390
+ """
1391
+ Identify whether the input string contains timestamp tokens, of the form <|0.00|>, by searching for
1392
+ pairs of left and right-angle brackets.
1393
+ """
1394
+ return bool(re.search(r"<[^>]*>", input_str))
1395
+
1396
+ def round_timestamp_tokens(input_str: str, ndigits: int = 1):
1397
+ timestamps = re.findall(r"<[^>]*>", input_str, re.DOTALL)
1398
+ for token in timestamps:
1399
+ # extract time digits from timestamp token, e.g. <|6.24|> to 6.24
1400
+ time_digit = token[2:-2]
1401
+ # round to specified number of digits, e.g. 6.24 to 6.2
1402
+ time_digit = round(float(time_digit), ndigits=ndigits)
1403
+ # replace in original string with the same precision, e.g. <|6.24|> to <|6.20|>
1404
+ input_str = input_str.replace(token, "<|{:.2f}|>".format(time_digit))
1405
+ return input_str
1406
+
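+ # Example (illustrative): round_timestamp_tokens("<|0.00|> hallo <|6.24|>") returns
+ # "<|0.00|> hallo <|6.20|>": timestamps are rounded to one decimal place but written back
+ # with two decimals, so the rounded token still exists in the tokenizer vocabulary.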
1407
+ def prepare_train_dataset(batch):
1408
+ # process audio input
1409
+ sample = batch[audio_column_name]
1410
+ inputs = feature_extractor(sample["array"], sampling_rate=sample["sampling_rate"])
1411
+ batch[model_input_name] = inputs.get(model_input_name)[0]
1412
+ batch["input_length"] = len(sample["array"])
1413
+
1414
+ # process text targets
1415
+ input_str = batch[train_text_column_name]
1416
+
1417
+ # prompt & timestamp processing: for now, we only do one or the other
1418
+ if input_str.startswith("<|startoftranscript|>") or input_str.startswith("<|startofprev|>"):
1419
+ # prompted target text already has special ids added, so don't add them here
1420
+ batch["labels"] = tokenizer(input_str, add_special_tokens=False).input_ids
1421
+ return batch
1422
+
1423
+ has_timestamps = has_timestamp_tokens(input_str)
1424
+
1425
+ if has_timestamps:
1426
+ predict_timestamps = bool(np.random.binomial(1, data_args.timestamp_probability))
1427
+ if not predict_timestamps:
1428
+ # filter timestamp token ids if not part of the prediction task
1429
+ input_str = tokenizer._filter_timestamp_ids(input_str)
1430
+ elif round_timestamps:
1431
+ input_str = round_timestamp_tokens(input_str)
1432
+ else:
1433
+ predict_timestamps = False
1434
+
1435
+ tokenizer.set_prefix_tokens(language="Norwegian", task="transcribe", predict_timestamps=predict_timestamps)
1436
+ input_ids = tokenizer(input_str).input_ids
1437
+ batch["labels"] = input_ids
1438
+ return batch
1439
+
1440
+ def prepare_eval_dataset(batch):
1441
+ # process audio
1442
+ sample = batch[audio_column_name]
1443
+ inputs = feature_extractor(sample["array"], sampling_rate=sample["sampling_rate"])
1444
+ # process audio length
1445
+ batch[model_input_name] = inputs.get(model_input_name)[0]
1446
+ batch["input_length"] = len(sample["array"])
1447
+
1448
+ # process targets
1449
+ input_str = batch[eval_text_column_name]
1450
+ batch["labels"] = tokenizer(input_str).input_ids
1451
+ return batch
1452
+
1453
+ vectorized_datasets = IterableDatasetDict() if data_args.streaming else DatasetDict()
1454
+ if training_args.do_train:
1455
+ map_fn_train = partial(
1456
+ raw_datasets["train"].map, function=prepare_train_dataset, remove_columns=raw_datasets_train_features
1457
+ )
1458
+ vectorized_datasets["train"] = (
1459
+ map_fn_train(num_proc=num_workers, desc="preprocess train dataset")
1460
+ if not data_args.streaming
1461
+ else map_fn_train()
1462
+ )
1463
+ if training_args.do_eval:
1464
+ for eval_split in all_eval_splits:
1465
+ raw_datasets_eval_features = list(raw_datasets[eval_split].features.keys())
1466
+ map_fn_eval = partial(
1467
+ raw_datasets[eval_split].map, function=prepare_eval_dataset, remove_columns=raw_datasets_eval_features
1468
+ )
1469
+ vectorized_datasets[eval_split] = (
1470
+ map_fn_eval(num_proc=num_workers, desc="preprocess eval dataset")
1471
+ if not data_args.streaming
1472
+ else map_fn_eval()
1473
+ )
1474
+
1475
+ # filter training data with inputs longer than max_input_length
1476
+ def is_audio_in_length_range(length):
1477
+ return min_input_length < length < max_input_length
1478
+
1479
+ filter_by_audio_fn = partial(
1480
+ vectorized_datasets.filter, function=is_audio_in_length_range, input_columns=["input_length"]
1481
+ )
1482
+ vectorized_datasets = (
1483
+ filter_by_audio_fn(num_proc=num_workers, desc="filtering train dataset by audio length")
1484
+ if not data_args.streaming
1485
+ else filter_by_audio_fn()
1486
+ )
1487
+
1488
+ # filter training data with labels longer than max_label_length
1489
+ def is_labels_in_length_range(labels):
1490
+ return 0 < len(labels) < max_label_length
1491
+
1492
+ filter_by_labels_fn = partial(
1493
+ vectorized_datasets.filter, function=is_labels_in_length_range, input_columns=["labels"]
1494
+ )
1495
+ vectorized_datasets = (
1496
+ filter_by_labels_fn(num_proc=num_workers, desc="filtering train dataset")
1497
+ if not data_args.streaming
1498
+ else filter_by_labels_fn()
1499
+ )
1500
+
1501
+ # for large datasets it is advised to run the preprocessing on a
1502
+ # single machine first with `args.preprocessing_only` since there will most likely
1503
+ # be a timeout when running the script in distributed mode.
1504
+ # In a second step `args.preprocessing_only` can then be set to `False` to load the
1505
+ # cached dataset
1506
+ if data_args.preprocessing_only:
1507
+ cache = {k: v.cache_files for k, v in vectorized_datasets.items()}
1508
+ logger.info(f"Data preprocessing finished. Files cached at {cache}.")
1509
+ return
1510
+
1511
+ # 8. Load Metric
1512
+ metric = evaluate.load("wer")
1513
+ # convention is that we space all punctuation *except* apostrophes
1514
+ all_punctuation = list(string.punctuation.replace("'", ""))
1515
+ return_timestamps = data_args.return_timestamps if data_args.timestamp_probability > 0 else False
1516
+
1517
+ def compute_metrics(preds, labels):
1518
+ # replace padded labels by the padding token
1519
+ for idx in range(len(labels)):
1520
+ labels[idx][labels[idx] == -100] = tokenizer.pad_token_id
1521
+
1522
+ pred_str = tokenizer.batch_decode(preds, skip_special_tokens=True, decode_with_timestamps=return_timestamps)
1523
+ # we do not want to group tokens when computing the metrics
1524
+ label_str = tokenizer.batch_decode(labels, skip_special_tokens=True)
1525
+
1526
+ # space punctuation for orthographic WER (c.f. ESB paper https://arxiv.org/abs/2210.13352)
1527
+ spaced_pred_str = [
1528
+ pred_str[i].replace(punctuation, f" {punctuation} ")
1529
+ for punctuation in all_punctuation
1530
+ for i in range(len(pred_str))
1531
+ ]
1532
+ spaced_label_str = [
1533
+ label_str[i].replace(punctuation, f" {punctuation} ")
1534
+ for punctuation in all_punctuation
1535
+ for i in range(len(label_str))
1536
+ ]
1537
+ wer_ortho = 100 * metric.compute(predictions=spaced_pred_str, references=spaced_label_str)
1538
+
1539
+ norm_pred_str, norm_label_str = [], []
1540
+
1541
+ # Iterate through all predictions and labels
1542
+ for pred, label in zip(pred_str, label_str):
1543
+ # Normalize the prediction and label
1544
+ normalized_pred = normalizer(pred)
1545
+ normalized_label = normalizer(label)
1546
+
1547
+ # If either normalized string is empty after normalization, replace with "<|nospeech|>"
1548
+ if not normalized_pred.strip():
1549
+ normalized_pred = "<|nospeech|>"
1550
+ if not normalized_label.strip():
1551
+ normalized_label = "<|nospeech|>"
1552
+
1553
+ norm_pred_str.append(normalized_pred)
1554
+ norm_label_str.append(normalized_label)
1555
+
1556
+ # Replace original strings with "<|nospeech|>" where necessary for consistency
1557
+ pred_str = [pred if len(pred.strip()) > 0 else "<|nospeech|>" for pred in pred_str]
1558
+ label_str = [label if len(label.strip()) > 0 else "<|nospeech|>" for label in label_str]
1559
+
1560
+ # Compute WER using all entries, including those replaced with "<|nospeech|>"
1561
+ wer = 100 * metric.compute(predictions=norm_pred_str, references=norm_label_str)
1562
+ return {"wer": wer, "wer_ortho": wer_ortho}, pred_str, label_str, norm_pred_str, norm_label_str
1563
+
1564
+
1565
+ # 9. Save feature extractor, tokenizer, config and generation config
1566
+ feature_extractor.save_pretrained(training_args.output_dir)
1567
+ tokenizer.save_pretrained(training_args.output_dir)
1568
+ config.save_pretrained(training_args.output_dir)
1569
+ student_model.generation_config.save_pretrained(
1570
+ training_args.output_dir
1571
+ ) # generation config stays bound to model to make it easy to jit
1572
+
1573
+ processor = WhisperProcessor.from_pretrained(training_args.output_dir)
1574
+
1575
+ data_collator = FlaxDataCollatorSpeechSeq2SeqWithPadding(
1576
+ processor=processor,
1577
+ decoder_start_token_id=student_model.config.decoder_start_token_id, # <|startoftranscript|>
1578
+ decoder_prev_token_id=tokenizer.all_special_ids[-3], # <|startofprev|>
1579
+ input_padding="longest",
1580
+ target_padding="max_length",
1581
+ max_target_length=max_label_length,
1582
+ )
1583
+
1584
+ # Initialize our training
1585
+ rng = jax.random.PRNGKey(training_args.seed)
1586
+ rng, dropout_rng = jax.random.split(rng)
1587
+
1588
+ # Store some constants
1589
+ train_batch_size = int(training_args.per_device_train_batch_size) * jax.device_count()
1590
+ gradient_accumulation_steps = int(training_args.gradient_accumulation_steps)
1591
+ per_device_eval_batch_size = int(training_args.per_device_eval_batch_size)
1592
+ eval_batch_size = per_device_eval_batch_size * jax.device_count()
1593
+
1594
+ if not data_args.streaming and training_args.max_steps < 0:
1595
+ num_epochs = int(training_args.num_train_epochs)
1596
+ steps_per_epoch = len(vectorized_datasets["train"]) // train_batch_size
1597
+ total_train_steps = steps_per_epoch * num_epochs
1598
+ elif training_args.max_steps > 0:
1599
+ logger.info("max_steps is given, it will override any value given in num_train_epochs")
1600
+ total_train_steps = int(training_args.max_steps)
1601
+ # Setting a very large number of epochs so we go as many times as necessary over the iterator.
1602
+ num_epochs = sys.maxsize
1603
+ steps_per_epoch = total_train_steps
1604
+ else:
1605
+ raise ValueError("max_steps must be specified when training with a streaming (iterable) dataset")
1606
+
1607
+ if training_args.eval_steps is None:
1608
+ logger.info(
1609
+ f"eval_steps is not set, evaluating at the end of {'each epoch' if not data_args.streaming else 'training'}"
1610
+ )
1611
+ eval_steps = steps_per_epoch
1612
+ else:
1613
+ eval_steps = training_args.eval_steps
1614
+
1615
+ # Create learning rate schedule
1616
+ linear_decay_lr_schedule_fn = create_learning_rate_fn(
1617
+ total_train_steps * gradient_accumulation_steps,
1618
+ training_args.lr_scheduler_type,
1619
+ training_args.warmup_steps * gradient_accumulation_steps,
1620
+ training_args.learning_rate,
1621
+ )
1622
+
1623
+ # We use Optax's "masking" functionality to not apply weight decay
1624
+ # to bias and LayerNorm scale parameters. decay_mask_fn returns a
1625
+ # mask boolean with the same structure as the parameters.
1626
+ # The mask is True for parameters that should be decayed.
1627
+ def decay_mask_fn(params):
1628
+ flat_params = traverse_util.flatten_dict(params)
1629
+ # find out all LayerNorm parameters
1630
+ layer_norm_candidates = [
1631
+ "layer_norm",
1632
+ "self_attn_layer_norm",
1633
+ "final_layer_norm",
1634
+ "encoder_attn_layer_norm",
1635
+ ]
1636
+ layer_norm_named_params = {
1637
+ layer[-2:]
1638
+ for layer_norm_name in layer_norm_candidates
1639
+ for layer in flat_params.keys()
1640
+ if layer_norm_name in "".join(layer).lower()
1641
+ }
1642
+ flat_mask = {path: path[-1] != "bias" and path[-2:] not in layer_norm_named_params for path in flat_params}
1643
+ return traverse_util.unflatten_dict(flat_mask)
1644
+
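To make the mask concrete, this is what decay_mask_fn produces on a tiny, made-up parameter tree (parameter names are hypothetical; the snippet assumes it is pasted right after the definition above):

from flax import traverse_util

toy_params = {
    "encoder": {
        "self_attn": {"kernel": 0.0, "bias": 0.0},
        "self_attn_layer_norm": {"scale": 1.0, "bias": 0.0},
    }
}
mask = traverse_util.flatten_dict(decay_mask_fn(toy_params))
# ("encoder", "self_attn", "kernel")           -> True   (weight decay applied)
# ("encoder", "self_attn", "bias")             -> False
# ("encoder", "self_attn_layer_norm", "scale") -> False
# ("encoder", "self_attn_layer_norm", "bias")  -> False
print(mask)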
1645
+ # create adam optimizer
1646
+ adamw = optax.adamw(
1647
+ learning_rate=linear_decay_lr_schedule_fn,
1648
+ b1=training_args.adam_beta1,
1649
+ b2=training_args.adam_beta2,
1650
+ eps=training_args.adam_epsilon,
1651
+ weight_decay=training_args.weight_decay,
1652
+ mask=decay_mask_fn,
1653
+ )
1654
+
1655
+ if gradient_accumulation_steps > 1:
1656
+ # accumulate gradients and apply once every k steps
1657
+ adamw = optax.MultiSteps(adamw, every_k_schedule=gradient_accumulation_steps)
1658
+
1659
+ share_hidden_states = training_args.freeze_encoder and student_model.config.d_model == teacher_model.config.d_model
1660
+ encoder_layer_mapping = get_layers_to_supervise(
1661
+ student_model.config.encoder_layers, teacher_model.config.encoder_layers
1662
+ )
1663
+ decoder_layer_mapping = get_layers_to_supervise(
1664
+ student_model.config.decoder_layers, teacher_model.config.decoder_layers
1665
+ )
1666
+
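get_layers_to_supervise is defined earlier in the script; as a rough mental model (an assumption, not the exact code), it pairs each student layer with an evenly spaced teacher layer so the mapping always ends on the teacher's final layer:

def layers_to_supervise_sketch(student_layers: int, teacher_layers: int) -> dict:
    step = teacher_layers // student_layers
    return {i: (i + 1) * step - 1 for i in range(student_layers)}

# e.g. a 2-layer student decoder distilled from a 32-layer teacher decoder
print(layers_to_supervise_sketch(2, 32))   # {0: 15, 1: 31}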
1667
+ # Setup train state
1668
+ student_state = TrainState.create(
1669
+ apply_fn=student_model.decode if share_hidden_states else student_model.__call__,
1670
+ params=student_params,
1671
+ tx=adamw,
1672
+ to_dtype=to_dtype,
1673
+ dropout_rng=dropout_rng,
1674
+ max_grad_norm=training_args.max_grad_norm,
1675
+ )
1676
+
1677
+ if training_args.resume_from_checkpoint is not None:
1678
+ if os.path.isfile(os.path.join(training_args.resume_from_checkpoint, "train_state.msgpack")):
1679
+ logger.info(
1680
+ f"Checkpoint detected, resuming training at {training_args.resume_from_checkpoint}. To avoid "
1681
+ "this behavior, omit the resume_from_checkpoint argument."
1682
+ )
1683
+ with Path(os.path.join(training_args.resume_from_checkpoint, "train_state.msgpack")).open("rb") as f:
1684
+ student_state = from_bytes(student_state, f.read())
1685
+ else:
1686
+ logger.warning(
1687
+ f"Checkpoint {training_args.resume_from_checkpoint} not detected, training from scratch. Ensure "
1688
+ f"you pass the path to a folder with a valid checkpoint for your model."
1689
+ )
1690
+
1691
+ def cross_entropy_loss(logits, labels):
1692
+ vocab_size = logits.shape[-1]
1693
+ # optax onehot always returns a float32 device array, need to downcast if performing mixed precision training
1694
+ onehot_targets = to_dtype(onehot(labels, vocab_size))
1695
+ loss = optax.softmax_cross_entropy(logits, onehot_targets)
1696
+ # mask out padded tokens from the loss, i.e. positions where the labels are set to -100
1697
+ padding = labels >= 0
1698
+ loss = loss * padding
1699
+ loss = loss.sum()
1700
+ num_labels = padding.sum()
1701
+ return loss, num_labels
1702
+
1703
+ # temperature smoothed kl-divergence
1704
+ def kl_divergence(target_distribution, log_predicted_distribution, labels, eps=1e-20):
1705
+ divergence = -target_distribution * (log_predicted_distribution - jnp.log(target_distribution + eps))
1706
+ # mask out padded tokens from the divergence, i.e. positions where the labels are set to -100
1707
+ padding_mask = labels >= 0
1708
+ padding_mask = jnp.expand_dims(padding_mask, axis=-1)
1709
+ divergence = (divergence * padding_mask).sum()
1710
+ return to_dtype(divergence) # respect the dtype of the backprop
1711
+
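A tiny numerical illustration of how this term is used in the training step below: both sets of logits are softened by the temperature, the teacher side is treated as a constant, and the result is rescaled by temperature**2 so gradient magnitudes stay comparable to the unsmoothed objective (toy logits, relying on the kl_divergence and to_dtype defined in this script):

import jax
import jax.numpy as jnp

teacher_logits = jnp.array([[2.0, 0.5, -1.0]])
student_logits = jnp.array([[1.5, 0.8, -0.5]])
labels = jnp.array([0])     # any non-negative label keeps the position in the loss
temperature = 2.0

p_teacher = jax.lax.stop_gradient(jax.nn.softmax(teacher_logits / temperature, axis=-1))
log_p_student = jax.nn.log_softmax(student_logits / temperature, axis=-1)
kd_loss = kl_divergence(p_teacher, log_p_student, labels) * temperature**2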
1712
+ def mean_square_error_loss(student_outputs, teacher_outputs):
1713
+ mse = dtype(0.0)
1714
+
1715
+ # tie encoder embeddings
1716
+ mse += jnp.mean(
1717
+ jnp.square(teacher_outputs.encoder_hidden_states[0] - student_outputs.encoder_hidden_states[0])
1718
+ )
1719
+
1720
+ for student_layer_id, teacher_layer_id in encoder_layer_mapping.items():
1721
+ # offset the hidden-state layer ids by 1 to account for the extra embedding hidden-state
1722
+ student_hidden_state = student_outputs.encoder_hidden_states[student_layer_id + 1]
1723
+ teacher_hidden_state = teacher_outputs.encoder_hidden_states[teacher_layer_id + 1]
1724
+ mse += jnp.mean(jnp.square(teacher_hidden_state - student_hidden_state))
1725
+
1726
+ # student_attention = student_outputs.encoder_attentions[student_layer_id]
1727
+ # teacher_attention = teacher_outputs.encoder_attentions[teacher_layer_id]
1728
+ # mse += jnp.mean(jnp.square(student_attention - teacher_attention))
1729
+
1730
+ # tie decoder embeddings
1731
+ mse += jnp.mean(
1732
+ jnp.square(teacher_outputs.decoder_hidden_states[0] - student_outputs.decoder_hidden_states[0])
1733
+ )
1734
+
1735
+ for student_layer_id, teacher_layer_id in decoder_layer_mapping.items():
1736
+ # offset the hidden-state layer ids by 1 to account for the extra embedding hidden-state
1737
+ student_hidden_state = student_outputs.decoder_hidden_states[student_layer_id + 1]
1738
+ teacher_hidden_state = teacher_outputs.decoder_hidden_states[teacher_layer_id + 1]
1739
+ mse += jnp.mean(jnp.square(teacher_hidden_state - student_hidden_state))
1740
+
1741
+ # student_attention = student_outputs.decoder_attentions[student_layer_id]
1742
+ # teacher_attention = teacher_outputs.decoder_attentions[teacher_layer_id]
1743
+ # mse += jnp.mean(jnp.square(student_attention - teacher_attention))
1744
+
1745
+ # student_cross_attention = student_outputs.cross_attentions[student_layer_id]
1746
+ # teacher_cross_attention = teacher_outputs.cross_attentions[teacher_layer_id]
1747
+ # mse += jnp.mean(jnp.square(student_cross_attention - teacher_cross_attention))
1748
+
1749
+ return to_dtype(mse) # respect the dtype of the backprop
1750
+
1751
+ # Define gradient update step fn
1752
+ def train_step(
1753
+ student_state,
1754
+ teacher_params,
1755
+ batch,
1756
+ freeze_encoder,
1757
+ share_hidden_states,
1758
+ temperature=2.0,
1759
+ ):
1760
+ dropout_rng, new_dropout_rng = jax.random.split(student_state.dropout_rng)
1761
+
1762
+ def compute_loss(student_params):
1763
+ labels = batch.pop("labels")
1764
+ output_hidden_states = not share_hidden_states and training_args.mse_weight > 0.0
1765
+
1766
+ teacher_outputs = teacher_model(
1767
+ **batch,
1768
+ params=teacher_params,
1769
+ freeze_encoder=True,
1770
+ output_hidden_states=output_hidden_states,
1771
+ train=False,
1772
+ )
1773
+
1774
+ if share_hidden_states:
1775
+ # if the student and teacher share the same frozen encoder, we don't have to recompute the
1776
+ # encoder hidden-states for the student model: we can simply re-use the teacher's encoder outputs
1777
+ encoder_hidden_states = jax.lax.stop_gradient(teacher_outputs.encoder_last_hidden_state)
1778
+ encoder_outputs = FlaxBaseModelOutput(last_hidden_state=encoder_hidden_states)
1779
+
1780
+ student_outputs = student_state.apply_fn(
1781
+ decoder_input_ids=batch["decoder_input_ids"],
1782
+ encoder_outputs=encoder_outputs,
1783
+ params=student_params,
1784
+ dropout_rng=dropout_rng,
1785
+ train=True,
1786
+ )
1787
+ else:
1788
+ # do the full forward pass for the student model (encoder + decoder)
1789
+ student_outputs = student_state.apply_fn(
1790
+ **batch,
1791
+ params=student_params,
1792
+ dropout_rng=dropout_rng,
1793
+ freeze_encoder=freeze_encoder,
1794
+ output_hidden_states=output_hidden_states,
1795
+ train=True,
1796
+ )
1797
+
1798
+ # CE (data) loss
1799
+ ce_loss, num_labels = cross_entropy_loss(student_outputs.logits, labels)
1800
+
1801
+ # rescale by temperature to ensure gradients scale correctly
1802
+ teacher_distribution = jax.nn.softmax(teacher_outputs.logits / temperature, axis=-1)
1803
+ # ensure no information flow backwards through teacher
1804
+ teacher_distribution = jax.lax.stop_gradient(teacher_distribution)
1805
+ # log softmax of student predictions for numerical stability
1806
+ student_distribution = jax.nn.log_softmax(student_outputs.logits / temperature, axis=-1)
1807
+ # KL-divergence loss (scaled by temperature)
1808
+ kl_loss = kl_divergence(teacher_distribution, student_distribution, labels) * temperature**2
1809
+
1810
+ # MSE loss between enc-dec hidden-states and attentions
1811
+ mse_loss = (
1812
+ mean_square_error_loss(student_outputs, teacher_outputs)
1813
+ if output_hidden_states
1814
+ else jnp.zeros_like(kl_loss)
1815
+ )
1816
+
1817
+ # use DistilBart formulation - only tune the MSE weight and take remaining HPs from DistilBERT
1818
+ ce_weight = 0.8 if training_args.kl_weight > 0 else 1.0
1819
+ loss = ce_weight * ce_loss + training_args.kl_weight * kl_loss + training_args.mse_weight * mse_loss
1820
+
1821
+ return loss, (
1822
+ ce_loss,
1823
+ kl_loss,
1824
+ mse_loss,
1825
+ num_labels,
1826
+ )
1827
+
1828
+ grad_fn = jax.value_and_grad(compute_loss, has_aux=True)
1829
+ (loss, (ce_loss, kl_loss, mse_loss, num_labels)), grad = grad_fn(to_dtype(student_state.params))
1830
+
1831
+ # true loss = total loss / total samples
1832
+ loss = jax.lax.psum(loss, "batch")
1833
+ num_labels = jax.lax.psum(num_labels, "batch")
1834
+ loss = jax.tree_util.tree_map(lambda x: x / num_labels, loss)
1835
+
1836
+ # true grad = total grad / total samples
1837
+ grad = jax.lax.psum(grad, "batch")
1838
+ grad = jax.tree_util.tree_map(lambda x: x / num_labels, grad)
1839
+ new_state = student_state.apply_gradients(grads=grad, dropout_rng=new_dropout_rng, to_dtype=to_dtype)
1840
+
1841
+ # CE/KL/MSE losses for logging
1842
+ ce_loss = jax.lax.psum(ce_loss, "batch")
1843
+ ce_loss = jax.tree_util.tree_map(lambda x: x / num_labels, ce_loss)
1844
+
1845
+ kl_loss = jax.lax.psum(kl_loss, "batch")
1846
+ kl_loss = jax.tree_util.tree_map(lambda x: x / num_labels, kl_loss)
1847
+
1848
+ mse_loss = jax.lax.psum(mse_loss, "batch")
1849
+ mse_loss = jax.tree_util.tree_map(lambda x: x / num_labels, mse_loss)
1850
+
1851
+ metrics = {
1852
+ "loss": loss,
1853
+ "learning_rate": linear_decay_lr_schedule_fn(student_state.step),
1854
+ "ce_loss": ce_loss,
1855
+ "kl_loss": kl_loss,
1856
+ "mse_loss": mse_loss,
1857
+ }
1858
+ return new_state, metrics
1859
+
1860
+ # Define eval fn
1861
+ def eval_step(student_params, teacher_params, batch):
1862
+ labels = batch.pop("labels")
1863
+ output_hidden_states = not share_hidden_states and training_args.mse_weight > 0
1864
+
1865
+ student_outputs = student_model(
1866
+ **batch,
1867
+ params=student_params,
1868
+ output_hidden_states=output_hidden_states,
1869
+ train=False,
1870
+ )
1871
+ student_distribution = jax.nn.log_softmax(student_outputs.logits, axis=-1)
1872
+ ce_loss, num_labels = cross_entropy_loss(student_outputs.logits, labels)
1873
+
1874
+ teacher_outputs = teacher_model(
1875
+ **batch,
1876
+ params=teacher_params,
1877
+ output_hidden_states=output_hidden_states,
1878
+ train=False,
1879
+ )
1880
+ teacher_distribution = jax.nn.softmax(teacher_outputs.logits, axis=-1)
1881
+ # temperature is always 1 for eval
1882
+ kl_loss = kl_divergence(teacher_distribution, student_distribution, labels)
1883
+
1884
+ mse_loss = (
1885
+ mean_square_error_loss(student_outputs, teacher_outputs)
1886
+ if output_hidden_states
1887
+ else jnp.zeros_like(kl_loss)
1888
+ )
1889
+
1890
+ ce_weight = 0.8 if training_args.kl_weight > 0 else 1.0
1891
+ loss = ce_weight * ce_loss + training_args.kl_weight * kl_loss + training_args.mse_weight * mse_loss
1892
+ # true loss = total loss / total samples
1893
+ loss = jax.lax.psum(loss, "batch")
1894
+ num_labels = jax.lax.psum(num_labels, "batch")
1895
+ loss = jax.tree_util.tree_map(lambda x: x / num_labels, loss)
1896
+
1897
+ # CE/KL/MSE losses for logging
1898
+ ce_loss = jax.lax.psum(ce_loss, "batch")
1899
+ ce_loss = jax.tree_util.tree_map(lambda x: x / num_labels, ce_loss)
1900
+
1901
+ kl_loss = jax.lax.psum(kl_loss, "batch")
1902
+ kl_loss = jax.tree_util.tree_map(lambda x: x / num_labels, kl_loss)
1903
+
1904
+ mse_loss = jax.lax.psum(mse_loss, "batch")
1905
+ mse_loss = jax.tree_util.tree_map(lambda x: x / num_labels, mse_loss)
1906
+
1907
+ metrics = {"loss": loss, "ce_loss": ce_loss, "kl_loss": kl_loss, "mse_loss": mse_loss}
1908
+ return metrics
1909
+
1910
+ # Define generation function
1911
+ num_beams = (
1912
+ training_args.generation_num_beams
1913
+ if training_args.generation_num_beams is not None
1914
+ else student_model.config.num_beams
1915
+ )
1916
+
1917
+ # forcing the language and task tokens helps the model in its generations
1918
+ gen_kwargs = {
1919
+ "max_length": max_label_length,
1920
+ "num_beams": num_beams,
1921
+ "language": "<|en|>",
1922
+ "task": "transcribe",
1923
+ "return_timestamps": return_timestamps,
1924
+ }
1925
+
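Passing language and task this way amounts to fixing the decoder prompt that Whisper starts generating from. The corresponding token ids can be inspected through the tokenizer (a sketch for intuition; the actual prompt construction happens inside generate):

prompt_tokens = ["<|startoftranscript|>", "<|en|>", "<|transcribe|>"]
if not return_timestamps:
    prompt_tokens.append("<|notimestamps|>")
print(tokenizer.convert_tokens_to_ids(prompt_tokens))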
1926
+ def generate_step(student_params, batch):
1927
+ output_ids = student_model.generate(
1928
+ batch[model_input_name],
1929
+ attention_mask=batch.get("attention_mask"),
1930
+ params=student_params,
1931
+ **gen_kwargs,
1932
+ )
1933
+ return output_ids.sequences
1934
+
1935
+ # Replicate the train state on each device
1936
+ student_state = student_state.replicate()
1937
+
1938
+ # Replicate the teacher params on each device
1939
+ teacher_params = jax_utils.replicate(teacher_params)
1940
+
1941
+ # Create parallel version of the train and eval step
1942
+ p_train_step = jax.pmap(
1943
+ train_step,
1944
+ "batch",
1945
+ in_axes=(0, 0, 0, None, None, None),
1946
+ donate_argnums=(0,),
1947
+ static_broadcasted_argnums=(
1948
+ 3,
1949
+ 4,
1950
+ ),
1951
+ )
1952
+ p_eval_step = jax.pmap(eval_step, "batch")
1953
+ p_generate_step = jax.pmap(generate_step, "batch")
1954
+
1955
+ logger.info("***** Running training *****")
1956
+ logger.info(f" Num examples = {total_train_steps * train_batch_size * gradient_accumulation_steps}")
1957
+ logger.info(" Instantaneous batch size per device =" f" {training_args.per_device_train_batch_size}")
1958
+ logger.info(" Gradient accumulation steps =" f" {gradient_accumulation_steps}")
1959
+ logger.info(
1960
+ f" Total train batch size (w. parallel & distributed) = {train_batch_size * gradient_accumulation_steps}"
1961
+ )
1962
+ logger.info(f" Total optimization steps = {total_train_steps}")
1963
+
1964
+ # ======================== Training ================================
1965
+ train_time = 0
1966
+ train_start = time.time()
1967
+ train_metrics = []
1968
+ batches_to_skip = jax.device_get(unreplicate(student_state.step))
1969
+ cur_step = int(batches_to_skip) # will be zero if starting from scratch
1970
+ epochs_trained = batches_to_skip // steps_per_epoch
1971
+ steps_trained_progress_bar = tqdm(range(total_train_steps), desc="Train steps ... ", position=0)
1972
+ steps_trained_progress_bar.update(batches_to_skip)
1973
+ continue_training = True
1974
+ minibatch_steps = 0
1975
+
1976
+ if batches_to_skip > 0:
1977
+ logger.info(" Continuing training from checkpoint, will skip to saved global_step")
1978
+ logger.info(f" Continuing training from epoch {epochs_trained}")
1979
+ logger.info(f" Continuing training from global step {batches_to_skip}")
1980
+
1981
+ # Generate a training data loader by shuffling sampling indices from the train dataset
1982
+ train_loader = get_data_loader(
1983
+ training_args.seed,
1984
+ vectorized_datasets["train"],
1985
+ batch_size=train_batch_size,
1986
+ data_collator=data_collator,
1987
+ dataloader_num_workers=dataloader_num_workers,
1988
+ skip_batches=batches_to_skip,
1989
+ prefetch_size=dataloader_prefetch_size,
1990
+ )
1991
+
1992
+ for epoch in range(epochs_trained, num_epochs):
1993
+ if hasattr(train_loader, "dataset") and isinstance(train_loader.dataset, IterableDataset):
1994
+ train_loader.dataset.set_epoch(epoch)
1995
+
1996
+ for batch in train_loader:
1997
+ minibatch_steps += 1
1998
+ update_step = minibatch_steps == gradient_accumulation_steps
1999
+
2000
+ if update_step:
2001
+ steps_trained_progress_bar.update(1)
2002
+ cur_step += 1
2003
+ minibatch_steps = 0
2004
+
2005
+ batch = shard(batch.data)
2006
+ student_state, train_metric = p_train_step(
2007
+ student_state,
2008
+ teacher_params,
2009
+ batch,
2010
+ training_args.freeze_encoder,
2011
+ share_hidden_states,
2012
+ training_args.temperature,
2013
+ )
2014
+
2015
+ if cur_step % training_args.logging_steps == 0 and update_step:
2016
+ train_metrics.append(train_metric)
2017
+ train_metric_to_write = unreplicate(train_metric)
2018
+ steps_trained_progress_bar.write(
2019
+ f"Step... ({cur_step} / {total_train_steps} | Loss:"
2020
+ f" {train_metric_to_write['loss']}, Learning Rate:"
2021
+ f" {train_metric_to_write['learning_rate']})"
2022
+ )
2023
+ if has_wandb and jax.process_index() == 0:
2024
+ write_wandb_metric(
2025
+ wandb_logger,
2026
+ train_metric_to_write,
2027
+ train_time + time.time() - train_start,
2028
+ cur_step,
2029
+ epoch,
2030
+ prefix="train",
2031
+ )
2032
+
2033
+ # save checkpoint and weights after each save_steps and at the end of training
2034
+ if (cur_step % training_args.save_steps == 0 and update_step) or cur_step == total_train_steps:
2035
+ if jax.process_index() == 0:
2036
+ save_hf_weights(
2037
+ student_state,
2038
+ student_model,
2039
+ processor,
2040
+ training_args.output_dir,
2041
+ cur_step,
2042
+ total_train_steps,
2043
+ use_scan=training_args.use_scan,
2044
+ )
2045
+ if training_args.save_train_state:
2046
+ student_state.save_state(
2047
+ training_args.output_dir, save_total_limit=training_args.save_total_limit
2048
+ )
2049
+ if training_args.push_to_hub:
2050
+ repo.push_to_hub(
2051
+ commit_message=f"Saving train state of step {cur_step}",
2052
+ blocking=False,
2053
+ )
2054
+
2055
+ if training_args.do_eval and (
2056
+ (cur_step % eval_steps == 0 and update_step) or cur_step == total_train_steps
2057
+ ):
2058
+ train_time += time.time() - train_start
2059
+ # ======================== Evaluating ==============================
2060
+ for eval_split in all_eval_splits:
2061
+ eval_metrics = []
2062
+ eval_preds = []
2063
+ eval_labels = []
2064
+ eval_start = time.time()
2065
+
2066
+ eval_loader = get_data_loader(
2067
+ training_args.seed,
2068
+ vectorized_datasets[eval_split],
2069
+ batch_size=eval_batch_size,
2070
+ data_collator=data_collator,
2071
+ shuffle=False,
2072
+ drop_last=False,
2073
+ dataloader_num_workers=dataloader_num_workers,
2074
+ )
2075
+ for batch in tqdm(eval_loader, desc=f"Evaluating {eval_split}...", position=2):
2076
+ # Model forward
2077
+ labels = batch["labels"]
2078
+
2079
+ metrics = pad_shard_unpad(
2080
+ p_eval_step,
2081
+ static_argnums=(
2082
+ 0,
2083
+ 1,
2084
+ ),
2085
+ static_return=True,
2086
+ )(
2087
+ student_state.params,
2088
+ teacher_params,
2089
+ batch.data,
2090
+ min_device_batch=per_device_eval_batch_size,
2091
+ )
2092
+ eval_metrics.append(metrics)
2093
+
2094
+ # generation
2095
+ if training_args.predict_with_generate:
2096
+ generated_ids = pad_shard_unpad(p_generate_step)(
2097
+ student_state.params, batch.data, min_device_batch=per_device_eval_batch_size
2098
+ )
2099
+ eval_preds.extend(jax.device_get(generated_ids.reshape(-1, gen_kwargs["max_length"])))
2100
+ eval_labels.extend(labels)
2101
+
2102
+ eval_time = time.time() - eval_start
2103
+
2104
+ # normalize eval metrics
2105
+ eval_metrics = get_metrics(eval_metrics)
2106
+ eval_metrics = jax.tree_util.tree_map(jnp.mean, eval_metrics)
2107
+
2108
+ # compute WER metric
2109
+ wer_desc = ""
2110
+ if training_args.predict_with_generate:
2111
+ wer_metric, pred_str, label_str, norm_pred_str, norm_label_str = compute_metrics(
2112
+ eval_preds, eval_labels
2113
+ )
2114
+ eval_metrics.update(wer_metric)
2115
+ wer_desc = " ".join([f"Eval {key}: {value} |" for key, value in wer_metric.items()])
2116
+
2117
+ # Print metrics and update progress bar
2118
+ steps_trained_progress_bar.write(
2119
+ f"Eval results for step ({cur_step} / {total_train_steps} | Eval Loss: {eval_metrics['loss']} |"
2120
+ f" {wer_desc})"
2121
+ )
2122
+
2123
+ if has_tensorboard and jax.process_index() == 0:
2124
+ write_eval_metric(
2125
+ summary_writer,
2126
+ eval_metrics,
2127
+ cur_step,
2128
+ prefix=eval_split,
2129
+ )
2130
+
2131
+ if has_wandb and jax.process_index() == 0:
2132
+ write_wandb_metric(wandb_logger, eval_metrics, eval_time, cur_step, epoch, prefix=eval_split)
2133
+ if training_args.predict_with_generate:
2134
+ write_wandb_pred(
2135
+ wandb_logger,
2136
+ pred_str,
2137
+ label_str,
2138
+ norm_pred_str,
2139
+ norm_label_str,
2140
+ cur_step,
2141
+ prefix=eval_split,
2142
+ )
2143
+
2144
+ if has_tensorboard and jax.process_index() == 0:
2145
+ # we'll only log to tensorboard every eval steps
2146
+ write_train_metric(
2147
+ summary_writer,
2148
+ train_metrics,
2149
+ train_time,
2150
+ cur_step,
2151
+ training_args.logging_steps,
2152
+ )
2153
+
2154
+ # flush the train metrics
2155
+ train_start = time.time()
2156
+ train_metrics = []
2157
+
2158
+ # break condition
2159
+ if cur_step == total_train_steps:
2160
+ continue_training = False
2161
+ break
2162
+
2163
+ if not continue_training:
2164
+ break
2165
+
2166
+
2167
+ if __name__ == "__main__":
2168
+ main()
run_large_training.sh ADDED
@@ -0,0 +1,38 @@
1
+ #!/usr/bin/env bash
2
+ TOKENIZERS_PARALLELISM=false python3 run_distillation_nodes.py \
3
+ --model_name_or_path "./nb-distil-large-init" \
4
+ --teacher_model_name_or_path "NbAiLab/nb-whisper-large" \
5
+ --train_dataset_name "NbAiLab/annotated_distil_raw_ncc_speech_v7_large" \
6
+ --train_dataset_config_name "" \
7
+ --train_split_name "train" \
8
+ --eval_dataset_name "NbAiLab/annotated_distil_raw_ncc_speech_v7_large" \
9
+ --eval_dataset_config_name "" \
10
+ --eval_split_name "validation" \
11
+ --eval_steps 500 \
12
+ --save_steps 1000 \
13
+ --warmup_steps 1000 \
14
+ --learning_rate 0.0003 \
15
+ --lr_scheduler_type "linear" \
16
+ --logging_steps 200 \
17
+ --save_total_limit 1 \
18
+ --max_steps 50000 \
19
+ --wer_threshold 10 \
20
+ --per_device_train_batch_size 16 \
21
+ --per_device_eval_batch_size 16 \
22
+ --dataloader_num_workers 16 \
23
+ --dtype "bfloat16" \
24
+ --output_dir "./" \
25
+ --do_train \
26
+ --do_eval \
27
+ --use_scan \
28
+ --gradient_checkpointing \
29
+ --overwrite_output_dir \
30
+ --predict_with_generate \
31
+ --freeze_encoder \
32
+ --streaming \
33
+ --use_auth_token \
34
+ --report_to "wandb" \
35
+ --wandb_project "nb-distil-whisper-large-flax2" \
36
+ --hub_model_id "NbAiLab/nb-distil-whisper-large-flax2" \
37
+ --push_to_hub
38
+
run_large_training_debug.sh ADDED
@@ -0,0 +1,38 @@
1
+ #!/usr/bin/env bash
2
+ TOKENIZERS_PARALLELISM=false python3 run_distillation_debug.py \
3
+ --model_name_or_path "./nb-distil-large-init" \
4
+ --teacher_model_name_or_path "NbAiLab/nb-whisper-large" \
5
+ --train_dataset_name "NbAiLab/annotated_distil_raw_ncc_speech_v7_compact8_large" \
6
+ --train_dataset_config_name "no" \
7
+ --train_split_name "train" \
8
+ --eval_dataset_name "NbAiLab/annotated_distil_raw_ncc_speech_v7_compact8_large" \
9
+ --eval_dataset_config_name "no" \
10
+ --eval_split_name "validation_norwegian_fleurs" \
11
+ --eval_steps 5000 \
12
+ --save_steps 5000 \
13
+ --warmup_steps 500 \
14
+ --learning_rate 0.0001 \
15
+ --lr_scheduler_type "linear" \
16
+ --logging_steps 25 \
17
+ --save_total_limit 1 \
18
+ --max_steps 100000 \
19
+ --wer_threshold 10 \
20
+ --per_device_train_batch_size 64 \
21
+ --per_device_eval_batch_size 64 \
22
+ --dataloader_num_workers 16 \
23
+ --dtype "bfloat16" \
24
+ --output_dir "./" \
25
+ --do_train \
26
+ --do_eval \
27
+ --use_scan \
28
+ --gradient_checkpointing \
29
+ --overwrite_output_dir \
30
+ --predict_with_generate \
31
+ --freeze_encoder \
32
+ --streaming \
33
+ --use_auth_token \
34
+ --report_to "wandb" \
35
+ --wandb_project "nb-distil-whisper-large-test2" \
36
+ --hub_model_id "NbAiLab/nb-distil-whisper-large-flax1-no" \
37
+ --push_to_hub
38
+
run_large_training_lr1e4.sh ADDED
@@ -0,0 +1,41 @@
1
+ #!/usr/bin/env bash
2
+ TOKENIZERS_PARALLELISM=false python3 run_distillation_nodes.py \
3
+ --model_name_or_path "./nb-distil-large-init-0811" \
4
+ --teacher_model_name_or_path "NbAiLab/nb-whisper-large" \
5
+ --train_dataset_name "NbAiLab/annotated_distil_raw_ncc_speech_v7_large" \
6
+ --train_dataset_config_name "" \
7
+ --train_split_name "train" \
8
+ --eval_dataset_name "NbAiLab/annotated_distil_raw_ncc_speech_v7_large" \
9
+ --eval_dataset_config_name "" \
10
+ --eval_split_name "validation" \
11
+ --eval_steps 500 \
12
+ --save_steps 1000 \
13
+ --warmup_steps 1000 \
14
+ --learning_rate 0.0001 \
15
+ --lr_scheduler_type "linear" \
16
+ --logging_steps 200 \
17
+ --save_total_limit 1 \
18
+ --max_steps 50000 \
19
+ --wer_threshold 10 \
20
+ --per_device_train_batch_size 16 \
21
+ --per_device_eval_batch_size 16 \
22
+ --dataloader_num_workers 16 \
23
+ --dtype "bfloat16" \
24
+ --output_dir "./" \
25
+ --do_train \
26
+ --do_eval \
27
+ --use_scan \
28
+ --gradient_checkpointing \
29
+ --overwrite_output_dir \
30
+ --predict_with_generate \
31
+ --freeze_encoder \
32
+ --streaming \
33
+ --use_auth_token \
34
+ --report_to "wandb" \
35
+ --wandb_project "nb-distil-whisper-large-flax2" \
36
+ --wandb_name "flax lr1e4" \
37
+ --save_code_to_wandb \
38
+ --save_train_state \
39
+ --hub_model_id "NbAiLab/nb-distil-whisper-large-flax3" \
40
+ --push_to_hub
41
+
run_large_training_lr6e4.sh ADDED
@@ -0,0 +1,41 @@
1
+ #!/usr/bin/env bash
2
+ TOKENIZERS_PARALLELISM=false python3 run_distillation_nodes.py \
3
+ --model_name_or_path "./nb-distil-large-init" \
4
+ --teacher_model_name_or_path "NbAiLab/nb-whisper-large" \
5
+ --train_dataset_name "NbAiLab/annotated_distil_raw_ncc_speech_v7_large" \
6
+ --train_dataset_config_name "" \
7
+ --train_split_name "train" \
8
+ --eval_dataset_name "NbAiLab/annotated_distil_raw_ncc_speech_v7_large" \
9
+ --eval_dataset_config_name "" \
10
+ --eval_split_name "validation" \
11
+ --eval_steps 500 \
12
+ --save_steps 1000 \
13
+ --warmup_steps 1000 \
14
+ --learning_rate 0.0006 \
15
+ --lr_scheduler_type "linear" \
16
+ --logging_steps 200 \
17
+ --save_total_limit 1 \
18
+ --max_steps 50000 \
19
+ --wer_threshold 10 \
20
+ --per_device_train_batch_size 16 \
21
+ --per_device_eval_batch_size 16 \
22
+ --dataloader_num_workers 16 \
23
+ --dtype "bfloat16" \
24
+ --output_dir "./" \
25
+ --do_train \
26
+ --do_eval \
27
+ --use_scan \
28
+ --gradient_checkpointing \
29
+ --overwrite_output_dir \
30
+ --predict_with_generate \
31
+ --freeze_encoder \
32
+ --streaming \
33
+ --use_auth_token \
34
+ --report_to "wandb" \
35
+ --wandb_project "nb-distil-whisper-large-flax2" \
36
+ --wandb_name "flax lr6e4" \
37
+ --save_code_to_wandb \
38
+ --save_train_state \
39
+ --hub_model_id "NbAiLab/nb-distil-whisper-large-flax4" \
40
+ --push_to_hub
41
+
run_large_training_recover.sh ADDED
@@ -0,0 +1,41 @@
1
+ #!/usr/bin/env bash
2
+ TOKENIZERS_PARALLELISM=false python3 run_distillation_nodes.py \
3
+ --model_name_or_path "./checkpoint-4000" \
4
+ --teacher_model_name_or_path "NbAiLab/nb-whisper-large" \
5
+ --train_dataset_name "NbAiLab/annotated_distil_raw_ncc_speech_v7_large" \
6
+ --train_dataset_config_name "" \
7
+ --train_split_name "train" \
8
+ --eval_dataset_name "NbAiLab/annotated_distil_raw_ncc_speech_v7_large" \
9
+ --eval_dataset_config_name "" \
10
+ --eval_split_name "validation" \
11
+ --eval_steps 500 \
12
+ --save_steps 1000 \
13
+ --warmup_steps 1000 \
14
+ --learning_rate 0.0003 \
15
+ --lr_scheduler_type "linear" \
16
+ --logging_steps 200 \
17
+ --save_total_limit 1 \
18
+ --max_steps 50000 \
19
+ --wer_threshold 10 \
20
+ --per_device_train_batch_size 16 \
21
+ --per_device_eval_batch_size 16 \
22
+ --dataloader_num_workers 16 \
23
+ --dtype "bfloat16" \
24
+ --output_dir "./" \
25
+ --do_train \
26
+ --do_eval \
27
+ --use_scan \
28
+ --gradient_checkpointing \
29
+ --overwrite_output_dir \
30
+ --predict_with_generate \
31
+ --freeze_encoder \
32
+ --streaming \
33
+ --use_auth_token \
34
+ --report_to "wandb" \
35
+ --wandb_project "nb-distil-whisper-large-flax2" \
36
+ --wandb_name "recover at 4000" \
37
+ --save_code_to_wandb \
38
+ --save_train_state \
39
+ --hub_model_id "NbAiLab/nb-distil-whisper-large-flax2" \
40
+ --push_to_hub
41
+
special_tokens_map.json ADDED
@@ -0,0 +1,139 @@
1
+ {
2
+ "additional_special_tokens": [
3
+ "<|startoftranscript|>",
4
+ "<|en|>",
5
+ "<|zh|>",
6
+ "<|de|>",
7
+ "<|es|>",
8
+ "<|ru|>",
9
+ "<|ko|>",
10
+ "<|fr|>",
11
+ "<|ja|>",
12
+ "<|pt|>",
13
+ "<|tr|>",
14
+ "<|pl|>",
15
+ "<|ca|>",
16
+ "<|nl|>",
17
+ "<|ar|>",
18
+ "<|sv|>",
19
+ "<|it|>",
20
+ "<|id|>",
21
+ "<|hi|>",
22
+ "<|fi|>",
23
+ "<|vi|>",
24
+ "<|he|>",
25
+ "<|uk|>",
26
+ "<|el|>",
27
+ "<|ms|>",
28
+ "<|cs|>",
29
+ "<|ro|>",
30
+ "<|da|>",
31
+ "<|hu|>",
32
+ "<|ta|>",
33
+ "<|no|>",
34
+ "<|th|>",
35
+ "<|ur|>",
36
+ "<|hr|>",
37
+ "<|bg|>",
38
+ "<|lt|>",
39
+ "<|la|>",
40
+ "<|mi|>",
41
+ "<|ml|>",
42
+ "<|cy|>",
43
+ "<|sk|>",
44
+ "<|te|>",
45
+ "<|fa|>",
46
+ "<|lv|>",
47
+ "<|bn|>",
48
+ "<|sr|>",
49
+ "<|az|>",
50
+ "<|sl|>",
51
+ "<|kn|>",
52
+ "<|et|>",
53
+ "<|mk|>",
54
+ "<|br|>",
55
+ "<|eu|>",
56
+ "<|is|>",
57
+ "<|hy|>",
58
+ "<|ne|>",
59
+ "<|mn|>",
60
+ "<|bs|>",
61
+ "<|kk|>",
62
+ "<|sq|>",
63
+ "<|sw|>",
64
+ "<|gl|>",
65
+ "<|mr|>",
66
+ "<|pa|>",
67
+ "<|si|>",
68
+ "<|km|>",
69
+ "<|sn|>",
70
+ "<|yo|>",
71
+ "<|so|>",
72
+ "<|af|>",
73
+ "<|oc|>",
74
+ "<|ka|>",
75
+ "<|be|>",
76
+ "<|tg|>",
77
+ "<|sd|>",
78
+ "<|gu|>",
79
+ "<|am|>",
80
+ "<|yi|>",
81
+ "<|lo|>",
82
+ "<|uz|>",
83
+ "<|fo|>",
84
+ "<|ht|>",
85
+ "<|ps|>",
86
+ "<|tk|>",
87
+ "<|nn|>",
88
+ "<|mt|>",
89
+ "<|sa|>",
90
+ "<|lb|>",
91
+ "<|my|>",
92
+ "<|bo|>",
93
+ "<|tl|>",
94
+ "<|mg|>",
95
+ "<|as|>",
96
+ "<|tt|>",
97
+ "<|haw|>",
98
+ "<|ln|>",
99
+ "<|ha|>",
100
+ "<|ba|>",
101
+ "<|jw|>",
102
+ "<|su|>",
103
+ "<|yue|>",
104
+ "<|translate|>",
105
+ "<|transcribe|>",
106
+ "<|startoflm|>",
107
+ "<|startofprev|>",
108
+ "<|nospeech|>",
109
+ "<|notimestamps|>"
110
+ ],
111
+ "bos_token": {
112
+ "content": "<|endoftext|>",
113
+ "lstrip": false,
114
+ "normalized": false,
115
+ "rstrip": false,
116
+ "single_word": false
117
+ },
118
+ "eos_token": {
119
+ "content": "<|endoftext|>",
120
+ "lstrip": false,
121
+ "normalized": false,
122
+ "rstrip": false,
123
+ "single_word": false
124
+ },
125
+ "pad_token": {
126
+ "content": "<|endoftext|>",
127
+ "lstrip": false,
128
+ "normalized": false,
129
+ "rstrip": false,
130
+ "single_word": false
131
+ },
132
+ "unk_token": {
133
+ "content": "<|endoftext|>",
134
+ "lstrip": false,
135
+ "normalized": false,
136
+ "rstrip": false,
137
+ "single_word": false
138
+ }
139
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
The diff for this file is too large to render. See raw diff
 
vocab.json ADDED
The diff for this file is too large to render. See raw diff