RonanMcGovern committed on
Commit
9e73dad
·
verified ·
1 Parent(s): 60aa2b3

Add new SentenceTransformer model

1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 768,
+   "pooling_mode_cls_token": false,
+   "pooling_mode_mean_tokens": true,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
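
The only pooling flag set to `true` above is `pooling_mode_mean_tokens`, which selects mean pooling: the sentence vector is the average of the token vectors that the attention mask marks as real (non-padding) tokens. The sketch below illustrates that idea in plain Python on toy 4-dimensional vectors. The function name `mean_pool` and the toy data are assumptions made for illustration; this is not code from the repository or from the library.

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the token vectors whose attention-mask entry is 1."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vec):
                total[i] += value
    if count == 0:
        raise ValueError("attention mask selects no tokens")
    return [value / count for value in total]


# Toy input (hypothetical): three 4-dimensional token vectors, the last is padding.
tokens = [[1.0, 2.0, 3.0, 4.0], [3.0, 2.0, 1.0, 0.0], [9.0, 9.0, 9.0, 9.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)
print(pooled)  # [2.0, 2.0, 2.0, 2.0]
```

In the real model the same averaging would run over 768-dimensional token vectors, matching `word_embedding_dimension`.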
README.md ADDED
@@ -0,0 +1,553 @@
1
+ ---
2
+ tags:
3
+ - sentence-transformers
4
+ - sentence-similarity
5
+ - feature-extraction
6
+ - generated_from_trainer
7
+ - dataset_size:305
8
+ - loss:MultipleNegativesRankingLoss
9
+ base_model: nomic-ai/modernbert-embed-base
10
+ widget:
11
+ - source_sentence: What happens if neither team is leading after the two-minute drop-off
12
+ period?
13
+ sentences:
14
+ - "24  Drop-Off \n24.1\tShould a Winner be required in drawn matches, the following\
15
+ \ Drop-Off \nprocedure is used to determine a Winner.24.1.1\tEach Team will reduce\
16
+ \ their on-field Team to four (4) players and within \n60 seconds take up a position\
17
+ \ to restart play from the Halfway Line, \ndefending the same end of the field\
18
+ \ as at the End of Play.24.1.2\tThe Drop-Off commences with a Tap from the centre\
19
+ \ of the Halfway Line \nby the Team that did not commence the match with Possession.24.1.3\t\
20
+ The Drop-Off will commence with a two (2) minute period of extra time.24.1.4\t\
21
+ Should a Team be leading at the expiration of the two (2) minute period \nof extra\
22
+ \ time then that Team will be declared the Winner and Match \ncomplete.24.1.5\t\
23
+ Should neither Team be leading at the expiration of two (2) minutes, a \nsignal\
24
+ \ is given and the match will pause at the next Touch or Dead Ball."
25
+ - "25.1.2\tAdjudicate on the Rules of the game;\n25.1.3\tImpose any sanction necessary\
26
+ \ to control the match;\n25.1.4\tAward Tries and record the progressive score;\n\
27
+ 25.1.5\tMaintain a count of Touches during each Possession;\n25.1.6\tAward Penalties\
28
+ \ for Infringements against the Rules; and\n25.1.7\tReport to the relevant competition\
29
+ \ administration any Sin Bins, \nDismissals or injuries to any participant sustained\
30
+ \ during a Match.25.2\tOnly Team captains are permitted to seek clarification\
31
+ \ of a decision directly \nfrom the Referee.An approach may only be made during\
32
+ \ a break in play or at \nthe discretion of the Referee."
33
+ - "21  Forced Interchange \n21.1\tWhere the Referee deems it necessary to implement\
34
+ \ a Forced Interchange \nfollowing an Infringement, the Referee is to stop the\
35
+ \ match, direct the ball to \nbe placed on the Mark, advise the offending player\
36
+ \ of the reason for the Forced \nInterchange, direct that player to return to\
37
+ \ the Interchange Area, display the \nrelevant signal and award a Penalty to the\
38
+ \ non-offending Team.22  Sin Bin \n22.1\tThe on-field Referee is required to\
39
+ \ indicate the commencement and the end of \nthe Sin Bin time.22.2\tAny player\
40
+ \ sent to the Sin Bin must stand in the Sin Bin Area at the opposition’s \nend\
41
+ \ of the Field of Play and on the same side as their Interchange Area.22.3\tAny\
42
+ \ player sent to the Sin Bin must return to the Interchange Area prior to re-\n\
43
+ entering the Field of Play.22.4\tAny action that causes the Touch Count to restart\
44
+ \ will result in a continuation of \nthat Possession."
45
+ - source_sentence: What actions constitute misconduct under rule 20.1.6?
46
+ sentences:
47
+ - "FIT Playing Rules - 5th Edition\nCOPYRIGHT © Touch Football Australia 2020\n\
48
+ 7\n7.6\tA Tap may not be taken until at least four (4) defending players are in\
49
+ \ an Onside \nposition or unless directed to so by the Referee.Where the number\
50
+ \ of players \non the field from the Defending Team falls below four (4), all\
51
+ \ players must be in \nan Onside position for a Tap to be taken unless directed\
52
+ \ to do so by the Referee.Ruling = The Player will be directed to return to the\
53
+ \ Mark and to take the Tap again.7.7\tThe Tap to commence or recommence play must\
54
+ \ be performed without delay.Ruling = A Penalty to the non-offending team at the\
55
+ \ centre of the Halfway line.8  Match Duration \n \n8.1\tA match is 40 minutes\
56
+ \ in duration, consisting of two (2) x 20 minute halves with \na Half Time break.8.1.1\t\
57
+ There is no time off for injury during a match.8.2\tLocal competition and tournament\
58
+ \ conditions may vary the duration of a match."
59
+ - "Ruling = A Penalty to the Defending Team at the point of the Infringement.13.5\t\
60
+ A player may only perform a Rollball at the Mark under the following \ncircumstances:\n\
61
+ 13.5.1\twhen a Touch has been made; or\n13.5.2\twhen Possession changes following\
62
+ \ the sixth Touch; or\n13.5.3\twhen Possession changes due to the ball being dropped\
63
+ \ or passed and \ngoes to the ground; or\n13.5.4\twhen Possession changes due\
64
+ \ to an Infringement by an attacking player \nat a Penalty, a Tap or a Rollball;\
65
+ \ or\nFIT Playing Rules - 5th Edition\nCOPYRIGHT © Touch Football Australia 2020\n\
66
+ 11\n13.5.5\twhen Possession changes after the Half is Touched or when the Half\
67
+ \ \nplaces the ball on or over the Try Line; or\n13.5.6\tin replacement of a Penalty\
68
+ \ Tap; or\n13.5.7\twhen so directed by the Referee."
69
+ - "18.7\tA player may perform a Rollball instead of a Penalty Tap and the player\
70
+ \ who \nreceives the ball does not become the Half.18.8\tIf the Defending Team\
71
+ \ is penalised three (3) times upon entering their Seven \nMetre Zone during a\
72
+ \ single Possession, the last offending player will be given an \nExclusion until\
73
+ \ the end of that Possession.18.9\tA Penalty Try is awarded if any action by a\
74
+ \ player, Team official or spectator, \ndeemed by the Referee to be contrary to\
75
+ \ the Rules or spirit of the game clearly \nprevents the Attacking Team from scoring\
76
+ \ a Try.FIT Playing Rules - 5th Edition\nCOPYRIGHT © Touch Football Australia\
77
+ \ 2020\n15\n19  Advantage \n19.1\tWhere a Defending Team player is Offside at\
78
+ \ a Tap or Rollball and attempts \nto interfere with play, the Referee will allow\
79
+ \ Advantage or award a Penalty, \nwhichever is of greater Advantage to the Attacking\
80
+ \ Team."
81
+ - source_sentence: Who is permitted to directly seek clarification of a Referee's
82
+ decision?
83
+ sentences:
84
+ - "8.2\tLocal competition and tournament conditions may vary the duration of a match.8.3\t\
85
+ When time expires, play is to continue until the next Touch or Dead Ball and End\
86
+ \ \nof Play is signaled by the Referee.8.3.1\tShould a Penalty be awarded during\
87
+ \ this period, the Penalty is to be taken.8.4\tIf a match is abandoned in any\
88
+ \ circumstances other than those referred to in \nclause 24.1.6 the NTA or NTA\
89
+ \ competition provider in its sole discretion shall \ndetermine the result of\
90
+ \ the match.9  Possession \n \n9.1\tThe Team with the ball is entitled to six\
91
+ \ (6) Touches prior to a Change of \nPossession.9.2\tOn the Change of Possession\
92
+ \ due to an intercept, the first Touch will be zero (0) \nTouch.9.3\tFollowing\
93
+ \ the sixth Touch or a loss of Possession due to any other means, the \nball must\
94
+ \ be returned to the Mark without delay."
95
+ - "The player is counted as a player on the Field of Play \nand cannot be replaced\
96
+ \ or Interchanged.FIT Playing Rules - 5th Edition\nCOPYRIGHT © Touch Football\
97
+ \ Australia 2020\n3\nSin Bin Area\nThe area between the Dead Ball Line and the\
98
+ \ Perimeter where \nplayers are sent for either a Sin Bin period or Exclusion\
99
+ \ for repeated \nSeven Metre Zone Infringements.There are four (4) Sin Bin Areas.See\
100
+ \ Appendix 1.Spirit of the Game\nThe act of good sportsmanship and fair play.Substitute\
101
+ \ Player\nThe player who replaces another player during Interchange.There is \n\
102
+ a maximum of eight (8) substitute players in any Team and except \nwhen interchanging,\
103
+ \ in the Sin Bin, dismissed or on the Field of Play, \nthey must remain in the\
104
+ \ Substitution Box.Tap and Tap Penalty\nThe method of commencing the match, recommencing\
105
+ \ the match \nafter Half Time and after a Try has been scored.The Tap is also\
106
+ \ the \nmethod of recommencing play when a Penalty is awarded."
107
+ - "FIT Playing Rules - 5th Edition\nCOPYRIGHT © Touch Football Australia 2020\n\
108
+ 17\n24.3\tAt the commencement of the Drop-Off, if there is a player serving time\
109
+ \ in the \nSin Bin and is yet to complete the required time, their Team commences\
110
+ \ the \nDrop-Off with one (1) less player on the field than their opposition and\
111
+ \ continues \nto play with one (1) player less until the Sin Bin period has been\
112
+ \ completed.24.4\tAt the commencement of the Drop-Off, if a Team has had a player\
113
+ \ dismissed for \nthe remainder of the match that Team continues to play with\
114
+ \ one (1) player less \nthan the opposition Team for the duration of the Drop-Off.24.5\t\
115
+ For the avoidance of doubt for clauses 24.3 and 24.4 the non-offending Team \n\
116
+ will retain a numerical advantage on the Field of Play during the Drop-Off."
117
+ - source_sentence: What are the dimensions of the rectangular field of play, excluding
118
+ in-goal and interchange areas?
119
+ sentences:
120
+ - "24  Drop-Off \n24.1\tShould a Winner be required in drawn matches, the following\
121
+ \ Drop-Off \nprocedure is used to determine a Winner.24.1.1\tEach Team will reduce\
122
+ \ their on-field Team to four (4) players and within \n60 seconds take up a position\
123
+ \ to restart play from the Halfway Line, \ndefending the same end of the field\
124
+ \ as at the End of Play.24.1.2\tThe Drop-Off commences with a Tap from the centre\
125
+ \ of the Halfway Line \nby the Team that did not commence the match with Possession.24.1.3\t\
126
+ The Drop-Off will commence with a two (2) minute period of extra time.24.1.4\t\
127
+ Should a Team be leading at the expiration of the two (2) minute period \nof extra\
128
+ \ time then that Team will be declared the Winner and Match \ncomplete.24.1.5\t\
129
+ Should neither Team be leading at the expiration of two (2) minutes, a \nsignal\
130
+ \ is given and the match will pause at the next Touch or Dead Ball."
131
+ - "Touch Count\nThe progressive number of Touches that each Team has before a \n\
132
+ Change of Possession, from zero (0) to six (6).Try\nThe result of any attacking\
133
+ \ player, except the Half, placing the ball on \nor over the Team’s Attacking\
134
+ \ Try Line before being Touched.Try Lines\nThe lines separating the In-Goal Areas\
135
+ \ from the Field of Play.See \nAppendix 1.Voluntary Rollball\nThe player in Possession\
136
+ \ performs a Rollball before a Touch is made \nwith a defending player.Wing\n\
137
+ The player outside the Link player.Winner\nThe Team that scores the most Tries\
138
+ \ during the match.FIT Playing Rules - 5th Edition\n4\nCOPYRIGHT © Touch Football\
139
+ \ Australia 2020\n Rules of Play \n Mode of Play \nThe object of the game\
140
+ \ of Touch is for each Team to score Tries and to prevent the \nopposition from\
141
+ \ scoring.The ball may be passed, knocked or handed between players \nof the Attacking\
142
+ \ Team who may in turn run or otherwise move with the ball in an \nattempt to\
143
+ \ gain territorial Advantage and to score Tries."
144
+ - "12.2\tIf a player from the Defending Team deliberately makes contact with the\
145
+ \ ball \nin flight and the ball is retrieved by an attacking player, without touching\
146
+ \ the \nground, play continues and the next Touch is zero (0) Touch.12.3\tIf a\
147
+ \ player from the Defending Team deliberately makes contact with the ball \nin\
148
+ \ flight, propelling it Forward and an attacking player, in an attempt to regain\
149
+ \ \npossession, drops the ball, the Attacking Team retains Possession and the\
150
+ \ \nFIT Playing Rules - 5th Edition\n10\nCOPYRIGHT © Touch Football Australia\
151
+ \ 2020\nTouch Count restarts as zero (0) Touch.12.4\tIf a player from the Defending\
152
+ \ Team deliberately makes contact with the ball \nin flight, propelling it towards\
153
+ \ the Defending Team’s Dead Ball Line and an \nattacking player, in an attempt\
154
+ \ to regain possession drops the ball, a Change of \nPossession occurs.12.5\t\
155
+ If a player from the Defending Team unintentionally makes contact with the ball\
156
+ \ \nin flight and the ball goes to ground, a Change of Possession occurs."
157
+ - source_sentence: Who indicates to commence play at the start of a Touch Rugby match?
158
+ sentences:
159
+ - "24.5\tFor the avoidance of doubt for clauses 24.3 and 24.4 the non-offending\
160
+ \ Team \nwill retain a numerical advantage on the Field of Play during the Drop-Off.25 \
161
+ \ Match Officials \n25.1\tThe Referee is the sole judge on all match related\
162
+ \ matters inside the Perimeter \nfor the Duration of a match, has jurisdiction\
163
+ \ over all players, coaches and \nofficials and is required to:\n25.1.1\tInspect\
164
+ \ the Field of Play, Line Markings and Markers prior to the \ncommencement of\
165
+ \ the Match to ensure the safety of all participants.25.1.2\tAdjudicate on the\
166
+ \ Rules of the game;\n25.1.3\tImpose any sanction necessary to control the match;\n\
167
+ 25.1.4\tAward Tries and record the progressive score;\n25.1.5\tMaintain a count\
168
+ \ of Touches during each Possession;\n25.1.6\tAward Penalties for Infringements\
169
+ \ against the Rules; and\n25.1.7\tReport to the relevant competition administration\
170
+ \ any Sin Bins, \nDismissals or injuries to any participant sustained during a\
171
+ \ Match."
172
+ - "See Appendix 1.Forced Interchange\nWhen a player is required to undertake a compulsory\
173
+ \ Interchange for \nan Infringement ruled more serious than a Penalty but less\
174
+ \ serious \nthan a Permanent Interchange, Sin Bin or Dismissal.Forward\nA position\
175
+ \ or direction towards the Dead Ball Line beyond the Team’s \nAttacking Try Line.Full\
176
+ \ Time\nThe expiration of the second period of time allowed for play.Half\nThe\
177
+ \ player who takes Possession following a Rollball.Half Time\nThe break in play\
178
+ \ between the two halves of a match.Imminent\nAbout to occur, it is almost certain\
179
+ \ to occur.Infringement\nThe action of a player contrary to the Rules of the game.In-Goal\
180
+ \ Area\nThe area in the Field of Play bounded by the Sidelines, the Try Lines\
181
+ \ \nand the Dead Ball Lines.There are two (2), one (1) at each end of the \nField\
182
+ \ of Play.See Appendix 1.Interchange\nThe act of an on-field player leaving the\
183
+ \ Field of Play to be replaced \nby an off-field player entering the Field of\
184
+ \ Play."
185
+ - "6.2\tThe Team coach(s) and Team officials may move from one position to the other\
186
+ \ \nbut shall do so without delay.While in a position at the end of the Field\
187
+ \ of Play, \nthe Team coach(s) or Team official must remain no closer than five\
188
+ \ (5) metres \nfrom the Dead Ball Line and must not coach or communicate (verbal\
189
+ \ or non-\nverbal) with either Team or the Referees.7  Commencement and Recommencement\
190
+ \ of Play \n7.1\tTeam captains are to toss a coin in the presence of the Referee(s)\
191
+ \ with the \nwinning captain’s Team having the choice of the direction the Team\
192
+ \ wishes \nto run in the first half; the choice of Interchange Areas for the duration\
193
+ \ of the \nmatch, including any extra time; and the choice of which team will\
194
+ \ commence \nthe match in Possession.7.2\tA player of the Attacking Team is to\
195
+ \ commence the match with a Tap at the \ncentre of the Halfway Line following\
196
+ \ the indication to commence play from the \nReferee."
197
+ datasets:
198
+ - Trelis/touch-rugby-modernbert-pairs
199
+ pipeline_tag: sentence-similarity
200
+ library_name: sentence-transformers
201
+ ---
202
+
203
+ # SentenceTransformer based on nomic-ai/modernbert-embed-base
204
+
205
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [nomic-ai/modernbert-embed-base](https://huggingface.co/nomic-ai/modernbert-embed-base) on the [touch-rugby-modernbert-pairs](https://huggingface.co/datasets/Trelis/touch-rugby-modernbert-pairs) dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
206
+
207
+ ## Model Details
208
+
209
+ ### Model Description
210
+ - **Model Type:** Sentence Transformer
211
+ - **Base model:** [nomic-ai/modernbert-embed-base](https://huggingface.co/nomic-ai/modernbert-embed-base) <!-- at revision 92168cbee600b1abbfc10842aba988aa69572291 -->
212
+ - **Maximum Sequence Length:** 8192 tokens
213
+ - **Output Dimensionality:** 768 dimensions
214
+ - **Similarity Function:** Cosine Similarity
215
+ - **Training Dataset:**
216
+ - [touch-rugby-modernbert-pairs](https://huggingface.co/datasets/Trelis/touch-rugby-modernbert-pairs)
217
+ <!-- - **Language:** Unknown -->
218
+ <!-- - **License:** Unknown -->
219
+
220
+ ### Model Sources
221
+
222
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
223
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
224
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
225
+
226
+ ### Full Model Architecture
227
+
228
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: ModernBertModel
+   (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+   (2): Normalize()
+ )
+ ```
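
The final `Normalize()` module rescales each embedding to unit length. One consequence is that a plain dot product between two output vectors equals their cosine similarity, which is the similarity function listed in the model description. The following is a minimal sketch of that relationship in plain Python; the helper names and the 2-dimensional example vectors are assumptions made for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity computed from unnormalized vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Hypothetical 2-dimensional vectors standing in for 768-dimensional embeddings.
a = [3.0, 4.0]
b = [4.0, 3.0]
a_unit, b_unit = l2_normalize(a), l2_normalize(b)

# After normalization the dot product and the cosine similarity agree.
print(dot(a_unit, b_unit), cosine(a, b))
```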
235
+
236
+ ## Usage
237
+
238
+ ### Direct Usage (Sentence Transformers)
239
+
240
+ First install the Sentence Transformers library:
241
+
242
+ ```bash
+ pip install -U sentence-transformers
+ ```
245
+
246
+ Then you can load this model and run inference.
247
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("Trelis/modernbert-embed-base-touch-rugby-ft-v2")
+ # Run inference
+ sentences = [
+ 'Who indicates to commence play at the start of a Touch Rugby match?',
+ '6.2\tThe Team coach(s) and Team officials may move from one position to the other \nbut shall do so without delay.While in a position at the end of the Field of Play, \nthe Team coach(s) or Team official must remain no closer than five (5) metres \nfrom the Dead Ball Line and must not coach or communicate (verbal or non-\nverbal) with either Team or the Referees.7\u2002 Commencement and Recommencement of Play \n7.1\tTeam captains are to toss a coin in the presence of the Referee(s) with the \nwinning captain’s Team having the choice of the direction the Team wishes \nto run in the first half; the choice of Interchange Areas for the duration of the \nmatch, including any extra time; and the choice of which team will commence \nthe match in Possession.7.2\tA player of the Attacking Team is to commence the match with a Tap at the \ncentre of the Halfway Line following the indication to commence play from the \nReferee.',
+ 'See Appendix 1.Forced Interchange\nWhen a player is required to undertake a compulsory Interchange for \nan Infringement ruled more serious than a Penalty but less serious \nthan a Permanent Interchange, Sin Bin or Dismissal.Forward\nA position or direction towards the Dead Ball Line beyond the Team’s \nAttacking Try Line.Full Time\nThe expiration of the second period of time allowed for play.Half\nThe player who takes Possession following a Rollball.Half Time\nThe break in play between the two halves of a match.Imminent\nAbout to occur, it is almost certain to occur.Infringement\nThe action of a player contrary to the Rules of the game.In-Goal Area\nThe area in the Field of Play bounded by the Sidelines, the Try Lines \nand the Dead Ball Lines.There are two (2), one (1) at each end of the \nField of Play.See Appendix 1.Interchange\nThe act of an on-field player leaving the Field of Play to be replaced \nby an off-field player entering the Field of Play.',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # [3, 768]
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # [3, 3]
+ ```
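
In the usage example the first entry of `sentences` is a question and the other two are rulebook chunks, so one row of the similarity matrix can be read as retrieval scores for that question. The sketch below shows one way to turn such a row into a ranking. The helper `rank_chunks` and the score values are assumptions made for illustration; real scores would come from something like `similarities[0].tolist()[1:]`.

```python
def rank_chunks(query_scores, top_k=2):
    """Return (chunk_index, score) pairs sorted from most to least similar."""
    order = sorted(range(len(query_scores)), key=lambda i: query_scores[i], reverse=True)
    return [(i, query_scores[i]) for i in order[:top_k]]


# Made-up scores for three candidate chunks (not output from the model).
example_scores = [0.2, 0.9, 0.5]
ranking = rank_chunks(example_scores, top_k=2)
print(ranking)  # [(1, 0.9), (2, 0.5)]
```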
267
+
268
+ <!--
269
+ ### Direct Usage (Transformers)
270
+
271
+ <details><summary>Click to see the direct usage in Transformers</summary>
272
+
273
+ </details>
274
+ -->
275
+
276
+ <!--
277
+ ### Downstream Usage (Sentence Transformers)
278
+
279
+ You can finetune this model on your own dataset.
280
+
281
+ <details><summary>Click to expand</summary>
282
+
283
+ </details>
284
+ -->
285
+
286
+ <!--
287
+ ### Out-of-Scope Use
288
+
289
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
290
+ -->
291
+
292
+ <!--
293
+ ## Bias, Risks and Limitations
294
+
295
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
296
+ -->
297
+
298
+ <!--
299
+ ### Recommendations
300
+
301
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
302
+ -->
303
+
304
+ ## Training Details
305
+
306
+ ### Training Dataset
307
+
308
+ #### touch-rugby-modernbert-pairs
309
+
310
+ * Dataset: [touch-rugby-modernbert-pairs](https://huggingface.co/datasets/Trelis/touch-rugby-modernbert-pairs) at [7cb0ae2](https://huggingface.co/datasets/Trelis/touch-rugby-modernbert-pairs/tree/7cb0ae2222504ad98d6ca368b68a657ba2b33e22)
311
+ * Size: 305 training samples
312
+ * Columns: <code>question</code> and <code>related_chunk</code>
313
+ * Approximate statistics based on the first 305 samples:
314
+ | | question | related_chunk |
+ |:--------|:-----------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------|
+ | type | string | string |
+ | details | <ul><li>min: 10 tokens</li><li>mean: 18.68 tokens</li><li>max: 36 tokens</li></ul> | <ul><li>min: 147 tokens</li><li>mean: 231.42 tokens</li><li>max: 319 tokens</li></ul> |
318
+ * Samples:
319
+ | question | related_chunk |
320
+ |:-----------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
321
+ | <code>When may Onside players of the Defending Team move forward if the Half is not within one metre of the Rollball?</code> | <code>13.10 A player ceases to be the Half once the ball is passed to another player.13.11 Defending players are not to interfere with the performance of the Rollball or the <br>Half.Ruling = A Penalty to the Attacking Team at a point ten (10) metres directly Forward of the <br>Infringement.13.12 Players of the Defending Team must not move Forward of the Onside position <br>until the Half has made contact with the ball, unless directed to do so by the <br>Referee or in accordance with 13.12.1.13.12.1 When the Half is not within one (1) metre of the Rollball, Onside players <br>of the Defending Team may move Forward as soon as the player <br>performing the Rollball releases the ball.If the Half is not in position and <br>a defending player moves Forward and makes contact with the ball, a <br>Change of Possession results.</code> |
322
+ | <code>Besides awarding tries, what other scoring-related task does the Referee perform?</code> | <code>An approach may only be made during a break in play or at <br>the discretion of the Referee.FIT Playing Rules - 5th Edition<br>18<br>COPYRIGHT © Touch Football Australia 2020<br>HALFWAY LINE<br>SIN BIN AREAS<br>IN-GOAL AREA<br>TRY LINE<br>7 M ZONE<br>DEAD BALL LINE<br>PERIMETER<br>INTERCHANGE<br>AREA<br>20M<br>10M<br>10M<br>1M<br>5M<br>7 M<br>7 M<br>7 M<br>7 M<br>50M<br>3M<br>70M<br>INTERCHANGE<br>AREA<br> Appendix 1 – Field of Play<br>FIT Playing Rules - 5th Edition<br>COPYRIGHT © Touch Football Australia 2020<br>19<br>FEDERATION OF INTERNATIONAL TOUCH</code> |
323
+ | <code>What happens if a team has fewer than four players on the field during a match?</code> | <code>FIT Playing Rules - 5th Edition<br>COPYRIGHT © Touch Football Australia 2020<br>7<br>7.6 A Tap may not be taken until at least four (4) defending players are in an Onside <br>position or unless directed to so by the Referee.Where the number of players <br>on the field from the Defending Team falls below four (4), all players must be in <br>an Onside position for a Tap to be taken unless directed to do so by the Referee.Ruling = The Player will be directed to return to the Mark and to take the Tap again.7.7 The Tap to commence or recommence play must be performed without delay.Ruling = A Penalty to the non-offending team at the centre of the Halfway line.8  Match Duration <br> <br>8.1 A match is 40 minutes in duration, consisting of two (2) x 20 minute halves with <br>a Half Time break.8.1.1 There is no time off for injury during a match.8.2 Local competition and tournament conditions may vary the duration of a match.</code> |
324
+ * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
325
+ ```json
+ {
+     "scale": 20.0,
+     "similarity_fct": "cos_sim"
+ }
+ ```
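
The two parameters above can be read as follows: for a batch of (question, related_chunk) pairs, every question is scored against every chunk in the batch with cosine similarity, the scores are multiplied by `scale`, and a cross-entropy loss rewards the question's own chunk over the other chunks, which act as in-batch negatives. The sketch below is a plain-Python illustration of that computation on toy 2-dimensional vectors; the function names and data are assumptions, and it is not the library's implementation.

```python
import math


def cos_sim(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mnr_loss(questions, chunks, scale=20.0):
    """Mean cross-entropy where chunk i is the positive for question i."""
    losses = []
    for i, question in enumerate(questions):
        logits = [scale * cos_sim(question, chunk) for chunk in chunks]
        # Numerically stable log-sum-exp over all chunks in the batch.
        top = max(logits)
        log_sum_exp = top + math.log(sum(math.exp(value - top) for value in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)


# Hypothetical embeddings: each question matches its own chunk exactly ...
aligned = mnr_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
# ... versus each question matching the other pair's chunk.
shuffled = mnr_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned, shuffled)
```

With well-aligned pairs the loss is close to zero, while mismatched pairs are penalized by roughly the value of `scale`.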
331
+
332
+ ### Evaluation Dataset
333
+
334
+ #### touch-rugby-modernbert-pairs
335
+
336
+ * Dataset: [touch-rugby-modernbert-pairs](https://huggingface.co/datasets/Trelis/touch-rugby-modernbert-pairs) at [7cb0ae2](https://huggingface.co/datasets/Trelis/touch-rugby-modernbert-pairs/tree/7cb0ae2222504ad98d6ca368b68a657ba2b33e22)
337
+ * Size: 305 evaluation samples
338
+ * Columns: <code>question</code> and <code>related_chunk</code>
339
+ * Approximate statistics based on the first 305 samples:
340
+ | | question | related_chunk |
+ |:--------|:-----------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------|
+ | type | string | string |
+ | details | <ul><li>min: 11 tokens</li><li>mean: 18.06 tokens</li><li>max: 32 tokens</li></ul> | <ul><li>min: 173 tokens</li><li>mean: 228.39 tokens</li><li>max: 260 tokens</li></ul> |
344
+ * Samples:
345
+ | question | related_chunk |
346
+ |:-----------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
347
+ | <code>What is the definition of the 'Defending Team' in Touch Rugby Rules 5th Edition?</code> | <code>Except as permitted under the <br>Copyright Act, these Rules must not be reproduced by any process, electronic or otherwise, without the written <br>permission of Touch Football Australia.Attacking Try Line<br>The line on or over which a player has to place the ball to <br>score a Try.Attacking Team<br>The Team which has or is gaining Possession.Behind<br>A position or direction towards a Team’s Defending Try Line.Change of Possession<br>The act of moving control of the ball from one Team to the other.Dead/Dead Ball<br>When the ball is out of play including the period following a Try and <br>until the match is recommenced and when the ball goes to ground <br>and/or outside the boundaries of the Field of Play prior to the <br>subsequent Rollball.Dead Ball Line<br>The end boundaries of the Field of Play.There is one at each end of <br>the Field of Play.See Appendix 1.Defending Try Line<br>The line which a Team has to defend to prevent a Try.Defending Team<br>The Team without or which is losing Possession.</code> |
348
+ | <code>What is the minimum number of players required on the field for a touch rugby match to begin or continue?</code> | <code>FIT Playing Rules - 5th Edition<br>COPYRIGHT © Touch Football Australia 2020<br>7<br>7.6 A Tap may not be taken until at least four (4) defending players are in an Onside <br>position or unless directed to so by the Referee.Where the number of players <br>on the field from the Defending Team falls below four (4), all players must be in <br>an Onside position for a Tap to be taken unless directed to do so by the Referee.Ruling = The Player will be directed to return to the Mark and to take the Tap again.7.7 The Tap to commence or recommence play must be performed without delay.Ruling = A Penalty to the non-offending team at the centre of the Halfway line.8  Match Duration <br> <br>8.1 A match is 40 minutes in duration, consisting of two (2) x 20 minute halves with <br>a Half Time break.8.1.1 There is no time off for injury during a match.8.2 Local competition and tournament conditions may vary the duration of a match.</code> |
+ | <code>What are the possible outcomes of a Referee's Ruling?</code> | <code>See Appendix 1.Forced Interchange<br>When a player is required to undertake a compulsory Interchange for <br>an Infringement ruled more serious than a Penalty but less serious <br>than a Permanent Interchange, Sin Bin or Dismissal.Forward<br>A position or direction towards the Dead Ball Line beyond the Team’s <br>Attacking Try Line.Full Time<br>The expiration of the second period of time allowed for play.Half<br>The player who takes Possession following a Rollball.Half Time<br>The break in play between the two halves of a match.Imminent<br>About to occur, it is almost certain to occur.Infringement<br>The action of a player contrary to the Rules of the game.In-Goal Area<br>The area in the Field of Play bounded by the Sidelines, the Try Lines <br>and the Dead Ball Lines.There are two (2), one (1) at each end of the <br>Field of Play.See Appendix 1.Interchange<br>The act of an on-field player leaving the Field of Play to be replaced <br>by an off-field player entering the Field of Play.</code> |
+ * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
+ ```json
+ {
+     "scale": 20.0,
+     "similarity_fct": "cos_sim"
+ }
+ ```
+
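The loss parameters above can be read as follows: each anchor is scored against every positive in the batch with cosine similarity, the scores are multiplied by `scale`, and cross-entropy is taken with the matching positive as the target. The following is a minimal pure-Python sketch of that computation, not the library's implementation; the function names `cos_sim` and `mnr_loss` are assumptions made for illustration.

```python
import math


def cos_sim(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mnr_loss(anchors, positives, scale=20.0):
    """In-batch ranking loss: positives[i] is the target for anchors[i],
    and every other positive in the batch acts as a negative."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, p) for p in positives]
        # Numerically stable log-sum-exp over the scaled similarities.
        top = max(logits)
        log_sum_exp = top + math.log(sum(math.exp(v - top) for v in logits))
        # Cross-entropy with the i-th candidate as the correct class.
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)
```

When every similarity in a batch is equal, this quantity reduces to ln(n) for a batch of n pairs, which gives a reference point when reading logged loss values.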
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+
+ - `eval_strategy`: steps
+ - `per_device_train_batch_size`: 32
+ - `per_device_eval_batch_size`: 32
+ - `learning_rate`: 5e-06
+ - `num_train_epochs`: 1
+ - `lr_scheduler_type`: constant
+ - `warmup_ratio`: 0.3
+
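The combination of `lr_scheduler_type: constant` and `warmup_ratio: 0.3` deserves a closer look. This sketch assumes that the `constant` scheduler in Transformers holds the learning rate fixed from the first step and that warmup is only applied by the separate `constant_with_warmup` type; under that assumption the warmup ratio would have no effect in this run. The helper names and the total of 9 optimizer steps (inferred from step 2 landing at epoch 0.2222 in the training logs) are illustrative assumptions, not values stated in the card.

```python
import math

BASE_LR = 5e-06        # `learning_rate` from the list above
WARMUP_RATIO = 0.3     # `warmup_ratio` from the list above
TOTAL_STEPS = 9        # assumption: inferred from the logs, not stated in the card


def constant_lr(step: int) -> float:
    """Constant schedule: the multiplier is 1.0 at every step (no warmup)."""
    return BASE_LR * 1.0


def constant_with_warmup_lr(step: int, num_warmup_steps: int) -> float:
    """Linear ramp from 0 to the base rate, then constant."""
    if step < num_warmup_steps:
        return BASE_LR * step / max(1, num_warmup_steps)
    return BASE_LR


# Warmup length if the ratio were applied, rounded up to whole steps.
num_warmup_steps = math.ceil(TOTAL_STEPS * WARMUP_RATIO)

constant_schedule = [constant_lr(s) for s in range(TOTAL_STEPS)]
warmup_schedule = [
    constant_with_warmup_lr(s, num_warmup_steps) for s in range(TOTAL_STEPS)
]
```

Comparing the two lists shows where they would differ: only in the first few steps, where the warmup variant ramps up from zero.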
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: steps
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 32
+ - `per_device_eval_batch_size`: 32
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 1
+ - `eval_accumulation_steps`: None
+ - `torch_empty_cache_steps`: None
+ - `learning_rate`: 5e-06
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1.0
+ - `num_train_epochs`: 1
+ - `max_steps`: -1
+ - `lr_scheduler_type`: constant
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.3
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: False
+ - `fp16`: False
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: False
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: None
+ - `hub_always_push`: False
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `include_for_metrics`: []
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`:
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `dispatch_batches`: None
+ - `split_batches`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `eval_on_start`: False
+ - `use_liger_kernel`: False
+ - `eval_use_gather_object`: False
+ - `average_tokens_across_devices`: False
+ - `prompts`: None
+ - `batch_sampler`: batch_sampler
+ - `multi_dataset_batch_sampler`: proportional
+
+ </details>
+
+ ### Training Logs
+ | Epoch  | Step | Training Loss | Validation Loss |
+ |:------:|:----:|:-------------:|:---------------:|
+ | 0.2222 | 2    | 2.8177        | 2.5945          |
+ | 0.4444 | 4    | 2.9155        | 2.5693          |
+ | 0.6667 | 6    | 2.9114        | 2.5402          |
+ | 0.8889 | 8    | 2.7999        | 2.5098          |
+
+
+ ### Framework Versions
+ - Python: 3.12.4
+ - Sentence Transformers: 3.3.1
+ - Transformers: 4.48.0
+ - PyTorch: 2.5.1
+ - Accelerate: 1.3.0
+ - Datasets: 2.17.1
+ - Tokenizers: 0.21.0
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ #### MultipleNegativesRankingLoss
+ ```bibtex
+ @misc{henderson2017efficient,
+     title={Efficient Natural Language Response Suggestion for Smart Reply},
+     author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
+     year={2017},
+     eprint={1705.00652},
+     archivePrefix={arXiv},
+     primaryClass={cs.CL}
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,47 @@
+ {
+   "_name_or_path": "nomic-ai/modernbert-embed-base",
+   "architectures": [
+     "ModernBertModel"
+   ],
+   "attention_bias": false,
+   "attention_dropout": 0.0,
+   "bos_token_id": 50281,
+   "classifier_activation": "gelu",
+   "classifier_bias": false,
+   "classifier_dropout": 0.0,
+   "classifier_pooling": "mean",
+   "cls_token_id": 50281,
+   "decoder_bias": true,
+   "deterministic_flash_attn": false,
+   "embedding_dropout": 0.0,
+   "eos_token_id": 50282,
+   "global_attn_every_n_layers": 3,
+   "global_rope_theta": 160000.0,
+   "gradient_checkpointing": false,
+   "hidden_activation": "gelu",
+   "hidden_size": 768,
+   "initializer_cutoff_factor": 2.0,
+   "initializer_range": 0.02,
+   "intermediate_size": 1152,
+   "layer_norm_eps": 1e-05,
+   "local_attention": 128,
+   "local_rope_theta": 10000.0,
+   "max_position_embeddings": 8192,
+   "mlp_bias": false,
+   "mlp_dropout": 0.0,
+   "model_type": "modernbert",
+   "norm_bias": false,
+   "norm_eps": 1e-05,
+   "num_attention_heads": 12,
+   "num_hidden_layers": 22,
+   "pad_token_id": 50283,
+   "position_embedding_type": "absolute",
+   "reference_compile": false,
+   "repad_logits_with_grad": false,
+   "sep_token_id": 50282,
+   "sparse_pred_ignore_index": -100,
+   "sparse_prediction": false,
+   "torch_dtype": "float32",
+   "transformers_version": "4.48.0",
+   "vocab_size": 50368
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "__version__": {
+     "sentence_transformers": "3.3.1",
+     "transformers": "4.48.0",
+     "pytorch": "2.5.1"
+   },
+   "prompts": {},
+   "default_prompt_name": null,
+   "similarity_fn_name": "cosine"
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fe1d23cbe9f9338b560e594fd47a556db4c321c8cb719c38a3b2a0db54660fab
+ size 596070136
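As a rough sanity check on the pointer file above, the checkpoint size can be turned into an approximate parameter count. This sketch assumes the weights are stored as 4-byte `float32` values (as the `torch_dtype` field in `config.json` suggests) and ignores the small safetensors header; the variable names are illustrative.

```python
SIZE_BYTES = 596070136      # `size` from the Git LFS pointer
BYTES_PER_FLOAT32 = 4       # assumption: all tensors are float32

# Upper bound on the parameter count; the file header takes a few of these bytes.
approx_params = SIZE_BYTES // BYTES_PER_FLOAT32
approx_millions = round(approx_params / 1e6)
```

Under those assumptions the file holds roughly 149 million parameters.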
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   },
+   {
+     "idx": 2,
+     "name": "2",
+     "path": "2_Normalize",
+     "type": "sentence_transformers.models.Normalize"
+   }
+ ]
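The three modules listed above run in order: the transformer produces one vector per token, the pooling module reduces them to a single vector, and the normalize module rescales it to unit length. Below is a minimal pure-Python sketch of the last two stages, assuming mean pooling over non-padding tokens (as set in `1_Pooling/config.json`); the function names and toy values are illustrative, and the real modules operate on batched tensors.

```python
import math


def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose attention-mask entry is 1."""
    dim = len(token_embeddings[0])
    kept = [vec for vec, keep in zip(token_embeddings, attention_mask) if keep]
    return [sum(vec[d] for vec in kept) / len(kept) for d in range(dim)]


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


# Toy input: three tokens with 2-dimensional embeddings; the last one is padding.
tokens = [[1.0, 2.0], [5.0, 6.0], [9.0, 9.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)     # mean of the first two token vectors
embedding = l2_normalize(pooled)     # unit-length sentence embedding
```

Because the output has unit length, the cosine similarity named in `config_sentence_transformers.json` equals a plain dot product between two embeddings.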
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 8192,
+   "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "cls_token": {
+     "content": "[CLS]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "[MASK]",
+     "lstrip": true,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "[PAD]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "[SEP]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,945 @@
1
+ {
2
+ "added_tokens_decoder": {
3
+ "0": {
4
+ "content": "|||IP_ADDRESS|||",
5
+ "lstrip": false,
6
+ "normalized": true,
7
+ "rstrip": false,
8
+ "single_word": false,
9
+ "special": false
10
+ },
11
+ "1": {
12
+ "content": "<|padding|>",
13
+ "lstrip": false,
14
+ "normalized": false,
15
+ "rstrip": false,
16
+ "single_word": false,
17
+ "special": true
18
+ },
19
+ "50254": {
20
+ "content": " ",
21
+ "lstrip": false,
22
+ "normalized": true,
23
+ "rstrip": false,
24
+ "single_word": false,
25
+ "special": false
26
+ },
27
+ "50255": {
28
+ "content": " ",
29
+ "lstrip": false,
30
+ "normalized": true,
31
+ "rstrip": false,
32
+ "single_word": false,
33
+ "special": false
34
+ },
35
+ "50256": {
36
+ "content": " ",
37
+ "lstrip": false,
38
+ "normalized": true,
39
+ "rstrip": false,
40
+ "single_word": false,
41
+ "special": false
42
+ },
43
+ "50257": {
44
+ "content": " ",
45
+ "lstrip": false,
46
+ "normalized": true,
47
+ "rstrip": false,
48
+ "single_word": false,
49
+ "special": false
50
+ },
51
+ "50258": {
52
+ "content": " ",
53
+ "lstrip": false,
54
+ "normalized": true,
55
+ "rstrip": false,
56
+ "single_word": false,
57
+ "special": false
58
+ },
59
+ "50259": {
60
+ "content": " ",
61
+ "lstrip": false,
62
+ "normalized": true,
63
+ "rstrip": false,
64
+ "single_word": false,
65
+ "special": false
66
+ },
67
+ "50260": {
68
+ "content": " ",
69
+ "lstrip": false,
70
+ "normalized": true,
71
+ "rstrip": false,
72
+ "single_word": false,
73
+ "special": false
74
+ },
75
+ "50261": {
76
+ "content": " ",
77
+ "lstrip": false,
78
+ "normalized": true,
79
+ "rstrip": false,
80
+ "single_word": false,
81
+ "special": false
82
+ },
83
+ "50262": {
84
+ "content": " ",
85
+ "lstrip": false,
86
+ "normalized": true,
87
+ "rstrip": false,
88
+ "single_word": false,
89
+ "special": false
90
+ },
91
+ "50263": {
92
+ "content": " ",
93
+ "lstrip": false,
94
+ "normalized": true,
95
+ "rstrip": false,
96
+ "single_word": false,
97
+ "special": false
98
+ },
99
+ "50264": {
100
+ "content": " ",
101
+ "lstrip": false,
102
+ "normalized": true,
103
+ "rstrip": false,
104
+ "single_word": false,
105
+ "special": false
106
+ },
107
+ "50265": {
108
+ "content": " ",
109
+ "lstrip": false,
110
+ "normalized": true,
111
+ "rstrip": false,
112
+ "single_word": false,
113
+ "special": false
114
+ },
115
+ "50266": {
116
+ "content": " ",
117
+ "lstrip": false,
118
+ "normalized": true,
119
+ "rstrip": false,
120
+ "single_word": false,
121
+ "special": false
122
+ },
123
+ "50267": {
124
+ "content": " ",
125
+ "lstrip": false,
126
+ "normalized": true,
127
+ "rstrip": false,
128
+ "single_word": false,
129
+ "special": false
130
+ },
131
+ "50268": {
132
+ "content": " ",
133
+ "lstrip": false,
134
+ "normalized": true,
135
+ "rstrip": false,
136
+ "single_word": false,
137
+ "special": false
138
+ },
139
+ "50269": {
140
+ "content": " ",
141
+ "lstrip": false,
142
+ "normalized": true,
143
+ "rstrip": false,
144
+ "single_word": false,
145
+ "special": false
146
+ },
147
+ "50270": {
148
+ "content": " ",
149
+ "lstrip": false,
150
+ "normalized": true,
151
+ "rstrip": false,
152
+ "single_word": false,
153
+ "special": false
154
+ },
155
+ "50271": {
156
+ "content": " ",
157
+ "lstrip": false,
158
+ "normalized": true,
159
+ "rstrip": false,
160
+ "single_word": false,
161
+ "special": false
162
+ },
163
+ "50272": {
164
+ "content": " ",
165
+ "lstrip": false,
166
+ "normalized": true,
167
+ "rstrip": false,
168
+ "single_word": false,
169
+ "special": false
170
+ },
171
+ "50273": {
172
+ "content": " ",
173
+ "lstrip": false,
174
+ "normalized": true,
175
+ "rstrip": false,
176
+ "single_word": false,
177
+ "special": false
178
+ },
179
+ "50274": {
180
+ "content": " ",
181
+ "lstrip": false,
182
+ "normalized": true,
183
+ "rstrip": false,
184
+ "single_word": false,
185
+ "special": false
186
+ },
187
+ "50275": {
188
+ "content": " ",
189
+ "lstrip": false,
190
+ "normalized": true,
191
+ "rstrip": false,
192
+ "single_word": false,
193
+ "special": false
194
+ },
195
+ "50276": {
196
+ "content": " ",
197
+ "lstrip": false,
198
+ "normalized": true,
199
+ "rstrip": false,
200
+ "single_word": false,
201
+ "special": false
202
+ },
203
+ "50277": {
204
+ "content": "|||EMAIL_ADDRESS|||",
205
+ "lstrip": false,
206
+ "normalized": true,
207
+ "rstrip": false,
208
+ "single_word": false,
209
+ "special": false
210
+ },
211
+ "50278": {
212
+ "content": "|||PHONE_NUMBER|||",
213
+ "lstrip": false,
214
+ "normalized": true,
215
+ "rstrip": false,
216
+ "single_word": false,
217
+ "special": false
218
+ },
219
+ "50279": {
220
+ "content": "<|endoftext|>",
221
+ "lstrip": false,
222
+ "normalized": false,
223
+ "rstrip": false,
224
+ "single_word": false,
225
+ "special": true
226
+ },
227
+ "50280": {
228
+ "content": "[UNK]",
229
+ "lstrip": false,
230
+ "normalized": false,
231
+ "rstrip": false,
232
+ "single_word": false,
233
+ "special": true
234
+ },
235
+ "50281": {
236
+ "content": "[CLS]",
237
+ "lstrip": false,
238
+ "normalized": false,
239
+ "rstrip": false,
240
+ "single_word": false,
241
+ "special": true
242
+ },
243
+ "50282": {
244
+ "content": "[SEP]",
245
+ "lstrip": false,
246
+ "normalized": false,
247
+ "rstrip": false,
248
+ "single_word": false,
249
+ "special": true
250
+ },
251
+ "50283": {
252
+ "content": "[PAD]",
253
+ "lstrip": false,
254
+ "normalized": false,
255
+ "rstrip": false,
256
+ "single_word": false,
257
+ "special": true
258
+ },
259
+ "50284": {
260
+ "content": "[MASK]",
261
+ "lstrip": true,
262
+ "normalized": false,
263
+ "rstrip": false,
264
+ "single_word": false,
265
+ "special": true
266
+ },
267
+ "50285": {
268
+ "content": "[unused0]",
269
+ "lstrip": false,
270
+ "normalized": true,
271
+ "rstrip": false,
272
+ "single_word": false,
273
+ "special": false
274
+ },
275
+ "50286": {
276
+ "content": "[unused1]",
277
+ "lstrip": false,
278
+ "normalized": true,
279
+ "rstrip": false,
280
+ "single_word": false,
281
+ "special": false
282
+ },
283
+ "50287": {
284
+ "content": "[unused2]",
285
+ "lstrip": false,
286
+ "normalized": true,
287
+ "rstrip": false,
288
+ "single_word": false,
289
+ "special": false
290
+ },
291
+ "50288": {
292
+ "content": "[unused3]",
293
+ "lstrip": false,
294
+ "normalized": true,
295
+ "rstrip": false,
296
+ "single_word": false,
297
+ "special": false
298
+ },
299
+ "50289": {
300
+ "content": "[unused4]",
301
+ "lstrip": false,
302
+ "normalized": true,
303
+ "rstrip": false,
304
+ "single_word": false,
305
+ "special": false
306
+ },
307
+ "50290": {
308
+ "content": "[unused5]",
309
+ "lstrip": false,
310
+ "normalized": true,
311
+ "rstrip": false,
312
+ "single_word": false,
313
+ "special": false
314
+ },
315
+ "50291": {
316
+ "content": "[unused6]",
317
+ "lstrip": false,
318
+ "normalized": true,
319
+ "rstrip": false,
320
+ "single_word": false,
321
+ "special": false
322
+ },
323
+ "50292": {
324
+ "content": "[unused7]",
325
+ "lstrip": false,
326
+ "normalized": true,
327
+ "rstrip": false,
328
+ "single_word": false,
329
+ "special": false
330
+ },
331
+ "50293": {
332
+ "content": "[unused8]",
333
+ "lstrip": false,
334
+ "normalized": true,
335
+ "rstrip": false,
336
+ "single_word": false,
337
+ "special": false
338
+ },
339
+ "50294": {
340
+ "content": "[unused9]",
341
+ "lstrip": false,
342
+ "normalized": true,
343
+ "rstrip": false,
344
+ "single_word": false,
345
+ "special": false
346
+ },
347
+ "50295": {
348
+ "content": "[unused10]",
349
+ "lstrip": false,
350
+ "normalized": true,
351
+ "rstrip": false,
352
+ "single_word": false,
353
+ "special": false
354
+ },
355
+ "50296": {
356
+ "content": "[unused11]",
357
+ "lstrip": false,
358
+ "normalized": true,
359
+ "rstrip": false,
360
+ "single_word": false,
361
+ "special": false
362
+ },
363
+ "50297": {
364
+ "content": "[unused12]",
365
+ "lstrip": false,
366
+ "normalized": true,
367
+ "rstrip": false,
368
+ "single_word": false,
369
+ "special": false
370
+ },
371
+ "50298": {
372
+ "content": "[unused13]",
373
+ "lstrip": false,
374
+ "normalized": true,
375
+ "rstrip": false,
376
+ "single_word": false,
377
+ "special": false
378
+ },
379
+ "50299": {
380
+ "content": "[unused14]",
381
+ "lstrip": false,
382
+ "normalized": true,
383
+ "rstrip": false,
384
+ "single_word": false,
385
+ "special": false
386
+ },
387
+ "50300": {
388
+ "content": "[unused15]",
389
+ "lstrip": false,
390
+ "normalized": true,
391
+ "rstrip": false,
392
+ "single_word": false,
393
+ "special": false
394
+ },
395
+ "50301": {
396
+ "content": "[unused16]",
397
+ "lstrip": false,
398
+ "normalized": true,
399
+ "rstrip": false,
400
+ "single_word": false,
401
+ "special": false
402
+ },
403
+ "50302": {
404
+ "content": "[unused17]",
405
+ "lstrip": false,
406
+ "normalized": true,
407
+ "rstrip": false,
408
+ "single_word": false,
409
+ "special": false
410
+ },
411
+ "50303": {
412
+ "content": "[unused18]",
413
+ "lstrip": false,
414
+ "normalized": true,
415
+ "rstrip": false,
416
+ "single_word": false,
417
+ "special": false
418
+ },
419
+ "50304": {
420
+ "content": "[unused19]",
421
+ "lstrip": false,
422
+ "normalized": true,
423
+ "rstrip": false,
424
+ "single_word": false,
425
+ "special": false
426
+ },
427
+ "50305": {
428
+ "content": "[unused20]",
429
+ "lstrip": false,
430
+ "normalized": true,
431
+ "rstrip": false,
432
+ "single_word": false,
433
+ "special": false
434
+ },
435
+ "50306": {
436
+ "content": "[unused21]",
437
+ "lstrip": false,
438
+ "normalized": true,
439
+ "rstrip": false,
440
+ "single_word": false,
441
+ "special": false
442
+ },
443
+ "50307": {
444
+ "content": "[unused22]",
445
+ "lstrip": false,
446
+ "normalized": true,
447
+ "rstrip": false,
448
+ "single_word": false,
449
+ "special": false
450
+ },
451
+ "50308": {
452
+ "content": "[unused23]",
453
+ "lstrip": false,
454
+ "normalized": true,
455
+ "rstrip": false,
456
+ "single_word": false,
457
+ "special": false
458
+ },
459
+ "50309": {
460
+ "content": "[unused24]",
461
+ "lstrip": false,
462
+ "normalized": true,
463
+ "rstrip": false,
464
+ "single_word": false,
465
+ "special": false
466
+ },
467
+ "50310": {
468
+ "content": "[unused25]",
469
+ "lstrip": false,
470
+ "normalized": true,
471
+ "rstrip": false,
472
+ "single_word": false,
473
+ "special": false
474
+ },
475
+ "50311": {
476
+ "content": "[unused26]",
477
+ "lstrip": false,
478
+ "normalized": true,
479
+ "rstrip": false,
480
+ "single_word": false,
481
+ "special": false
482
+ },
483
+ "50312": {
484
+ "content": "[unused27]",
485
+ "lstrip": false,
486
+ "normalized": true,
487
+ "rstrip": false,
488
+ "single_word": false,
489
+ "special": false
490
+ },
491
+ "50313": {
492
+ "content": "[unused28]",
493
+ "lstrip": false,
494
+ "normalized": true,
495
+ "rstrip": false,
496
+ "single_word": false,
497
+ "special": false
498
+ },
499
+ "50314": {
500
+ "content": "[unused29]",
501
+ "lstrip": false,
502
+ "normalized": true,
503
+ "rstrip": false,
504
+ "single_word": false,
505
+ "special": false
506
+ },
507
+ "50315": {
508
+ "content": "[unused30]",
509
+ "lstrip": false,
510
+ "normalized": true,
511
+ "rstrip": false,
512
+ "single_word": false,
513
+ "special": false
514
+ },
515
+ "50316": {
516
+ "content": "[unused31]",
517
+ "lstrip": false,
518
+ "normalized": true,
519
+ "rstrip": false,
520
+ "single_word": false,
521
+ "special": false
522
+ },
523
+ "50317": {
524
+ "content": "[unused32]",
525
+ "lstrip": false,
526
+ "normalized": true,
527
+ "rstrip": false,
528
+ "single_word": false,
529
+ "special": false
530
+ },
531
+ "50318": {
532
+ "content": "[unused33]",
533
+ "lstrip": false,
534
+ "normalized": true,
535
+ "rstrip": false,
536
+ "single_word": false,
537
+ "special": false
538
+ },
539
+ "50319": {
540
+ "content": "[unused34]",
541
+ "lstrip": false,
542
+ "normalized": true,
543
+ "rstrip": false,
544
+ "single_word": false,
545
+ "special": false
546
+ },
547
+ "50320": {
548
+ "content": "[unused35]",
549
+ "lstrip": false,
550
+ "normalized": true,
551
+ "rstrip": false,
552
+ "single_word": false,
553
+ "special": false
554
+ },
555
+ "50321": {
556
+ "content": "[unused36]",
557
+ "lstrip": false,
558
+ "normalized": true,
559
+ "rstrip": false,
560
+ "single_word": false,
561
+ "special": false
562
+ },
563
+ "50322": {
564
+ "content": "[unused37]",
565
+ "lstrip": false,
566
+ "normalized": true,
567
+ "rstrip": false,
568
+ "single_word": false,
569
+ "special": false
570
+ },
571
+ "50323": {
572
+ "content": "[unused38]",
573
+ "lstrip": false,
574
+ "normalized": true,
575
+ "rstrip": false,
576
+ "single_word": false,
577
+ "special": false
578
+ },
579
+ "50324": {
580
+ "content": "[unused39]",
581
+ "lstrip": false,
582
+ "normalized": true,
583
+ "rstrip": false,
584
+ "single_word": false,
585
+ "special": false
586
+ },
587
+ "50325": {
588
+ "content": "[unused40]",
589
+ "lstrip": false,
590
+ "normalized": true,
591
+ "rstrip": false,
592
+ "single_word": false,
593
+ "special": false
594
+ },
595
+ "50326": {
596
+ "content": "[unused41]",
597
+ "lstrip": false,
598
+ "normalized": true,
599
+ "rstrip": false,
600
+ "single_word": false,
601
+ "special": false
602
+ },
603
+ "50327": {
604
+ "content": "[unused42]",
605
+ "lstrip": false,
606
+ "normalized": true,
607
+ "rstrip": false,
608
+ "single_word": false,
609
+ "special": false
610
+ },
611
+ "50328": {
612
+ "content": "[unused43]",
613
+ "lstrip": false,
614
+ "normalized": true,
615
+ "rstrip": false,
616
+ "single_word": false,
617
+ "special": false
618
+ },
619
+ "50329": {
620
+ "content": "[unused44]",
621
+ "lstrip": false,
622
+ "normalized": true,
623
+ "rstrip": false,
624
+ "single_word": false,
625
+ "special": false
626
+ },
627
+ "50330": {
628
+ "content": "[unused45]",
629
+ "lstrip": false,
630
+ "normalized": true,
631
+ "rstrip": false,
632
+ "single_word": false,
633
+ "special": false
634
+ },
635
+ "50331": {
636
+ "content": "[unused46]",
637
+ "lstrip": false,
638
+ "normalized": true,
639
+ "rstrip": false,
640
+ "single_word": false,
641
+ "special": false
642
+ },
643
+ "50332": {
644
+ "content": "[unused47]",
645
+ "lstrip": false,
646
+ "normalized": true,
647
+ "rstrip": false,
648
+ "single_word": false,
649
+ "special": false
650
+ },
651
+ "50333": {
652
+ "content": "[unused48]",
653
+ "lstrip": false,
654
+ "normalized": true,
655
+ "rstrip": false,
656
+ "single_word": false,
657
+ "special": false
658
+ },
659
+ "50334": {
660
+ "content": "[unused49]",
661
+ "lstrip": false,
662
+ "normalized": true,
663
+ "rstrip": false,
664
+ "single_word": false,
665
+ "special": false
666
+ },
667
+ "50335": {
668
+ "content": "[unused50]",
669
+ "lstrip": false,
670
+ "normalized": true,
671
+ "rstrip": false,
672
+ "single_word": false,
673
+ "special": false
674
+ },
675
+ "50336": {
676
+ "content": "[unused51]",
677
+ "lstrip": false,
678
+ "normalized": true,
679
+ "rstrip": false,
680
+ "single_word": false,
681
+ "special": false
682
+ },
683
+ "50337": {
684
+ "content": "[unused52]",
685
+ "lstrip": false,
686
+ "normalized": true,
687
+ "rstrip": false,
688
+ "single_word": false,
689
+ "special": false
690
+ },
691
+ "50338": {
692
+ "content": "[unused53]",
693
+ "lstrip": false,
694
+ "normalized": true,
695
+ "rstrip": false,
696
+ "single_word": false,
697
+ "special": false
698
+ },
699
+ "50339": {
700
+ "content": "[unused54]",
701
+ "lstrip": false,
702
+ "normalized": true,
703
+ "rstrip": false,
704
+ "single_word": false,
705
+ "special": false
706
+ },
707
+ "50340": {
708
+ "content": "[unused55]",
709
+ "lstrip": false,
710
+ "normalized": true,
711
+ "rstrip": false,
712
+ "single_word": false,
713
+ "special": false
714
+ },
715
+ "50341": {
716
+ "content": "[unused56]",
717
+ "lstrip": false,
718
+ "normalized": true,
719
+ "rstrip": false,
720
+ "single_word": false,
721
+ "special": false
722
+ },
723
+ "50342": {
724
+ "content": "[unused57]",
725
+ "lstrip": false,
726
+ "normalized": true,
727
+ "rstrip": false,
728
+ "single_word": false,
729
+ "special": false
730
+ },
731
+ "50343": {
732
+ "content": "[unused58]",
733
+ "lstrip": false,
734
+ "normalized": true,
735
+ "rstrip": false,
736
+ "single_word": false,
737
+ "special": false
738
+ },
739
+ "50344": {
740
+ "content": "[unused59]",
741
+ "lstrip": false,
742
+ "normalized": true,
743
+ "rstrip": false,
744
+ "single_word": false,
745
+ "special": false
746
+ },
+ "50345": {
+ "content": "[unused60]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50346": {
+ "content": "[unused61]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50347": {
+ "content": "[unused62]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50348": {
+ "content": "[unused63]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50349": {
+ "content": "[unused64]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50350": {
+ "content": "[unused65]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50351": {
+ "content": "[unused66]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50352": {
+ "content": "[unused67]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50353": {
+ "content": "[unused68]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50354": {
+ "content": "[unused69]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50355": {
+ "content": "[unused70]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50356": {
+ "content": "[unused71]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50357": {
+ "content": "[unused72]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50358": {
+ "content": "[unused73]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50359": {
+ "content": "[unused74]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50360": {
+ "content": "[unused75]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50361": {
+ "content": "[unused76]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50362": {
+ "content": "[unused77]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50363": {
+ "content": "[unused78]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50364": {
+ "content": "[unused79]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50365": {
+ "content": "[unused80]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50366": {
+ "content": "[unused81]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50367": {
+ "content": "[unused82]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ }
+ },
+ "clean_up_tokenization_spaces": true,
+ "cls_token": "[CLS]",
+ "extra_special_tokens": {},
+ "mask_token": "[MASK]",
+ "model_input_names": [
+ "input_ids",
+ "attention_mask"
+ ],
+ "model_max_length": 8192,
+ "pad_token": "[PAD]",
+ "sep_token": "[SEP]",
+ "tokenizer_class": "PreTrainedTokenizerFast",
+ "unk_token": "[UNK]"
+ }