heegyu committed on
Commit 83acb85 · 1 Parent(s): 9574aef

Train an 8k tokenizer using only 1% of the Korean Wikipedia, Namuwiki, and AIHub colloquial web data
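The commit message describes training on a 1% sample of the corpora. As a rough illustration only, the sketch below shows one way such a run could look; it is not taken from this repository. The helper names `sample_lines` and `train_8k_tokenizer`, the vocabulary size of 8000 (the commit only says "8k"), the per-line Bernoulli sampling, and the `<|endoftext|>` special token are all assumptions. The training step uses `ByteLevelBPETokenizer` from the `tokenizers` library, whose `save_model` writes the `vocab.json` and `merges.txt` pair changed in this commit.

```python
import random
from typing import Iterable, Iterator


def sample_lines(lines: Iterable[str], fraction: float = 0.01, seed: int = 42) -> Iterator[str]:
    """Yield roughly `fraction` of the non-empty lines (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    for line in lines:
        line = line.strip()
        if line and rng.random() < fraction:
            yield line


def train_8k_tokenizer(lines: Iterable[str], out_dir: str, vocab_size: int = 8000) -> None:
    """Train a byte-level BPE tokenizer on a 1% sample (hypothetical helper)."""
    # Imported lazily so sample_lines works without the library installed.
    from tokenizers import ByteLevelBPETokenizer

    tokenizer = ByteLevelBPETokenizer()
    tokenizer.train_from_iterator(
        sample_lines(lines),
        vocab_size=vocab_size,  # assumption: "8k" read as 8000
        special_tokens=["<|endoftext|>"],  # assumption: GPT-2 style token
    )
    # Writes vocab.json and merges.txt; out_dir must already exist.
    tokenizer.save_model(out_dir)
```

Sampling line by line keeps memory use flat on large dumps, at the cost of the sample size being only approximately 1%.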

Files changed (3)
  1. merges.txt +0 -0
  2. tokenizer_config.json +1 -1
  3. vocab.json +0 -0
merges.txt CHANGED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json CHANGED
@@ -18,7 +18,7 @@
     "single_word": false
   },
   "errors": "replace",
-  "name_or_path": "tokenizer-8k-kowiki",
+  "name_or_path": "tokenizer-8k",
   "pad_token": null,
   "special_tokens_map_file": null,
   "tokenizer_class": "GPT2Tokenizer",
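
The only rendered change in `tokenizer_config.json` is the `name_or_path` value. As a minimal sketch of the same edit done programmatically (the helper `rename_tokenizer` is hypothetical, and this is not necessarily how the commit was produced), the config can be parsed as JSON and the single field rewritten, leaving every other key untouched:

```python
import json


def rename_tokenizer(config_text: str, new_name: str) -> str:
    """Return the config JSON with only `name_or_path` replaced (hypothetical helper)."""
    config = json.loads(config_text)
    config["name_or_path"] = new_name
    # indent=2 and ensure_ascii=False are formatting assumptions, not taken from the repository.
    return json.dumps(config, indent=2, ensure_ascii=False)


# Abbreviated stand-in for the real file, using only keys visible in the diff.
before = '{"errors": "replace", "name_or_path": "tokenizer-8k-kowiki", "pad_token": null}'
after = rename_tokenizer(before, "tokenizer-8k")
```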
vocab.json CHANGED
The diff for this file is too large to render. See raw diff