danielhanchen committed
Commit db77853 · 1 Parent(s): 7877031

Update README.md

Files changed (1):
  1. README.md +3 -2
README.md CHANGED
@@ -9,7 +9,7 @@ Original model from https://huggingface.co/openlm-research/open_llama_3b_600bt_p
 This repo includes:
 1) Ported `LlamaTokenizer` to `LlamaTokenizerFast` via a few lines of code.
 Loading via `AutoTokenizer` takes 3 to 4 minutes. Now, a few seconds!
-```
+```
 from transformers import LlamaTokenizerFast
 from tokenizers import AddedToken
 tokenizer = LlamaTokenizerFast.from_pretrained(
@@ -20,6 +20,7 @@ tokenizer = LlamaTokenizerFast.from_pretrained(
 unk_token = AddedToken("<unk>", single_word = True),
 pad_token = AddedToken("<unk>", single_word = True)
 )
+tokenizer.push_to_hub("open_llama_3b_600bt_preview")
 ```
 2) `AutoTokenizer` does not recognize the BOS, EOS and UNK tokens. All tokenizations weirdly prepend 0 and append 0 to the end, when actually, you're supposed to prepend 1 and append 2.
 3) Manually added BOS `<s>`, EOS `</s>`, UNK `<unk>` tokens, with PAD (padding) being also the `<unk>` token.
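
Point 2 of the README refers to the Llama SentencePiece ID convention, where `<unk>` = 0, BOS `<s>` = 1, and EOS `</s>` = 2. A minimal sketch of the intended behaviour (the helper name and the example token IDs are hypothetical, for illustration only):

```python
# Llama-style SentencePiece special-token IDs, as described in point 2:
UNK_ID, BOS_ID, EOS_ID = 0, 1, 2  # <unk>, <s>, </s>

def wrap_with_special_tokens(token_ids):
    """Correct behaviour: prepend BOS (1) and append EOS (2).

    A misconfigured tokenizer that falls back to <unk> would instead
    prepend 0 and append 0, which is the bug described in point 2.
    """
    return [BOS_ID] + list(token_ids) + [EOS_ID]

# hypothetical token IDs for some example sentence body
body = [3186, 547, 29991]
print(wrap_with_special_tokens(body))  # [1, 3186, 547, 29991, 2]
```

This is why point 3 wires `<s>`, `</s>`, and `<unk>` into the fast tokenizer explicitly: once the special tokens are registered, the fast tokenizer emits IDs 1 and 2 at the boundaries rather than falling back to 0.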