Commit History
Pin protobuf dep
7d83c88
Hack some Dutch tokenizers into it
55df72d
update
f331792
update
9558ae0
add compression leaderboard
1b7fc74
update compress rate
367a536
update compress rate
988921c
update
7d2062e
add grok mixtral
480ae5d
add compress rate
814ee6b
add zephyr
a6aee1d
requi
6b70021
requi
510279b
requi
7b522e7
config python version
11379e2
add xlm-roberta
057bc67
add amber and crystal_coder
5db13e0
add character glm
f0f84b2
update
f02dd94
fix PyO3PanicException
2461705
fix unicode error: 'unicodeescape' codec can't decode bytes in position 602-608: unknown Unicode character name
bce41d0
fix fastchat_t5_3b
c766a08
fix tiktoken special tokens
adcfb97
add aya
44c3329
fix olmo
2442c83
add olmo tokenizer
bbefe94
update
24b4aa5
update
1f833af
fix tiktoken
a6c67ec
fix gemma_7b
7011963
add gemma_7b
9c8ace5
add more tokenizer
5425d5d
fix tokenize
e6543ac
add more tokenizer
c75633b
update
6bdf6c6
update
9820e00
fix chatglm; new feature about add_special_tokens;
d27a756
update
a37f943
Merge branch 'main' of hf.co:spaces/eson/tokenizer-arena
0415b36
add more tokenizer
d2417c7
Update README.md
fab95c3
xu song
committed on