Tiktokenizer.js
A custom tokenizer visualizer written in pure JavaScript that mirrors the functionality of OpenAI's GPT-2/GPT-3 Byte Pair Encoding (BPE) tokenizer to showcase how text is tokenized into subword units. The encoder.json and vocab.bpe files provided by OpenAI are used here, so the token IDs exactly match the official BPE (GPT-2/GPT-3) representation. Currently, the dropdown on top is just a placeholder for adding more schemes in the future. Try different characters, including ASCII, emojis, and non-English languages!
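For illustration, here is a minimal sketch of the core BPE merge loop that such a tokenizer runs after the merge ranks are loaded. The tiny `encoder` and `bpeRanks` tables below are hypothetical stand-ins for the real encoder.json and vocab.bpe contents, and the code shows the general algorithm rather than this project's exact implementation:

```javascript
// Minimal BPE sketch (Node.js). The real visualizer loads OpenAI's
// encoder.json and vocab.bpe; tiny hypothetical tables stand in for them here.
const encoder = { l: 0, o: 1, w: 2, lo: 3, low: 4 };      // token -> id (hypothetical)
const bpeRanks = new Map([["l,o", 0], ["lo,w", 1]]);      // merge pair -> rank (hypothetical)

// Collect every adjacent symbol pair in the current word.
function getPairs(word) {
  const pairs = new Set();
  for (let i = 0; i < word.length - 1; i++) pairs.add(word[i] + "," + word[i + 1]);
  return pairs;
}

// Repeatedly merge the lowest-ranked (earliest-learned) pair until none remain.
function bpe(token) {
  let word = token.split("");
  let pairs = getPairs(word);
  while (true) {
    let best = null;
    let bestRank = Infinity;
    for (const p of pairs) {
      const rank = bpeRanks.has(p) ? bpeRanks.get(p) : Infinity;
      if (rank < bestRank) { bestRank = rank; best = p; }
    }
    if (best === null || bestRank === Infinity) break;   // no more known merges
    const [first, second] = best.split(",");
    const merged = [];
    let i = 0;
    while (i < word.length) {
      if (word[i] === first && i < word.length - 1 && word[i + 1] === second) {
        merged.push(first + second);                      // apply the merge
        i += 2;
      } else {
        merged.push(word[i]);
        i += 1;
      }
    }
    word = merged;
    if (word.length === 1) break;
    pairs = getPairs(word);
  }
  return word;
}

// Encode a word into token ids via the (hypothetical) encoder table.
const subwords = bpe("low");                  // -> ["low"]
const ids = subwords.map((s) => encoder[s]);  // -> [4]
console.log(subwords, ids);
```

The real GPT-2 tokenizer also splits input text with a regex and maps raw bytes to printable Unicode characters before this merge loop runs, which is why emojis and non-English text still tokenize cleanly into subword units.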
Did you notice any difference from the official tokenizer? 😎 Check out the GitHub Repo for this project. See more projects here.