This document captures the thoughts and ideas behind the Tokenizer interface changes. The goal is to add or change APIs so that callers have more control over memory allocation and performance. The Tokenizer class is the main class used for tokenization, and it is what the model will use to encode the input and decode the output. Once the shape of this class is settled, we will know exactly which changes are needed in the other main interfaces, such as Model and PreTokenizer.
public class Tokenizer
{
    public Tokenizer(Model model, PreTokenizer? preTokenizer = null, Normalizer? normalizer = null) { }
    public Model Model { get; }