hysse
hysse t1_j30tpx8 wrote
Reply to comment by jakderrida in [D] Simple Questions Thread by AutoModerator
Thanks for the answer. I need to train a relatively large model, so I need an efficient tokenizer.
I don't see how a tokenizer written in PyTorch (or TensorFlow) could be faster than a HuggingFace tokenizer, for example. HuggingFace has a Rust backend that makes its tokenizers fast, and I'd guess torchtext has an optimized backend too.
Given that the tokenizer runs on the CPU and not the GPU, how could it run faster if I wrote it in PyTorch (or even in plain Python)?
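The intuition behind the question can be illustrated with a toy pure-Python BPE merge step: tokenizer training is string and dictionary manipulation, with no tensor math for PyTorch or a GPU to accelerate, which is why compiled Rust loops (as in HuggingFace's fast tokenizers) are what actually speed it up. This is a simplified sketch of one BPE iteration, not any library's implementation:

```python
from collections import Counter

def get_pair_counts(words):
    """Count adjacent symbol pairs across a frequency-weighted corpus."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(words, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])  # merge the pair
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = merged.get(tuple(out), 0) + freq
    return merged

# Toy corpus: word (as a tuple of characters) -> frequency.
words = {tuple("hug"): 10, tuple("pug"): 5, tuple("hugs"): 5}

# One BPE training step: find the most frequent pair and merge it.
counts = get_pair_counts(words)
best = max(counts, key=counts.get)   # ('u', 'g') is the clear winner here
words = merge_pair(words, best)
```

Note that everything here is branching, indexing, and hashing over Python objects, which a GPU kernel cannot vectorize; a native (Rust/C++) implementation of the same loops is what buys the speedup.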
hysse t1_j2qqwsf wrote
Reply to [D] Simple Questions Thread by AutoModerator
Which tool is best for training a tokenizer? The HuggingFace library seems the simplest, but is it the most computationally efficient? And if so, what are torchtext, NLTK, etc. useful for?
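For reference, training a tokenizer with HuggingFace's `tokenizers` library (the Rust-backed one) only takes a few lines. This is a minimal sketch assuming `pip install tokenizers`; the tiny in-memory corpus and the vocab size are placeholder values:

```python
# Minimal BPE tokenizer training with HuggingFace `tokenizers`.
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.trainers import BpeTrainer

# Stand-in corpus; in practice you would stream lines from your dataset files.
corpus = ["the quick brown fox", "the lazy dog", "quick brown dogs"] * 100

tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()

trainer = BpeTrainer(vocab_size=200, special_tokens=["[UNK]"])
tokenizer.train_from_iterator(corpus, trainer)

encoding = tokenizer.encode("the quick fox")
print(encoding.tokens)
```

The training loop itself runs in compiled Rust, so the Python layer is just configuration; that is where the efficiency gap with pure-Python tools comes from.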
hysse t1_j30ub8q wrote
Reply to comment by jakderrida in [D] Simple Questions Thread by AutoModerator
Haha, I knew it. Unfortunately, I don't think ChatGPT can give a good answer to that question...