madmax_br5 OP t1_j625fr2 wrote on January 27, 2023 at 4:14 AM

Reply to comment by ww3ace in [D] Moving away from Unicode for more equal token representation across global languages? by madmax_br5

The token counts in my example were copied directly from OpenAI's tokenizer, so even if it is not Unicode-based, it still represents logographs very inefficiently.
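
A rough sketch of one way this inefficiency can arise, assuming the tokenizer is a byte-level BPE that works over UTF-8 bytes (an assumption here, not something checked against the tokenizer itself). Under that assumption, a string's UTF-8 byte count is an upper bound on its token count, and logographic characters start out at about three bytes each, versus one for ASCII letters. The sample strings and helper names below are made up for illustration and are not from the original example; the script compares byte counts only, not real token counts.

```python
# Illustration only: compares UTF-8 byte counts, not real token counts.
# The sample strings and function names are hypothetical.

def utf8_byte_count(text: str) -> int:
    """Return the number of bytes in the UTF-8 encoding of text."""
    return len(text.encode("utf-8"))


def bytes_per_char(text: str) -> float:
    """Return the average number of UTF-8 bytes per Unicode code point."""
    return utf8_byte_count(text) / len(text)


samples = {"English": "hello", "Chinese": "你好"}

for name, sample in samples.items():
    print(
        f"{name}: {len(sample)} chars, "
        f"{utf8_byte_count(sample)} bytes, "
        f"{bytes_per_char(sample):.1f} bytes/char"
    )
```

If the merge table learned few merges for a script, the token count stays close to the byte count, which would be consistent with logographs costing several tokens per character.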