mikeholczer t1_j6bb50x wrote on January 29, 2023 at 2:39 AM

Reply to comment by kilopeter in Transition probabilities (shown as percentages) between successive letters in the names of girls born in 2021 in the USA [OC] by kilopeter

If one follows your steps, the most common outcome is a one-letter name, which has no between-letter patterns at all, so it clearly doesn't match the between-letter patterns of the source data.

kilopeter OP t1_j6bbmom wrote on January 29, 2023 at 2:43 AM

It does if you include the placeholder "characters" for the start and end of each name! The most probable "name" A represents three tokens: [name start], A, [name end]. And if you generate many names using the transition matrix, you will indeed observe that the frequency of [name start] -> A and A -> [name end] matches the corresponding frequencies in the source data.
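For anyone who wants to try this, here's a rough sketch of the idea in Python. It's my own illustration, not the code behind the heatmap. It assumes a first-order Markov chain over letters, with "^" and "$" as made-up stand-ins for the [name start] and [name end] tokens. The four toy names are hypothetical examples, not the 2021 data.

```python
import random
from collections import Counter, defaultdict

START, END = "^", "$"  # hypothetical placeholder tokens for [name start] and [name end]

def fit_transitions(names):
    """Estimate P(next | current) from a list of names, padding each with START/END."""
    counts = defaultdict(Counter)
    for name in names:
        tokens = [START] + list(name.upper()) + [END]
        for cur, nxt in zip(tokens, tokens[1:]):
            counts[cur][nxt] += 1
    # Normalize each row of counts into probabilities (each row sums to 1)
    return {
        cur: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for cur, c in counts.items()
    }

def generate_name(probs, rng):
    """Walk the chain from START, one letter at a time, until END is drawn."""
    cur, letters = START, []
    while True:
        options = list(probs[cur])
        weights = [probs[cur][k] for k in options]
        cur = rng.choices(options, weights=weights)[0]
        if cur == END:
            return "".join(letters)
        letters.append(cur)

# Usage with hypothetical toy data (not the real name list)
rng = random.Random(0)
toy_names = ["ANNA", "AVA", "EMMA", "MIA"]
probs = fit_transitions(toy_names)
samples = [generate_name(probs, rng) for _ in range(20000)]
first = Counter(s[0] for s in samples)

# Compare the sampled frequency of START -> A with the fitted probability
print(first["A"] / len(samples), probs[START]["A"])
```

With this toy list, START -> A has probability 2/4 and A -> END has probability 4/6. That makes the one-letter "name" A the single most likely output, at 0.5 × 2/3 = 1/3. So the sampled [name start] -> A and A -> [name end] frequencies match the source, even though "A" itself isn't a real name.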

EDIT: on reflection, I agree with you. I should introduce the heatmap as a description of transition probabilities, but should avoid walking the reader through using the transition matrix to generate new "names." I should set aside the topic of generating new names from the transition matrix under the (invalid) Markov assumption as a separate diversion. Thanks for pointing out the flaw in my explanation. I'll edit my top-level comment when I have a chance!

globglogabgalabyeast t1_j6bwjuz wrote on January 29, 2023 at 5:44 AM

Did you already edit it? 'Cause I never got the impression that you were implying this process would lead to realistic names.

kilopeter OP t1_j6ea6zq wrote on January 29, 2023 at 7:09 PM

Nah, I'm only just now getting a chance to edit my top-level comment. Thanks for throwing in your vote! I feel like I can reword the "interpretation" part better to avoid any possible misinterpretation.