mikeholczer t1_j6bb50x wrote
Reply to comment by kilopeter in Transition probabilities (shown as percentages) between successive letters in the names of girls born in 2021 in the USA [OC] by kilopeter
If one follows your steps, the most common outcome is one letter and there has no between-letter patterns which clearly doesn’t match the between-letter patterns of the source data.
kilopeter OP t1_j6bbmom wrote
It does if you include the placeholder "characters" for the start and end of each name! The most probable "name" A represents three tokens: [name start], A, [name end]. And if you generate many names using the transition matrix, you will indeed observe that the frequency of [name start] -> A and A -> [name end] matches the corresponding frequencies in the source data.
EDIT: on reflection, I agree with you. I should introduce the heatmap as a description of transition probabilities, but should avoid walking the reader through using the transition matrix to generate new "names." I should separate the topic of generating new names using the transition matrix under the (invalid) Markov assumption as a diversion. Thanks for pointing out the flaw in my explanation. I'll edit my top level comment when I have a chance!
globglogabgalabyeast t1_j6bwjuz wrote
Did you already edit it? Cause I never got the impression that you were implying this process would lead to realistic names
kilopeter OP t1_j6ea6zq wrote
Nah, I'm only just now getting a chance to edit my top-level comment. Thanks for throwing in your vote! I feel like I can reword the "interpretation" part better to avoid any possible misinterpretation.
Viewing a single comment thread. View all comments