mikeholczer t1_j6bb50x wrote

If one follows your steps, the most common outcome is a single letter, which has no between-letter patterns and so clearly doesn’t match the between-letter patterns of the source data.

−2

kilopeter OP t1_j6bbmom wrote

It does if you include the placeholder "characters" for the start and end of each name! The most probable "name" A represents three tokens: [name start], A, [name end]. And if you generate many names using the transition matrix, you will indeed observe that the frequency of [name start] -> A and A -> [name end] matches the corresponding frequencies in the source data.
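
For anyone who wants to check that claim, here is a minimal sketch in Python. It is not the code or data behind the heatmap: the toy name list, the `^` and `$` placeholder tokens, and the function names `build_transitions`, `transition_prob`, and `generate_name` are all assumptions made up for illustration. It builds first-order transition counts that include the start and end placeholders, generates many "names" from them, and compares two transition frequencies in the generated output against the source.

```python
import random
from collections import Counter, defaultdict

# Hypothetical placeholder tokens for [name start] and [name end].
START, END = "^", "$"


def build_transitions(names):
    """Count first-order transitions, including the start/end placeholders."""
    counts = defaultdict(Counter)
    for name in names:
        tokens = [START, *name.upper(), END]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def transition_prob(counts, prev, nxt):
    """Estimated probability of moving from token `prev` to token `nxt`."""
    total = sum(counts[prev].values())
    return counts[prev][nxt] / total if total else 0.0


def generate_name(counts, rng, max_len=20):
    """Sample one name by walking the transition counts from START to END.

    max_len is only a safety cap; a name cut off by it ends without a
    sampled END token, which would slightly bias the end frequencies.
    """
    out, prev = [], START
    while len(out) < max_len:
        choices = counts[prev]
        nxt = rng.choices(list(choices), weights=list(choices.values()))[0]
        if nxt == END:
            break
        out.append(nxt)
        prev = nxt
    return "".join(out)


# Toy stand-in for the source data (an assumption, not the real name list).
names = ["ANNA", "ADA", "AVA", "BOB", "ALAN"]
counts = build_transitions(names)

rng = random.Random(0)  # fixed seed so the run is repeatable
generated = [generate_name(counts, rng) for _ in range(10_000)]
gen_counts = build_transitions(generated)

source_start_a = transition_prob(counts, START, "A")
generated_start_a = transition_prob(gen_counts, START, "A")
source_a_end = transition_prob(counts, "A", END)
generated_a_end = transition_prob(gen_counts, "A", END)

print(f"[name start] -> A: source {source_start_a:.3f}, generated {generated_start_a:.3f}")
print(f"A -> [name end]:   source {source_a_end:.3f}, generated {generated_a_end:.3f}")
```

With enough samples the two columns should agree closely for any single transition. What the sketch does not preserve is anything longer than one step, since each letter is sampled from the previous letter alone.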

EDIT: on reflection, I agree with you. I should introduce the heatmap as a description of transition probabilities, but should avoid walking the reader through using the transition matrix to generate new "names." Generating new names with the transition matrix, under the (invalid) Markov assumption, should be split off as a separate diversion. Thanks for pointing out the flaw in my explanation. I'll edit my top-level comment when I have a chance!

2

globglogabgalabyeast t1_j6bwjuz wrote

Did you already edit it? 'Cause I never got the impression that you were implying this process would lead to realistic names.

2

kilopeter OP t1_j6ea6zq wrote

Nah, I'm only just now getting a chance to edit my top-level comment. Thanks for throwing in your vote! I feel like I can reword the "interpretation" part better to avoid any possible misinterpretation.

1