Viewing a single comment thread. View all comments

Abdiel_Kavash t1_irj7klm wrote

Just a minor note, in your example, JATO is not a valid result, because you had already eliminated T.

Even so, why should choosing the "most popular" letter be the optimal strategy? Letter frequencies (Etaoin Shrdlu) are based on a corpus of English texts. Naturally the distribution of words in these texts is not uniform. The reason "E" and "T" are so high is partly because they occur in the word "The". If the game is selecting words uniformly at random from a dictionary, surely the frequency distribution of letters is going to be dramatically different.

I would imagine that an optimal strategy would involve building some decision tree over the set of all n-letter words in a dictionary, in a way that you eliminate as many different words as possible with each failed guess. Quick googling told me there are about 20,000 six-letter and 35,000 seven-letter words in English (the number of 4- and 5-letter words is much smaller). If we can get even 2 bits of information out of every guess on average, we should be able to uniquely determine the hidden word.

12

houstoncouchguy t1_irjaz07 wrote

Good analysis.

I would rebut that I pulled the information from a list of English words, and not the frequency of use in sentences. But that’s really just a side note. Removing ‘jato’ would not be necessary to remove the assured victory if you have one less guess, having already eliminated it before.

The real meat of it is showing that there are at least 8 words that could be chosen that have distinct letters which can not be differentiated using process of elimination alone, even if you have 2 of the 4 bits of information already available at your disposal.

It may or may not become more difficult with longer words. And I’m sure there are much more technical ways to break down the necessary burn-letters. But I feel that the 8-word answer is sufficient to answer the question.

2