Demystifying Wordle: A Crash Course in Information Theory

By Sonia Choy 蔡蒨珩

 

The game Wordle took the world by storm last year – you might have seen your friends posting green and yellow boxes on social media, claiming to have solved the daily word puzzle in three guesses, or posting that dreaded “X/6,” which means they didn’t manage to crack it. When choosing a first word, it might be tempting to throw in any five-letter word at random, but the choice can actually be framed as a scientific question. It is not hard to see that some words make a better first guess than others; for example, “FUZZY” would be a far less ideal opener than “RAISE”, since its letters occur far less often in English words. What, then, is one’s best shot at cracking the puzzle?

 

What Is Wordle?

We are assuming readers know how Wordle functions. For those who do not, here is a quick crash course.

 

Wordle’s database consists of 2,315 five-letter words picked by the creator of the game as solutions, plus a pool of approximately 13,000 five-letter words that are valid guesses (this pool includes the 2,315 solution words, along with many more words that are not commonly used) [1]. Each day, one word from the database is selected as the answer to the puzzle. If your guess has a letter that is in the answer and in the same position, that letter’s box is shown in green; if the guess has a letter that is in the answer but not in the correct position, the box is shown in yellow; otherwise the box is gray.
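The coloring rule, including the way repeated letters are handled, can be written down in a few lines. The sketch below is my own illustration (the function name is an assumption, not Wordle’s actual code): greens are assigned first, then yellows only for letters not yet accounted for.

```python
def feedback(guess: str, answer: str) -> str:
    """Wordle's coloring rule: G = green, Y = yellow, '-' = gray."""
    pattern = ["-"] * 5
    remaining = list(answer)
    # First pass: greens consume their letter from the answer.
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            pattern[i] = "G"
            remaining.remove(g)
    # Second pass: yellows only if the letter is still unaccounted for.
    for i, g in enumerate(guess):
        if pattern[i] == "-" and g in remaining:
            pattern[i] = "Y"
            remaining.remove(g)
    return "".join(pattern)

print(feedback("raise", "crate"))  # -> "YY--G"
```

For example, guessing “RAISE” against the answer “CRATE” yields yellow R, yellow A, gray I, gray S, and green E.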

 

How to Define “Informative”?

To give a satisfying answer, we first need to quantify what makes one guess more “informative” than another. A useful guess gives us more information; but how do we quantify information? Luckily for us, this was done in the 1940s by Claude Shannon, the father of information theory.

 

Shannon defined information by the following equation: I = −log₂(p), where p is the probability of the event happening. You may ask, why the logarithm function? Recall a property of the logarithm from high school: log(ab) = log(a) + log(b). If we have two independent events happening with probabilities p₁ and p₂ respectively, then the probability of them occurring together is p₁p₂, or:

I(p₁p₂) = −log₂(p₁p₂) = −log₂(p₁) − log₂(p₂) = I(p₁) + I(p₂)

So when probabilities multiply, the corresponding amounts of information add up. Information is typically measured in bits; in the case of Wordle, each bit of information corresponds to cutting the number of possible answers in half.
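As a quick sanity check, the definition and its additivity can be verified in a few lines of Python (the function name is my own):

```python
from math import log2

def information(p: float) -> float:
    """Shannon information of an event with probability p, in bits."""
    return -log2(p)

# Information is additive over independent events:
p1, p2 = 0.5, 0.25
print(information(p1 * p2))               # 3.0 bits
print(information(p1) + information(p2))  # also 3.0 bits
```

An event with probability 1/8 carries 3 bits of information, exactly the sum of the 1 bit and 2 bits carried by its two independent components.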

 

It is rather unlikely for “FUZZY” to be the first hit. Suppose it returns five gray squares – what information do these squares give us?

 

Using the above two first guesses (“FUZZY” and “RAISE”) as an example, the probability of “F” occurring in an English word is approximately 2.2% (Table 1) [2], so the probability that it does not occur is 97.8% or 0.978. We can find the corresponding probability for each letter in our guess, and conclude that the combined information given by five gray squares for “FUZZY” is (footnote 1):

I(“FUZZY” all gray) = −log₂(P(no F) × P(no U) × P(no Z) × P(no Y)) ≈ 0.093 bits

 

What if “RAISE” turns out to give five gray squares? We have:

I(“RAISE” all gray) = −log₂(0.940 × 0.918 × 0.930 × 0.937 × 0.87) ≈ 0.61 bits

 

Letter   Relative Frequency      Letter   Relative Frequency
F        2.2%                    R        6.0%
U        2.8%                    A        8.2%
Z        0.074%                  I        7.0%
Y        2.0%                    S        6.3%
                                 E        13%

Table 1 Selected relative letter frequencies in the English language (footnote 2) [2].

 

Note that the more unlikely an event is, the more information it provides when it occurs. For example, the letter E is far less likely to be absent from a word than the letter Z, so the absence of E narrows our search down considerably. In other words, a gray square on E gives more information than a gray square on Z after a round of guessing. This is why obtaining five gray squares when guessing “RAISE” is more informative than obtaining five gray squares after guessing “FUZZY”.
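Under the simplifying assumption that letters occur independently with the Table 1 frequencies, the five-gray-square calculation can be sketched as follows (the function name is my own, and the exact figures depend on the frequency table used):

```python
from math import log2

# Approximate relative letter frequencies from Table 1.
freq = {"f": 0.022, "u": 0.028, "z": 0.00074, "y": 0.020,
        "r": 0.060, "a": 0.082, "i": 0.070, "s": 0.063, "e": 0.13}

def all_gray_information(guess: str) -> float:
    """Bits of information gained when every letter of the guess is
    absent from the answer; assumes letters occur independently."""
    p_all_absent = 1.0
    for letter in set(guess):  # a repeated letter is counted once
        p_all_absent *= 1 - freq[letter]
    return -log2(p_all_absent)

print(all_gray_information("fuzzy"))  # roughly 0.1 bits
print(all_gray_information("raise"))  # roughly 0.61 bits
```

The all-gray outcome for “RAISE” is far less likely than for “FUZZY”, so when it does happen, it tells us much more.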

 

Shannon Entropy and the Information Provided by a Guess

The calculation above gave us a sense of how informative a guess could be if we unluckily got five gray squares. However, in general we will get a mixture of gray, yellow and green squares, say gray-yellow-gray-gray-gray, or even green-yellow-gray-gray-green. By analyzing the list of 2,315 answers, we can work out, for any single guess, the exact probability of each possible pattern appearing, and the corresponding amount of information gained under that pattern.

 

Taking all cases into account, we can calculate the weighted average of the information given by a word. This average is often called Shannon entropy (which, despite the name, is not directly related to the entropy in physics). It is then possible to rank all words by the information they provide in the first guess. This has been done by multiple people, including the YouTuber Grant Sanderson (3Blue1Brown).
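The weighted average can be computed directly: tally how often each feedback pattern occurs across the answer list, then take the entropy of that distribution. Below is a sketch with my own helper names, shown on a toy four-word answer list; the real calculation would run over the full 2,315-word list.

```python
from collections import Counter
from math import log2

def feedback(guess: str, answer: str) -> str:
    """Wordle's coloring rule: G = green, Y = yellow, '-' = gray."""
    pattern, remaining = ["-"] * 5, list(answer)
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            pattern[i] = "G"
            remaining.remove(g)
    for i, g in enumerate(guess):
        if pattern[i] == "-" and g in remaining:
            pattern[i] = "Y"
            remaining.remove(g)
    return "".join(pattern)

def entropy(guess: str, answers: list) -> float:
    """Expected information of a guess: the Shannon entropy of the
    distribution of feedback patterns over the possible answers."""
    counts = Counter(feedback(guess, a) for a in answers)
    n = len(answers)
    return -sum((c / n) * log2(c / n) for c in counts.values())

# Toy example with four possible answers:
answers = ["crate", "trace", "slate", "raise"]
print(entropy("raise", answers))  # 1.5 bits
```

On this toy list, “RAISE” splits the four answers into three patterns with probabilities 1/2, 1/4 and 1/4, for an average of 1.5 bits per guess.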

 

What Are the Best First Guesses?

The first guess that gives the most information is “SOARE” (5.89 bits), an obsolete term meaning a young hawk [3].

 

Can we do better? Wordle is not just a one-guess game; your later guesses matter too, so it is worth examining how the second guess plays out after a given first word. By also considering the average information obtained from an optimal second guess, we find that the best first guess becomes “SLANE”, a type of spade used in Ireland, giving an average of 10.04 bits of information over the first two guesses [3].

 

Some readers may be more concerned with winning Wordle in as few guesses as possible. Taking the top 250 first guesses ranked by the two-guess criterion above, researchers ran simulations to measure how these guesses actually perform across all 2,315 games. The winner is “SALET”, meaning a medieval helmet, with the computer solving the puzzle in an average of 3.412 of the six allowed guesses [4].

 

But truth be told, the top few contenders are in a close race. If you are looking for a first guess that is not too obscure, “CRATE” is a good choice that trails the obscure words above only slightly, giving 10.01 bits of information in the first two guesses and solving the puzzle in 3.434 guesses on average [3]. We are not suggesting that you recite the list of possible solutions and analyze every move like a computer, but a strong first guess is a simple way to start the day with a good shot at the puzzle.

 

1 Editor’s remark: The second letter Z can indeed provide extra information. If both your guess and the answer contain two identical letters, say “FUZZY” and “WHIZZ”, both boxes of Z’s will turn yellow and/or green to confirm that the answer contains two Z’s.

 

2 Editor’s remark: The corpus used (e.g. general documents, a dictionary) affects the values. Table 1 shows the frequencies of letters appearing in English documents of all types [2]; a more accurate approach in our case would be to estimate the probability of each letter appearing in each slot from the list of words in the New York Times code. For simplicity’s sake, values for the general English language were used here, but the more accurate calculation has already been carried out by the YouTuber Grant Sanderson (3Blue1Brown) [3].


References:

[1] Glaiel, T. (2021, December 30). The mathematically optimal first guess in Wordle. Medium. https://medium.com/@tglaiel/the-mathematically-optimal-first-guess-in-wordle-cbcb03c19b0a

[2] Lewand, R. E. (2000). Cryptological Mathematics. MAA Press.

[3] Sanderson, G. (2022). [3Blue1Brown Wordle video source files] [Source code]. GitHub. https://github.com/3b1b/videos/tree/master/_2022/wordle

[4] Selby, A. (2022, January 19). The best strategies for Wordle. https://sonorouschocolate.com/notes/index.php?title=The_best_strategies_for_Wordle