The list of valid Wordle answers is available in the Wordle website’s source code. There are two lists of words: 2315 valid solutions and then 10657 other five-letter words that no normal person would ever guess.
If you count the occurrences of the letters in the possible solutions, you get these values:
'e': 1233, 'a': 979, 'r': 899, 'o': 754, 't': 729, 'l': 719, 'i': 671, 's': 669, 'n': 575, 'c': 477, 'u': 467, 'y': 425, 'd': 393, 'h': 389, 'p': 367, 'm': 316, 'g': 311, 'b': 281, 'f': 230, 'k': 210, 'w': 195, 'v': 153, 'z': 40, 'x': 37, 'q': 29, 'j': 27
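A minimal sketch of how a tally like this can be produced, assuming the solutions are held in a plain Python list. The five-word `sample_solutions` list is a hypothetical stand-in for the real 2315-word list, and `letter_counts` is an illustrative name, not necessarily what the linked code uses. Every occurrence is counted, so a word with a repeated letter contributes twice; that matches the totals above, which sum to 2315 × 5 = 11575.

```python
from collections import Counter


def letter_counts(words):
    """Count every occurrence of each letter across all words."""
    return Counter("".join(words))


# Hypothetical stand-in for the full solution list.
sample_solutions = ["orate", "later", "sonic", "manly", "class"]

counts = letter_counts(sample_solutions)

# Letters sorted from most to least frequent in the sample.
for letter, n in counts.most_common():
    print(letter, n)
```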
There is only one valid solution made of the top five letters: ORATE. (OATER appears in the list of valid guesses, but is not a valid solution, because what the hell is an OATER anyway? “A movie or television show about cowboy or frontier life; a western movie,” it seems.)
(AROSE would be the only valid solution if you take your frequencies from the Scrabble dictionary.)
There is no valid word made of the second group of 5 letters: LISNC, nor can we fix the problem by swapping C for U (LISNU). Our next best choice is swapping N for U (LISUC), which gives the valid word SULCI, the plural of SULCUS, “a depression or groove in the cerebral cortex.” (SULCI is not a valid solution.)
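Checking whether a group of five letters forms a word is an anagram lookup. Here is a rough sketch, assuming both word lists are plain Python lists; the short lists below are hypothetical samples, and `words_from_letters` is an illustrative helper name.

```python
def words_from_letters(letters, words):
    """Return the words that use exactly the given letters, once each."""
    target = sorted(letters.lower())
    return [w for w in words if sorted(w.lower()) == target]


# Hypothetical samples standing in for the two real lists.
sample_solutions = ["orate", "arose", "later"]
sample_guesses = ["oater", "sulci", "sonic"]
all_words = sample_solutions + sample_guesses

top_five = words_from_letters("eorat", sample_solutions)
second_group = words_from_letters("lisnc", all_words)
with_u_for_n = words_from_letters("lisuc", all_words)

print(top_five, second_group, with_u_for_n)
```

Running the same lookup against only the solution list, and then against both lists combined, is what separates ORATE (a solution) from OATER and SULCI (valid guesses only).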
But it’s not clear that deploying SULCI as your second choice is optimal. (If we take letter position into account, there may be better first choices than ORATE.)
After your first choice, the optimal move may be to set any green letters aside, and then come up with a word that places, say, 2 yellow letters in their most likely locations, and then fills in the remaining 3 letters with the highest frequency letters you haven’t eliminated. Delightfully, that’s actually pretty hard to do.
Above is an example of implementing that strategy. After discovering from the first two guesses that the solution contained A and L, I tried MANLY, which put A and L in new locations and added M, N, and Y, which were among the most common letters not yet tested.
Just to illustrate the idea, other decent choices might have been MADLY or AMPLY, though I think MANLY is slightly preferable because of the relative prevalence of N, compared to D or P.
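To make the comparison concrete, here is a sketch that scores candidate third guesses. It assumes the first two guesses were ORATE and SULCI, with A and L yellow in the third position of each and every other letter gray; the answer does not spell out the exact board, so treat that as an assumption. Green letters are not handled, and the function names and candidate list are hypothetical. The score is simply the sum of the solution-list frequencies of the letters not yet tested.

```python
# Letter counts from the solution-list tally earlier in this answer.
FREQ = {
    "e": 1233, "a": 979, "r": 899, "o": 754, "t": 729, "l": 719, "i": 671,
    "s": 669, "n": 575, "c": 477, "u": 467, "y": 425, "d": 393, "h": 389,
    "p": 367, "m": 316, "g": 311, "b": 281, "f": 230, "k": 210, "w": 195,
    "v": 153, "z": 40, "x": 37, "q": 29, "j": 27,
}


def score_guess(word, yellows, grays, tested):
    """Return None if the word breaks a constraint, else its frequency score."""
    for letter, tried_positions in yellows.items():
        if letter not in word:
            return None  # a yellow letter must be reused
        if any(word[i] == letter for i in tried_positions):
            return None  # ...but not in a spot already ruled out
    if any(ch in grays for ch in word):
        return None  # gray letters waste a slot
    # Each untested letter counts once, even if it is repeated in the word.
    return sum(FREQ[ch] for ch in set(word) - tested)


def rank_candidates(candidates, yellows, grays, tested):
    """Return (word, score) pairs for usable candidates, best first."""
    scored = []
    for word in candidates:
        s = score_guess(word, yellows, grays, tested)
        if s is not None:
            scored.append((word, s))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Assumed board state after ORATE and SULCI (positions are 0-indexed).
yellows = {"a": {2}, "l": {2}}
grays = set("ortesuci")
tested = set("orate") | set("sulci")

ranked = rank_candidates(
    ["madly", "amply", "manly", "salty", "llama"], yellows, grays, tested
)
print(ranked)
```

Under these assumptions MANLY scores 1316 (M + N + Y), ahead of MADLY at 1134 and AMPLY at 1108, which agrees with the intuition that N is worth more than D or P. SALTY is rejected for reusing gray letters, and LLAMA for putting A back in the third position.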
In placing your yellow letters, you can try to follow the likelihood that a letter appears in each position, as described by the bar chart in Mahmood Hikmet’s explainer: https://youtu.be/hJJaYvxQh8w?t=125
This would suggest a few general targets for words after ORATE and SULCI:
- Put C and S at the beginning.
- A, I, O and U go in the middle.
- Put E, R, and T at the end.
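Targets like these come from counting letters per position rather than overall. A small sketch of that tally, using a hypothetical five-word sample instead of the real solution list (`positional_counts` is an illustrative name, and this is not the code behind the video's chart):

```python
from collections import Counter


def positional_counts(words, length=5):
    """Return one Counter per position, tallying the letter found there."""
    tallies = [Counter() for _ in range(length)]
    for word in words:
        for i, ch in enumerate(word[:length]):
            tallies[i][ch] += 1
    return tallies


# Hypothetical sample; the real input would be the solution list.
sample = ["slate", "crane", "stare", "sonic", "chart"]
tallies = positional_counts(sample)

for position, tally in enumerate(tallies, start=1):
    print(position, tally.most_common(2))
```

Run over the full solution list, the most common entry for each position gives the kind of per-slot preference the bullet points summarize.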
@JaapScherphuis, in a comment below, suggests the pair LATER and SONIC, which together use the 10 most common letters.
@Mahmood Hikmet suggests that SLATE is the optimal first word, taking letter position into account (but I think he calculated just from the valid solutions, which seems suboptimal to me). But that would allow a second guess of ORCIN (some chemical found in colorless lichen, I guess?).
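A quick way to check whether a pair of opening words covers the ten most common letters, taking the top ten (E, A, R, O, T, L, I, S, N, C) from the solution-list counts earlier in this answer. The helper name `covers_top_ten` is made up for illustration.

```python
# The ten most frequent letters in the solution list, per the counts above.
TOP_TEN = set("earotlisnc")


def covers_top_ten(first, second):
    """True if the two words together use exactly the ten most common letters."""
    return (set(first) | set(second)) == TOP_TEN


for pair in [("later", "sonic"), ("slate", "orcin"), ("orate", "sulci")]:
    print(pair, covers_top_ten(*pair))
```

LATER/SONIC and SLATE/ORCIN both pass; ORATE/SULCI does not, because it trades N for U.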
Also note that the solution set appears to omit plurals. I assume this is deliberate. The only solutions that end in S are:
['floss', 'crass', 'abyss', 'rebus', 'truss', 'focus', 'fetus', 'glass', 'class', 'chaos', 'dross', 'cress', 'bless', 'bliss', 'brass', 'cross', 'amass', 'bonus', 'locus', 'grass', 'ethos', 'humus', 'basis', 'guess', 'lupus', 'torus', 'minus', 'dress', 'amiss', 'mucus', 'gross', 'gloss', 'virus', 'chess', 'ficus', 'press']
There are actually a few plurals that slipped through by ending in I, like CACTI, FUNGI, and RADII, plus a few irregular plurals like GEESE, TEETH, and WOMEN.
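Lists like these can be pulled out with a simple suffix filter. A sketch, again with a hypothetical sample in place of the real solution list:

```python
def ending_in(words, suffix="s"):
    """Return the words that end with the given suffix."""
    return [w for w in words if w.endswith(suffix)]


# Hypothetical sample; the real input would be the solution list.
sample = ["floss", "cacti", "geese", "virus", "manly"]

ends_in_s = ending_in(sample)
ends_in_i = ending_in(sample, "i")
print(ends_in_s, ends_in_i)
```

A suffix filter only finds candidates; irregular plurals such as GEESE still have to be spotted by eye.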
(I guess I should say that I recognize that this is not the full algorithm that you’re hoping for, but it seemed ridiculous to try to stuff all of this in a comment.)
If it’s useful, my frequency counting code and the word sets are available at: https://github.com/pingswept/wordle-strategy/blob/main/strategy.py