github en-wl/wordlist rel-2026.02.25
2026.02.25

Overview

This is a dictionary-only release and the first since transitioning to the new database format. An official release of SCOWL itself will not be issued until the underlying architecture stabilizes.

Word Changes

  • Over 1,500 new high-frequency words from the Corpus of Contemporary American English (COCA), selected by analyzing frequency data from the corpus and using LLMs to help filter the results.
  • Over 300 new hand-selected words from GitHub issues. This includes many modern additions, such as:
    • Tech & Digital: ChatGPT, LLM, codebase, and tokenize/tokenization
    • Science & Medical: AstraZeneca, BioNTech, Moderna, and psilocybin
    • Culture & Society: neurodiversity, influencer, doomscrolling, and staycation
  • Numerous manual cleanups reported in GitHub issues. For example, Kyiv was added to and Kiev was removed from the default dictionary, and several variant issues were fixed, such as axe/ax, peddler/pedlar, and yogurt/yoghurt/yogourt.
  • Removal of around 80 uncommon closed forms of compound words such as highhanded.
  • Removal of around 120 uncommon word forms, such as rarely used -er and -est adjective forms and obscure verb forms.
  • Major cleanup of the possessive forms of nouns.
  • Miscellaneous other cleanups resulting from the change to the new database format.

Other Changes

  • Words with diacritical marks (e.g., naïve) are now treated as normal variants. If lists were previously generated without the strip option, some of these may now be missing from the output, as the marked version is officially considered a variant.
  • The Unicode right single quotation mark (U+2019) was added to WORDCHARS in the Hunspell affix file so that Hunspell can recognize words that use it as an apostrophe. Based on testing, this should allow Hunspell to recognize both can't and can’t. One side effect: an ASCII single quote at the end of a word is not considered part of the word, but a trailing U+2019 is. This means 'color' is okay, but ‘color’ will get flagged, because the closing ’ stays attached to the word when Hunspell does the tokenization.
  • Simplified copyright statement.
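
The diacritic and apostrophe behavior described above can be illustrated with a small Python sketch. This is a simplified model for illustration only, not Hunspell's or SCOWL's actual code: the function names are hypothetical, and treating U+2018 as a delimiter is an assumption based on it not being listed in WORDCHARS.

```python
import unicodedata

ASCII_QUOTE = "'"
LEFT_QUOTE = "\u2018"   # LEFT SINGLE QUOTATION MARK, assumed NOT a word character
RIGHT_QUOTE = "\u2019"  # RIGHT SINGLE QUOTATION MARK, now listed in WORDCHARS


def strip_diacritics(word: str) -> str:
    """Remove combining marks, e.g. map the marked variant to its plain form."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def model_token(raw: str) -> str:
    """Hypothetical model of the tokenization described in the notes.

    ASCII quotes (and, by assumption, U+2018) at the edges are trimmed;
    U+2019 counts as a word character, so it stays attached.
    """
    return raw.strip(ASCII_QUOTE + LEFT_QUOTE)


plain = strip_diacritics("na\u00efve")                   # marked variant -> plain form
ascii_quoted = model_token("'color'")                    # edges trimmed
curly_quoted = model_token("\u2018color\u2019")          # trailing U+2019 kept
contraction = model_token("can\u2019t")                  # interior apostrophe untouched
```

Under this model, the ASCII-quoted word reduces to a form a dictionary would contain, while the curly-quoted word keeps its trailing U+2019 and would not match. Callers that check text containing curly quotes may want to normalize or trim them before passing words to the speller.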

Final Notes

The transition to the new database format represents a major structural change. Anyone using SCOWL directly to create wordlists or speller dictionaries will likely need to update their scripts.

Previously, the original SCOWL (SCOWLv1) compiled information into a set of simple, separate text lists that could be combined to create dictionaries of various sizes and dialects (American, British with both -ise and -ize, Canadian, and Australian). SCOWLv2 replaces this approach by consolidating all of that information into a single master text file and an SQLite3 database.
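
To make the structural difference concrete, here is a minimal sketch of selecting a wordlist from a single consolidated table rather than concatenating separate list files. The table name, column names, and sample rows (including the size values) are invented for illustration and are not SCOWLv2's actual schema or data; consult the repository for the real layout.

```python
import sqlite3

# Hypothetical schema and sample rows, made up for this example only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (word TEXT, dialect TEXT, size INTEGER)")
conn.executemany(
    "INSERT INTO words VALUES (?, ?, ?)",
    [
        ("color", "american", 10),
        ("colour", "british", 10),
        ("yogurt", "american", 35),
        ("yoghurt", "british", 35),
        ("doomscrolling", "american", 70),
    ],
)


def wordlist(db: sqlite3.Connection, dialect: str, max_size: int) -> list[str]:
    """Return words for one dialect up to a size level, sorted alphabetically."""
    rows = db.execute(
        "SELECT word FROM words WHERE dialect = ? AND size <= ? ORDER BY word",
        (dialect, max_size),
    )
    return [word for (word,) in rows]


american_small = wordlist(conn, "american", 35)
british_small = wordlist(conn, "british", 35)
```

With one queryable source, dialect and size become filter parameters instead of choices about which files to combine, which is why scripts written against the SCOWLv1 file layout will likely need updating.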
