RELEASE 2.0!
Get reminded of the absolute worst passwords using the @WorstPasswords Twitter account!
- More accurate!
- More inclusive!
- Includes analysis!
- Never needs winding!
- Non-Iron!
- Completely Overhauled!
Tasklist for 2.0 Included:
- Acquire more source files for larger corpus and more accurate results
- Include Non-ASCII Characters
- Acquire higher-quality sources, with fewer non-password results.
- Avoid duplicate lines
- Include Analysis with Version Release
- Include HashCat Ruleset
More Sources, Larger Corpus, More Accurate Results and Higher-Quality Sources
Release one used about 300 files, and I did not filter them very well: some "password" wordlists contained non-password lines. This time, I pulled out all the MD5 and SHA1 hashes that had somehow weaseled their way in, removed files that contained nothing but lists of things like cartoon characters, and made sure the lists were as pure as possible.
Release two uses over 1500 files. I attempted to clean each one, which in some cases required a judgement call on my end.
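As a rough illustration of the hash cleanup described above, here is a minimal Python sketch. It assumes a "hash" is any line made up of exactly 32 or 40 hexadecimal characters (the lengths of MD5 and SHA1 digests). The function name and sample data are hypothetical, and the actual cleaning for this release may have used different rules or tools.

```python
import re

# Assumption: a line counts as a stray hash if it is exactly 32 hex
# characters (MD5 length) or exactly 40 hex characters (SHA1 length).
HASH_LINE = re.compile(r"[0-9a-fA-F]{32}|[0-9a-fA-F]{40}")


def drop_hash_lines(lines):
    """Return only the lines that do not look like bare MD5/SHA1 digests."""
    return [line for line in lines if not HASH_LINE.fullmatch(line)]


sample = [
    "password1",
    "5f4dcc3b5aa765d61d8327deb882cf99",          # 32 hex chars, dropped
    "5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8",  # 40 hex chars, dropped
    "deadbeef",                                  # short hex string, kept
]
cleaned = drop_hash_lines(sample)
print(cleaned)
```

A rule this simple would also throw away a real password that happens to be 32 or 40 hex characters long, which is one of the judgement calls mentioned above.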
In V1, popularity was determined by appearances in as few as 2 files. Since some lists are compilations of smaller lists, this let some lines that really appeared only once make the cut. More files allowed me to raise the minimum threshold to 5 appearances.
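The threshold idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each source file is represented as a list of lines, a line is counted at most once per file, and the function name `lines_in_enough_files` and its `min_files` parameter are made up for this example.

```python
from collections import Counter


def lines_in_enough_files(files, min_files=5):
    """Return lines found in at least `min_files` files, most common first.

    `files` is an iterable of line lists, one list per source file.
    """
    counts = Counter()
    for lines in files:
        # set() so a line repeated inside one file still counts once
        counts.update(set(lines))
    kept = [(line, n) for line, n in counts.items() if n >= min_files]
    # Sort by file count (descending), then alphabetically for stable output
    kept.sort(key=lambda item: (-item[1], item[0]))
    return [line for line, _ in kept]


files = [
    ["123456", "password"],
    ["123456", "123456", "qwerty"],
    ["123456", "password"],
]
# A threshold of 2 is used here only because the sample is tiny.
popular = lines_in_enough_files(files, min_files=2)
print(popular)
```

Counting per file does not detect a compilation that copies a smaller list wholesale, which is why a higher threshold helps.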
Duplicate Lines
Duplicates in Release one were caused by a mismatch of newline characters. Not this time. Every single processing step included a tr -d '\r' step. I wasn't going to have them slip in.
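To show why the carriage returns matter, here is a small Python sketch of the same idea. It is an equivalent illustration, not the pipeline used for the release: `strip_cr` imitates tr -d '\r', and both function names are hypothetical.

```python
def strip_cr(line):
    # Same effect as tr -d '\r': delete every carriage return in the line
    return line.replace("\r", "")


def dedupe(lines):
    """Strip carriage returns, then keep the first copy of each line."""
    seen = set()
    unique = []
    for line in lines:
        cleaned = strip_cr(line)
        if cleaned not in seen:
            seen.add(cleaned)
            unique.append(cleaned)
    return unique


# "hunter2\r" came from a CRLF file and "hunter2" from an LF file.
# Without the stripping step they would be treated as two different lines.
raw = ["hunter2\r", "hunter2", "letmein\r"]
result = dedupe(raw)
print(result)
```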
Non-ASCII and The Blankspace Judgement Calls
For V1, I made a judgement call (based on no evidence) that I wasn't going to include lines that contained blankspace characters. Since I was already disregarding non-ASCII characters, I thought this wouldn't cause much of a problem.
In V2, I included lines that had non-ASCII characters. This opens up the entire non-American English speaking world of passwords! I also decided not to remove blankspace characters - on the whole. These characters were rare enough to leave in without causing a flood of duplicates.
In some files, a blankspace character was at the end of every line. In these cases, I would remove the final blankspace character from all lines. However, some files were not consistent about beginning or ending lines with blankspace characters. In those cases, I would leave them in place, since I had reason to believe the blankspaces were part of the data.
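The rule for trailing blankspace can be expressed as a short Python sketch. The assumptions here are mine: "blankspace" is taken to mean a space or a tab, empty lines are ignored when checking for consistency, and the function name is hypothetical. In practice this was a per-file judgement call rather than an automated rule.

```python
def strip_uniform_trailing_blank(lines):
    """Drop the final character of every line, but only if every
    non-empty line in the file ends with a space or tab."""
    non_empty = [line for line in lines if line]
    if non_empty and all(line[-1] in " \t" for line in non_empty):
        # Consistent trailing blankspace: treat it as a file artifact
        return [line[:-1] if line else line for line in lines]
    # Inconsistent: assume the blankspace is part of the data
    return list(lines)


consistent = ["abc ", "pass word ", "x "]
mixed = ["abc ", "def"]
print(strip_uniform_trailing_blank(consistent))
print(strip_uniform_trailing_blank(mixed))
```

Only one character is removed per line, so a line that genuinely ends in two spaces keeps one of them.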
Analysis and Hashcat Rules
While I haven't visualized them yet, I included mask analysis for the files in this release. These have been ranked in order of popularity.
I have also included HashCat rulesets, also in order of popularity.
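For readers unfamiliar with mask analysis, the Python sketch below shows one way a ranked mask list could be produced. It uses hashcat's built-in charsets (?l lowercase, ?u uppercase, ?d digits, ?s space and ASCII punctuation, ?b any byte). Mapping every other character to one ?b per UTF-8 byte is an assumption of this example, and the function names are hypothetical. The masks shipped with this release may have been generated with a different tool or different rules.

```python
from collections import Counter
import string


def to_mask(password):
    """Translate a password into a hashcat-style mask string."""
    parts = []
    for ch in password:
        if ch in string.ascii_lowercase:
            parts.append("?l")
        elif ch in string.ascii_uppercase:
            parts.append("?u")
        elif ch in string.digits:
            parts.append("?d")
        elif ch == " " or ch in string.punctuation:
            parts.append("?s")
        else:
            # Assumption: anything else becomes one ?b per UTF-8 byte
            parts.extend(["?b"] * len(ch.encode("utf-8")))
    return "".join(parts)


def rank_masks(passwords):
    """Return (mask, count) pairs, most popular mask first."""
    return Counter(to_mask(p) for p in passwords).most_common()


ranked = rank_masks(["abc123", "xyz789", "Password1!"])
for mask, count in ranked:
    print(count, mask)
```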
Closed Issues
- Provide Password Occurrences - this was for the purpose of mask/rule generation. I have simply provided the rules and masks.
- Wordlists Don't Contain Non-ASCII Characters - they do now!
- Rev 2 isn't released yet - If you're reading this, it is!
- Duplicate Entries Found on WPA Lists - Avoided from the beginning, barring the potentials related to blankspace characters addressed above
Outstanding/New Issues
- List not filtered properly (maybe.) - had to make a judgement call here to err on the side of inclusivity.
- Some duplicates may appear due to newlines - another judgement call
- In Compressed versions, Top1575-probable-v2.txt is named Top1575-probable2.tx - just a file's name. No big deal.