I recently wrote about ways to manage passwords and I wanted to expand on that by sharing some basics of how passwords are stored and used by computers.
Passwords are generally stored in lists or tables in a computer file. To authenticate a user, the entered password is compared against the stored password for that user. Let’s say I have a username of jeremy-56 and password of unbreakable. When I enter my username, jeremy-56 and password, unbreakable into a web site, the server will lookup the username and password entered and compare it to the stored password for jeremy-56. If it matches, I will be granted permission to access the content, otherwise I will be denied access.
In a perfect world, passwords are not stored in plain text. Hackers can and do often obtain these password files so obviously we want something that others won’t be able to read or use. This is accomplished by encrypting or hashing passwords before storing. Common hashing algorithms are SHA-1 and MD5, also known as one-way encryption. To check a hashed password, the web server hashes the entered password using its preferred algorithm and compares it to the hash string (number)stored which was also hashed in the same manner. If the strings match, access is granted. In the perfect world, a website or administrator never has the ability to “see” a password as they are transported and stored only in hash form. I’ll go over one popular hash, the MD5, next.
MD5 stands for Message Digest Algorithm Series 5. This is a fifth generation algorithm developed in 1991 for data integrity (note: NOT specifically developed for passwords, which happens to be an inherent weakness of MD5). The MD5 algorithm generates a fixed-length 128 bit value for a variable length message. It is a checksum for messages of all lengths. The value A will generate a hash *a5c8ce6978e46813e453d0277e47ea53 *whereas the value *a *will generate *0cc175b9c0f1b6a831c399e269772661. *The contents of Wikipedia might generate a hash similar to *11b921ef080f7736089c757404650e40. *The hash is always 128 bits long no matter the length of the input value.
I should note that perfect security is only achieved with a key length equal or greater to the message length so we are stuck with imperfect but more practical security measures. A crypto visionary in the 1940s, Claude Shannon published mathematical proofs related to perfect security. The theory of perfect security forms a basis for the science of crypto in spite of its impracticality in encrypting data due to the key length requirements. (Despite its name, perfect security has many vulnerabilities. This post is not the place to write about them.)
A short definition of hexadecimal numbers is in order before diving into MD5: Computers operate in 1s and 0s; numbers by extension. Hexadecimal numbers such as the hashes above are simply base 16 numbers—instead of counting 1-10, hex counts 1-16, or 1-10+a-f, where the letters a-f stand for numbers 11-16. The decimal number 1,000 is 3e8 in hex. The hash of A *(a5c8ce6978e46813e453d0277e47ea53) *equals *220,365,265,208,968,000,000,000,000,000,000,000,000 *in decimal form. As you can see, base 16 hex is easier to define really big numbers. Other common uses for hexadecimal numbers are in MAC and IPv6 addresses.
Hashes are often used to verify data integrity during file transfers. A file or batch of files (software for example) is hashed before downloading. After downloading is complete it is re-hashed and the strings are compared. If even a single bit has been tampered with or is missing it will generate a different hash creating a mismatch, and the user is alerted that the file is corrupt.
MD5 follows several rules:
1) It must always generate the same hash for the same input value.
2) It must avoid collisions, or duplicate hashes for different input values.
3) It must not be reversible, i.e. the input message must not be able to be derived from the hash.
Obviously there are bound to be collisions, since the hash has a finite number of unique hashes—10^24 to be exact, while the number of unique input values/combination is virtually infinite. The recent Flame malware exploited this collision vulnerability, and it took some very very smart people to figure out how to take advantage of relatively rare collisions. For now, though, a list of hashed passwords is much safer than plain text passwords.
The MD5 algorithm is public and is defined at length in this Wikipedia article. It involves splitting the message into blocks and rotating different layers of numbers, sort of like mixing up a giant rubix cube. It can be done on paper but it’s very tedious. Computers, however, are very good at these tasks (XOR, AND, OR, and NOT operations) and can quickly flip a message into a hash.
Other hashes such as SHA use a similar block cipher design although their algorithms differ significantly from MD5.
Since hashes are not reversible, hackers came up with the idea of rainbow tables, or precomputed lists, to make sense of the hashed password lists they found. These tables are lists of precomputed hashes sorted by hash value and length of input. Common passwords, names, and dictionary words are fed into a computer and hashed into MD5 and other hashes. These tables grow with each security breach as more known passwords are fed into these lists. A hacker in possession of rainbow tables can quickly and easily convert hashes obtained in a security breach into plain text passwords.
That is why it is so important to never use the same password twice and never use passwords consisting of names or dictionary words. A person can Google an MD5 hash and find a list of corresponding passwords. If you use an obscure password, chances are slim that the hash will be in a rainbow table, and if it does it will likely correspond to a different password. Moreover, if it is unique to that website, the hacker will not be able to use the discovered password on your other online accounts.
Where there is an exploit, there will soon be a defense. Salting passwords is a security measure taken by the entity storing passwords. Every website should use this technique at a minimum to ensure complete security of its customers’ passwords in the event of a breach.
Salting is done by adding a value to plain text passwords before making a hash. There are many ways of salting; the most secure would be to make a hash and add it to the plain text password—and then hash the whole thing, rendering rainbow tables completely useless against these hashes. Recent LinkedIn and eHarmony breaches released millions of unsalted passwords, a gross negligence on the part of the security administrators. Salting is easy for computers to perform and adds a thick layer of protection to stored passwords.
Unfortunately, too many online companies have poor password management policies. Some even store in plain text, a disaster when the next security breach occurs. It’s almost impossible to prevent breaches but it’s relatively easy to follow best practices and implement secure password policies. In the event of a breach, passwords will be unusable, which in turn reduces the incentive for a hacker to attempt a breach in the first place.
I hope this is interesting and informative. I enjoy digging into these topics and figuring out how these things work and break. Much much more could be said; I’ve only scratched the surface and given a summary of the most basic practices in use. Hopefully this will help clarify what happens when a password is entered into a website, and ultimately prompt better password habits. For further information including complete algorithms and history, look stuff up on Wikipedia and the Interwebs. A few books I found interesting (if you’re really nerdy) are listed below.