Storing Passwords


This section is about how you should be storing passwords and other access 'secrets' in your application's database. While we will refer to usernames and passwords in this section, the same applies to access tokens that you may be storing for use with OAuth, or to any other secret key that you need to authenticate with.

It is important to protect these things for two reasons. First, you want only your real users to be logging in to your site. Second, and possibly more importantly, people use the same password for more than one thing. If a password on your site is compromised, the hacker can try it on Gmail, Hotmail, PayPal or many other popular sites, and will probably get access to a lot more than just your application.

Describing the Problem

At first glance the issue seems easy enough: we store a username and a password. When the user submits the login form, we check that the password is correct for that username by retrieving both from the database and doing a simple comparison.

form->password  ==  database->password
However, it isn't quite this simple. What happens if someone manages to break into your server and steal your database? Or if someone uses SQL Injection (coming up later in the tutorial) to output the passwords?

So you should encrypt the passwords, right? That would be fine, but to encrypt the passwords you would need an encryption key, and where do you put this key? It could be hardcoded into your site's code, but if the hacker has broken in far enough to get your database, they can probably get the code as well, and there are other security exploits that may allow them to get the key.

Passwords should be stored in such a way that, should someone get full access to the server and database, they still cannot see the actual passwords.

Hashing Algorithms

Hashing is a process very similar to encryption with one key difference. It is one way only. So where I could normally encrypt and then decrypt some data, I can only 'encrypt', or hash, the data. With most hashing algorithms, this produces a standard sized chunk of relatively random data. For example:

MD5 ("Hello World") = b10a8db164e0754105b7a99be72e3fe5
SHA-256 ("Hello World") = a591a6d40bf420404a011733cfb7b190d62c65bf0bcda32b57b277d9ad9f146e
Whatever the size of my input, whether it is "Hello World", or a large video file, the outputs of these algorithms will always be a specific size.
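
As a minimal sketch of this fixed-size property, the following Python uses the standard hashlib module. The helper name hex_digests and the 10 MB test input are illustrative assumptions, not part of the original tutorial.

```python
import hashlib

def hex_digests(data: bytes) -> dict:
    """Return the MD5 and SHA-256 hex digests of the given bytes."""
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }

small = hex_digests(b"Hello World")
large = hex_digests(b"x" * 10_000_000)  # roughly 10 MB of input

# Digest length depends only on the algorithm, never on the input size.
print(len(small["md5"]), len(large["md5"]))        # 32 hex characters each (128 bits)
print(len(small["sha256"]), len(large["sha256"]))  # 64 hex characters each (256 bits)
```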

Obviously, if we can input any amount of data but always get out the same amount, there must be collisions in the algorithm, i.e. two pieces of data which, when hashed, produce the same result. This is a mathematical certainty because there is a finite number of possible outputs but an infinite number of inputs. However, for a good algorithm, the probability of finding two such pieces of data is so astronomically low that we don't need to worry about it.
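
To make the collision argument concrete, here is a rough sketch that deliberately shrinks the output to 16 bits by keeping only the first two bytes of SHA-256. The names tiny_hash and find_collision are made up for this illustration. With only 65,536 possible outputs, a collision typically turns up after a few hundred attempts; with the full 256-bit output the same search is not feasible.

```python
import hashlib
from itertools import count

def tiny_hash(data: bytes) -> str:
    """A deliberately weak 16-bit 'hash': the first 2 bytes of SHA-256."""
    return hashlib.sha256(data).digest()[:2].hex()

def find_collision():
    """Hash successive inputs until two different inputs share a tiny_hash."""
    seen = {}  # maps each digest to the first message that produced it
    for i in count():
        message = f"message-{i}".encode()
        digest = tiny_hash(message)
        if digest in seen:
            return seen[digest], message, digest
        seen[digest] = message

first, second, shared = find_collision()
print(first, second, shared)
```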

If we apply the hashing technique, this is how we are now storing the passwords:

hash( form->password )  ==  database->password_hash
When the password is first set we calculate its hash and store that, then each time we authenticate we hash the password the user has given and compare the hashes. Now, if someone manages to steal the database, all they have is a bunch of hashed passwords and they can't use an algorithm to convert these back into the plaintext passwords. However these are still very vulnerable to brute-force attacks as you will see on the next page. What can we do to reduce this risk?
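
The scheme at this stage, before any salting, could be sketched in Python as follows. This only illustrates the point reached so far in the discussion and is not a recommended design. The dictionary named database is a hypothetical stand-in for a real users table, and SHA-256 is an arbitrary choice of hash for the example.

```python
import hashlib
import hmac

database = {}  # hypothetical stand-in for a users table

def hash_password(password: str) -> str:
    """Hash the password with no salt (vulnerable to lookup attacks)."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

def set_password(username: str, password: str) -> None:
    # Only the hash is stored; the plaintext is never written anywhere.
    database[username] = {"password_hash": hash_password(password)}

def check_password(username: str, password: str) -> bool:
    record = database.get(username)
    if record is None:
        return False
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(hash_password(password), record["password_hash"])

set_password("alice", "hunter2")
print(check_password("alice", "hunter2"))  # True
print(check_password("alice", "wrong"))    # False
```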

Salting Passwords

Let's say your web application has a million users and the database gets stolen. You have hashed all the passwords you store though, so the hacker can't just see the plaintext. What will they do now?

For hashing algorithms to work, the same input must always produce the same output. So what if they hash 'password' and look to see which users have that hash stored in their database entry? The answer is that for a large enough dataset, like your million-user website, they will get access to, on average, 4.7% of users' accounts. This would take them almost no time at all to do.
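
The lookup described above can be sketched in a few lines. The usernames, passwords and the short list of common passwords are invented sample data, and unsalted SHA-256 is assumed as the site's hash purely for the example.

```python
import hashlib

def unsalted_hash(password: str) -> str:
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

# Invented sample of a stolen table: username -> unsalted password hash.
stolen_rows = {
    "alice": unsalted_hash("password"),
    "bob": unsalted_hash("correct horse battery staple"),
    "carol": unsalted_hash("password"),
    "dave": unsalted_hash("123456"),
}

# The attacker hashes each common password once...
common_passwords = ["password", "123456", "qwerty"]
lookup = {unsalted_hash(p): p for p in common_passwords}

# ...and then matches every user in the table with a single dictionary lookup.
cracked = {user: lookup[h] for user, h in stolen_rows.items() if h in lookup}
print(cracked)  # {'alice': 'password', 'carol': 'password', 'dave': '123456'}
```

Note that the cost to the attacker is one hash per guessed password, regardless of how many users are in the table.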

Top 100 Passwords

The usual worry with weak passwords is that people will simply guess users' passwords through the site's login form, and you can implement minimum password requirements to help prevent that. But that is not where your real exposure lies: the bigger danger is what happens once the hashes themselves have been stolen.

Hashing alone is clearly not enough so what we need to do is make the password 'password' or whatever else it may be, hash differently for each user. This forces an attacker to do a full brute-force attack which is often enough to dissuade them from attacking as it takes so long. The process of making passwords hash differently is called 'salting'.

Very simply, when first setting the password, rather than just hashing it, you generate a short random string to add to the beginning of it. It doesn't have to be long as it's not helping the security of the password, it just needs to be long enough that you probably won't generate it again within your number of users. After generating this salt, you add it to the password, hash as normal, and store the salt and hash in the database.

salt  =  random_string()
password  =  salt + password
database->password_hash  =  hash( password )
database->salt  =  salt
Then, to authenticate against the password, you can just retrieve the salt and use it when you compare the hashes like this:
salt  =  database->salt
salted_password  =  hash( salt + form->password )
salted_password  ==  database->password_hash
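
A runnable sketch of the same two steps in Python might look like this. It assumes SHA-256 as the hash and a 16-byte salt from the standard secrets module; the record dictionaries stand in for database rows, and the function names are illustrative.

```python
import hashlib
import hmac
import secrets

def make_salt() -> str:
    """Generate a random salt from a cryptographically secure source."""
    return secrets.token_hex(16)

def salted_hash(salt: str, password: str) -> str:
    return hashlib.sha256((salt + password).encode("utf-8")).hexdigest()

def set_password(record: dict, password: str) -> None:
    salt = make_salt()
    record["salt"] = salt  # the salt is stored in the clear next to the hash
    record["password_hash"] = salted_hash(salt, password)

def check_password(record: dict, password: str) -> bool:
    candidate = salted_hash(record["salt"], password)
    return hmac.compare_digest(candidate, record["password_hash"])

alice, bob = {}, {}
set_password(alice, "password")
set_password(bob, "password")
print(alice["password_hash"] != bob["password_hash"])  # same password, different hashes
print(check_password(alice, "password"))
```

Because each user has a different salt, the attacker can no longer hash a guess once and compare it against the whole table; every guess has to be hashed again for every user.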

Comparison of Hashing Algorithms

There are a few different algorithms that can be used to hash the data, but some of them are better than others. This is a quick guide to help you choose:

MD4

This is a very old algorithm and has been shown to have weaknesses in how random its output data is. It is possible to crack this quite easily on modern hardware and it should no longer be used. Old systems may still use it.

MD5

This is the successor to MD4, but it is still quite old. Practical collision attacks against it have been published, and it is fast enough to brute-force easily on modern hardware. It would be better not to use it if at all possible.

DES

This refers to the DES-based password hashing used by traditional Unix crypt. There have been improvements to it, and it is still used in many systems; however, the algorithm only uses the first 8 characters of its input, so anything longer is truncated. This means that the passwords 'helloworld' and 'hellowor' are treated as the same. As you will learn later, it is becoming very quick to crack passwords around 8 characters long, so DES is no longer a good system to use.

SHA 1 and SHA 2

These are newer algorithms and have mostly superseded MD5. Web frameworks such as Django and Ruby on Rails have used them for password storage. Treat them with some caution, though: a practical collision for SHA-1 was demonstrated publicly in 2017, and both families are designed to be fast, which works against you when storing passwords. SHA-2 has no known practical attacks, but check back in 20 years and that might no longer be true!

bcrypt

While most hashing algorithms are designed to calculate the 'digest' of very large amounts of data as fast as possible, this is not a good thing for password security. If someone is attempting to brute-force passwords, you want to slow them down as much as possible. Yes, the password operations in your application will take more time, but the loss is so small that it is not worth considering when optimising your application.
bcrypt is designed to be slow, with a cost factor that can be raised as hardware gets faster. It is built on the well-trusted Blowfish encryption algorithm and uses it to perform very good hashing. Using bcrypt makes brute-force attacks impractical against all but the weakest passwords, and should keep doing so for many years to come.
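
The idea of a tunable work factor can be tried out with nothing but the Python standard library. Note that this is a substitution: bcrypt itself is not in the standard library, so the sketch below uses PBKDF2-HMAC-SHA256 from hashlib instead, where the iteration count plays the role of bcrypt's cost factor. The iteration counts and the function name slow_hash are arbitrary choices for the example, not recommendations.

```python
import hashlib
import hmac
import secrets
import time

def slow_hash(password: str, salt: bytes, iterations: int) -> bytes:
    """Derive a hash with PBKDF2-HMAC-SHA256 (a stand-in for bcrypt here)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)

salt = secrets.token_bytes(16)

# More iterations means more work per guess, for you and for the attacker.
for iterations in (1_000, 100_000):
    start = time.perf_counter()
    slow_hash("password", salt, iterations)
    elapsed = time.perf_counter() - start
    print(f"{iterations:>7} iterations: {elapsed * 1000:.1f} ms")

stored = slow_hash("password", salt, 100_000)
print(hmac.compare_digest(stored, slow_hash("password", salt, 100_000)))
```

A single login pays this cost once, while an attacker pays it for every guess against every user, which is where the protection comes from.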

Conclusion

This is it for storing passwords. It is a very important concept to learn, even if you are using a framework that does it for you already, as just knowing what is going on in the background will make you more security aware and hopefully improve the security of your apps. When you have put all this together you should end up storing something like this in the password fields in your database (where the $ symbol is used as the separator for the different parts of the data: algorithm, salt, digest):

sha1$c6711$22e5a330d2f88fc440fef7301b2dc406d1967a00
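
A value in this algorithm$salt$digest layout could be produced and checked along the following lines. This is a sketch under assumptions: the function names are made up, the salt is prepended to the password as in the salting section, and SHA-256 is used as the default algorithm purely for illustration. Storing the algorithm name alongside the hash is what lets you move users to a stronger algorithm later without breaking existing logins.

```python
import hashlib
import hmac
import secrets

def encode_password(password: str, algorithm: str = "sha256") -> str:
    """Build an 'algorithm$salt$digest' string for storage."""
    salt = secrets.token_hex(8)  # hex characters only, so never contains '$'
    digest = hashlib.new(algorithm, (salt + password).encode("utf-8")).hexdigest()
    return f"{algorithm}${salt}${digest}"

def verify_password(password: str, stored: str) -> bool:
    """Split the stored string and recompute the digest the same way."""
    algorithm, salt, digest = stored.split("$", 2)
    candidate = hashlib.new(algorithm, (salt + password).encode("utf-8")).hexdigest()
    return hmac.compare_digest(candidate, digest)

stored = encode_password("s3cret")
print(stored)
print(verify_password("s3cret", stored))  # True
print(verify_password("guess", stored))   # False
```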