General Knowledge

Hashing Vs. Encryption

There is a common misunderstanding that hashing and encryption are the same thing. They are not. Hashing is non-reversible. Take the following example:

$ echo -n Password123 | md5sum
42f749ade7f9e195bf475f37a44cafcb -

We pass the string "Password123" to the MD5 algorithm (algo), it does its math, and returns the resulting hex-encoded hash. The only way to get the same hashed output is to feed the algo the same original input. There are collisions, but we can talk about those later.

Most hashing algos output hex-encoded binary strings of a fixed length. Others, such as this example, use base64-encoded strings as output. Notice the length is always the same:

{SHA}uNF8eZRJ8jmr8WTjyocRJPVpe7w=
{SHA}lqdu/Zr6o2dgIER8Up1/7lcUtgw=
{SHA}qYUOwLMlDEuukA5HCT4LR1kQzco=
{SHA}MXZBCpyWJ7TTs1w2kgGJslwNwTg=
{SHA}2sAPLaUh9Mz0bI+XxKEg7qyABe8=
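Those are LDAP-style {SHA} hashes: a raw SHA-1 digest, base64 encoded, with a "{SHA}" prefix bolted on. You can build one yourself with OpenSSL (the input string here is just an example, not one of the plaintexts above):

```shell
# Raw SHA-1 digest (20 bytes), base64 encoded, with the LDAP "{SHA}" prefix
printf '{SHA}%s\n' "$(echo -n Password123 | openssl dgst -sha1 -binary | base64)"
```

A raw SHA-1 digest is always 20 bytes, so the base64 part is always 28 characters - which is exactly why every line above is the same length.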

TL;DR

Encryption is reversible, hashing is not

I don't want to talk about encryption too much (because that stuff is for nerds), but it's important to be able to differentiate between a hashed string and an encrypted string. Encryption often pads the input to a specific block length before encrypting, and it requires a key (or password) to decrypt. If encrypted passwords are in use, the length of the ciphertext will vary with the length of the input. So if you see a bunch of ciphertext passwords with varying lengths, you're probably dealing with encryption and not hashing. Example of AES-256-CBC encrypted strings and their associated plaintexts (the key is "asdf"):

foobar             | U2FsdGVkX19G+KtytNHdj6yH2AVvX26pEmtunS/PRnU=
foobarfoobar       | U2FsdGVkX18sPpIvN6nVh68lOUCcb3gR2fKbCCnBxog=
foobarfoobarfoobar | U2FsdGVkX1/EOnUt57TCW4Rh0EdNnWX+lDatuQv2xEXXeAeowW0XG/EXJUe9aSUz

$ cat encrypted_passwords
U2FsdGVkX19G+KtytNHdj6yH2AVvX26pEmtunS/PRnU=
U2FsdGVkX18sPpIvN6nVh68lOUCcb3gR2fKbCCnBxog=
U2FsdGVkX1/EOnUt57TCW4Rh0EdNnWX+lDatuQv2xEXXeAeowW0XG/EXJUe9aSUz

$ for i in `cat encrypted_passwords`; do echo $i | openssl enc -base64 -d -aes-256-cbc -pass pass:asdf; echo; done
foobar
foobarfoobar
foobarfoobarfoobar
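For reference, those ciphertexts came from OpenSSL's enc command. Newer OpenSSL releases changed the default key derivation (and warn that -pbkdf2 should be used), so your base64 output won't necessarily match the strings above byte for byte, but the roundtrip on your own machine still works:

```shell
# Encrypt with a passphrase-derived key, then decrypt it straight back
ct=$(echo -n foobar | openssl enc -base64 -aes-256-cbc -pass pass:asdf)
echo "$ct" | openssl enc -base64 -d -aes-256-cbc -pass pass:asdf
```

The decrypt step prints the original plaintext, foobar, because anyone holding the key can reverse the operation - which is the whole point of the hashing-vs-encryption distinction.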

Now you might be thinking: pfft, no one's going to use asdf as the key to encrypt their users' passwords, and you're probably right. The fact that the key material has to be accessible to the system does, however, mean that essentially a plaintext master password is lying around somewhere - whether as an RSA key, a password or passphrase in a file, embedded in the database, hardcoded in the application, or somewhere in memory.

Same strings using MD5:

foobar             | 3858f62230ac3c915f300c664312c63f
foobarfoobar       | 59faa421729e846dd800dce59943bfc0
foobarfoobarfoobar | 1352aadab322d1a033c27964be0965db
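You can generate those MD5 hashes yourself and watch the output length stay fixed no matter how long the input gets:

```shell
# Hash each string; the digest is always 32 hex characters regardless of input length
for s in foobar foobarfoobar foobarfoobarfoobar; do
  echo -n "$s" | md5sum
done
```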

Hashing passwords is far from perfect, in fact it's kind of terrible, but it's less bad than encryption for what it's trying to accomplish. A user will pick the absolute bare minimum requirement more often than not, and also reuse it across multiple sites. A lot of "hacks" are actually nothing more than credential reuse attacks. If the compromised site used encryption, the only effort the attacker has to put in is finding the key, which decrypts all the passwords at once. With hashing, they at least have to put effort into cracking them. When used with modern algos like sha512crypt, bcrypt, scrypt, or argon2, hashes can take a significant level of effort to crack.

Salting

Salting is adding a string to your password before hashing. Salts should be unique per hash, usually randomly chosen, because the whole point is to make the same plaintext "password" hash to a different value every time. This makes the password crackers' lives difficult because in order to check the word "password" for each of 1,000 users who each have a unique salt, they have to do the work 1,000 times - once per user/salt. It also means they can't effectively use precomputed dictionaries or rainbow tables (usually...), because they would need a custom one per salt.

Sites sometimes screw this up and use a common salt for all users; that defeats the purpose.
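A quick shell demonstration of why unique salts matter - the same password prepended with two different salts (both made up for this example) produces completely unrelated hashes, so a lookup table built for one salt is useless against the other:

```shell
# Same password, two different salts, SHA1($salt.$pass) style
echo -n "1f2e3dpassword" | sha1sum
echo -n "9a8b7cpassword" | sha1sum
```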

The following is a salted SHA1 hash:

b353977827f67a4ae0318f3a9447fae1c13d9d90:b8d18ca
|______________________________________|||_____|
                  hash                  |  salt
                                        |
                                    separator

The plaintext for this hash is "password". It has a salt value of "b8d18ca" and uses the format SHA1($salt.$pass). This means the algo took the plaintext "password", generated the salt, and prepended it to the plaintext before hashing. When the website or application later verifies your password, it takes your plaintext password as input, reads the salt value from the stored hash, prepends it to your input, and compares the resulting hash to the stored hash. Cracking the hash without knowing that part of it is a salt would yield the following plaintext:

b8d18capassword

Because the algorithm manipulated the plaintext input, the salt and resulting hash can stay transparent to the user. In cracking, if the algo is salted, we need to know the salt so we can supply it when generating candidate plaintexts.
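As a sketch, checking a candidate against the example hash above looks like this: split off the salt, prepend it to the candidate, hash, and compare the result to the stored hash portion:

```shell
stored="b353977827f67a4ae0318f3a9447fae1c13d9d90:b8d18ca"
hash=${stored%%:*}    # the part before the separator
salt=${stored##*:}    # the part after it
# SHA1($salt.$pass): prepend the salt to the candidate, then hash
computed=$(printf '%s%s' "$salt" "password" | sha1sum | awk '{print $1}')
echo "$computed"      # if this matches $hash, the candidate is the plaintext
```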

Salting makes cracking more time-intensive when implemented correctly. With randomized salts, you force the cracker to waste time attempting candidates against salts that don't match. The required effort works out to roughly device_speed / number_of_salts, because we need to generate a candidate for every salt. If the salt is static, the math collapses back to device_speed / 1. Another way to view this:

Our GTX 980 cracks SHA1($salt.$pass) at 3576.8 MH/s or 3.5 billion candidates per second
Our hashlist contains 1000 unique salts

3,500,000,000 / 1000 = 3,500,000 candidates per second

That's three orders of magnitude slower, a loss of 99.9%. With a static salt it looks like so:

Our GTX 980 cracks SHA1($salt.$pass) at 3576.8 MH/s or 3.5 billion candidates per second
Our hashlist contains 1 unique salt
              
3,500,000,000 / 1 = 3,500,000,000 candidates per second
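The shell can check that arithmetic for us - effective speed is just raw device speed divided by the number of unique salts:

```shell
speed=3500000000    # raw device speed, candidates per second
salts=1000          # unique salts in the hashlist
echo $(( speed / salts ))    # effective candidates per second against any one salt
```

Set salts=1 and you get the full 3.5 billion back, which is the static-salt case above.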

If that doesn't make sense, keep reading, we have pretty graphs later...

Iterating

Another common improvement over simply "hash this plaintext" is "hash this plaintext, then hash that result, then hash that result," repeated thousands of times. That way, to try a single candidate password, the password cracker has to do thousands of operations. This is called iterating, looping, or variable cost. Some password hash algorithms use a hardcoded number of iteration rounds; others make it configurable in a part of the hash itself. For instance, md5crypt() uses MD5, includes a salt, and loops exactly 1,000 times. sha512crypt() uses SHA-512, includes a salt, and loops a configurable number of times (defaulting to 5,000).
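Both are easy to generate for comparison with OpenSSL's passwd utility (the -6 option needs OpenSSL 1.1.1 or newer; the salt here is just an example):

```shell
# md5crypt: $1$ prefix, fixed 1,000 rounds
openssl passwd -1 -salt exsalt Password123
# sha512crypt: $6$ prefix, rounds configurable in the hash format itself (default 5,000)
openssl passwd -6 -salt exsalt Password123
```

Note how the salt and the algorithm identifier travel inside the hash string - that's how the verifying system (and the cracker) knows what to do with it.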

Iterating mostly impacts the compute-cycle-cost of a hash algorithm, not its memory footprint or other factors. Those are also important when designing hash types optimized to resist certain kinds of attacks, but that's too in the weeds to go into here.

Effect of hash type on cracking speed

Let's look at some examples to demonstrate the impact of hash algorithm selection, whether it is salted, uses multiple iterations, etc. Suppose an attacker has collected hashes for 1,000 users from some compromised website, and they want to do just one simple attack: testing every password hash against the RockYou data - 14.3 million candidate passwords.

The hash type used by the compromised site will have a massive impact on how long it would take the attacker to get through that attack. Here is a graph of (relatively, roughly) how many seconds it would take to complete that attack depending on what hash type was in use, with a standard graphics card:

[graph: seconds to complete the attack by hash type, linear time scale]

Well, that's useless! The strongest hash types are so much slower that all the faster types just squash down to nothing. Let's try the same data again using a logarithmic X axis time scale. As the bars go left-to-right they are increasing by powers of 10:

[graph: seconds to complete the attack by hash type, logarithmic time scale]

So some takeaways: when you want to crack password hashes, single-round is easier than multi-round, and unsalted is easier than salted. Conversely, when some company or website announces a data breach that included user data, a) the passwords had better have been hashed, not just plaintext; b) they had better have been salted, not just hashed; c) they had better have been using a strong multi-round salted hash, not just single-round.

Identifying Hash Types

Before attempting to crack a given password hash, it's up to the cracker to figure out what kind of hashing algorithm was used to make it. Identifying hash types is often a straightforward process, but not always. Crackers usually make educated guesses based on clues, such as hash length and format. Ultimately, the only way to know with confidence that a hash type guess is correct is if the hash is cracked.
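Length is usually the first clue. A minimal sketch of that guessing logic (the example hash is the MD5 from the top of this section; many algorithms share output lengths, which is why a length match is only ever a guess):

```shell
h="42f749ade7f9e195bf475f37a44cafcb"
case "${#h}" in
  32) echo "32 hex chars: maybe MD5, NTLM, or MD4" ;;
  40) echo "40 hex chars: maybe SHA1" ;;
  64) echo "64 hex chars: maybe SHA-256" ;;
  *)  echo "no obvious length match" ;;
esac
```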

Some great resources for this task are here and here, both of which show what common hashes look like.

The "hash-identifier" package (available here) ships with Kali Linux and helps in identifying unknown hash types.

Collisions

A collision happens when two different inputs produce the same hash output. This is bad (obviously). For passwords, this can mean that maybe I don't crack your actual password, but since I found an input that produces the same hash, I can use that plaintext to trick the system into accepting it as the legitimate password.

Microsoft Office used an algorithm that was prone to collisions for years in its document protection. It wasn't uncommon to find multiple collisions for a single hash, all of which would unlock the document.

Once a collision is found, the algo is effectively broken. If it happens once, it is statistically probable that it will happen again. The only things holding us back are time and processing power. As algos become more robust, it takes more power and time to generate candidates and compare the hashed outputs when searching for collisions. At the same time, designers are getting better at creating algorithms that are not prone to collisions.