Just over a month ago, Adobe announced that they had been the victim of a sophisticated cyber attack. With the company's source code and customer information stolen, it was a serious breach that could have tremendous implications. I'm going to take a look at the customer data that was subsequently leaked and how bad the situation is.




Originally announced on the Adobe blog, the leak was thought to have affected a staggering 2.9 million customers and to include encrypted credit/debit card numbers, expiry dates and 'other information'. As the story began to unfold this number climbed to a dizzying 39 million customers and then on to an absolutely astronomical 152.9 million customers. That makes it, by itself, the largest ever breach of its kind. The one good thing I can bring up at this point is that I haven't found any credit or debit card information in any of the leaked data. This doesn't mean that it wasn't stolen, just that it isn't yet widely available.


The leak

Diving straight in, the data that was leaked in the Adobe hack is now widely available across the Internet as a file called users.tar.gz. Compressed, the file weighs in at 3.8GB and expands out to 9.26GB of raw data. Each record includes a customer id, username, email address, password and password hint. The customer id and username were useless to me so I dropped them after the import to save space. After decompressing the file I renamed it to cred.csv, imported it into a local MySQL instance with the following command and then created an index on the email column.
load data local infile 'x:/cred.csv' into table adobe_creds.data fields terminated by '-|-' lines terminated by '|--';
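To make the file layout concrete, here is a minimal Python sketch of how the same delimiters could be parsed outside of MySQL. The two sample records are made up for illustration (only the encrypted values are copied from later in this article), and it assumes that every record ends with '|--' followed by a newline, which is not something the import command itself guarantees.

```python
FIELD_SEP = "-|-"
RECORD_END = "|--\n"  # assumption: each record ends with '|--' and a newline


def parse_records(text):
    """Split raw dump text into (email, password, hint) tuples."""
    records = []
    for chunk in text.replace("\r\n", "\n").split(RECORD_END):
        if not chunk.strip():
            continue  # ignore the empty tail after the last record
        fields = chunk.split(FIELD_SEP)
        if len(fields) != 5:
            continue  # skip malformed rows rather than guess at them
        _customer_id, _username, email, password, hint = fields
        # The customer id and username are dropped, as in the article.
        records.append((email, password, hint))
    return records


# Hypothetical sample data in the same layout as the leaked file.
sample = (
    "1-|-alice-|-alice@example.com-|-EQ7fIpT7i/Q=-|-1to6|--\n"
    "2-|--|-bob@example.com-|-j9p+HwtWWT86aMjgZFLzYg==-|-|--\n"
)

for record in parse_records(sample):
    print(record)
```

Splitting on the record terminator plus the newline, rather than on '|--' alone, avoids a false match when an empty field puts two field separators next to each other.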




Once I took a quick look at the data there were a few things that struck me immediately.



Password hints are a terrible idea to start with, but here they are, in the data leak, in plain text, with no protection whatsoever. A little bit of text that's supposed to help you figure out what your password is, when the password is supposed to be a secret, is a bit of a contradiction. Next up are the passwords themselves. A password should never be encrypted, but instead should be properly salted and hashed before being stored in a database. Hashing algorithms always produce a digest with a fixed length, but the password values in this data vary in length, so I can immediately determine that these passwords have indeed been encrypted and not hashed. Not off to a great start... I don't want to go into too much detail on the passwords themselves as there is a superb write-up on them by Naked Security. It's definitely worth a read.
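The length argument is easy to check for yourself. The sketch below decodes two of the encrypted values that appear later in this article and compares their lengths with SHA-256 digests of two made-up passwords. SHA-256 is only a stand-in here to show the fixed-length property; it is not a suggestion for how passwords should be stored.

```python
import base64
import hashlib

# Two encrypted password values copied from the leaked data shown in this article.
leaked = ["EQ7fIpT7i/Q=", "j9p+HwtWWT86aMjgZFLzYg=="]

# Decode the base64 text and measure the raw ciphertext length in bytes.
cipher_lengths = [len(base64.b64decode(value)) for value in leaked]

# For comparison, hash two made-up passwords of very different lengths.
examples = ("abc", "a much longer passphrase")
digest_lengths = [len(hashlib.sha256(p.encode("utf-8")).digest()) for p in examples]

print("ciphertext lengths:", cipher_lengths)
print("digest lengths:", digest_lengths)
```

The stored values come out at different lengths, in multiples of 8 bytes, while the digests are the same length regardless of input. That is the behaviour you would expect from a block cipher rather than a hash.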


Analysing the data

I immediately ran a few simple queries against the data to see what information I could glean. After all, it's not every day we get to work with a genuine sample of relevant 'big data'! The first query that popped into my head was 'What is the most common password hint?'. The hints aren't protected with any encryption so it's a pretty easy place to start.
select hint, count(*) from adobe_creds.data group by hint order by count(*) desc limit 100;

Here are all the hints from the top 100 that had over 100,000 occurrences.

dog 559,216
name 479,676
usual 387,153
?? 328,199
???? 303,591
same 242,866
me 242,095
??? 232,817
cat 227,927
son 183,895
Daughter 180,991
nickname 165,611
pet 142,593
????? 140,614
normal 140,063
work 125,664
?????? 123,719
car 113,503
school 111,955
birthday 109,697
my name 102,810
love 102,380


From this, we can already see a huge opportunity for social engineering. Knowing that someone's password is their dog/cat/son/daughter provides a very narrow target for an attacker to focus their efforts. Not only that, when you see things like 'usual', 'same' and 'normal' in there, you know that if you do crack the password, there's an even better chance it will be useful on other sites and services. A couple of other worrying hints that I picked up on were 'password' and '123456'. Let's take a look at the passwords for everyone who used those hints.

This query selects all the passwords where the hint was '123456', giving us the following data (the top 10 results are shown).

select password, count(*) from adobe_creds.data where hint = '123456' group by password order by count(*) desc limit 100;
j9p+HwtWWT86aMjgZFLzYg== 2,503
Ypsmk6AXQTk= 1,113
dQi0asWPYvQ= 701
7LqYzKVeq8I= 485
j9p+HwtWWT/ioxG6CatHBw== 458
diQ+ie23vAA= 428
5djv7ZCI2ws= 358
e6MPXQ5G6a8= 258
j9p+HwtWWT8/HeZN+3oiCQ== 184
v0cefPh3OLI= 160


This next query selects all the passwords where the hint was 'password' and gives us the following data (the top 4 results are shown; the full top 100 list has since been removed by PasteBin).

select password, count(*) from adobe_creds.data where hint = 'password' group by password order by count(*) desc limit 100;
L8qbAD3jl3jSPm/keox4fA== 1,890
IbF1vGcYjCrioxG6CatHBw== 1,321
STWrgIvDDp3ioxG6CatHBw== 804
ygtKdMXm1tHioxG6CatHBw== 624


Going back to the Naked Security article I mentioned earlier, it explained how the passwords weren't encrypted properly. As there is no randomisation (a nonce) introduced into the encryption process, encrypting a particular value always gives the same output. Looking closely at the password data you can see patterns emerging in the encrypted values. For the hint 'password' you can see that the trailing value 'ioxG6CatHBw' becomes extremely prevalent in the results, and for the hint '123456' the leading value 'j9p+HwtWWT' keeps reappearing. Repeats like these mean the passwords share whole chunks of plain text, so one good guess (and 'password' is a very good guess here) tells you a lot about every other value with a matching chunk, and gives an attacker known plain text for a more effective attack on the encryption key. Once they have that, it will give them access to every single password in the database, in plain text.
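The repeated patterns are easier to see once the base64 text is decoded and cut into blocks. This is a minimal sketch that assumes an 8-byte block size, which I'm inferring from the value lengths rather than from anything Adobe has confirmed. The encrypted values are copied from the query results above.

```python
import base64

BLOCK_SIZE = 8  # assumption: block size in bytes, inferred from the value lengths


def blocks(value):
    """Decode a base64 value and return its ciphertext blocks as hex strings."""
    raw = base64.b64decode(value)
    return [raw[i:i + BLOCK_SIZE].hex() for i in range(0, len(raw), BLOCK_SIZE)]


samples = [
    "j9p+HwtWWT86aMjgZFLzYg==",  # hint '123456'
    "j9p+HwtWWT/ioxG6CatHBw==",  # hint '123456'
    "j9p+HwtWWT8/HeZN+3oiCQ==",  # hint '123456'
    "L8qbAD3jl3jioxG6CatHBw==",  # most common password overall, third place
    "IbF1vGcYjCrioxG6CatHBw==",  # hint 'password'
]

for sample in samples:
    print(sample, blocks(sample))
```

The first three values share an identical first block even though their base64 text differs slightly at the eleventh character, because base64 characters don't line up with 8-byte boundaries. The values ending in 'ioxG6CatHBw==' all share an identical second block.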

The folks over at XKCD summarised the situation very nicely with this cartoon. As you can see, even if you didn't have a password hint, just a very poor password, someone else's password hint can actually give you up very, very easily.



Just to see what else I could dig up I decided to run a bunch of other queries.

Top 100 passwords for hints that contained 'qwerty' (list since removed by PasteBin). That's always a good starting point.

Top 100 passwords for hints that contained 'color'. American spelling, I know. Points for guessing the top 3 password values!

Top 100 passwords for hints that contained '1to6' (list since removed by PasteBin). This one has a particularly interesting result.


The truth is I could go on doing that for a very long time. There are some ridiculously compromising password hints in there and a great deal more would only need the slightest amount of knowledge to crack, like 'DOB'.


Cracking 1,911,522 passwords with 2 SQL queries

The last thing I did want to check out was just how bad password re-use was. Because we know the same password will result in the same cipher text, it's actually easy to figure out just how bad it is. With a data set of ~153 million people, it's a pretty good sample.

This query selects the most commonly used passwords in the database. The top 5 are below (the full top 100 list has since been removed by PasteBin).

select password, count(*) from adobe_creds.data group by password order by count(*) desc limit 100;
EQ7fIpT7i/Q= 1,911,522
j9p+HwtWWT86aMjgZFLzYg== 446,072
L8qbAD3jl3jioxG6CatHBw== 345,758
BB4e6X+b2xLioxG6CatHBw== 211,629
j9p+HwtWWT/ioxG6CatHBw== 201,546


That's a whopping 1,911,522 (1.25% of users) using exactly the same, very weak password! To show how easy it is to crack the password with nothing more than a SQL query, I ran the following:

This query gives me the most commonly used password hints for this particular encrypted password. I don't think it's going to take a genius to figure out what almost 2 million people were using as their account password...

select hint, count(*) from adobe_creds.data where password = 'EQ7fIpT7i/Q=' group by hint order by count(*) desc limit 100;
1to6 53,410
numbers 25,325
numeros 14,877
123 14,660
654321 14,327


Going back to my earlier point, this shows how the 1.4 million people who use the password 123456 but had no password hint have been compromised by the obvious password hints of other users and, of course, by the lack of proper password security from Adobe.
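Here is a small, self-contained sketch of that effect using an in-memory SQLite database instead of my MySQL instance. The email addresses, the 'AAAAAAAAAAA=' value and the rows themselves are invented for illustration; only the 'EQ7fIpT7i/Q=' value and its hints come from the results above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table data (email text, password text, hint text)")
conn.executemany(
    "insert into data values (?, ?, ?)",
    [
        # Hypothetical rows: two users give the game away with their hints...
        ("alice@example.com", "EQ7fIpT7i/Q=", "1to6"),
        ("bob@example.com", "EQ7fIpT7i/Q=", "numbers"),
        # ...one user with the same encrypted password left no hint at all...
        ("carol@example.com", "EQ7fIpT7i/Q=", ""),
        # ...and one has a made-up value that nobody else shares.
        ("dave@example.com", "AAAAAAAAAAA=", ""),
    ],
)

# For every user without a hint, collect the hints that other users
# attached to the identical encrypted password.
rows = conn.execute(
    """
    select quiet.email, other.hint
    from data as quiet
    join data as other
      on other.password = quiet.password and other.hint <> ''
    where quiet.hint = ''
    order by quiet.email, other.hint
    """
).fetchall()

exposed = {}
for email, hint in rows:
    exposed.setdefault(email, []).append(hint)

print(exposed)
```

Carol never wrote a hint, yet she inherits everyone else's. Dave stays out of the result only because nobody else shares his encrypted value.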



It's only a matter of time until someone totally breaks the encryption protecting the passwords, but even now the risk is huge. Millions of user passwords have already been compromised by the poor security protecting them and the use of password hints. There really isn't any reason to use a password hint on the modern web and everyone should simply drop these fields if they have them. To demonstrate just how bad the breach is, several high-profile companies have already started taking action to protect user accounts on other services. Most notably, Facebook got hold of a copy of the leaked database and started warning users who had the same email address and password on Facebook. They simply ran the deciphered passwords through the same process that the password you provide at login goes through, and if they match up you get a warning message from Facebook.


This was first confirmed on Brian Krebs' blog by Chris Long of the Facebook Security Team and then later by BBC News. This is the perfect example of why you should never re-use passwords across sites. You can read my article about two-factor authentication and password managers to help protect yourself against exactly that problem.


Going forward, I hope that this breach will help whip other companies into shape with regard to their password security. Passwords should be salted and hashed with a strong hash function, not encrypted. There is never a need to recover a user's password, so a one-way hash will do just fine, thanks.
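For anyone wondering what that looks like in practice, here is a minimal sketch using PBKDF2 from Python's standard library. The function names and the iteration count are my own choices for illustration, not a recommendation for any particular system. A dedicated password hashing function such as bcrypt, scrypt or Argon2 would do the same job.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor for this sketch; tune for your hardware


def hash_password(password):
    """Return (salt, digest) for storage. The password itself is never stored."""
    salt = os.urandom(16)  # a fresh random salt for every user
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password, salt, expected_digest):
    """Re-run the hash with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, expected_digest)


salt_a, digest_a = hash_password("123456")
salt_b, digest_b = hash_password("123456")

print(digest_a != digest_b)                           # same password, different stored values
print(verify_password("123456", salt_a, digest_a))    # correct password verifies
print(verify_password("password", salt_a, digest_a))  # wrong password does not
```

Because every user gets their own salt, two people with the same password no longer end up with the same stored value, so the group-by-password trick used throughout this article stops working.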