Alexa Top 1 Million Analysis - August 2018

It's time! August 2018 represents the 7th time I've published a report of the Alexa Top 1 Million sites so let's get stuck in and see what changes have taken place over the last six months on the biggest sites on the web.


Crawler data

As always, the data from my crawler is available along with all of the data you see in this report. I recently launched https://crawler.ninja, which you can read about in the launch blog, and all of the details on how to get the raw data and statistics are on there, including how to donate if you'd like to support the crawler (which would be really awesome because it's expensive!). That said, let's get on to what you all came here for, the results!


August 2018

Diving right in, I'm really pleased to say that all of the metrics we care about are still going up! This is genuinely great news and shows that we are continuing to make progress on making the web more secure.


stats


HTTPS

Here's the one we've all been waiting for, and this one is a pretty big announcement too. Not only because we've seen amazing growth in HTTPS again in this crawl, but because we've passed through 50% of the Alexa Top 1 Million sites actively redirecting to HTTPS for the first time!


🎉🎉🎉 50%+ of Top 1 Million sites on HTTPS! 🎉🎉🎉

https-graph


This is really awesome progress and you can see we're maintaining strong growth in the HTTPS department. In the previous report it looked like the growth of HTTPS had slowed, which it had at the time, but as you can see from the graph here, adoption has picked up again and we're continuing to see that sharp incline sustained. The growth shown here in this graph is unrivaled in any other security mechanism and if you think about the effort required to achieve this, how impressive it is becomes crystal clear.


https-percent


Looking at the history we've made serious progress in the last couple of years and again we're continuing to see maintained growth which is exactly what we need. The web is now well on its way to being 100% encrypted and long may it continue.
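To make "actively redirecting to HTTPS" a little more concrete, here is a minimal Python sketch of how such a check could be expressed. It is an illustration only, not the crawler's actual logic: the function name `redirects_to_https` and the set of status codes treated as redirects are assumptions made for this example.

```python
from typing import Optional
from urllib.parse import urljoin, urlsplit

# Status codes treated as redirects in this sketch (an assumption).
REDIRECT_STATUSES = {301, 302, 303, 307, 308}


def redirects_to_https(request_url: str, status: int, location: Optional[str]) -> bool:
    """Return True if a response to request_url redirects to an https:// URL."""
    if status not in REDIRECT_STATUSES or not location:
        return False
    # Location may be relative, so resolve it against the requested URL first.
    target = urljoin(request_url, location)
    return urlsplit(target).scheme == "https"


if __name__ == "__main__":
    print(redirects_to_https("http://example.com/", 301, "https://example.com/"))
    print(redirects_to_https("http://example.com/", 302, "/login"))
```

A relative redirect such as `/login` stays on the original scheme, so it does not count as a redirect to HTTPS here.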


https-history


HTTP Public Key Pinning

In the Feb 2018 report I saw notable growth in the use of HPKP: PKP was up 183% and PKPRO was up 3,924%! I noted at the time how I'd recently given up on HPKP and how Chrome was deprecating it so we might see a reversal of that trend, and we have. There are still far more sites lower down the ranking using HPKP, thanks almost exclusively to Tumblr, so the distribution is still the same, but the numbers are a lot lower now.


hpkp


The use of PKP is down 18% and the use of PKPRO is down 5%, so rather than continued growth like all other metrics, we're seeing sites drop the header now. If you check the distinct PKP/PKPRO values in the top 1 million sites, you can see that thousands of them are all using the same policy, and that's because they're all related to Yahoo in some way, mostly Tumblr sites. You can see the list of sites using PKP/PKPRO and you'll notice that a lot are indeed subdomains of Tumblr and they use either https://cspreports.srvcs.tumblr.com/hpkp or http://csp.yahoo.com/beacon/csp?src=yahoocom-hpkp-report-only as their reporting endpoint.
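If you want to group sites by reporting endpoint yourself, a small parser for the header value is enough. The sketch below is a simplified illustration, not a full RFC 7469 parser: the function name `parse_pins_header` is made up, the pin values and max-age in the sample are placeholders rather than real policy data, and it assumes no directive value contains a semicolon.

```python
def parse_pins_header(value: str) -> dict:
    """Split a Public-Key-Pins(-Report-Only) header value into its directives."""
    policy = {"pins": [], "max_age": None, "include_subdomains": False, "report_uri": None}
    for directive in value.split(";"):
        directive = directive.strip()
        if not directive:
            continue
        # Split on the first "=" only, since base64 pins end in "=" padding.
        name, _, arg = directive.partition("=")
        name = name.strip().lower()
        arg = arg.strip().strip('"')
        if name == "pin-sha256":
            policy["pins"].append(arg)
        elif name == "max-age" and arg.isdigit():
            policy["max_age"] = int(arg)
        elif name == "includesubdomains":
            policy["include_subdomains"] = True
        elif name == "report-uri":
            policy["report_uri"] = arg
    return policy


# Placeholder pins and max-age; only the report-uri comes from the text above.
SAMPLE = (
    'pin-sha256="AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA="; '
    'pin-sha256="BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB="; '
    'max-age=2592000; '
    'report-uri="https://cspreports.srvcs.tumblr.com/hpkp"'
)

if __name__ == "__main__":
    print(parse_pins_header(SAMPLE)["report_uri"])
```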


Security Headers

It's great to see that the use of Security Headers is also continuing to grow, with a notable increase in the adoption of some headers in particular. We saw a 40% increase in CSP, which is epic, and a 23% increase in HSTS, which is following behind the increase in HTTPS usage.


security-headers


a-to-e


Whilst we did see a slight reduction in the use of CSPRO, we saw a considerably larger increase in the use of CSP. My guess on what's most likely happening is that sites are moving from a report only version of a policy to an enforced version, which shows progress in deployments of CSP.
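To track that kind of move from report-only to enforced, you could classify each site's response headers along these lines. This is a minimal sketch under assumptions: the function name `csp_mode` and the label strings are invented for illustration and are not part of the crawler.

```python
def csp_mode(headers: dict) -> str:
    """Classify a response as enforcing CSP, report-only, both, or neither."""
    # Header names are case-insensitive, so normalise before comparing.
    names = {name.lower() for name in headers}
    enforced = "content-security-policy" in names
    report_only = "content-security-policy-report-only" in names
    if enforced and report_only:
        return "enforced+report-only"
    if enforced:
        return "enforced"
    if report_only:
        return "report-only"
    return "none"


if __name__ == "__main__":
    print(csp_mode({"Content-Security-Policy-Report-Only": "default-src 'self'"}))
    print(csp_mode({"Content-Security-Policy": "default-src 'self'"}))
```

Comparing the label for the same site across two crawls would show whether it went from "report-only" to "enforced".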


Let's Encrypt

We've seen some truly massive growth in HTTPS and with that comes an associated growth in the use of CAs. One of the CAs that seems to be helping the growth in adoption is Let's Encrypt, and they seem to be seeing the largest amount of growth themselves too!



Their presence in the top 1 million has seen similar growth across the board; from the very top to the very bottom they've increased their presence.


lets-encrypt


Their own stats show exactly how well they're doing with 53.5 million active certificates currently issued and a rough average of 600,000 issued per day! Keep up the good work!


EV

Despite seeing strong growth in HTTPS across the top 1 million sites, EV certificates have not seen much of that growth at all.


ev


The graph is pretty flat and overall there really hasn't been much change. I actually just wrote a blog post about sites that used to have EV: using my crawler data I can track sites that used to have EV certs and have switched away from them to either OV or DV certs. With such a massive flood of new sites coming to HTTPS and the proposed benefits of EV, I'd have thought we'd at least see a little more increase in the use of EV, but we really haven't.


CAA

Certificate Authority Authorisation is a new DNS record that all sites should be using, but most aren't. The last scan in Feb 2018 was the first time I tracked the use of CAA so this scan is the first time we can look at growth from one scan to the next. Whilst we have seen an increase in the use of CAA, it's not a huge increase...
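For anyone who hasn't seen one, a CAA record in zone-file form looks something like `example.com. IN CAA 0 issue "letsencrypt.org"`. The Python sketch below shows roughly how the `issue` property restricts which CA may issue. It is deliberately simplified and rests on assumptions: the function names are made up, the sample records are hypothetical, and it ignores `issuewild`, the critical flag, and the climb up the DNS tree that a real CA has to perform.

```python
import shlex


def parse_caa_rdata(rdata: str) -> tuple:
    """Parse CAA record data such as '0 issue "letsencrypt.org"'."""
    flags, tag, value = shlex.split(rdata)
    return int(flags), tag.lower(), value


def may_issue(rdatas: list, ca_domain: str) -> bool:
    """Simplified check: is ca_domain permitted by the 'issue' properties?"""
    issue_values = [v for _, tag, v in map(parse_caa_rdata, rdatas) if tag == "issue"]
    if not issue_values:
        # No issue property present means no restriction in this sketch.
        return True
    # Drop any parameters after ';'. A bare ';' therefore permits no CA.
    allowed = {v.split(";")[0].strip().lower() for v in issue_values}
    return ca_domain.lower() in allowed


# Hypothetical record set for illustration.
RECORDS = ['0 issue "letsencrypt.org"', '0 iodef "mailto:security@example.com"']

if __name__ == "__main__":
    print(may_issue(RECORDS, "letsencrypt.org"))
    print(may_issue(RECORDS, "other-ca.example"))
```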


caa


Some of the spikes on that graph looked a little odd so I checked them out, and it turns out they're not errors. There are literally huge collections of sites within those groups that deploy CAA on their domains, and the vast majority of them are sites with a .pl TLD. I can't explain it, but the data is definitely right. Take a look at the list of sites using CAA, but remember that it updates daily and sites can move around the list, so I took a screenshot to show the results on the day of the report.


caa-list


security.txt

A new metric to track in this report is the use of the security.txt file. If you've not heard of this file then you can learn more about it in my blog "Say hello to security.txt", but in short, it's a file where you can put security contact information for your site/company so that researchers can make responsible disclosures more easily. You can see my security.txt file right here.


Contact: security@scotthelme.co.uk
Contact: https://twitter.com/Scott_Helme
Encryption: https://scotthelme.co.uk/contact/

It's really simple and is intended to make the information easy to find in a reliable location. Even though it's a pretty new standard we have seen some good adoption but a lot of the large spikes here are down to Tumblr creating the files for sites they manage.
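Because the format is just "Field: value" lines, reading one programmatically is straightforward. Here is a minimal Python sketch that parses the file shown above; the function name `parse_security_txt` is made up for this example and it makes no attempt to validate fields against the specification.

```python
def parse_security_txt(text: str) -> dict:
    """Collect the values of each field in a security.txt file."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only, since values are often URLs.
        name, sep, value = line.partition(":")
        if not sep:
            continue
        fields.setdefault(name.strip().lower(), []).append(value.strip())
    return fields


SAMPLE = """\
Contact: security@scotthelme.co.uk
Contact: https://twitter.com/Scott_Helme
Encryption: https://scotthelme.co.uk/contact/
"""

if __name__ == "__main__":
    for field, values in parse_security_txt(SAMPLE).items():
        print(field, values)
```

Fields can repeat (there are two Contact lines above), which is why each field maps to a list.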


security-txt


Referrer Policy

Whilst I did track Referrer Policy in the Feb 2018 report, I didn't present any data on it as it was a fairly new addition at the time. It has since been added to Security Headers and counts towards your grade there and we are seeing people starting to use it.


rp


Interestingly this header doesn't show the same trend as other Security Headers and has a fairly consistent deployment across all of the top 1 million sites. It's a little strange to see such a break but perhaps this will change over time as the header becomes more widely deployed?


Feature Policy

Another new Security Header, Feature Policy is so new that whilst I've added support to the Security Headers scan, it doesn't count towards your grade yet. How new this header is can also be clearly seen in the crawl results.


fp


Those are some really low numbers but, in honesty, this is where I'd expect it to be but not where I'd like it to be. You can also see, even at such low numbers, the trend emerging that follows the other headers of higher usage in the higher ranked sites and a decrease as you go down the ranking. I wonder if that will be maintained as usage grows up to the Feb 2019 report, where I hope numbers will be a lot bigger!


General Stats

This is an overview of some of the crawl data but, of course, all of the crawler data is now much more readily available.


Total Rows: 940,129

Security Headers Grades:
A 16,670
A+ 1,200
B 10,422
C 26,769
D 119,262
E 17,497
F 748,239
R 70

Sites using strict-transport-security:
115,402

Sites using content-security-policy:
31,325

Sites using content-security-policy-report-only:
2,622

Sites using x-webkit-csp:
520

Sites using x-content-security-policy:
1,451

Sites using public-key-pins:
7,896

Sites using public-key-pins-report-only:
3,815

Sites using x-content-type-options:
145,475

Sites using x-frame-options:
135,748

Sites using x-xss-protection:
119,496

Sites using x-download-options:
16,601

Sites using x-permitted-cross-domain-policies:
15,904

Sites using access-control-allow-origin:
34,498

Sites using referrer-policy:
22,707

Sites using feature-policy:
416

Sites redirecting to HTTPS:
473,391

Sites using Let's Encrypt certificate:
159,635

Top 10 Server headers:
Apache 205,204
nginx 164,507
cloudflare 102,022
Microsoft-IIS/8.5 31,537
LiteSpeed 26,319
nginx/1.14.0 24,316
Microsoft-IIS/7.5 21,651
GSE 21,182
openresty 16,046
Microsoft-IIS/10.0 13,803

Top 10 TLDs:
.com 464,792
.org 43,552
.ru 39,460
.net 37,528
.de 30,971
.br 24,599
.ir 16,571
.uk 16,379
.pl 14,902
.in 10,959

Top 10 Certificate Issuers:
C = US, O = Let's Encrypt, CN = Let's Encrypt Authority X3 159,635
C = GB, ST = Greater Manchester, L = Salford, O = COMODO CA Limited, CN = COMODO RSA Domain Validation Secure Server CA 54,328
C = GB, ST = Greater Manchester, L = Salford, O = COMODO CA Limited, CN = COMODO ECC Domain Validation Secure Server CA 2 52,811
C = US, ST = Arizona, L = Scottsdale, O = "GoDaddy.com, Inc.", OU = http://certs.godaddy.com/repository/, CN = Go Daddy Secure Certificate Authority - G2 33,823
C = US, O = Amazon, OU = Server CA 1B, CN = Amazon 13,147
C = US, ST = TX, L = Houston, O = "cPanel, Inc.", CN = "cPanel, Inc. Certification Authority" 13,090
C = US, O = DigiCert Inc, OU = www.digicert.com, CN = DigiCert SHA2 High Assurance Server CA 12,108
C = US, O = DigiCert Inc, OU = www.digicert.com, CN = RapidSSL RSA CA 2018 10,576
C = US, O = DigiCert Inc, CN = DigiCert SHA2 Secure Server CA 10,352
C = BE, O = GlobalSign nv-sa, CN = AlphaSSL CA - SHA256 - G2 7,846

Top 10 Protocols:
TLSv1.2 281,313
TLSv1 4,065
TLSv1.1 89

Top 10 Cipher Suites:
ECDHE-RSA-AES256-GCM-SHA384 120,760
ECDHE-RSA-AES128-GCM-SHA256 99,106
ECDHE-ECDSA-AES128-GCM-SHA256 37,199
ECDHE-RSA-AES256-SHA384 11,313
DHE-RSA-AES256-GCM-SHA384 3,201
ECDHE-RSA-AES256-SHA 1,932
0 1,805
AES256-SHA 1,412
DHE-RSA-AES256-SHA 1,330
AES128-SHA 1,073

Top 10 PFS Key Exchange Params:
ECDH, P-256, 256 bits 263,502
ECDH, P-521, 521 bits 4,842
ECDH, P-384, 384 bits 4,051
DH, 1024 bits 3,619
DH, 2048 bits 1,285
DH, 4096 bits 125
ECDH, B-571, 570 bits 62
DH, 3072 bits 9
ECDH, brainpoolP512r1, 512 bits 7
ECDH, secp256k1, 256 bits 2

Top Key Sizes:
2048 bit 224,389
256 bit 37,691
4096 bit 20,986
384 bit 241
1024 bit 191
3072 bit 141
8192 bit 11
4056 bit 3
2432 bit 2
3096 bit 1

Sites using CAA:
8,314

I'd highly recommend checking out this folder on the crawler site!
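The raw counts above are easier to compare as percentages of the total rows. The short Python sketch below does that arithmetic for a few of them; every figure is copied from the list above, and only the helper name `percent` and the choice of which figures to show are mine.

```python
# Figures copied from the General Stats list above.
TOTAL_ROWS = 940_129
REDIRECTING_TO_HTTPS = 473_391
LETS_ENCRYPT_CERTS = 159_635
GRADES = {
    "A+": 1_200, "A": 16_670, "B": 10_422, "C": 26_769,
    "D": 119_262, "E": 17_497, "F": 748_239, "R": 70,
}


def percent(count: int, total: int = TOTAL_ROWS) -> float:
    """Return count as a percentage of total."""
    return 100 * count / total


if __name__ == "__main__":
    # The grade counts add up to the total number of rows.
    assert sum(GRADES.values()) == TOTAL_ROWS
    print(f"Redirecting to HTTPS: {percent(REDIRECTING_TO_HTTPS):.2f}%")
    print(f"Let's Encrypt certificate: {percent(LETS_ENCRYPT_CERTS):.2f}%")
    for grade, count in GRADES.items():
        print(f"Grade {grade}: {percent(count):.2f}%")
```

The HTTPS figure comes out just over 50%, which is the milestone celebrated earlier in the report.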


Other Observations

This is a gathering of other observations that I've made using the crawler data.


Public Keys

Whilst we're seeing an increase in the adoption of HTTPS, we're not seeing ECDSA keys grabbing as much of that new adoption as I'd like. It seems that RSA will remain the top choice for quite some time.


public-keys


key-type


Cipher Suites

There isn't much change in cipher suites or the trends in their use either. These graphs remain surprisingly similar despite the huge shifts in total HTTPS usage.


ciphers


Protocol Support

At first I was surprised to see a drop in the amount of support for TLSv1.2 despite a huge growth in HTTPS. How can that be possible? Then I realised that the crawler doesn't gather data when using TLSv1.3, and there's probably a good portion of sites using that now. I've made a note to upgrade the crawler and hopefully we should see some really good adoption stats for TLSv1.3 in the next crawl!
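The size of that blind spot can be roughly bounded from the General Stats figures. The Python sketch below compares the protocol counts with the number of sites redirecting to HTTPS. It assumes both figures describe the same population of sites, which the report doesn't confirm, and the gap will also include sites where no protocol was recorded for other reasons, so treat it as an upper bound rather than a TLSv1.3 count.

```python
# Figures copied from the General Stats section of this report.
PROTOCOL_COUNTS = {"TLSv1.2": 281_313, "TLSv1": 4_065, "TLSv1.1": 89}
REDIRECTING_TO_HTTPS = 473_391

# Sites for which the crawler recorded a negotiated protocol.
recorded = sum(PROTOCOL_COUNTS.values())

# Sites redirecting to HTTPS with no protocol recorded. This is an upper
# bound on TLSv1.3 usage, not a measurement of it.
unrecorded = REDIRECTING_TO_HTTPS - recorded

if __name__ == "__main__":
    print(f"Protocol recorded: {recorded:,}")
    print(f"No protocol recorded: {unrecorded:,}")
```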


protocols


Raw Data

If you want to see the Google Sheet that I produced all of these graphs from, which contains the raw scan data too, you can get that here.

The raw data is all available through https://crawler.ninja and you really should check out the text files for all of the stats too. They contain a lot of interesting data and you can even get them in JSON format if you want to do something more than look through them. These JSON files are what powers https://whynohttps.com/ to provide the list of HTTP sites!