Alexa Top 1 Million Analysis - February 2018

It's that time of year again! I'm really excited to publish the 6th installment of my Alexa Top 1 Million analysis so we can take a look over our progress on securing the web over the last 6 months.


Previous Crawls

It's hard to believe there are now 5 previous crawls available for comparison purposes!


August 2015
February 2016
August 2016
February 2017
August 2017


As I publish more of these reports we start to get a much clearer picture of the progress we're making. If you're interested in doing your own analysis not only do I have the links above but I also publish the data from my crawlers on a daily basis. If you want to get hands on with a large set of data I'd love to see what further analysis you can do.


February 2018

The first report of 2018 and it's looking like a good one. As is tradition, let's start with a quick summary and get a look at what kind of things we have in store.


feb-2018-results


Similar to the Aug 2017 report, when we saw a huge jump in the number of sites using HPKP, we've seen a continued rise in the use of HPKP and a huge jump in the number of sites using HPKP-RO too. I used to be a big supporter of HPKP, and I even have guidance on how to set it up, but I recently gave up on HPKP and Chrome announced they may deprecate it. This does make it interesting to see continued and strong growth in its usage, and it also makes a trend pretty clear: the larger sites are less likely to use HPKP. This is the reverse of the trend for every other metric.


pkp-feb-2018


One of the things I'm always eager to see in these reports is the adoption of HTTPS and whether we're still continuing to encrypt the web at an impressive rate. I'm really glad to say that we are continuing to make outstanding progress on that front!


https-feb-2018


The line does look a little less smooth in this scan, and checking the daily scans this does seem to have been a trend developing over the last few weeks, but either way, we have seen a 32.2% increase in the number of sites redirecting to and enforcing HTTPS in the Alexa Top 1 Million!
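To make "redirecting to HTTPS" concrete, here is a minimal sketch of one way to classify a site from the chain of URLs a request passes through. The function name and the criterion (start on `http`, end on `https`) are assumptions for illustration and are not necessarily the check the crawler applies.

```python
from urllib.parse import urlsplit


def redirects_to_https(redirect_chain):
    """Return True if a request that began on plain HTTP finished on HTTPS.

    redirect_chain is the ordered list of URLs visited, starting with the
    initial http:// URL. Assumed criterion, not the crawler's own logic.
    """
    if not redirect_chain:
        return False
    first = urlsplit(redirect_chain[0]).scheme.lower()
    last = urlsplit(redirect_chain[-1]).scheme.lower()
    return first == "http" and last == "https"


# Illustrative chains using a placeholder domain.
print(redirects_to_https(["http://example.com/", "https://example.com/"]))
print(redirects_to_https(["http://example.com/", "http://www.example.com/"]))
```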


https-redirect-percent-feb-2018


One thing I am sad to say though is that something I predicted back in 2017 and have talked about a few times on Twitter has come to pass. In previous reports the rate at which we were migrating to HTTPS was not only being maintained but was actually increasing, as you can see in the graph. This, of course, could not be maintained forever. Whilst we are still seeing tremendous growth, and I'm massively excited about that and proud to be a part of it, the graph is starting to show signs of a plateau. From Aug 2017 to Feb 2018 the rate of progress has slowed. We're still going in the right direction, and no doubt will continue to do so, but the Aug 2018 and Feb 2019 reports may show much smaller steps forward.


Security Headers

We can't forget the original reason that this whole report started: the use of Security Headers. Powered by the scanning and analysis engine on securityheaders.io, here is the usage of headers and the Security Headers grading in the Alexa Top 1 Million sites.
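As a rough illustration of what counting header usage involves, the sketch below lists which of a handful of the tracked headers are absent from a response. The header names come from the statistics in this report. The function name and the choice of six headers are assumptions, and this is not the securityheaders.io grading logic.

```python
# A subset of the header names that appear in this report's statistics.
TRACKED_HEADERS = [
    "strict-transport-security",
    "content-security-policy",
    "x-content-type-options",
    "x-frame-options",
    "x-xss-protection",
    "referrer-policy",
]


def missing_security_headers(response_headers):
    """Return the tracked headers that are not present in the response.

    response_headers is a dict of header name to value. Header names are
    compared case-insensitively, as HTTP header names are not case-sensitive.
    """
    present = {name.lower() for name in response_headers}
    return [header for header in TRACKED_HEADERS if header not in present]


# Hypothetical response headers for illustration.
example = {"Strict-Transport-Security": "max-age=31536000", "X-Frame-Options": "DENY"}
print(missing_security_headers(example))
```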


headers-feb-2018


grades-feb-2018


We're still seeing the same interesting trends that have been present in all previous scans and another one has emerged. Right down towards the bottom of the ranking there is a clear group of sites with a noticeably higher grading. Perhaps an opportunity for someone to grab the data and take a look at why. It could be a large hosting provider or platform doing something new by default, or maybe just an anomaly. Let me know if you figure it out!


Let's Encrypt

It's now 2 years since I started tracking the use of Let's Encrypt certificates in these reports and I'm pretty sure that no one here needs me to tell them what's coming.


lets-encrypt-feb-2018


Let's Encrypt have continued to see strong growth in their presence in the top 1 million sites on the web. Removing cost and technical barriers really does help increase adoption and this is the proof. Back in Aug 2017 Let's Encrypt were close to becoming the largest issuing CA in the top 1 million sites and they did it by Oct 2017, just 2 months later.




EV Certificates

In the Aug 2017 scan I introduced a check for EV certificate usage in the Alexa Top 1 Million and I've left the logic in place to continue to monitor the usage of EV certs. I guess one important thing to point out here is that there has been only one change in the methodology, which allows me to identify more EV certificates than I did previously. Anyone that's tried to do something like this will tell you that identifying EV certs isn't exactly easy!


ev-feb-18


We're still seeing the same considerably higher adoption at the top end of the ranking but the really interesting thing here is that overall there's very little growth in the use of EV certificates. In Aug 2017 I detected 17,877 sites using an EV certificate but I ran the new logic against my old data (I keep all scan data for historic scans) and identified a new total of 18,552 sites using EV certificates. In the new Feb 2018 scan that number has only increased to 19,803 sites. Whilst HTTPS has seen an increase in adoption of 32.30% compared to the last scan, the number of sites using EV certificates only grew by 6.74%.
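The 6.74% figure is the growth in EV certificate usage between the two scans, measured against the re-scanned Aug 2017 total. A quick check of the arithmetic (the helper name is just for illustration):

```python
def percent_change(old, new):
    """Percentage change from old to new."""
    return (new - old) / old * 100


ev_aug_2017 = 18552  # Aug 2017 data re-scanned with the updated detection logic
ev_feb_2018 = 19803  # Feb 2018 scan

growth = percent_change(ev_aug_2017, ev_feb_2018)
print(f"EV growth: {growth:.2f}%")  # prints "EV growth: 6.74%"
```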


Certificate Authority Authorisation

CAA is a brand new DNS record that sites can set to control which CAs they authorise to issue certificates for their domain. I have a great introduction blog on CAA if you want more information, but the good news is that it's now one extra metric that I'm tracking in the daily crawl! I did a brief intro post about CAA usage back in December when I first added the metric and this is the first time it will be included in a report.
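For anyone who hasn't seen one, the data in a CAA record is a flags value, a tag and a quoted value, for example `0 issue "letsencrypt.org"`. The sketch below parses that presentation format and pulls out the CAs named in `issue` properties. It is a simplified illustration with hypothetical function names and example records, it ignores any parameters after the CA domain, and it is not how the crawler decides whether a policy is valid.

```python
import shlex


def parse_caa_rdata(rdata):
    """Split CAA record data such as '0 issue "letsencrypt.org"' into parts."""
    flags, tag, value = shlex.split(rdata)
    return {"flags": int(flags), "tag": tag.lower(), "value": value}


def authorised_issuers(records):
    """Return the values of all 'issue' properties, skipping the empty ';' value
    that means no CA is authorised."""
    parsed = [parse_caa_rdata(record) for record in records]
    return [p["value"] for p in parsed if p["tag"] == "issue" and p["value"] != ";"]


# Illustrative records for a hypothetical domain.
records = ['0 issue "letsencrypt.org"', '0 iodef "mailto:security@example.com"']
print(authorised_issuers(records))
```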


caa-feb-2018


As is common in these results now, we're seeing comparatively huge adoption in the sites higher up the ranking with a quick decline followed by a much steadier decrease. I found a total of 4,064 sites with a valid CAA policy set compared to 3,404 in the first scan in Dec 2017, an increase of 19.39% in roughly 2 months. Let's hope the Aug 2018 scan shows a continued healthy increase in adoption.


General Stats

The raw crawler data is available but I also like to publish a selection of statistics from the data:


Total Rows: 946719 


Security Headers Grades:
A+	763
A	15258
B	18954
C	26957
D	146633
E	29691
F	708385
R	78 


Sites using strict-transport-security: 94116 
Sites using content-security-policy: 24044 
Sites using content-security-policy-report-only: 4595 
Sites using x-webkit-csp: 455 
Sites using x-content-security-policy: 1235 
Sites using public-key-pins: 6889 
Sites using public-key-pins-report-only: 2709 
Sites using x-content-type-options: 132085 
Sites using x-frame-options: 124835 
Sites using x-xss-protection: 105956 
Sites using x-download-options: 12021 
Sites using x-permitted-cross-domain-policies: 11593 
Sites using access-control-allow-origin: 32294 
Sites using referrer-policy: 3990 

Sites redirecting to HTTPS: 372125 
Sites using Let's Encrypt certificate: 108146 

Top 10 Server headers:
Apache             221564
nginx              160874
cloudflare         92251
Microsoft-IIS/8.5  35599
nginx/1.12.2       29258
Microsoft-IIS/7.5  24947
LiteSpeed          23226
GSE                23041
openresty          14749
Apache/2           12885

Top 10 TLDs:
.com  443948
.org  45933
.ru   40995
.net  38964
.de   38756
.br   27815
.uk   22215
.pl   17704
.it   14246
.ir   13841 

Top 10 Certificate Issuers:
C = US, O = Let's Encrypt, CN = Let's Encrypt Authority X3 108146
C = GB, ST = Greater Manchester, L = Salford, O = COMODO CA Limited, CN = COMODO RSA Domain Validation Secure Server CA 46220
C = GB, ST = Greater Manchester, L = Salford, O = COMODO CA Limited, CN = COMODO ECC Domain Validation Secure Server CA 2 38537
C = US, ST = Arizona, L = Scottsdale, O = "GoDaddy.com, Inc.", OU = http://certs.godaddy.com/repository/, CN = Go Daddy Secure Certificate Authority - G2 29436
C = US, O = GeoTrust Inc., CN = RapidSSL SHA256 CA 10741
C = US, O = DigiCert Inc, OU = www.digicert.com, CN = DigiCert SHA2 High Assurance Server CA 10662
C = US, O = Amazon, OU = Server CA 1B, CN = Amazon 9380
C = US, ST = TX, L = Houston, O = "cPanel, Inc.", CN = "cPanel, Inc. Certification Authority" 8489
C = BE, O = GlobalSign nv-sa, CN = AlphaSSL CA - SHA256 - G2 6580
C = US, O = DigiCert Inc, CN = DigiCert SHA2 Secure Server CA 6441

Top 10 Protocols:
TLSv1.2  350451
TLSv1    7309
TLSv1.1  165

Top 10 Cipher Suites:
ECDHE-RSA-AES256-GCM-SHA384    147985
ECDHE-RSA-AES128-GCM-SHA256    127964
ECDHE-ECDSA-AES128-GCM-SHA256  41043
ECDHE-RSA-AES256-SHA384        15400
DHE-RSA-AES256-GCM-SHA384      4326
ECDHE-RSA-AES256-SHA           3231
DHE-RSA-AES256-SHA             2484
0000                           2194
AES256-SHA                     2113
AES128-SHA                     1855 


Top 10 PFS Key Exchange Params:
ECDH, P-256, 256 bits            325059
ECDH, P-384, 384 bits            6822
ECDH, P-521, 521 bits            6267
DH, 1024 bits                    6208
DH, 2048 bits                    1275
ECDH, B-571, 570 bits            103
ECDH, brainpoolP512r1, 512 bits  18
DH, 4096 bits                    5
DH, 3072 bits                    2
DH, 768 bits                     1 

Top Key Sizes:
2048 bit  289141
256 bit   41402
4096 bit  24527
1024 bit  315
3072 bit  231
384 bit   87
8192 bit  7
2432 bit  4
2049 bit  3
512 bit   2 

Sites using CAA: 4186 
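If you want a feel for the proportions behind the raw counts, a short sketch like the one below turns the Security Headers grade counts listed above into shares of the total. The numbers are copied from the stats; the variable names are just for illustration.

```python
# Grade counts copied from the Security Headers Grades list above.
grades = {
    "A+": 763,
    "A": 15258,
    "B": 18954,
    "C": 26957,
    "D": 146633,
    "E": 29691,
    "F": 708385,
    "R": 78,
}

total = sum(grades.values())  # matches the Total Rows figure of 946719
shares = {grade: count / total for grade, count in grades.items()}

for grade, count in grades.items():
    print(f"{grade:<2} {count:>7} {shares[grade]:7.2%}")
```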

Other Observations

Looking over the data myself there are some other interesting observations that can be made.


Public Keys

We've seen a huge jump in the number of 2,048 bit RSA keys, as you'd expect from a jump in the adoption of HTTPS, but we're also seeing the use of 256 bit ECDSA keys increasing too, up from 32,070 in Aug 2017 to 41,402 in Feb 2018. The majority of the increase in HTTPS was taken up by RSA though.


public-keys-feb-2018


Not only that but the use of 3,072 bit and 4,096 bit RSA keys has also risen quite sharply. 3,072 bit went from 142 to 231 and 4,096 bit went from 16,942 to 24,527. Those are some pretty sizeable keys and there are a lot of sites using them, which does come as a little bit of a surprise.


public-key-sizes-all-feb-2018


Cipher Suites

Given the constant drive towards performance on the web, the public key usage above was fairly interesting and so too is the use of cipher suites. The top cipher suite remains ECDHE-RSA-AES256-GCM-SHA384, rising from 113,309 sites in Aug 2017 to 147,985 sites in Feb 2018. I would have expected that ECDHE-RSA-AES128-GCM-SHA256 would be the most popular suite but that ranked second in both scans with 79,256 sites in Aug 2017 and 127,964 in Feb 2018.


top-3-ciphers-feb-2018


From the graph I guess we can say that the very top sites in the ranking have the highest amount of support for ECDHE-RSA-AES128-GCM-SHA256 which is the faster of the two RSA suites.
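Working from the site counts quoted above for the two RSA suites, a quick calculation suggests the second-placed AES128 suite grew roughly twice as fast as the AES256 suite between the two scans, even though it still ranks second. A minimal sketch of that comparison:

```python
# (Aug 2017, Feb 2018) site counts quoted in the text above.
suites = {
    "ECDHE-RSA-AES256-GCM-SHA384": (113309, 147985),
    "ECDHE-RSA-AES128-GCM-SHA256": (79256, 127964),
}

# Percentage growth between the two scans for each suite.
growth = {
    name: (feb - aug) / aug * 100
    for name, (aug, feb) in suites.items()
}

for name, pct in sorted(growth.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {pct:.1f}% growth")
```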


Protocol Support

With the pending removal of TLSv1.0 support in PCI DSS coming in June, protocol support will be another interesting thing to keep an eye on. GitHub also did an experiment recently where they disabled TLSv1.0 and TLSv1.1 support on github.com and other services to see what would break. The good news is that protocol support does look pretty good.


protocol-support-feb-2018


To put that another way.


tls-pie-feb-2018


Protocol support looks pretty good in the top 1 million. We have the vast majority on TLSv1.2, a tiny slice on TLSv1.0 and an even tinier slice on TLSv1.1 after that. Once sites do remove TLSv1.0 they may as well remove TLSv1.1 at the same time and just have TLSv1.2 unless TLSv1.3 is here by then.


Servers

The top 4 servers in use haven't changed and in order are still Apache, nginx, cloudflare and Microsoft-IIS/8.5. Cloudflare have changed their header from cloudflare-nginx to cloudflare and also saw a small loss in the number of sites returning their header but remain 3rd in the ranking. As the 3rd most popular server on the planet I'd imagine removing those 6 bytes from the Server header has actually added up to a fairly significant amount of data over the last few weeks/months!


Report URI

Another cool thing that I wanted to look at was how many sites are using Report URI in the Alexa Top 1 Million.


report-uri-feb-2018


As of right now that graph is showing 413 sites, which is somewhat short of the real total for two main reasons. One, some of the larger sites that report with us downsample their reports by only injecting the report-uri directive into a subset of responses. Two, not all sites configure reporting via the HTTP response header. It is also possible to enable reporting using Report URI JS and my crawler doesn't analyse the body of the page, so it'd miss those too. As with all of the other trends we have a much larger presence in the higher ranked sites and a steady trend once you get out of the top few thousand.
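Detecting reporting from the response header comes down to finding the report-uri directive in a Content-Security-Policy value. Here is a minimal sketch of that step, with a hypothetical function name and a placeholder endpoint. It only looks at a single header value, so like the crawler it would miss sites that downsample or that enable reporting from the page body.

```python
def csp_report_uris(csp_header):
    """Return the URIs listed in the report-uri directive of a CSP header value.

    Directives are separated by ';' and each directive is a name followed by
    whitespace-separated values. Returns an empty list if no report-uri
    directive is present.
    """
    for directive in csp_header.split(";"):
        parts = directive.split()
        if parts and parts[0].lower() == "report-uri":
            return parts[1:]
    return []


# Hypothetical policy with a placeholder reporting endpoint.
policy = "default-src 'self'; report-uri https://reports.example.com/csp"
print(csp_report_uris(policy))
```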


Raw Data

As always, details on how to get hold of the raw data can be found here and I'd love to see any further analysis that other members of the community could contribute!


Details on raw data here.
Raw data download links here.
Raw stats from Feb 2018 here.
Google sheet with tables and graphs here.