It's time for the 5th instalment of my Alexa Top 1 Million scan and this time around there's another new metric in the data.
I've done 4 previous crawls: Aug 2015, Feb 2016, Aug 2016 and Feb 2017. I'm also publishing my daily crawl data, which is available here for further analysis by the community. Let's dig into the latest data!
To start off with the good news, things are continuing to get better!
One of the biggest changes since the last scan has to be the enormous jump in the number of sites deploying HPKP. This is rather interesting for many reasons, not least because just last week I announced that I'm giving up on HPKP...
"The number of sites in the Alexa Top 1 Million deploying HPKP recently jumped from 187 to 6,616! A 3,438% increase!" (Scott Helme, @Scott_Helme, May 29, 2017)
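The percentage in that tweet is easy to sanity check. This is just a minimal sketch of the arithmetic; the helper name `percent_increase` is my own and not part of any crawler tooling.

```python
def percent_increase(before: int, after: int) -> float:
    """Relative growth from `before` to `after`, expressed as a percentage."""
    return (after - before) / before * 100


# Figures quoted in the tweet: 187 sites before, 6,616 sites after.
hpkp_jump = percent_increase(187, 6616)
print(f"HPKP increase: {hpkp_jump:.0f}%")
```

The same helper works for any pair of before/after counts elsewhere in this post.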
The increase in HPKP is almost entirely caused by Tumblr deploying HPKP across their entire catalogue of sites. Whilst the number of their sites in the top 1 million has changed since I first noticed this, there's still a huge jump of over 3,000 sites.
Another big win in the scan this time around is the continued growth of the deployment of HTTPS. We're really seeing a continuation of the awesome progress being made here and this was confirmed by Adrienne Porter Felt and April King recently in their talk 'Measuring HTTPS Adoption on the Web' at USENIX (slides).
Each of these scans shows huge progress in how fast HTTPS is being deployed across the web. What's really important is that adoption isn't just continuing, it's accelerating. I've noticed this increase in the rate of adoption in previous scans and I'm really excited to see it again.
One of the original purposes of my scans was to determine the adoption of various HTTP security headers and I'm still tracking good progress in that area too. We've seen increases in usage across the board and some of them are quite significant. I'd like to think that securityheaders.io is at least helping to drive adoption and education about these headers.
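To give a rough idea of what tracking these headers involves, here's a minimal sketch that checks which security headers appear in a set of response headers. It is an illustration only and not the code my crawler runs: the `TRACKED_HEADERS` list and the `security_headers_present` function are hypothetical names, and the header names are simply those that appear in the overview later in this post.

```python
from typing import List, Mapping

# Header names taken from the stats overview in this post (lowercase).
TRACKED_HEADERS = [
    "strict-transport-security",
    "content-security-policy",
    "content-security-policy-report-only",
    "public-key-pins",
    "public-key-pins-report-only",
    "x-content-type-options",
    "x-frame-options",
    "x-xss-protection",
    "referrer-policy",
]


def security_headers_present(response_headers: Mapping[str, str]) -> List[str]:
    """Return the tracked headers found in a response, ignoring case."""
    seen = {name.lower() for name in response_headers}
    return [header for header in TRACKED_HEADERS if header in seen]


# Hypothetical response headers, for illustration only.
example = {
    "Server": "nginx",
    "Strict-Transport-Security": "max-age=31536000",
    "X-Frame-Options": "DENY",
}
print(security_headers_present(example))
```

HTTP header names are case-insensitive, which is why everything is lowercased before comparing.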
What's really odd is that the trend for X-XSS-Protection (XXSSP) and X-Content-Type-Options (XCTO) is still there! The presence of all other headers decreases as you go down the ranking, except these two, and to this day there isn't a solid explanation for this. As I mentioned above, the raw data from my daily scans is available, so please do dig into it and see if you can identify why this trend exists.
I've now been tracking the adoption of Let's Encrypt over 18 months and they too have seen some great progress in their adoption in the Alexa Top 1 Million.
The low usage in the very top ranked sites is still present but across the rest of the ranking they've seen significant growth. My guess is that sites right near the top probably have established commercial agreements with a CA but we may see them shifting over time, albeit more slowly.
After a recent debate on Twitter between various parties about the use of EV certificates, I decided to add tracking to my crawler for the type of certificates used by sites in the top 1 million. It's interesting that the use of EV certs follows the same trend line as most of the other metrics that I track.
As you can see, the usage of EV certs is much higher among the top ranked sites and tails off in much the same way that most other metrics do. I'm sure there will be various arguments for why this is the case but my guess is that sites near the top have a higher budget so the cost of EV is less significant to them and worth a shot for any potential benefits.
You should check over the raw data that I make available if you want to dig into specifics but this is a nice overview of a few of the stats that my crawlers now collect.
Total Rows: 890204
Security Headers Grades:
Sites using strict-transport-security: 65244
Sites using content-security-policy: 17437
Sites using content-security-policy-report-only: 1297
Sites using x-webkit-csp: 439
Sites using x-content-security-policy: 1154
Sites using public-key-pins: 3508
Sites using public-key-pins-report-only: 99
Sites using x-content-type-options: 104099
Sites using x-frame-options: 110391
Sites using x-xss-protection: 82551
Sites using x-download-options: 9696
Sites using x-permitted-cross-domain-policies: 9390
Sites using access-control-allow-origin: 29601
Sites using referrer-policy: 1615
Sites redirecting to HTTPS: 273837
Sites using Let's Encrypt certificate: 63843
Top 10 Server headers:
Apache/2.4.7 (Ubuntu) 11094
Apache/2.2.15 (CentOS) 10985
Top 10 TLDs:
Top 10 Certificate Issuers:
Let's Encrypt Authority X3 63842
COMODO RSA Domain Validation Secure Server CA 37827
COMODO ECC Domain Validation Secure Server CA 2 30170
Go Daddy Secure Certificate Authority - G2 22479
RapidSSL SHA256 CA 12438
DigiCert SHA2 High Assurance Server CA 6191
GeoTrust SSL CA - G3 5812
AlphaSSL CA - SHA256 - G2 5550
Symantec Class 3 Secure Server CA - G4 4849
Top 10 Protocols:
Top 10 Cipher Suites:
Top Key Sizes:
RSA 2048 bit 212830
ECDSA 256 bit 32070
RSA 4096 bit 16942
RSA 1024 bit 293
RSA 3072 bit 142
ECDSA 384 bit 81
RSA 8192 bit 6
RSA 4056 bit 3
RSA 3248 bit 3
RSA 2058 bit 2
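The header counts in the overview above are easier to compare as a share of the total. Here's a minimal sketch that parses a few of the "label: count" lines and divides by Total Rows. It assumes Total Rows is the number of sites in this crawl, and the `parse_counts` and `shares` helpers are names I've made up for the example.

```python
# A few lines copied from the overview above.
STATS = """\
Total Rows: 890204
Sites using strict-transport-security: 65244
Sites using content-security-policy: 17437
Sites using x-frame-options: 110391
Sites redirecting to HTTPS: 273837
"""


def parse_counts(text: str) -> dict:
    """Turn 'label: 123' lines into a {label: 123} dictionary."""
    counts = {}
    for line in text.splitlines():
        label, sep, value = line.rpartition(": ")
        if sep and value.strip().isdigit():
            counts[label] = int(value)
    return counts


def shares(counts: dict) -> dict:
    """Express every count as a fraction of 'Total Rows' (assumed denominator)."""
    total = counts["Total Rows"]
    return {label: n / total for label, n in counts.items() if label != "Total Rows"}


counts = parse_counts(STATS)
for label, share in shares(counts).items():
    print(f"{label}: {share:.1%}")
```

Lines that don't end in a number, such as the section headings in the overview, are skipped by the parser.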
There are a few other nice things that I've noticed whilst looking over the data here that I think are worth pointing out.
As I mentioned above Let's Encrypt have seen tremendous growth in the top 1 million sites, but they're actually really close to becoming the biggest issuing CA! In the Feb 2017 scan Comodo had 46,466 certificates issued and Let's Encrypt had 31,030. Now in the Aug 2017 scan Comodo has 67,977 and Let's Encrypt has 63,842. Given the rate at which Let's Encrypt are closing that gap they will very soon become the largest issuing CA in the Alexa Top 1 Million!
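To put some numbers on how quickly that gap is closing, here's a small sketch using only the four figures quoted above. The variable names are mine and this is plain arithmetic, not a forecast.

```python
# Certificate counts quoted in this post for the two scans.
feb_2017 = {"Comodo": 46466, "Let's Encrypt": 31030}
aug_2017 = {"Comodo": 67977, "Let's Encrypt": 63842}


def growth_percent(before: int, after: int) -> float:
    """Growth between two scans as a percentage of the earlier count."""
    return (after - before) / before * 100


# How far Let's Encrypt trailed Comodo in each scan.
gap_feb = feb_2017["Comodo"] - feb_2017["Let's Encrypt"]
gap_aug = aug_2017["Comodo"] - aug_2017["Let's Encrypt"]

for ca in feb_2017:
    change = growth_percent(feb_2017[ca], aug_2017[ca])
    print(f"{ca}: {feb_2017[ca]} -> {aug_2017[ca]} ({change:.1f}% growth)")
print(f"Gap: {gap_feb} -> {gap_aug}")
```

Both CAs grew between the scans, but Let's Encrypt grew at more than twice the rate, which is why the gap shrinks.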
Another awesome development is the increase in the use of ECDSA keys in certificates instead of RSA. The Feb 2017 scans saw RSA 2048 bit keys number 146,817 whilst ECDSA 256 bit keys were 20,046. Looking at the data from Aug 2017 we can see that RSA 2048 bit keys are 212,830 with ECDSA 256 bit keys at 32,070. To put that another way, in Feb 2017 there were 13.7 ECDSA 256 bit keys for every 100 RSA 2048 bit keys, and by Aug 2017 that had increased to 15.1.
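The two percentages above work out as the number of ECDSA 256 bit keys relative to the number of RSA 2048 bit keys. Here's a quick sketch of that arithmetic, along with the ECDSA share of the two key types combined for comparison. The function names are just for illustration.

```python
# Key counts quoted in this post: (RSA 2048 bit, ECDSA 256 bit).
FEB_2017 = (146817, 20046)
AUG_2017 = (212830, 32070)


def ecdsa_per_100_rsa(rsa: int, ecdsa: int) -> float:
    """ECDSA 256 bit keys for every 100 RSA 2048 bit keys."""
    return ecdsa / rsa * 100


def ecdsa_share(rsa: int, ecdsa: int) -> float:
    """ECDSA 256 bit keys as a percentage of both key types combined."""
    return ecdsa / (rsa + ecdsa) * 100


for name, (rsa, ecdsa) in (("Feb 2017", FEB_2017), ("Aug 2017", AUG_2017)):
    ratio = ecdsa_per_100_rsa(rsa, ecdsa)
    share = ecdsa_share(rsa, ecdsa)
    print(f"{name}: {ratio:.1f} per 100 RSA 2048, {share:.1f}% of the two combined")
```

Either way you measure it, ECDSA gained ground between the two scans.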
The protocol support also surprised me a little with some of the changes there. As expected we've seen a huge jump in the number of sites using TLSv1.2 from 171,723 to 253,949. Again as expected we've seen a decrease in the use of TLSv1.1 from 208 sites to 177 sites. What did surprise me was that we've seen an increase in the number of sites using TLSv1.0 from 7,945 to 8,266. Remember, these are sites that can't negotiate a higher protocol version with me and there's really no reason that we shouldn't be seeing 100% TLSv1.2 support in the top 1 million.
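If you want to see which protocol version a particular site negotiates, a rough sketch with Python's standard `ssl` module looks like this. It is not how my crawler works, the function name `negotiated_protocol` is made up for the example, and be aware that recent Python and OpenSSL builds may refuse TLSv1.0 and TLSv1.1 by default, in which case the handshake fails instead of reporting the old version.

```python
import socket
import ssl
from typing import Optional


def negotiated_protocol(host: str, port: int = 443, timeout: float = 5.0) -> Optional[str]:
    """Connect to host:port and return the negotiated protocol, e.g. 'TLSv1.2'."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            # version() reports what the client and server actually agreed on.
            return tls.version()


# Example usage (needs network access, so it is left commented out):
# print(negotiated_protocol("example.com"))
```

The client offers the highest version it supports, so a site that ends up on TLSv1.0 is one that couldn't agree on anything newer.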
The last few things to quickly note are that Nginx is closing the gap on Apache as the most popular server choice, Cloudflare have seen a significant increase in their presence and this is the first report where no sites in the top 1 million negotiated SSLv3 with my crawler!
As always the raw data from my scans is available here.
By making the raw data available I'm hoping that others will be able to conduct further analysis or use it to further their own research. I dump the data from the crawlers every day so there's a lot to go at!
You can also view the Google Sheet with all of the data and graphs I've used throughout this article and all of my previous articles here.
I will be sending a few tweets with other bits of information that I've found and embedding them below.