With my recent interest in security-based HTTP headers like CSP and HPKP, following the launch of my new service report-uri.io, I found myself wondering just how many people out there are actually taking advantage of the huge security gains on offer from such simple features.
What are HTTP response headers?
When your browser communicates with a site to request a page, the browser and the server at the other end usually exchange a lot more information than just the request for the page and the response containing the page. The browser and server exchange this information using HTTP headers, which are just a list of key:value pairs that contain information. When the browser makes the request, it sends request headers along with it to pass information to the server.
These headers tell the server various pieces of information about the browser and help it better handle our request. You can see a list of languages that the browser will accept and the cookie we are using. The dnt header stands for Do Not Track and indicates that we don't want the site to track us for things like targeted advertising, while the referrer details where we came from to get to this page, and so on. They are just pieces of information that could be useful to the server in dealing with our request. When the server issues a response back, it can also include additional information for the browser in the form of response headers.
There are countless headers that a server could issue in the response, but there is a set of defined headers that allow the server to improve security. You can see some of these headers in the screenshot above and there is a full list of these headers in my blog on Hardening your HTTP response headers. If you'd like to read more on them then that is a great place to start, and it links out to more detailed information on each one and how to implement them. For this blog, I'm going to focus on looking at just how many sites out there are actually taking advantage of these really easy security gains.
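To make the key:value idea concrete, here is a minimal Python sketch that parses a raw block of response headers into a dictionary. The function name parse_headers and the sample response are hypothetical and only for illustration; the header values shown are typical examples, not output from any particular site.

```python
def parse_headers(raw):
    """Parse raw 'Name: value' lines into a dict keyed by lower-cased name."""
    headers = {}
    for line in raw.splitlines():
        # Split on the first colon only, so values may contain colons.
        name, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or a line with no colon, such as the status line
        headers[name.strip().lower()] = value.strip()
    return headers


# Hypothetical response, made up for this example.
example_response = """\
HTTP/1.1 200 OK
Content-Type: text/html; charset=utf-8
Strict-Transport-Security: max-age=31536000; includeSubDomains
X-Frame-Options: SAMEORIGIN
X-Content-Type-Options: nosniff
X-XSS-Protection: 1; mode=block
"""

response_headers = parse_headers(example_response)
for name, value in response_headers.items():
    print(f"{name} = {value}")
```

Header names are case-insensitive, which is why the sketch lower-cases them before storing.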
Who is using them?
Considering some of the huge improvements to security that we can gain from issuing a simple HTTP response header, I thought I'd do some research and see just how many sites out there are making use of them. To do this I grabbed a copy of the Alexa Top 1 Million list which details the most visited sites on the Internet. These guys are literally the biggest and best out there, so we should be getting a good feel for just how widely used these security based response headers are, right? Well, let's take a look and see.
Crawling the Alexa Top 1 Million
Having never written anything like a web crawler before I thought I'd give it a shot myself and see just how easy or hard it would be. I threw some code together in PHP (yes, terrible I know!) and after a few tweaks and improvements, I was able to analyse the response headers of the Alexa Top 1 Million in a little under 3 hours. I will do a follow up blog with the technical details and code that I used to do this but for now, I'm going to focus on the results of the scan.
What am I looking for?
Below is a list of the specific headers that I looked for in the scan.
Alongside these headers I also recorded which sites redirected me from HTTP to HTTPS. The Alexa Top 1 Million list simply provides the domain name, like google.com or facebook.com, with no scheme. The crawler defaults to http:// (domain name) and follows redirects until they are complete.
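The redirect check described above could be sketched roughly as follows. This is not the PHP code used for the crawl; it is a minimal Python illustration in which follow_redirects, the fetch callback and the example.test domain are all assumptions. The fetch function is injected so the logic can be tried without touching the network.

```python
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}


def follow_redirects(domain, fetch, max_hops=10):
    """Start at http://<domain> and follow Location headers to the end.

    `fetch(url)` is assumed to return (status_code, headers) where headers
    is a dict with lower-cased names. Returns a tuple of
    (final_url, final_headers, redirected_to_https).
    """
    url = "http://" + domain  # the list has no scheme, so default to HTTP
    for _ in range(max_hops + 1):
        status, headers = fetch(url)
        location = headers.get("location")
        if status not in REDIRECT_CODES or not location:
            return url, headers, urlsplit(url).scheme == "https"
        url = urljoin(url, location)  # Location may be relative
    raise RuntimeError("too many redirects for " + domain)


# Hypothetical canned responses standing in for a real HTTP client.
FAKE_RESPONSES = {
    "http://example.test": (301, {"location": "https://example.test/"}),
    "https://example.test/": (200, {"strict-transport-security": "max-age=31536000"}),
}


def fake_fetch(url):
    return FAKE_RESPONSES[url]


final_url, final_headers, went_https = follow_redirects("example.test", fake_fetch)
print(final_url, went_https)
```

In a real crawler the fetch callback would wrap an HTTP client with automatic redirects turned off, so that each hop can be inspected.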
Well, after much verification to make sure what I was seeing was actually correct, the results are a little shocking actually. I wasn't expecting usage to be at particularly high levels, but levels this low really did come as a surprise. These graphs show the Alexa Top 1 Million in groups of 4,000.
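For anyone wanting to reproduce the grouping, one way to bucket sites into groups of 4,000 by rank is sketched below. The helper names and the sample ranks are hypothetical; only the group size comes from the text above.

```python
from collections import Counter

GROUP_SIZE = 4000  # group size used for the graphs


def group_index(rank, group_size=GROUP_SIZE):
    """Map a 1-based rank to a 0-based group: 1-4000 is group 0, 4001-8000 is group 1."""
    return (rank - 1) // group_size


def count_per_group(ranks_with_header, group_size=GROUP_SIZE):
    """Count how many sites sending a given header fall into each group."""
    return Counter(group_index(rank, group_size) for rank in ranks_with_header)


# Hypothetical ranks of sites that sent some header.
sample_ranks = [1, 15, 3999, 4000, 4001, 9500]
counts = count_per_group(sample_ranks)
print(sorted(counts.items()))
```

With 1,000,000 ranks and a group size of 4,000, this gives 250 groups, one data point per group on each graph.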
The first thing I love about this kind, and quantity, of data is that trends jump out at you right away. Before we start drilling down into specifics though, I just want to put some headline figures out there.
These % figures came as a genuine shock to me. I checked my code, looked at the crawler, tested it on a small subset of sites and kept coming back with the same thing: the figures really were this low. I even ran the crawl 3 times just to make sure.
Content Security Policy
The content security policy header was only found on 1,365 sites. Coupled with the report only version of the header, and the now deprecated XWCSP and XCSP headers, only a fraction over 2% of the top 1 million sites showed any indication of CSP.
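As a rough illustration of what "any indication of CSP" could mean in code, the sketch below checks a set of response headers for the current header, its report only variant and the two deprecated names. It assumes XWCSP refers to X-WebKit-CSP and XCSP to X-Content-Security-Policy; the function names and the sample headers are hypothetical.

```python
CSP_HEADERS = (
    "content-security-policy",
    "content-security-policy-report-only",
    "x-webkit-csp",               # deprecated, assumed to be "XWCSP"
    "x-content-security-policy",  # deprecated, assumed to be "XCSP"
)


def csp_headers_present(headers):
    """Return the CSP-related header names found in a response, case-insensitively."""
    lowered = {name.lower() for name in headers}
    return [name for name in CSP_HEADERS if name in lowered]


def shows_any_csp(headers):
    """True if the response carries any current or deprecated CSP header."""
    return bool(csp_headers_present(headers))


# Hypothetical response headers for illustration.
sample = {
    "Content-Type": "text/html",
    "Content-Security-Policy-Report-Only": "default-src 'self'",
}
print(csp_headers_present(sample), shows_any_csp(sample))
```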
Public Key Pinning
The public key pinning header, found on only 148 out of 1,000,000 sites, is the most underutilised security-based HTTP response header. There is also the report only version of the header, found on 21 sites, bringing the total number of domains showing usage of PKP to 0.0183%!
The graph isn't really very clear due to such small numbers, but I still think it could be argued there is a downward trend as we move down the list of domains.
Strict Transport Security
The most widely used of the non 'X' headers, strict transport security, was being used on 11,308 of the sites making for a 1.2231% usage rate. Still pretty low for the 1 million most visited sites on the internet. Interestingly, of the 1 million sites, 62,043 of them, or 6.7108%, were actively redirecting to HTTPS on their domain which leaves 50,735 domains that are redirecting to HTTPS but not enforcing it with STS.
As with the other headers, there is quite a prominent downward trend in the use of STS as we move down the list of domains.
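The arithmetic behind the STS figures above can be checked in a few lines of Python. The counts and percentages are the ones quoted in this post; the variable names are mine. The implied totals are an inference from those numbers: they suggest the percentages are calculated against the sites that were successfully scanned rather than the full 1,000,000.

```python
sts_sites = 11_308             # sites sending Strict-Transport-Security
https_redirect_sites = 62_043  # sites redirecting HTTP to HTTPS

# Redirecting sites that are not enforcing HTTPS with STS. This assumes
# every STS site is also among the redirecting sites.
redirect_without_sts = https_redirect_sites - sts_sites
print(redirect_without_sts)  # 50735

# Work backwards from the quoted percentages to the implied number of
# sites in the results (an inference, not a figure stated in the post).
implied_total_from_sts = sts_sites / (1.2231 / 100)
implied_total_from_redirects = https_redirect_sites / (6.7108 / 100)
print(round(implied_total_from_sts), round(implied_total_from_redirects))
```

Both implied totals come out at roughly 924,500, which would be consistent with some sites being skipped and excluded from the results.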
XCTO / XFO / XXSSP
I decided to group these more common 'X' headers together as there is a clear-cut distinction between these and the remaining headers in terms of the number of sites that use them. The XFO header is by far the most prevalent security-based header in use at 55,042 sites (5.9536%) and shows the same downward trend as the rest. Interestingly though, the XCTO and XXSSP headers both show similarly high usage at 44,315 (4.7933%) and 41,948 (4.5373%) respectively but, after their initial dip, they actually show an upward trend as you move down the list, which goes against the trend of all other headers.
I have a few thoughts as to why this trend might be as it is, but it'd be great to hear your thoughts or suggestions in the comments below too!
One of the other things that the crawler was looking for was how many of the domains would redirect you to HTTPS if you loaded them over HTTP on the first request. As I mentioned previously, a total of 62,043 sites redirected to HTTPS (6.7108%) and we can see the same downward trend as we move down the list of domains.
The skipped sites are those that the crawler had any kind of issue contacting, and they were excluded from the results. This could be anything from a failure to resolve the domain or the connection timing out (10 seconds) to the site returning an HTTP error code (4xx/5xx).
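The skip rules above could be expressed along these lines. This is only a Python sketch of the three conditions listed, not the crawler's actual code; skip_reason and the reason labels are hypothetical names.

```python
import socket

TIMEOUT_SECONDS = 10  # the timeout quoted above; would be passed to the HTTP client


def skip_reason(error=None, status=None):
    """Return why a site should be skipped, or None if it should be counted.

    `error` is the exception raised while contacting the site, if any.
    `status` is the final HTTP status code, if a response was received.
    """
    if isinstance(error, socket.gaierror):
        return "dns_failure"  # the domain failed to resolve
    if isinstance(error, (socket.timeout, TimeoutError)):
        return "timeout"  # the connection timed out
    if error is not None:
        return "connection_error"  # any other issue contacting the site
    if status is not None and 400 <= status <= 599:
        return "http_error"  # 4xx or 5xx response
    return None


print(skip_reason(status=200), skip_reason(status=503))
```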
The raw data for the crawl is available on my Google Doc spreadsheet right here. This includes the table containing all of the crawl results and the graphs used throughout this blog post. Please feel free to use this data which falls under the same licence as my blog, found in the footer. Basically, share and use it for anything you want, but provide attribution back here.
Check your own headers
If you want to check the headers on your own site, you can use my free service over at https://securityheaders.io to analyse them. Simply type in your address and hit scan and you will get feedback on the headers you are implementing and the headers you aren't.
Short URL: https://scotthel.me/hsh