In my previous blog post, I came across a bug in Report URI that took some effort to debug and fully understand before I could fix it. Whilst I'd identified what the issue was, and how it was happening, I never got to the bottom of why it was happening. In this post, I'm going to delve into the PHP source to figure it out!

Truncated POST Requests

If you haven't read my previous blog post, Unravelling The Mystery Of Truncated POST Requests On Report URI, you should start there as it sets the scene for what I'm investigating here. It's a bit lengthy, but it has all of the details that are required to understand the journey in this post. Just like in that previous post, I'm investigating things outside of my area of knowledge, but sometimes, I have an itch that I just have to scratch, and this was one of those times!

It Wasn't Nginx

At the end of the last blog post, I wasn't 100% sure whether Nginx was at fault, or whether PHP was at fault. Admittedly, I did have a feeling that it was going to be PHP that was responsible, but I hadn't tested my idea and didn't have any evidence, so I didn't share that view at the time.

The problem we had was that POST requests were coming through and by the time they got to PHP the payload was truncated, and that caused a failure in parsing the JSON payload because it was invalid. I knew that Nginx was receiving the full payload and we use the default value for fastcgi_request_buffering (link) so "the entire request body is read from the client before sending the request to a FastCGI server". I could verify this with some simple debug logging in Nginx, which I did, and the failure was happening between Nginx and PHP during the fastcgi_pass. Nginx has the full request body to pass, it's just not making it to the other side because the socket is closed before everything can be sent.

Previously, we were using a TCP socket between Nginx and we were experiencing a lot of memory pressure on TCP resources. That resulted in an out of memory - consider tuning tcp_mem error (source) and Linux does warn us that it will "kill" sockets if there is "strong memory pressure" (source).

bool tcp_check_oom(struct sock *sk, int shift)
{
	bool too_many_orphans, out_of_socket_memory;

	too_many_orphans = tcp_too_many_orphans(shift);
	out_of_socket_memory = tcp_out_of_memory(sk);

	if (too_many_orphans)
		net_info_ratelimited("too many orphaned sockets\n");
	if (out_of_socket_memory)
		net_info_ratelimited("out of memory -- consider tuning tcp_mem\n");
	return too_many_orphans || out_of_socket_memory;
}

/**
 *  tcp_out_of_resources() - Close socket if out of resources
 *  @sk:        pointer to current socket
 *  @do_reset:  send a last packet with reset flag
 *
 *  ...
 *
 *  Criteria is still not confirmed experimentally and may change.
 *  We kill the socket, if:
 *  1. If number of orphaned sockets exceeds an administratively configured
 *     limit.
 *  2. If we have strong memory pressure.
 *  3. If our net namespace is exiting.
 */
static int tcp_out_of_resources(struct sock *sk, bool do_reset)
{

Since we swapped away from a TCP socket to a Unix socket, the same problem could still be reproduced, but instead by pressuring PHP.

Reproducing the error with a Unix Socket

Using the same slow script from the previous blog, with a small tweak to remove the outbound TCP connection, we can keep all of the PHP processes busy easily.

sleep(30);

$json = json_decode(file_get_contents('php://input'));

if ($json == null) {
  syslog(LOG_WARNING, "JSON decode failed: " . $_SERVER['CONTENT_LENGTH'] . " " . strlen(file_get_contents('php://input')));
}

I then tweaked the Nginx config so that it would timeout on fastcgi operations quickly.

fastcgi_connect_timeout 3s;
fastcgi_send_timeout 3s;

The new Nginx config, coupled with the really long running PHP scripts, means that it will be easy to get Nginx to start timing out on the fastcgi_pass and that's when the error occurs.

Debugging PHP

Fortunately, I had quite a specific area of the php-src I'd need to look at to investigate so this would help me narrow down quite quickly. We're running v8.2.10 in production on Report URI and I've also verified my findings here against the latest 8.2 branch which is v8.2.11 at the time of writing. I will do my work here against that branch.

Following the processing of a request, we go through:

php_request_startup()
sapi_activate()
sapi_read_post_data()
SAPI_POST_READER_FUNC()
sapi_read_post_block()
sapi_module.read_post()
sapi_fcgi_read_post()
fcgi_read()
safe_read()

In safe_read(), we make the final system call to read(), to read data from the socket. I've snipped the WIN32 code for ease of reading because it isn't relevant here.

static inline ssize_t safe_read(fcgi_request *req, const void *buf, size_t count)
{
	int    ret;
	size_t n = 0;

	do {
#ifdef _WIN32
		/* snip */
#endif
		errno = 0;
#ifdef _WIN32
		/* snip */
#else
		ret = read(req->fd, ((char*)buf)+n, count-n);
#endif
		if (ret > 0) {
			n += ret;
		} else if (ret == 0 && errno == 0) {
			return n;
		} else if (ret <= 0 && errno != 0 && errno != EINTR) {
			return ret;
		}
	} while (n != count);
	return n;
}

Looking at the manual page for read(), I notice a couple of things that stand out in Description and later in Return Value:

If the file offset is at or past the end of file, no bytes are read, and read() returns zero.
On success, the number of bytes read is returned (zero indicates end of file)

The PHP source has handling for ret being zero, and also checks errno in the two else if statements, but I wasn't sure what was being returned by read(). I decided to add some of my own logging to see exactly what was going on by patching PHP. I spun up a test server with some hefty resources so I'd be able to build PHP quickly and set it up as follows.

add-apt-repository ppa:ondrej/php
add-apt-repository ppa:ondrej/nginx
apt update
apt install nginx php8.2 php8.2-cli php8.2-common php8.2-curl php8.2-fpm php8.2-gd php8.2-igbinary php8.2-intl php8.2-mbstring php8.2-soap php8.2-xml php8.2-zip php8.2-redis
apt upgrade -y
reboot

mkdir build
cd build
sudo sed -i 's/^#\s*\(.*\)/\1/' /etc/apt/sources.list.d/ondrej-ubuntu-php-jammy.list
sudo apt-get update -y
sudo apt-get install -y devscripts build-essential fakeroot
sudo apt-get build-dep -y php8.2-fpm
apt-get source -y php8.2-fpm
cd php8.2-8.2.10
dpkg-buildpackage -rfakeroot -uc

Even running on a hefty VPS with 16x Intel vCPU, 32 GB RAM and a 200 GB NVMe drive, it still took a little over 30 minutes to build, but it was ready. I made the following patch for main/fastcgi.c:

--- fastcgi.c.orig      2023-09-29 10:06:52.180803585 +0000
+++ fastcgi.c   2023-09-29 10:06:15.932395211 +0000
@@ -982,8 +982,10 @@
                if (ret > 0) {
                        n += ret;
                } else if (ret == 0 && errno == 0) {
+                       php_syslog(LOG_NOTICE, "errno 0: %s", strerror(errno));
                        return n;
                } else if (ret <= 0 && errno != 0 && errno != EINTR) {
+                       php_syslog(LOG_NOTICE, "errno non-0: %s", strerror(errno));
                        return ret;
                }
        } while (n != count);

I used strerror() to get the string version of the error number for ease of reading and I also wanted to monitor the progress of reading chunks from the socket too. I made another patch for main/SAPI.c:

--- SAPI.c.orig 2023-09-29 12:32:46.020740222 +0000
+++ SAPI.c      2023-09-29 12:34:35.769896620 +0000
@@ -248,6 +248,9 @@
                SG(post_read) = 1;
        }

+       php_syslog(LOG_NOTICE, "Read %zu bytes (" ZEND_LONG_FMT " bytes remaining)",
+                       read_bytes, SG(request_info).content_length - SG(read_post_bytes));
+
        return read_bytes;
 }

@@ -285,6 +288,7 @@
                        }

                        if (read_bytes < SAPI_POST_BLOCK_SIZE) {
+                               php_syslog(LOG_NOTICE, "POST data reading complete. Size: " ZEND_LONG_FMT " bytes (out of " ZEND_LONG_FMT " bytes)", SG(read_post_bytes), SG(request_info).content_length);
                                break;
                        }
                }

Now it was just a case of running make to incorporate my changes and overwrite the existing binary with my changes.

cd fpm-build
make
systemctl stop php8.2-fpm.service
cp ./sapi/fpm/php-fpm /usr/sbin/php-fpm8.2
systemctl start php8.2-fpm.service

All I had to do then was cause the issue and check syslog for the errors. I've tidied up a snippet here and removed unrelated entries.

Sep 28 21:13:30 test-server php: Read 16384 bytes (1043937 bytes remaining)
Sep 28 21:13:30 test-server php: Read 16384 bytes (1027553 bytes remaining)
Sep 28 21:13:30 test-server php: Read 16384 bytes (1011169 bytes remaining)
Sep 28 21:13:30 test-server php: Read 16384 bytes (994785 bytes remaining)
Sep 28 21:13:30 test-server php: Read 16384 bytes (978401 bytes remaining)
Sep 28 21:13:30 test-server php: Read 16384 bytes (962017 bytes remaining)
Sep 28 21:13:30 test-server php: Read 16384 bytes (945633 bytes remaining)
Sep 28 21:13:30 test-server php: Read 16384 bytes (929249 bytes remaining)
Sep 28 21:13:30 test-server php: Read 16384 bytes (912865 bytes remaining)
Sep 28 21:13:30 test-server php: Read 16384 bytes (896481 bytes remaining)
Sep 28 21:13:30 test-server php: Read 16384 bytes (880097 bytes remaining)
Sep 28 21:13:30 test-server php: Read 16384 bytes (863713 bytes remaining)
Sep 28 21:13:30 test-server php: Read 16384 bytes (847329 bytes remaining)
Sep 28 21:13:30 test-server php: Read 16384 bytes (830945 bytes remaining)
Sep 28 21:13:30 test-server php: errno 0: Success
Sep 28 21:13:30 test-server php: Read 0 bytes (830945 bytes remaining)
Sep 28 21:13:30 test-server php: POST data reading complete. Size: 229376 bytes (out of 1060321 bytes)
Sep 28 21:13:30 test-server php: JSON decode failed: 1060321 229376

Gotcha! 😎

But we didn't get everything?

At least now I can finally understand what's happening. Because PHP thinks it is hitting EOF, before it should, it's beginning to process the request without having the entire payload. What surprises me is that PHP knows it hasn't got all of the request payload, but it continues anyway. This is the part where I can't decide what it should do, but, saying "hey, 50% of this payload is missing, let's go!", is what caught me out here.

To try and determine why the truncation is happening at a particular size, I'd spotted that sapi_read_post_block() was using SAPI_POST_BLOCK_SIZE defined as define SAPI_POST_BLOCK_SIZE 0x4000, or, converting that to decimal, 16,384... Going back to the previous blog post, the POST payload was consistently being truncated at 65,536 bytes, which is 16,384 * 4 = 65,536, so a possible cause for the specific size of the truncation? I'm not so sure, because, when using TCP sockets, tweaking the values for /proc/sys/net/ipv4/tcp_rmem will change the size of the truncation, and the truncated size is not always a multiple of 16,384. Also, when we switched to the Unix socket from the TCP socket, the size of the truncated payload changed to 229,376, which closely aligns with /proc/sys/net/core/[rw]mem_default defined here. I even patched PHP again to modify SAPI_POST_BLOCK_SIZE but that didn't have an impact on the size of truncated payload. At this point, it's still not clear, but I know PHP will read from the socket until all of the data is read, what changes is how much data is there.

Once the read() system call in safe_read() returns 0, we're going all the way back up the call stack to SAPI_POST_READER_FUNC() before any checking of the payload size takes place. In this function, there are only three checks taking place, and only one relates to the size of the payload (source).

if ((SG(post_max_size) > 0) && (SG(read_post_bytes) > SG(post_max_size))) {
    php_error_docref(NULL, E_WARNING, "Actual POST length does not match Content-Length, and exceeds " ZEND_LONG_FMT " bytes", SG(post_max_size));
    break;
}

This seems like a sensible sanity check that the size of the POST payload is not exceeding the post_max_size limitation, which defaults to 8M if not set, but the error message threw me a little. The first part of the error message states Actual POST length does not match Content-Length, which is precisely the error I'm hoping to catch, but I can't see how the conditions match the text of the message, so this seems like a mistake in the text. Based on the conditions, a more accurate error text would be POST length exceeds post_max_size, but I have to let them off as this code is 23 years old! The size of the POST is also checked earlier in the same function (source), before the body is read, by checking the declared content-length against post_max_size, so the check is backed up later after reading the body, and, the error text here is accurate.

if ((SG(post_max_size) > 0) && (SG(request_info).content_length > SG(post_max_size))) {
		php_error_docref(NULL, E_WARNING, "POST Content-Length of " ZEND_LONG_FMT " bytes exceeds the limit of " ZEND_LONG_FMT " bytes",
					SG(request_info).content_length, SG(post_max_size));
		return;
	}

After that, we can run back up through the rest of the call stack and find no additional checks to match the size of the POST that was read against the declared size of the payload. The request will be processed even though we didn't get all of the request.

Is this a bug?

I've certainly gone backwards and forwards on my answer to this question, but ultimately, I think this is a bug. It boils down to choosing between two options:

  1. Should PHP dump the partial request with an error?
  2. Should PHP attempt to process a partial request?

I have looked at this from a security angle, and, so far, I've not found a way this can be obviously abused, but it depends if receiving partial request payloads would be dangerous for your application. Arguably, this is something you should be handling anyway.

To see behaviour closer to what I was expecting to see, although not quite, I created a patch for SAPI.c that will dump the whole request body if we get a partial request.

--- SAPI.c.orig 2023-09-29 12:32:46.020740222 +0000
+++ SAPI.c      2023-09-29 20:28:36.530234632 +0000
@@ -288,6 +288,12 @@
                                break;
                        }
                }
+
+               if (SG(read_post_bytes) != SG(request_info).content_length) {
+                       php_stream_truncate_set_size(SG(request_info).request_body, 0);
+                       php_error_docref(NULL, E_WARNING, "POST length of " ZEND_LONG_FMT " bytes does not match declared Content-Length " ZEND_LONG_FMT " bytes; all data discarded", SG(read_post_bytes), SG(request_info).content_length);
+               }
+
                php_stream_rewind(SG(request_info).request_body);
        }
 }

It's not perfect, but it does give me some clear feedback as to what happened, and, I don't receive a partial payload to process now. You can see the problem is detected on the final line of this log slice.

Sep 29 20:23:31 test-server php: Accepted connection: 0ce264eae07198c3c59ae90d04127c39
Sep 29 20:23:31 test-server php: Read 16384 bytes (1024737 bytes remaining)
Sep 29 20:23:31 test-server php: Read 16384 bytes (1008353 bytes remaining)
Sep 29 20:23:31 test-server php: Read 16384 bytes (991969 bytes remaining)
Sep 29 20:23:31 test-server php: Read 16384 bytes (975585 bytes remaining)
Sep 29 20:23:31 test-server php: Read 16384 bytes (959201 bytes remaining)
Sep 29 20:23:31 test-server php: Read 16384 bytes (942817 bytes remaining)
Sep 29 20:23:31 test-server php: Read 16384 bytes (926433 bytes remaining)
Sep 29 20:23:31 test-server php: Read 16384 bytes (910049 bytes remaining)
Sep 29 20:23:31 test-server php: Read 16384 bytes (893665 bytes remaining)
Sep 29 20:23:31 test-server php: Read 16384 bytes (877281 bytes remaining)
Sep 29 20:23:31 test-server php: Read 16384 bytes (860897 bytes remaining)
Sep 29 20:23:31 test-server php: Read 16384 bytes (844513 bytes remaining)
Sep 29 20:23:31 test-server php: Read 16384 bytes (828129 bytes remaining)
Sep 29 20:23:31 test-server php: Read 16384 bytes (811745 bytes remaining)
Sep 29 20:23:31 test-server php: errno 0: Success
Sep 29 20:23:31 test-server php: Read 0 bytes (811745 bytes remaining)
Sep 29 20:23:31 test-server php: POST data reading complete. Size: 229376 bytes (out of 1041121 bytes)
Sep 29 20:23:31 test-server php: POST length of 229376 bytes does not match declared Content-Length 1041121 bytes; all data discarded

I guess the only thing left to do is to raise a bug (done) and see what see happens, but for now, at least I finally understand why PHP was trying to process those truncated JSON payloads! 🎉

Outstanding Questions

The final piece of the puzzle that I'm still trying to put together is how we end up in this situation. How does PHP end up reading only part of the request?

My knowledge on the inner workings of TCP sockets or Unix sockets is basically nothing and I've learnt the little that I do know between this blog post and the last one. My current theory looks something like this, and bear in mind, this is mostly guess work!

  1. Nginx receives the full POST request from the client.
  2. Nginx opens the socket to PHP and starts writing until the buffer is filled?
  3. PHP is busy so no child process can spawn to handle the request?
  4. Nginx times out and goes away.
  5. PHP finally gets around to processing the request, which is limited to the partial content in the buffer.
  6. PHP processes the partial request.

Without knowing more on how sockets work, a large portion of my assumptions might be wrong, so it'd be great to hear from someone with real knowledge in this area. If you can help out, please do drop by the comments below and give me some tips or answers!

Another huge thanks has to go to João Poupino again for assistance with this post, it simply wouldn't have been possible otherwise 🍻