Removing Memory Allocations in HTTP Requests Using ArrayPool<T>

Some of the code I work on can make a huge number of network requests in very little time, so I took a special interest in optimizing that code and reducing its memory allocations. CPU performance wasn't a huge concern, since the network itself would by far be the limiting factor, but if we allocate huge chunks of memory we also put a lot of load on the garbage collector, so I decided to investigate whether there was some allocation overhead that could be improved. What I initially found was a bit horrifying, but the solution I ended up with was quite frankly awesome, and I still brag about it whenever I get a chance :-) I wanted to share my findings and the solution here, to help anyone else wanting to improve their networking code.

To get started benchmarking some code, I created a simple little webserver that serves exactly 1 MB responses:

using var listener = new HttpListener();
listener.Prefixes.Add("http://localhost:8001/");
listener.Start();
CancellationTokenSource cts = new CancellationTokenSource();
_ = Task.Run(() =>
{
    byte[] buffer = new byte[1024 * 1024]; // exactly 1 MB of response data
    cts.Token.Register(listener.Stop);
    while (!cts.IsCancellationRequested)
    {
        HttpListenerContext ctx = listener.GetContext();
        using HttpListenerResponse resp = ctx.Response;
        resp.StatusCode = (int)HttpStatusCode.OK;
        using var stream = resp.OutputStream;
        stream.Write(buffer, 0, buffer.Length);
        resp.Close();
    }
});

Next I set up a simple benchmark that measures allocations. I use BenchmarkDotNet, and since I'm focused on allocations rather than actual execution time, I'm using a ShortRunJob. Example:

[MemoryDiagnoser]
[ShortRunJob(RuntimeMoniker.Net80)]
public class HttpBenchmark
{
    readonly HttpClient client = new HttpClient();
    const string url = "http://localhost:8001/data.bin"; // the 1 MB response from the server above

    [Benchmark]
    public async Task<byte[]> Client_Send_GetByteArrayAsync()
    {
        using var request = new HttpRequestMessage(HttpMethod.Get, url);
        using var response = await client.SendAsync(request).ConfigureAwait(false);
        var bytes = await response.Content.ReadAsByteArrayAsync().ConfigureAwait(false);
        return bytes;
    }
}


The obvious expectation here is that since I'm downloading 1 MB, the allocations should only be slightly larger than that. However, to my surprise, the benchmark reported the following:

| Method                                    | Gen0      | Gen1      | Gen2      | Allocated  |
|------------------------------------------ |----------:|----------:|----------:|-----------:|
| Client_Send_GetByteArrayAsync             | 1031.2500 | 1015.6250 | 1000.0000 | 5067.52 KB |

That's almost 5 MB (!!!) allocated just to download 1 MB!

For good measure, I also tried the MUCH simpler HttpClient.GetByteArrayAsync method. This method is pretty basic and doesn't give you much control over the request or how to process the response, but I figured it would be worth comparing:

    [Benchmark]
    public async Task<byte[]> GetByteArrayAsync()
    {
        return await client.GetByteArrayAsync(url).ConfigureAwait(false);
    }

This gave a MUCH better result, with allocations close to what we'd expect:

| Method                                    | Gen0      | Gen1      | Gen2      | Allocated  |
|------------------------------------------ |----------:|----------:|----------:|-----------:|
| Client_Send_GetByteArrayAsync             | 1031.2500 | 1015.6250 | 1000.0000 | 5067.52 KB |
| GetByteArrayAsync                         |  179.6875 |  179.6875 |  179.6875 | 1030.59 KB |


So I guess that means: if you can use this method, use it. And we're done, right? (Spoiler: keep reading, since we'll eventually make this WAY better!)

First of all, I'm not able to use the HttpClient.GetByteArrayAsync method. I need the SendAsync overload that takes an HttpRequestMessage, so I can tailor the request beyond a simple GET, and I also want to be able to work with the full response even when the request fails with an HTTP error code (a failed request can still have a body you can read, for example what you see on a 404 page), or reject the request early before the response has completed. So what's going on here? The more efficient method probably also holds clues to what it does differently. For illustration, the pattern I need looks roughly like this (the custom header is just a placeholder for whatever tailoring your requests require):
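
using var request = new HttpRequestMessage(HttpMethod.Get, url);
request.Headers.Add("X-Custom-Header", "value"); // placeholder: tailor the request as needed
using var response = await client.SendAsync(request).ConfigureAwait(false);
if (!response.IsSuccessStatusCode)
{
    // A failed request can still have a body worth reading (e.g. the 404 page)
    string errorBody = await response.Content.ReadAsStringAsync().ConfigureAwait(false);
}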

Next I changed my benchmark slightly, using HttpCompletionOption.ResponseHeadersRead, which gives me the response as soon as the headers have been parsed, before the actual content has been read. My thinking was that I'd be able to get at the head of the stream before any major allocations are made, and read through the data myself.

    [Benchmark]
    public async Task SendAsync_ResponseHeadersRead_ChunkedRead()
    {
        using var requestMessage = new HttpRequestMessage(HttpMethod.Get, url);
        using var message = await client.SendAsync(requestMessage, HttpCompletionOption.ResponseHeadersRead).ConfigureAwait(false);
        using var stream = message.Content.ReadAsStream();
        byte[] buffer = new byte[4096]; // small reusable chunk buffer
        while (await stream.ReadAsync(buffer, 0, buffer.Length).ConfigureAwait(false) > 0)
        {
            // TODO: Process data chunked
        }
    }

The results now look MUCH more promising:

| Method                                    | Gen0      | Gen1      | Gen2      | Allocated  |
|------------------------------------------ |----------:|----------:|----------:|-----------:|
| Client_Send_GetByteArrayAsync             | 1031.2500 | 1015.6250 | 1000.0000 | 5067.52 KB |
| GetByteArrayAsync                         |  179.6875 |  179.6875 |  179.6875 | 1030.59 KB |
| SendAsync_ResponseHeadersRead_ChunkedRead |    7.8125 |         - |         - |   52.02 KB |


This tells us there is a way to process this data without a huge overhead: simply read the data in small chunks as it comes down the network. If we know the length of the response, we can copy the content into a byte array as we read it, our allocation will be on par with GetByteArrayAsync, and we can then return the data. In fact, if we dig into how GetByteArrayAsync is implemented, we get a clue that this is exactly what's going on: HttpClient.cs#L275-L285

The code comment literally says that if we have a Content-Length, we can allocate exactly what we need and return that - if we don't, we can use ArrayPools instead to reduce allocations while growing the array we want to return. It also refers to an internal `LimitArrayPoolWriteStream` that can be found here: HttpContent.cs#L923

It very cleverly uses ArrayPools to reuse a chunk of byte[] memory, meaning we don't constantly allocate memory for the garbage collector to clean up, but instead reuse the same chunk of memory over and over. Networking is a perfect place for this sort of thing, because the number of simultaneous requests you have going out is limited, so you will never rent a large number of byte arrays at the same time. So why don't we just take a copy of that LimitArrayPoolWriteStream and use it for our own benefit? The code copies over pretty much cleanly, except that `BeginWrite` and `EndWrite` rely on helpers that are missing, along with two exception helpers. Since I don't plan on using those two methods, I just removed them and replaced the calls to the exception helpers with

throw new ArgumentOutOfRangeException("_maxBufferSize");

and the code compiles fine.
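
If you'd rather not copy the runtime source, here's a minimal sketch of what such a pooled write stream looks like. This is my own simplified illustration (the name PooledWriteStream is mine), not the real LimitArrayPoolWriteStream - the real one also enforces a maximum buffer size and overrides the async write paths for efficiency:

using System;
using System.Buffers;
using System.IO;

// Simplified sketch of a pooled write stream: written data accumulates in a
// rented array that grows by renting a bigger one, and is returned on Dispose.
public sealed class PooledWriteStream : Stream
{
    private byte[] _buffer;
    private int _length;

    public PooledWriteStream(long capacityHint = 256) =>
        _buffer = ArrayPool<byte>.Shared.Rent((int)Math.Max(capacityHint, 256));

    // Expose the downloaded data without copying it out of the pooled array.
    public ArraySegment<byte> GetBuffer() => new ArraySegment<byte>(_buffer, 0, _length);

    public override void Write(byte[] buffer, int offset, int count)
    {
        EnsureCapacity(_length + count);
        Buffer.BlockCopy(buffer, offset, _buffer, _length, count);
        _length += count;
    }

    private void EnsureCapacity(int required)
    {
        if (required <= _buffer.Length)
            return;
        // Rent a larger array, copy the data across, and return the old one.
        byte[] larger = ArrayPool<byte>.Shared.Rent(Math.Max(required, _buffer.Length * 2));
        Buffer.BlockCopy(_buffer, 0, larger, 0, _length);
        ArrayPool<byte>.Shared.Return(_buffer);
        _buffer = larger;
    }

    protected override void Dispose(bool disposing)
    {
        if (_buffer.Length > 0)
        {
            // Hand the array back to the pool exactly once.
            ArrayPool<byte>.Shared.Return(_buffer);
            _buffer = Array.Empty<byte>();
        }
        base.Dispose(disposing);
    }

    public override bool CanRead => false;
    public override bool CanSeek => false;
    public override bool CanWrite => true;
    public override long Length => _length;
    public override long Position { get => _length; set => throw new NotSupportedException(); }
    public override void Flush() { }
    public override int Read(byte[] buffer, int offset, int count) => throw new NotSupportedException();
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
}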

This means we can now make a request and access the entire downloaded byte array via the stream's ArraySegment<byte> GetBuffer() method, work on it as a whole, and then release it back to the pool. As long as you remember to dispose your LimitArrayPoolWriteStream you'll be good, and we won't allocate ANY memory for the actual downloaded data. How cool is that?

    [Benchmark]
    public async Task SendAsync_ArrayPoolWriteStream()
    {
        using var requestMessage = new HttpRequestMessage(HttpMethod.Get, url);
        using var message = await client.SendAsync(requestMessage, HttpCompletionOption.ResponseHeadersRead).ConfigureAwait(false);

        using var data = new LimitArrayPoolWriteStream(int.MaxValue, message.Content.Headers.ContentLength ?? 256);
        await message.Content.CopyToAsync(data).ConfigureAwait(false);
        ArraySegment<byte> buffer = data.GetBuffer();
        foreach (byte item in buffer)
        {
            // Work on the entire byte buffer
        }
    }

| Method                                    | Gen0      | Gen1      | Gen2      | Allocated  |
|------------------------------------------ |----------:|----------:|----------:|-----------:|
| Client_Send_GetByteArrayAsync             | 1031.2500 | 1015.6250 | 1000.0000 | 5067.52 KB |
| GetByteArrayAsync                         |  179.6875 |  179.6875 |  179.6875 | 1030.59 KB |
| SendAsync_ResponseHeadersRead_ChunkedRead |    7.8125 |         - |         - |   52.02 KB |
| SendAsync_ArrayPoolWriteStream            |         - |         - |         - |    6.15 KB |

6.15 KB!!! For a 1 megabyte download, and we still have full access to the entire 1 MB of data to operate on at once. Array pools really are magic: they allow us to "allocate" memory without actually allocating over and over again. ArrayPools are the good old Reduce, Reuse and Recycle mantra, but for developers ♻️
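
If you haven't used ArrayPool<T> before, the core pattern is simple - rent a buffer, use it, and hand it back when you're done (note that Rent may return an array larger than you asked for):

using System.Buffers;

byte[] rented = ArrayPool<byte>.Shared.Rent(1024 * 1024); // may be larger than requested!
try
{
    // Use rented[0 .. lengthYouActuallyNeed); never assume rented.Length equals the requested size.
}
finally
{
    ArrayPool<byte>.Shared.Return(rented); // make the buffer available to the next caller
}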

Now, there is one thing we can do to improve the first method that allocates 5 MB: we can get it down to about 2 MB by simply setting the Content-Length header server-side, which allows the internals to be a bit more efficient (I'll show that tweak at the end). But 2 MB is still incredibly wasteful compared to what is pretty much allocation-free, and with the pooled stream you don't have to rely on the server always knowing the content length it'll be returning. If we do have a known length, though, we can use the ArrayPool right in the read method and do something much simpler, like this - essentially a simplified version of the LimitArrayPoolWriteStream without the auto-growing needed when no Content-Length is available:

    [Benchmark]
    public async Task KnownLength_UseArrayPool()
    {
        using var requestMessage = new HttpRequestMessage(HttpMethod.Get, url);
        using var message = await client.SendAsync(requestMessage, HttpCompletionOption.ResponseHeadersRead).ConfigureAwait(false);
        using var stream = message.Content.ReadAsStream();
        if (!message.Content.Headers.ContentLength.HasValue)
            throw new NotSupportedException();
        int contentLength = (int)message.Content.Headers.ContentLength.Value;
        byte[] buffer = ArrayPool<byte>.Shared.Rent(contentLength); // may be larger than contentLength
        try
        {
            // A single ReadAsync isn't guaranteed to fill the buffer, so keep
            // reading until the entire response body has arrived.
            int totalRead = 0;
            int bytesRead;
            while (totalRead < contentLength &&
                   (bytesRead = await stream.ReadAsync(buffer, totalRead, contentLength - totalRead).ConfigureAwait(false)) > 0)
            {
                totalRead += bytesRead;
            }
            for (int i = 0; i < totalRead; i++)
            {
                // Process buffer[i] - only the first totalRead bytes are valid
            }
        }
        finally
        {
            ArrayPool<byte>.Shared.Return(buffer);
        }
    }

| Method                                    | Gen0      | Gen1      | Gen2      | Allocated  |
|------------------------------------------ |----------:|----------:|----------:|-----------:|
| KnownLength_UseArrayPool                  |         - |         - |         - |    3.71 KB |

This is definitely very efficient and a simple way to read data, _if_ you know you'll get a Content-Length in the response headers.
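
And as promised, here's the server-side Content-Length tweak mentioned earlier, applied to the HttpListener sample from the top of the post. Setting ContentLength64 before writing the body adds a Content-Length header to the response:

// In the HttpListener handler, before writing the response body:
resp.ContentLength64 = buffer.Length; // adds a Content-Length header
using var stream = resp.OutputStream;
stream.Write(buffer, 0, buffer.Length);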

I can't claim too much credit for this blog post. I initially logged a bug about the huge amount of memory allocated - that led to some great discussion, which ultimately led me to finding this almost-zero-allocation trick for dealing with network requests. You can read the discussion in the issue here: https://github.com/dotnet/runtime/issues/81628. Special thanks to Stephen Toub and David Fowler for leading me to this discovery.


Here are also some extra resources to read more about array pools:

Adam Sitnik on performance and array pools: https://adamsitnik.com/Array-Pool/
Since that blog post, the pools have only gotten faster; you can read more about that here: https://devblogs.microsoft.com/dotnet/performance-improvements-in-net-6/#buffering