I have been speaking about HTTP/2 for almost a year now,
and throughout that time I have stated over and over again:
As a community, we are still trying to figure out best practices. The examples I’m showing here, are ideas. Nobody has figured this out yet. We [application developers], web server vendors, and browser vendors, all have a role to play in exploring HTTP/2 and informing each other about what we find.
I want to make clear that while this post talks about PHP specifically, the issues are present in most server-side languages that use FastCGI or similar models.
When I first started to explore HTTP/2 Server Push, my first thought was that PHP would not be able to do server push. PHP has a single output buffer, and its interface with the web server (the Server API, or SAPI) is built around that: whatever is written to the buffer is then served to the end user.
Exploring further, it became clear that those discussing and trying to figure out how to deploy the HTTP/2 spec have solved this issue by using
This informs the web server, which is then responsible for making sub-requests and pushing the results out to the user. By treating each push as if it were a unique incoming request, we avoid the issue of PHP having a single output buffer. PHP is none the wiser.
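A minimal sketch of that hand-off, assuming the signal is a `Link` response header with `rel=preload` (a convention that HTTP/2 servers such as Apache's mod_http2 can act on). The helper name `preloadLinkValue()` and the asset paths are hypothetical, and whether anything is actually pushed depends entirely on how the web server is configured.

```php
<?php
// Hypothetical helper: builds the value of one Link header that hints at a push.
function preloadLinkValue(string $path, string $as): string
{
    return sprintf('<%s>; rel=preload; as=%s', $path, $as);
}

// Hypothetical assets this page depends on, mapped to their "as" type.
$assets = [
    '/css/site.css' => 'style',
    '/js/app.js'    => 'script',
];

$links = [];
foreach ($assets as $path => $as) {
    $links[] = preloadLinkValue($path, $as);
}

// Only emit headers under a web server SAPI, and only before any output.
if (PHP_SAPI !== 'cli' && !headers_sent()) {
    foreach ($links as $value) {
        // The second argument (false) appends instead of replacing
        // a Link header that was set earlier.
        header('Link: ' . $value, false);
    }
}
```

PHP still produces a single response here. Any push is the web server's own sub-request for each listed path.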
Why We Need a New SAPI
Most of the existing exploration with HTTP/2 is focusing on websites (or web applications), rather than APIs. These two kinds of application can vary a lot in their performance needs.
An API, however, is typically composed of discrete resources with references to one or more other discrete resources.
While they may seem similar on the surface, the difference is that an API is surfacing a data-structure, while a webpage is a single flat document.
The API's data structure is often based on a datastore that supports efficient fetching of related resources.
Take the example I use in my talk, a blog API. A blog post might be composed of:
- The blog post itself
- The author information
- Related comments
- The comment authors' information
We can imagine a couple of SQL queries like this:
Which, in a perfect RESTful world, would result in a number of separate resources, each with its own URL (and therefore a separate request):
── post resource
   ├── author resource
   │   └── author avatar image resource
   └── comments collection
       └── comment resource (per comment)
           └── author resource
               └── author avatar image resource
Due to the current world of HTTP/1.1, we likely would at best split this into two resources (which happen to match up with our SQL queries up there) with each of the sub-resources embedded like so:
── post resource with author resource
   └── comments collection with each comment and author resource embedded
More often than not, we’ll just flatten the entire structure to a single
This is a trade-off we have to make in the name of performance.
If we wanted to move towards the first model, we would end up doing many small queries at each layer of the structure, which could be very inefficient, or duplicating effort. This is especially true if we need some of the sub-resource data to generate the resource URLs (think: pretty URLs that use the author's name for author resources).
So, what do we do? We can cache the intermediate information for later retrieval by the web server sub-request, or we can write a SAPI that supports responding with the request resource, and subsequent pushes.
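A rough sketch of the first option, under stated assumptions: the rows, column names, URL scheme, and the helpers `authorUrl()` and `splitResources()` are all hypothetical, and a plain array stands in for a shared cache such as APCu or Redis. It takes the flat result of one joined query, splits it into discrete resources keyed by URL, and keeps them around so that a later sub-request would not need to hit the database again.

```php
<?php
// Hypothetical rows, as one joined query for a post's comments might return them.
$rows = [
    ['comment_id' => 10, 'comment' => 'Nice!',  'author_id' => 7, 'author_name' => 'Ada Lovelace'],
    ['comment_id' => 11, 'comment' => 'Thanks', 'author_id' => 8, 'author_name' => 'Grace Hopper'],
];

// Pretty URL for an author. It needs the author's name, which is only
// available once the joined data has been fetched.
function authorUrl(array $row): string
{
    $slug = preg_replace('/[^A-Za-z0-9]+/', '-', $row['author_name']);
    return '/authors/' . strtolower(trim($slug, '-'));
}

// Split the flat rows into discrete resources keyed by their URL.
function splitResources(array $rows): array
{
    $resources = [];
    foreach ($rows as $row) {
        $author = authorUrl($row);
        $resources[$author] = ['id' => $row['author_id'], 'name' => $row['author_name']];
        $resources['/comments/' . $row['comment_id']] = [
            'body'   => $row['comment'],
            'author' => $author, // a reference to the author resource, not an embedded copy
        ];
    }
    return $resources;
}

// Stand-in for a shared cache: a sub-request for any of these URLs
// could be answered from here.
$cache = splitResources($rows);
```

The catch is that every pushed resource still costs a full request cycle through the SAPI, which is what the second option tries to avoid.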
This however needs web server support.
Currently all SAPIs are based on the original CGI single request/response model. We need to move beyond this.
We need a new web server interface that supports multiplexing from the application layer, and we need PHP to be able to multiplex its output.
Additionally, we are going to want to control other features available in HTTP/2 dynamically for those multiplexed streams, such as stream weights and dependencies.
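To make "weights and dependencies" concrete, here is a small sketch of the priority fields as RFC 7540 (section 6.3) lays them out on the wire: an exclusive bit, a 31-bit stream dependency, and a weight of 1 to 256 stored as one byte holding the weight minus one. The function name `priorityPayload()` is hypothetical. No current SAPI exposes anything like it, which is the point.

```php
<?php
// Hypothetical helper: encodes the 5-byte payload of an HTTP/2 PRIORITY frame.
// Assumes a 64-bit PHP build so the exclusive bit fits in an integer.
function priorityPayload(int $dependsOn, int $weight, bool $exclusive = false): string
{
    if ($weight < 1 || $weight > 256) {
        throw new InvalidArgumentException('weight must be between 1 and 256');
    }

    // The lower 31 bits hold the ID of the stream this one depends on.
    $dependency = $dependsOn & 0x7FFFFFFF;
    if ($exclusive) {
        // The top bit marks the dependency as exclusive.
        $dependency |= 0x80000000;
    }

    // N = unsigned 32-bit big-endian, C = unsigned byte (weight is sent as weight - 1).
    return pack('NC', $dependency, $weight - 1);
}

// A stream that depends on stream 3 with the default weight of 16.
$payload = priorityPayload(3, 16);
```

An application-level interface would presumably expose the same three values per stream, rather than raw bytes.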
That Sounds Hard!
To do this would require a large effort on the part of many projects, on the scale of creating the original CGI spec: bringing together web server vendors and language authors to decide upon a standard way to handle multiplexed communication.
We also don’t know how effective having these abilities would be.
Browser vendors are still figuring out the best practices for handling what is now a much more complicated priority tree, and re-building rendering around it.
Because these features are difficult to use, few sites take advantage of them yet, so browser vendors can make little more than an educated guess about how to handle them.
New Application Architectures
Additionally, we’re going to have to explore new application architectures that feature asynchronous and parallel processing to create and output these multiplexed streams.
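One possible shape for this, sketched with plain PHP generators: each response body is a generator that yields chunks, and a round-robin loop interleaves them as stream ID and chunk pairs. The functions `chunks()` and `multiplex()` are hypothetical names, the scheduling is deliberately naive (it ignores weights and dependencies), and nothing here touches the network.

```php
<?php
// Hypothetical producer: yields the chunks of one response body.
function chunks(array $parts): Generator
{
    foreach ($parts as $part) {
        yield $part;
    }
}

// Round-robin over the streams, yielding [streamId, chunk] pairs
// until every stream has been drained.
function multiplex(array $streams): Generator
{
    while ($streams) {
        foreach ($streams as $id => $stream) {
            if (!$stream->valid()) {
                unset($streams[$id]); // this stream is finished
                continue;
            }
            yield [$id, $stream->current()];
            $stream->next();
        }
    }
}

// Stream 1 carries the requested resource, stream 2 a pushed one.
$streams = [
    1 => chunks(['post-a', 'post-b']),
    2 => chunks(['author']),
];

$frames = [];
foreach (multiplex($streams) as [$id, $chunk]) {
    $frames[] = "$id:$chunk";
}
```

A real implementation would need the producers to run concurrently (for example, waiting on different queries), which is where the asynchronous part comes in.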
Introducing The HyPHPer Project
For the last few months I’ve been looking at the Python Hyper project, a series of libraries for handling HTTP/2. These libraries are a base on which to build concrete HTTP/2 clients and servers, and they do not perform any I/O themselves, which makes them framework independent.
I have decided to try and port Hyper to PHP, as HyPHPer.
The goal is to provide a base for writing both servers and clients to explore both writing applications that can handle multiplexed responses, and documenting current browser behavior and performance implications of different response profiles.
We can then attempt to determine current best practices for performant web applications.
During the PyCon AU sprint days I managed to port the Hyper HTTP/2 frame implementation (hyperframe) entirely to PHP — including tests.
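To give a feel for what a frame layer deals with, here is a sketch of the fixed 9-byte HTTP/2 frame header from RFC 7540 (section 4.1): a 24-bit payload length, an 8-bit type, 8 bits of flags, and a reserved bit followed by a 31-bit stream identifier. This is not the HyPHPer or hyperframe API. The function names `packFrameHeader()` and `unpackFrameHeader()` are made up for illustration.

```php
<?php
// Hypothetical helper: serialises an HTTP/2 frame header (9 bytes).
function packFrameHeader(int $length, int $type, int $flags, int $streamId): string
{
    if ($length < 0 || $length > 0xFFFFFF) {
        throw new InvalidArgumentException('length must fit in 24 bits');
    }

    // 24-bit length: pack as 32-bit big-endian and drop the leading byte.
    return substr(pack('N', $length), 1)
        . pack('CCN', $type, $flags, $streamId & 0x7FFFFFFF);
}

// Hypothetical helper: parses the first 9 bytes of a frame.
function unpackFrameHeader(string $header): array
{
    if (strlen($header) < 9) {
        throw new InvalidArgumentException('a frame header is 9 bytes long');
    }

    $f = unpack('Chigh/nlow/Ctype/Cflags/Nstream', $header);

    return [
        'length' => ($f['high'] << 16) | $f['low'],
        'type'   => $f['type'],
        'flags'  => $f['flags'],
        'stream' => $f['stream'] & 0x7FFFFFFF, // mask off the reserved bit
    ];
}

// A DATA frame (type 0x0) with END_STREAM (flag 0x1), 5 bytes of payload, on stream 1.
$header = packFrameHeader(5, 0x0, 0x1, 1);
$parsed = unpackFrameHeader($header);
```

Because nothing here reads from or writes to a socket, the same code can sit underneath any server or client loop.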
Still to be completed are:
- H2 Full Protocol Stack
If you’re interested in helping migrate these packages to PHP so that we can explore what HTTP/2 means for the future of PHP, let me know!