Mimetypes (and APIs)

This post is over 4 years old and is probably out of date.

As part of my day job at Engine Yard, I spend a lot of time working with and writing APIs.

For all of the APIs I write, I use the awesome FRAPI API framework, and over the last few months I have been hacking away at it more and more frequently, adding new features and fixing bugs.

One such feature was the addition of mimetype support, which allows you to specify the mimetypes allowed in the Accept header and the response format each one maps to.

The reason behind this was that at one point I was working with GitHub’s API, which uses mimetypes extensively for various reasons, and at the time I thought they were very good reasons.

A full github mimetype looks like: application/vnd.github[.version].<param>[+json]

So let’s break this down:

application/
The top-level media type (this indicates it is intended for an application to work with the data)
vnd.github
As specified in RFC 2048, vendor-specific mimetypes should be prefixed with vnd. followed by the producer’s name
[.version]
Next up, we have [optionally] the API version, currently 3
.<param>
This is followed by a value that designates the response data type (full, raw, text, or html)
[+json]
We finish with the [optional] serialization, always json.

In theory, this looks great: just parse the mimetype and you have the version number, the response type, and potentially any number of serializations (think: XML, jsonp, even serialized PHP!). Simple, right?
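To make "just parse the mimetype" concrete, here is a minimal sketch in Python. The pattern, the `VendorType` tuple, and `parse_vendor_type` are hypothetical names for illustration, not anything GitHub or FRAPI ships. It assumes a bare numeric version (as in this post's examples) and treats the param as optional, since types like application/vnd.github.3+json omit it.

```python
import re
from typing import NamedTuple, Optional

# Assumed shape: application/vnd.github[.version][.param][+serialization]
# The version is taken to be a bare number and the param one of the four
# values listed in the breakdown above.
VENDOR_TYPE = re.compile(
    r"^application/vnd\.github"
    r"(?:\.(?P<version>\d+))?"
    r"(?:\.(?P<param>full|raw|text|html))?"
    r"(?:\+(?P<serialization>\w+))?$"
)


class VendorType(NamedTuple):
    version: Optional[str]
    param: Optional[str]
    serialization: Optional[str]


def parse_vendor_type(mimetype: str) -> Optional[VendorType]:
    """Split a vendor mimetype into its parts, or return None if it does not match."""
    match = VENDOR_TYPE.match(mimetype.strip().lower())
    if match is None:
        return None
    return VendorType(**match.groupdict())


print(parse_vendor_type("application/vnd.github.3.raw+json"))
```

Any part missing from the mimetype comes back as None, so the caller still has to decide on defaults for the version and response type.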

Well… not quite. Let’s look at an example API request (using the awesome httpie):

The important line to note here is Content-Type: application/json; charset=utf-8.

Now let’s try again, this time with the header Accept: application/vnd.github.3+json. What this says is: I want you to serve me content of this type.

Notice that we get the same type of response as before, application/json… but this isn’t what we requested. It’s what we expect (because we read the docs and we know the entire API uses json), but why isn’t the response Content-Type: application/vnd.github.3+json?

Most likely, it’s because most clients that understand json (for example, those that automatically unserialize it into native data structures) look for two types, text/json and application/json, while others, like jQuery, just look for json anywhere in the Content-Type response header.
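The two detection styles can be sketched like this in Python. Both function names are made up for illustration, and neither is taken from any real client library; the point is only that a strict client would miss a vendor type that a loose one accepts.

```python
def is_json_strict(content_type: str) -> bool:
    """Match only the two exact types a strict client might look for."""
    # Drop any parameters (e.g. charset) before comparing.
    base = content_type.split(";", 1)[0].strip().lower()
    return base in ("application/json", "text/json")


def is_json_loose(content_type: str) -> bool:
    """Match any Content-Type value that mentions json anywhere."""
    return "json" in content_type.lower()


for value in ("application/json; charset=utf-8", "application/vnd.github.3+json"):
    print(value, is_json_strict(value), is_json_loose(value))
```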

What’s wrong with this? Well, for starters, what if you request Accept: text/plain and you get back Content-Type: image/jpeg? That’s not going to work. Secondly, for caching proxies and such, it isn’t possible to differentiate between the application/vnd.github.3.full response and the application/vnd.github.3.raw or application/vnd.github.4.full response, because all of them come back labeled application/json.

It would be entirely reasonable to respond with an HTTP/1.1 406 Not Acceptable status in the case where the server cannot respond with an accepted media type.
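A server-side check along those lines might look like the following Python sketch. Everything in it is an assumption for illustration: `SUPPORTED` stands in for whatever the server can actually produce, the function names are invented, and q-values (including q=0) are deliberately ignored to keep it short.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical: the media types this server is able to produce.
SUPPORTED = ("application/json",)


def accepted_types(accept_header: str) -> List[str]:
    """Return the bare media ranges from an Accept header, ignoring parameters."""
    return [
        part.split(";", 1)[0].strip().lower()
        for part in accept_header.split(",")
        if part.strip()
    ]


def matches(media_range: str, media_type: str) -> bool:
    """True if media_type satisfies media_range, allowing */* and type/* wildcards."""
    if media_range == "*/*":
        return True
    range_type, _, range_subtype = media_range.partition("/")
    type_, _, subtype = media_type.partition("/")
    return range_type == type_ and range_subtype in ("*", subtype)


def negotiate(
    accept_header: str, supported: Sequence[str] = SUPPORTED
) -> Tuple[int, Optional[str]]:
    """Pick the first supported type the client accepts, else answer 406."""
    for media_range in accepted_types(accept_header):
        for media_type in supported:
            if matches(media_range, media_type):
                return 200, media_type
    return 406, None


print(negotiate("text/plain"))
print(negotiate("text/plain, */*"))
```

The status code and chosen type are returned as a pair so the caller can set both the response status and the Content-Type header from one decision.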

All in all, I think this use of mimetypes is less than great. Not terrible; definitely within the confines of the spec. So what is the solution?

The solution is part of the MIME RFCs (there are 5 original RFCs, 2045-2049, and dozens of updates) and is explained, in this context, in the HTTP 1.1 spec (RFC 2616). That solution is accept-params, or mimetype parameters. You’ve seen and used them regularly, I’d bet. Here’s an example:

Content-Type: text/html;charset=utf-8

These exist explicitly to pass variable named values along with the mimetype. The one explicitly defined in the HTTP spec is the q parameter for designating preference for mimetypes in the Accept header.

So let’s try the github mimetypes using parameters instead:

application/json;version=3;response=raw

And in the response:

Content-Type: application/json;version=3;response=raw

Not only do we get back the same mimetype, but any decent HTTP client will still see the application/json, and caches will respect the parameters, just like they would the charset parameter. The server could even drop parameters to indicate they were ignored, add parameters to give more details (e.g. charset), or change the values to indicate what it actually did.

Better yet, we can now do things like use q-values. So, should GitHub add XML output to their API, we could indicate we can handle both, but prefer json:
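As a rough illustration of how a client still sees the plain application/json, here is a small Python sketch that leans on the standard library's `email.message.Message` to split a Content-Type value into its base type and parameters. The `split_content_type` helper is a hypothetical name, and the version and response parameters are the ones proposed in this post, not something any real API is known to send.

```python
from email.message import Message
from typing import Dict, Tuple


def split_content_type(header_value: str) -> Tuple[str, Dict[str, str]]:
    """Return (base type, {parameter: value}) for a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_params() yields the media type itself first, then each name/value
    # pair, with names lowercased and quoted values unquoted.
    params = dict(msg.get_params()[1:])
    return msg.get_content_type(), params


base, params = split_content_type("application/json;version=3;response=raw")
print(base)
print(params)
```

A client that only cares about the serialization can look at the base type and ignore the rest, while one that cares about the version can read it out of the parameters.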

Accept: application/xml;version=3;response=raw;q=0.8, application/json;version=3;response=raw;q=1.0

Or that we prefer (the future) version 4, but can still handle version 3:

Accept: application/json;version=4;response=raw;q=1.0, application/json;version=3;response=raw;q=0.5

All in all, I believe this to be more semantically and technically correct, while also being easier to work with: parameter names are case-insensitive, values can optionally be quoted, and the parameters can come in any order, except q, which must come after them (in the Accept grammar, anything following q is treated as an Accept extension rather than a parameter of the media type).
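For what it's worth, picking a preferred version out of such a header takes only a few lines. The Python sketch below follows the RFC 2616 Accept grammar, in which media type parameters come before q and anything after q is an Accept extension. `MediaRange` and `parse_accept` are hypothetical names, and the parser is simplified: it does not handle commas or semicolons inside quoted values.

```python
from typing import Dict, List, NamedTuple


class MediaRange(NamedTuple):
    media_type: str
    params: Dict[str, str]
    q: float


def parse_accept(header_value: str) -> List[MediaRange]:
    """Parse an Accept header into media ranges, most preferred first."""
    ranges = []
    for item in header_value.split(","):
        pieces = [piece.strip() for piece in item.split(";")]
        if not pieces[0]:
            continue
        params: Dict[str, str] = {}
        q = 1.0  # a missing q means full preference
        for piece in pieces[1:]:
            name, _, value = piece.partition("=")
            name, value = name.strip().lower(), value.strip().strip('"')
            if name == "q":
                q = float(value)
                break  # what follows q is an Accept extension, not a type parameter
            params[name] = value
        ranges.append(MediaRange(pieces[0].lower(), params, q))
    # sorted() is stable, so ranges with equal q keep their header order.
    return sorted(ranges, key=lambda r: r.q, reverse=True)


header = (
    "application/json;version=3;response=raw;q=0.5, "
    "application/json;version=4;response=raw;q=1.0"
)
for media_range in parse_accept(header):
    print(media_range)
```

A server would walk this list in order and answer with the first combination of type and version it can actually produce.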

One thought on “Mimetypes (and APIs)”

  1. I like these concepts and they make great sense. I have been learning lately to be more exact in meaning and not make assumptions. It’s so easy to assume that things will always work a certain way, but what if “they” change something? What if you want to change something?

    Being a little more verbose is not a bad thing, and can lead to more flexibility down the road when you least expect it.

    Thanks for sharing and giving me a new perspective.
