Thursday, March 6, 2014

Pushing state through the URL efficiently

I am building a web app. I want to be able to share URLs to particular things, and more specifically, to parameter-tuned views of particular things. The tuning might be a database query parameter, e.g. one that restricts a date range, or it might be the choice of X and Y axes for a chart.

Either way, I needed some way to carry that state. Keeping it server-side in session state was out of the question, since that would make it hard to email a URL to a friend.

One option would be to use something like a URL shortener: store the config on the server, and share the key to that configuration through the URL. That would work fine, but it has two downsides:
  (1) The state is not user-readable.
  (2) You can't blow away the server data and start again without breaking existing URLs. Remember, cool URIs don't change.

For those reasons, I thought something like JSON would be perfect. It's well-described, human-readable, and very standard. However, it makes your URLs look a bit ... meh. I wanted an alternative that to some degree hid what was going on, but could still be reverse-engineered.

So I hit upon encoding the data. Python 2 supports, e.g., string.encode('hex'). This meets some of the brief: it happily turns a string into a hexadecimal string that can be trivially converted back again, so config can be passed around in a less visibly clumsy form. It just tends to be long, since every input byte becomes two characters.
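`string.encode('hex')` only works in Python 2; in Python 3, text has to be turned into bytes first. A minimal sketch of the same round trip using `bytes.hex()` and `bytes.fromhex()`, assuming the config is UTF-8 text (`to_hex` and `from_hex` are hypothetical helper names):

```python
def to_hex(text: str) -> str:
    """Encode text as hexadecimal: two hex digits per UTF-8 byte."""
    return text.encode("utf-8").hex()


def from_hex(hex_text: str) -> str:
    """Reverse to_hex: parse the hex digits back into bytes, then decode UTF-8."""
    return bytes.fromhex(hex_text).decode("utf-8")


config = '{"x_axis":"time"}'
encoded = to_hex(config)
print(encoded)                    # 7b22785f61786973223a2274696d65227d
print(len(config), len(encoded))  # 17 34
assert from_hex(encoded) == config
```

The doubling in length is inherent to hex: each byte needs two characters from a 16-symbol alphabet.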

I then hit the tubes to see how one could pack data more efficiently. There were plenty of good answers offering efficient encodings for really long strings, but few examples for short ASCII strings. What people were doing, however, was minifying the JSON.
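In Python, minifying amounts to re-serialising with the standard `json` module while asking for no optional whitespace: `separators=(",", ":")` drops the spaces that `json.dumps` would otherwise put after commas and colons. A minimal sketch:

```python
import json

pretty = """{
    "title": "Thunderstorm track error",
    "x_axis": "time"
}"""

# Parse, then re-serialise with no indentation and no spaces after ',' or ':'.
minified = json.dumps(json.loads(pretty), separators=(",", ":"))
print(minified)  # {"title":"Thunderstorm track error","x_axis":"time"}
```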

I ended up using the following process (in Python 2) to achieve my goals:
  -- minify the JSON
  -- call base64.urlsafe_b64encode(minified.encode('bz2'))
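The `'bz2'` string codec is also Python 2 only. A sketch of the same two steps in Python 3, using the `bz2` and `base64` modules directly and assuming the state is a JSON-serialisable dict (`encode_state` and `decode_state` are hypothetical names):

```python
import base64
import bz2
import json


def encode_state(state: dict) -> str:
    """Minify state as JSON, bz2-compress it, then base64-encode it URL-safely."""
    minified = json.dumps(state, separators=(",", ":"))
    compressed = bz2.compress(minified.encode("utf-8"))
    # urlsafe_b64encode uses '-' and '_' in place of '+' and '/'.
    return base64.urlsafe_b64encode(compressed).decode("ascii")


def decode_state(token: str) -> dict:
    """Reverse encode_state: base64-decode, decompress, then parse the JSON."""
    compressed = base64.urlsafe_b64decode(token)
    return json.loads(bz2.decompress(compressed).decode("utf-8"))


state = {"title": "Thunderstorm track error", "x_axis": "time", "y_axis": "dist"}
token = encode_state(state)
print(token)
assert decode_state(token) == state
```

The token would then go in a query string as something like `?state=<token>`, where `state` is just an illustrative parameter name. Any trailing `=` padding is legal in a query value, though some tools percent-encode it as `%3D`.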

This first squeezes the JSON down by stripping unnecessary whitespace, then applies bz2 compression, and finally packs the result into a URL-safe parameter that can be easily decoded by the server (or anyone else). It also puts the JSON config data into a fairly safe packet: the URL-safe base64 alphabet uses only letters, digits, '-' and '_' (plus '=' padding), so the token travels in a URL without escaping, and there's not a lot of risk on the server side that poor decoding of the URL component will cause a security problem.
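On the server side, one thing still worth guarding against is that a small bz2 payload can expand to far more than its compressed size. A sketch of a defensive decoder that returns `None` for anything malformed and caps the decompressed size; `parse_state_param` and the 64 KiB `MAX_JSON_BYTES` limit are assumptions for illustration:

```python
import base64
import bz2
import json
from typing import Optional

MAX_JSON_BYTES = 64 * 1024  # assumed cap on decompressed size; tune to taste


def parse_state_param(token: str) -> Optional[dict]:
    """Decode a URL state token, returning None instead of raising on bad input."""
    try:
        compressed = base64.urlsafe_b64decode(token)
        decompressor = bz2.BZ2Decompressor()
        # Produce at most MAX_JSON_BYTES of output.
        raw = decompressor.decompress(compressed, MAX_JSON_BYTES)
        if not decompressor.eof:
            # Stream was truncated, or the output hit the size cap.
            return None
        state = json.loads(raw.decode("utf-8"))
    except (ValueError, OSError):
        # binascii.Error, UnicodeDecodeError and JSONDecodeError are ValueErrors;
        # BZ2Decompressor raises OSError on data that is not a valid bz2 stream.
        return None
    # Only accept a JSON object as config.
    return state if isinstance(state, dict) else None
```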

So, how does it perform? Well, here is the non-minified JSON snippet:

{
    "title": "Thunderstorm track error",
    "x_axis": "time",
    "x_labels": "time_labels",
    "y_axis": "dist",
    "y_labels": "dist_labels",
    "series_one": "blah"
}

The input JSON snippet was 177 characters long.
The minified JSON was 140 characters long.
The bz2-compressed data was 126 bytes long.
The URL-safe base64 encoding was 168 characters long (base64 emits 4 characters for every 3 bytes, and 126 / 3 × 4 = 168).
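A sketch for reproducing these measurements in Python 3, assuming the snippet above with four-space indentation and bz2's default compression level. Exact counts depend on the whitespace and keys in the input, so they may not match the figures above character for character; the hex length of the minified string is included for comparison.

```python
import base64
import bz2
import json

# The snippet from the post, with four-space indentation.
pretty = """{
    "title": "Thunderstorm track error",
    "x_axis": "time",
    "x_labels": "time_labels",
    "y_axis": "dist",
    "y_labels": "dist_labels",
    "series_one": "blah"
}"""

minified = json.dumps(json.loads(pretty), separators=(",", ":"))
compressed = bz2.compress(minified.encode("utf-8"))
token = base64.urlsafe_b64encode(compressed).decode("ascii")
hexed = minified.encode("utf-8").hex()

# Print the size at each stage of the pipeline, plus plain hex for comparison.
for label, size in [
    ("input JSON", len(pretty)),
    ("minified", len(minified)),
    ("bz2 bytes", len(compressed)),
    ("base64 URL", len(token)),
    ("hex", len(hexed)),
]:
    print(f"{label:>10}: {size}")
```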

For larger JSON files, especially nested ones where indentation piles up, the saving from minifying is even greater. For much larger files I would also expect bz2 to save a much higher proportion, since its fixed overhead matters less and there is more repetition to exploit.
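To get a feel for how the two savings scale, a sketch comparing size ratios for a small config and a larger, hypothetical one with many similar series (the `series` key and its contents are invented for illustration). bz2 adds a fixed header and trailer to every stream, so on very short inputs it can save little or nothing, while repetitive larger inputs compress much better.

```python
import bz2
import json


def ratios(obj) -> tuple:
    """Return (minified/pretty, compressed/minified) size ratios for obj."""
    pretty = json.dumps(obj, indent=4)
    minified = json.dumps(obj, separators=(",", ":"))
    compressed = bz2.compress(minified.encode("utf-8"))
    return len(minified) / len(pretty), len(compressed) / len(minified)


small = {"title": "Thunderstorm track error", "x_axis": "time", "y_axis": "dist"}
# Hypothetical larger config: many similar series, one extra level of nesting.
large = {
    "title": "Thunderstorm track error",
    "series": [{"name": f"series_{i}", "axis": "dist"} for i in range(200)],
}

for label, obj in [("small", small), ("large", large)]:
    minify_ratio, bz2_ratio = ratios(obj)
    print(f"{label}: minified/pretty={minify_ratio:.2f}, bz2/minified={bz2_ratio:.2f}")
```

Lower ratios mean bigger savings; the larger config should show lower values for both.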

The final URL string was slightly shorter than the original (168 versus 177 characters). It's not a big saving, but at least it's not larger. By contrast, if I just hex-encode the minified string, the result is 280 characters long. Each step of the process matters for keeping the shared string as short as possible while still using a sensible transport format.

I'd be curious whether anyone else has looked into efficient ways of passing short ASCII strings, such as configuration, through a URL parameter.