Decoupling the Client and Server with Hypermedia

This is part two in a series about hypermedia and REST. You don't need to read the previous post to follow this one, but if you like this and want to read more or prefer to start at the beginning, check out the full series.

Hypermedia is an old idea that dates back to at least the 1940s. The first proposal of a hypermedia system came in 1945, when Vannevar Bush described a system he called Memex. Although the design relied on microfilm, Memex would have been capable of things like indexing and searching linked content. The term hypermedia was coined in 1963 by Ted Nelson. Over the years there have been several implementations of hypermedia systems, but the concept didn't really explode until the early '90s when Tim Berners-Lee brought us the World Wide Web.

Hypermedia and REST

In a future post we'll get into the details of the formal definition of the REST architectural style and where hypermedia fits into that definition. But for now we'll use the Richardson Maturity Model (RMM) as a proxy to see where hypermedia fits into REST. RMM is a classification model for REST APIs. There are four levels, 0-3. As you go up the levels, more of the formal definition of REST is taken into account. Here they are very briefly.

Level 0 APIs pipe all requests through one URI and one HTTP method (usually POST).
Level 1 APIs employ many URIs, but still only use one HTTP method (usually POST).
Level 2 APIs have URIs that represent resources and use the HTTP methods to perform CRUD operations on those resources.
Level 3 APIs use self-descriptive messages that include hypermedia controls to drive the API.

Decoupling Client and Server

Level 2 is the REST we are most familiar with. In Level 3, we do all the Level 2 stuff, except we use a media type that allows us to define hypermedia controls. The benefit we get from using hypermedia is that we decouple the client from the server. The responsibility of knowing how different resources fit together and how the system changes state moves from the client to the server. This means that the server has much more freedom to change because changes to the server don't require changes to the client. On the web, the client is the browser.

This property is what has made the web such a popular platform for application development. With web applications, we no longer need to worry about distributing new copies of our application or getting people to upgrade. If you fix a bug or build a new feature, just update your server and everyone who uses your web application is immediately upgraded to the latest version. Users use the same browsers with the new version as they did with the old version and don't have to do anything to get the update.

A RESTful Mobile Platform

Imagine we are creating a mobile app that is backed by a typical API. One part of our app collects some information from the user and sends a request to our API. Now assume we want to add something to the set of information we are collecting. If our API uses hypermedia, the mobile app should be building the form the user fills out dynamically from what it gets from hypermedia controls sent by the server. We can make the change on the server side and the mobile app will get the changes immediately, just like it does with the web. Without hypermedia we have to wait a few weeks for the App Store to approve changes, and then wait some more for all of our users to update the app on their devices to the latest version.

If mobile platforms were built as REST architectures, the mobile platform would provide a hypermedia format (similar to HTML) that is optimized to build interfaces with native mobile controls. This format would include direct access to things like cameras, sensors, and location services. One of the reasons web applications running on mobile tend to be slower than native mobile apps is because they are a web platform running on top of a mobile platform. A mobile-native REST architecture would not only be optimized for mobile, but also allow you to use the mobile platform's native language (such as Kotlin or Swift) as opposed to JavaScript like you would on a web platform.

What is Hypermedia?

We've talked a lot about what hypermedia can do, but what is hypermedia exactly? Hypermedia is the concept of a document that includes references to other documents. This allows us to create a graph (or web) of related content. A media type that includes hypermedia controls is called a hypermedia format. The hypermedia format we are all familiar with is HTML. Hypermedia can take many forms, but we're going to explore its concepts through the familiar constructs we know from HTML.

Links

On the web, we primarily know links as HTML anchor tags.

<a href="/author.html">Jason Desrosiers</a>

When the browser sees this tag, it knows that it should construct an HTTP request to GET the resource identified by the href attribute. The text between the open and close tags is used for UI generation. It gives the user something to click on to trigger the browser to follow the link.
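To make that concrete, here is a minimal sketch of the link-following half of a client, written in Python with only the standard library. The `extract_links` helper and the example.com base URL are made up for illustration. It finds each anchor tag and resolves its href into the URL a client would GET.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the absolute URL of every <a href="..."> in a document."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href is not None:
            # Relative hrefs are resolved against the current document's URL.
            self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url):
    """Hypothetical helper: return the URLs a client could GET from this HTML."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


links = extract_links(
    '<a href="/author.html">Jason Desrosiers</a>',
    "https://example.com/posts/hypermedia",
)
```

A real browser does far more than this, but the core idea is the same: the document tells the client where it can go next.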

Relations

The text between the <a> open and close tags communicates to the user how the linked document relates to the current document. This works well when the user is a human, but when the user is a computer, as it is with a web crawler or an API, we need a more machine-friendly way of knowing what a link refers to. That's what the rel attribute is for.

<a rel="author" href="/author.html">Jason Desrosiers</a>

rel is short for "relation". It describes how two documents are related. This link indicates that a document about the author of the current document is available at /author.html. The rel attribute can't be just any string that is convenient. It has to be something with agreed-upon semantics so any user-agent (a web browser, a web crawler, etc.) understands it the same way.

There is a registry for link relations that contains all of the official link relations. You can define additional relations to fit your needs, but they must be uniquely identified with a URI. The primary reason for this is to avoid name collisions. If rel is to be effective as a way for computers to understand what links mean, we need to have agreed-upon semantics for all of these values.
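As a rough sketch of how a client might apply that rule, the Python below sorts a rel value into one of three buckets. `KNOWN_REGISTERED` is a small hand-picked subset of registered relation names, not the full registry, and `classify_relation` is a hypothetical helper. Treating anything with a URI scheme as a custom relation is a simplifying assumption.

```python
from urllib.parse import urlparse

# Assumption: a tiny hand-picked subset of registered relation names.
# A real client would consult the full registry.
KNOWN_REGISTERED = frozenset({"author", "self", "next", "prev", "license", "help"})


def classify_relation(rel):
    """Hypothetical helper: classify a rel value as registered, extension, or unknown."""
    # Registered names are matched case-insensitively in this sketch.
    if rel.lower() in KNOWN_REGISTERED:
        return "registered"
    # Simplification: anything that parses with a URI scheme counts as a
    # custom relation identified by a URI.
    if urlparse(rel).scheme:
        return "extension"
    # A bare, unregistered name has no agreed-upon meaning.
    return "unknown"


kinds = {rel: classify_relation(rel) for rel in ("author", "http://schema.org/worksFor", "coworker")}
```

The "unknown" bucket is the interesting one: a bare name like "coworker" might mean something to you, but no other client has a way to find out what.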

Before defining your own relation, it's always a good idea to check that what you need hasn't already been defined in a vocabulary like Schema.org. But if you do need to define your own relation, it's recommended that the URI you choose to identify it can be used to retrieve documentation that tells your users what it means.

Forms

At first glance, HTML Forms appear to be a different concept from hyperlinks, but they are actually a variation of the same thing. When the browser sees an anchor tag, it knows to build an HTTP GET request. When the browser sees a <form method="post"> tag, it knows to build an HTTP POST request. It's the same mechanism, just used for making a different kind of request.

<form method="post" action="/comments">
 <input type="text" name="username">
 <textarea name="comment"></textarea>
 <button type="submit">Add Comment</button>
</form>

But an HTML Form does more than specify a URI. It also has <input> tags that specify what kind of input the resource expects. The browser can use those tags to construct a UI that the user can use to fill out the form. Then, when the form is submitted, the browser knows how to encode the user's input into a media type that the server understands. By default, an HTML form encodes the request body as application/x-www-form-urlencoded.
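Here is a minimal sketch, in Python, of what the browser does with the comment form above once the user has filled it in. The `build_form_request` helper, the example.com URL, and the field values are made up for illustration. The request is only constructed, never sent.

```python
from urllib.parse import urlencode, urljoin
from urllib.request import Request


def build_form_request(base_url, action, method, fields):
    """Hypothetical helper: turn a form's attributes and field values into a request."""
    # Encode the name/value pairs the way a form does by default.
    body = urlencode(fields).encode("ascii")
    return Request(
        urljoin(base_url, action),  # the action is resolved against the page URL
        data=body,
        method=method.upper(),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


request = build_form_request(
    "https://example.com/posts/1",
    "/comments",
    "post",
    {"username": "jdoe", "comment": "Nice post!"},
)
```

Everything the client needed (the URL, the method, the field names) came from the form itself. Nothing about /comments had to be known ahead of time.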

Applying Hypermedia to JSON

HTML is great for representing documents, but it's not great for representing data. So, let's look at an example of what adding hypermedia controls to a JSON document might look like.

Level 2

The response from a Level 2 API might look like this.

{
  "firstName": "Joe",
  "lastName": "Shmo",
  "email": "jshmo@example.com",
  "company": "Widget Co."
}

Because this is from a Level 2 API, there are a number of things a client can infer when it receives this response. If we want to change this resource, we can make a PUT request. If we want to delete the resource we can make a DELETE request.

But, there is a lot we don't know as well. If I attempt a PUT or DELETE request, how do I know if the server supports those operations on this resource? What if I want to access information about this person's company? What about other related information, such as the blog posts he's written? This person is on an island isolated from the world. In order to access additional information, our users would have to consult the documentation and hardcode the relationships into their app. We could create an API client for our users to define those relationships, but then we have to maintain that client and get all of our users to update the version of the client they are using any time we make a change.
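To illustrate what that hardcoding looks like, here is a small Python sketch of the client side. The URL patterns below are invented for this example. In practice they would be copied out of the API's documentation, which is exactly the problem.

```python
from urllib.parse import quote

# The Level 2 response from above.
person = {
    "firstName": "Joe",
    "lastName": "Shmo",
    "email": "jshmo@example.com",
    "company": "Widget Co.",
}


def company_url(person):
    """Hypothetical: where the docs say a company lives. The response never told us."""
    return "/companies/" + quote(person["company"], safe="")


def blog_posts_url(person_id):
    """Hypothetical: another relationship baked into the client from the docs."""
    return "/people/" + quote(str(person_id), safe="") + "/posts"


url = company_url(person)
```

If the server ever moves companies to a different path, every deployed copy of this client breaks until it is updated.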

Level 3

Now let's look at an example that uses hypermedia. To do this, we need a hypermedia format like HTML, except compatible with JSON. For this example, I've made up a hypermedia format so I could keep things as simple as possible. For a real API, you would want to use an established hypermedia format with a formal specification. This made-up hypermedia format needs a content type, so let's call it application/not-a-real-hypermedia-format+json. Even though it is compatible with JSON, it needs a distinct content type so the client knows to interpret certain elements as hypermedia controls. Otherwise the client can't tell the difference between a hypermedia control and arbitrary data.

{
  "$type": "http://schema.org/Person",
  "$data": {
    "givenName": "Joe",
    "familyName": "Shmo"
  },
  "$links": [
    { "rel": "self", "href": "/person/1234" },
    { "rel": "http://schema.org/worksFor", "href": "/company/5678" },
    { "rel": "http://schema.org/email", "href": "mailto:jshmo@example.com" },
    { "rel": "http://schema.org/knows", "href": "/person/1234/knows" },
    { "rel": "http://schema.org/Blog", "href": "/person/1234/blogs" }
  ],
  "$forms": [
    { "rel": "http://schema.org/knows", "href": "/person/1234/knows" }
  ]
}

The first thing you might notice is that this is a lot more complicated than the Level 2 example. That's true, but it also expresses much more. We still know all the things we knew from the previous example, but now we also know about related resources. This resource is no longer an island. The $links and $forms provide a bridge connecting it to other resources. In this made-up format, a generic client should expect $links to be followed with GET and $forms to be followed with POST.
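A generic client for this made-up format could be sketched in Python along these lines. The `plan_request` helper and the api.example.com base URL are hypothetical, and sending the form payload as JSON is an assumption, since the made-up format doesn't say how form bodies are encoded. The sketch only works out which request to make and doesn't send anything.

```python
import json
from urllib.parse import urljoin

# A trimmed copy of the Level 3 response above.
document = {
    "$type": "http://schema.org/Person",
    "$data": {"givenName": "Joe", "familyName": "Shmo"},
    "$links": [
        {"rel": "self", "href": "/person/1234"},
        {"rel": "http://schema.org/worksFor", "href": "/company/5678"},
        {"rel": "http://schema.org/knows", "href": "/person/1234/knows"},
    ],
    "$forms": [
        {"rel": "http://schema.org/knows", "href": "/person/1234/knows"},
    ],
}


def plan_request(document, base_url, rel, payload=None):
    """Hypothetical helper: return (method, url, body) for the control with this rel.

    Without a payload we follow a $links entry with GET.
    With a payload we submit a $forms entry with POST.
    """
    kind, method = ("$links", "GET") if payload is None else ("$forms", "POST")
    matches = [c for c in document.get(kind, []) if c.get("rel") == rel]
    if not matches:
        raise LookupError(f"no {kind} control with rel {rel!r}")
    url = urljoin(base_url, matches[0]["href"])
    # Assumption: form payloads are sent as JSON.
    body = None if payload is None else json.dumps(payload)
    return method, url, body


base = "https://api.example.com/person/1234"
employer = plan_request(document, base, "http://schema.org/worksFor")
new_contact = plan_request(
    document, base, "http://schema.org/knows", {"givenName": "Jane"}
)
```

Notice that the client code never mentions /company/5678 or /person/1234/knows. It only knows relation names, so the server is free to change those URLs.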

We are using Schema.org to give meaning to things in a universal way. The $type property lets us know that this resource is describing a http://schema.org/Person. If we have a generic client that understands this definition of a Person, it can automatically trigger handling such as generating a UI to display the person. It can also do things like generate a http://schema.org/Person form for the http://schema.org/knows link to add a person to Joe's contacts. The generic client knows it should generate a http://schema.org/Person form because the definition of the http://schema.org/knows relation tells us that it's expected to point to a http://schema.org/Person.

Even though there is a lot going on here, there should not be a burden on someone using our API because they should be using a generic client that understands how to parse the hypermedia and makes it easy to interact with.

What's Next

In an attempt to simplify explanations, I made up a fake hypermedia format. In the next post in the series, we'll look at some real hypermedia formats and introduce a classification system that we'll use to discuss the properties of each format.

Check out Part 3: The Hypermedia Maturity Model.