This morning, @tommorris struck a nerve with the Hacker News community with his post.
Paraphrasing a bit, his main point was:
I just want the data on your website in a machine readable format. XML, JSON, RDF, CSV, YAML.
I just want to take the URL of the thing I’m looking at, send you a content negotiation header or tack a little .xml or .json or whatever on the end and get the damn data.
This post was #1 on Hacker News for a while and drew a wide variety of comments: some called Tom arrogant, some argued about the purpose of URLs, and others went off on a tangential discussion of post URL formats.
I think most people missed the point of the article and took it too literally. Perhaps @tommorris should have clarified what he meant (assuming I am right in interpreting this rant). I don't think he was referring to APIs like Twitter's or GitHub's, where you obviously need to register API keys since you can use the APIs to manipulate user data.
My assumption was that he was talking specifically about APIs for content-driven websites. Sites like NASA's raw image feed from the Curiosity Rover or Portland's Rose Quarter event calendar are data-rich, and since their content is already publicly available as HTML, it should also be easily available in machine-readable formats without first registering for an API key.
One way to do this, as Tom suggested, would be to serve alternate versions of the content when ".json" or ".xml" is appended to the URL, or to honor the HTTP "Accept" header, which lets the client specify the content type it wants.
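As a rough sketch of how a server might decide which representation to send, here is a small Python function that checks the URL suffix first and falls back to the "Accept" header. The suffix table, the function name, and the example paths are all made up for illustration, and the header parsing is deliberately simplified (it ignores wildcards like "*/*", for instance).

```python
from typing import Optional

# Hypothetical mapping of URL suffixes to media types for an example site.
SUFFIX_TYPES = {
    ".json": "application/json",
    ".xml": "application/xml",
    ".csv": "text/csv",
}
DEFAULT_TYPE = "text/html"


def choose_media_type(path: str, accept: Optional[str] = None) -> str:
    """Pick a response media type from the URL suffix or the Accept header."""
    # 1. An explicit suffix such as ".json" wins outright.
    for suffix, media_type in SUFFIX_TYPES.items():
        if path.endswith(suffix):
            return media_type

    # 2. Otherwise walk the Accept header in order of preference (q-value).
    if accept:
        candidates = []
        for position, part in enumerate(accept.split(",")):
            fields = [field.strip() for field in part.split(";")]
            media_type = fields[0].lower()
            q = 1.0  # a missing q parameter means full preference
            for field in fields[1:]:
                if field.startswith("q="):
                    try:
                        q = float(field[2:])
                    except ValueError:
                        q = 0.0
            candidates.append((q, position, media_type))

        supported = set(SUFFIX_TYPES.values()) | {DEFAULT_TYPE}
        # Highest q first; ties keep the order the client listed them in.
        for q, _, media_type in sorted(candidates, key=lambda c: (-c[0], c[1])):
            if q > 0 and media_type in supported:
                return media_type

    # 3. Nothing matched, so serve the regular HTML page.
    return DEFAULT_TYPE


print(choose_media_type("/events/42.json"))
print(choose_media_type("/events/42", "application/xml;q=0.5, application/json"))
```

The point is that the same URL keeps serving HTML to browsers, while a script only has to add a suffix or a header to get the data.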
Enter Microformats
A potentially easier solution for the content providers is to mark up the HTML content with Microformats to give it a machine-readable structure. There are several Microformats parsers available in a variety of languages. Most will accept the HTML of a page as input and output a structured version of the page. A goal of Microformats 2 is to be able to convert any HTML page into a JSON representation by parsing the Microformats on the page.
For example,
<a class="h-card" href="http://benward.me">Ben Ward</a>
when parsed, is converted to the following JSON representation:
{
  "items": [{
    "type": ["h-card"],
    "properties": {
      "name": ["Ben Ward"],
      "url": ["http://benward.me"]
    }
  }]
}
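To make that conversion a little more concrete, here is a toy sketch in Python, using only the standard library, that handles just the simple case above: an element with the "h-card" class whose text becomes the name and whose "href" (on a link) becomes the url. The class and function names are my own inventions, and this is nowhere near a full Microformats 2 parser (no nested properties, no other vocabularies, no handling of empty elements like images), so use one of the real parsers for anything serious.

```python
import json
from html.parser import HTMLParser


class ToyHCardParser(HTMLParser):
    """Illustrative only: collects the name and url of simple h-card elements."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._current = None  # the h-card currently being read, if any
        self._tag = None      # tag name of the element that opened it
        self._depth = 0       # nesting depth of that same tag name

    def handle_starttag(self, tag, attrs):
        if self._current is not None:
            # Already inside an h-card: only track nesting of the root tag.
            if tag == self._tag:
                self._depth += 1
            return
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        if "h-card" in classes:
            self._current = {"type": ["h-card"], "properties": {"name": [""]}}
            if tag == "a" and attributes.get("href"):
                self._current["properties"]["url"] = [attributes["href"]]
            self._tag = tag
            self._depth = 1

    def handle_data(self, data):
        # Text inside the h-card element is used as its name.
        if self._current is not None:
            self._current["properties"]["name"][0] += data

    def handle_endtag(self, tag):
        if self._current is not None and tag == self._tag:
            self._depth -= 1
            if self._depth == 0:
                name = self._current["properties"]["name"]
                name[0] = name[0].strip()
                self.items.append(self._current)
                self._current = None


def parse_hcards(html: str) -> dict:
    """Return a dict shaped like the JSON above for the h-cards found in html."""
    parser = ToyHCardParser()
    parser.feed(html)
    parser.close()
    return {"items": parser.items}


html = '<a class="h-card" href="http://benward.me">Ben Ward</a>'
print(json.dumps(parse_hcards(html), indent=2))
```

Notice that the input is just the page's own HTML; the publisher doesn't have to build or maintain anything beyond the class names in their markup.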
If all content-driven websites marked up content with the appropriate Microformats, you wouldn't even need a separate API to access a machine-readable version of the content.
So if you're creating or maintaining a website with content that people will potentially want to access programmatically, consider marking up the content with Microformats. There are currently several types of content described, including: