Since beginning the jf2 spec, I've continued developing XRay, and its format has diverged from the original jf2. Tonight I spent a while trying to reconcile the changes to submit a PR to the spec. I was unable to come up with a short PR, and instead got drawn into thinking about the motivations behind a simpler mf2 JSON format to begin with.
I use XRay in a number of projects for various purposes.
There are a number of things that XRay does when extracting the mf2 data.
- It drops the name property if it's a duplicate of the content.
- published is always a single string, and category is always an array.
- Nested objects are moved into a refs object, making it easier to consume.
- The author property is a simplified h-card containing only name/photo/url properties that are single values.
As you can see, a lot of what XRay is doing is cleaning up some of the "messy" parts of Microformats JSON. The mess is not so much in the specific JSON format as in the overall structure, such as how the author of a post can be in many different places in a parsed Microformats JSON object. This is not to place blame on Microformats, since what it's doing is creating a JSON representation of the original HTML, and allowing authors flexibility in how they publish HTML rather than prescribing specific formats is a core principle.
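As a rough illustration of the cleanup steps listed above, here is a minimal Python sketch. It is not XRay's actual code: the function names (first, simplify_author, clean_entry) and the output shape are assumptions made for this example, and it only looks at the entry's own author property rather than following the full authorship rules.

```python
def first(values, default=None):
    """Return the first item of an mf2 property list, or a default."""
    return values[0] if values else default


def simplify_author(author):
    """Reduce an mf2 author value to single-valued name/photo/url.

    Hypothetical helper: the author may be a bare string or a nested h-card.
    """
    if isinstance(author, str):
        # A bare string may be a URL or a name; this sketch guesses by prefix.
        is_url = author.startswith(("http://", "https://"))
        return {
            "type": "card",
            "name": None if is_url else author,
            "photo": None,
            "url": author if is_url else None,
        }
    props = author.get("properties", {})
    return {
        "type": "card",
        "name": first(props.get("name", [])),
        "photo": first(props.get("photo", [])),
        "url": first(props.get("url", [])),
    }


def clean_entry(item):
    """Build a simplified entry from one parsed mf2 h-entry item."""
    props = item.get("properties", {})
    out = {"type": "entry"}

    # Content may be a plain string or an object with "value" and "html".
    content = first(props.get("content", []))
    text = content.get("value", "") if isinstance(content, dict) else (content or "")
    if text:
        out["content"] = {"text": text}

    # Keep the name only when it adds something beyond the content.
    name = first(props.get("name", []))
    if name and name.strip() != text.strip():
        out["name"] = name

    if props.get("author"):
        out["author"] = simplify_author(props["author"][0])
    return out
```

A consumer of clean_entry's output can read out["author"]["name"] directly, without checking whether the author was published as a string or as a nested object.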
What this means is XRay is actually acting more as an interpreter of the Microformats JSON, in order to deliver a cleaned-up version to consumers. Most of my projects that use XRay could actually be considered "clients", such as how I use XRay to parse posts for my reader, whether that's output to me in IRC or re-rendered as a post on IndieNews.
My primary need for an alternative Microformats JSON format is actually a client-to-server serialization, where the client is getting a cleaned-up version of external posts, and can assume that the server it's talking to is responsible for taking the messy data and normalizing it to something it expects. In this sense, the use case of jf2 is a client-to-server serialization, whereas the Microformats JSON is a server-to-server serialization. This would then be a core building block for Microsub, a spec that provides a standardized way for clients to consume and interact with feeds collected by a server.
The main current challenge in defining a spec for this use case is how tied to specific vocabularies it should be. For example, Microformats JSON says that every value should always be an array. However, there are a few properties for which it never makes sense to have multiple values, e.g. published, uid, and location, and wrapping them in arrays creates additional complexity in consuming them. It's easier to consume these when they can be relied upon to always be a single value. Similarly, the author of an h-entry may be an object or a string, which makes it more complicated to consume, so XRay's format always returns a consistent value. However, this is tied to the h-entry vocabulary, since other Microformats vocabularies don't have an author property. In general, the success I've had with XRay's format is due to the fact that it makes hard decisions about what properties it returns, and is consistent about whether those properties are single- or multi-valued, in order to provide a consistent API to consumers.
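To make the single- versus multi-valued decision concrete, here is a small Python sketch. The set of single-valued properties is taken from the examples in the paragraph above; the names SINGLE_VALUED and apply_cardinality are made up for illustration, and treating every other property as a list is an assumption of this sketch, not a rule from XRay or jf2.

```python
# Properties this sketch treats as never having more than one value.
SINGLE_VALUED = {"published", "uid", "location"}


def apply_cardinality(properties):
    """Return a copy of mf2 properties with a fixed shape per property.

    Single-valued properties become a bare value; everything else stays
    a list, even when it holds only one item.
    """
    out = {}
    for key, values in properties.items():
        if not isinstance(values, list):
            values = [values]
        if key in SINGLE_VALUED:
            if values:  # skip empty lists instead of emitting None
                out[key] = values[0]
        else:
            out[key] = list(values)
    return out
```

The point of hard-coding the set is that a consumer never has to check the type of published or category at runtime. The cost is exactly the coupling to a vocabulary described above.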
I am just not sure how to balance wanting to provide that simplicity for consuming clients while also allowing flexibility in publishing, while also not hard-coding too much into a spec that might be obsoleted later.
I’ve been diving a bit into the Microformats and JF2 formats and I was quite confused today. On my new system, I’m storing some properties as a “flattened” version of Microformats. For some reason I assumed that was JF2, but it isn’t! Here’s a nice read!
X-Ray returns structured JSON data from any URL. Potentially useful for extracting reply contexts: it follows authorship and comments presentation rules and constructs a simplified set of data.
Further background: https://aaronparecki.com/2017/04/24/15/jf2
This is something I’m running into as I’m building out Koype. Over at my site, I end up “wrapping” values if they have only one value and work off that. I think it might be a construct of the language, but it encourages the concept of everything being a list. That said, things like author and location do come up funny (even end, I notice, tends to be a list).
One idea for this would be providing an external schema that could define how a list-value is turned into a more normalized value. Just spitballing.
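One way the external-schema idea could look, sketched in Python. Everything here is hypothetical: the schema layout (vocabulary name, then property, then "single" or "list"), the SCHEMA_JSON contents, and the normalize function are invented for this example and do not come from any existing spec.

```python
import json

# Hypothetical schema, kept as data so it could live outside the code.
SCHEMA_JSON = """
{
  "h-entry": {"author": "single", "location": "single", "category": "list"},
  "h-event": {"end": "single", "location": "single"}
}
"""


def normalize(item, schema):
    """Flatten an mf2 item's properties according to per-vocabulary rules."""
    rules = {}
    for vocabulary in item.get("type", []):
        rules.update(schema.get(vocabulary, {}))

    out = {}
    for key, values in item.get("properties", {}).items():
        if rules.get(key) == "single":
            out[key] = values[0] if values else None
        else:
            # Unlisted properties keep the mf2 default: always a list.
            out[key] = values
    return out


schema = json.loads(SCHEMA_JSON)
event = {
    "type": ["h-event"],
    "properties": {"name": ["Meetup"], "end": ["2017-05-01T18:00:00"]},
}
flat = normalize(event, schema)
```

Because the rules are data rather than code, a new vocabulary or property could be supported by editing the schema alone.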