
Aaron Parecki


    Some thoughts on the XRay and jf2 JSON formats

    April 24, 2017

    Since beginning the jf2 spec, I've continued developing XRay, and its format has diverged from the original jf2. Tonight I spent a while trying to reconcile the changes to submit a PR to the spec. I was unable to come up with a short PR, and instead got drawn into thinking about the motivations behind a simpler mf2 JSON format to begin with.

    I use XRay in a number of projects for various purposes.

    • My website runs every external URL through XRay to handle consuming the Microformats on the page, converting it to a simplified form. This is used whenever I reply to a post to display the reply context, as well as to fetch the post contents when I make a repost.
    • Loqi uses XRay to create a one-line summary of URLs pasted into IRC.
    • webmention.io uses XRay to parse the source URL of webmentions to extract useful data about the webmention, and makes this data available via an API.
    • IndieNews uses XRay to parse submitted URLs to display the name and author of the posts.
    • Quill uses XRay to show a preview of in-reply-to URLs.
    • My rudimentary reader uses XRay to extract the h-entry data from posts to display in my reader.

    There are a number of things that XRay does when extracting the mf2 data.

    • Finds the author of a post following the authorship algorithm
    • Follows the comments presentation algorithm to remove the name property if it's a duplicate of the content.
    • Figures out the primary object on the page, or whether the page represents a list of posts, which is sometimes tricky (there is some discussion of this under "representative object").
    • Is vocabulary-aware, so always returns a consistent set of properties, and doesn't return unknown properties. e.g. published is always a single string, and category is always an array.
    • Sanitizes all HTML, allowing only a small subset of HTML tags and Microformats classes on the HTML elements.
    • For any values that might be embedded objects, e.g. a person-tag or in-reply-to property, always returns the URL in the value and moves the embedded object to a refs object, making it easier to consume.
    • The author property is a simplified h-card containing only name/photo/url properties that are single values.
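    A minimal sketch of the refs behavior from the list above, written against the standard parsed mf2 JSON shape (each property is a list whose entries are plain strings or embedded objects with `type`, `properties`, and optionally `value`). The function names and the default set of reference-bearing properties are assumptions for illustration; this is not XRay's implementation.

```python
def get_url(value):
    """Return the URL for an mf2 property value that may be a string or an embedded object."""
    if isinstance(value, str):
        return value  # plain strings (URLs or tags) pass through unchanged
    # Embedded microformats usually carry a "value"; fall back to their first url property.
    if value.get("value"):
        return value["value"]
    urls = value.get("properties", {}).get("url", [])
    return urls[0] if urls else None


# Assumed set of properties whose values may be embedded objects.
REF_PROPERTIES = ("in-reply-to", "like-of", "repost-of", "bookmark-of", "category")


def extract_refs(properties, ref_properties=REF_PROPERTIES):
    """Replace embedded objects with their URL and collect the objects in a refs dict."""
    flat, refs = {}, {}
    for name, values in properties.items():
        if name not in ref_properties:
            flat[name] = values
            continue
        urls = []
        for value in values:
            url = get_url(value)
            if url is None:
                continue  # an embedded object with no URL cannot be referenced
            urls.append(url)
            if isinstance(value, dict):
                refs[url] = value  # keep the full object, keyed by its URL
        flat[name] = urls
    return flat, refs


props = {
    "content": ["Nice post!"],
    "in-reply-to": [{"type": ["h-cite"],
                     "properties": {"url": ["https://example.com/post"], "name": ["A post"]}}],
    "category": ["jf2", {"type": ["h-card"], "value": "https://example.org/",
                         "properties": {"url": ["https://example.org/"]}}],
}
flat, refs = extract_refs(props)
print(flat["in-reply-to"])  # ['https://example.com/post']
print(flat["category"])     # ['jf2', 'https://example.org/']
print(sorted(refs))         # ['https://example.com/post', 'https://example.org/']
```

    A consumer can then treat every reference-bearing property as a plain list of strings and look up details in refs only when it needs them.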

    As you can see, a lot of what XRay does is clean up some of the "messy" parts of Microformats JSON. The issue is not so much the specific JSON format as the overall structure, such as how the author of a post can be in many different places in a parsed Microformats JSON object. This is not to place blame on Microformats: what it's doing is creating a JSON representation of the original HTML, and allowing authors flexibility in how they publish HTML, rather than prescribing specific formats, is a core principle.
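    As a rough illustration of why the author is hard to pin down, the sketch below checks only two of the places it can appear: the entry's own author property, then an enclosing h-feed's author. It is a deliberate simplification of the authorship algorithm (it does not fetch author URLs or follow rel-author, among other steps), and `find_author` and `normalize_author` are hypothetical names. The second function reduces whatever it finds to the single-valued name/photo/url shape described in the list above.

```python
def find_author(entry, parents=()):
    """Simplified author lookup: the entry's own author property first,
    then the author of the nearest enclosing h-feed. Returns the raw mf2 value."""
    authors = entry.get("properties", {}).get("author")
    if authors:
        return authors[0]
    # parents is ordered outermost-first, so walk it from the inside out.
    for parent in reversed(parents):
        if "h-feed" in parent.get("type", []):
            feed_authors = parent.get("properties", {}).get("author")
            if feed_authors:
                return feed_authors[0]
    return None


def normalize_author(author):
    """Reduce a string or h-card author to single-valued name/photo/url fields."""
    if author is None:
        return None
    if isinstance(author, str):
        # A bare string is treated as a URL if it looks like one, otherwise as a name.
        if author.startswith(("http://", "https://")):
            return {"name": None, "photo": None, "url": author}
        return {"name": author, "photo": None, "url": None}
    props = author.get("properties", {})

    def first(key):
        values = props.get(key, [])
        if not values:
            return None
        # Newer parsers may return e.g. photo as {"value": url, "alt": text}.
        return values[0].get("value") if isinstance(values[0], dict) else values[0]

    return {"name": first("name"), "photo": first("photo"), "url": first("url")}


feed = {"type": ["h-feed"], "properties": {"author": [
    {"type": ["h-card"], "properties": {"name": ["Aaron Parecki"],
                                        "url": ["https://aaronparecki.com/"]}}]}}
entry = {"type": ["h-entry"], "properties": {"content": ["Hello"]}}
print(normalize_author(find_author(entry, [feed])))
# {'name': 'Aaron Parecki', 'photo': None, 'url': 'https://aaronparecki.com/'}
```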

    What this means is that XRay is actually acting more as an interpreter of the Microformats JSON, delivering a cleaned-up version to consumers. Most of my projects that use XRay could actually be considered "clients": XRay parses posts for them, whether the result is output to me in IRC, re-rendered as a post on IndieNews, or displayed in my reader.

    My primary need for an alternative Microformats JSON format is actually a client-to-server serialization, where the client is getting a cleaned up version of external posts, and can assume that the server it's talking to is responsible for taking the messy data and normalizing it to something it expects. In this sense, the use case of jf2 is a client-to-server serialization, whereas the Microformats JSON is a server-to-server serialization. This would then be a core building block for Microsub, a spec that provides a standardized way for clients to consume and interact with feeds collected by a server.

    The main challenge right now in defining a spec for this use case is deciding how tied to specific vocabularies it should be. For example, Microformats JSON says that every value should always be an array. However, there are a few properties for which it never makes sense to have multiple values, e.g. published, uid, and location, and wrapping those in arrays creates additional complexity for consumers. It's easier to consume these when they can be relied upon to always be a single value. Similarly, the author of an h-entry may be either an object or a string, which is more complicated to consume when it can vary, so XRay's format always returns a consistent value. However, this is tied to the h-entry vocabulary, since other Microformats vocabularies don't have an author property. In general, the success I've had with XRay's format is due to the fact that it makes hard decisions about what properties it returns, and is consistent about whether those properties are single- or multi-valued, in order to provide a consistent API to consumers.
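    To make the trade-off concrete, here is a minimal sketch of a vocabulary-aware flattening step for h-entry properties. The property tables are assumptions: published, uid, and location (single-valued) and category (multi-valued) come from this post, while the remaining assignments are illustrative guesses. `flatten_entry` is a hypothetical name, and this is not XRay's actual code.

```python
# Assumed h-entry vocabulary. published, uid, location (single) and category (list)
# follow the post; the other assignments are illustrative guesses.
SINGLE_VALUED = {"published", "updated", "uid", "location", "name", "url"}
MULTI_VALUED = {"category", "photo", "in-reply-to", "syndication"}


def flatten_entry(properties):
    """Return h-entry properties with a fixed shape: scalars for single-valued
    properties, lists for multi-valued ones, and unknown properties dropped."""
    flat = {}
    for key, values in properties.items():
        if key in SINGLE_VALUED:
            if values:
                flat[key] = values[0]  # extra values, if any, are ignored
        elif key in MULTI_VALUED:
            flat[key] = list(values)
        # Anything else is outside this vocabulary and is not returned.
    return flat


print(flatten_entry({
    "published": ["2017-04-24T20:59:00-07:00"],
    "category": ["jf2", "xray", "indieweb"],
    "x-custom": ["dropped"],
}))
# {'published': '2017-04-24T20:59:00-07:00', 'category': ['jf2', 'xray', 'indieweb']}
```

    The cost is the one described above: any property the table does not know about is silently lost, and the table only makes sense for h-entry.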

    I am just not sure how to balance providing that simplicity for consuming clients with allowing flexibility in publishing, while also avoiding hard-coding too much into a spec that might later become obsolete.

    Portland, Oregon
    Mon, Apr 24, 2017 8:59pm -07:00 #jf2 #xray #indieweb
    2 likes 1 reply 2 mentions
    • Jacky Alcine
    • Eddie Hinkle
    • Jacky Alcine v2.jacky.wtf

      This is something I’m running into as I’m building out Koype. On my site, I end up “wrapping” values if they have only one value and work off that. I think it might be a construct of the language, but it strongly encourages the concept of everything being a list. That said, things like author and location do come up funny (even end, I notice, tends to be a list).

      One approach would be to provide an external schema that defines how a list value is turned into a more normalized value. Just spitballing.
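    The schema idea in the comment above could look something like the following sketch: an external JSON document, keyed by microformat type, lists which properties collapse to a single value, and a generic function applies it recursively to nested objects. The schema format and the `normalize` function are hypothetical; neither the Microformats JSON format nor jf2 defines anything like this.

```python
import json

# Hypothetical external schema: for each microformat type, the properties that
# should collapse to a single value. Everything else stays a list.
SCHEMA_JSON = """
{
  "h-entry": {"single": ["published", "uid", "location"]},
  "h-card": {"single": ["name", "photo", "url"]}
}
"""


def normalize(item, schema):
    """Apply the schema rules for the item's type, recursing into nested microformats."""
    types = item.get("type", [])
    rules = next((schema[t] for t in types if t in schema), {})
    single = set(rules.get("single", []))
    out = {"type": list(types)}
    for key, values in item.get("properties", {}).items():
        # Nested microformats (dicts with a "type") follow their own type's rules.
        values = [normalize(v, schema) if isinstance(v, dict) and "type" in v else v
                  for v in values]
        if key not in single:
            out[key] = values
        elif values:
            out[key] = values[0]
    return out


schema = json.loads(SCHEMA_JSON)
entry = {
    "type": ["h-entry"],
    "properties": {
        "published": ["2019-02-07T11:30:00-07:00"],
        "category": ["microformats"],
        "author": [{"type": ["h-card"],
                    "properties": {"name": ["Jacky Alcine"], "url": ["https://v2.jacky.wtf"]}}],
    },
}
print(json.dumps(normalize(entry, schema), indent=2))
```

    Properties the schema does not mention stay as arrays, so publishers keep their flexibility, and a consumer could swap in a different schema without code changes.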

      Thu, Feb 7, 2019 11:30am -07:00

    Other Mentions

    • Henrique Dias hacdias.com

      I’ve been diving a bit into the Microformats and JF2 formats and I was quite confused today. On my new system, I’m storing some properties as a “flattened” version of Microformats. For some reason I assumed that was JF2, but it isn’t! Here’s a nice read!

      Thu, Nov 4, 2021 12:58pm -07:00
    • Barry Frost barryfrost.com

      X-Ray returns structured JSON data from any URL. Potentially useful for extracting reply contexts: it follows authorship and comments presentation rules and constructs a simplified set of data.

      Further background: https://aaronparecki.com/2017/04/24/15/jf2

      Fri, Apr 28, 2017 9:11am -07:00
Posted in /articles using quill.p3k.io

Hi, I'm Aaron Parecki, Director of Identity Standards at Okta, and co-founder of IndieWebCamp. I maintain oauth.net, write and consult about OAuth, and participate in the OAuth Working Group at the IETF. I also help people learn about video production and livestreaming. (detailed bio)

I've been tracking my location since 2008 and I wrote 100 songs in 100 days. I've spoken at conferences around the world about owning your data, OAuth, quantified self, and explained why R is a vowel. Read more.

  • Director of Identity Standards at Okta
  • IndieWebCamp Founder
  • OAuth WG Editor
  • OpenID Board Member

© 1999-2025 by Aaron Parecki. Powered by p3k. This site supports Webmention.
Except where otherwise noted, text content on this site is licensed under a Creative Commons Attribution 3.0 License.
WeChat ID
aaronpk_tv