Aaron Parecki

#xray

  • Aaron Parecki

    XRay, the library that I use to parse URLs to show comments, now supports parsing direct Microformats JSON, ActivityStreams 2.0, as well as finding a rel=alternate link and parsing data from that instead!

    This means I now get great results when parsing Mastodon or other ActivityPub links, and this is also the first step in what I hope will result in fixing the Microformats situation for WordPress, since a WordPress plugin will be able to generate Microformats JSON and advertise that in a rel=alternate link.

    Next up is updating Aperture to take advantage of these new features!
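
    The rel=alternate discovery described above could be sketched roughly as follows. This is a minimal illustration in Python, not XRay's actual code: the names `AlternateLinkFinder` and `find_alternate` are hypothetical, and the media type `application/mf2+json` is an assumption about how a page would advertise its Microformats JSON.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumed media type for Microformats JSON; not confirmed by the post.
MF2_JSON = "application/mf2+json"


class AlternateLinkFinder(HTMLParser):
    """Collect rel=alternate URLs from <link> and <a> tags of one media type."""

    def __init__(self, base_url, media_type=MF2_JSON):
        super().__init__()
        self.base_url = base_url
        self.media_type = media_type
        self.alternates = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("link", "a"):
            return
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        link_type = (attr.get("type") or "").lower()
        href = attr.get("href")
        if "alternate" in rels and link_type == self.media_type and href:
            # Resolve relative hrefs against the URL the page was fetched from.
            self.alternates.append(urljoin(self.base_url, href))


def find_alternate(html, base_url, media_type=MF2_JSON):
    """Return the first matching rel=alternate URL, or None if there is none."""
    finder = AlternateLinkFinder(base_url, media_type)
    finder.feed(html)
    finder.close()
    return finder.alternates[0] if finder.alternates else None
```

    A consumer would then fetch the returned URL and parse that document instead of the original HTML, falling back to the HTML when `find_alternate` returns `None`.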

    Portland, Oregon, USA • 90°F
    12 likes 6 reposts 2 replies
    Mon, Jul 30, 2018 7:32pm -07:00 #activitypub #xray #microformats #p3k #indieweb
  • aaronpk https://github.com/aaronpk   •   Jan 12

    #52 Remove images from posts containing a photo

    Aaron Parecki

    Encountered two blockers working on this:

    1) In a simple example of an img tag inside an e-content tag, the parsers are using the img tag as an implied photo property. This seems wrong to me (see the example below). This means XRay sees a post like this as a photo post and would remove the img tag from the content, which is definitely not the right thing to do.

    <div class="h-entry"><p class="e-content p-name">Hello World <img src="example.jpg"></p></div>
    
    {
        "type": [
            "h-entry"
        ],
        "properties": {
            "name": [
                "Hello World http://example.com/example.jpg"
            ],
            "content": [
                {
                    "html": "Hello World <img src=\"http://example.com/example.jpg\">",
                    "value": "Hello World http://example.com/example.jpg"
                }
            ],
            "photo": [
                "http://example.com/example.jpg"
            ]
        }
    }
    

    2) At the point that XRay is sanitizing the HTML value, the Microformats parser has already converted the HTML to plaintext.

    For example, XRay sees this object and runs the HTML sanitizer on the HTML value:

    {
        "html": "Hello World <img src=\"http://example.com/example.jpg\">",
        "value": "Hello World http://example.com/example.jpg"
    }
    

    This means I can't remove the img tag from the plaintext value, since the plaintext has already been generated by the parser. I think my only solution is to create my own plaintext value out of the sanitized HTML. Unfortunately, that is not a straightforward process, as demonstrated by the relatively long function that does this in the PHP parser. However, that might be the technically better option anyway, since XRay can't be sure exactly what method was used to generate the plaintext value from the original HTML.
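
    To make the idea of regenerating plaintext from the sanitized HTML concrete, here is a deliberately small Python sketch. It is an assumption-laden illustration, far simpler than the PHP parser's function mentioned above: `PlaintextExtractor` and `html_to_text` are hypothetical names, whitespace is simply collapsed, and block-level elements get no special handling.

```python
from html.parser import HTMLParser


class PlaintextExtractor(HTMLParser):
    """Gather text nodes; optionally substitute img tags with their alt or src."""

    def __init__(self, drop_images=True):
        super().__init__(convert_charrefs=True)
        self.drop_images = drop_images
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "img" and not self.drop_images:
            attr = dict(attrs)
            # Mimic the parser output shown above, where the img src
            # appears in the plaintext value.
            self.parts.append(attr.get("alt") or attr.get("src") or "")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html, drop_images=True):
    """Return a whitespace-collapsed plaintext rendering of an HTML fragment."""
    extractor = PlaintextExtractor(drop_images)
    extractor.feed(html)
    extractor.close()  # flush any text still buffered by the parser
    return " ".join("".join(extractor.parts).split())
```

    With the content from the example above, `html_to_text('Hello World <img src="http://example.com/example.jpg">')` yields `Hello World`, while passing `drop_images=False` reproduces the parser's original plaintext value.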

    Portland, Oregon, USA • 49°F
    Fri, Jan 12, 2018 7:32am -08:00 #xray
  • Feed Support for XRay (github.com)
    Sat, Nov 11, 2017 1:16pm -08:00 #indieweb #xray
  • tantek https://github.com/tantek   •   Jun 22

    #8 Need use-cases section

    Some thoughts on the XRay and jf2 JSON formats

    Since beginning the jf2 spec, I've continued developing XRay, and its format has diverged from the original jf2. Tonight I spent a while trying to reconcile the changes to submit a PR to the spec. I was unable to come up with a short PR, and instead got drawn in to thinking about the motivations behind a simpler mf2 JSON format to begin with.
    continue reading...
    2 likes 1 reply 2 mentions
    Mon, Apr 24, 2017 8:59pm -07:00 #jf2 #xray #indieweb
  • Day 82: Switching to Let's Encrypt for XRay on App Engine #100DaysOfIndieWeb

    A couple days ago, I switched most of my *.p3k.io domains over to individual Let's Encrypt certificates. It was relatively easy for the apps that are running on my main server. However, XRay is actually running on Google App Engine, which means my streamlined workflow for requesting and renewing certificates doesn't apply.
    continue reading...
    2 replies 2 mentions
    Sun, Mar 12, 2017 10:28am -07:00 #100daysofindieweb #xray #letsencrypt
  • Day 37: Parsing h-recipe with XRay #100DaysOfIndieWeb

    XRay now supports the h-recipe vocabulary!
    continue reading...
    2 mentions
    Thu, Jan 26, 2017 11:20am -08:00 #100daysofindieweb #recipe #xray
  • Day 36: Parsing h-review with XRay #100DaysOfIndieWeb

    Today I added the h-review vocabulary to XRay. This means you may now see objects of "type: review" show up when using XRay. 
    continue reading...
    1 like 2 mentions
    Wed, Jan 25, 2017 2:53pm -08:00 #100daysofindieweb #xray #mf2
  • Day 27: Parsing meta http-equiv and returning status code in XRay #100DaysOfIndieWeb

    Today I closed a long-standing request on XRay to return the HTTP status code from the retrieved page, as well as parsing the <meta http-equiv="Status" content="410 Gone"> tag in the HTML. I also now return the final URL that XRay retrieved the document from, after following any HTTP redirects that were sent.
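
    As a rough illustration of the meta http-equiv handling, the Python sketch below reads a status code out of such a tag. The names `MetaStatusFinder` and `effective_status` are hypothetical, and letting the meta tag take precedence over the HTTP response code is an assumption made for this example, not a statement about how XRay reports the two values.

```python
import re
from html.parser import HTMLParser


class MetaStatusFinder(HTMLParser):
    """Find the first <meta http-equiv="Status" content="NNN ..."> tag."""

    def __init__(self):
        super().__init__()
        self.status = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.status is not None:
            return
        attr = dict(attrs)
        if (attr.get("http-equiv") or "").lower() != "status":
            return
        # The content looks like "410 Gone"; keep only the numeric code.
        match = re.match(r"\s*(\d{3})\b", attr.get("content") or "")
        if match:
            self.status = int(match.group(1))


def effective_status(http_status, html):
    """Return the meta-declared status if present, else the HTTP status.

    Assumption: the meta tag overrides the response code in this sketch.
    """
    finder = MetaStatusFinder()
    finder.feed(html)
    finder.close()
    return finder.status if finder.status is not None else http_status
```

    This is useful on static hosts that can only answer 200: a page can still declare itself gone, and `effective_status(200, html)` would report 410 for the tag quoted above.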
    continue reading...
    2 mentions
    Mon, Jan 16, 2017 1:08pm -08:00 #100daysofindieweb #xray #indieweb
  • Week in Review #100DaysOfIndieWeb

    aaronparecki.com

    Day 18: I updated my reposts to show the full contents of the post I reposted rather than just the URL.
    Day 19: I updated my website to automatically fetch the contents of my reposted URLs when I make new reposts.
    Day 24: I updated my reply posts to be able to show the full contents of the post I'm replying to.
    Day 25: I updated my website to automatically fetch the contents of the posts I reply to.

    XRay

    Day 20: I added Instagram support to XRay, so now XRay returns data when given Instagram URLs.
    Day 21: I added Twitter support to XRay, although you need to pass your own OAuth keys to XRay in order for it to fetch tweets.
    Day 22: I updated a few things in XRay to make it easier to deploy to shared hosting, and simplified its dependencies.

    Libraries

    Day 23: I published my timezone lookup tools as a standalone library, and updated XRay and Quill to use the library instead of the duplicated class.
    continue reading...
    Sat, Jan 14, 2017 10:48am -08:00 #100daysofindieweb #indieweb #xray #p3k
  • Day 22: XRay Ready for Deployment #100DaysOfIndieWeb

    Today I made a few changes to XRay to make it easier to deploy in more kinds of environments. I also removed a bunch of CSS/JS dependencies and simplified the UI a bit.
    continue reading...
    3 mentions
    Wed, Jan 11, 2017 10:19am -08:00 #xray #p3k #indieweb #100daysofindieweb #100daysofcode
  • Day 21: Twitter Support for XRay #100DaysOfIndieWeb

    Continuing yesterday's work, today I added support for parsing Twitter URLs to XRay.

    There were a couple tricks to make this work. I wanted to make sure that Tweets are always expanded to include the most data possible, and also wanted to avoid needing to make a bunch of HTTP requests. Scraping from the twitter.com website wasn't an option, since some of the data isn't available or would require additional HTTP calls to fetch. (For example I would have to fetch every t.co URL to expand them.) So I set to work using the Twitter API to fetch the tweets.
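
    One reason the API avoids those extra requests is that a tweet object carries its expanded links alongside the text. The Python sketch below shows the general idea, assuming a tweet dictionary shaped like the Twitter REST API v1.1 response (`text` or `full_text`, plus `entities.urls` entries with `url` and `expanded_url`). The function name `expand_tco_urls` is hypothetical, and this is not necessarily how XRay does it.

```python
def expand_tco_urls(tweet):
    """Replace each t.co short link in the tweet text with its expanded URL.

    Assumes a v1.1-style tweet dict; no HTTP requests are made.
    """
    text = tweet.get("full_text") or tweet.get("text") or ""
    for entity in tweet.get("entities", {}).get("urls", []):
        short = entity.get("url")
        expanded = entity.get("expanded_url")
        if short and expanded:
            text = text.replace(short, expanded)
    return text
```

    Plain string replacement keeps the sketch short; a more careful version would use the `indices` offsets that accompany each entity.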
    continue reading...
    1 like 1 reply 3 mentions
    Tue, Jan 10, 2017 3:36pm -08:00 #100daysofindieweb #100daysofcode #indieweb #xray #twitter
  • Day 20: Instagram Support for XRay #100DaysOfIndieWeb

    XRay is my service that parses web pages and extracts information from them. Right now I mostly use it to parse comments, but now that I've been adding support for reposts, it's used there as well.

    Today I added support for XRay to extract data from Instagram URLs!

    This means anything that uses XRay will now return structured data when given an Instagram URL, just like how it parses h-entry and other Microformats. Unfortunately, Instagram does not provide timezone data for the published date, only a Unix timestamp. So if the photo is tagged at a location, then XRay will look up the appropriate timezone for that location and adjust the timezone of the published date accordingly!

    Here's what the parsed JSON looks like for this photo. Note that the timezone is set to East Coast because this photo was taken at MIT.

    {
        "data": {
            "type": "entry",
            "url": "https://www.instagram.com/p/BM4rGs-lApG/",
            "author": {
                "type": "card",
                "name": "Aaron Parecki",
                "url": "http://aaronparecki.com/",
                "photo": "https://scontent.cdninstagram.com/t51.2885-19/s320x320/14240576_268350536897085_1129715662_a.jpg"
            },
            "content": {
                "text": "Here again"
            },
            "photo": [
                "https://scontent.cdninstagram.com/t51.2885-15/e35/14269001_1162908790471145_6084871298582839296_n.jpg?ig_cache_key=MTM4NTA0NjQ2MjAyNzc5NTAxNA%3D%3D.2"
            ],
            "location": [
                "https://www.instagram.com/explore/locations/206258876/"
            ],
            "published": "2016-11-16T16:07:06-05:00"
        },
        "refs": {
            "https://www.instagram.com/explore/locations/206258876/": {
                "type": "card",
                "name": "Massachusetts Institute of Technology (MIT)",
                "url": "https://www.instagram.com/explore/locations/206258876/",
                "latitude": 42.360011410484,
                "longitude": -71.091869836761
            }
        }
    }

    In addition to my website using this for reposts and comments, when I paste that URL into IRC, Loqi uses XRay to expand it and provide a little text preview.
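
    The timestamp adjustment described in this post can be illustrated with a short Python sketch. It covers only the final step, rendering a Unix timestamp in a known timezone; the lookup from latitude and longitude to a timezone is left out. The function name `localize_timestamp` is hypothetical, and the timestamp used in the comment is back-calculated from the published value in the JSON above rather than taken from Instagram.

```python
from datetime import datetime, timedelta, timezone, tzinfo


def localize_timestamp(unix_ts: int, tz: tzinfo) -> str:
    """Render a Unix timestamp as an ISO 8601 string in the given timezone."""
    return datetime.fromtimestamp(unix_ts, tz).isoformat()


# Fixed UTC-5 offset standing in for the result of a location lookup
# (an assumption for this sketch; a real lookup would return a named zone).
EASTERN_STANDARD = timezone(timedelta(hours=-5))

# 1479330426 corresponds to 2016-11-16T21:07:06 UTC.
published = localize_timestamp(1479330426, EASTERN_STANDARD)
```

    Passing a named zone such as `zoneinfo.ZoneInfo("America/New_York")` from the standard library instead of a fixed offset would also account for daylight saving time.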
    continue reading...
    1 like 1 reply 4 mentions
    Mon, Jan 9, 2017 9:25am -08:00 #100daysofindieweb #100daysofcode #indieweb #xray #instagram
  • Day 2: Handling URLs with Fragment IDs #100DaysOfIndieWeb

    Earlier this year when I launched XRay, I connected Loqi the IRC bot to it so that we would get inline IRC text previews when people linked to web pages in IRC. XRay works by finding an h-entry on the page, and getting the content and author information from it. Here's what it normally looks like in IRC.
    continue reading...
    2 likes 1 mention
    Thu, Dec 22, 2016 8:47am -08:00 #100daysofindieweb #indieweb #xray #quill #microformats
  • Aaron Parecki
    Root canal complete, with temporary filling. Considering how to find an RFID chip small enough to fit under the crown I'm going to get. #cyborg #dentist #tooth #xray
    8 likes 5 replies
    Wed, Jan 7, 2015 9:41am -08:00 #xray #cyborg #dentist #tooth

Hi, I'm Aaron Parecki, Director of Identity Standards at Okta, and co-founder of IndieWebCamp. I maintain oauth.net, write and consult about OAuth, and participate in the OAuth Working Group at the IETF. I also help people learn about video production and livestreaming. (detailed bio)

I've been tracking my location since 2008 and I wrote 100 songs in 100 days. I've spoken at conferences around the world about owning your data, OAuth, quantified self, and explained why R is a vowel. Read more.

  • Director of Identity Standards at Okta
  • IndieWebCamp Founder
  • OAuth WG Editor
  • OpenID Board Member

© 1999-2025 by Aaron Parecki. Powered by p3k. This site supports Webmention.
Except where otherwise noted, text content on this site is licensed under a Creative Commons Attribution 3.0 License.