Aaron Parecki

#xray

XRay, the library that I use to parse URLs to show comments, now supports parsing Microformats JSON and ActivityStreams 2.0 directly, as well as finding a rel=alternate link and parsing data from that instead!

This means I now get great results when parsing Mastodon or other ActivityPub links, and this is also the first step in what I hope will result in fixing the Microformats situation for WordPress, since a WordPress plugin will be able to generate Microformats JSON and advertise that in a rel=alternate link.

Next up is updating Aperture to take advantage of these new features!
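
As a rough illustration of the rel=alternate discovery mentioned above, here is a minimal Python sketch (not XRay's actual code). It scans a page for alternate links that advertise a JSON representation. The media type strings in ALTERNATE_TYPES and the names AlternateLinkFinder and find_alternates are assumptions made for this example; the post names the formats but not the exact type values.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumed media types for Microformats JSON and ActivityStreams 2.0.
ALTERNATE_TYPES = ("application/mf2+json", "application/activity+json")


class AlternateLinkFinder(HTMLParser):
    """Collects rel=alternate links whose type is one of ALTERNATE_TYPES."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.alternates = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("link", "a"):
            return
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        media_type = (attr.get("type") or "").lower()
        href = attr.get("href")
        if "alternate" in rels and media_type in ALTERNATE_TYPES and href:
            # Resolve relative hrefs against the URL the page was fetched from.
            self.alternates.append((media_type, urljoin(self.base_url, href)))


def find_alternates(html, base_url):
    """Return a list of (media_type, absolute_url) pairs found in the page."""
    finder = AlternateLinkFinder(base_url)
    finder.feed(html)
    return finder.alternates
```

A consumer could fetch the first URL returned and parse that JSON instead of the HTML, falling back to the HTML when the list is empty.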

Portland, Oregon, USA • 90°F

12 likes 6 reposts 2 replies

Mon, Jul 30, 2018 7:32pm -07:00 #activitypub #xray #microformats #p3k #indieweb
aaronpk https://github.com/aaronpk • Jan 12

#52 Remove images from posts containing a photo
Encountered two blockers working on this:

1) In a simple example of an img tag inside an e-content tag, the parsers are using the img tag as an implied photo property. This seems wrong to me. It means XRay sees a post like the one below as a photo post, and would remove the img tag from the content, which is definitely not the right thing to do.

<div class="h-entry"><p class="e-content p-name">Hello World <img src="example.jpg"></p></div>

{
  "type": [ "h-entry" ],
  "properties": {
    "name": [ "Hello World http://example.com/example.jpg" ],
    "content": [
      {
        "html": "Hello World <img src=\"http://example.com/example.jpg\">",
        "value": "Hello World http://example.com/example.jpg"
      }
    ],
    "photo": [ "http://example.com/example.jpg" ]
  }
}

2) At the point that XRay is sanitizing the HTML value, the Microformats parser has already converted the HTML to plaintext.

For example, XRay sees this object and runs the HTML sanitizer on the HTML value:

{ "html": "Hello World <img src=\"http://example.com/example.jpg\">", "value": "Hello World http://example.com/example.jpg" }

This means I can't remove the img tag from the plaintext value, since it's already been parsed. I think my only solution is going to be to create my own plaintext value out of the sanitized HTML. Unfortunately, that is not a straightforward process, as demonstrated by this relatively long function that does this in the PHP parser. However, that might be the technically better option anyway, since XRay can't be sure exactly what method was used to generate the plaintext value from the original HTML.
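
To make the idea concrete, here is a small Python sketch of deriving both values from the same HTML after dropping img tags. It is only an illustration under simplifying assumptions, not the PHP parser's algorithm: whitespace is collapsed to single spaces, and block elements and br tags get no special treatment. The names ImageStripper and strip_images are hypothetical.

```python
from html import unescape
from html.parser import HTMLParser


class ImageStripper(HTMLParser):
    """Rebuilds the HTML without img tags and collects the text alongside it."""

    def __init__(self):
        # Keep character references intact so the HTML can be re-emitted as-is.
        super().__init__(convert_charrefs=False)
        self.html_parts = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            self.html_parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are emitted once, with no end tag.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if tag != "img":
            self.html_parts.append(f"</{tag}>")

    def handle_data(self, data):
        self.html_parts.append(data)
        self.text_parts.append(data)

    def handle_entityref(self, name):
        self.html_parts.append(f"&{name};")
        self.text_parts.append(unescape(f"&{name};"))

    def handle_charref(self, name):
        self.html_parts.append(f"&#{name};")
        self.text_parts.append(unescape(f"&#{name};"))


def strip_images(html):
    """Return {"html": ..., "value": ...} with img tags removed from both."""
    parser = ImageStripper()
    parser.feed(html)
    parser.close()
    text = " ".join("".join(parser.text_parts).split())
    return {"html": "".join(parser.html_parts).strip(), "value": text}
```

Because the plaintext is generated from the already-filtered HTML, the two values cannot disagree about whether the image URL is present.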
Portland, Oregon, USA • 49°F

Fri, Jan 12, 2018 7:32am -08:00 #xray
Feed Support for XRay (github.com)

Sat, Nov 11, 2017 1:16pm -08:00 #indieweb #xray
tantek https://github.com/tantek • Jun 22

#8 Need use-cases section

Some thoughts on the XRay and jf2 JSON formats

Since beginning the jf2 spec, I've continued developing XRay, and its format has diverged from the original jf2. Tonight I spent a while trying to reconcile the changes to submit a PR to the spec. I was unable to come up with a short PR, and instead got drawn in to thinking about the motivations behind a simpler mf2 JSON format to begin with.

2 likes 1 reply 2 mentions

Mon, Apr 24, 2017 8:59pm -07:00 #jf2 #xray #indieweb
Day 82: Switching to Let's Encrypt for XRay on App Engine #100DaysOfIndieWeb

A couple days ago, I switched most of my *.p3k.io domains over to individual Let's Encrypt certificates. It was relatively easy for the apps that are running on my main server. However, XRay is actually running on Google App Engine, which means my streamlined workflow for requesting and renewing certificates doesn't apply.

2 replies 2 mentions

Sun, Mar 12, 2017 10:28am -07:00 #100daysofindieweb #xray #letsencrypt
Day 37: Parsing h-recipe with XRay #100DaysOfIndieWeb

XRay now supports the h-recipe vocabulary!

2 mentions

Thu, Jan 26, 2017 11:20am -08:00 #100daysofindieweb #recipe #xray
Day 36: Parsing h-review with XRay #100DaysOfIndieWeb

Today I added the h-review vocabulary to XRay. This means you may now see objects of "type: review" show up when using XRay.

1 like 2 mentions

Wed, Jan 25, 2017 2:53pm -08:00 #100daysofindieweb #xray #mf2
Day 27: Parsing meta http-equiv and returning status code in XRay #100DaysOfIndieWeb

Today I closed a long-standing request on XRay to return the HTTP status code from the retrieved page, as well as parsing the <meta http-equiv="Status" content="410 Gone"> tag in the HTML. I also now return the final URL that XRay retrieved the document from, after following any HTTP redirects that were sent.
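
A minimal Python sketch of the meta http-equiv check described above follows. It is an illustration only: the names MetaStatusFinder and effective_status are hypothetical, and letting the meta tag take precedence over the HTTP status is an assumption of this example rather than a statement of how XRay reports the two values.

```python
from html.parser import HTMLParser


class MetaStatusFinder(HTMLParser):
    """Finds the first <meta http-equiv="Status" content="NNN ..."> tag."""

    def __init__(self):
        super().__init__()
        self.status = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.status is not None:
            return
        attr = dict(attrs)
        if (attr.get("http-equiv") or "").lower() != "status":
            return
        # The content looks like "410 Gone"; only the leading number is used.
        parts = (attr.get("content") or "").split(maxsplit=1)
        if parts and parts[0].isdigit():
            self.status = int(parts[0])


def effective_status(http_status, html):
    """Return the status from the meta tag if present, else the HTTP status."""
    finder = MetaStatusFinder()
    finder.feed(html)
    return finder.status if finder.status is not None else http_status
```

This lets a static site that can only return 200 still signal that a post has been deleted.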

2 mentions

Mon, Jan 16, 2017 1:08pm -08:00 #100daysofindieweb #xray #indieweb
Week in Review #100DaysOfIndieWeb

aaronparecki.com
Day 18: I updated my reposts to show the full contents of the post I reposted rather than just the URL.
Day 19: I updated my website to automatically fetch the contents of my reposted URLs when I make new reposts.
Day 24: I updated my reply posts to be able to show the full contents of the post I'm replying to.
Day 25: I updated my website to automatically fetch the contents of the posts I reply to.

XRay
Day 20: I added Instagram support to XRay, so now XRay returns data when given Instagram URLs.
Day 21: I added Twitter support to XRay, although you need to pass your own OAuth keys to XRay in order for it to fetch tweets.
Day 22: I updated a few things in XRay to make it easier to deploy to shared hosting, and simplified its dependencies.

Libraries
Day 23: I published my timezone lookup tools as a standalone library, and updated XRay and Quill to use the library instead of the duplicated class.

Sat, Jan 14, 2017 10:48am -08:00 #100daysofindieweb #indieweb #xray #p3k
Day 22: XRay Ready for Deployment #100DaysOfIndieWeb

Today I made a few changes to XRay to make it easier to deploy in more kinds of environments. I also removed a bunch of CSS/JS dependencies and simplified the UI a bit.

3 mentions

Wed, Jan 11, 2017 10:19am -08:00 #xray #p3k #indieweb #100daysofindieweb #100daysofcode
Day 21: Twitter Support for XRay #100DaysOfIndieWeb

Continuing yesterday's work, today I added support for parsing Twitter URLs to XRay.

There were a couple tricks to make this work. I wanted to make sure that Tweets are always expanded to include the most data possible, and also wanted to avoid needing to make a bunch of HTTP requests. Scraping from the twitter.com website wasn't an option, since some of the data isn't available or would require additional HTTP calls to fetch. (For example, I would have to fetch every t.co URL to expand them.) So I set to work using the Twitter API to fetch the tweets.

1 like 1 reply 3 mentions

Tue, Jan 10, 2017 3:36pm -08:00 #100daysofindieweb #100daysofcode #indieweb #xray #twitter
Day 20: Instagram Support for XRay #100DaysOfIndieWeb

XRay is my service that parses web pages and extracts information from them. Right now I mostly use it to parse comments, but now that I've been adding support for reposts, it's used there as well.

Today I added support for XRay to extract data from Instagram URLs!

This means anything that uses XRay will now return structured data when given an Instagram URL, just like how it parses h-entry and other Microformats. Unfortunately, Instagram does not provide timezone data for the published date, only a Unix timestamp. So if the photo is tagged at a location, then XRay will look up the appropriate timezone for that location and adjust the timezone of the published date accordingly!

Here's what the parsed JSON looks like for this photo. Note that the timezone is set to East Coast because this photo was taken at MIT.

{
  "data": {
    "type": "entry",
    "url": "https://www.instagram.com/p/BM4rGs-lApG/",
    "author": {
      "type": "card",
      "name": "Aaron Parecki",
      "url": "http://aaronparecki.com/",
      "photo": "https://scontent.cdninstagram.com/t51.2885-19/s320x320/14240576_268350536897085_1129715662_a.jpg"
    },
    "content": {
      "text": "Here again"
    },
    "photo": [
      "https://scontent.cdninstagram.com/t51.2885-15/e35/14269001_1162908790471145_6084871298582839296_n.jpg?ig_cache_key=MTM4NTA0NjQ2MjAyNzc5NTAxNA%3D%3D.2"
    ],
    "location": [
      "https://www.instagram.com/explore/locations/206258876/"
    ],
    "published": "2016-11-16T16:07:06-05:00"
  },
  "refs": {
    "https://www.instagram.com/explore/locations/206258876/": {
      "type": "card",
      "name": "Massachusetts Institute of Technology (MIT)",
      "url": "https://www.instagram.com/explore/locations/206258876/",
      "latitude": 42.360011410484,
      "longitude": -71.091869836761
    }
  }
}

In addition to my website using this for reposts and comments, when I paste that URL into IRC, Loqi uses XRay to expand it and provide a little text preview.
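
The timezone adjustment can be sketched in Python roughly as follows. This is an illustration, not XRay's implementation: KNOWN_PLACES and lookup_timezone are hypothetical stand-ins for a real location-to-timezone lookup, published_date is a made-up name, and the timestamp 1479330426 in the usage note is derived from the published date in the JSON above rather than taken from Instagram.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Hypothetical stand-in for a real location-to-timezone database:
# (latitude, longitude, IANA timezone name).
KNOWN_PLACES = [
    (42.36, -71.09, "America/New_York"),      # Cambridge, MA
    (45.52, -122.68, "America/Los_Angeles"),  # Portland, OR
]


def lookup_timezone(latitude, longitude):
    """Return the timezone name of the nearest known place (toy lookup)."""
    nearest = min(
        KNOWN_PLACES,
        key=lambda p: (p[0] - latitude) ** 2 + (p[1] - longitude) ** 2,
    )
    return nearest[2]


def published_date(unix_timestamp, location=None):
    """Format a Unix timestamp as ISO 8601, localized when a location is known."""
    if location is None:
        # Without a location there is nothing to localize to, so stay in UTC.
        tz = timezone.utc
    else:
        tz = ZoneInfo(lookup_timezone(location["latitude"], location["longitude"]))
    return datetime.fromtimestamp(unix_timestamp, tz).isoformat()
```

Calling published_date(1479330426, {"latitude": 42.360011410484, "longitude": -71.091869836761}) gives "2016-11-16T16:07:06-05:00", matching the published value in the example; without a location the same instant is returned in UTC.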

1 like 1 reply 4 mentions

Mon, Jan 9, 2017 9:25am -08:00 #100daysofindieweb #100daysofcode #indieweb #xray #instagram
Day 2: Handling URLs with Fragment IDs #100DaysOfIndieWeb

Earlier this year when I launched XRay, I connected Loqi the IRC bot to it so that we would get inline IRC text previews when people linked to web pages in IRC. XRay works by finding an h-entry on the page, and getting the content and author information from it. Here's what it normally looks like in IRC.

2 likes 1 mention

Thu, Dec 22, 2016 8:47am -08:00 #100daysofindieweb #indieweb #xray #quill #microformats
Root canal complete, with temporary filling. Considering how to find an RFID chip small enough to fit under the crown I'm going to get. #cyborg #dentist #tooth #xray

8 likes 5 replies

Wed, Jan 7, 2015 9:41am -08:00 #xray #cyborg #dentist #tooth
