Since webmention is a simplified version of pingback, it is susceptible to the same type of spam. With webmention, we have the advantage of it not being a huge spam target just yet, but it's only a matter of time. To get a head start on fighting the coming spam storm, a few of us wanted to try creating webmention endpoints that expire after a short period of time, to see if that helps mitigate spam.
I implemented expiring webmention endpoints on my website in September 2014. My site generated a token that encoded the expiration date and included it as a query string parameter on the webmention endpoint URL. The endpoints expired 5 minutes after they were fetched, so if you wanted to send a webmention to one of my pages, you would do endpoint discovery to find the endpoint, and as long as you made the webmention request within 5 minutes, it would be accepted.
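To make the idea concrete, here is a minimal sketch in Python of how a token like this could be generated and checked. This is not the code my site ran. The HMAC signature, the secret key, the `token` parameter name and the example.com URL are all assumptions for illustration. The only parts taken from the description above are that the token encodes the expiration time and that it expires after 5 minutes.

```python
import hashlib
import hmac
import time

# Hypothetical values, chosen only for this sketch.
SECRET_KEY = b"replace-with-a-real-secret"
ENDPOINT_TTL_SECONDS = 5 * 60  # endpoints expire 5 minutes after being fetched


def _sign(expires_text):
    # Signing the expiration time is an assumption of this sketch; it stops
    # a sender from simply inventing a token with a later expiration.
    return hmac.new(SECRET_KEY, expires_text.encode("ascii"), hashlib.sha256).hexdigest()


def make_token(now=None):
    """Build a token that encodes the expiration time of the endpoint."""
    issued = int(time.time() if now is None else now)
    expires_text = str(issued + ENDPOINT_TTL_SECONDS)
    return expires_text + "." + _sign(expires_text)


def endpoint_url(base="https://example.com/webmention", now=None):
    """Endpoint URL to advertise on a page, with the token in the query string."""
    return base + "?token=" + make_token(now)


def seconds_remaining(token, now=None):
    """Seconds until the token expires (negative if already expired).

    Returns None if the token is malformed or the signature does not match.
    """
    current = int(time.time() if now is None else now)
    expires_text, separator, signature = token.partition(".")
    if not separator:
        return None
    try:
        expires = int(expires_text)
    except ValueError:
        return None
    expected = _sign(str(expires))
    if not hmac.compare_digest(signature.encode("utf-8"), expected.encode("ascii")):
        return None
    return expires - current
```

When a webmention arrives, the receiver would call `seconds_remaining()` on the token from the query string and reject the request if the result is `None` or negative. Returning the remaining seconds instead of a plain yes/no also gives you a number you can log.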
I use webmention.io to forward pingbacks to my site as webmentions. Since the forwarding target is an expiring endpoint, forwarded pingbacks go through the same expiring endpoint technique. As such, my stats are a mix of pingbacks and webmentions.
I implemented logging of all requests so I could see how effective this technique was. I was logging the time of the webmention and the number of seconds until that specific endpoint expired. Over the past 6 months, I found that 70% of all webmentions (valid or not) were sent within 10 seconds of fetching the endpoint, and 87% were sent within 60 seconds.
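As a rough illustration of how numbers like these can be computed from such a log, here is a small Python sketch. The function names and the sample values are made up for the example and are not taken from my actual logs. It assumes each log entry stores the number of seconds remaining until the endpoint expired, and that the endpoint lifetime was 300 seconds.

```python
ENDPOINT_TTL_SECONDS = 300  # 5 minute lifetime, as described above


def delay_after_fetch(seconds_until_expiry, ttl_seconds=ENDPOINT_TTL_SECONDS):
    """Convert the logged 'seconds until expiry' into 'seconds since the fetch'."""
    return ttl_seconds - seconds_until_expiry


def share_within(delays, limit_seconds):
    """Percentage of requests sent within limit_seconds of fetching the endpoint."""
    if not delays:
        return 0.0
    hits = sum(1 for delay in delays if delay <= limit_seconds)
    return 100.0 * hits / len(delays)


# Made-up sample log values (seconds until expiry); negative means already expired.
sample_log = [297, 295, 290, 250, 100, -400]
sample_delays = [delay_after_fetch(value) for value in sample_log]

print(share_within(sample_delays, 10))  # share sent within 10 seconds
print(share_within(sample_delays, 60))  # share sent within 60 seconds
```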
Of all the webmentions I received, 51% were successful, 30% were "no mention found" (the most common type of spam request sent), 9% were sent to expired endpoints, 4% were sent with an invalid source URL (usually plain text, such as when someone mistakes my webmention form for a comment box), and the rest were other errors.
Out of all the failed webmentions (either an invalid source URL was sent, or there was no link found), 53% of them were sent to a valid endpoint, and the other 47% were sent to an expired endpoint.
Of all the requests sent to endpoints that had expired, 50% of them were sent when the endpoint was less than an hour old. 18% were sent between 1 and 3 hours after it expired, and the rest were sent more than 3 hours after expiration. However, the distribution of failed requests within the first hour is relatively flat. This means I would need to keep the expiration time relatively small, like 5-10 minutes, for the technique to be effective.
Ultimately, the expiring endpoints seem to have prevented processing about half of the spam webmentions I received. If I had not had the expiring endpoint, my site would have fetched the source URL to look for a link to my site, and the webmention would have been rejected at that point. If I were getting an extremely high volume of webmentions, this might actually be useful, since it would prevent a lot of unnecessary network traffic.
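The saving comes from the order of the checks: the expiration check is cheap and happens before any network request. Here is a minimal Python sketch of that ordering. The `handle_webmention` function, the `Result` type and the injected `fetch` callable are hypothetical names for this example, and the link check is simplified to a substring search, where a real receiver would parse the HTML.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Result:
    accepted: bool
    reason: str


def handle_webmention(source: str, target: str, expires_at: int, now: int,
                      fetch: Callable[[str], str]) -> Result:
    """Process one webmention, doing the cheap checks before any network fetch."""
    # 1. Expired endpoint: rejected without ever touching the network.
    if now > expires_at:
        return Result(False, "expired endpoint")

    # 2. Source must at least look like an HTTP(S) URL, not free-form text.
    if not source.startswith(("http://", "https://")):
        return Result(False, "invalid source URL")

    # 3. Only now pay for the network request to the source page.
    body = fetch(source)

    # 4. Simplified link check: a real receiver would parse the HTML.
    if target not in body:
        return Result(False, "no mention found")

    return Result(True, "ok")
```

Passing `fetch` in as a parameter keeps the sketch free of any particular HTTP library and makes it easy to confirm that an expired request never triggers a fetch.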
Recent Pingback Spam
On Friday, April 17th, I started seeing a large increase in the number of spam pingbacks in my logs. (These were pingbacks forwarded to my webmention endpoint using webmention.io.) I was getting these at a rate of 1-2 per hour, whereas before it was maybe 1-2 per day. Looking at the stats from the expiring endpoint, 75% were sent within 10 seconds of getting the endpoint, and all were sent within 90 seconds of getting the endpoint. None at all were sent to an endpoint that had expired.
Since all of my recent spam has been sent to non-expired endpoints, and since I don't get a high number of webmentions right now, I decided to remove the expiring endpoints from my site for the time being. It was a lot of code to deal with validating and logging expired endpoints, so removing it simplified the code a great deal. We'll continue to brainstorm and test other spam prevention techniques, documenting them on the indiewebcamp wiki.