Aaron Parecki

    Obfuscating Emails on Websites

    February 5, 2010

    @twaddington sparked a Twitter debate when he tweeted:

    Why do people still insist on posting their emails as "something [at] domain [dot] com?" #petpeve

    Below is the conversation that ensued.

    • twaddington: Why do people still insist on posting their emails as "something [at] domain [dot] com?" #petpeve
    • nickcummings: @twaddington I'm sure most crawlers can parse those sorts of things by now, so what's a more secure way to list an email address online?
    • aaronpk: @twaddington Agreed. I stopped doing that and other obfuscation techniques when I started forwarding everything to Gmail.
    • aaronpk: @nickcummings I've seen people write something like "email 'aaron' at the domain you're looking at"
    • nickcummings: @aaronpk That's what I've been doing on @sasquatchgaming, and so far we haven't received any spam. :)
    • nickcummings: @aaronpk To clarify: We're not doing the stupid ___ [at] ____.com thing. We've evolved past that. Take that, robots!
    • kchrist: @twaddington I'll stop doing that when spammers stop harvesting email addresses.
    • twaddington: .@kchrist you think a simple spider using a regex can't figure that out? @nickcummings I've never gotten spam from listing my email.
    • twaddington: Mainly it's a usability barrier. Links were designed to be clicked!
    • kchrist: @twaddington Not if there are hidden markup tags in the middle of it.
    • aaronpk: @kchrist can't hidden markup tags in an email address just be removed by a call to strip_tags() or equivalent? /cc @twaddington
    • lvidmar: @aaronpk Nobody ever claimed that spammers were geniuses. /cc @twaddington @kchrist
    • kchrist: @aaronpk They need to identify the email addr in the text first. "<span>user</span> at <span>domain</span>. com" doesn't match any pattern.
    • twaddington: @kchrist, @aaronpk, @lvidmar but have any of you ever received spam from publishing your email on your site?
    • aaronpk: @twaddington i'm not sure how much spam is a result of publishing my address, but gmail filters everything out 99.9% perfectly.
    • twaddington: @kchrist strpos("contact") or strpos("email") then strip_tags() then regex lookaround for something "at" somethingelse "tld" /cc @aaronpk
    • twaddington: @aaronpk I haven't gotten much spam at my new account since I switched. I think forums are a big target for email harvesting. /cc @kchrist

    Afterwards, I wrote a quick bit of PHP to test scraping email addresses from a website.

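    <?php
    // $html holds the page source to scan. This sample value is a placeholder.
    $html = '<p>Email <span>aaron</span> at <span>example</span> dot com</p>';

    // Remove markup so that tags hidden inside an address no longer break it up.
    $plain = strip_tags($html);

    // Undo common substitutions: the entities for "@" and ".", and the words "at" and "dot".
    $plain = preg_replace(array('/&#64;/', '/\s+at\s+/', '/&#46;/', '/\s+dot\s+/'), array('@', '@', '.', '.'), $plain);

    // Collect everything that now looks like an email address.
    // The hyphen sits at the end of each character class so it is read as a literal.
    if(preg_match_all('/[a-z0-9_-]+@[a-z0-9_-]+\.[a-z0-9]{2,4}/i', $plain, $matches))
    {
        print_r($matches);
    }
    ?>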
    

    This works surprisingly well. Adding a few more patterns to the preg_replace line would also catch variants such as "[at]" instead of just "at".
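
    The extra cases could look roughly like the sketch below. It is an illustration, not the code used in the test above: the function names deobfuscate and find_emails, the choice of bracket variants, and the looser address pattern are all assumptions.

```php
<?php
// Hypothetical helper: undoes common word-substitution obfuscations.
// The pattern list is illustrative, not exhaustive.
function deobfuscate($html)
{
    $plain = strip_tags($html);

    // Decode entities such as &#64; and &#46; in one step.
    $plain = html_entity_decode($plain, ENT_QUOTES, 'UTF-8');

    $patterns = array(
        '/\s*[\[({]\s*at\s*[\])}]\s*/i',   // [at], (at), {at}
        '/\s+at\s+/i',                     // bare "at"
        '/\s*[\[({]\s*dot\s*[\])}]\s*/i',  // [dot], (dot), {dot}
        '/\s+dot\s+/i',                    // bare "dot"
    );
    $replacements = array('@', '@', '.', '.');

    return preg_replace($patterns, $replacements, $plain);
}

// Hypothetical helper: returns every string that looks like an address.
function find_emails($html)
{
    $plain = deobfuscate($html);
    preg_match_all(
        '/[a-z0-9._-]+@[a-z0-9_-]+(?:\.[a-z0-9_-]+)*\.[a-z]{2,}/i',
        $plain,
        $matches
    );
    return $matches[0];
}

print_r(find_emails('<p>Write to <span>something</span> [at] domain [dot] com</p>'));
```

    Because the bare "at" pattern also rewrites ordinary phrases such as "look at this", the final match is what filters out the false positives: only text that ends up shaped like an address survives.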

    The point is that obfuscation techniques based on word substitution are relatively easy to crack. Some better techniques are:

    • hiding the address in an image (this only works on low-profile websites; on a high-profile site the image becomes a target for OCR)
    • decoding the email address in JavaScript, which will prevent most spiders from finding the address (unless they run JavaScript too)
    • puzzle-based obfuscation such as "our email addresses are our first names at our domain name," which would be very difficult to find automatically
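
    As a rough illustration of the JavaScript approach, the PHP sketch below emits a link whose address appears in the page source only in base64 form and is decoded in the browser. Everything here is an assumption made for the example: the function name obfuscated_mailto, the default element id, the placeholder link text, and the choice of base64 with atob() as the encoding.

```php
<?php
// Hypothetical helper (not from the original post): returns markup in which
// the address is present only base64-encoded, plus a script that decodes it.
function obfuscated_mailto($email, $elementId = 'contact-email')
{
    // json_encode produces safely quoted JavaScript string literals.
    $jsEncoded = json_encode(base64_encode($email));
    $jsId = json_encode($elementId);
    $htmlId = htmlspecialchars($elementId, ENT_QUOTES, 'UTF-8');

    return '<a id="' . $htmlId . '" href="#">enable JavaScript to see the address</a>' . "\n"
        . '<script>' . "\n"
        . '(function () {' . "\n"
        . '  var address = atob(' . $jsEncoded . ');' . "\n"
        . '  var link = document.getElementById(' . $jsId . ');' . "\n"
        . '  link.href = "mailto:" + address;' . "\n"
        . '  link.textContent = address;' . "\n"
        . '})();' . "\n"
        . '</script>';
}

$html = obfuscated_mailto('aaron@example.com');
echo $html, "\n";
```

    A scraper that only strips tags and runs regular expressions finds nothing resembling an address in this output. One that executes JavaScript, or that specifically looks for base64 strings, would still recover it.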

    Alternatively, you can do what I ended up doing: forward all your email to Gmail and let it sort out the spam. It is nearly 100% effective after a little bit of training. The only messages that still land in spam by mistake are emails from automated scripts on my servers where I didn't set the proper headers.

    Fri, Feb 5, 2010 9:30pm -08:00
Posted in /articles

Hi, I'm Aaron Parecki, Director of Identity Standards at Okta, and co-founder of IndieWebCamp. I maintain oauth.net, write and consult about OAuth, and participate in the OAuth Working Group at the IETF. I also help people learn about video production and livestreaming.

I've been tracking my location since 2008 and I wrote 100 songs in 100 days. I've spoken at conferences around the world about owning your data, OAuth, quantified self, and explained why R is a vowel.

  • Director of Identity Standards at Okta
  • IndieWebCamp Founder
  • OAuth WG Editor
  • OpenID Board Member

© 1999-2025 by Aaron Parecki. Powered by p3k. This site supports Webmention.
Except where otherwise noted, text content on this site is licensed under a Creative Commons Attribution 3.0 License.