@acslater00
Last active August 29, 2015 14:07
Bot Traffic

First, most bot traffic doesn't use a browser; it just makes raw web requests to grab whatever content it is interested in. This means it does not run JavaScript or request images. Most analytics packages use JavaScript or tiny fake images (tracking pixels) to track traffic/impressions, so this kind of bot never shows up in the numbers at all. This is the first and most obvious line of defense.
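As a rough illustration of why this works, here is a minimal sketch that compares raw page requests with tracking-pixel hits in a toy request log. The pixel path, the `(client_id, path)` log format, and the function name are all assumptions made up for the example, not any particular analytics package's behavior.

```python
# Hypothetical URL of the tiny tracking image embedded in every page.
PIXEL_PATH = "/pixel.gif"


def summarize_traffic(requests):
    """Summarize a toy request log.

    `requests` is an iterable of (client_id, path) tuples (assumed format).
    Returns (raw_page_requests, tracked_impressions): the second number only
    counts requests for the tracking pixel, which a raw-request bot never makes.
    """
    raw_page_requests = 0
    tracked_impressions = 0
    for _client_id, path in requests:
        if path == PIXEL_PATH:
            tracked_impressions += 1
        else:
            raw_page_requests += 1
    return raw_page_requests, tracked_impressions


log = [
    ("browser", "/article/1"),
    ("browser", "/pixel.gif"),  # a real browser loads the page's images
    ("bot", "/article/1"),      # a raw-request bot only fetches the HTML
    ("bot", "/article/2"),
]
raw, tracked = summarize_traffic(log)
```

In this toy log the server sees three page requests, but only the browser's visit produces a pixel hit.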

Second is something called a user agent, a label sent with each request that identifies the source of the traffic. This is easy to fake -- I can make a bot request with a user agent that looks like an iPhone Safari browser -- but much bot traffic doesn't bother.
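A user-agent filter can be as simple as a substring check. The sketch below assumes a short, hand-picked marker list (`BOT_MARKERS` is illustrative, not an authoritative list) and, as noted above, does nothing against a bot that fakes a browser user agent.

```python
# Illustrative substrings that commonly appear in self-identified bot user agents.
# This list is an assumption for the example; real filters use much longer lists.
BOT_MARKERS = ("bot", "crawler", "spider", "curl", "python-requests")


def looks_like_bot(user_agent):
    """Return True if the User-Agent header is missing or self-identifies as a bot."""
    if not user_agent:
        # Browsers always send a user agent, so an empty one is suspicious.
        return True
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)
```

For example, `looks_like_bot("Googlebot/2.1 (+http://www.google.com/bot.html)")` is true, while a typical iPhone Safari user agent passes through as a normal visitor.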

Those two things catch ~all cases where a bot is not trying to hide the fact that it is a bot. Bots that do try to hide are surprisingly rare, at least in my experience. Beyond that, you can do things like analyze user behavior to look for unusual patterns (e.g. hitting every article on a site exactly once) to get the rest.
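The "every article exactly once" pattern can be checked with a few lines of counting. This is only a sketch under assumptions: the `(client_id, path)` log format, the `min_coverage` threshold, and the function name are hypothetical, and a real system would look at many more signals.

```python
from collections import Counter, defaultdict


def suspicious_clients(requests, all_articles, min_coverage=0.9):
    """Flag clients that hit (nearly) every article exactly once.

    `requests` is an iterable of (client_id, path) tuples (assumed format),
    `all_articles` is the set of article paths on the site, and
    `min_coverage` is an arbitrary illustrative threshold.
    """
    if not all_articles:
        return []
    hits = defaultdict(Counter)
    for client_id, path in requests:
        if path in all_articles:
            hits[client_id][path] += 1
    flagged = []
    for client_id, counts in hits.items():
        coverage = len(counts) / len(all_articles)
        exactly_once = all(n == 1 for n in counts.values())
        if coverage >= min_coverage and exactly_once:
            flagged.append(client_id)
    return sorted(flagged)


articles = {"/a/1", "/a/2", "/a/3"}
log = [
    ("crawler", "/a/1"), ("crawler", "/a/2"), ("crawler", "/a/3"),
    ("reader", "/a/1"), ("reader", "/a/1"), ("reader", "/a/2"),
]
flagged = suspicious_clients(log, articles)
```

Here the client that walks the whole site once is flagged, while the reader who revisits one article and skips another is not.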

Not perfect, but not something we worry about on a daily basis in the industry.
