@aschweer
Created September 23, 2011 03:12
DSpace org.dspace.statistics.util.SpiderDetector snippet to exclude more bots
/**
 * Static service method for testing spiders against existing spider files.
 * <p/>
 * In the future this will be extended to support User Agent and
 * domain name detection.
 * <p/>
 * In future the spiders HashSet may be optimized as a byte offset array to
 * improve performance and memory footprint further.
 *
 * @param request the incoming HTTP request to classify
 * @return true if the request was detected to be from a spider, false otherwise
 */
public static boolean isSpider(HttpServletRequest request) {
    // LCoNZ customisation: treat the request as a spider if the user agent
    // contains "bot", "crawl" or "spider" (case insensitive)
    if (request.getHeader("User-Agent") != null)
    {
        String userAgent = request.getHeader("User-Agent").toLowerCase();
        if (userAgent.contains("bot") || userAgent.contains("crawl") || userAgent.contains("spider"))
        {
            return true;
        }
    }
    if (SolrLogger.isUseProxies() && request.getHeader("X-Forwarded-For") != null) {
        /* This header is a comma delimited list */
        for (String xfip : request.getHeader("X-Forwarded-For").split(",")) {
            if (isSpider(xfip))
            {
                return true;
            }
        }
    }
    // Fall back to the standard IP-based spider detection
    return isSpider(request.getRemoteAddr());
}
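
As a quick illustration of the user-agent branch, here is a minimal JUnit 4 / Mockito sketch. The chosen user agents trip the substring checks and return true before SolrLogger.isUseProxies() or any IP lookups run, so no DSpace configuration is needed; the test class name and the sample user agent strings are illustrative, not part of the gist.

import static org.junit.Assert.assertTrue;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

import javax.servlet.http.HttpServletRequest;

import org.dspace.statistics.util.SpiderDetector;
import org.junit.Test;

public class SpiderDetectorUserAgentTest {

    @Test
    public void googlebotUserAgentIsFlaggedAsSpider() {
        HttpServletRequest request = mock(HttpServletRequest.class);
        // Case-insensitive substring match: "Googlebot" contains "bot"
        when(request.getHeader("User-Agent"))
                .thenReturn("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)");

        assertTrue(SpiderDetector.isSpider(request));
    }

    @Test
    public void genericCrawlerUserAgentIsFlaggedAsSpider() {
        HttpServletRequest request = mock(HttpServletRequest.class);
        // "ExampleCrawler" contains "crawl", so this also short-circuits to true
        // before any IP-based lookups are attempted
        when(request.getHeader("User-Agent")).thenReturn("ExampleCrawler/1.0");

        assertTrue(SpiderDetector.isSpider(request));
    }
}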