@magnetikonline
Created November 4, 2012 04:31
Webalizer IgnoreAgent/SearchEngine rule sets conf - 2012-11
IgnoreAgent Aboundex/*
IgnoreAgent AdsBot-Google*
IgnoreAgent Aghaven/*
IgnoreAgent Alltop/*
IgnoreAgent AppEngine-Google*
IgnoreAgent Apple-PubSub/*
IgnoreAgent AppleSyndication/*
IgnoreAgent BacklinkCrawler*
IgnoreAgent Baiduspider*
IgnoreAgent bitlybot
IgnoreAgent blogged_crawl/*
IgnoreAgent Bloglovin/*
IgnoreAgent BSCOPES_AGENT
IgnoreAgent CatchBot/*
IgnoreAgent CCBot/*
IgnoreAgent Content Crawler
IgnoreAgent Covario-IDS/1.0*
IgnoreAgent DoCoMo/*
IgnoreAgent DomainCrawler/*
IgnoreAgent Domnutch-Bot/*
IgnoreAgent EC2LinkFinder
IgnoreAgent eCairn-Grabber/*
IgnoreAgent EdisterBot*
IgnoreAgent Eurobot/*
IgnoreAgent facebookexternalhit/*
IgnoreAgent facebookplatform/*
IgnoreAgent Feedfetcher-Google*
IgnoreAgent Feedshow/*
IgnoreAgent findfiles.net/*
IgnoreAgent findlinks/*
IgnoreAgent Gigabot/*
IgnoreAgent Gist Server
IgnoreAgent Google-Site-Verification/*
IgnoreAgent Googlebot*
IgnoreAgent Gootkit auto-rooter scanner
IgnoreAgent GSLFbot
IgnoreAgent HolmesBot (http://holmes.ge)
IgnoreAgent Huaweisymantecspider*
IgnoreAgent HuaweiSymantecSpider/*
IgnoreAgent ia_archiver*
IgnoreAgent ichiro/*
IgnoreAgent intelium_bot
IgnoreAgent InternetSeer.com
IgnoreAgent Jakarta Commons-HttpClient/*
IgnoreAgent Java*
IgnoreAgent JumbleBot/*
IgnoreAgent LexxeBot/*
IgnoreAgent librabot/*
IgnoreAgent libwww-perl/*
IgnoreAgent Linguee Bot*
IgnoreAgent linkdex.com/*
IgnoreAgent LinkedInBot/*
IgnoreAgent LinksManager.com_bot
IgnoreAgent LSSRocketCrawler/*
IgnoreAgent Made by ZmEu*
IgnoreAgent magpie-crawler/*
IgnoreAgent MagpieRSS/*
IgnoreAgent MetaURI API/*
IgnoreAgent Microsoft Data Access Internet Publishing Provider DAV 1.1
IgnoreAgent Microsoft Office Protocol Discovery
IgnoreAgent Microsoft URL Control*
IgnoreAgent MLBot*
IgnoreAgent Mozilla/4.0 (compatible; Vagabondo/*
IgnoreAgent Mozilla/4.0 (compatible; http://search.thunderstone.com/texis/websearch/about.html)
IgnoreAgent Mozilla/5.0 (compatible; 008/*
IgnoreAgent Mozilla/5.0 (compatible; AhrefsBot/*
IgnoreAgent Mozilla/5.0 (compatible; aiHitBot-BP/*
IgnoreAgent Mozilla/5.0 (compatible; aiHitBot/*
IgnoreAgent Mozilla/5.0 (compatible; ApptusBot/*
IgnoreAgent Mozilla/5.0 (compatible; archive.org_bot*
IgnoreAgent Mozilla/5.0 (compatible; Ask Jeeves/*
IgnoreAgent Mozilla/5.0 (compatible; Baiduspider/*
IgnoreAgent Mozilla/5.0 (compatible; bingbot/*
IgnoreAgent Mozilla/5.0 (compatible; Birubot/*
IgnoreAgent Mozilla/5.0 (compatible; Blekkobot*
IgnoreAgent Mozilla/5.0 (compatible; BlogScope/*
IgnoreAgent Mozilla/5.0 (compatible; Butterfly/*
IgnoreAgent Mozilla/5.0 (compatible; DCPbot/*
IgnoreAgent Mozilla/5.0 (compatible; discobot/*
IgnoreAgent Mozilla/5.0 (compatible; discoverybot/*
IgnoreAgent Mozilla/5.0 (compatible; DotBot/*
IgnoreAgent Mozilla/5.0 (compatible; Embedly/*
IgnoreAgent Mozilla/5.0 (compatible; Exabot*
IgnoreAgent Mozilla/5.0 (compatible; Ezooms/*
IgnoreAgent Mozilla/5.0 (compatible; Funnelback*
IgnoreAgent Mozilla/5.0 (compatible; Genieo/*
IgnoreAgent Mozilla/5.0 (compatible; Google Desktop/*
IgnoreAgent Mozilla/5.0 (compatible; Googlebot/*
IgnoreAgent Mozilla/5.0 (compatible; heritrix/*
IgnoreAgent Mozilla/5.0 (compatible; JikeSpider*
IgnoreAgent Mozilla/5.0 (compatible; Konqueror/3.5; Linux) KHTML/3.5.5 (like Gecko) (Exabot-Thumbnails)
IgnoreAgent Mozilla/5.0 (compatible; lemurwebcrawler*
IgnoreAgent Mozilla/5.0 (compatible; LinksManager.com_bot*
IgnoreAgent Mozilla/5.0 (compatible; Linux; Socialradarbot/*
IgnoreAgent Mozilla/5.0 (compatible; Lipperhey*
IgnoreAgent Mozilla/5.0 (compatible; ltbot/*
IgnoreAgent Mozilla/5.0 (compatible; MJ12bot/*
IgnoreAgent Mozilla/5.0 (compatible; MSIE 6.0b; Windows NT 5.0) Gecko/2009011913 Firefox/3.0.6 TweetmemeBot
IgnoreAgent Mozilla/5.0 (compatible; MSIE or Firefox mutant*
IgnoreAgent Mozilla/5.0 (compatible; NerdByNature.Bot*
IgnoreAgent Mozilla/5.0 (compatible; oBot/*
IgnoreAgent Mozilla/5.0 (compatible; OpenindexDeepSpider/*
IgnoreAgent Mozilla/5.0 (compatible; OpenindexShallowSpider/*
IgnoreAgent Mozilla/5.0 (compatible; PaperLiBot/*
IgnoreAgent Mozilla/5.0 (compatible; Plukkie/*
IgnoreAgent Mozilla/5.0 (compatible; PrintfulBot/*
IgnoreAgent Mozilla/5.0 (compatible; ProCogBot/*
IgnoreAgent Mozilla/5.0 (compatible; Purebot/*
IgnoreAgent Mozilla/5.0 (compatible; ScoutJet*
IgnoreAgent Mozilla/5.0 (compatible; Search17Bot/*
IgnoreAgent Mozilla/5.0 (compatible; SEOkicks-Robot*
IgnoreAgent Mozilla/5.0 (compatible; Seznam*
IgnoreAgent Mozilla/5.0 (compatible; sindice-fetcher/*
IgnoreAgent Mozilla/5.0 (compatible; SISTRIX Crawler; http://crawler.sistrix.net/)
IgnoreAgent Mozilla/5.0 (compatible; SiteBot/*
IgnoreAgent Mozilla/5.0 (compatible; spbot/*
IgnoreAgent Mozilla/5.0 (compatible; SpiderLing*
IgnoreAgent Mozilla/5.0 (compatible; SWEBot/*
IgnoreAgent Mozilla/5.0 (compatible; TweetedTimes Bot/*
IgnoreAgent Mozilla/5.0 (compatible; TweetmemeBot/*
IgnoreAgent Mozilla/5.0 (compatible; Voluniabot/*
IgnoreAgent Mozilla/5.0 (compatible; WBSearchBot/*
IgnoreAgent Mozilla/5.0 (compatible; woriobot*
IgnoreAgent Mozilla/5.0 (compatible; XML Sitemaps Generator*
IgnoreAgent Mozilla/5.0 (compatible; Yahoo*
IgnoreAgent Mozilla/5.0 (compatible; Yandex*
IgnoreAgent Mozilla/5.0 (compatible; YodaoBot/*
IgnoreAgent Mozilla/5.0 (compatible; YoudaoBot/*
IgnoreAgent Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_1 like Mac OS X; en-us) AppleWebKit/532.9 (KHTML, like Gecko) Version/4.0.5 Mobile/8B117 Safari/6531.22.7 (compatible; Googlebot-Mobile/*
IgnoreAgent Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) Speedy Spider*
IgnoreAgent Mozilla/5.0 (Windows; U; Windows NT 5.1; en; rv:1.9.0.13) Gecko/2009073022 Firefox/3.5.2 (.NET CLR 3.5.30729) SurveyBot/*
IgnoreAgent Mozilla/5.0 (Windows; U; Windows NT 5.1; fr; rv:1.8.1) VoilaBot BETA 1.2 (support.voilabot@orange-ftgroup.com)
IgnoreAgent Mozilla/5.0 (Yahoo-MMCrawler/*
IgnoreAgent Mozilla/5.0(compatible; Sosospider/*
IgnoreAgent Mozilla/5.0+(compatible;+PiplBot*
IgnoreAgent msnbot*
IgnoreAgent NetNewsWire/*
IgnoreAgent NetworkedBlogs*
IgnoreAgent NextGenSearchBot*
IgnoreAgent nutch*
IgnoreAgent Nutraspace/*
IgnoreAgent OctoBot/*
IgnoreAgent OPENSEEMOX*
IgnoreAgent Outlook*
IgnoreAgent panscient.com
IgnoreAgent PHP/*
IgnoreAgent PostPost*
IgnoreAgent PostRank/*
IgnoreAgent psbot/*
IgnoreAgent PycURL/*
IgnoreAgent Python-urllib/*
IgnoreAgent quickobot/*
IgnoreAgent R6_CommentReader*
IgnoreAgent R6_FeedFetcher*
IgnoreAgent radian6_default_(www.radian6.com/crawler)
IgnoreAgent SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; Googlebot-Mobile/*
IgnoreAgent SBIder/*
IgnoreAgent SE/SE-0.1 (Aussie Search Spider*
IgnoreAgent SemrushBot/*
IgnoreAgent SeznamBot/*
IgnoreAgent SimplePie/*
IgnoreAgent SiteSnagger
IgnoreAgent Sogou*
IgnoreAgent SolomonoBot/*
IgnoreAgent SolomonoLinkChecker/*
IgnoreAgent Sosospider*
IgnoreAgent Summify (Summify/*
IgnoreAgent Superfeedr: Superparser bot/*
IgnoreAgent TestNutch/*
IgnoreAgent Trapit/*
IgnoreAgent TurnitinBot/*
IgnoreAgent TwengaBot*
IgnoreAgent Twitterbot/*
IgnoreAgent UniversalFeedParser/*
IgnoreAgent UnwindFetchor/*
IgnoreAgent Web Crawler*
IgnoreAgent Wget/*
IgnoreAgent Windows-RSS-Platform/*
IgnoreAgent woobot/*
IgnoreAgent WordPress/*
IgnoreAgent Wotbox/*
IgnoreAgent www.webwombat.com.au
IgnoreAgent Xenu Link Sleuth/*
IgnoreAgent Xydo*
IgnoreAgent yacybot*
IgnoreAgent Yahoo*
IgnoreAgent Yandex/*
IgnoreAgent Yeti/*
IgnoreAgent ZmEu
IgnoreAgent ZumBot/*
SearchEngine ask.com q=
SearchEngine bing. q=
SearchEngine google. q=
SearchEngine image.youdao.com q=
SearchEngine m.yahoo. p=
SearchEngine search-results.com q=
SearchEngine search.alot. q=
SearchEngine search.aol. q=
SearchEngine search.avg.com q=
SearchEngine search.comcast.net q=
SearchEngine search.conduit. q=
SearchEngine search.yahoo. p=
SearchEngine webcache.googleusercontent.com q=
SearchEngine yandex.ru text=