berlinbrown/web_crawler_funblo1

## web_crawler_funblo1
Octane Crawler is a fun/safe/friendly crawler.  It is barreling along at about one request every 10-15 seconds per host.  So, I am gathering about 100 requests a day.

mysql> select count(1) from bot_crawler_links;
+----------+
| count(1) |
+----------+
|     4746 |
+----------+
1 row in set (0.01 sec)
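
The per-host pacing described above can be sketched roughly as follows. This is a minimal illustration, not the Octane Crawler code: the `HostThrottle` class, its method names, and the 10-second default are hypothetical, chosen only to match the low end of the 10-15 second figure.

```python
import time
from urllib.parse import urlparse


class HostThrottle:
    """Hypothetical helper: tracks the last request time for each host."""

    def __init__(self, min_delay=10.0, clock=time.monotonic):
        self.min_delay = min_delay  # assumed minimum seconds between hits to one host
        self.clock = clock          # injectable clock, so the logic can be tested
        self.last_hit = {}          # host -> clock reading at the last request

    def wait_time(self, url):
        """Return the seconds to wait before fetching url (0.0 if ready now)."""
        host = urlparse(url).netloc.lower()
        last = self.last_hit.get(host)
        if last is None:
            return 0.0              # never requested from this host before
        return max(0.0, self.min_delay - (self.clock() - last))

    def mark(self, url):
        """Record that url was just requested."""
        self.last_hit[urlparse(url).netloc.lower()] = self.clock()
```

A crawl loop would call `time.sleep(throttle.wait_time(url))` before each fetch and `throttle.mark(url)` right after it. Keying on the full host name means `npr.org` and `www.npr.org` are paced independently.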

More Notes:
http://berlin2research.com/
http://code.google.com/p/octane-crawler/

Here are some of the more popular links:

| blogs.detroitnews.com                     |    13 |
| videocafe.crooksandliars.com              |    14 |
| whos.amung.us                             |    14 |
| www.realclearworld.com                    |    15 |
| supremecourt.c-span.org                   |    15 |
| www.bbc.co.uk                             |    17 |
| www.townhall.com                          |    19 |
| diversity.mit.edu                         |    21 |
| blueamerica.crooksandliars.com            |    27 |
| www.realclearreligion.org                 |    28 |
| www.wikidot.com                           |    28 |
| blogs.reuters.com                         |    29 |
| www.marco.org                             |    31 |
| www.edx.org                               |    32 |
| www.detroitnews.com                       |    32 |
| www.blogger.com                           |    33 |
| www.realclearpolitics.com                 |    33 |
| npr.org                                   |    33 |
| www.abcnews.com                           |    34 |
| www.wendymcelroy.com                      |    35 |
| www.publicagenda.org                      |    38 |
| creativecommons.org                       |    38 |
| www.hlntv.com                             |    40 |
| www.foxbusiness.com                       |    40 |
| ureport.foxnews.com                       |    41 |
| techcrunch.com                            |    43 |
| www.c-spanvideo.org                       |    43 |
| www.npr.org                               |    45 |
| www.africanews.com                        |    46 |
| reuters.com                               |    47 |
| web.mit.edu                               |    48 |
| www.japantoday.com                        |    52 |
| www.wired.com                             |    52 |
| wiki.creativecommons.org                  |    52 |
| latino.foxnews.com                        |    53 |
| news.bbc.co.uk                            |    53 |
| cspan.org                                 |    54 |
| bloomberg.com                             |    54 |
| www.deadline.com                          |    56 |
| cnn.com                                   |    56 |
| blog.markwatson.com                       |    57 |
| mises.org                                 |    59 |
| www.huffingtonpost.com                    |    59 |
| wordpress.org                             |    60 |
| www.economist.com                         |    65 |
| ocw.mit.edu                               |    66 |
| www.johnthavis.com                        |    68 |
| www.newscientist.com                      |    73 |
| www.anncoulter.com                        |    76 |
| www.foxnews.com                           |    78 |
| www.hooktheory.com                        |    79 |
| www.amazon.com                            |    85 |
| cdn.breitbart.com                         |    91 |
| jamescarlin.wikidot.com                   |    93 |
| betterimmigration.com                     |    93 |
| www.theverge.com                          |   101 |
| www.nytimes.com                           |   102 |
| abcnews.go.com                            |   103 |
| crooksandliars.com                        |   104 |
| www.c-span.org                            |   106 |
| www.guardian.co.uk                        |   119 |
| dailyanarchist.com                        |   129 |
| www.breitbart.com                         |   285 |
| www.usatoday.com                          |   340 |
+-------------------------------------------+-------+
578 rows in set (0.01 sec)
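
A per-host tally like the one above could also be computed outside the database. The sketch below assumes the crawled links are available as a plain list of URL strings; the function name `count_links_by_host` is hypothetical, the real `bot_crawler_links` schema is not shown here, and breaking ties alphabetically is my own choice rather than something the table does.

```python
from collections import Counter
from urllib.parse import urlparse


def count_links_by_host(urls):
    """Return (host, count) pairs sorted by ascending count, then host name."""
    counts = Counter()
    for url in urls:
        host = urlparse(url).netloc.lower()
        if host:  # skip relative or malformed URLs that carry no host
            counts[host] += 1
    return sorted(counts.items(), key=lambda item: (item[1], item[0]))
```

As in the table, hosts are compared by their full name, so `npr.org` and `www.npr.org` are counted as separate entries.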