@pypt
Created February 5, 2020 00:41
test_tm_mine.t log
./dev/run_test.py apps/topics-mine/tests/perl/test_tm_mine.t ✭master
WARNING: The MC_CRIMSON_HEXAGON_API_KEY variable is not set. Defaulting to a blank string.
WARNING: The MC_TWITTER_CONSUMER_KEY variable is not set. Defaulting to a blank string.
WARNING: The MC_TWITTER_CONSUMER_SECRET variable is not set. Defaulting to a blank string.
WARNING: The MC_TWITTER_ACCESS_TOKEN variable is not set. Defaulting to a blank string.
WARNING: The MC_TWITTER_ACCESS_TOKEN_SECRET variable is not set. Defaulting to a blank string.
/opt/mediacloud/tests/perl/test_tm_mine.t .. main: starting hash server 0
main: starting hash server 1
main: starting hash server 2
main: starting hash server 3
main: starting hash server 4
main: testing pages for site 0
172.30.0.9 - - [04/Feb/2020 19:39:04] "GET / HTTP/1.1" 200 -
/opt/mediacloud/tests/perl/test_tm_mine.t .. 1/? 172.30.0.9 - - [04/Feb/2020 19:39:04] "GET /page-0 HTTP/1.1" 200 -
172.30.0.9 - - [04/Feb/2020 19:39:04] "GET /page-1 HTTP/1.1" 200 -
172.30.0.9 - - [04/Feb/2020 19:39:04] "GET /page-2 HTTP/1.1" 200 -
172.30.0.9 - - [04/Feb/2020 19:39:04] "GET /page-3 HTTP/1.1" 200 -
172.30.0.9 - - [04/Feb/2020 19:39:04] "GET /page-4 HTTP/1.1" 200 -
172.30.0.9 - - [04/Feb/2020 19:39:04] "GET /page-5 HTTP/1.1" 200 -
172.30.0.9 - - [04/Feb/2020 19:39:04] "GET /page-6 HTTP/1.1" 200 -
172.30.0.9 - - [04/Feb/2020 19:39:04] "GET /page-7 HTTP/1.1" 200 -
main: testing pages for site 1
172.30.0.9 - - [04/Feb/2020 19:39:04] "GET / HTTP/1.1" 200 -
172.30.0.9 - - [04/Feb/2020 19:39:04] "GET /page-0 HTTP/1.1" 200 -
172.30.0.9 - - [04/Feb/2020 19:39:04] "GET /page-1 HTTP/1.1" 200 -
172.30.0.9 - - [04/Feb/2020 19:39:04] "GET /page-2 HTTP/1.1" 200 -
172.30.0.9 - - [04/Feb/2020 19:39:04] "GET /page-3 HTTP/1.1" 200 -
172.30.0.9 - - [04/Feb/2020 19:39:04] "GET /page-4 HTTP/1.1" 200 -
172.30.0.9 - - [04/Feb/2020 19:39:04] "GET /page-5 HTTP/1.1" 200 -
172.30.0.9 - - [04/Feb/2020 19:39:04] "GET /page-6 HTTP/1.1" 200 -
main: testing pages for site 2
172.30.0.9 - - [04/Feb/2020 19:39:04] "GET / HTTP/1.1" 200 -
172.30.0.9 - - [04/Feb/2020 19:39:04] "GET /page-0 HTTP/1.1" 200 -
/opt/mediacloud/tests/perl/test_tm_mine.t .. 37/? 172.30.0.9 - - [04/Feb/2020 19:39:05] "GET /page-1 HTTP/1.1" 200 -
172.30.0.9 - - [04/Feb/2020 19:39:05] "GET /page-2 HTTP/1.1" 200 -
172.30.0.9 - - [04/Feb/2020 19:39:05] "GET /page-3 HTTP/1.1" 200 -
172.30.0.9 - - [04/Feb/2020 19:39:05] "GET /page-4 HTTP/1.1" 200 -
main: testing pages for site 3
172.30.0.9 - - [04/Feb/2020 19:39:05] "GET / HTTP/1.1" 200 -
172.30.0.9 - - [04/Feb/2020 19:39:05] "GET /page-0 HTTP/1.1" 200 -
172.30.0.9 - - [04/Feb/2020 19:39:05] "GET /page-1 HTTP/1.1" 200 -
172.30.0.9 - - [04/Feb/2020 19:39:05] "GET /page-2 HTTP/1.1" 200 -
172.30.0.9 - - [04/Feb/2020 19:39:05] "GET /page-3 HTTP/1.1" 200 -
172.30.0.9 - - [04/Feb/2020 19:39:05] "GET /page-4 HTTP/1.1" 200 -
172.30.0.9 - - [04/Feb/2020 19:39:05] "GET /page-5 HTTP/1.1" 200 -
172.30.0.9 - - [04/Feb/2020 19:39:05] "GET /page-6 HTTP/1.1" 200 -
172.30.0.9 - - [04/Feb/2020 19:39:05] "GET /page-7 HTTP/1.1" 200 -
172.30.0.9 - - [04/Feb/2020 19:39:05] "GET /page-8 HTTP/1.1" 200 -
172.30.0.9 - - [04/Feb/2020 19:39:05] "GET /page-9 HTTP/1.1" 200 -
main: testing pages for site 4
172.30.0.9 - - [04/Feb/2020 19:39:05] "GET / HTTP/1.1" 200 -
172.30.0.9 - - [04/Feb/2020 19:39:05] "GET /page-0 HTTP/1.1" 200 -
172.30.0.9 - - [04/Feb/2020 19:39:05] "GET /page-1 HTTP/1.1" 200 -
main: seed url: http://b16fd965224f:8890/page-0
main: seed url: http://b16fd965224f:8890/page-1
main: seed url: http://b16fd965224f:8890/page-2
main: non seeded url: http://b16fd965224f:8890/page-3
main: non seeded url: http://b16fd965224f:8890/page-4
main: seed url: http://b16fd965224f:8890/page-5
main: non seeded url: http://b16fd965224f:8890/page-6
main: non seeded url: http://b16fd965224f:8890/page-7
main: seed url: http://b16fd965224f:8891/page-0
main: seed url: http://b16fd965224f:8891/page-1
main: non seeded url: http://b16fd965224f:8891/page-2
main: non seeded url: http://b16fd965224f:8891/page-3
main: seed url: http://b16fd965224f:8891/page-4
main: seed url: http://b16fd965224f:8891/page-5
main: seed url: http://b16fd965224f:8891/page-6
main: seed url: http://b16fd965224f:8892/page-0
main: non seeded url: http://b16fd965224f:8892/page-1
main: seed url: http://b16fd965224f:8892/page-2
main: seed url: http://b16fd965224f:8892/page-3
main: non seeded url: http://b16fd965224f:8892/page-4
main: seed url: http://b16fd965224f:8893/page-0
main: non seeded url: http://b16fd965224f:8893/page-1
main: seed url: http://b16fd965224f:8893/page-2
main: seed url: http://b16fd965224f:8893/page-3
main: non seeded url: http://b16fd965224f:8893/page-4
main: seed url: http://b16fd965224f:8893/page-5
main: seed url: http://b16fd965224f:8893/page-6
main: non seeded url: http://b16fd965224f:8893/page-7
main: seed url: http://b16fd965224f:8893/page-8
main: seed url: http://b16fd965224f:8893/page-9
main: seed url: http://b16fd965224f:8894/page-0
main: seed url: http://b16fd965224f:8894/page-1
INFO mediawords.util.mail: Test mode is enabled, not actually sending any email.
INFO mediawords.util.mail: Test mode is enabled, not actually sending any email.
MediaWords.TM.Mine: update topic state: importing seed urls
MediaWords.JobManager.AbstractStatefulJob: Not called from MediaWords::JobManager::AbstractStatefulJob::run
MediaWords.TM.Mine: update topic state: importing solr seed query
MediaWords.JobManager.AbstractStatefulJob: Not called from MediaWords::JobManager::AbstractStatefulJob::run
MediaWords.TM.Mine: import solr seed query
MediaWords.TM.Mine: update topic state: setting stories respidering...
MediaWords.JobManager.AbstractStatefulJob: Not called from MediaWords::JobManager::AbstractStatefulJob::run
MediaWords.TM.Mine: update topic state: importing seed urls
MediaWords.JobManager.AbstractStatefulJob: Not called from MediaWords::JobManager::AbstractStatefulJob::run
MediaWords.TM.Mine: import seed urls
MediaWords.TM.Mine: update topic state: importing seed urls: 0 / 32
MediaWords.JobManager.AbstractStatefulJob: Not called from MediaWords::JobManager::AbstractStatefulJob::run
MediaWords.TM.Mine: add_new_links_chunk: fetch_links
MediaWords.TM.Mine: fetch_links: queue links
MediaWords.JobManager.Broker.RabbitMQ: Connecting to RabbitMQ (hostname: rabbitmq-server, port: 5672, username: mediacloud)...
MediaWords.JobManager.Broker.RabbitMQ: Unable to connect to RabbitMQ, will retry: opening socket failed because AMQP socket connection was closed. at /opt/mediacloud/src/common/perl/MediaWords/JobManager/Broker/RabbitMQ.pm line 154.
MediaWords.JobManager.Broker.RabbitMQ: Retrying #1...
MediaWords.JobManager.Broker.RabbitMQ: Unable to connect to RabbitMQ, will retry: opening socket failed because AMQP socket connection was closed. at /opt/mediacloud/src/common/perl/MediaWords/JobManager/Broker/RabbitMQ.pm line 154.
MediaWords.JobManager.Broker.RabbitMQ: Retrying #2...
MediaWords.JobManager.Broker.RabbitMQ: Unable to connect to RabbitMQ, will retry: opening socket failed because AMQP socket connection was closed. at /opt/mediacloud/src/common/perl/MediaWords/JobManager/Broker/RabbitMQ.pm line 154.
MediaWords.JobManager.Broker.RabbitMQ: Retrying #3...
MediaWords.JobManager.Broker.RabbitMQ: Unable to connect to RabbitMQ, will retry: opening socket failed because AMQP socket connection was closed. at /opt/mediacloud/src/common/perl/MediaWords/JobManager/Broker/RabbitMQ.pm line 154.
MediaWords.JobManager.Broker.RabbitMQ: Retrying #4...
MediaWords.TM.Mine: waiting for fetch link queue: 32 queued
MediaWords.TM.Mine: waiting for fetch link queue: 32 links remaining ...
172.30.0.8 - - [04/Feb/2020 19:39:12] "GET /page-1 HTTP/1.1" 200 -
172.30.0.8 - - [04/Feb/2020 19:39:12] "GET /page-0 HTTP/1.1" 200 -
172.30.0.8 - - [04/Feb/2020 19:39:13] "GET /page-9 HTTP/1.1" 200 -
172.30.0.8 - - [04/Feb/2020 19:39:13] "GET /page-6 HTTP/1.1" 200 -
172.30.0.8 - - [04/Feb/2020 19:39:13] "GET /page-2 HTTP/1.1" 200 -
172.30.0.8 - - [04/Feb/2020 19:39:13] "GET /page-3 HTTP/1.1" 200 -
172.30.0.8 - - [04/Feb/2020 19:39:13] "GET /page-0 HTTP/1.1" 200 -
172.30.0.8 - - [04/Feb/2020 19:39:13] "GET /page-4 HTTP/1.1" 200 -
172.30.0.8 - - [04/Feb/2020 19:39:13] "GET /page-8 HTTP/1.1" 200 -
172.30.0.8 - - [04/Feb/2020 19:39:13] "GET /page-6 HTTP/1.1" 200 -
172.30.0.8 - - [04/Feb/2020 19:39:13] "GET /page-3 HTTP/1.1" 200 -
172.30.0.8 - - [04/Feb/2020 19:39:14] "GET /page-2 HTTP/1.1" 200 -
172.30.0.8 - - [04/Feb/2020 19:39:14] "GET /page-4 HTTP/1.1" 200 -
172.30.0.8 - - [04/Feb/2020 19:39:14] "GET /page-1 HTTP/1.1" 200 -
172.30.0.8 - - [04/Feb/2020 19:39:14] "GET /page-5 HTTP/1.1" 200 -
172.30.0.8 - - [04/Feb/2020 19:39:14] "GET /page-2 HTTP/1.1" 200 -
172.30.0.8 - - [04/Feb/2020 19:39:15] "GET /page-5 HTTP/1.1" 200 -
MediaWords.TM.Mine: waiting for fetch link queue: 15 links remaining ...
172.30.0.8 - - [04/Feb/2020 19:39:15] "GET /page-1 HTTP/1.1" 200 -
172.30.0.8 - - [04/Feb/2020 19:39:15] "GET /page-1 HTTP/1.1" 200 -
172.30.0.8 - - [04/Feb/2020 19:39:15] "GET /page-7 HTTP/1.1" 200 -
172.30.0.8 - - [04/Feb/2020 19:39:15] "GET /page-0 HTTP/1.1" 200 -
172.30.0.8 - - [04/Feb/2020 19:39:15] "GET /page-5 HTTP/1.1" 200 -
172.30.0.8 - - [04/Feb/2020 19:39:16] "GET /page-0 HTTP/1.1" 200 -
172.30.0.8 - - [04/Feb/2020 19:39:16] "GET /page-1 HTTP/1.1" 200 -
172.30.0.8 - - [04/Feb/2020 19:39:16] "GET /page-4 HTTP/1.1" 200 -
172.30.0.8 - - [04/Feb/2020 19:39:16] "GET /page-2 HTTP/1.1" 200 -
172.30.0.8 - - [04/Feb/2020 19:39:16] "GET /page-7 HTTP/1.1" 200 -
172.30.0.8 - - [04/Feb/2020 19:39:17] "GET /page-3 HTTP/1.1" 200 -
172.30.0.8 - - [04/Feb/2020 19:39:17] "GET /page-0 HTTP/1.1" 200 -
172.30.0.8 - - [04/Feb/2020 19:39:17] "GET /page-4 HTTP/1.1" 200 -
172.30.0.8 - - [04/Feb/2020 19:39:17] "GET /page-3 HTTP/1.1" 200 -
172.30.0.8 - - [04/Feb/2020 19:39:17] "GET /page-6 HTTP/1.1" 200 -
MediaWords.TM.Mine: waiting for fetch link queue: 0 links remaining ...
MediaWords.TM.Mine: fetch_links: update topic seed urls
MediaWords.TM.Mine: completed fetch link queue
MediaWords.TM.Mine: add_new_links_chunk: mark topic links spidered
MediaWords.TM.Mine: update topic state: running spider
MediaWords.JobManager.AbstractStatefulJob: Not called from MediaWords::JobManager::AbstractStatefulJob::run
MediaWords.TM.Mine: run spider
MediaWords.TM.Mine: mine topic stories
MediaWords.TM.Mine: mine topic stories: chunked 1000 ...
MediaWords.TM.Mine: generate topic links: 12
MediaWords.TM.Mine: waiting for 12 link extraction jobs to finish
MediaWords.TM.Mine: 12 stories left in link extraction pool....
MediaWords.TM.Mine: spider new links chunk: 0
MediaWords.TM.Mine: add new links
MediaWords.TM.Mine: get spider progress description
MediaWords.TM.Mine: update topic state: spidering iteration: 15; stories last iteration / total: 0 / 12; links queued: 17; iteration links: 17; iteration links: 0 / 17
MediaWords.JobManager.AbstractStatefulJob: Not called from MediaWords::JobManager::AbstractStatefulJob::run
MediaWords.TM.Mine: add_new_links_chunk: fetch_links
MediaWords.TM.Mine: fetch_links: queue links
172.30.0.8 - - [04/Feb/2020 19:39:25] "GET /page-1/dead HTTP/1.1" 404 -
MediaWords.TM.Mine: waiting for fetch link queue: 17 queued
MediaWords.TM.Mine: waiting for fetch link queue: 15 links remaining ...
172.30.0.8 - - [04/Feb/2020 19:39:25] "GET /page-5/dead HTTP/1.1" 404 -
172.30.0.8 - - [04/Feb/2020 19:39:25] "GET /page-3/dead HTTP/1.1" 404 -
172.30.0.8 - - [04/Feb/2020 19:39:26] "GET /page-0/dead HTTP/1.1" 404 -
172.30.0.8 - - [04/Feb/2020 19:39:26] "GET /page-6/dead HTTP/1.1" 404 -
172.30.0.8 - - [04/Feb/2020 19:39:26] "GET /page-4/dead HTTP/1.1" 404 -
172.30.0.8 - - [04/Feb/2020 19:39:26] "GET /page-7/dead HTTP/1.1" 404 -
172.30.0.8 - - [04/Feb/2020 19:39:26] "GET /page-0/dead HTTP/1.1" 404 -
172.30.0.8 - - [04/Feb/2020 19:39:26] "GET /page-3/dead HTTP/1.1" 404 -
172.30.0.8 - - [04/Feb/2020 19:39:26] "GET /page-7/dead HTTP/1.1" 404 -
172.30.0.8 - - [04/Feb/2020 19:39:26] "GET /page-1/dead HTTP/1.1" 404 -
172.30.0.8 - - [04/Feb/2020 19:39:26] "GET /page-4/dead HTTP/1.1" 404 -
MediaWords.TM.Mine: waiting for fetch link queue: 0 links remaining ...
MediaWords.TM.Mine: fetch_links: update topic seed urls
MediaWords.TM.Mine: completed fetch link queue
MediaWords.TM.Mine: add_new_links_chunk: mark topic links spidered
MediaWords.TM.Mine: mine topic stories
MediaWords.TM.Mine: mine topic stories: chunked 1000 ...
MediaWords.TM.Mine: generate topic links: 1
MediaWords.TM.Mine: waiting for 1 link extraction jobs to finish
MediaWords.TM.Mine: 1 stories left in link extraction pool....
MediaWords.TM.Mine: spider new links chunk: 1
MediaWords.TM.Mine: add new links
MediaWords.TM.Mine: get spider progress description
MediaWords.TM.Mine: update topic state: spidering iteration: 15; stories last iteration / total: 0 / 13; links queued: 1; iteration links: 1; iteration links: 0 / 1
MediaWords.JobManager.AbstractStatefulJob: Not called from MediaWords::JobManager::AbstractStatefulJob::run
MediaWords.TM.Mine: add_new_links_chunk: fetch_links
MediaWords.TM.Mine: fetch_links: queue links
MediaWords.TM.Mine: waiting for fetch link queue: 1 queued
MediaWords.TM.Mine: waiting for fetch link queue: 1 links remaining ...
MediaWords.TM.Mine: pending url: http://b16fd965224f:8891/page-2/dead [pending: null]
172.30.0.8 - - [04/Feb/2020 19:39:35] "GET /page-2/dead HTTP/1.1" 404 -
MediaWords.TM.Mine: waiting for fetch link queue: 0 links remaining ...
MediaWords.TM.Mine: fetch_links: update topic seed urls
MediaWords.TM.Mine: completed fetch link queue
MediaWords.TM.Mine: add_new_links_chunk: mark topic links spidered
MediaWords.TM.Mine: mine topic stories
MediaWords.TM.Mine: mine topic stories: chunked 1000 ...
MediaWords.TM.Mine: spider new links chunk: 2
MediaWords.TM.Mine: spider new links chunk: 0
MediaWords.TM.Mine: spider new links chunk: 0
MediaWords.TM.Mine: spider new links chunk: 0
MediaWords.TM.Mine: spider new links chunk: 0
MediaWords.TM.Mine: spider new links chunk: 0
MediaWords.TM.Mine: spider new links chunk: 0
MediaWords.TM.Mine: spider new links chunk: 0
MediaWords.TM.Mine: spider new links chunk: 0
MediaWords.TM.Mine: spider new links chunk: 0
MediaWords.TM.Mine: spider new links chunk: 0
MediaWords.TM.Mine: spider new links chunk: 0
MediaWords.TM.Mine: spider new links chunk: 0
MediaWords.TM.Mine: spider new links chunk: 0
MediaWords.TM.Mine: spider new links chunk: 0
MediaWords.TM.Mine: check job error rate
MediaWords.TM.Mine: Fetch error rate: 0 (0 / 50)
MediaWords.TM.Mine: Link error rate: 0 (0 / 13)
MediaWords.TM.Mine: update topic state: merging duplicate stories
MediaWords.JobManager.AbstractStatefulJob: Not called from MediaWords::JobManager::AbstractStatefulJob::run
INFO topics_base.stories: adding normalized titles ...
INFO topics_base.stories: adding normalized story titles ...
INFO topics_base.stories: finding duplicate stories ...
INFO topics_base.stories: merging 0 duplicate story groups ...
MediaWords.TM.Mine: update topic state: merging duplicate media stories
MediaWords.JobManager.AbstractStatefulJob: Not called from MediaWords::JobManager::AbstractStatefulJob::run
INFO topics_base.stories: merge dup media stories
MediaWords.TM.Mine: update topic state: adding source link dates
MediaWords.JobManager.AbstractStatefulJob: Not called from MediaWords::JobManager::AbstractStatefulJob::run
MediaWords.TM.Mine: add source link dates
main: ALL PAGES: 32
/opt/mediacloud/tests/perl/test_tm_mine.t .. 75/? main: TOPIC PAGES: 19
main: No more pending URLs, continuing
# Failed test 'dead link count'
# at /opt/mediacloud/tests/perl/test_tm_mine.t line 406.
# got: '13'
# expected: '19'
main: fetch states: $VAR1 = [
{
'count' => 19,
'state' => 'story added'
},
{
'count' => 13,
'state' => 'request failed'
},
{
'count' => 3,
'state' => 'story match'
},
{
'count' => 15,
'state' => 'content match failed'
}
];
main: fetch errors: $VAR1 = [];
# Failed test 'number of topic_links for http://b16fd965224f:8890/page-0 -> http://b16fd965224f:8890/page-4'
# at /opt/mediacloud/tests/perl/test_tm_mine.t line 449.
# got: '0'
# expected: '1'
# Failed test 'number of topic_links for http://b16fd965224f:8890/page-1 -> http://b16fd965224f:8890/page-7'
# at /opt/mediacloud/tests/perl/test_tm_mine.t line 449.
# got: '0'
# expected: '1'
# Failed test 'number of topic_links for http://b16fd965224f:8890/page-3 -> http://b16fd965224f:8891/page-3'
# at /opt/mediacloud/tests/perl/test_tm_mine.t line 449.
# got: '0'
# expected: '1'
# Failed test 'number of topic_links for http://b16fd965224f:8890/page-4 -> http://b16fd965224f:8890/page-3'
# at /opt/mediacloud/tests/perl/test_tm_mine.t line 449.
# got: '0'
# expected: '1'
# Failed test 'number of topic_links for http://b16fd965224f:8891/page-3 -> http://b16fd965224f:8892/page-4'
# at /opt/mediacloud/tests/perl/test_tm_mine.t line 449.
# got: '0'
# expected: '1'
/opt/mediacloud/tests/perl/test_tm_mine.t .. 97/? WARNING mediawords.test.hash_server: Port 8894 is not open.
WARNING mediawords.test.hash_server: Port 8893 is not open.
WARNING mediawords.test.hash_server: Port 8892 is not open.
WARNING mediawords.test.hash_server: Port 8891 is not open.
WARNING mediawords.test.hash_server: Port 8890 is not open.
# Looks like you failed 6 tests of 102.
/opt/mediacloud/tests/perl/test_tm_mine.t .. Dubious, test returned 6 (wstat 1536, 0x600)
Failed 6/102 subtests
Test Summary Report
-------------------
/opt/mediacloud/tests/perl/test_tm_mine.t (Wstat: 1536 Tests: 102 Failed: 6)
Failed tests: 90-94, 96
Non-zero exit status: 6
Files=1, Tests=102, 45 wallclock secs ( 0.07 usr 0.05 sys + 2.77 cusr 1.34 csys = 4.23 CPU)
Result: FAIL