Last active
December 11, 2015 10:19
seraphr/4586164
Problems occurring with fluentd
At startup
2013-01-21 07:04:12 +0900: starting fluentd-0.10.30
Added 1/22
I added all related addresses to the hosts file on every aggregate machine, but the problem still occurs.
Log from when the problem occurred
After this log is output, data transfer to this fluentd stops from all sources, including those sending logs other than access-log (in some cases it does not stop).
At the same time, one of the sources stops sending to all aggregates.
If td-agent is stopped in this state, what is (probably) a td-agent child process is left running.
Killing this child process causes the stalled source to send its logs all at once.
2013-01-21 07:14:14 +0900: failed to communicate hdfs cluster, path: /path/to/dir/access-log/2013012107/[hostname of this machine]
2013-01-21 07:14:14 +0900: temporarily failed to flush the buffer, next retry will be at 2013-01-21 07:14:07 +0900. error="getaddrinfo: No address associated with hostname" instance=69896310930920
2013-01-21 07:14:14 +0900: /usr/lib64/fluent/ruby/lib/ruby/1.9.1/net/http.rb:762:in `initialize'
2013-01-21 07:14:14 +0900: /usr/lib64/fluent/ruby/lib/ruby/1.9.1/net/http.rb:762:in `open'
2013-01-21 07:14:14 +0900: /usr/lib64/fluent/ruby/lib/ruby/1.9.1/net/http.rb:762:in `block in connect'
2013-01-21 07:14:14 +0900: /usr/lib64/fluent/ruby/lib/ruby/1.9.1/timeout.rb:68:in `timeout'
2013-01-21 07:14:14 +0900: /usr/lib64/fluent/ruby/lib/ruby/1.9.1/timeout.rb:99:in `timeout'
2013-01-21 07:14:14 +0900: /usr/lib64/fluent/ruby/lib/ruby/1.9.1/net/http.rb:762:in `connect'
2013-01-21 07:14:14 +0900: /usr/lib64/fluent/ruby/lib/ruby/1.9.1/net/http.rb:755:in `do_start'
2013-01-21 07:14:14 +0900: /usr/lib64/fluent/ruby/lib/ruby/1.9.1/net/http.rb:744:in `start'
2013-01-21 07:14:14 +0900: /usr/lib64/fluent/ruby/lib/ruby/1.9.1/net/http.rb:1284:in `request'
2013-01-21 07:14:14 +0900: /usr/lib64/fluent/ruby/lib/ruby/1.9.1/net/http.rb:1264:in `send_request'
2013-01-21 07:14:14 +0900: /usr/lib64/fluent/ruby/lib/ruby/gems/1.9.1/gems/webhdfs-0.5.1/lib/webhdfs/client_v1.rb:259:in `request'
2013-01-21 07:14:14 +0900: /usr/lib64/fluent/ruby/lib/ruby/gems/1.9.1/gems/webhdfs-0.5.1/lib/webhdfs/client_v1.rb:232:in `operate_requests'
2013-01-21 07:14:14 +0900: /usr/lib64/fluent/ruby/lib/ruby/gems/1.9.1/gems/webhdfs-0.5.1/lib/webhdfs/client_v1.rb:46:in `append'
2013-01-21 07:14:14 +0900: /usr/lib64/fluent/ruby/lib/ruby/gems/1.9.1/gems/fluent-plugin-webhdfs-0.1.0/lib/fluent/plugin/out_webhdfs.rb:100:in `send_data'
2013-01-21 07:14:14 +0900: /usr/lib64/fluent/ruby/lib/ruby/gems/1.9.1/gems/fluent-plugin-webhdfs-0.1.0/lib/fluent/plugin/out_webhdfs.rb:109:in `write'
2013-01-21 07:14:14 +0900: /usr/lib64/fluent/ruby/lib/ruby/gems/1.9.1/gems/fluentd-0.10.30/lib/fluent/buffer.rb:279:in `write_chunk'
2013-01-21 07:14:14 +0900: /usr/lib64/fluent/ruby/lib/ruby/gems/1.9.1/gems/fluentd-0.10.30/lib/fluent/buffer.rb:263:in `pop'
2013-01-21 07:14:14 +0900: /usr/lib64/fluent/ruby/lib/ruby/gems/1.9.1/gems/fluentd-0.10.30/lib/fluent/output.rb:303:in `try_flush'
2013-01-21 07:14:14 +0900: /usr/lib64/fluent/ruby/lib/ruby/gems/1.9.1/gems/fluentd-0.10.30/lib/fluent/output.rb:120:in `run'
2013-01-21 07:14:14 +0900: retry succeeded. instance=69896310930920
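The trace above shows the flush failing while Net::HTTP was opening a connection, with `getaddrinfo: No address associated with hostname`, which points at a name-resolution failure on the aggregate at that moment. As a rough aid for narrowing this down, the sketch below pulls the failed-flush lines out of a td-agent log and lists hostnames that the local resolver cannot resolve. The function names `flush_failures` and `unresolved_hosts` and the hostnames in the usage comments are hypothetical; which hostnames to check (for example the HDFS namenode and datanodes) is an assumption, since the log does not say which name failed.

```shell
# Sketch only: function names and example hostnames are hypothetical.

# Read a td-agent log on stdin and print "<timestamp> <error text>"
# for every "temporarily failed to flush the buffer" line.
flush_failures() {
  sed -n 's/^\([0-9-]* [0-9:]* [+-][0-9]*\): temporarily failed to flush the buffer.* error="\([^"]*\)".*$/\1 \2/p'
}

# Read hostnames on stdin (one per line) and print those that cannot be
# resolved. `getent ahosts` goes through getaddrinfo, so it consults the
# hosts file and DNS in the order configured on the machine.
unresolved_hosts() {
  while IFS= read -r host; do
    [ -n "$host" ] || continue
    getent ahosts "$host" >/dev/null 2>&1 || printf '%s\n' "$host"
  done
}

# Usage (assumed log path and hostnames):
#   flush_failures < /var/log/td-agent/td-agent.log
#   printf '%s\n' namenode.example datanode1.example | unresolved_hosts
```

Running `unresolved_hosts` repeatedly on an aggregate, rather than once, may be more telling here, since the failure in the log is intermittent and the retry succeeds within the same second.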
This is the log from the source whose log forwarding stopped in the incident above.
At startup
2013-01-18 15:27:18 +0900: starting fluentd-0.10.30
2013-01-21 07:14:46 +0900: detached forwarding server '[IP address of the affected machine]:24224' host="[IP address of the affected machine]" port=24224 phi=16.237346895039206
Recovery log from when fluentd was later restarted
2013-01-21 08:04:27 +0900: recovered forwarding server '[IP address of the affected machine]:24224' host="[IP address of the affected machine]" port=24224
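To see how long each aggregate stayed detached from a source's point of view, the `detached forwarding server` and `recovered forwarding server` lines can be paired up. The following is a minimal sketch that assumes the log format shown above and real `host:port` values without spaces (the bracketed placeholders in this report would not parse); the function name `forwarding_outages` is hypothetical.

```shell
# Sketch only: the function name is hypothetical.

# Read a td-agent log on stdin and print one line per outage:
#   <host:port> <detached date> <detached time> <recovered date> <recovered time>
# Servers that never recover are printed with "(not recovered)".
forwarding_outages() {
  awk -v q="'" '
    $5 == "forwarding" && $6 == "server" {
      server = $7
      gsub(q, "", server)            # strip the quotes around host:port
      when = $1 " " $2
      if ($4 == "detached" && !(server in down)) {
        down[server] = when          # remember the first detach only
      } else if ($4 == "recovered" && (server in down)) {
        print server, down[server], when
        delete down[server]
      }
    }
    END {
      for (server in down) print server, down[server], "(not recovered)"
    }'
}

# Usage (assumed log path):
#   forwarding_outages < /var/log/td-agent/td-agent.log
```

For the two log lines above, this would report a single outage from 07:14:46 to 08:04:27 on 2013-01-21 for the affected aggregate.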
source (10 machines) -> aggregate (5 machines) -> hdfs
All 15 fluentd instances above are version 0.10.30.
However, some sources also send other kinds (tags) of logs.
When I checked, the fluentd version for those was "fluentd-0.10.25".
The OS is probably the following version on all machines:
$ cat /proc/version
Linux version 2.6.32-220.4.2.el6.x86_64 (mockbuild@x86-003.build.bos.redhat.com) (gcc version 4.4.6 20110731 (Red Hat 4.4.6-3) (GCC) ) #1 SMP Mon Feb 6 16:39:28 EST 2012
When this problem occurs, recovery is impossible without killing and restarting fluentd (td-agent), so we are currently restarting it periodically.
In addition, data is currently being lost.
1/24: Uploaded logs and related files. Valid for 3 days(?). It is password-protected, just in case.
https://www.datadeliver.net/receiver/file_box.do?fb=668a83449ae04590b2a3484cdf47c738&rc=615ddd93ff7c48e19180c53f3a2eacd0&lang=ja
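Since a plain stop reportedly leaves a (probable) child process behind, a restart script may want to check for leftover td-agent processes before starting again. The sketch below is one way to list them; the function name `td_agent_pids` is hypothetical, and matching on the `/usr/sbin/td-agent` argument is an assumption based on the command line used on these machines.

```shell
# Sketch only: the function name is hypothetical.

# Read `ps -e -o pid= -o args=` output on stdin and print the PID of every
# process that has /usr/sbin/td-agent as one of its arguments. Matching a
# whole argument avoids picking up unrelated commands such as `grep td-agent`.
td_agent_pids() {
  awk '{
    for (i = 2; i <= NF; i++) {
      if ($i == "/usr/sbin/td-agent") { print $1; break }
    }
  }'
}

# Usage, after stopping the service:
#   ps -e -o pid= -o args= | td_agent_pids
# Any PID still listed is a leftover process. Sending it a signal
# (for example with `kill <pid>`) is left as a manual step here.
```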
Number of open files while not stalled
aggregate machine #4
$ ps aux | grep td-agent
root 22244 0.0 0.0 210952 18904 ? Sl 01:04 0:00 /usr/lib64/fluent/ruby/bin/ruby /usr/sbin/td-agent --user td-agent --group td-agent -vv --log /var/log/td-agent/td-agent.log --daemon /var/run/td-agent/td-agent.pid
td-agent 22250 5.2 0.5 709860 171756 ? Sl 01:04 1:52 /usr/lib64/fluent/ruby/bin/ruby /usr/sbin/td-agent --user td-agent --group td-agent -vv --log /var/log/td-agent/td-agent.log --daemon /var/run/td-agent/td-agent.pid
$ sudo lsof -p 22244 | wc -l
51
$ sudo lsof -p 22250 | wc -l
64
source machine #4
$ ps aux | grep td-agent
root 8554 0.0 0.0 210952 18776 ? Sl Jan18 0:00 /usr/lib64/fluent/ruby/bin/ruby /usr/sbin/td-agent --user td-agent --group td-agent --log /var/log/td-agent/td-agent.log --daemon /var/run/td-agent/td-agent.pid
td-agent 8557 1.1 0.1 397928 156428 ? Sl Jan18 92:17 /usr/lib64/fluent/ruby/bin/ruby /usr/sbin/td-agent --user td-agent --group td-agent --log /var/log/td-agent/td-agent.log --daemon /var/run/td-agent/td-agent.pid
$ sudo lsof -p 8554 | wc -l
51
$ sudo lsof -p 8557 | wc -l
61
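To compare these baseline numbers with the stalled state, it may help to record descriptor counts over time rather than by hand. The sketch below reads `/proc/<pid>/fd` on Linux; the function names `fd_count` and `fd_snapshot` and the output file name are hypothetical. Its numbers will be lower than `lsof -p <pid> | wc -l`, because lsof also prints a header line and non-descriptor entries such as cwd, txt and mem.

```shell
# Sketch only: function names and the output file name are hypothetical.

# Print the number of open file descriptors of the process with PID $1.
# Prints 0 if the process does not exist or its fd directory is unreadable.
fd_count() {
  ls -1 "/proc/$1/fd" 2>/dev/null | wc -l | tr -d ' '
}

# Read PIDs on stdin and print "<epoch seconds> <pid> <fd count>" per PID.
fd_snapshot() {
  now=$(date +%s)
  while read -r pid; do
    [ -n "$pid" ] || continue
    printf '%s %s %s\n' "$now" "$pid" "$(fd_count "$pid")"
  done
}

# Usage with the aggregate PIDs shown above (run as root, since the worker
# process belongs to the td-agent user):
#   printf '%s\n' 22244 22250 | fd_snapshot >> fd-counts.log
```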