@goakley
Last active March 12, 2018 18:00

Process List

$ sudo ps aux | grep td-agent
root     13739  0.0  0.0 150224 36724 ?        Sl    2017  28:55 /opt/td-agent/embedded/bin/ruby /usr/sbin/td-agent --log /var/log/td-agent/td-agent.log --daemon /var/run/td-agent/td-agent.pid
root     31144 26.2  0.2 627576 162900 ?       Sl   Mar09 1086:37 /opt/td-agent/embedded/bin/ruby -Eascii-8bit:ascii-8bit /usr/sbin/td-agent --log /var/log/td-agent/td-agent.log --daemon /var/run/td-agent/td-agent.pid --under-supervisor

Latest fluentd log entries

The current time is 2018-03-12 17:45:00 +0000 and the newest entry below is from 2018-03-11 02:00:00 +0000, so nothing has been written to the log for almost 40 hours.

2018-03-11 01:00:00 +0000 [info]: #0 detected rotation of /var/log/postgresql/postgresql-01.csv
2018-03-11 01:00:00 +0000 [info]: #0 following tail of /var/log/postgresql/postgresql-01.csv
2018-03-11 01:17:04 +0000 [info]: #0 detected rotation of /var/log/syslog; waiting 5 seconds
2018-03-11 01:17:04 +0000 [info]: #0 following tail of /var/log/syslog
2018-03-11 02:00:00 +0000 [info]: #0 detected rotation of /var/log/postgresql/postgresql-02.csv
2018-03-11 02:00:00 +0000 [info]: #0 following tail of /var/log/postgresql/postgresql-02.csv
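
The staleness arithmetic above (2018-03-12 17:45:00 minus 2018-03-11 02:00:00 is 39 h 45 min, or 143,100 s) is easy to script as a watchdog. A minimal sketch, assuming every td-agent log entry begins with a 25-character `%Y-%m-%d %H:%M:%S %z` timestamp as in the lines shown; `last_log_time` and `stale_for` are hypothetical helper names, not part of td-agent:

```ruby
require 'time'

# Timestamp prefix of td-agent log lines, e.g. "2018-03-11 02:00:00 +0000".
TIMESTAMP_FORMAT = '%Y-%m-%d %H:%M:%S %z'.freeze
TIMESTAMP_LENGTH = 25

# Hypothetical helper: time of the newest line that starts with a timestamp.
def last_log_time(lines)
  lines.reverse_each do |line|
    begin
      return Time.strptime(line[0, TIMESTAMP_LENGTH], TIMESTAMP_FORMAT)
    rescue ArgumentError
      next # continuation line (e.g. part of a backtrace); keep looking
    end
  end
  nil
end

# Hypothetical helper: seconds between the newest entry and `now`, or nil.
def stale_for(lines, now)
  last = last_log_time(lines)
  last && now - last
end
```

For example, `stale_for(File.readlines('/var/log/td-agent/td-agent.log'), Time.now)` returns the gap in seconds, or `nil` if no line carries a timestamp. `File.readlines` loads the whole file, so for a large log reading only its tail would be kinder.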

strace of Child Process

The following strace hangs indefinitely on a single futex call (FUTEX_WAIT_PRIVATE with a NULL timeout, so the thread sleeps until another thread wakes it):

$ sudo strace -p 31144
Process 31144 attached
futex(0x7f11ec248044, FUTEX_WAIT_PRIVATE, 40037490, NULL
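
`strace -p 31144` attaches only to the thread whose ID is 31144 (the main thread); `strace -f -p 31144` attaches to all of the process's threads. A view that needs no ptrace at all is `/proc`: each thread has a directory under `/proc/<pid>/task/` with a `stat` file (whose third field is the scheduler state) and a `wchan` file (the kernel function it is sleeping in). A minimal Linux-only sketch; `parse_stat` and `thread_report` are hypothetical helper names:

```ruby
# Parse /proc/<pid>/task/<tid>/stat. The command name is wrapped in
# parentheses and may itself contain spaces or ')', so locate it via the
# first '(' and the last ')' instead of splitting on whitespace.
def parse_stat(line)
  open_paren = line.index('(')
  close_paren = line.rindex(')')
  {
    comm: line[(open_paren + 1)...close_paren],
    state: line[close_paren + 2] # one letter: R, S, D, Z, T, ...
  }
end

# One entry per native thread of `pid`, sorted by thread ID.
def thread_report(pid)
  threads = Dir.glob("/proc/#{pid}/task/*").map do |dir|
    begin
      stat = parse_stat(File.read(File.join(dir, 'stat')))
    rescue Errno::ENOENT
      next nil # the thread exited while we were looking
    end
    # wchan names the kernel function the thread is sleeping in; some
    # kernels restrict it and report "0" or an empty string instead.
    wchan = (File.read(File.join(dir, 'wchan')) rescue '?')
    stat.merge(tid: File.basename(dir).to_i, wchan: wchan.strip)
  end
  threads.compact.sort_by { |t| t[:tid] }
end
```

As root, `thread_report(31144).each { |t| puts t.values_at(:tid, :state, :wchan, :comm).join("\t") }` prints one line per thread; a `wchan` naming a futex function usually means that thread is blocked on a lock or condition variable.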

SIGDUMP of td-agent Supervisor Process

Sigdump at 2018-03-12 17:31:52 +0000 process 13739 (/usr/sbin/td-agent)
  Thread #<Thread:0x007f6811c1a630> status=sleep priority=0
      /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/serverengine-2.0.5/lib/serverengine/process_manager.rb:252:in `select'
      /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/serverengine-2.0.5/lib/serverengine/process_manager.rb:252:in `tick'
      /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/serverengine-2.0.5/lib/serverengine/multi_spawn_server.rb:91:in `wait_tick'
      /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/serverengine-2.0.5/lib/serverengine/multi_worker_server.rb:60:in `run'
      /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/serverengine-2.0.5/lib/serverengine/multi_spawn_server.rb:57:in `run'
      /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/serverengine-2.0.5/lib/serverengine/server.rb:123:in `main'
      /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/serverengine-2.0.5/lib/serverengine/daemon.rb:172:in `block (2 levels) in daemonize_with_double_fork'
      /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/serverengine-2.0.5/lib/serverengine/daemon.rb:150:in `fork'
      /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/serverengine-2.0.5/lib/serverengine/daemon.rb:150:in `block in daemonize_with_double_fork'
      /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/serverengine-2.0.5/lib/serverengine/daemon.rb:142:in `fork'
      /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/serverengine-2.0.5/lib/serverengine/daemon.rb:142:in `daemonize_with_double_fork'
      /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/serverengine-2.0.5/lib/serverengine/daemon.rb:107:in `main'
      /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/serverengine-2.0.5/lib/serverengine/daemon.rb:68:in `run'
      /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.23/lib/fluent/supervisor.rb:606:in `supervise'
      /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.23/lib/fluent/supervisor.rb:476:in `run_supervisor'
      /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.23/lib/fluent/command/fluentd.rb:310:in `<top (required)>'
      /opt/td-agent/embedded/lib/ruby/site_ruby/2.1.0/rubygems/core_ext/kernel_require.rb:55:in `require'
      /opt/td-agent/embedded/lib/ruby/site_ruby/2.1.0/rubygems/core_ext/kernel_require.rb:55:in `require'
      /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.23/bin/fluentd:5:in `<top (required)>'
      /opt/td-agent/embedded/bin/fluentd:23:in `load'
      /opt/td-agent/embedded/bin/fluentd:23:in `<top (required)>'
      /usr/sbin/td-agent:7:in `load'
      /usr/sbin/td-agent:7:in `<main>'
  Thread #<ServerEngine::SignalThread:0x007f680c49aa68> status=run priority=0
      /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/sigdump-0.2.4/lib/sigdump.rb:52:in `backtrace'
      /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/sigdump-0.2.4/lib/sigdump.rb:52:in `dump_backtrace'
      /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/sigdump-0.2.4/lib/sigdump.rb:34:in `block in dump_all_thread_backtrace'
      /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/sigdump-0.2.4/lib/sigdump.rb:33:in `each'
      /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/sigdump-0.2.4/lib/sigdump.rb:33:in `dump_all_thread_backtrace'
      /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/sigdump-0.2.4/lib/sigdump.rb:16:in `block in dump'
      /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/sigdump-0.2.4/lib/sigdump.rb:136:in `open'
      /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/sigdump-0.2.4/lib/sigdump.rb:136:in `_open_dump_path'
      /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/sigdump-0.2.4/lib/sigdump.rb:14:in `dump'
      /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/serverengine-2.0.5/lib/serverengine/server.rb:107:in `block (2 levels) in install_signal_handlers'
      /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/serverengine-2.0.5/lib/serverengine/signal_thread.rb:96:in `call'
      /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/serverengine-2.0.5/lib/serverengine/signal_thread.rb:96:in `main'
  Thread #<Thread:0x007f680c4992d0> status=sleep priority=0
      /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/serverengine-2.0.5/lib/serverengine/socket_manager_unix.rb:77:in `accept'
      /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/serverengine-2.0.5/lib/serverengine/socket_manager_unix.rb:77:in `block in start_server'
  GC stat:
      count: 2687
      heap_used: 261
      heap_length: 261
      heap_increment: 0
      heap_live_slot: 106217
      heap_free_slot: 170
      heap_final_slot: 0
      heap_swept_slot: 12156
      heap_eden_page_length: 261
      heap_tomb_page_length: 0
      total_allocated_object: 114443554
      total_freed_object: 114337337
      malloc_increase: 1079104
      malloc_limit: 16777216
      minor_gc_count: 2680
      major_gc_count: 7
      remembered_shady_object: 1123
      remembered_shady_object_limit: 1716
      old_object: 59226
      old_object_limit: 95890
      oldmalloc_increase: 5480352
      oldmalloc_limit: 16777216
  Built-in objects:
   106,387: TOTAL
    32,552: T_ARRAY
    27,135: T_STRING
    23,976: T_NODE
    15,024: T_DATA
     2,818: T_CLASS
     2,471: T_OBJECT
       909: T_HASH
       612: T_REGEXP
       362: T_ICLASS
       274: T_MODULE
        80: FREE
        68: T_STRUCT
        59: T_RATIONAL
        28: T_FILE
         9: T_FLOAT
         7: T_BIGNUM
         2: T_MATCH
         1: T_COMPLEX
  All objects:
    27,323: String
    14,423: Array
     7,115: RubyVM::InstructionSequence
     2,135: Time
     1,222: Class
       936: Gem::Requirement
       799: Hash
       656: Gem::Dependency
       612: Regexp
       405: Proc
       343: RubyVM::Env
       274: Module
       254: Gem::Version
       128: Gem::StubSpecification
       128: Gem::Specification
       120: Gem::StubSpecification::StubLine
       100: Encoding
        59: Rational
        57: Range
        53: Fluent::Config::ConfigureProxy
        28: Mutex
        24: IO
        22: Fluent::Config::Element
        19: ServerEngine::Worker
        19: ServerEngine::MultiProcessServer::WorkerMonitor
        19: ServerEngine::ProcessManager::Monitor
        17: OptionParser::Switch::RequiredArgument
        16: MatchData
        14: Fluent::Config::Section
        13: OptionParser::Switch::NoArgument
         9: Float
         8: Method
         8: Fluent::Registry
         8: Fluent::Plugin::Base::State
         7: Thread::Backtrace
         7: Bignum
         6: OptionParser::OptionMap
         6: Object
         5: Monitor
         4: Set
         4: JSON::Ext::Generator::State
         3: File
         3: ServerEngine::DaemonLogger
         3: Logger::Formatter
         3: Fluent::Log
         3: IRB::Notifier::LeveledNotifier
         3: Thread
         3: Fluent::EventRouter::Rule
         3: Fluent::GlobMatchPattern
         3: OptionParser::List
         2: OptionParser::Switch::OptionalArgument
         2: Fluent::Plugin::RegexpParser
         2: OptionParser::Switch::PlacedArgument
         2: Strptime
         2: MessagePack::Factory
         2: Fluent::Plugin::TailInput
         2: Coolio::Loop
         2: LoadError
         2: Fluent::Supervisor::LoggerInitializer
         2: Fluent::Plugin::RecordTransformerFilter
         2: Fluent::Plugin::RecordTransformerFilter::PlaceholderExpander
         2: BigDecimal
         2: ThreadSafe::Cache
         2: UnboundMethod
         2: Fluent::TimeParser
         1: ArgumentError
         1: ServerEngine::Daemon
         1: Fluent::Compat::NullOutputChain
         1: ServerEngine::MultiSpawnServer
         1: ServerEngine::ProcessManager
         1: ServerEngine::SignalThread
         1: WEBrick::HTTPVersion
         1: OpenSSL::X509::Store
         1: Thread::ConditionVariable
         1: Resolv::DNS::Config
         1: Resolv::DNS
         1: Resolv::Hosts
         1: Resolv
         1: ServerEngine::SocketManager::Server
         1: Fluent::PluginLogger
         1: Fluent::Plugin::ElasticsearchOutput
         1: UNIXServer
         1: Fluent::EngineClass
         1: OptionParser
         1: Fluent::Supervisor
         1: Fluent::SystemConfig
         1: EOFError
         1: Gem::Platform
         1: Fluent::Plugin::FileBuffer
         1: Gem::PathSupport
         1: Fluent::RootAgent
         1: Fluent::NoMatchMatch
         1: #<Class:0x007f6811399510>
         1: Fluent::EventRouter
         1: IRB::Notifier::CompositeNotifier
         1: IRB::StdioOutputMethod
         1: IRB::Notifier::NoMsgNotifier
         1: Fluent::EventRouter::MatchCache
         1: Complex
         1: ThreadGroup
         1: IOError
         1: Binding
         1: RubyVM
         1: NoMemoryError
         1: SystemStackError
         1: Random
         1: ARGF.class
         1: fatal
         1: URI::Parser
         1: Data
         1: OptionParser::CompletingHash
         1: #<Class:0x007f6811ff5228>
  String 574,107 bytes
   Array 1 elements
    Hash 7 pairs
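
The dump above comes from the supervisor (PID 13739), whose threads are idle in `select` and `accept`; the tailing and flushing happen in the worker (PID 31144). The backtrace shows serverengine's signal thread calling `Sigdump.dump`, and sigdump's documented default is to dump on `SIGCONT` into `/tmp/sigdump-<pid>.log`. A sketch for requesting a dump from the worker instead, assuming it keeps those defaults and its signal handling can still run; `sigdump_path` and `request_sigdump` are hypothetical helpers:

```ruby
# Assumes sigdump's defaults: SIGCONT triggers the dump, which is appended
# to /tmp/sigdump-<pid>.log. Signalling a root-owned td-agent needs root.
def sigdump_path(pid)
  "/tmp/sigdump-#{pid}.log"
end

# Hypothetical helper: send the signal and wait for a fresh dump.
# Returns the file contents, or nil if nothing new appears before `timeout`.
def request_sigdump(pid, timeout: 10, poll: 0.5)
  path = sigdump_path(pid)
  before = File.exist?(path) ? File.mtime(path) : nil
  Process.kill(:CONT, pid) # raises Errno::ESRCH / Errno::EPERM on failure
  deadline = Time.now + timeout
  while Time.now < deadline
    if File.exist?(path) && (before.nil? || File.mtime(path) > before)
      sleep poll # let the dump finish writing before reading it
      return File.read(path)
    end
    sleep poll
  end
  nil # no handler installed, or the process is too wedged to run it
end
```

Run as root, `puts request_sigdump(31144) || 'no dump'` either prints the worker's thread backtraces or reports that the worker never answered, which is itself informative.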

td-agent Configuration

##
## Managed by Puppet -- do not edit!
##
# Master configuration file for td-agent
# Include any configuration files in the config.d directory.
@include td-agent.conf.d/*.conf
# Expose fluentd metrics
<source>
  @type monitor_agent
  bind 127.0.0.1
  port 24221
</source>
# Configure output to Elasticsearch
<match elasticsearch.**>
  @id elasticsearch
  @type copy
  <store>
    @id elasticsearch_http_production_es5_elk_9200
    @type elasticsearch
    hosts http://production-es5-elk:9200
    include_tag_key false
    logstash_format true
    request_timeout 16s
    <buffer>
      @type file
      path /var/log/td-agent/buffer/elasticsearch_http_production_es5_elk_9200
      chunk_limit_size 16m # 100m limit in elasticsearch
      total_limit_size 1024m
      flush_mode interval
      flush_interval 4s
      flush_thread_count 16
      retry_max_interval 16s
    </buffer>
  </store>
</match>
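
The `monitor_agent` source above serves plugin metrics over HTTP; fluentd's documented endpoint is `/api/plugins.json`, whose entries include fields such as `plugin_id`, `buffer_queue_length` and `retry_count`. Because the endpoint is served from inside the worker, a request that times out is itself a hint that the worker is stuck. A minimal polling sketch, with hypothetical helper names `fetch_plugins` and `summarize`:

```ruby
require 'json'
require 'net/http'

# Hypothetical helper: fetch /api/plugins.json from the monitor_agent source
# configured above (bind 127.0.0.1, port 24221). Returns the "plugins" array,
# or nil if the agent does not answer within `timeout` seconds.
def fetch_plugins(host: '127.0.0.1', port: 24221, timeout: 5)
  Net::HTTP.start(host, port, open_timeout: timeout, read_timeout: timeout) do |http|
    JSON.parse(http.get('/api/plugins.json').body).fetch('plugins')
  end
rescue Net::OpenTimeout, Net::ReadTimeout, Errno::ECONNREFUSED
  nil
end

# One summary line per plugin that reports a buffer (buffered outputs).
def summarize(plugins)
  buffered = plugins.select { |plugin| plugin.key?('buffer_queue_length') }
  buffered.map do |plugin|
    format('%s queue=%d retries=%d',
           plugin['plugin_id'], plugin['buffer_queue_length'].to_i, plugin['retry_count'].to_i)
  end
end
```

Something like `plugins = fetch_plugins; puts(plugins ? summarize(plugins) : 'monitor_agent did not answer')` has to run on the host itself, since the source binds only to 127.0.0.1.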

PostgreSQL CSV Log Source

# GIST NOTE - this file repeats 12 times, for files `01` through `12`
# Tail the file pattern itself
<source>
  @type tail
  format multiline
  format_firstline /^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3} UTC,/
  format1 /^(?<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3} UTC),/
  format2 /(?<quote>")?(?<user_name>\w*)(?(<quote>)\k<quote>|),/
  format3 /(?<quote>")?(?<database_name>\w*)(?(<quote>)\k<quote>|),/
  format4 /(?<process_id>\d+),/
  format5 /(?<quote>")?(?<connection_from>.*?)(?(<quote>)\k<quote>|),/
  format6 /(?<session_id>[0-9a-z\.]+),/
  format7 /(?<session_line_num>\d+),/
  format8 /(?<quote>")?(?<command_tag>.*?)(?(<quote>)\k<quote>|),/
  format9 /(?<session_start_time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} UTC),/
  format10 /(?<virtual_transaction_id>[0-9\/]*),/
  format11 /(?<transaction_id>\d+),/
  format12 /(?<error_severity>\w+),/
  format13 /(?<sql_state_code>\w+),/
  # the following field contains quite a bit of information we need to
  # parse and extract -- query duration, run_id, SQL statement, actions
  # such as connection/disconnection, ...
  #
  # the most common form is a plain query execution:
  # "duration: 0.516 ms statement: /* run_id:70dc4652804f18b2 /native/bid/list/paginated */ SELECT ...."
  # from which we want to extract the duration, run_id, source URL, and actual statement
  #
  # some clients use the extended query protocol, which includes the
  # Parse, Bind, and Execute steps instead of the "statement" literal,
  # and we capture that in ?<step>
  #
  # other, radically different, entries may also be logged:
  # "connection authorized: user=postgres database=postgres ..."
  # "disconnection: session time: 0:00:03.036 user=postgres ..."
  #
  # these cases are captured by the second part of the regexp (after
  # the |); there's valuable information in these strings (connection
  # strings, for example), but we don't know exactly what to expect, so
  # we just capture the full ?<message> -- we really want to avoid using
  # `statement` here so that we don't treat these arbitrary messages as
  # potential SQL queries
  format14 /"((duration: (?<query_time>\d+\.\d+) ms\s+(?<step>.*?)\s+(.*?): (\/\* run_id:(?<run_id>.*?) (?<source>.*?) \*\/ )?(?<statement>.*?))|(?<message>.*?))",/
  # all the following captures may or may not be quoted
  format15 /(?<detail>("[^"]*")|(.*?)),/
  format16 /(?<hint>("[^"]*")|(.*?)),/
  format17 /(?<internal_query>("[^"]*")|(.*?)),/
  format18 /(?<internal_query_pos>("[^"]*")|(.*?)),/
  format19 /(?<context>("[^"]*")|(.*?)),/
  # everything else on one line for fun
  format20 /(?<query>.*?),(?<query_pos>.*?),(?<location>.*?),"(?<application_name>.*?)"$/
  time_format %Y-%m-%d %H:%M:%S.%L %Z
  types query_time:float
  path /var/log/postgresql/postgresql-01.csv
  path_key path
  tag elasticsearch._var_log_postgresql_postgresql_01_csv
  read_from_head true
  refresh_interval 1
  pos_file /var/log/td-agent/pos/_var_log_postgresql_postgresql_01_csv.pos
</source>
# Apply any custom filters
<filter elasticsearch._var_log_postgresql_postgresql_01_csv>
  @type sql
  # ignore queries whose duration (the duration_key field) is less than min_duration milliseconds
  duration_key query_time
  min_duration 10
</filter>
<filter elasticsearch._var_log_postgresql_postgresql_01_csv>
  @type sample
  rate 1.0
  # allow at most max_throughput_cap logs every max_throughput_interval seconds
  max_throughput_cap 10000
  max_throughput_interval 60
</filter>
# Add metadata about the host
<filter elasticsearch._var_log_postgresql_postgresql_01_csv>
  @type record_transformer
  enable_ruby
  <record>
    # add the hostname and host tags
    hostname "production-pg96-master-000"
    environment "production"
    application "pg96"
    # "role" is the puppet term, but engineers are used to seeing "label"
    # because of alfred, so add both
    role "master"
    label "master"
    # any extra filtering
    statement ${record.fetch('statement', '').downcase.gsub(/\/\*(.*?)\*\//, '').gsub(/\$\d+/, '<VAL>').gsub(/[-+]?\d+/, '<INT>').gsub(/'[^']*'/, '<STR>').gsub(/\s+/, ' ').gsub(/ in ?\( ?<(INT|STR|VAL)> ?(, ?<\1> ?)*\)/, ' in (<\1>, ...)')}
  </record>
</filter>
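
The two densest pieces of this file are `format14`, which splits PostgreSQL's message column, and the `statement` expression in the last `record_transformer`. The Ruby sketch below applies both to a single message field in isolation, so it only approximates the real pipeline: as we read fluentd's multiline parser, it joins `format1` through `format20` into one regexp compiled with `Regexp::MULTILINE`. The sample message is made up and assumes PostgreSQL writes two spaces after `ms`; `parse_message` and `normalize_statement` are hypothetical names.

```ruby
# FORMAT14 and the gsub chain below are copied from the config above.
FORMAT14 = /"((duration: (?<query_time>\d+\.\d+) ms\s+(?<step>.*?)\s+(.*?): (\/\* run_id:(?<run_id>.*?) (?<source>.*?) \*\/ )?(?<statement>.*?))|(?<message>.*?))",/

# Split one quoted CSV "message" field into its named captures.
def parse_message(field)
  m = FORMAT14.match(field)
  return nil unless m
  record = {}
  m.names.each { |name| record[name] = m[name] unless m[name].nil? }
  # mirrors "types query_time:float" in the <source>
  record['query_time'] = record['query_time'].to_f if record.key?('query_time')
  record
end

# The record_transformer expression for "statement", as a method.
def normalize_statement(record)
  record.fetch('statement', '').downcase
        .gsub(/\/\*(.*?)\*\//, '')  # drop single-line /* ... */ comments
        .gsub(/\$\d+/, '<VAL>')     # bind parameters ($1, $2, ...)
        .gsub(/[-+]?\d+/, '<INT>')  # digit runs (also digits inside identifiers)
        .gsub(/'[^']*'/, '<STR>')   # quoted string literals
        .gsub(/\s+/, ' ')           # collapse whitespace
        .gsub(/ in ?\( ?<(INT|STR|VAL)> ?(, ?<\1> ?)*\)/, ' in (<\1>, ...)') # fold IN lists
end
```

With the field `"duration: 0.516 ms  statement: /* run_id:70dc4652804f18b2 /native/bid/list/paginated */ SELECT * FROM bids WHERE id IN (1, 2, 3) AND name = 'bob' AND x = $1",`, `parse_message` yields `query_time` 0.516, `run_id` `70dc4652804f18b2` and `source` `/native/bid/list/paginated`, and `normalize_statement` reduces the statement to `select * from bids where id in (<INT>, ...) and name = <STR> and x = <VAL>`. A field such as `"connection authorized: user=postgres database=postgres",` lands in `message` instead, and its normalized `statement` is the empty string.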

Syslog Source

# Tail the file pattern itself
<source>
  @type tail
  format /^(?<time>[^ ]+ +[^ ]+ +[^ ]+) [^ ]+ (?<source>[^:\[]+)(?:\[[^\]]+\])?: (?<message>.+)/
  time_format %b %d %H:%M:%S
  path /var/log/syslog
  path_key path
  tag elasticsearch._var_log_syslog
  read_from_head true
  refresh_interval 1
  pos_file /var/log/td-agent/pos/_var_log_syslog.pos
</source>
# Apply any custom filters
# Add metadata about the host
<filter elasticsearch._var_log_syslog>
  @type record_transformer
  enable_ruby
  <record>
    # add the hostname and host tags
    hostname "production-pg96-master-000"
    environment "production"
    application "pg96"
    # "role" is the puppet term, but engineers are used to seeing "label"
    # because of alfred, so add both
    role "master"
    label "master"
    # any extra filtering
  </record>
</filter>
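
The syslog `format` regexp can be exercised on its own in Ruby before it goes into Puppet. A minimal sketch; `parse_syslog` is a hypothetical helper. Traditional syslog pads single-digit days with a second space (`Mar  9`), which the ` +` between the date tokens absorbs, and the optional `[pid]` group lets both `CRON[1234]:` and `rsyslogd:` style tags through.

```ruby
# The syslog <source> regexp from the config above, copied verbatim.
SYSLOG_FORMAT = /^(?<time>[^ ]+ +[^ ]+ +[^ ]+) [^ ]+ (?<source>[^:\[]+)(?:\[[^\]]+\])?: (?<message>.+)/

# Returns the named captures as a Hash, or nil when the line does not match.
def parse_syslog(line)
  m = SYSLOG_FORMAT.match(line)
  m && m.names.map { |name| [name, m[name]] }.to_h
end
```

For a made-up line such as `Mar  9 06:25:01 myhost rsyslogd: started`, the result has `time` = `"Mar  9 06:25:01"`, `source` = `"rsyslogd"` and `message` = `"started"`; the hostname field is matched but not captured.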

td-agent Log Source

# Tail the file pattern itself
<source>
  @type tail
  format multiline
  format_firstline /^[^ ]+ [^ ]+ [^ ]+/
  format1 /^(?<time>[^ ]+ [^ ]+ [^ ]+) \[(?<level>[^\]]+)\]: (?<message>.+)/
  time_format %Y-%m-%d %H:%M:%S %z
  path /var/log/td-agent/td-agent.log
  path_key path
  tag elasticsearch._var_log_td_agent_td_agent_log
  read_from_head true
  refresh_interval 1
  pos_file /var/log/td-agent/pos/_var_log_td_agent_td_agent_log.pos
</source>
# Apply any custom filters
# Add metadata about the host
<filter elasticsearch._var_log_td_agent_td_agent_log>
  @type record_transformer
  enable_ruby
  <record>
    # add the hostname and host tags
    hostname "production-pg96-master-000"
    environment "production"
    application "pg96"
    # "role" is the puppet term, but engineers are used to seeing "label"
    # because of alfred, so add both
    role "master"
    label "master"
    # any extra filtering
  </record>
</filter>
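
This source feeds td-agent's own log back through the same pipeline, grouping lines into events with `format_firstline` and parsing them with `format1`. A minimal sketch of that grouping and parsing in Ruby, assuming (as we read fluentd's multiline parser) that `formatN` patterns are compiled with the `/m` flag while `format_firstline` is not; `group_events` and `parse_event` are hypothetical helpers:

```ruby
require 'time'

# format_firstline, format1 and time_format from the td-agent.log <source>.
# format1 gets /m here because fluentd's multiline parser compiles formatN
# with Regexp::MULTILINE, so "." in (?<message>.+) also matches newlines.
FIRSTLINE = /^[^ ]+ [^ ]+ [^ ]+/
FORMAT1 = /^(?<time>[^ ]+ [^ ]+ [^ ]+) \[(?<level>[^\]]+)\]: (?<message>.+)/m
TIME_FORMAT = '%Y-%m-%d %H:%M:%S %z'.freeze

# Group raw lines into events: a line matching FIRSTLINE starts a new event,
# any other line is appended to the current one (leading unmatched lines
# are dropped in this sketch).
def group_events(lines)
  lines.each_with_object([]) do |line, events|
    if FIRSTLINE.match(line)
      events << line.chomp
    elsif !events.empty?
      events[-1] += "\n" + line.chomp
    end
  end
end

# Parse one grouped event into time, level and (possibly multi-line) message.
def parse_event(text)
  m = FORMAT1.match(text)
  return nil unless m
  { time: Time.strptime(m[:time], TIME_FORMAT), level: m[:level], message: m[:message] }
end
```

Because `format_firstline` accepts any line that starts with three space-separated tokens, only indented continuation lines (such as Ruby backtraces) are folded into the previous event, and with `/m` their text ends up in `message`.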