@cecil
Created January 30, 2015 17:22
collectl man page
COLLECTL(1) Collectl COLLECTL(1)
NAME
collectl - Collects data that describes the current system status.
SYNOPSIS
Record Mode - read data from live system and write to file or display
on terminal
collectl [-f file] [options]
Playback Mode - read data from one or more raw data files and display
on terminal
collectl -p file1 [file2 ...] [options]
OPTIONS
Record Mode
In this mode data is taken from a live system and either displayed on
the terminal or written to one or more files or a socket.
--align
If the HiRes module is present, collectl sample monitoring will
be aligned such that a sample will always be taken at the top of
a minute (this does NOT mean the first sample will occur then)
so that all instances of collectl running on any systems which
have their clocks synchronized will all take samples at the same
time. Furthermore, if one is doing process monitoring, those
samples will also be taken at the top of the minute and so can
delay the start of sampling up to 2 full process monitoring
intervals.
--all
Collect summary data for ALL subsystems except slabs, since slab
monitoring requires a different monitoring interval. This also
means you won’t get any detail data, which also includes
processes and environmentals. You can use this switch anywhere
-s can be used, but not both together. If the system supports
lustre and/or interconnect monitoring, those statistics will be
provided, but the warnings normally produced when they are not
available and you try to select them with -s will not be displayed.
--ALL
This is actually a superset of --all by adding detail statistics
as well with the exception of TCP details when displaying to a
terminal since those are only available with -P or -f.
-A, --address address[:port[:timeout]] | server[:port]
In the first form, one specifies an address, optional port and
timeout (the first colon is required to specify timeout for
default port). All data is then written to that socket prefaced
with the current host name at the named address and port until
the socket is closed, at which time collectl will exit.
In the second form one enters the text "server" and optional
port. In this form, collectl runs as a server, waiting for a
connection and once established writes data on that socket. The
key difference here is that if the client exits, collectl keeps
running and will again look for a new connection, allowing it to
survive client restarts or crashes.
The default port is set at 2655 but can be changed - see col-
lectl.conf.
In both forms, one can additionally request local data logging
by specifying a combination of -P and -f. See man collectl-log-
ging for more details.
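
The two forms of the -A argument can be illustrated with a small parser.
This is only a sketch of the syntax described above, not collectl's own
code: the function name and the returned dictionary keys are
hypothetical, and it assumes hostnames or IPv4 addresses (no IPv6
literals, which contain colons themselves).

```python
DEFAULT_PORT = 2655  # default stated above; can be changed in collectl.conf


def parse_address_spec(spec):
    """Split an -A argument into its parts (illustrative only)."""
    parts = spec.split(":")
    if len(parts) > 3:
        raise ValueError(f"too many fields in {spec!r}")
    host = parts[0]
    if not host:
        raise ValueError("an address or the word 'server' is required")
    # An empty port field (as in "addr::timeout") keeps the default port.
    port = int(parts[1]) if len(parts) > 1 and parts[1] else DEFAULT_PORT
    if host == "server":
        if len(parts) > 2:
            raise ValueError("the server form takes only an optional port")
        return {"mode": "server", "port": port}
    timeout = int(parts[2]) if len(parts) > 2 and parts[2] else None
    return {"mode": "connect", "host": host, "port": port, "timeout": timeout}


print(parse_address_spec("192.168.1.10::5"))
print(parse_address_spec("server:3000"))
```

Note how "192.168.1.10::5" keeps the first colon so that a timeout can be
given while the port stays at its default.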
--comment string
Add the specified string to the end of the headers in the data
files. If it contains embedded spaces, be sure to quote it. This can be
very useful when doing characterizations or benchmarking and
you’re frequently changing system/application parameters and
restarting collectl between tests.
-C, --config filename
Name/location of the collectl configuration file. If not speci-
fied, collectl searches for collectl.conf first in /etc (the
default), then in the same directory the collectl executable is
in, and finally the current working directory.
-c, --count Samples
The number of samples to record. This is one of 3 ways of
describing how long collectl should run (see -r and -R). Note
that these 3 switches are mutually exclusive.
-D, --daemon
Run collectl as a daemon, primarily used when starting as a ser-
vice. One caveat about this mode is you can only run one copy.
--export file[,options]
This requests that collectl does not print anything on the ter-
minal (or send it to a socket) using the standard brief/ver-
bose/plot formats. Instead it executes a perl "require" on the
named file, using an extension of ph if not specified. It first
looks in the current directory and if not there the directory
the executable is in. It then calls the function
"file"Init(options) towards the beginning of collectl and again
as simply "file"(@options) to generate the exported formatted
output. See the online documentation on Exporting Custom Output
and Logging for more details.
-f, --filename Filename
This is the name of a file to write the output to. For details
on how the output files are named, see the File Naming section
of the documentation on collectl.sourceforge.net OR
/usr/share/doc/collectl/FileNaming.html
-F, --flush seconds
Flush output buffers after this number of seconds. This is
equivalent to issuing kill -s USR1 at the same frequency (but a
lot easier!). If 0, a flush will occur every data collection
interval.
--grep pattern
The main purpose of this switch is for those users who have dis-
covered there is some data in the raw files that never appears
in any display and have taken to displaying it themselves with
grep. Unfortunately this method does not include timestamps and
so makes it difficult to interpret the results. Even if you
include the timestamp from the file it is in UTC and so needs to
be translated to be of any real value. This switch does just
that and then some.
Specifically, it allows you to playback a file and instead of
processing it normally it simply searches for any entries that
match the perl pattern and reports those lines prefaced with
time stamps. You can optionally change the time format with the
usual -o options and can even select the timeframe with --from
and --thru.
--home
Always start the display for the current interval at the top of
the screen also known as the home position (non-plot format
only). This generates a real-time, continously refreshing dis-
play when the data fits on a single screen.
--import file1[,options][:file2[,options]...]
This loads the named files and executes callbacks to them, which
is the API mechanism for importing additional metrics into col-
lectl. See the webpage on the API for further detail.
Since these files also include instructions for how to report
the output in all the various forms, you will also need to
include --import during playback. Finally, since the default is
to seamlessly include imported data with everything else collectl
reports, if you ONLY want to display imported data you
must explicitly deselect all other subsystems, either by including
-s- (note the trailing minus sign) followed by all the subsystems
that were recorded OR by simply specifying -s-all.
-i, --interval interval[:interval2[:interval3]]
This is the sampling interval in seconds. The default is 10
seconds when run as a daemon and 1 second otherwise. The pro-
cess subsystem and slabs (-sY and -sZ) are sampled at the lower
rate of interval2. Environmentals (-sE), which only apply to a
subset of hardware, are sampled at interval3. Both interval2
and interval3, if specified, must be an exact multiple of
interval1. The daemon default is -i10:60:300 and all other modes are
-i1:60:300. To sample only processes once every 10 seconds use
-i:10.
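
As a rough illustration of how the three interval fields combine with
their defaults, the following sketch fills in omitted fields and checks
the multiple rule. It is not collectl's implementation: the function name
is hypothetical and it assumes whole-second intervals.

```python
def parse_intervals(spec, daemon=False):
    """Return (interval1, interval2, interval3) for an -i argument."""
    # Defaults from the text above: -i10:60:300 as a daemon, else -i1:60:300.
    values = [10 if daemon else 1, 60, 300]
    fields = spec.split(":")
    if len(fields) > 3:
        raise ValueError(f"too many fields in {spec!r}")
    for index, field in enumerate(fields):
        if field:  # an empty field keeps the default
            values[index] = int(field)
    interval1, interval2, interval3 = values
    for name, value in (("interval2", interval2), ("interval3", interval3)):
        if value % interval1 != 0:
            raise ValueError(f"{name}={value} is not a multiple of {interval1}")
    return tuple(values)


print(parse_intervals(":10"))            # processes every 10 seconds
print(parse_intervals("", daemon=True))  # daemon defaults
```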
--nohup
Whenever collectl finishes a data collection interval, it checks
to see if the starting parent has exited. This is to prevent
the case in which someone might start a copy of collectl and
then the process dies and collectl keeps running. If that is
the behavior someone actually intends, they should start col-
lectl with --nohup.
NOTE - when running as a daemon, --nohup is implied.
--quiet
Whenever collectl wants to tell the user something, it assigns a
category to it such as Informational, Warning, Error or Fatal.
When run with -m, all messages are displayed for the user and if
logging data to a file with -f, these messages are also sent to
a log file which is in the data collection directory and has an
extension of "log". However, if -m is not specified, Informational
messages (such as collectl starting or stopping) are not
reported on the terminal but the other 3 are. Sometimes the
warnings can be annoying and one can suppress these with --quiet
though they will still be written to the message log in -f. You
cannot suppress Error or Fatal errors.
-r, --rolllogs time[[,days[:months]][,minutes]]
When selected, collectl runs indefinitely (or at least until the
system reboots). The maximum number of raw and/or plot files
that will be retained (older ones are automatically deleted) is
controlled by the days field, the default is 7. When -m is also
specified to direct collectl to write messages to a log file in
the logging directory, the number of months to retain those logs
is controlled by the months field and its default is 12. The
increment field, which is also optional (but is position
dependent), specifies the duration of an individual collection
file in minutes, the default of which is 1440 or 1 day.
--rawdskfilt
This switch overrides the DiskFilter setting in collectl.conf
and explicitly defines a perl regular expression against which
records from /proc/diskstats are selected for processing. When
there are a lot of disks to process, this can be a handy way to
reduce the amount of data collected and actually improve
performance since there are fewer patterns to match each input
record against. Just remember that unlike --dskfilt which only filters
during display, records filtered with this switch are never even
recorded and so lost forever.
As a side benefit of this switch, if you really want to look at
partition level stats you can do so by leaving off the trailing
space in the default pattern.
One must also be careful in selecting the correct pattern
since it’s easy to get it wrong and you may end up collecting
the WRONG data! To verify you are collecting what you think you
are, make a test run using -d4 to see the raw data being
recorded in real-time.
--rawdskignore
This is the opposite of the rawdskfilt switch. When specified
any disks listed are completely ignored and will not appear in
the raw file. Typically this switch is useful when you’re only
interested in recording a subset of disk statistics.
--rawnetfilt
This works just like --rawdskfilt except it applies to networks.
Unlike disk filtering which has an explicit default pattern, the
default for network filtering is to simply record all network
data from /proc/net/dev.
The -d4 switch also works here, as well as everywhere, to see
the raw data as it is being collected.
--rawnetignore
This is the opposite of the rawnetfilt switch and works just
like the rawdskignore switch. When specified any networks
listed are ignored and will not appear in the raw file. Typi-
cally this switch is useful when you’re only interested in
recording a subset of network statistics.
--rawtoo
Only available in conjunction with -P, this switch causes the
creation/logging of raw data in addition to plottable data.
While this may seem excessive, keep in mind that unlike plot-
table data, raw data can be played back with different switches
potentially providing more details. The overhead to write out
this additional data is minimal, the only real cost being that
of extra disk space.
-R, --runas uid[:gid]
This switch only works when running in daemon mode and so must
be specified in the DaemonCommands line. Its presence will
cause collectl to write the collectl.pid file into the same
directory as its other output files as specified by -f, since
/var/run does not normally grant non-privileged users write
access. Furthermore, the ownership of that directory must match
the specified ownership since collectl needs to write ALL its
files to that directory and can no longer assume global
permissions when run as root.
This WILL also require manually modifying /etc/init.d/collectl
to change the PIDFILE variable to point to the same directory
which the -f switch in the DaemonCommands line of collectl.conf
points to.
As a final note of caution, since this mechanism changes where
collectl reads/writes its pid file, once you start using
--runas, all calls to run collectl as a daemon must use it or it
may be confused and exhibit unpredictable behavior.
-R, --runtime duration
Specify the duration of data collection where the duration is a
number followed by one of wdhms, indicating how many weeks,
days, hours, minutes or seconds the collection is to be taken
for.
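
The duration syntax can be sketched as a conversion to seconds. This is
an illustration of the format described above rather than collectl's own
parsing; the function and table names are hypothetical.

```python
import re

# Seconds represented by each of the unit letters w, d, h, m and s.
SECONDS_PER_UNIT = {"w": 604800, "d": 86400, "h": 3600, "m": 60, "s": 1}


def runtime_to_seconds(duration):
    """Convert a string such as '90m' or '2d' into a number of seconds."""
    match = re.fullmatch(r"(\d+)([wdhms])", duration)
    if match is None:
        raise ValueError(f"expected a number followed by one of wdhms: {duration!r}")
    return int(match.group(1)) * SECONDS_PER_UNIT[match.group(2)]


print(runtime_to_seconds("90m"))
print(runtime_to_seconds("2d"))
```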
--sep separator
Specify the plot format separator - default is a space. If this
is a numeric field it is interpreted as the decimal value of
the associated ASCII character code. Otherwise it is interpreted
as the character itself. In other words, "--sep :" sets the
separator character to a colon and "--sep 9" sets it to a
horizontal tab. "--sep 58" would also set it to a colon.
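
The numeric-versus-literal rule for the separator can be written in a few
lines. This is a sketch of the rule as stated, with a hypothetical
function name, and is not taken from the collectl source.

```python
def resolve_separator(value):
    """Map a --sep argument to the separator it selects."""
    # A purely numeric argument is a decimal ASCII code; anything else
    # is used as the separator itself.
    if value.isascii() and value.isdigit():
        return chr(int(value))
    return value


for argument in (":", "9", "58"):
    print(repr(argument), "->", repr(resolve_separator(argument)))
```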
--tworaw
The switches -G and --group have been replaced by --tworaw,
which is more descriptive of its function. When specified, it
tells collectl to treat process and slab data as an entirely
separate group of raw files, named with the extension "rawp".
These separate files can be played back and processed just like
any other collectl raw files and in fact one can even play back
both at the same time if that is what is desired. The only real
purpose of this switch is that on some systems with many
processes, it is possible to generate huge raw files (some have
been observed to be >250MB!) and while collectl will happily
play back/process these files it can take a long time. By using
the --tworaw switch one still gets a huge rawp file, but the
normal raw file is a much more manageable size and as a result
will be faster to process than when all data is combined into the
same file.
Playback Mode
In this mode, data is read from one or more data files that were gener-
ated in Record Mode
--export Filename
When playing back a file, use this switch to create an identical
raw file differing only in the timeframe being covered, so
naturally one must also include --from, --thru or both. Further,
since the resultant file will contain the exact same raw data
you cannot select a subset using -s. This switch is actually
intended for a support function for situations where someone is
having problems playing back a file and a subset of the original
raw file that covers the problem time has been requested,
hopefully allowing a significantly smaller file to be posted or
emailed.
--extract filename
If specified, rather than actually play back the file specified
with -p, ALL raw data between the date ranges is selected and a
subset of that raw file created. The rules for how to interpret
the filename are the same as used for -f.
-f, --filename filename
If specified, this is the name of a file or directory to write
the output to (rather than the terminal). See the description
for details on the format of this field. This requires the -P
flag as well.
--from time range
Play back data starting with this time, which may optionally
include the ending time as well, which is of the format of
[date:]time[-[date:]time]. The leading 0 of the hour is
optional and if the seconds field is not specified it is assumed
to be 0. If no dates are specified the time(s) apply to each file
specified by -p. Otherwise the time(s) only apply to the
first/last dates and any files between those dates will have all
their data reported.
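
The [date:]time[-[date:]time] layout can be sketched with a regular
expression. Several details here are assumptions not stated above: dates
are taken to be 8 digits (yyyymmdd), times are taken to be hh:mm[:ss],
and the function name is hypothetical. It is meant only to show how the
optional pieces nest, not how collectl parses the switch.

```python
import re
from datetime import time

# One point in time: optional 8-digit date, hour, minute, optional seconds.
_POINT = r"(?:(\d{8}):)?(\d{1,2}):(\d{2})(?::(\d{2}))?"
_RANGE = re.compile(rf"{_POINT}(?:-{_POINT})?")


def _point(date, hour, minute, second):
    # Seconds default to 0 when omitted; the leading 0 of the hour is optional.
    return date, time(int(hour), int(minute), int(second or 0))


def parse_time_range(spec):
    """Return ((date, time), (date, time) or None) for a --from argument."""
    match = _RANGE.fullmatch(spec)
    if match is None:
        raise ValueError(f"not of the form [date:]time[-[date:]time]: {spec!r}")
    groups = match.groups()
    start = _point(*groups[0:4])
    end = _point(*groups[4:8]) if groups[5] is not None else None
    return start, end


print(parse_time_range("8:30-20150131:17:00:30"))
```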
--full
Full mode is actually a superset of --verbose and if selected
will force --verbose. It will also force the RECORD separator
to be printed for every interval even if only a single subsystem
was requested and to list the subsystems that follow, placed
after the UTC timestamp, as a parsing aid for those who may
wish to parse the text output rather than the plot data.
--offsettime seconds
This field originally was used before collectl reported the
timezone in the file headers and allowed one to compensate.
Since then it is rarely needed except in two possible cases, one
in which data on two systems is to be compared and they weren’t
synchronized with ntp. This allows all the times to be reported
as shifted by some number of seconds. The other case (and this
is very rare) is when a clock had changed in the middle of a
sample and will not be converted correctly. When this happens
one may have to play back the samples in pieces and manually set
the time offset.
--passwd filename
When reporting usernames associated with a UID, use this file
for the mapping. This is particularly important on systems
running NIS where there are no user names in /etc/passwd.
-p, --playback Filename
Read data from the specified playback file(s), noting that one
can use wildcards in the filename if quoted (if playing back
multiple files to the terminal you probably want to include -m
to see the filenames as they are processed). The filename must
either end in raw or raw.gz. As an added feature, since people
sometimes automate the running of this option and don’t want to
hard code a date, you can specify the string YESTERDAY or TODAY
and they will be replaced in the filename string by the appro-
priate date.
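
The YESTERDAY and TODAY substitution can be pictured as a plain string
replacement. The sketch below assumes the date is written as yyyymmdd,
which is not stated above, and uses a hypothetical function name and a
hypothetical file path; it is not collectl's own code.

```python
from datetime import date, timedelta


def expand_date_keywords(filename, today=None, fmt="%Y%m%d"):
    """Replace YESTERDAY and TODAY in a playback filename with dates."""
    today = today or date.today()
    yesterday = today - timedelta(days=1)
    return (filename.replace("YESTERDAY", yesterday.strftime(fmt))
                    .replace("TODAY", today.strftime(fmt)))


# The path is only an example; use wherever your raw files are written.
print(expand_date_keywords("/var/log/collectl/*YESTERDAY*.raw.gz",
                           today=date(2015, 1, 30)))
```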
--pname name
By default, collectl uses the file /var/run/collectl.pid to
indicate the pid of the running instance of collectl and prevent
multiple copies from being run. If you DO want to run a second
copy, this switch will cause collectl to change its process name
to collectl-name and use that name as the associated pid file as
well.
--procanalyze
When specified and there is process data in the raw file, a
summary file will be generated with one entry per unique process
containing such things as the total cpu consumed for both user and
system, min/max utilization of various memory types, total page
faults and several others.
--slabanalyze
When specified and there is slab data in the raw file, a summary
file will be generated with one entry per unique slab containing
data on physical memory usage by that slab.
--thru time
Time thru which to play back a raw file. See --from for more details.
Common Switches - both record and playback modes
-d, --debug debug
Control the level of debugging information, not typically used.
For details see the source code.
-h, --help, -x, --helpext, -X, --helpall
Display standard, extended help message (which doesn’t include
the optional displays such as --showoptions, --showsubsys,
--showsubopts, --showtopopts) or everything.
--hr, --headerrepeat num
Sets the number of intervals to display data for before
repeating the header. A value of -1 will prevent any headers from being
displayed and a value of 0 will cause only a single header to be
displayed and never repeated.
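
The three cases for the header repeat count can be summarized in a small
predicate. This is a sketch of the behavior as described, with a
hypothetical function name; it assumes intervals are counted from 0 and
that the header, when repeated, appears every num intervals.

```python
def header_due(interval_index, repeat):
    """Decide whether a header precedes the given interval (0-based)."""
    if repeat == -1:      # never print a header
        return False
    if repeat == 0:       # print a single header, never repeat it
        return interval_index == 0
    return interval_index % repeat == 0


print([header_due(i, 3) for i in range(7)])
```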
--iosize
In brief mode, include iosize with disk, infiniband and network
data.
-l, --limits limit
Override one or more default exception limits. If more than one
limit they must be separated by hyphens. Current values are:
SVC:value
Report partition activity with Service times >= 30 msec
IOS:value
Report device activity with 10 or more reads or writes
per second
LusKBS:value
Report client or OSS activity greater than limit. Only
applies to Client Summary or OSS Detail reporting.
[default=100000]
LusReints:value
Report MDS activity with Reint greater than limit. Only
applies to MDS Summary reporting. [default=1000]
AND
Both the IOS and SVC limits must be reached before a
device is reported. This is the default value and is
only included for completeness.
OR
Report device activity if either IOS or SVC thresholds
are reached.
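
How hyphen-separated limits and the AND/OR keywords might combine is
sketched below. The defaults are read from the values quoted above (30
msec, 10 I/Os per second, 100000, 1000); the function names, the
"combine" key and the example numbers are hypothetical, and this is not
collectl's implementation.

```python
DEFAULT_LIMITS = {"SVC": 30, "IOS": 10, "LusKBS": 100000, "LusReints": 1000}


def parse_limits(spec):
    """Apply a -l argument such as 'SVC:50-OR' on top of the defaults."""
    limits = dict(DEFAULT_LIMITS, combine="AND")  # AND is the default
    for item in spec.split("-"):
        if item in ("AND", "OR"):
            limits["combine"] = item
            continue
        name, colon, value = item.partition(":")
        if not colon or name not in DEFAULT_LIMITS:
            raise ValueError(f"unknown limit {item!r}")
        limits[name] = int(value)
    return limits


def device_reported(svc_msec, ios_per_sec, limits):
    """True if a device would count as an exception under these limits."""
    svc_hit = svc_msec >= limits["SVC"]
    ios_hit = ios_per_sec >= limits["IOS"]
    if limits["combine"] == "OR":
        return svc_hit or ios_hit
    return svc_hit and ios_hit


print(device_reported(40, 5, parse_limits("SVC:30-IOS:10")))
print(device_reported(40, 5, parse_limits("SVC:30-IOS:10-OR")))
```

With the default AND, a device with slow service times but little
activity is not reported; with OR it is.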
-L, --lustsvcs [c|m|o][:seconds]
This switch limits which services lustre checks for and
the frequency of those checks. For more information see
the man page collectl-lustre.
-m, --messages
Write status to a monthly log file in the same directory as the
output file (requires -f to be specified as well). The name of
the file will be collectl-yyyymm.log and will track various mes-
sages that may get generated during every run of collectl.
-N, --nice
Set priority to a nicer one of 10.
-o, --options Options
These apply to the way output is displayed OR written to a plot
file. They do not affect the way data is selected for
recording. Most of these switches work in both record as well as
playback mode. If you’re not sure, just try it.
1
Data in plotting format should use 1 decimal point of
precision as appropriate.
2
Data in plotting format should use 2 decimal points of
precision as appropriate.
a
Always append data to an existing plot file. By default
if a plot file exists, the playback file will be skipped
as a way of assuring it is associated with a single
recorded file. This switch overrides that mechanism
allowing multiple recorded files to be processed and
written to a single plot file.
c
Always open newly named plot files in create mode,
overwriting any old ones that may already exist. If one
processes multiple files for the same day in append mode
multiple times, the same data will be appended to the
same file multiple times. This assures a new file is
created at the start of the processing.
d
For use with terminal output and brief mode. Precede
each line with a date/time stamp, the date being in mm/dd
format. This option can also be applied to plot format,
which will cause the date portion to also be displayed in
this format as opposed to D format.
D
For use with terminal output and brief mode. Precede
each line with a date/time stamp, the date being in
yyyymmdd format.
g
For use with terminal output and brief mode. When
displaying values of 1G or greater there is limited
precision for 1 digit values. This option provides a way to
display additional digits for more granularity by
substituting a "g" for the decimal point rather than the
trailing "G".
G
For use with terminal output and brief mode. This is
similar to "g" but preserves the trailing "G" by sacri-
ficing a digit of granularity.
m
Whenever times are reported in plot format, in the normal
terminal reporting format at the beginning of each
interval or when one of the time reporting options (d, D,
T or U) is selected, append the milliseconds to the time.
n
Where appropriate, data such as disk KBs or transfers are
normalized to units per second by taking the change in a
counter and dividing by the number of seconds in that
interval. In the case of CPUs, utilization (calculated
in jiffies) is normalized as a percentage of the inter-
val.
Normalization can be disabled via this option, the result
being the reported values are not divided by the duration
of the interval. This can be particularly useful for
reporting values that are < 1/2 the sampling interval, which
would otherwise be rounded to 0.
T
For use with terminal output and brief mode, precedes
each line with a time stamp.
u
Create plot files with unique names by including the
starting time of a collection in the name. This forces
multiple collections taken the same day to be written to
multiple files.
-U or --utc
In plot format only, report timestamps in Coordinated
Universal Time, which is more commonly known as UTC.
x
Report only exception records for selected subsystems.
Exception reporting also requires --verbose. Currently
this only applies to disk detail and Lustre server infor-
mation so one must select at least -s D, l or L for this
to apply. If writing to a detail file, this data will go
into a separate file with the extension X appended to the
regular detail file name.
X
Report both exceptions as well as all details for
selected subsystems, for -s D, l or L only.
z
If the compression library has been installed, all output
files will be compressed by default. This switch tells
collectl not to compress any plottable files. If col-
lectl tries to compress but cannot because the library
hasn’t been installed, it will generate a warning which
can be suppressed with this switch.
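
The normalization described under option n earlier in this list comes
down to dividing a counter's change by the interval length. The sketch
below shows why small counts can display as 0 and what disabling
normalization changes. The function name and sample numbers are
hypothetical, and rounding to a whole number is an assumption about how
the value is displayed.

```python
def reported_value(previous, current, interval_seconds, normalize=True):
    """Value shown for a counter that moved from previous to current."""
    delta = current - previous
    if not normalize:
        return delta                       # raw change over the interval
    return round(delta / interval_seconds)  # units per second


# 3 events in a 10 second interval is 0.3 per second.
print(reported_value(100, 103, 10))                   # normalized
print(reported_value(100, 103, 10, normalize=False))  # not normalized
```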
-P, --plot
Generate output in plot format. This format is space separated
data which consists of a header (prefaced with a # for easy
identification by an analysis program as well as identifying it
as a comment for programs, such as gnuplot, which honor that
convention). When written to disk, which is the typical way
this option is used, summary data elements are written to the
tab file and the detail elements written to one or more files,
one per detail subsystem. If -f is not specified, all output is
sent to the terminal. Output is always one line per sampling
interval.
--stats
This switch will cause brief data to be reported as both totals
and averages after processing one or more files for the same day
or in playback mode.
--statopts option(s)
This switch controls the way brief stats are reported, the
default is to report the totals once, at the end of a day’s
worth of raw files, if more than one.
a - include averages along with totals
i - include the interval data itself, which is the equivalent of
-oA
s - print summary stats at the end of each file processed even
if more than one per day
-s, --subsys subsystem
This field controls which subsystem data is to be collected or
played back. The default for collecting data is "cdn", which
stands for CPU, Disk and Network summary data and the default
for playback is everything that was collected.
The rules for displaying results vary depending on the type of
data selected. If you write data for CPUs and DISKs to a raw
file and play it back with -sc, you will only see CPU data. If
you play it back with -scm you will still only see CPU data
since memory data was not collected. However, when used with
-P, collectl will always honor the subsystems specified with
this switch so in the previous example you will see CPU data
plus memory data of all 0s. To see the current set of default
subsystems, which are a subset of this full list, use -h.
You can also use + or - to add or subtract subsystems to/from
the default values. For example, "-s-cdn+N" will remove cpu,
disk and network monitoring from the defaults while adding
network detail.
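
The effect of + and - on the default subsystem string can be sketched as
follows. This is an illustration with a hypothetical function name; it
covers only single-letter subsystems and does not handle special forms
such as -s-all.

```python
import re


def apply_subsys(spec, defaults="cdn"):
    """Return the subsystem letters selected by a -s argument."""
    if not spec or spec[0] not in "+-":
        return spec  # a plain list replaces the defaults outright
    selected = list(defaults)
    for sign, letters in re.findall(r"([+-])([A-Za-z]*)", spec):
        for letter in letters:
            if sign == "+" and letter not in selected:
                selected.append(letter)
            elif sign == "-" and letter in selected:
                selected.remove(letter)
    return "".join(selected)


print(apply_subsys("-cdn+N"))  # the example from the text above
print(apply_subsys("+m"))
```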
Refer to data definitions on the sourceforge website OR in
/usr/share/collectl/doc/collectl-xxx to see complete descrip-
tions of the data returned.
SUMMARY SUBSYSTEMS
b - buddy info (memory fragmentation)
c - CPU
d - Disk
f - NFS V3 Data
i - Inode and File System
j - Interrupts
l - Lustre
m - Memory
n - Networks
s - Sockets
t - TCP
x - Interconnect
y - Slabs (system object caches)
DETAIL SUBSYSTEMS
This is the set of detail data from which in most cases the cor-
responding summary data is derived. There are currently 2 types
that do not have corresponding summary data and those are "Envi-
ronmental" and "Process". So, if one has 3 disks and chooses
-sd, one will only see a single total taken across all 3 disks.
If one chooses -sD, individual disk totals will be reported but
no totals. Choosing -sdD will get you both.
C - CPU
D - Disk
E - Environmental data (fan, power, temp), via ipmitool
F - NFS Data
J - Interrupts
L - Lustre OST detail OR client Filesystem detail
M - Memory node data, which is also known as numa data
N - Networks
T - 65 TCP counters only available in plot format
X - Interconnect
Y - Slabs (system object caches)
Z - Processes
--showheader
In collectl mode this command will cause the header that is
normally written to a data file to be displayed on the terminal and
collectl then exits. This can be a handy way to get a brief
overview of the system configuration.
--showoptions
This command shows only the portion of the help text that
describes the -o and --options switches to save the time of
wading through the entire help screen.
--showcolheaders
This command shows the first set of headers that will be printed
by collectl and exits. Doesn’t really make sense for multi-sec-
tion output like several sets of verbose or detail data. Also
note that since it requires one monitoring interval to build up
some headers which may be dynamic, it also forces the interval
to 0.
--showsubopts
List all the subsystem specific options
--showtopopts
Show all the different values for the --top type field, which
specify the field(s) by which to sort the data
--showrootslabs
This command only works on systems using the new slab allocator
and will list the root name (these are those entries in
/sys/slab which are not soft links) along with all its alias
names. If a name doesn’t have an alias, it will not appear in
this report.
--showslabaliases
This command only works on systems using the new slab allocator.
Like --showrootslabs, it will name a slab and all its aliases
but rather than show the root slab name it will show one of the
aliases to provide a more meaningful name. If there are any
slabs that only have a single (or no) alias they will not be
included in this report.
--showsubopts
Similar to --showoptions, this command summarizes just the
parameters associated with -O and --subopts.
--showsubsys
Yet another way to summarize a portion of the help text, this
command only shows valid subsystems.
--top [type][,num[,v]]
Include the top "num" consumers by resource for this interval.
The default number is the height of the window if it can be
determined otherwise 24, and the default resource is the total
cpu time which is taken as the sum of SysT and UsrT. See
--showtopopts for a list of other types of data you can sort on.
This switch can also be used with -s in which case a portion of
the window is reserved at the top to fill in the subsystem data,
which is currently in verbose mode though a brief format is con-
templated for some time in the future.
In interactive mode and if not specified, the process monitoring
interval will be set to that for other subsystems. The screen
will be cleared for each interval resulting in a display similar
to the "top" utility. In playback mode the screen will NOT be
cleared. You cannot use this switch in "record" mode.
Finally, if v is specified as the 3rd parameter, the output
scrolls vertically (like playback mode) rather than clearing the
screen between intervals.
--umask mask
Sets collectl’s umask to control output file permissions. Only
root can set the umask. See "man umask" for details.
--utime mask
Write periodic micro-timestamps into raw file at different
points in time for fine grained measurements of operation times.
1 - write timestamps when entering major sections
2 - write timestamps for all /proc accesses except for process
data
4 - write timestamps for /proc data for all processes including
threads
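
Since the argument is called a mask and the listed values are 1, 2 and 4,
it seems reasonable to read it as a set of bits that can be combined.
That reading is an assumption, as are the function and table names below;
the descriptions are copied from the list above.

```python
UTIME_FLAGS = {
    1: "write timestamps when entering major sections",
    2: "write timestamps for all /proc accesses except for process data",
    4: "write timestamps for /proc data for all processes including threads",
}


def decode_utime_mask(mask):
    """List the timestamp categories enabled by a --utime mask."""
    if mask < 0 or mask & ~7:
        raise ValueError(f"mask {mask} has bits outside 1, 2 and 4")
    return [text for bit, text in UTIME_FLAGS.items() if mask & bit]


for line in decode_utime_mask(5):
    print(line)
```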
-v
Show version and whether or not Compression and/or HiResTime
modules have been installed and exit.
-V
Show default parameter and control settings, all of which can be
changed in /etc/collectl.conf
--verbose
Display output in verbose mode. This often displays more data
than in the default mode. When displaying detail data, verbose
mode is forced. Furthermore, if summary data for a single sub-
system is to be displayed in verbose mode, the headers are only
repeated occasionally whereas if multiple subsystems are
involved each needs their own header.
-w
Display data in wide mode. When displaying data on the terminal,
some data is formatted followed by a K, M or G as appropriate.
Selecting this switch will cause the full field to be displayed.
Note that there is no attempt to align data with the column
headings in this mode.
SUBSYSTEM OPTIONS
The following options are subsystem specific and typically filter data
for collection and/or display as well as affect the output format:
--cpufilt [^]perl-regx[,perl-regx...]
Works the same as dskfilt and netfilt, allowing one to select a
subset of CPUs. These filters are also honored by interrupt
reporting.
--cpuopts
z - only applies to cpu details, do not report any CPUs with no
load. In other words all entries are zero except for IDLE.
--dskfilt [^]perl-regx[,perl-regx...]
NOTE - this does NOT affect data collection and ALL disk data
will always be collected, unless --rawdskfilt is specified too.
However, only data for disk names that match the pattern(s) will
be included in the summary totals and displayed when details are
requested. Alternatively, if you preface the first expression
with a caret, all names that match all strings will be excluded
from the summary totals and detail displays rather than
included. If you don’t know perl, a partial string will usually
work too.
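The include/exclude behavior described above can be sketched in Python. This is only an illustration of the stated semantics, not collectl's actual implementation (collectl is written in Perl, and Perl and Python regular expressions differ in some details). The function name filter_names and the sample disk names are hypothetical.

```python
import re

def filter_names(names, spec):
    """Return the names kept by a filter spec such as 'sd,dm' or '^sd,dm'.

    Assumption: a leading caret on the first expression turns the whole
    comma separated list into an exclusion list.
    """
    exclude = spec.startswith("^")
    if exclude:
        spec = spec[1:]
    patterns = [re.compile(p) for p in spec.split(",") if p]
    kept = []
    for name in names:
        matched = any(p.search(name) for p in patterns)
        # Keep matches when including, keep non-matches when excluding.
        if matched != exclude:
            kept.append(name)
    return kept

disks = ["sda", "sdb", "dm-0", "nvme0n1"]    # made-up device names
print(filter_names(disks, "sd"))      # a partial string works as a pattern
print(filter_names(disks, "^sd,dm"))  # drop anything matching sd or dm
```

Because the patterns are searched rather than anchored, a plain substring such as sd behaves as the text says a partial string will.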
--dskopts
f - report some columns as fractions for more precision on
detail output
i - display the i/o sizes in brief mode just like with --iosize
o - exclude unused disks from new file headers and plot data
z - only applies to disk details, do not report any lines with
values of all zeros.
--envopts Environmental Options
The default is to display ALL data but the following will cause
a subset to be displayed
f - display fan data
p - display current (power) data
t - display temperature data
C - convert temperature to Celsius if in Fahrenheit
F - convert temperature to Fahrenheit if in Celsius
M - display each type of data on separate line
T - display data truncated to whole integers (some implementations
displayed them with fractional components)
9 - any number, will tell ipmitool to read on this device number
--envfilt regx
If specified, this regx is evaluated against each line of data
returned by ipmitool and only those that match are retained.
All other data is lost.
--envremap perl-regx,...
If specified as a comma separated list of perl regular substitu-
tion expressions without the =~s portion, each expression is
applied to each environmental field name, thereby allowing one
to rename the column headers. This can be most useful when running
on heterogeneous systems and you want consistent column
names.
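As a rough sketch of the renaming idea, the Python below applies a comma separated list of substitution expressions to a list of field names. It assumes each expression is written as /pattern/replacement/ with a slash delimiter; that delimiter, the function name remap_names and the sample field names are all assumptions made for illustration, not taken from collectl.

```python
import re

def remap_names(names, spec):
    """Apply each '/pattern/replacement/' expression to every field name.

    Assumptions: '/' is the delimiter, it does not occur inside the
    pattern or replacement, and no regex flags follow the last slash.
    """
    rules = []
    for expr in spec.split(","):
        _, pattern, replacement, _ = expr.split("/")
        rules.append((re.compile(pattern), replacement))
    remapped = []
    for name in names:
        for pattern, replacement in rules:
            # count=1 mirrors a Perl s/// without the g modifier.
            name = pattern.sub(replacement, name, count=1)
        remapped.append(name)
    return remapped

# Hypothetical field names as two different systems might report them.
print(remap_names(["Fan_1", "CPU_Temp"], "/Fan_/Fan/,/CPU_//"))
```

Each expression is applied to each name in the order given, so a later expression sees the result of an earlier one.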
--intfilt [^]perl-regx[,perl-regx...]
NOTE - this does NOT affect data collection, ALL interrupt data
will always be collected. However, only data for interrupts
that match the pattern(s) will be included in the summary totals
and displayed when details are requested. Alternatively, if you
preface the first expression with a caret, all names that match
all strings will be excluded from the summary totals and detail
displays rather than included. If you don’t know perl, a partial
string will usually work too.
NOTE - these expressions are applied to the entire line one sees
in /proc/interrupts, including the interrupt number, name and
even counters so if you do want to include an interrupt number
in the pattern be sure to include the trailing colon as well.
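The reason for the trailing colon can be seen in a small Python sketch. The two lines below are made up in the general shape of /proc/interrupts and the helper name select is hypothetical; the point is only that a pattern searched across the whole line can hit a counter as easily as an interrupt number.

```python
import re

# Two made-up lines in the general shape of /proc/interrupts.
lines = [
    " 19:        120          0   IO-APIC   19-fasteoi   eth0",
    " 24:         19          0   PCI-MSI   512000-edge  ahci",
]

def select(pattern):
    """Return the lines matched by a pattern searched across the whole line."""
    return [line for line in lines if re.search(pattern, line)]

print(len(select("19")))   # 2: matches interrupt 19 and a counter value of 19
print(len(select("19:")))  # 1: only the line for interrupt 19
```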
--lustopts Lustre Options
B - For clients and servers, show buffer stats
D - For MDSs and OSTs AND running earlier versions of HPSFS,
collect disk block iostats
M - For clients, collect metadata
O - For OSTs, show detail level stats
R - For client, collect readahead stats
--memopts Memory Options
R - show memory values (including swap space) as rates of change
as opposed to absolute values. One can also show absolute
changes between intervals by including -on.
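The difference between a rate of change and an absolute change between intervals can be sketched as follows. The function names and sample values are hypothetical, and the sketch assumes a rate is simply the change divided by the interval length in seconds.

```python
def rate(previous, current, interval_secs):
    """Change per second between two samples."""
    return (current - previous) / interval_secs

def absolute_change(previous, current):
    """Raw change between two samples, not normalized by the interval."""
    return current - previous

free_kb_before, free_kb_after = 812000, 806000  # made-up samples
print(rate(free_kb_before, free_kb_after, 10))          # -600.0
print(absolute_change(free_kb_before, free_kb_after))   # -6000
```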
--netfilt [^]perl-regx[,perl-regx...]
NOTE - this does NOT affect data collection and ALL network data
will always be collected, unless --rawnetfilt is specified too.
Also note that by default only eth, ib, em and p1p networks when
present are included in the summary. When this switch is speci-
fied, only data for network names that match the pattern(s) will
be included in the summary and displayed when details are
requested. This switch therefore also gives you the ability to
add other, possibly new, network devices to the summary totals.
Alternatively, if you preface the first expression with a caret,
all names that match all strings will be excluded from the summary
totals and detail displays rather than included. If you
don’t know perl, a partial string will usually work too.
--netopts
e - include network error counts in brief and explicit error
types elsewhere
E - only include lines with network errors in them
i - include i/o sizes in brief mode
o - exclude unused networks from new file headers and plot data
w - set width of network device name
--nfsfilt NFS Filters
Specify one or more comma separated filters as a C/S followed by
an nfs version number and only those will have data reported on.
For example, C2 says to report data on V2 Clients. As a data
collection performance optimization, if one or more client fil-
ters are specified, data will actually be collected for all
clients as is also done for servers.
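The filter format described above (a C or S followed by a version number, separated by commas) can be parsed with a few lines of Python. This is a sketch of the format only; the function name parse_nfs_filters and the returned tuple layout are hypothetical.

```python
def parse_nfs_filters(spec):
    """Parse a spec such as 'C2,S3' into [('client', 2), ('server', 3)]."""
    roles = {"C": "client", "S": "server"}
    parsed = []
    for item in spec.split(","):
        role, version = item[:1], item[1:]
        if role not in roles or not version.isdigit():
            raise ValueError(f"bad NFS filter: {item!r}")
        parsed.append((roles[role], int(version)))
    return parsed

print(parse_nfs_filters("C2,S3"))
```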
--nfsopts NFS Options
z - only display detail lines which have data
--procfilt Process Filters
These filters restrict which processes are selected for collection/display.
Using this filter will significantly reduce the
load on process data collection since collectl creates a black-
list of those existing processes that do not pass the filter and
so are permanently excluded from any future processing.
The format of a filter is a one character type followed by a match
string. Multiple filters may be specified if separated by com-
mas.
c - substring of the command being executed as explicitly read
from /proc/pid/stat. Note that this can actually be a perl
expression, so if you want a command that ends in a particular
string all you need to do is append a $ to the end of the string.
Otherwise it would match any commands containing that string.
C - any command that starts with the specified string
f - full path of the command, including arguments, as read from
/proc/pid/cmdline. Like the c modifier this too can be a perl
expression.
p - pid
P - parent pid
u - any process owned by this user’s UID or in the range specified
by uxxx-yyy
U - any process owned by this username
caution: the process name collectl tries to match with c and C
is the second field in /proc/pid/stat, which may not necessarily
be what you think! e.g. the name for X emacs is actually emacs-x
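The filter types listed above can be sketched in Python as a test of one process against one filter. This is an illustration of the described format, not collectl's code: the function name matches, the dictionary keys and the sample process are hypothetical, and the text does not say how multiple comma separated filters combine, so only a single filter is handled.

```python
import re

def matches(proc, flt):
    """Test one process (a dict) against one filter such as 'cemacs$'."""
    kind, arg = flt[0], flt[1:]
    if kind == "c":    # regex searched within the command name
        return re.search(arg, proc["comm"]) is not None
    if kind == "C":    # command name starts with the string
        return proc["comm"].startswith(arg)
    if kind == "f":    # regex searched within the full command line
        return re.search(arg, proc["cmdline"]) is not None
    if kind == "p":
        return proc["pid"] == int(arg)
    if kind == "P":
        return proc["ppid"] == int(arg)
    if kind == "u":    # a single UID or a low-high range
        low, _, high = arg.partition("-")
        return int(low) <= proc["uid"] <= int(high or low)
    if kind == "U":
        return proc["user"] == arg
    raise ValueError(f"unknown filter type: {kind!r}")

# Made-up process record using the emacs-x name mentioned in the caution.
proc = {"pid": 4242, "ppid": 1, "comm": "emacs-x",
        "cmdline": "/usr/bin/emacs-x notes.txt", "uid": 1000, "user": "alice"}
print(matches(proc, "cemacs"))      # True: substring match
print(matches(proc, "cemacs$"))     # False: the name does not end in emacs
print(matches(proc, "u1000-2000"))  # True: UID falls within the range
```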
--procopts options
These options control the way data is displayed and can also
improve data collection performance
c - include CPU time of children who have exited (same as ps -S)
f - use cumulative totals for page faults in process data
instead of rates
i - show process I/O counters in display instead of default for-
mat
I - disable collection of I/O counters, see note below
k - remove known shells from process names, making it possible
to see actual command
m - show breakdown of memory utilization instead of default for-
mat
p - never look for new pids or threads during data collection
r - show root command name only (no directory) for narrower dis-
play
R - show ALL process priorities (’RT’ currently displayed if
realtime)
s - show process start time in hh:mm:ss format
S - show process start time in mmmdd-hh:mm:ss format
t - include ALL process threads (increases collection overhead)
u - report username as 12 chars instead of 8, noting uxx will
cause column width to be xx but cannot be less than 8
w - widen display by including whole argument string, with
optional max width
x - include extended process attributes (currently only for con-
text switches)
z - exclude any processes with 0 in sort field (in --top mode)
Process data is the most expensive type of data collected, cost-
ing as much as 3 times the CPU load as all other types of data
combined. Collecting thread data makes this even more expen-
sive. One can significantly reduce this load by over 25 percent
by disabling the collection of I/O stats. However, keep in mind
that even if you don’t try to optimize process data collection,
the overall system load by collectl can still be on the order of
about 0.2% when running as a daemon with default collection
rates. See the online documentation on measuring performance
for more information.
A security hole was identified that allowed non-privileged
users to read /proc/pid/io and guess password lengths, and now
many distros restrict access to the owner or root. As a result,
non-privileged users will see all 0 I/O counts for processes
that are not theirs when specifying --procopts i.
--slabfilt Slab Filters
One can specify a list of slab names separated by commas and
only those slabs whose names start with those strings will be
listed or summarized.
--slabopts Slab Options
s - exclude any slabs with an allocation of 0
S - only show those slabs whose allocations changed since last
display
--tcpfilt
These filters actually control both what is collected as well as
displayed. If one selects non-collected filters, 0s will be
reported. There is one special case and that is if one includes
T (tcp extended stats) in the filter string, there are no brief
ones and therefore --verbose will be forced.
i - ip stats
t - tcp stats
u - udp stats
c - icmp stats
I - ip extended stats
T - tcp extended stats
--xopts
i - include i/o sizes in brief mode
DESCRIPTION
The collectl utility is a system monitoring tool that records or dis-
plays specific operating system data for one or more sets of subsys-
tems. Any set of the subsystems, such as CPU, Disks, Memory or Sockets
can be included in or excluded from data collection. Data can either
be displayed back to the terminal, or stored in either a compressed or
uncompressed data file. The data files themselves can either be in raw
format (essentially a direct copy from the associated /proc structures)
or in a space separated plottable format such that it can be easily
plotted using tools such as gnuplot or excel. Data files can be read
and manipulated from the command line, or through use of command
scripts.
Upon startup, collectl.conf is read, which sets a number of default
parameters and switch values. Collectl searches for this file first in
/etc, then in the directory the collectl executable lives in (typically
/usr/sbin) and finally the current directory. These locations can be
overridden with the -C switch. Unless you’re doing something really
special, this file need never be touched. The only exception is perhaps
when choosing to run collectl as a service and you wish to change
its default behavior, which is set by the DaemonCommand entry.
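The search order described above amounts to a first-match lookup over an ordered list of directories. The Python below is a minimal sketch of that idea, not collectl's own code; the function name find_config is hypothetical and /usr/sbin is used only because the text names it as the typical executable directory.

```python
import os

def find_config(search_dirs, filename="collectl.conf"):
    """Return the first existing path for filename in search_dirs, or None."""
    for directory in search_dirs:
        candidate = os.path.join(directory, filename)
        if os.path.isfile(candidate):
            return candidate
    return None

# Order taken from the text; /usr/sbin stands in for the executable's directory.
default_dirs = ["/etc", "/usr/sbin", os.getcwd()]
print(find_config(default_dirs))
```

Because the first match wins, a copy in /etc would shadow one in the current directory under this ordering.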
RESTRICTIONS/PROBLEMS
Thread reporting currently only works with 2.6 kernels.
The pagesize has been hardcoded for perl 5.6 systems to 4096 for IA32
and 16384 for all others. If you are running 5.6 on a system with a
different pagesize you will see incorrect SLAB allocation sizes and
will need to scale the numbers you’re seeing accordingly.
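One way to do that scaling, assuming the reported SLAB sizes are proportional to the page size that was assumed, is to multiply by the ratio of the real page size to the hardcoded one. The function name and the sample numbers below are hypothetical.

```python
def rescale_slab_bytes(reported_bytes, assumed_pagesize, actual_pagesize):
    """Correct a size that was computed with the wrong page size.

    Assumption: the reported value scales linearly with the page size.
    """
    return reported_bytes * actual_pagesize // assumed_pagesize

# Made-up example: 16384 was assumed but the system really uses 65536.
print(rescale_slab_bytes(163840, 16384, 65536))  # 655360
```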
I have recently discovered there is a bug in /proc in that an extra
line is occasionally read with the end of the previous buffer! When
this occurs a message is written (if -m enabled) and always written to
the terminal. Since this happens with a higher frequency with process
data I silently ignore those as the output can get pretty noisy. If
for any reason this is a problem, be sure to let me know.
Since collectl has no control over the frequency at which data gets
written to /proc, one can get anomalous statistics as collectl is only
reporting a snapshot of what is being recorded. For more information
see http://collectl.sourceforge.net/TheMath.html.
At least one network card occasionally generates erroneous network
stats and to try to keep the data rational, collectl tries to detect
this and when it does generates a message that bogus data has been
detected.
FILES, EXAMPLES AND MORE INFORMATION
http://collectl.sourceforge.net OR /opt/hp/collectl/docs
ACKNOWLEDGEMENTS
I would like to thank Rob Urban for his creation of the Tru64 Unix col-
lect tool, which collectl is based on.
AUTHOR
This program was written by Mark Seger (mjseger@gmail.com).
Copyright 2003-2011 Hewlett-Packard Development Company, LP
collectl may be copied only under the terms of either the Artistic
License or the GNU General Public License, which may be found in the
source kit
LOCAL APRIL 2003 COLLECTL(1)