Created June 18, 2015 20:21 (gist Airistotal/fe1fbc8f9561b13d97c7)
<?xml version="1.0"?>
<!-- If job_metrics.xml exists, this file defines the default job metric
     plugins used for all jobs. Individual job_conf.xml destinations can
     disable metric collection by setting metrics="off" on that destination.
     The metrics attribute on destination definition elements can also be
     a path, in which case that XML metrics file will be loaded and used for
     that destination. Finally, the destination element may contain a
     job_metrics child element (with all the options defined below) to
     define job metrics in an embedded manner directly in the job_conf.xml
     file.
-->
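<!-- A minimal sketch, not taken from the Galaxy distribution, of the three
     per-destination options described above as they might appear in the
     destinations section of job_conf.xml. It assumes a job runner plugin
     with id "local" is defined. The destination ids and the metrics file
     path are hypothetical placeholders.

     <destination id="metrics_off" runner="local" metrics="off" />
     <destination id="metrics_from_file" runner="local"
                  metrics="/path/to/alternate_job_metrics.xml" />
     <destination id="metrics_embedded" runner="local">
         <job_metrics>
             <core />
             <meminfo />
         </job_metrics>
     </destination>

     The embedded job_metrics element takes the same plugin elements and
     options that this file documents.
-->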
<job_metrics>
    <!-- Each element in this file corresponds to a job instrumentation plugin
         (found in lib/galaxy/jobs/metrics/instrumenters) used to generate
         metrics. -->
    <!-- The core plugin captures Galaxy slots and the start and end of each
         job (in seconds since the epoch), and computes the runtime in
         seconds. -->
    <core />
    <!-- Uncomment to dump the processor count for each job (Linux only). -->
    <cpuinfo />
    <!-- Uncomment to dump information about all processors for each job;
         this is likely too much data. Linux only. -->
    <!-- <cpuinfo verbose="true" /> -->
    <!-- Uncomment to dump system memory information for each job (Linux
         only). -->
    <meminfo />
    <!-- Uncomment to record the operating system each job is executed on
         (Linux only). -->
    <!-- <uname /> -->
    <!-- Uncomment the following to enable a plugin that dumps the complete
         environment for each job, potentially useful for debugging. -->
    <!-- <env /> -->
    <!-- The env plugin can also record a more targeted set of obviously
         useful variables. -->
    <!-- <env variables="HOSTNAME,SLURM_CPUS_ON_NODE,SLURM_JOBID" /> -->
    <collectl />
    <!-- Collectl (http://collectl.sourceforge.net/) is a powerful monitoring
         utility capable of gathering numerous system and process level
         statistics of running applications. By default, the Galaxy collectl
         job metrics plugin grabs a variety of process level metrics
         aggregated across all processes corresponding to a job. This
         behavior is highly customizable, either through the attributes
         documented below or by simply hacking up the code in
         lib/galaxy/jobs/metrics.

         Warning: In order to use this plugin, collectl must be available on
         the compute server the job runs on and on the local Galaxy server
         as well (unless, in the latter case, summarize_process_data is set
         to False).

         Attributes (the following describes attributes that can be used
         with the collectl job metrics element above to modify its
         behavior):
         'summarize_process_data': Boolean indicating whether to run collectl
             in playback mode after jobs complete and gather process level
             statistics for the job run. These statistics can be customized
             with the 'process_statistics' attribute. (Defaults to True.)
         'saved_logs_path': If set (it is off by default), all collectl logs
             will be saved to the specified path after jobs complete. These
             logs can later be replayed using collectl offline to generate
             full time-series data corresponding to a job run.
         'subsystems': Comma-separated list of collectl subsystems to collect
             data for. The plugin doesn't currently expose all of them or
             offer summary data for any of them except 'process', but
             extensions would be welcome. It may seem pointless to include
             subsystems besides 'process', since Galaxy won't process them
             online, but if 'saved_logs_path' is set, these files can be
             played back at any time. Available subsystems: 'process',
             'cpu', 'memory', 'network', and 'disk'. (Defaults to 'process'.)
             Warning: If you override this, be sure to include 'process'
             unless 'summarize_process_data' is set to False.
         'process_statistics': If 'summarize_process_data' is enabled, this
             attribute can be specified as a comma-separated list to override
             the statistics that are gathered. Each statistic has the form
             X_Y, where X is one of 'min', 'max', 'count', 'avg', or 'sum',
             and Y is one of 'S', 'VmSize', 'VmLck', 'VmRSS', 'VmData',
             'VmStk', 'VmExe', 'VmLib', 'CPU', 'SysT', 'UsrT', 'PCT',
             'AccumT', 'WKB', 'RKBC', 'WKBC', 'RSYS', 'WSYS', 'CNCL', 'MajF',
             'MinF'. Consult lib/galaxy/jobs/metrics/collectl/processes.py
             for more details on what each of these resource types means.
             Defaults to
             'max_VmSize,avg_VmSize,max_VmRSS,avg_VmRSS,sum_SysT,sum_UsrT,max_PCT,avg_PCT,max_AccumT,sum_RSYS,sum_WSYS',
             a variety of statistics roughly describing the CPU and memory
             usage of the program and VERY ROUGHLY describing its I/O
             consumption.
         'procfilt_on': By default, Galaxy tells collectl to only collect
             'process' level data for the current user, identified by
             username (the default value, 'username'). This filtering can be
             disabled by setting this attribute to 'none'. The plugin will
             still only aggregate process level statistics for the job's
             process tree, but the additional information can still be used
             offline with 'saved_logs_path' if set. Obscurely, this can also
             be set to 'uid' to filter on the current user's UID instead of
             the username; this may be needed on some clusters(?).
         'interval': The time (in seconds) between data collection points.
             If this is not set, collectl uses a variety of different
             defaults for different subsystems, but process information
             (likely the most pertinent for Galaxy jobs) is collected every
             60 seconds.
         'flush': Interval (in seconds, I think) between times when collectl
             flushes its buffer to disk. If this is not set, Galaxy overrides
             it to disable flushing by default.
         'local_collectl_path', 'remote_collectl_path', 'collectl_path':
             By default, jobs simply assume collectl is on the PATH, but this
             can be overridden with 'local_collectl_path' and
             'remote_collectl_path' (or simply 'collectl_path' if collectl is
             not on the PATH but is installed in the same location both
             locally and remotely).
         There are further, increasingly obscure options, including
         log_collectl_program_output, interval2, and interval3. Consult the
         source code for more details.
    -->
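    <!-- A hedged sketch of a collectl setup aimed at offline analysis. It
         would replace the <collectl /> element above rather than sit
         alongside it. It keeps the online process summary, also records
         the 'cpu' and 'memory' subsystems, and saves the raw logs so they
         can be played back later. The log directory is a hypothetical
         placeholder. 'process' stays in 'subsystems' because
         'summarize_process_data' remains enabled. -->
    <!-- <collectl summarize_process_data="true"
                   subsystems="process,cpu,memory"
                   saved_logs_path="/path/to/collectl_logs" /> -->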
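    <!-- Illustration (an assumption-labeled sketch) of narrowing
         'process_statistics' to memory use. Following the X_Y form
         described in the collectl notes, max_VmSize requests the maximum
         of VmSize (virtual memory size), and max_VmRSS and avg_VmRSS
         request the maximum and average of VmRSS (resident set size). How
         values are aggregated across a job's processes is defined in
         lib/galaxy/jobs/metrics/collectl/processes.py. This would replace
         the <collectl /> element above. -->
    <!-- <collectl process_statistics="max_VmSize,max_VmRSS,avg_VmRSS" /> -->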
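    <!-- One more hedged sketch: collectl installed outside the PATH, in
         different locations on the Galaxy server and on the compute nodes,
         with processes filtered by UID and data collected every 10 seconds.
         Both paths are hypothetical placeholders and the interval value is
         only illustrative. This would replace the <collectl /> element
         above. -->
    <!-- <collectl procfilt_on="uid"
                   interval="10"
                   local_collectl_path="/opt/collectl/bin/collectl"
                   remote_collectl_path="/usr/local/bin/collectl" /> -->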
</job_metrics>