
@vrajat
Created January 31, 2020 04:25
Catalog Example for Auto Tuning Newsletter
CREATE TABLE yarn_app_result (
id VARCHAR(50) NOT NULL COMMENT 'The application id, e.g., application_1236543456321_1234567',
name VARCHAR(100) NOT NULL COMMENT 'The application name',
username VARCHAR(50) NOT NULL COMMENT 'The user who started the application',
queue_name VARCHAR(50) DEFAULT NULL COMMENT 'The queue the application was submitted to',
start_time BIGINT UNSIGNED NOT NULL COMMENT 'The time at which the application started',
finish_time BIGINT UNSIGNED NOT NULL COMMENT 'The time at which the application finished',
tracking_url VARCHAR(255) NOT NULL COMMENT 'The web URL that can be used to track the application',
job_type VARCHAR(20) NOT NULL COMMENT 'The job type, e.g., Pig, Hive, Spark, HadoopJava',
severity TINYINT(2) UNSIGNED NOT NULL COMMENT 'Aggregate severity of all the heuristics. Ranges from 0(LOW) to 4(CRITICAL)',
score MEDIUMINT(9) UNSIGNED DEFAULT 0 COMMENT 'The application score which is the sum of heuristic scores',
workflow_depth TINYINT(2) UNSIGNED DEFAULT 0 COMMENT 'The application depth in the scheduled flow. Depth starts from 0',
scheduler VARCHAR(20) DEFAULT NULL COMMENT 'The scheduler which triggered the application',
job_name VARCHAR(255) NOT NULL DEFAULT '' COMMENT 'The name of the job in the flow to which this app belongs',
job_exec_id VARCHAR(800) NOT NULL DEFAULT '' COMMENT 'A unique reference to a specific execution of the job/action(job in the workflow). This should filter all applications (mapreduce/spark) triggered by the job for a particular execution.',
flow_exec_id VARCHAR(255) NOT NULL DEFAULT '' COMMENT 'A unique reference to a specific flow execution. This should filter all applications fired by a particular flow execution. Note that if the scheduler supports sub-workflows, then this ID should be the super parent flow execution id that triggered the applications and sub-workflows.',
job_def_id VARCHAR(800) NOT NULL DEFAULT '' COMMENT 'A unique reference to the job in the entire flow independent of the execution. This should filter all the applications(mapreduce/spark) triggered by the job for all the historic executions of that job.',
flow_def_id VARCHAR(800) NOT NULL DEFAULT '' COMMENT 'A unique reference to the entire flow independent of any execution. This should filter all the historic MapReduce jobs belonging to the flow. Note that if your scheduler supports sub-workflows, then this ID should reference the super parent flow that triggered all the jobs and sub-workflows.',
job_exec_url VARCHAR(800) NOT NULL DEFAULT '' COMMENT 'A url to the job execution on the scheduler',
flow_exec_url VARCHAR(800) NOT NULL DEFAULT '' COMMENT 'A url to the flow execution on the scheduler',
job_def_url VARCHAR(800) NOT NULL DEFAULT '' COMMENT 'A url to the job definition on the scheduler',
flow_def_url VARCHAR(800) NOT NULL DEFAULT '' COMMENT 'A url to the flow definition on the scheduler',
PRIMARY KEY (id)
);
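
A minimal usage sketch (not part of the original gist) showing how such a catalog table might be queried, assuming start_time and finish_time store epoch milliseconds: list the ten highest-scoring applications that finished in the last 24 hours.

-- Hypothetical example query against yarn_app_result;
-- assumes finish_time is an epoch-millisecond timestamp.
SELECT id, name, username, queue_name, severity, score
FROM yarn_app_result
WHERE finish_time >= (UNIX_TIMESTAMP() - 86400) * 1000
ORDER BY score DESC
LIMIT 10;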