Skip to content

Instantly share code, notes, and snippets.

@robcowie
Created May 4, 2020 13:08
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save robcowie/19cb478c1ff08b350fdc40ecf1bad108 to your computer and use it in GitHub Desktop.
Save robcowie/19cb478c1ff08b350fdc40ecf1bad108 to your computer and use it in GitHub Desktop.
Airflow Task & DAG Duration Queries
-- duration of task instances
select
date(execution_date) as dt,
avg(duration) as avg_duration,
min(duration) as min_duration,
max(duration) as max_duration
from
task_instance
group by
date(execution_date)
order by
date(execution_date) desc
limit 100;
-- Total completed tasks
SELECT
COUNT(1)
FROM
task_instance
WHERE
state IS NOT NULL
AND
state NOT IN ('scheduled', 'queued');
-- avg duration per dag, matched by dag_id, with outlier exclusions
select min(duration) / 60, max(duration) / 60, avg(duration) / 60 FROM (select TIMESTAMPDIFF(SECOND, start_date, end_date) as duration from dag_run where execution_date >= '2020-01-01 00:00:00' and dag_id not like '%') a WHERE duration < 7200 and duration > 20
-- Avg num dag runs per day
select avg(ct) from (select date(execution_date), count(*) as ct from dag_run where execution_date >= '2020-01-01 00:00:00' and dag_id not like '%hourly%' and dag_id not like 'onboarding%' group by date(execution_date)) a
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment