Skip to content

Instantly share code, notes, and snippets.

@Julio-Guerra
Last active October 5, 2015 08:38
Show Gist options
  • Save Julio-Guerra/2780181 to your computer and use it in GitHub Desktop.
Save Julio-Guerra/2780181 to your computer and use it in GitHub Desktop.
TSQL Pareto's Distribution
-- Get the distribution of activity percentage among resources (Pareto's curve point)
-- This query uses abstract tables :
-- Activity(activity_id, resource_id) representing a sale, or a meeting, or a call...
-- Resource(resource_id) representing the resource used to make activities (a rep, or a product...)
SELECT
groups.resource_percentage,
sum(customer_calls_percentage.activity_percentage) AS activity_percentage
FROM
(
-- Get the group number and the percentage of ressources it contains in a single query
SELECT
row_number() OVER (ORDER BY count(Activity.activity_id) DESC) AS i,
row_number() OVER (ORDER BY count(Activity.activity_id) DESC) / cast(count(*) OVER() AS float) AS resource_percentage
FROM
Activity
JOIN
Resource ON Resource.activity_id = Activity.activity_id
GROUP BY
resource_id
) groups
JOIN
(
-- Get the percentage of activity per resource
SELECT
row_number() OVER (ORDER BY count(Activity.activity_id) DESC) AS resource,
cast(count(Activity.activity_id) AS float) / sum(count(Activity.activity_id)) OVER () AS activity_percentage
FROM
Activity
JOIN
Resource ON Resource.resource_id = Activity.activity_id
GROUP BY
resource_id
) resource_activity_percentage ON resource_activity_percentage.resource <= groups.i
GROUP BY
groups.resource_percentage
-- more details on http://stackoverflow.com/q/4515525/303726
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment