Skip to content

Instantly share code, notes, and snippets.

Embed
What would you like to do?
Example Redshift Query
SELECT *
FROM
(SELECT
month,
amount,
pledge_count,
SUM(1) OVER(PARTITION BY month ORDER BY pledge_count DESC ROWS UNBOUNDED PRECEDING) as row
FROM
(SELECT
TO_CHAR(CONVERT_TIMEZONE('UTC', 'America/New_York', backings.pledged_at), 'YYYY-MM-01') as month,
backings.amount as amount,
COUNT(DISTINCT backings.id) AS pledge_count
FROM
backings
GROUP BY month, backings.amount) AS backings_per_month
ORDER BY month)
WHERE row <= 10;
@hadley

This comment has been minimized.

Copy link

commented Nov 20, 2013

Goal of dplyr is to make it so you could express that in R with code that looks like this:

backings <- tbl(my_redshift, "backing")

backings_per_month <- backings %.% 
  mutate(month = TO_CHAR(CONVERT_TIMEZONE('UTC', 'America/New_York', pledged_at), 'YYYY-MM-01')) %.% 
  group_by(backings, month) %.% 
  summarise(pledge_count = n())

top10 <- backings_per_month() %.% 
  mutate(row = rownum()) %.%
  arrange(month) %.%
  filter(row <= 10)

For redshift, dplyr could know that rownum() doesn't exist and it should use windowed sum instead.

x %.% f(...) is a shortcut for f(x, ...) that makes it easier to read from left to right

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.