Skip to content

Instantly share code, notes, and snippets.

@Heray
Last active March 15, 2019 22:09
Show Gist options
  • Save Heray/3887bb8850022ef4d8991d13d538da73 to your computer and use it in GitHub Desktop.
Save Heray/3887bb8850022ef4d8991d13d538da73 to your computer and use it in GitHub Desktop.
Content Scoring
with threshold as (
SELECT percentile_cont(0.25) within group (order by CTR desc) as threshold25,
percentile_cont(0.50) within group (order by CTR desc) as threshold50,
percentile_cont(0.75) within group (order by CTR desc) as threshold75,
count(*) as total_items
FROM items
),
final as (
SELECT
items.id,
items.url,
ceiling(items.time_rank*5.0/(select total_items from threshold)) as time_bucket,
case
when items.clicks is null or items.clicks = 0 then 1
when items.clicks = 1 then 2
else ceiling(items.CTR_rank*5.0/(select total_items from threshold))
end as CTR_bucket,
case
when items.cat <> 'news' then 1
else GREATEST(1,3.5 - extract('day' from (getdate() - items.posted_at)))
end as fresh_bucket
FROM items
ORDER BY time_spent
)
SELECT
CTR_bucket * time_bucket * fresh_bucket as score, *
FROM final
ORDER BY score DESC, random()
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment