Skip to content

Instantly share code, notes, and snippets.

✅ 1. Import input.csv into any SQL database

✅ 2. Check transformations.xls schema description. Left columns contains description of input data. Right columns contains description of expected output data.

✅ 3. Provide SQL query, which will read input data, make requested transformations and show calculated output.

I have chosen BigQuery as my database of choice. Fields formatted '12/2/2016' are NOT accepted by the database as DATE data type field as suggested in the instruction. I used workaround and table with instructed Input Feed Schema and data types has been placed in staging folder: models/staging/stg_customer_details.sql

Create GCP bucket and copy input.csv file there:

@bogumilo
bogumilo / transactions-repeated-within-5-minutes-cnt.sql.sql
Created April 13, 2023 18:57
Count potentially erroneous repeated payments (same marketplace_merchant_id, credit_card_id, amount) that occurred within 5 minutes from each other.
with
w_prev_transaction as (
select
*,
lag(transaction_ts) over( partition by marketplace_merchant_id, credit_card_id, amount
order by transaction_ts
) as prev_transaction_ts
from transactions
)
select
@bogumilo
bogumilo / all-combinations-of-four-car-features.sql
Created April 13, 2023 18:41
All combinations of four car features and corresponding cost of resulting car equipment versions.
select
concat(p1.accessory_name,',', p2.accessory_name,',',p3.accessory_name,',',p4.accessory_name) as car_equipment_version,
p1.feature_cost + p2.feature_cost + p3.feature_cost + p4.feature_cost as total_cost
from car_features p1
inner join car_features p2
on p1.accessory_name < p2.accessory_name
inner join car_features p3
on p2.accessory_name < p3.accessory_name
inner join car_features p4
on p3.accessory_name < p4.accessory_name
@bogumilo
bogumilo / user-pct-with-3-days-time-to-conversion.sql
Created March 9, 2023 14:18
Share of users who make a purchase within 3 days after signing up (Time To Conversion)
with
w_first_purchase as (
select
u.user_id,
u.signup_date,
first(o.order_date) over (
partition by o.user_id order by o.order_date
) as first_order_date
from users u
join orders o using ( user_id )
@bogumilo
bogumilo / called-customer-service-at-least-3-times.sql
Created January 20, 2023 11:31
Count customers who called support at least 3 times
with
calls_by_customer as (
select customer_id, count(call_id) as calls_cnt from callers group by 1
)
select count(*) as customer_count
from calls_by_customer
where calls_cnt > 2
@bogumilo
bogumilo / non-categorised-calls-percentage.sql
Last active January 20, 2023 11:27
Percentage of non-categorised customer support calls
with
non_categorised as (
select
*,
case
when call_category in ('payments', 'deliveries', 'orders', 'website')
then 0
else 1
end as na_flag
from callers
@bogumilo
bogumilo / top3-writers-from-bestsellers-list.sql
Last active January 20, 2023 11:16
Outputs top 3 writers based on number of their books appearance on the besteseller lists. It uses continuous rank therefore output can give more than 3 writers if their rank ex aequo
with
count_top10_by_author as (
select
author_name, row_number() over (partition by author_id) as appearance_count
from global_book_rank
left join books using (book_id)
left join authors using (author_id)
where global_book_rank.rank < 11
),
top_authors_appearance as (
@bogumilo
bogumilo / mean-mode-median.sql
Created January 20, 2023 10:50
Statistics in PostgreSQL: mean, mode and median for e.g. customer service department and their received calls workload
select
round(avg(calls_count), 0) as mean,
mode() within group (order by calls_count) as mode,
percentile_cont(0.5) within group (order by calls_count) as median
from customer_service_stats
@bogumilo
bogumilo / recalled-customer-service-within-3-days.sql
Last active January 20, 2023 10:41
Count customers that called again within 3 days from their last call to customer service
with
with_previous_call as (
select
c.*,
lag(call_received) over (
partition by customer_id order by call_received
) as previous_call
from callers c
),
callers_flag as (
@bogumilo
bogumilo / median-number-of-items.sql
Created January 20, 2023 10:30
Median number of items using generate_series()
with
series as (
select item_count
from items_per_order
group by item_count, generate_series(1, order_occurrences)
)
select
round(percentile_cont(0.5) within group (order by item_count)::decimal, 1) as median
from series