Skip to content

Instantly share code, notes, and snippets.

@sh16ma
Last active January 19, 2022 10:09
Show Gist options
  • Save sh16ma/ebee5fff3f44451cbcebfa7d408a160f to your computer and use it in GitHub Desktop.
Save sh16ma/ebee5fff3f44451cbcebfa7d408a160f to your computer and use it in GitHub Desktop.
#🔎 #BigQuery #クロス集計 #DS100本ノック #SQL
#standaardSQL
with gender_era_amount as (
select
c.gender_cd as gender
, trunc(c.age/10) * 10 as era
, sum(r.amount) as amount
from `prj-test3.100knocks.customer` as c
join `prj-test3.100knocks.receipt` as r
using(customer_id)
group by
gender
, era
)
select
era
, max(case gender when 0 then amount else 0 end) as male
, max(case gender when 1 then amount else 0 end) as female
, max(case gender when 9 then amount else 0 end) as unknown
from gender_era_amount
group by era
order by era
;
@sh16ma
Copy link
Author

sh16ma commented Nov 17, 2021

| S-043 ★
レシート明細テーブル(receipt)と顧客テーブル(customer)を結合し、
性別 (gender)と年代(ageから計算)ごとに売上金額(amount)を合計した売上サマリテー ブル(sales_summary)を作成せよ。
性別は0が男性、1が女性、9が不明を表すものとする。
ただし、項目構成は年代、女性の売上金額、男性の売上金額、性別不明の売上金額の4項 目とすること(縦に年代、横に性別のクロス集計)。
また、年代は10歳ごとの階級とすること。

解 : クエリ完了(経過時間: 1.1 秒、処理されたバイト数: 3.1 MB)

@sh16ma
Copy link
Author

sh16ma commented Nov 17, 2021

① WITH句内クエリ分解

select
        c.gender_cd as gender
        , trunc(c.age/10) * 10 as era
        , sum(r.amount) as amount
from `prj-test3.100knocks.customer` as c
join `prj-test3.100knocks.receipt` as r
using(customer_id)
group by 
        gender
        , era
 order by era
limit 10
;

② WITH句直下クエリ分解

select 
    era 
    , case gender when 0 then amount else 0 end as male
    , case gender when 1 then amount else 0 end as female
    , case gender when 9 then amount else 0 end as unknown
from gender_era_amount
order by era
limit 10
;

最終的に「era」でグループ化させて、各属性の最大値を集計関数でまとめている。

select 
    era 
    , max(case gender when 0 then amount else 0 end) as male
    , max(case gender when 1 then amount else 0 end) as female
    , max(case gender when 9 then amount else 0 end) as unknown
from gender_era_amount
group by era
order by era
;

@sh16ma
Copy link
Author

sh16ma commented Nov 17, 2021

自身が記述した💩 コード

with
    tb as (
        select
            sales_ymd
            , gender
            , age
            , case  
                when age < 20 then "10's"
                when age between 20 and 29 then "20's"
                when age between 30 and 39 then "30's"
                when age between 40 and 49 then "40's"
                when age between 50 and 59 then "50's"
                when age between 60 and 69 then "60's"
                when age between 70 and 79 then "70's"
                else "Over80's"
             end as age_category
            , amount
        from
            `prj-test3.100knocks.receipt`
        left join
            `prj-test3.100knocks.customer`
            using(customer_id)
        where gender is not null
    )
select
    *
from
    (select
        age_category
        , sum(amount) as male
    from tb
    where gender = "男性"
    group by age_category)
    left join
    (select
        age_category
        , sum(amount) as female
    from tb
    where gender = "女性"
    group by age_category) using(age_category)
    left join
    (select
        age_category
        , sum(amount) as unknown
    from tb
    where gender = "不明"
    group by age_category) using(age_category)
order by age_category
;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment