
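2021/10/28 (Thu) Social trends in synthetic data
Synthetic data in official statistics
 Synthetic data: micro-level data whose attribute values have been generated artificially from some statistic, statistical model, or similar
 In Europe and North America, synthetic data is already in use, for example in public use files (PUF), i.e. microdata released to the general public
  Uses: missing-value imputation, simulation
Methods
 Generating synthetic data from the original microdata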
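As a hedged illustration of the simplest way to generate synthetic records from original microdata (not necessarily a method covered in this note), each attribute can be sampled independently from its observed values. The function name `synthesize_marginals` and the sample rows (age, sex, income, taken from the adult.csv preview below) are illustrative assumptions.

```python
import random
from typing import List, Sequence

# Illustrative original microdata: (age, sex, income) from the adult.csv preview.
ORIGINAL = [
    ["39", "Male", "<=50K"],
    ["50", "Male", "<=50K"],
    ["28", "Female", "<=50K"],
    ["31", "Female", ">50K"],
]


def synthesize_marginals(rows: Sequence[Sequence[str]], n: int, seed: int = 0) -> List[List[str]]:
    """Hypothetical helper: generate n synthetic records by drawing each column
    independently from its empirical (observed) values in the original data."""
    rng = random.Random(seed)           # seeded for reproducible output
    columns = list(zip(*rows))          # column-wise view of the original rows
    return [[rng.choice(col) for col in columns] for _ in range(n)]


if __name__ == "__main__":
    for record in synthesize_marginals(ORIGINAL, 5, seed=1):
        print(record)
```

This keeps each attribute's marginal distribution (in expectation) but deliberately discards relationships between attributes, such as the link between age and income; model-based methods exist precisely to retain more of that joint structure.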
39,State-gov,77516,Bachelors,13,Never-married,Adm-clerical,Not-in-family,White,Male,2174,0,40,United-States,<=50K
50,Self-emp-not-inc,83311,Bachelors,13,Married-civ-spouse,Exec-managerial,Husband,White,Male,0,0,13,United-States,<=50K
38,Private,215646,HS-grad,9,Divorced,Handlers-cleaners,Not-in-family,White,Male,0,0,40,United-States,<=50K
53,Private,234721,11th,7,Married-civ-spouse,Handlers-cleaners,Husband,Black,Male,0,0,40,United-States,<=50K
28,Private,338409,Bachelors,13,Married-civ-spouse,Prof-specialty,Wife,Black,Female,0,0,40,Cuba,<=50K
37,Private,284582,Masters,14,Married-civ-spouse,Exec-managerial,Wife,White,Female,0,0,40,United-States,<=50K
49,Private,160187,9th,5,Married-spouse-absent,Other-service,Not-in-family,Black,Female,0,0,16,Jamaica,<=50K
52,Self-emp-not-inc,209642,HS-grad,9,Married-civ-spouse,Exec-managerial,Husband,White,Male,0,0,45,United-States,>50K
31,Private,45781,Masters,14,Never-married,Prof-specialty,Not-in-family,White,Female,14084,0,50,United-States,>50K
42,Private,159449,Bachelor
New York City 40.72 74.00
Los Angeles 34.05 118.25
Chicago 41.88 87.63
Houston 29.77 95.38
Phoenix 33.45 112.07
Philadelphia 39.95 75.17
San Antonio 29.53 98.47
Dallas 32.78 96.80
San Diego 32.78 117.15
San Jose 37.30 121.87
@gghatano
gghatano / attack.bash
Last active September 13, 2021 00:12
PWSCUP2021: attacking with the sample scripts
# summary:
## create E files for pwscup2021 attack phase by using "rlink.py"
# preparation:
## put directories "pre_anony_d" and "pre_attack" in the same directory as "attack.bash"
## get "rlink.py"
# parameters:
AM0001 Male 62.0 White Graduate Married 27.8 0 0 0 0 Q2 1
AM0002 Male 53.0 White HighSchool Divorced 30.8 0 1 0 0 Q1 0
AM0003 Male 78.0 White HighSchool Married 28.8 0 0 0 0 Q3 1
AM0004 Female 56.0 White Graduate Parther 42.4 1 0 0 0 Q3 0
AM0005 Female 42.0 Black College Divorced 20.3 1 0 0 0 Q4 0
AM0006 Female 72.0 Mexican 11th Separated 28.6 0 0 0 0 Q1 0
AM0007 Male 56.0 Black HighSchool Divorced 33.6 0 0 0 0 Q3 1
AM0008 Male 46.0 White Graduate Parther 27.6 0 0 0 0 Q3 0
AM0009 Male 45.0 Other 11th Never 24.1 0 0 0 0 Q3 0
AM0010 Female 30.0 Hispanic College Parther 26.6 0 0 0 0 Q4 0
gghatano / B.csv
Last active September 9, 2021 03:25
PWSCUP2021 NHANES data
Male 62.0 White Graduate Married 27.8 0 0 0 0 Q2 1
Male 53.0 White HighSchool Divorced 30.8 0 1 0 0 Q1 0
Male 78.0 White HighSchool Married 28.8 0 0 0 0 Q3 1
Female 56.0 White Graduate Parther 42.4 1 0 0 0 Q3 0
Female 42.0 Black College Divorced 20.3 1 0 0 0 Q4 0
Female 72.0 Mexican 11th Separated 28.6 0 0 0 0 Q1 0
Male 56.0 Black HighSchool Divorced 33.6 0 0 0 0 Q3 1
Male 46.0 White Graduate Parther 27.6 0 0 0 0 Q3 0
Male 45.0 Other 11th Never 24.1 0 0 0 0 Q3 0
Female 30.0 Hispanic College Parther 26.6 0 0 0 0 Q4 0
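The attack.bash gist relies on rlink.py, which is not shown here; judging only from its name, it presumably performs record linkage. The sketch below is a hypothetical, minimal distance-based record linkage, not the contents of rlink.py: each released row (like the B.csv rows) is linked to the nearest ID-bearing row (like the AM0001... rows). The column layout, the scales for the numeric columns (age at index 1, BMI at index 5) and the perturbed age 77.0 in the first released row are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence

# Assumed scales used to normalize the numeric columns: index 1 = age, index 5 = BMI.
NUMERIC_SCALE: Dict[int, float] = {1: 100.0, 5: 50.0}

# ID-bearing rows, as in the AM0001... data above.
ORIGINAL = [line.split() for line in """\
AM0001 Male 62.0 White Graduate Married 27.8 0 0 0 0 Q2 1
AM0002 Male 53.0 White HighSchool Divorced 30.8 0 1 0 0 Q1 0
AM0003 Male 78.0 White HighSchool Married 28.8 0 0 0 0 Q3 1""".splitlines()]

# Released rows without IDs, reordered; the first has an illustrative age perturbation (78.0 -> 77.0).
RELEASED = [line.split() for line in """\
Male 77.0 White HighSchool Married 28.8 0 0 0 0 Q3 1
Male 62.0 White Graduate Married 27.8 0 0 0 0 Q2 1
Male 53.0 White HighSchool Divorced 30.8 0 1 0 0 Q1 0""".splitlines()]


def distance(a: Sequence[str], b: Sequence[str]) -> float:
    """Mismatch count for categorical columns plus scaled absolute difference for numeric ones."""
    d = 0.0
    for i, (x, y) in enumerate(zip(a, b)):
        if i in NUMERIC_SCALE:
            d += abs(float(x) - float(y)) / NUMERIC_SCALE[i]
        elif x != y:
            d += 1.0
    return d


def link(released: List[List[str]], original: List[List[str]]) -> List[str]:
    """For each released row, return the ID of the closest original row (ties: first match)."""
    return [min(original, key=lambda o: distance(row, o[1:]))[0] for row in released]


if __name__ == "__main__":
    print(link(RELEASED, ORIGINAL))
```

Nearest-neighbour matching like this only produces a guess per row; ties and near-ties, which the sketch resolves by taking the first minimum, are where a real attack has to be more careful.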
FROM centos:7
ENV PYTHONPATH="/opt/python/library"
ENV LANG=en_US.utf8
LABEL maintainer="PWSCUP_ADMIN (Twitter: @PWScup_Admin)"
ARG version="3.7.3"
COPY ./jupyter_notebook_config.py /tmp/jupyter_notebook_config.py
gghatano / T.csv
Created July 13, 2021 05:13
PWSCUP2018 final T data
12583,2010/12/1,22728,3.75,24
12583,2010/12/1,22727,3.75,24
12583,2010/12/1,22726,3.75,12
12583,2010/12/1,21724,0.85,12
12583,2010/12/1,21883,0.65,24
12583,2010/12/1,10002,0.85,48
12583,2010/12/1,21791,1.25,24
12583,2010/12/1,21035,2.95,18
12583,2010/12/1,22326,2.95,24
12583,2010/12/1,22629,1.95,24
gghatano / customer_master.csv
Created June 19, 2021 07:32
python100knock_chapter_1
customer_id,customer_name,registration_date,customer_name_kana,email,gender,age,birth,pref
IK152942,平田 裕次郎,2019-01-01 00:25:33,ひらた ゆうじろう,hirata_yuujirou@example.com,M,29,1990/6/10,石川県
TS808488,田村 詩織,2019-01-01 01:13:45,たむら しおり,tamura_shiori@example.com,F,33,1986/5/20,東京都
AS834628,久野 由樹,2019-01-01 02:00:14,ひさの ゆき,hisano_yuki@example.com,F,63,1956/1/2,茨城県
AS345469,鶴岡 薫,2019-01-01 04:48:22,つるおか かおる,tsuruoka_kaoru@example.com,M,74,1945/3/25,東京都
GD892565,大内 高史,2019-01-01 04:54:51,おおうち たかし,oouchi_takashi@example.com,M,54,1965/8/5,千葉県
AS265381,笠井 洋介,2019-01-01 05:51:07,かさい ようすけ,kasai_yousuke@example.com,M,69,1949/8/9,岡山県
HD739338,橋口 将也,2019-01-01 05:51:08,はしぐち まさや,hashiguchi_masaya@example.com,M,45,1974/6/4,神奈川県
HI791416,細井 麻由子,2019-01-01 07:03:53,ほそい まゆこ,hosoi_mayuko@example.com,F,30,1989/7/25,三重県
HD819739,塩見 はるか,2019-01-01 08:17:23,しおみ はるか,shiomi_haruka@example.com,F,49,1969/10/8,神奈川県
gghatano / apple_location.csv
Created April 7, 2021 10:04
Apple mobility trends data
geo_type,region,transportation_type,alternative_name,sub-region,country,2020-01-13,2020-01-14,2020-01-15,2020-01-16,2020-01-17,2020-01-18,2020-01-19,2020-01-20,2020-01-21,2020-01-22,2020-01-23,2020-01-24,2020-01-25,2020-01-26,2020-01-27,2020-01-28,2020-01-29,2020-01-30,2020-01-31,2020-02-01,2020-02-02,2020-02-03,2020-02-04,2020-02-05,2020-02-06,2020-02-07,2020-02-08,2020-02-09,2020-02-10,2020-02-11,2020-02-12,2020-02-13,2020-02-14,2020-02-15,2020-02-16,2020-02-17,2020-02-18,2020-02-19,2020-02-20,2020-02-21,2020-02-22,2020-02-23,2020-02-24,2020-02-25,2020-02-26,2020-02-27,2020-02-28,2020-02-29,2020-03-01,2020-03-02,2020-03-03,2020-03-04,2020-03-05,2020-03-06,2020-03-07,2020-03-08,2020-03-09,2020-03-10,2020-03-11,2020-03-12,2020-03-13,2020-03-14,2020-03-15,2020-03-16,2020-03-17,2020-03-18,2020-03-19,2020-03-20,2020-03-21,2020-03-22,2020-03-23,2020-03-24,2020-03-25,2020-03-26,2020-03-27,2020-03-28,2020-03-29,2020-03-30,2020-03-31,2020-04-01,2020-04-02,2020-04-03,2020-04-04,2020-04-05,2020-04-06,2020-04-07,2020-0
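The header above stores one column per date (wide format) after six identifying columns. A minimal sketch, assuming every column after `country` is a date and that empty cells mean missing values, reshapes the file into long (region, transportation_type, date, value) records with only the standard library. The function name `to_long` is hypothetical.

```python
import csv
from typing import Iterator, TextIO, Tuple

# geo_type, region, transportation_type, alternative_name, sub-region, country
ID_COLS = 6


def to_long(f: TextIO) -> Iterator[Tuple[str, str, str, float]]:
    """Hypothetical helper: yield (region, transportation_type, date, value) per non-empty cell."""
    reader = csv.reader(f)
    header = next(reader)
    dates = header[ID_COLS:]                 # assumed: all remaining columns are dates
    for row in reader:
        region, mode = row[1], row[2]
        for date, value in zip(dates, row[ID_COLS:]):
            if value:                        # skip empty (missing) cells
                yield region, mode, date, float(value)
```

Usage: `with open("apple_location.csv", newline="", encoding="utf-8") as f: records = list(to_long(f))`. A generator keeps memory flat, which matters for a file too large for GitHub to render.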