Skip to content

Instantly share code, notes, and snippets.

@hiroyuki-sato
Last active August 29, 2015 14:22
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save hiroyuki-sato/54b3c8fbde291a1fc91a to your computer and use it in GitHub Desktop.
Save hiroyuki-sato/54b3c8fbde291a1fc91a to your computer and use it in GitHub Desktop.
embulk timezone test

元ネタ

元ネタ

環境

  • クライアント
    • OSX: 10.10.3
    • embulk: 0.6.11
    • embulk-output-postgresql: 0.3.0
  • サーバ
    • OS: CentOS7
    • PostgreSQL: 9.4.1

サマリ

  • a) PostgreSQLサーバ側のTimeZoneパラメータは何になっていますか?
    • Japanです
  • b)TimeZoneパラメータをUTCに変更してPostgreSQLを再起動すると、既に入れた日付型の値はどう見えますか?
    • Japan時、2015-06-01 1:00:00と表示された時刻はUTC変更して再起動しても2015-06-01 1:00:00のままです
  • c) PostgreSQLのTimeZoneパラメータをUTCにしてある状態で実行すると、新たに入れた値はどうなりますか?  * 2015-06-01 1:00:00 +09:00をインサートすると2015-06-01 1:00:00 +00:00になりました
  • d) LC_ALL=C と LC_TIME=C の2つの環境変数をセットした状態でembulkを実行するとどうなりますか?
    • c)と一緒です。

サンプルデータ

  • サンプルデータについて
  • Timezone=Japanでの動作について (a)の検証
  • TimezoneをUTCに変えた場合の動作について(b)の検証
  • TimezoneをUTCにして、insertした場合の動作について(c)の検証
  • LC_ALLとLC_TIMEを設定した場合の動作について

検証

サンプルデータについて

  • ISO8101形式のデータを用意
  • 標準出力にデータを出力し、UTCで9時間前になることを確認
% cat data/test.csv 
id,date_time,note
1,2015-06-01T01:00:00+09:00,test1
2,2015-06-01T02:00:00+09:00,test2
3,2015-06-01T03:00:00+09:00,test3
4,2015-06-01T04:00:00+09:00,test4
5,2015-06-01T05:00:00+09:00,test5
in:
  type: file
  path_prefix: /path/to/data/test
  parser:
    type: csv
    charset: UTF-8
    newline: CRLF
    delimiter: ','
    quote: ''
    escape: ''
    skip_header_lines: 1
    comment_line_marker: null
    allow_extra_columns: false
    allow_optional_columns: false
    columns:
    - {name: id, type: long}
    - {name: date_time, type: timestamp, format: '%Y-%m-%dT%H:%M:%S%z'}
    - {name: note, type: string}
exec: {}
out: 
  type: stdout

UTC変換の確認、2015-05-31になっており、日本の時刻として認識されている。

embulk run conf.yml 
2015-06-01 22:36:24.986 +0900: Embulk v0.6.11
2015-06-01 22:36:27.150 +0900 [INFO] (transaction): Listing local files at directory '/path/to/data' filtering filename by prefix 'test'
2015-06-01 22:36:27.160 +0900 [INFO] (transaction): Loading files [/path/to/data/test.csv]
2015-06-01 22:36:27.239 +0900 [INFO] (transaction): {done:  0 / 1, running: 0}
1,2015-05-31T16:00:00+0000,test1
2,2015-05-31T17:00:00+0000,test2
3,2015-05-31T18:00:00+0000,test3
4,2015-05-31T19:00:00+0000,test4
5,2015-05-31T20:00:00+0000,test5
2015-06-01 22:36:27.405 +0900 [INFO] (transaction): {done:  1 / 1, running: 0}
2015-06-01 22:36:27.443 +0900 [INFO] (main): Committed.
2015-06-01 22:36:27.444 +0900 [INFO] (main): Next config diff: {"in":{"last_path":"//path/to/data/test.csv"},"out":{}}

Timezone=Japanでの動作について (a)の検証

psql (9.4.1)
Type "help" for help.

postgres=# select now();
              now              
-------------------------------
 2015-06-01 22:37:53.581648+09
(1 row)
  • +09となっている。Timezoneは'Japan'

postgresql.conf

timezone = 'Japan'

この状態でデータを投入

in:
  type: file
  path_prefix: /path/to/data/test
  parser:
    type: csv
    charset: UTF-8
    newline: CRLF
    delimiter: ','
    quote: ''
    escape: ''
    skip_header_lines: 1
    comment_line_marker: null
    allow_extra_columns: false
    allow_optional_columns: false
    columns:
    - {name: id, type: long}
    - {name: date_time, type: timestamp, format: '%Y-%m-%dT%H:%M:%S%z'}
    - {name: note, type: string}
exec: {}
out: 
  type: postgresql
  host: "server"
  user: postgres
  password: password
  database: test
  table: time_test
  mode: insert

embulk run conf.yml

selectによる確認

hsato_test=# select * from time_test;
 id |      date_time      | note  
----+---------------------+-------
  1 | 2015-06-01 01:00:00 | test1
  2 | 2015-06-01 02:00:00 | test2
  3 | 2015-06-01 03:00:00 | test3
  4 | 2015-06-01 04:00:00 | test4
  5 | 2015-06-01 05:00:00 | test5
(5 rows)

Japanの時刻がそのまま入っているように見える。

TimezoneをUTCに変えた場合の動作について(b)の検証

timeozneを変更してpostgresqlを再起動

timezone='UTC'

psqlによる確認

hsato_test=# select now();
              now              
-------------------------------
 2015-06-01 13:44:24.137712+00
(1 row)
hsato_test=# select * from time_test;
 id |      date_time      | note  
----+---------------------+-------
  1 | 2015-06-01 01:00:00 | test1
  2 | 2015-06-01 02:00:00 | test2
  3 | 2015-06-01 03:00:00 | test3
  4 | 2015-06-01 04:00:00 | test4
  5 | 2015-06-01 05:00:00 | test5
(5 rows)

時刻はUTCの時刻は表示されず、Japanでインサートされた時刻と同じ時刻が表示された。

TimezoneをUTCにして、insertした場合の動作について(c)の検証

タイムゾーンはUTC

# select now();
             now              
------------------------------
 2015-06-01 14:09:11.29926+00
(1 row)
embulk run conf.yml
hsato_test=# select * from time_test;
 id |      date_time      | note  
----+---------------------+-------
  1 | 2015-06-01 01:00:00 | test1
  2 | 2015-06-01 02:00:00 | test2
  3 | 2015-06-01 03:00:00 | test3
  4 | 2015-06-01 04:00:00 | test4
  5 | 2015-06-01 05:00:00 | test5
(5 rows)

Japanの時刻が格納されている。

UTCなら次の値が格納されるはず

1,2015-05-31T16:00:00+0000,test1
2,2015-05-31T17:00:00+0000,test2
3,2015-05-31T18:00:00+0000,test3
4,2015-05-31T19:00:00+0000,test4
5,2015-05-31T20:00:00+0000,test5

LC_ALLとLC_TIMEを設定した場合の動作について

env LC_ALL=C LC_TIME=C ~/.embulk/bin/embulk run conf.yml
2015-06-01 23:11:18.551 +0900: Embulk v0.6.11
2015-06-01 23:11:20.754 +0900: Loaded plugin embulk-output-postgresql (0.3.0)
2015-06-01 23:11:20.815 +0900 [INFO] (transaction): Listing local files at directory '/Users/hsato/Work/embulk-test/timezone_test/data' filtering filename by prefix 'test'
2015-06-01 23:11:20.824 +0900 [INFO] (transaction): Loading files [/path/to/data/test.csv]
2015-06-01 23:11:20.941 +0900 [INFO] (transaction): Connecting to jdbcpostgresql://xxx.xxx.xxx.xx:5432/test options {user=postgres, tcpKeepAlive=true, loginTimeout=300, socketTimeout=1800}
2015-06-01 23:11:21.055 +0900 [INFO] (transaction): SQL: SET search_path TO "public"
2015-06-01 23:11:21.059 +0900 [INFO] (transaction): > 0.00 seconds
2015-06-01 23:11:21.060 +0900 [INFO] (transaction): Using insert mode
2015-06-01 23:11:21.098 +0900 [INFO] (transaction): SQL: DROP TABLE IF EXISTS "time_test_082634e240_bl_tmp000"
2015-06-01 23:11:21.102 +0900 [INFO] (transaction): > 0.00 seconds
2015-06-01 23:11:21.115 +0900 [INFO] (transaction): SQL: CREATE TABLE IF NOT EXISTS "time_test_082634e240_bl_tmp000" ("id" BIGINT, "date_time" TIMESTAMP, "note" TEXT)
2015-06-01 23:11:21.120 +0900 [INFO] (transaction): > 0.00 seconds
2015-06-01 23:11:21.124 +0900 [INFO] (transaction): SQL: CREATE TABLE IF NOT EXISTS "time_test" ("id" BIGINT, "date_time" TIMESTAMP, "note" TEXT)
2015-06-01 23:11:21.128 +0900 [INFO] (transaction): > 0.00 seconds
2015-06-01 23:11:21.167 +0900 [INFO] (transaction): {done:  0 / 1, running: 0}
2015-06-01 23:11:21.185 +0900 [INFO] (task-0000): Connecting to jdbc:postgresql://xxx.xxx.xxx.xx:5432/test options {user=postgres, tcpKeepAlive=true, loginTimeout=300, socketTimeout=1800}
2015-06-01 23:11:21.211 +0900 [INFO] (task-0000): SQL: SET search_path TO "public"
2015-06-01 23:11:21.214 +0900 [INFO] (task-0000): > 0.00 seconds
2015-06-01 23:11:21.215 +0900 [INFO] (task-0000): Copy SQL: COPY "time_test_082634e240_bl_tmp000" ("id", "date_time", "note") FROM STDIN
2015-06-01 23:11:21.358 +0900 [INFO] (task-0000): Loading 5 rows (150 bytes)
2015-06-01 23:11:21.416 +0900 [INFO] (task-0000): > 0.01 seconds (loaded 5 rows in total)
2015-06-01 23:11:21.422 +0900 [INFO] (transaction): {done:  1 / 1, running: 0}
2015-06-01 23:11:21.424 +0900 [INFO] (transaction): Connecting to jdbc:postgresql://xxx.xxx.xxx.xx:5432/test options {user=postgres, tcpKeepAlive=true, loginTimeout=300, socketTimeout=28800}
2015-06-01 23:11:21.438 +0900 [INFO] (transaction): SQL: SET search_path TO "public"
2015-06-01 23:11:21.441 +0900 [INFO] (transaction): > 0.00 seconds
2015-06-01 23:11:21.442 +0900 [INFO] (transaction): SQL: INSERT INTO "time_test" ("id", "date_time", "note") SELECT "id", "date_time", "note" FROM "time_test_082634e240_bl_tmp000"
2015-06-01 23:11:21.444 +0900 [INFO] (transaction): > 0.00 seconds (5 rows)
2015-06-01 23:11:21.471 +0900 [INFO] (transaction): Connecting to jdbc:postgresql://xxx.xxx.xxx.xx:5432/test options {user=postgres, tcpKeepAlive=true, loginTimeout=300, socketTimeout=1800}
2015-06-01 23:11:21.483 +0900 [INFO] (transaction): SQL: SET search_path TO "public"
2015-06-01 23:11:21.486 +0900 [INFO] (transaction): > 0.00 seconds
2015-06-01 23:11:21.486 +0900 [INFO] (transaction): SQL: DROP TABLE IF EXISTS "time_test_082634e240_bl_tmp000"
2015-06-01 23:11:21.490 +0900 [INFO] (transaction): > 0.00 seconds
2015-06-01 23:11:21.499 +0900 [INFO] (main): Committed.
2015-06-01 23:11:21.499 +0900 [INFO] (main): Next config diff: {"in":{"last_path":"/
path/to/data/test.csv"},"out":{}}
# select now();
              now              
-------------------------------
 2015-06-01 14:13:18.395341+00
(1 row)
# select * from time_test;
 id |      date_time      | note  
----+---------------------+-------
  1 | 2015-06-01 01:00:00 | test1
  2 | 2015-06-01 02:00:00 | test2
  3 | 2015-06-01 03:00:00 | test3
  4 | 2015-06-01 04:00:00 | test4
  5 | 2015-06-01 05:00:00 | test5
(5 rows)

Japanの時刻が格納されている。

LC_ALL未指定時の値

locale
LANG="en_US.utf-8"
LC_COLLATE="en_US.utf-8"
LC_CTYPE="en_US.utf-8"
LC_MESSAGES="en_US.utf-8"
LC_MONETARY="en_US.utf-8"
LC_NUMERIC="en_US.utf-8"
LC_TIME="en_US.utf-8"
LC_ALL=
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment