
@bizenn
Last active October 6, 2016 16:24
On how embulk-filter-timestamp_format cannot correctly handle dates before the Unix epoch

data/data.txt

1,Alice,Cooper,1947-02-04 11:30:15,0
2,Bob,Brown,1947-12-27 22:01:24.123,0
3,Chris,Squire,1948-03-04 01:02:03.456,0

data.yml

in:
  type: file
  path_prefix: ./data/data
  parser:
    type: csv
    delimiter: ','
    quote: null
    newline: LF
    charset: UTF-8
    columns:
      - {name: id, type: long}
      - {name: first_name, type: string}
      - {name: last_name, type: string}
      - {name: birth, type: string}
      - {name: dummy, type: long}
filters:
  - type: timestamp_format
    default_from_timestamp_format: [ '%Y-%m-%d %H:%M:%S.%N', '%Y-%m-%d %H:%M:%S' ]
    default_to_timezone: 'UTC'
    default_to_timestamp_format: '%Y-%m-%d %H:%M:%S.%N%Z'
    columns:
      - {name: birth, type: timestamp}
out:
  type: stdout
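For comparison, the values this config ought to produce can be computed independently with java.time. The sketch below is a hypothetical stand-in, not the plugin's own parser. It maps the two strptime patterns in default_from_timestamp_format to DateTimeFormatter patterns, assuming %N means three fraction digits (true for the sample data). It tries them in order, which is how we assume the list is used. It treats the input as UTC, as the preview output indicates, and prints in the same shape the stdout output uses. The class name ExpectedBirth is made up.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.List;

public class ExpectedBirth {
    // Hypothetical java.time stand-ins for the two strptime patterns in
    // default_from_timestamp_format; %N is assumed to be 3 digits, as in data.txt.
    static final List<DateTimeFormatter> FROM_FORMATS = List.of(
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS"),
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss"));

    // Same shape as the stdout output: 6 fraction digits and a numeric offset.
    static final DateTimeFormatter TO_FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSSSSS xx");

    // Try each input pattern in order and render the first match as UTC.
    static String convert(String birth) {
        for (DateTimeFormatter from : FROM_FORMATS) {
            try {
                return LocalDateTime.parse(birth, from)
                        .atOffset(ZoneOffset.UTC)
                        .format(TO_FORMAT);
            } catch (DateTimeParseException e) {
                // Not this pattern; try the next one.
            }
        }
        throw new IllegalArgumentException("no pattern matched: " + birth);
    }

    public static void main(String[] args) {
        for (String birth : List.of(
                "1947-02-04 11:30:15",
                "1947-12-27 22:01:24.123",
                "1948-03-04 01:02:03.456")) {
            System.out.println(convert(birth));
        }
    }
}
```

For Bob's row it prints 1947-12-27 22:01:24.123000 +0000, the value both embulk preview and embulk run should have shown.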
% embulk preview data.yml
2016-10-07 00:00:18.022 +0900: Embulk v0.8.13
2016-10-07 00:00:19.112 +0900 [INFO] (0001:preview): Loaded plugin embulk-filter-timestamp_format (0.2.1)
2016-10-07 00:00:19.125 +0900 [INFO] (0001:preview): Listing local files at directory 'data' filtering filename by prefix 'data'
2016-10-07 00:00:19.127 +0900 [INFO] (0001:preview): Loading files [data/data.txt, data/data.txt~]
+---------+-------------------+------------------+-----------------------------+------------+
| id:long | first_name:string | last_name:string |             birth:timestamp | dummy:long |
+---------+-------------------+------------------+-----------------------------+------------+
|       1 |             Alice |           Cooper |     1947-02-04 11:30:15 UTC |          0 |
|       2 |               Bob |            Brown | 1947-12-27 22:01:25.123 UTC |          0 |
|       3 |             Chris |           Squire | 1948-03-04 01:02:04.456 UTC |          0 |
+---------+-------------------+------------------+-----------------------------+------------+

Oh, this looks like it's going to work.

% embulk run data.yml
2016-10-07 00:03:24.436 +0900: Embulk v0.8.13
2016-10-07 00:03:25.746 +0900 [INFO] (0001:transaction): Loaded plugin embulk-filter-timestamp_format (0.2.1)
2016-10-07 00:03:25.762 +0900 [INFO] (0001:transaction): Listing local files at directory 'data' filtering filename by prefix 'data'
2016-10-07 00:03:25.764 +0900 [INFO] (0001:transaction): Loading files [data/data.txt]
2016-10-07 00:03:25.822 +0900 [INFO] (0001:transaction): Using local thread executor with max_threads=16 / output tasks 8 = input tasks 1 * 8
2016-10-07 00:03:25.826 +0900 [INFO] (0001:transaction): {done:  0 / 1, running: 0}
1,Alice,Cooper,1947-02-04 11:30:15.000000 +0000,0
2,Bob,Brown,1947-12-27 22:01:26.-87700 +0000,0
3,Chris,Squire,1948-03-04 01:02:05.-54400 +0000,0
2016-10-07 00:03:25.931 +0900 [INFO] (0001:transaction): {done:  1 / 1, running: 0}
2016-10-07 00:03:25.935 +0900 [INFO] (main): Committed.
2016-10-07 00:03:25.935 +0900 [INFO] (main): Next config diff: {"in":{"last_path":"data/data.txt"},"out":{}}

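Why does this happen... Only Alice's row, which has no fractional seconds, comes through intact. Bob's and Chris's rows come out two seconds late with a negative fraction (.-87700 and .-54400). Looking back, the preview was already one second late for those same rows (22:01:25.123 instead of 22:01:24.123, and 01:02:04.456 instead of 01:02:03.456).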
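The negative fractions hint at where to look: 877 = 1000 - 123 and 544 = 1000 - 456. So the bad rows look like a pre-epoch value split into seconds and a fraction with truncating rather than floor division. The Java sketch below only illustrates that hypothesis. It is not taken from the plugin's source, which has not been checked here, and PreEpochSplit, truncSplit and floorSplit are made-up names. Using Bob's timestamp, it shows that Java's / and % give a seconds value one too large plus a negative remainder, whereas Math.floorDiv and Math.floorMod give the normalized pair.

```java
import java.time.Instant;

public class PreEpochSplit {
    // Hypothetical naive split: / rounds toward zero and % takes the sign of the
    // dividend, so a negative millisecond count yields seconds rounded up and a negative remainder.
    static long[] truncSplit(long millis) {
        return new long[] { millis / 1000, millis % 1000 };
    }

    // Floor split: seconds rounded down, remainder always in [0, 999].
    static long[] floorSplit(long millis) {
        return new long[] { Math.floorDiv(millis, 1000), Math.floorMod(millis, 1000) };
    }

    public static void main(String[] args) {
        long millis = Instant.parse("1947-12-27T22:01:24.123Z").toEpochMilli();
        long[] t = truncSplit(millis);
        long[] f = floorSplit(millis);
        System.out.println("millis = " + millis);
        System.out.printf("truncating: %d.%03d%n", t[0], t[1]);
        System.out.printf("floor:      %d.%03d%n", f[0], f[1]);
        // Instant.ofEpochSecond normalizes a negative nano adjustment,
        // so rebuilding from the truncated pair still gives 22:01:24.123 ...
        System.out.println(Instant.ofEpochSecond(t[0], t[1] * 1_000_000));
        // ... but using the seconds field on its own lands on 22:01:25.
        System.out.println(Instant.ofEpochSecond(t[0]) + " + " + t[1] + " ms");
    }
}
```

In this sketch the truncated pair is still correct as a sum; trouble only appears when the seconds and the fraction are used separately, which reproduces the one-second shift seen in embulk preview and the negative digits seen in embulk run. It does not explain why embulk run is two seconds late rather than one, so treat it as a lead, not a diagnosis.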
