Skip to content

Instantly share code, notes, and snippets.

@pletnes
Last active February 15, 2023 10:35
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save pletnes/bccfcf83468dd08d7bc0fa061a4440e2 to your computer and use it in GitHub Desktop.
Save pletnes/bccfcf83468dd08d7bc0fa061a4440e2 to your computer and use it in GitHub Desktop.
Duckdb 0.6.1 timestamp-in-csv parsing issues
#!/usr/bin/env python
from datetime import datetime
from pathlib import Path
import duckdb
import sql_statements
def main():
csv_dir = Path('csv.gz')
csv_files = sorted(csv_dir.glob('*.csv.gz'))
con = duckdb.connect('demo.duckdb')
for csv_file in csv_files:
parquet_file = str(csv_file).replace('csv.gz', 'parquet')
print(f'{datetime.now()} Converting file {csv_file} to {parquet_file}')
convert_sql = sql_statements.convert_template % (csv_file, parquet_file)
con.execute(convert_sql)
if __name__ == '__main__':
main()
LOAD parquet;
COPY (
SELECT
cast(created as timestamptz),
type_,
FROM read_csv(
'demo.csv',
header=True,
columns={
'created': 'varchar',
'type_': 'int4'
}
)
) TO 'demo.parquet' (format 'parquet');
created type_
2020-08-30 06:00:07.378966 UTC 5
2019-11-03 09:39:45.103055 UTC 5
2021-12-10 19:01:45.555969 UTC 3
2021-10-27 14:05:41.296911 UTC 3
# Run in bash with `< convert_demo.sql duckdb`
# macOS with duckdb 0.6.1, homebrew-installed
Error: near line 3: Conversion Error: timestamp field value out of range: "2020-08-30 06:00:07.378966 UTC", expected format is (YYYY-MM-DD HH:MM:SS[.US][±HH:MM| ZONE])
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment