Last active
February 15, 2023 10:35
-
-
Save pletnes/bccfcf83468dd08d7bc0fa061a4440e2 to your computer and use it in GitHub Desktop.
Duckdb 0.6.1 timestamp-in-csv parsing issues
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
#!/usr/bin/env python | |
from datetime import datetime | |
from pathlib import Path | |
import duckdb | |
import sql_statements | |
def main(): | |
csv_dir = Path('csv.gz') | |
csv_files = sorted(csv_dir.glob('*.csv.gz')) | |
con = duckdb.connect('demo.duckdb') | |
for csv_file in csv_files: | |
parquet_file = str(csv_file).replace('csv.gz', 'parquet') | |
print(f'{datetime.now()} Converting file {csv_file} to {parquet_file}') | |
convert_sql = sql_statements.convert_template % (csv_file, parquet_file) | |
con.execute(convert_sql) | |
if __name__ == '__main__': | |
main() |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
LOAD parquet; | |
COPY ( | |
SELECT | |
cast(created as timestamptz), | |
type_, | |
FROM read_csv( | |
'demo.csv', | |
header=True, | |
columns={ | |
'created': 'varchar', | |
'type_': 'int4' | |
} | |
) | |
) TO 'demo.parquet' (format 'parquet'); |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
created | type_ | |
---|---|---|
2020-08-30 06:00:07.378966 UTC | 5 | |
2019-11-03 09:39:45.103055 UTC | 5 | |
2021-12-10 19:01:45.555969 UTC | 3 | |
2021-10-27 14:05:41.296911 UTC | 3 |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
# Run in bash with `< convert_demo.sql duckdb` | |
# macOS with duckdb 0.6.1, homebrew-installed | |
Error: near line 3: Conversion Error: timestamp field value out of range: "2020-08-30 06:00:07.378966 UTC", expected format is (YYYY-MM-DD HH:MM:SS[.US][±HH:MM| ZONE]) |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment