use CSV to bulk export/import data to postgres tables

Intro

For when you need to insert or update millions of rows and don't want to mess around with scripting to batch individual SQL commands, it's worth using postgres's built-in, super-fast support for importing and exporting data in CSV format.

Note that we're using the \COPY command and not the COPY command, since it lets us work with files on our local machine instead of worrying about transferring files to the database server's file system.

Via the psql manual on \COPY:

Performs a frontend (client) copy. This is an operation that runs an SQL COPY command, but instead of the server reading or writing the specified file, psql reads or writes the file and routes the data between the server and the local file system. This means that file accessibility and privileges are those of the local user, not the server, and no SQL superuser privileges are required.

Export data from database to CSV

Open a psql connection to your source database and run:

\COPY (SELECT id, field_1, ... FROM ...) TO '~/Downloads/export.csv' (FORMAT csv, HEADER);

Et voilà: you'll have a CSV in your Downloads folder with the columns you specified, filled with the rows you selected.

--> via stackoverflow.
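If you want every column and every row, you can also skip the subquery and name the table directly (table_name here is a placeholder):

\COPY table_name TO '~/Downloads/export.csv' (FORMAT csv, HEADER);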

Import data from CSV to an empty table

Doing a simple import into a totally empty table is easy.

\COPY table_name FROM '/path/to/import.csv' (FORMAT csv);

Note that if you're adding a bunch of new rows to a table with an autoincrement primary key, you'll have to reset the sequence on that primary key column so the database doesn't generate duplicate IDs.

-- find the current maximum id in the table
SELECT MAX(id) FROM tbl;
-[ RECORD 1 ]
max | {some_int}

-- RESTART WITH requires a literal, so substitute the actual value yourself
ALTER SEQUENCE tbl_id_seq RESTART WITH {some_int + 1};

--> via stackoverflow
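If you'd rather not copy the value over by hand, a single statement should do the same thing, assuming the sequence is named tbl_id_seq:

SELECT setval('tbl_id_seq', (SELECT MAX(id) FROM tbl));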

Note that if your source CSV and the table you're loading data into have different columns (or a different ordering of columns), you may need to specify the column names explicitly when loading:

\COPY table_name (col1, col2, ...) FROM STDIN WITH (FORMAT CSV, HEADER)

Note that adding HEADER to this command does NOT automatically map the first row of the CSV (i.e. the column headers) to the correct database columns; it [basically just tells postgres to skip the first line of the CSV file](https://stackoverflow.com/a/33271507/625840).
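For example, to load a file whose columns are a subset of the table's (col1 and col2 here are hypothetical names):

\COPY table_name (col1, col2) FROM '~/Downloads/import.csv' (FORMAT csv, HEADER);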

Import data from CSV to a table with existing values

First, we'll create a temporary table with a schema that matches the CSV file.

CREATE TEMP TABLE tmp_x (id int, field_1 text, ...);

Next, we'll load the CSV data into our new temporary table.

\COPY tmp_x FROM '~/Downloads/import.csv' (FORMAT csv);

Then, we'll update our target table using the data in our temporary table.

UPDATE actual_table
SET    field_1 = tmp_x.field_1
FROM   tmp_x
WHERE  actual_table.id = tmp_x.id;

Finally, we'll drop the temporary table that we made.

DROP TABLE tmp_x;

--> via stackoverflow.
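Note that the UPDATE above only touches rows that already exist in the target table. If the CSV can also contain brand-new rows, one option on postgres 9.5+ is to upsert from the temporary table instead (this sketch assumes id is the primary key, with the hypothetical columns from above):

-- insert each staged row, falling back to an update when the id already exists
INSERT INTO actual_table (id, field_1)
SELECT id, field_1
FROM   tmp_x
ON CONFLICT (id) DO UPDATE
SET    field_1 = EXCLUDED.field_1;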
