A short demonstration of how to efficiently bulk-load a delimited data file into a SQLite database
# Using SQLite's CSV mode from Python for extremely fast bulk data loads
#
# Note that if the target table does not exist, SQLite will create it automatically,
# creating column names from the first row in the CSV assuming TEXT datatypes.
#
# If the target table does exist, SQLite will treat the first row in the CSV file
# as *data*. So if the CSV file contains a header, you should either delete it from the
# source file, or, if using SQLite >= 3.32.0, pass the `--skip 1` flag to the `.import`
# command (`.import data.csv Data --skip 1`); a commented-out variant appears next to
# the `.import` line below.
#
# This approach assumes that your CSV data is well-formed, although SQLite will robustly handle
# exotic character sets and certain kinds of CSV parse errors, emitting a warning and continuing
# if possible
#
# This approach tends to dramatically outperform common alternatives such as reading the data
# with the built-in csv module and calling `sqlite3.Connection.executemany`, or using
# `pandas.DataFrame.to_sql`; a sketch of the executemany approach follows the main example below.
import subprocess
if __name__ == '__main__':
    with subprocess.Popen(
        ["sqlite3", "testdb"],
        universal_newlines=True,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT
    ) as sqlite:
        sqlite.stdin.write(".mode csv\n")
        sqlite.stdin.write('.separator "\t"\n')
        sqlite.stdin.write(".import data.tsv Data\n")
sqlite.stdin.write(".quit\n")
sqlite.stdin.flush()
# Read stdout output and read to CSV
for stdout_line in iter(sqlite.stdout.readline, ""):
print(stdout_line)
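

# For comparison, a minimal sketch of the slower stdlib approach mentioned above:
# parsing the file with the built-in csv module and inserting the rows through
# sqlite3.Connection.executemany. Column names are taken from the header row and
# typed as TEXT; the path, table, and database names mirror the example above and
# are assumptions, not part of the original demonstration.
def load_with_executemany(path="data.tsv", db="testdb", table="Data"):
    import csv
    import sqlite3

    conn = sqlite3.connect(db)
    with conn, open(path, newline="") as f:
        reader = csv.reader(f, delimiter="\t")
        header = next(reader)
        columns = ", ".join(f'"{name}" TEXT' for name in header)
        placeholders = ", ".join("?" for _ in header)
        conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')
        # executemany consumes the remaining rows straight from the csv reader
        conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    conn.close()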