Skip to content

Instantly share code, notes, and snippets.

@vitorbaptista
Last active September 29, 2020 14:06
Show Gist options
  • Select an option

  • Save vitorbaptista/19d476d99595584e9ad5 to your computer and use it in GitHub Desktop.

Select an option

Save vitorbaptista/19d476d99595584e9ad5 to your computer and use it in GitHub Desktop.
Importing and exporting a datapackage into/from a sqlite database

Install packages

pip install sqlalchemy urllib3[secure]
pip install -e git+https://github.com/okfn/tabulator-py@master#egg=tabulator
pip install -e git+https://github.com/okfn/datapackage-py.git@master#egg=datapackage
pip install -e git+https://github.com/okfn/jsontableschema-sql-py.git@update#egg=jtssql
pip install -e git+https://github.com/okfn/datapackage-storage-py@master#egg=datapackage_storage
import sqlalchemy
import jtssql
import dpsql
# The first steps of importing and exporting are the same.
# The difference is that when importing we want to push a
# datapackage to the storage, and when exporting we want to
# create a datapackage from the data in the storage.
# The first step is to point the DB with the data we want to
# export. We'll use the same SQLite DB that was created on
# "import_datapackage.py"
engine = sqlalchemy.create_engine('sqlite:///nasdaq-listings.sqlite')
# We then create a JSON Table Schema storage using this engine
storage = jtssql.Storage(engine)
# Now we're able to export the DB to a datapackage. The first parameter
# is the storage we just created, the second is the path where we want
# the datapackage descriptor to be written. The last parameter is the name
# of the datapackage. The name must be a string following the data packages
# spec (check http://dataprotocols.org/data-packages/#required-fields)
datapackage_storage.export_package(storage, 'nasdaq-listings/datapackage.json', 'nasdaq-listings')
import sqlalchemy
import jtssql
import dpsql
# We'll import a datapackage into a SQLite database. To do so,
# we first have to create a sqlalchemy engine pointing to the SQLite DB
# we want to use (it'll be created if it doesn't exist).
engine = sqlalchemy.create_engine('sqlite:///nasdaq-listings.sqlite')
# We then create a JSON Table Schema storage using this engine
storage = jtssql.Storage(engine)
# Now we're able to import the package. There're two required parameters:
# the storage, where the package will be imported into, and the datapackage.
# We can pass the datapackage in many ways. As an actual dict, a path to a local file,
# a URL, etc. Here we'll pass a URL. For all other options, check
# https://github.com/okfn/datapackage-py/blob/master/datapackage/datapackage.py#L27-L34
datapackage_storage.import_package(storage, 'https://github.com/datasets/nasdaq-listings/archive/master.zip')
@danfowler

Copy link
Copy Markdown

Shall we change import dpsql to import datapackage_storage?

@nishantjoseph

nishantjoseph commented Nov 7, 2017

Copy link
Copy Markdown

Hi,
I tried following you word to word but I get error "ImportError: No module named jtssql" every time I run the python code. Has something changed. Do you have a new gist? Please guide me.

@niconoe

niconoe commented Sep 29, 2020

Copy link
Copy Markdown

Same question: why has jtssql disappeared? I'm building a pipeline that imports a frictionless tabular data package to SQLite, processing it using SQL then exporting it to another package. As far as I know, there is no way with the existing tools/libraries to automatically get data out of SQLite to get a complete tabular package (with all embedded CSV files).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment