Skip to content

Instantly share code, notes, and snippets.

@amzoss
Last active March 28, 2023 21:01
Show Gist options
  • Save amzoss/1d2f58ef06c6fe9c0e496a9aae18784b to your computer and use it in GitHub Desktop.
Save amzoss/1d2f58ef06c6fe9c0e496a9aae18784b to your computer and use it in GitHub Desktop.
ldlite_notebook_sample.ipynb
Display the source blob
Display the rendered blob
Raw
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": [],
"private_outputs": true,
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
},
"gpuClass": "standard"
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/amzoss/1d2f58ef06c6fe9c0e496a9aae18784b/ldlite_notebook_sample.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"source": [
"# Using LDLite to Query FOLIO APIs"
],
"metadata": {
"id": "CY1UEs-PNUsL"
}
},
{
"cell_type": "markdown",
"source": [
"[LDLite](https://github.com/library-data-platform/ldlite) is a tool that is part of the Library Data Platform project. It is a Python package that provides basic LDP data reporting functions, including automatic data transformations, without requiring the LDP reporting database server to be installed.\n",
"\n",
"* [LDLite API documentation](https://library-data-platform.github.io/ldlite/ldlite.html)\n",
"* [LDLite documentation on docs.folio.org](https://docs.folio.org/docs/reporting/ldlite/)\n",
"\n"
],
"metadata": {
"id": "iUmmdhsIF5Vy"
}
},
{
"cell_type": "markdown",
"source": [
"## Installing and setting up LDLite"
],
"metadata": {
"id": "j7Yz4NGxPLxG"
}
},
{
"cell_type": "markdown",
"source": [
"This first code block runs a command to install (or upgrade) LDLite. It can take about 30 seconds to install LDLite and its other requirements."
],
"metadata": {
"id": "7isqDgaQPQGF"
}
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "Cvsq97yB32dM",
"collapsed": true
},
"outputs": [],
"source": [
"#!pip install --upgrade ldlite\n",
"!pip install ldlite==0.0.33"
]
},
{
"cell_type": "markdown",
"source": [
"Once LDLite is installed, you can import it as a Python package. This next code block also imports `pandas`, a useful data science package."
],
"metadata": {
"id": "B3DwVMEnN86g"
}
},
{
"cell_type": "code",
"source": [
"import ldlite\n",
"import pandas"
],
"metadata": {
"id": "UN0ppqOx4rMI"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Next, use the following code to start up an instance of LDLite (`ld`). Then we connect LDLite to the Okapi instance for the [FOLIO snapshot reference environment](https://folio-snapshot.dev.folio.org/). \n",
"\n",
"We also start up a new database connection (`db`). By default, this is just a simple database that stays in memory until the session is over. You can also create a local file that retains the data after the session, or you can connect to a live PostgreSQL database. See the [LDLite API documentation](https://library-data-platform.github.io/ldlite/ldlite.html#LDLite.connect_db) for more information."
],
"metadata": {
"id": "cyiEtZuhOJgZ"
}
},
{
"cell_type": "code",
"source": [
"ld = ldlite.LDLite()\n",
"ld.connect_okapi(url='https://folio-snapshot-okapi.dev.folio.org/',\n",
" tenant='diku',\n",
" user='diku_admin',\n",
" password='admin')\n",
"db = ld.connect_db()"
],
"metadata": {
"id": "a-6pv3Tw4yGg"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"## Writing Queries"
],
"metadata": {
"id": "M_mdg4YLPHWd"
}
},
{
"cell_type": "markdown",
"source": [
"### Query syntax\n",
"\n",
"LDLite queries start with the command `ld.query()`. You then need to specify:\n",
"\n",
"1. The name of the `table` you want to create\n",
"2. The `path` to the API endpoint you want to request data from (not the full request URL, just the last part to the specific API endpoint)\n",
"3. The [CQL](https://dev.folio.org/reference/glossary/#cql) `query`\n",
"\n",
"### Query output\n",
"\n",
"If your query is successful, you'll see some output. \n",
"\n",
"* **The first line:** something like `ldlite: querying: ....`\n",
"* **The following line:** a list of created tables, starting with a table that has the name you specified in your query command, likely followed by one or more additional tables that contain transformations of the original JSON data\n",
"\n",
"If you only see one table and no transformations, your query may not have returned any results. Try starting with a broad query to make sure everything is working, and then work on adjusting it to what you really need.\n",
"\n",
"### A note of caution about asking for large datasets\n",
"\n",
"In the first example below, you see a general query that can be used to pull all of the data from FOLIO for a particular endpoint. \n",
"\n",
"`query=cql.allRecords=1 sortby id`\n",
"\n",
"On FOLIO Snapshot, the datasets are all pretty small. It's probably good to be careful running something like this on a bigger system.\n"
],
"metadata": {
"id": "pKzuZHHTPtLH"
}
},
{
"cell_type": "code",
"source": [
"_ = ld.query(table='users', path='/users', query='cql.allRecords=1 sortby id')"
],
"metadata": {
"id": "txJanNlg4-yp"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"### Accessing query results\n",
"\n",
"To access the data you just requested, you need to send an `ld.select()` command. You need to specify the `table`, and you can optionally specify a limit to the number of records to return or the set of specific columns you'd like to see."
],
"metadata": {
"id": "O--lCC-zRmRu"
}
},
{
"cell_type": "code",
"source": [
"ld.select(table='users__t', columns = ['username','barcode','active','type','enrollment_date','personal__last_name','personal__first_name','personal__middle_name', 'personal__email'], limit=10)"
],
"metadata": {
"id": "nPBYaJ3E5D2v"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"### Using CQL to filter results"
],
"metadata": {
"id": "JjGYrekjG9J9"
}
},
{
"cell_type": "markdown",
"source": [
"The same CQL queries that work for traditional FOLIO API calls should work in the `query` parameter of `ld.query()`. In this example, the query filters user records to those where either the last name is Denesik or the username is diku_admin."
],
"metadata": {
"id": "cJ8eZ6aESsTp"
}
},
{
"cell_type": "code",
"source": [
"_ = ld.query(table='users_filtered', path='/users', query='(personal.lastName=Denesik or username=diku_admin)')"
],
"metadata": {
"id": "vPFI83_P6aR3"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"ld.select(table='users_filtered__t')"
],
"metadata": {
"id": "hnHRilZY6g0P"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"## Exporting data"
],
"metadata": {
"id": "3gScMh_gTBMs"
}
},
{
"cell_type": "markdown",
"source": [
"### Exporting a full table"
],
"metadata": {
"id": "j8MA09QZHmO1"
}
},
{
"cell_type": "markdown",
"source": [
"The following commands will grab a full table from the database and export it to either a CSV file or an Excel file.\n",
"\n",
"In Google Colab, you have to select the \"Files\" option on the leftmost sidebar to see the exported files. From there, you should be able to download them. If they don't show up right away, you can click the refresh button at the top or just try clicking out and then clicking back in."
],
"metadata": {
"id": "2uQbZ9vPTH1N"
}
},
{
"cell_type": "code",
"source": [
"ld.export_csv(table='users__t', filename='users.csv')\n",
"\n",
"ld.export_excel(table='users_filtered__t', filename='users_filtered.xlsx')"
],
"metadata": {
"id": "VFCSCpVeTGpw"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"But what if you want to export just a subset of that table, like we created in our `ld.select()` command?"
],
"metadata": {
"id": "sMA5bWdcUF3-"
}
},
{
"cell_type": "markdown",
"source": [
"### Export a subset of a table"
],
"metadata": {
"id": "Q_-x6SMLUUhd"
}
},
{
"cell_type": "markdown",
"source": [
"#### Method 1: redo the original query so that it is closer to what you want\n",
"\n",
"For example, you can add a limit to the `ld.query()` command if you don't want all of the results. See the [LDLite API documentation](https://library-data-platform.github.io/ldlite/ldlite.html#LDLite.query) for more information about `ld.query()`."
],
"metadata": {
"id": "cW2qNCC_UXvs"
}
},
{
"cell_type": "markdown",
"source": [
"#### Method 2: write some SQL to create a new table based on your preferences\n",
"\n",
"Since the export focuses on a full table, you may want to tailor your tables. Once LDLite has pulled the data out of FOLIO and transformed it, you can query the database like a normal database.\n",
"\n",
"First, you need to create a database `cursor`, which will allow you to execute SQL commands."
],
"metadata": {
"id": "lW6XlGFBVDKc"
}
},
{
"cell_type": "code",
"source": [
"cur = db.cursor()"
],
"metadata": {
"id": "DBEHctjiWW6x"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Next, you should write an SQL command that creates the table you want. "
],
"metadata": {
"id": "59z2YT1MWYUI"
}
},
{
"cell_type": "code",
"source": [
"cur.execute(\"\"\"\n",
" CREATE TABLE users_subset AS\n",
" SELECT \n",
" username,\n",
" barcode,\n",
" active,\n",
" type,\n",
" enrollment_date,\n",
" personal__last_name,\n",
" personal__first_name,\n",
" personal__middle_name,\n",
" personal__email\n",
" FROM users__t\n",
" LIMIT 10;\n",
" \"\"\")"
],
"metadata": {
"id": "3672xU80WngU"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Now that you have a new table, you can use `ld.select()` to view the contents or the `export` functions to export the modified table."
],
"metadata": {
"id": "qd2BxUWaXVL1"
}
},
{
"cell_type": "code",
"source": [
"ld.select(table='users_subset')"
],
"metadata": {
"id": "3tWiNm9EXH0-"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"ld.export_csv(table='users_subset', filename='users_subset.csv')"
],
"metadata": {
"id": "xgwt460uXkDR"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"#### Method 3: pull the data into pandas for processing\n",
"\n",
"The `pandas` package is a common way of doing data science tasks in Python, including basic data transformations.\n",
"\n",
"To pull data out of the LDLite database and work with it in `pandas`, you can execute a `SELECT` command on the database and send it to a `fetchdf()` command. You still need to create a database cursor and execute SQL, but once it's out of the database, you can do the data work in `pandas` instead of SQL."
],
"metadata": {
"id": "S_K3tK-MX7xz"
}
},
{
"cell_type": "code",
"source": [
"cur = db.cursor()\n",
"df = cur.execute(\"\"\"\n",
" SELECT * FROM users__t;\n",
" \"\"\").fetchdf()\n",
"\n",
"df.head(5)"
],
"metadata": {
"id": "317Lv1Wc79c5"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Data processing in `pandas` is out of scope for this notebook, but this example shows how to select columns and limit to the first 10 rows."
],
"metadata": {
"id": "X_CJOVvtaSxD"
}
},
{
"cell_type": "code",
"source": [
"df1 = df[['username','barcode','active','type','enrollment_date','personal__last_name','personal__first_name','personal__middle_name', 'personal__email']].head(10)\n",
"\n",
"df1"
],
"metadata": {
"id": "6V85c1CHZndJ"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Pandas has `to_csv()` and `to_excel()` functions (among others!) to export data frames."
],
"metadata": {
"id": "pK13tFtaadFr"
}
},
{
"cell_type": "code",
"source": [
"df1.to_csv('users_subset_pandas.csv')\n",
"\n",
"df1.to_excel('users_subset_pandas.xlsx')"
],
"metadata": {
"id": "8iSS_GBe5h9m"
},
"execution_count": null,
"outputs": []
}
]
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment