SQLite - in-memory cache

SQLite is a self-contained, high-reliability, embedded, full-featured, public-domain SQL database engine. It is very easy to install and use in code. Here I am going to cover how to use SQLite as an in-memory cache. Oftentimes we have metadata, like a zipcode/city-code mapping, that is only a couple of MBs in size. It can be overkill to create and maintain a separate database for data that is never going to change. For those kinds of scenarios, my go-to solution is SQLite, which can easily load CSV files as tables. Access is very fast since the data is in memory. If you are using Docker, you don't even need to worry about cleaning up or recreating the DB files used by SQLite.
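Note that sqlite3 can also keep the whole database purely in memory by passing the special name ':memory:' to connect; the snippets below use a file-backed database instead, which works equally well inside a throwaway Docker container. A minimal sketch of the two options:

import sqlite3

# purely in-memory: the database disappears when the connection closes
mem_conn = sqlite3.connect(':memory:')

# file-backed: written to disk, trivially discarded along with the container
file_conn = sqlite3.connect('database.db')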

Below are some generic methods I would like to go through:

  • Creating a table from a CSV
  • Getting values for a column, filtered by a value
import gzip
import os
import csv
import re
import logging
import sqlite3
from sqlite3 import Error

# module-level logger used by the helpers below
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# allows you to access columns using names 
# whereas a plain tuple would make you use numbered indices
def dict_factory(cursor, row):
    d = {}
    for idx, col in enumerate(cursor.description):
        d[col[0]] = row[idx]
    return d
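
# e.g. with dict_factory set, fetched rows come back as dicts:
#   row['zipcode'] instead of row[0]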

db_file = 'database.db'
CSV_FILE_PATH = 'zipcodes.csv.gz'  # hypothetical path; point this at your gzipped CSV

def create_connection(db_file):
    try:
        conn = sqlite3.connect(db_file)
        conn.row_factory = dict_factory
        conn.text_factory = str  # allows utf-8 data to be stored
        return conn
    except Error as e:
        logger.error('Error creating sqlite3 connection: %s', e)
        raise e


# create table from csv
def load_csv_to_sql_db():
    conn = create_connection(db_file)
    c = conn.cursor()
    path = CSV_FILE_PATH
    with gzip.open(path, 'rt') as f:  # 'rt' so csv.reader receives text, not bytes
        reader = csv.reader(f)
        # tablename will be the filename here
        tablename = os.path.basename(f.name).split('.')[0]
        header = True

        for row in reader:
            if header:
                header = False
                sql = "DROP TABLE IF EXISTS %s" % tablename
                c.execute(sql)
                sql = "CREATE TABLE %s (%s)" % (
                    tablename, ", ".join(
                        ["%s text" % re.sub(r'\W+', '',  # strip non-alphanumeric characters
                         column) for column in row]))
                c.execute(sql)
                logger.info('Data table %s created', tablename)
                index_cols = ('col1', 'col2')  # columns that need an index
                for column in row:
                    if column.lower() in index_cols:
                        index = "%s__%s" % (tablename, column)
                        sql = "CREATE INDEX %s on %s (%s)" % (
                            index, tablename, column)
                        c.execute(sql)
                # build the INSERT once, with one ? placeholder per column
                insertsql = "INSERT INTO %s VALUES (%s)" % (
                    tablename, ", ".join(["?" for column in row]))
                rowlen = len(row)
            else:
                if len(row) == rowlen:  # skip malformed rows
                    c.execute(insertsql, row)

        conn.commit()
    conn.close()
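
# Usage sketch: with CSV_FILE_PATH pointing at 'zipcodes.csv.gz' (hypothetical
# filename), this creates a 'zipcodes' table and bulk-loads every CSV row:
#
#   load_csv_to_sql_db()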

# read a column and filter rows by the value passed
def get_value(table_name, col_name, filter_val):
    conn = create_connection(db_file)
    try:
        cur = conn.cursor()
        # table/column names cannot be bound as parameters, hence the
        # string interpolation; only the filter value uses a ? placeholder
        cur.execute("SELECT * FROM %s"
                    " WHERE %s=?" % (table_name, col_name), (filter_val,))
        row = cur.fetchone()  # fetching a single row
        return row or {}
    except Exception:
        logger.error('Error while fetching %s'
                     ' from table: %s and value: %s',
                     col_name, table_name, filter_val)
        return {}
    finally:
        conn.close()
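
As an example, assuming the CSV produced a zipcodes table with zipcode and city columns (hypothetical names), a lookup looks like this:

row = get_value('zipcodes', 'zipcode', '94105')
print(row.get('city'))  # get_value returns {} on a miss, so .get() is safe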

Now we have loaded the data from the CSV into a SQLite table and can use the get_value interface to access the required columns. We call the load_csv_to_sql_db method in before_first_request in Flask, which is where we load all our in-memory cache values. sqlite3 is also a great command-line database for data analysis; I use it all the time to load CSVs and understand the data. You can find more detail in the Command Line Shell For SQLite documentation. The above methods can be optimized for your requirements: fetching multiple values, reusing the connection objects, etc. I have just showcased a pattern for using SQLite as an in-memory cache.
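A minimal sketch of that Flask wiring, assuming a standard app object:

from flask import Flask

app = Flask(__name__)

@app.before_first_request
def warm_cache():
    # populate the SQLite tables from CSVs before the first request is served
    load_csv_to_sql_db()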
