
@simonw
Created November 27, 2024 23:58

2024-11-27T23:45:12 conversation: 01jdr061842cw7dy0fect772zf

Model: qwq:latest

Prompt:

Write a Python function import_csv(conn, url, table_name) which accepts a connection to a SQLite database and a URL to a CSV file and the name of a table - it then creates that table with the right columns and imports the CSV data from that URL

Response:

Okay, I need to write a Python function that imports data from a CSV file located at a given URL into a specified table in a SQLite database. The function should take three parameters: a connection to a SQLite database (conn), a URL to a CSV file (url), and the name of the table (table_name) to be created and populated with data from the CSV.

First, I need to understand what this function needs to do step by step:

  1. Retrieve the CSV Data: Given a URL, the function needs to download the CSV file.
  2. Read the CSV Data: Once downloaded, it should read the CSV data to determine the column names and types.
  3. Create the Table: Using the connection to the SQLite database, create a table with columns that match the CSV headers.
  4. Insert Data into the Table: Import the data from the CSV into the newly created table.

Let's think about each step in detail.

Step 1: Retrieve the CSV Data

I need to download the CSV file from the provided URL. In Python, I can use the requests library to handle HTTP requests and download the file.

import requests

response = requests.get(url)
if response.status_code == 200:
    csv_data = response.text
else:
    raise Exception("Failed to retrieve CSV data")

Step 2: Read the CSV Data

Once I have the CSV data as a string, I can use the csv module to read it. I need to read the first row to get the column names, and then determine appropriate SQLite data types for each column based on the data.

import csv
from io import StringIO

file_like = StringIO(csv_data)
reader = csv.reader(file_like)
headers = next(reader)

# Determine data types
# For simplicity, assume all columns are text initially
column_defs = [f"{header} TEXT" for header in headers]

Determining the correct data type can be tricky because CSV files don't specify data types. One approach is to scan a few rows and guess the type based on the content. For example, if a column contains only integers, set it as INTEGER; if it has decimal points, set it as REAL; otherwise, set it as TEXT.
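That sampling idea could be sketched as follows (the helper name `guess_column_type` and the sample size are assumptions for illustration, not part of the code above):

```python
def guess_column_type(values, sample_size=100):
    """Guess a SQLite type (INTEGER, REAL, or TEXT) from sampled string values."""
    has_real = False
    for value in values[:sample_size]:
        if value == "":
            continue  # ignore empty cells when guessing
        try:
            int(value)
        except ValueError:
            try:
                float(value)
                has_real = True  # numeric but not an integer
            except ValueError:
                return "TEXT"  # any non-numeric value forces TEXT
    return "REAL" if has_real else "INTEGER"
```

For example, `guess_column_type(["1", "2", "3"])` returns "INTEGER", while a column containing "1.5" would come back as "REAL".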

Step 3: Create the Table

Using the connection provided, I need to execute an SQL command to create the table with the determined columns.

create_table_query = f"CREATE TABLE IF NOT EXISTS {table_name} ({', '.join(column_defs)})"
conn.execute(create_table_query)
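One caveat with the f-string above: SQLite placeholders only protect values, not identifiers, so an interpolated table or column name should be quoted. A minimal sketch (the `quote_ident` helper is an illustration, not a standard-library function):

```python
def quote_ident(name):
    """Quote a SQLite identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'

# The CREATE TABLE statement could then be built as:
# f"CREATE TABLE IF NOT EXISTS {quote_ident(table_name)} (...)"
```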

Step 4: Insert Data into the Table

Now, I need to insert the rows from the CSV into the table. I can use parameterized queries for the values to avoid SQL injection risks (placeholders only work for values, not for table or column names).

First, prepare an INSERT query with placeholders for each column.

insert_query = f"INSERT INTO {table_name} ({', '.join(headers)}) VALUES ({', '.join(['?'] * len(headers))})"

Then, iterate over the remaining rows in the CSV and execute the insert query for each row.

for row in reader:
    conn.execute(insert_query, row)
conn.commit()

Putting It All Together

Now, let's combine all these steps into a single function. I also need to handle potential exceptions and ensure that resources are properly cleaned up.

import requests
import csv
from io import StringIO

def import_csv(conn, url, table_name):
    # Step 1: Retrieve the CSV Data
    try:
        response = requests.get(url)
        response.raise_for_status()  # Raise an exception for HTTP errors
    except requests.exceptions.RequestException as e:
        raise Exception(f"Failed to retrieve CSV data: {e}")
    
    csv_data = response.text
    
    # Step 2: Read the CSV Data
    file_like = StringIO(csv_data)
    reader = csv.reader(file_like)
    headers = next(reader)
    
    # Determine column definitions
    # For simplicity, assume all columns are text
    column_defs = [f"{header} TEXT" for header in headers]
    
    # Step 3: Create the Table
    create_table_query = f"CREATE TABLE IF NOT EXISTS {table_name} ({', '.join(column_defs)})"
    conn.execute(create_table_query)
    
    # Step 4: Insert Data into the Table
    insert_query = f"INSERT INTO {table_name} ({', '.join(headers)}) VALUES ({', '.join(['?'] * len(headers))})"
    for row in reader:
        conn.execute(insert_query, row)
    conn.commit()
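To sanity-check the create-and-insert logic without a network request, the same steps can be exercised against an in-memory database, with an inline CSV string standing in for `response.text`:

```python
import csv
import sqlite3
from io import StringIO

csv_data = "name,age\nAlice,30\nBob,25\n"  # stand-in for the downloaded CSV

conn = sqlite3.connect(":memory:")
reader = csv.reader(StringIO(csv_data))
headers = next(reader)

# Same table creation and insert pattern as in the function above
conn.execute(f"CREATE TABLE IF NOT EXISTS people ({', '.join(f'{h} TEXT' for h in headers)})")
insert_query = f"INSERT INTO people ({', '.join(headers)}) VALUES ({', '.join(['?'] * len(headers))})"
for row in reader:
    conn.execute(insert_query, row)
conn.commit()

rows = conn.execute("SELECT name, age FROM people ORDER BY name").fetchall()
# rows == [('Alice', '30'), ('Bob', '25')]
```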

Considerations

  1. Data Type Detection: The current implementation assumes all columns are text. For better accuracy, implement a function to detect data types by sampling the CSV data.
  2. Error Handling: Add more robust error handling, especially when dealing with different CSV formats or database operations.
  3. Performance: For large CSV files, consider batch inserting records to manage memory usage efficiently.
  4. Dependencies: Ensure that the requests library is installed, as it's not part of Python's standard library.
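On the performance point, the sqlite3 module's `executemany` can replace the row-at-a-time loop; a rough sketch of batched inserts (the helper name and the batch size of 500 are arbitrary choices):

```python
import itertools

def insert_in_batches(conn, insert_query, reader, batch_size=500):
    """Insert CSV rows in fixed-size batches using executemany."""
    while True:
        batch = list(itertools.islice(reader, batch_size))
        if not batch:
            break  # reader exhausted
        conn.executemany(insert_query, batch)
    conn.commit()
```

This keeps memory bounded because only `batch_size` rows are materialized at a time.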

This function provides a basic framework for importing CSV data from a URL into a SQLite database. Depending on specific requirements, it can be extended and optimized further.
