Skip to content

Instantly share code, notes, and snippets.

@simonw
Created November 12, 2024 23:24
Show Gist options
  • Save simonw/0a47f9e35a50d4e25a47826f4ab75dda to your computer and use it in GitHub Desktop.
Save simonw/0a47f9e35a50d4e25a47826f4ab75dda to your computer and use it in GitHub Desktop.

2024-11-12T23:19:55 conversation: 01jchaw69rebqxa5dx6pypnjag

Model: qwen2.5-coder:32b

Prompt:

python function that takes URL to a CSV file and path to a SQLite database, fetches the CSV with the standard library, creates a table with the right columns and inserts the data

Response:

Certainly! Below is a Python function that accomplishes this task using only the standard library (urllib for fetching the CSV, csv for reading the CSV file, and sqlite3 for interacting with the SQLite database).

import csv
import sqlite3
from urllib import request

def fetch_csv_and_insert_to_sqlite(csv_url, db_path):
    # Fetch the CSV from the URL
    response = request.urlopen(csv_url)
    reader = csv.reader(response.read().decode('utf-8').splitlines())

    # Extract headers (column names) from the first row of the CSV
    headers = next(reader)

    # Create a connection to the SQLite database (will be created if it doesn't exist)
    conn = sqlite3.connect(db_path)
    cursor = conn.cursor()

    # Generate the SQL statement for creating the table
    columns_with_types = ', '.join([f'"{header}" TEXT' for header in headers])
    create_table_query = f'CREATE TABLE IF NOT EXISTS data ({columns_with_types})'
    cursor.execute(create_table_query)

    # Prepare the SQL insert query
    placeholders = ', '.join(['?' for _ in headers])
    insert_query = f'INSERT INTO data VALUES ({placeholders})'

    # Insert each row from the CSV into the SQLite table
    cursor.executemany(insert_query, reader)

    # Commit changes and close the connection
    conn.commit()
    conn.close()

# Example usage:
csv_url = 'https://example.com/path/to/your/file.csv'
db_path = '/path/to/your/database.db'
fetch_csv_and_insert_to_sqlite(csv_url, db_path)

Explanation:

  1. Fetching CSV from URL: The urllib.request.urlopen function is used to fetch the CSV file from the provided URL.

  2. Reading CSV Data: The CSV data is decoded and split into lines using splitlines(). A csv.reader object is created to read the CSV data.

  3. Extracting Headers: The first row of the CSV (headers) is extracted and used to determine the column names for the SQLite table.

  4. Creating Table in SQLite: A SQL statement is generated to create a new table named data with columns corresponding to the headers from the CSV file. All columns are defined as TEXT type for simplicity. The table is created only if it does not already exist (CREATE TABLE IF NOT EXISTS).

  5. Inserting Data into SQLite: The CSV data (excluding the header row) is inserted into the newly created table using executemany, which executes an SQL statement with parameters.

  6. Committing Changes and Closing Connection: All changes are committed to the database, and the connection is closed.

Notes:

  • This script assumes all data can be stored as text (TEXT). You might need to modify the column types if your data includes integers or other non-text data types.
  • The table name data is hardcoded. If you want a different table name, you can pass it as an additional parameter to the function.
  • Error handling (e.g., network issues, invalid CSV format) is not included in this script for simplicity. You might want to add error handling depending on your requirements.

Feel free to modify the code according to your specific needs!

@simonw
Copy link
Author

simonw commented Nov 12, 2024

This gave me an annoying SSL error which I fixed myself like this:

import ssl
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

def fetch_csv_and_insert_to_sqlite(csv_url, db_path):
    # Fetch the CSV from the URL
    
    response = request.urlopen(csv_url, context=context)
    # ...

Then I ran this:

fetch_csv_and_insert_to_sqlite('https://github.com/simonw/scrape-san-mateo-county-election-results-2024/raw/refs/heads/main/results.csv', '/tmp/blah.db')

Which worked, I got a database with a schema like this:

CREATE TABLE data ("Precinct_name" TEXT, "Split_name" TEXT, "Reporting_flag" TEXT, "Update_count" TEXT, "Pct_Id" TEXT, "Pct_seq_nbr" TEXT, "Reg_voters" TEXT, "Turn_Out" TEXT, "Contest_Id" TEXT, "Contest_seq_nbr" TEXT, "Contest_title" TEXT, "Contest_party_name" TEXT, "Selectable_Options" TEXT, "candidate_id" TEXT, "candidate_name" TEXT, "Candidate_Type" TEXT, "cand_seq_nbr" TEXT, "Party_Code" TEXT, "total_ballots" TEXT, "total_votes" TEXT, "total_under_votes" TEXT, "total_over_votes" TEXT, "Vote Centers_ballots" TEXT, "Vote Centers_votes" TEXT, "Vote Centers_under_votes" TEXT, "Vote Centers_over_votes" TEXT, "Vote by Mail_ballots" TEXT, "Vote by Mail_votes" TEXT, "Vote by Mail_under_votes" TEXT, "Vote by Mail_over_votes" TEXT, "" TEXT);

And this many rows:

sqlite-utils tables blah.db --counts
[{"table": "data", "count": 15589}]

@glesica
Copy link

glesica commented Nov 13, 2024

Interesting, when I run the same commands I get a similar program, but it uses requests.

llm -m qwen2.5-coder:32b 'python function that takes URL to a CSV file and path to a SQLite database, fetches the CSV with the standard library, creates a table with the right columns and inserts the data'
import csv
import sqlite3
import requests
from io import StringIO

def fetch_csv_and_insert_into_sqlite(url, db_path):
    # Fetch the CSV file from the URL
    response = requests.get(url)
    if response.status_code != 200:
        raise Exception(f"Failed to fetch CSV. Status code: {response.status_code}")

    # Read the CSV content using StringIO
    csv_content = StringIO(response.text)
    csv_reader = csv.reader(csv_content)

    # Extract headers (column names) from the first row
    columns = next(csv_reader)

    # Create a table name (for simplicity, let's call it 'data')
    table_name = 'data'

    # Create a connection to the SQLite database
    conn = sqlite3.connect(db_path)
    cursor = conn.cursor()

    # Create a table with the appropriate columns
    column_definitions = ', '.join([f'"{col}" TEXT' for col in columns])
    create_table_query = f'CREATE TABLE IF NOT EXISTS {table_name} ({column_definitions})'
    cursor.execute(create_table_query)

    # Insert data into the table
    insert_query = f'INSERT INTO {table_name} ({", ".join([f\'"{col}"\' for col in columns])}) VALUES ({", ".join(["?" for _ in columns])})'
    cursor.executemany(insert_query, csv_reader)

    # Commit changes and close the connection
    conn.commit()
    conn.close()

# Example usage:
# fetch_csv_and_insert_into_sqlite('https://example.com/data.csv', 'my_database.db')

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment