Model: qwq:latest
Write a Python function import_csv(conn, url, table_name) which accepts a connection to a SQLite database, a URL to a CSV file, and the name of a table - it then creates that table with the right columns and imports the CSV data from that URL
Okay, I need to write a Python function that imports data from a CSV file located at a given URL into a specified table in a SQLite database. The function should take three parameters: a connection to a SQLite database (conn), a URL to a CSV file (url), and the name of the table (table_name) to be created and populated with data from the CSV.
First, I need to understand what this function needs to do step by step:
- Retrieve the CSV Data: Given a URL, the function needs to download the CSV file.
- Read the CSV Data: Once downloaded, it should read the CSV data to determine the column names and types.
- Create the Table: Using the connection to the SQLite database, create a table with columns that match the CSV headers.
- Insert Data into the Table: Import the data from the CSV into the newly created table.
Let's think about each step in detail.
I need to download the CSV file from the provided URL. In Python, I can use the requests
library to handle HTTP requests and download the file.
import requests
response = requests.get(url)
if response.status_code == 200:
    csv_data = response.text
else:
    raise Exception("Failed to retrieve CSV data")
Once I have the CSV data as a string, I can use the csv
module to read it. I need to read the first row to get the column names, and then determine appropriate SQLite data types for each column based on the data.
import csv
from io import StringIO
file_like = StringIO(csv_data)
reader = csv.reader(file_like)
headers = next(reader)
# Determine data types
# For simplicity, assume all columns are text initially
column_defs = [f"{header} TEXT" for header in headers]
Determining the correct data type can be tricky because CSV files don't specify data types. One approach is to scan a few rows and guess the type based on the content. For example, if a column contains only integers, set it as INTEGER; if it has decimal points, set it as REAL; otherwise, set it as TEXT.
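A minimal sketch of that sampling approach, assuming a hypothetical helper infer_type that receives the string values sampled from one column (it is not part of the final function below):

```python
def infer_type(values):
    """Guess a SQLite column type from a sample of string values.

    Tries INTEGER first, then REAL; any non-numeric value forces TEXT.
    Empty strings are skipped so sparse columns don't default to TEXT.
    """
    inferred = None  # no non-empty samples seen yet
    for v in values:
        if v == "":
            continue
        try:
            int(v)
            kind = "INTEGER"
        except ValueError:
            try:
                float(v)
                kind = "REAL"
            except ValueError:
                return "TEXT"  # non-numeric value: give up immediately
        # Upgrade INTEGER to REAL, but never downgrade REAL back
        if inferred is None or (inferred == "INTEGER" and kind == "REAL"):
            inferred = kind
    return inferred or "TEXT"
```

After buffering a few rows per column, the column definitions would then become something like `f"{header} {infer_type(samples)}"` instead of hard-coded TEXT.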
Using the connection provided, I need to execute an SQL command to create the table with the determined columns.
create_table_query = f"CREATE TABLE IF NOT EXISTS {table_name} ({', '.join(column_defs)})"
conn.execute(create_table_query)
Now, I need to insert the rows from the CSV into the table. I can use parameterized queries to avoid SQL injection risks.
First, prepare an INSERT query with placeholders for each column.
insert_query = f"INSERT INTO {table_name} ({', '.join(headers)}) VALUES ({', '.join(['?'] * len(headers))})"
Then, iterate over the remaining rows in the CSV and execute the insert query for each row.
for row in reader:
    conn.execute(insert_query, row)
conn.commit()
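One caveat: the ? placeholders protect only the row values; table_name and the header names are still interpolated into the SQL as raw strings. If they can come from untrusted input, they should be quoted as identifiers first. A minimal sketch using SQLite's double-quote convention (quote_identifier is a hypothetical helper, not used in the function below):

```python
def quote_identifier(name):
    """Quote a SQLite identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'
```

The queries would then use `quote_identifier(table_name)` and `quote_identifier(header)` in place of the bare names.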
Now, let's combine all these steps into a single function. I also need to handle potential exceptions and ensure that resources are properly cleaned up.
import requests
import csv
from io import StringIO
def import_csv(conn, url, table_name):
    # Step 1: Retrieve the CSV Data
    try:
        response = requests.get(url)
        response.raise_for_status()  # Raise an exception for HTTP errors
    except requests.exceptions.RequestException as e:
        raise Exception(f"Failed to retrieve CSV data: {e}")
    csv_data = response.text

    # Step 2: Read the CSV Data
    file_like = StringIO(csv_data)
    reader = csv.reader(file_like)
    headers = next(reader)

    # Determine column definitions
    # For simplicity, assume all columns are text
    column_defs = [f"{header} TEXT" for header in headers]

    # Step 3: Create the Table
    create_table_query = f"CREATE TABLE IF NOT EXISTS {table_name} ({', '.join(column_defs)})"
    conn.execute(create_table_query)

    # Step 4: Insert Data into the Table
    insert_query = f"INSERT INTO {table_name} ({', '.join(headers)}) VALUES ({', '.join(['?'] * len(headers))})"
    for row in reader:
        conn.execute(insert_query, row)
    conn.commit()
- Data Type Detection: The current implementation assumes all columns are text. For better accuracy, implement a function to detect data types by sampling the CSV data.
- Error Handling: Add more robust error handling, especially when dealing with different CSV formats or database operations.
- Performance: For large CSV files, consider batch inserting records to manage memory usage efficiently.
- Dependencies: Ensure that the requests library is installed, as it's not part of Python's standard library.
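The batch-insert point can be sketched with sqlite3's executemany, chunking the row iterator with itertools.islice so the whole file never sits in memory at once (insert_in_batches and the batch size of 500 are illustrative choices, not part of the function above):

```python
import sqlite3
from itertools import islice

def insert_in_batches(conn, insert_query, rows, batch_size=500):
    """Insert rows in fixed-size batches via executemany.

    `rows` may be any iterator (e.g. a csv.reader), so rows are
    consumed lazily rather than loaded all at once.
    """
    rows = iter(rows)
    while True:
        batch = list(islice(rows, batch_size))
        if not batch:
            break
        conn.executemany(insert_query, batch)
    conn.commit()
```

Each executemany call runs one prepared statement over the whole batch, which is typically much faster than a commit-per-row loop.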
This function provides a basic framework for importing CSV data from a URL into a SQLite database. Depending on specific requirements, it can be extended and optimized further.