@simonw
Created January 27, 2026 15:01
Prompt

I want to build a Datasette plugin that offers a UI to upload files to an S3 bucket and stores information about them in a SQLite table. Break this down into ten tasks suitable for execution by parallel coding agents.

Response

Reasoning trace

The user wants to build a Datasette plugin that:

  1. Offers a UI to upload files to an S3 bucket
  2. Stores information about them in a SQLite table

They want this broken down into 10 tasks suitable for parallel coding agents. This means the tasks should be:

  • Independent or minimally dependent on each other
  • Well-defined interfaces
  • Concrete deliverables
  • Testable

Let me think about the architecture:

Core Components:

  1. S3 Integration - Uploading files to S3 (boto3/aiobotocore)
  2. Database Schema - SQLite table structure for file metadata
  3. UI Components - HTML forms, templates, JavaScript for file upload
  4. API Endpoints - Datasette plugin hooks (register_routes, etc.)
  5. Configuration - AWS credentials, bucket settings, table names
  6. File Processing - Validation, MIME type detection, size limits
  7. Metadata Extraction - EXIF data, file headers, etc.
  8. Security/Auth - Permission checks, file validation
  9. Error Handling - Retry logic, error messages, logging
  10. Testing/Integration - Unit tests, integration tests

Task Breakdown Strategy:

Since these need to be parallel-friendly, I should define clear interfaces between them:

  1. Database Schema & Models - Define the SQLite table structure and Python dataclasses/Pydantic models
  2. Configuration System - Handle Datasette plugin configuration (AWS keys, bucket, table name)
  3. S3 Client Abstraction - Wrapper around boto3 for uploads, with async support
  4. File Validation Service - MIME type checking, size limits, extension validation
  5. Metadata Extraction Service - Extract file metadata (size, checksum, EXIF if image)
  6. Database Repository Layer - CRUD operations for the files table
  7. Upload API Endpoint - HTTP endpoint handling multipart uploads
  8. Frontend Upload UI - HTML template + JavaScript for drag-and-drop/file selection
  9. File Listing/Admin UI - Display uploaded files, search, pagination
  10. Plugin Integration & Wiring - Datasette hooks (prepare_connection, register_routes), plugin.py entry point

Dependencies Analysis:

  • Task 1 (Schema) and Task 2 (Config) are independent
  • Task 3 (S3) depends on Task 2 (Config) for credentials
  • Task 4 (Validation) and Task 5 (Metadata) are independent utilities
  • Task 6 (Repository) depends on Task 1 (Schema)
  • Task 7 (API) depends on Tasks 3, 4, 5, 6
  • Task 8 (UI) depends on Task 7 (API endpoint URL)
  • Task 9 (Listing) depends on Task 6 (Repository)
  • Task 10 (Integration) depends on everything but mainly 7, 8, 9

Actually, for true parallelism, I should define interfaces so that tasks can mock their dependencies.

Let me refine:

  1. Database Schema Definition - SQL schema, migration logic, Python models
  2. Configuration & Settings - Pydantic settings class, Datasette plugin config parsing
  3. S3 Storage Backend - Abstract storage interface + S3 implementation (can mock)
  4. File Validation Module - Pure function validators (MIME, size, extensions)
  5. Metadata Extraction Engine - Extract EXIF, hashes, etc. (pure functions)
  6. Data Access Layer - Repository pattern for SQLite operations (depends on schema interface)
  7. Upload HTTP Handler - FastAPI/Datasette route handler (depends on storage, validation, metadata, DAL interfaces)
  8. Frontend Upload Component - HTML/JS form with progress bar (assumes REST endpoint contract)
  9. File Management UI - Table view, delete functionality (depends on DAL interface)
  10. Plugin Bootstrap & Wiring - Entry point, hook implementations, dependency injection container

Interface Contracts:

Each task should define its interface:

  1. Schema: Provides get_create_table_sql(), FileRecord dataclass
  2. Config: Provides S3Config dataclass with from_datasette_plugin_config()
  3. Storage: AbstractStorage with async def upload(file_bytes, key) -> URL and S3Storage implementation
  4. Validation: validate_file(file_bytes, filename) -> ValidationResult
  5. Metadata: extract_metadata(file_bytes, filename) -> FileMetadata
  6. Repository: FileRepository with insert(), get(), list(), delete() (uses FileRecord from #1)
  7. Upload Handler: Implements Datasette register_routes, expects StorageInterface, ValidatorInterface, RepoInterface
  8. Frontend: Provides Jinja2 templates and static JS, assumes /upload endpoint accepts multipart/form-data
  9. Management UI: Template + route handlers, assumes RepoInterface
  10. Plugin: Combines all, registers routes, initializes DB

This way agents can work in parallel using mock/stub interfaces.

Let me write these out clearly with specific deliverables and interface definitions.

Here is a decomposition into 10 parallelizable tasks with defined interfaces, allowing agents to work simultaneously using contract-first development.

Task 1: Database Schema & Models

Deliverable: SQL migration file and Python dataclasses.

  • Define SQLite table schema (files table with columns: id, filename, s3_key, bucket, content_type, size_bytes, checksum_sha256, metadata_json, created_at)
  • Create Pydantic/dataclass models: FileRecord, FileCreateRequest
  • Provide get_migration_sql() function that returns DDL
  • Interface: Exports FileRecord class and schema version constant

Task 2: Configuration System

Deliverable: Settings parser and validation.

  • Create PluginConfig Pydantic model with fields: aws_access_key_id, aws_secret_access_key, s3_bucket, s3_region, s3_endpoint_url (optional), max_file_size_mb, allowed_mime_types, db_table_name
  • Implement load_config(datasette_instance) -> PluginConfig that reads from datasette.plugin_config("datasette-s3-uploads"), Datasette's documented accessor for per-plugin configuration
  • Validate AWS credentials on startup (test connection)
  • Interface: Exports PluginConfig class and get_config(datasette) function

Task 3: S3 Storage Backend

Deliverable: Abstract storage interface and S3 implementation.

  • Define AbstractStorage protocol with async upload(file_bytes: bytes, key: str, content_type: str) -> str (returns URL) and async delete(key: str) -> bool
  • Implement S3Storage class using aiobotocore or boto3 (async wrapper)
  • Include retry logic with exponential backoff
  • Interface: Exports AbstractStorage protocol and S3Storage class. Depends on PluginConfig from Task 2 for initialization.
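The protocol, the retry helper, and an in-memory stand-in might look like this. InMemoryStorage is a hypothetical test double so Tasks 7 and 10 can develop against the interface before the real aiobotocore-backed S3Storage exists:

```python
import asyncio
from typing import Protocol

class AbstractStorage(Protocol):
    async def upload(self, file_bytes: bytes, key: str, content_type: str) -> str: ...
    async def delete(self, key: str) -> bool: ...

async def with_retries(coro_fn, *args, attempts: int = 3, base_delay: float = 0.5):
    """Retry an async callable with exponential backoff (0.5s, 1s, 2s, ...)."""
    for attempt in range(attempts):
        try:
            return await coro_fn(*args)
        except Exception:
            if attempt == attempts - 1:
                raise
            await asyncio.sleep(base_delay * 2 ** attempt)

class InMemoryStorage:
    """Test double satisfying AbstractStorage. The real S3Storage would wrap
    aiobotocore's put_object/delete_object behind the same two methods."""
    def __init__(self):
        self.blobs: dict[str, bytes] = {}

    async def upload(self, file_bytes: bytes, key: str, content_type: str) -> str:
        self.blobs[key] = file_bytes
        return f"memory://{key}"

    async def delete(self, key: str) -> bool:
        return self.blobs.pop(key, None) is not None
```

Keeping the protocol to two methods means the mock is trivial and the S3 specifics (bucket, region, endpoint) stay inside S3Storage's constructor, fed by PluginConfig.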

Task 4: File Validation Engine

Deliverable: Pure validation functions.

  • Implement validate_file(file_bytes: bytes, filename: str, config: ValidationRules) -> ValidationResult
  • Checks: file size limits, MIME type against magic numbers (python-magic), extension whitelist/blacklist
  • Return structured error messages suitable for UI display
  • Interface: Exports ValidationRules dataclass and validate_file() function. No external dependencies.

Task 5: Metadata Extraction Service

Deliverable: Metadata extraction utilities.

  • Implement extract_metadata(file_bytes: bytes, filename: str) -> FileMetadata
  • Extract: SHA256 checksum, file size, MIME type, image dimensions (if PIL can open), EXIF data (if available)
  • Return JSON-serializable dict
  • Interface: Exports FileMetadata dataclass and extract_metadata() function. No external dependencies.
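The stdlib-only core of this task could look like the sketch below; image dimensions and EXIF would be added behind an optional PIL import, which is omitted here:

```python
import hashlib
import mimetypes
from dataclasses import dataclass, asdict

@dataclass
class FileMetadata:
    size_bytes: int
    checksum_sha256: str
    content_type: str

def extract_metadata(file_bytes: bytes, filename: str) -> FileMetadata:
    """Pure function: hash, size, and MIME type guessed from the filename.
    asdict(result) gives the JSON-serializable dict other tasks expect."""
    guessed, _ = mimetypes.guess_type(filename)
    return FileMetadata(
        size_bytes=len(file_bytes),
        checksum_sha256=hashlib.sha256(file_bytes).hexdigest(),
        content_type=guessed or "application/octet-stream",
    )
```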

Task 6: Data Access Layer (Repository)

Deliverable: Database CRUD operations.

  • Implement FileRepository class with methods: async create(record: FileRecord), async get(file_id: int), async list(limit, offset), async delete(file_id), async get_by_checksum(checksum)
  • Handle SQLite async operations using datasette.databases["dbname"].execute_write_fn() or similar
  • Interface: Exports FileRepository class. Depends on FileRecord from Task 1 and PluginConfig from Task 2 (for table name).
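A synchronous sketch of the repository against plain sqlite3 (the column set is trimmed for brevity). Inside Datasette these statements would run through db.execute_write_fn() on the write thread, and the methods would be async:

```python
import sqlite3

class FileRepository:
    """Sketch using plain sqlite3; the real plugin wraps these calls in
    Datasette's execute_write_fn() so writes go through the write thread."""

    def __init__(self, conn: sqlite3.Connection, table: str = "s3_uploads"):
        self.conn = conn
        self.table = table
        conn.execute(f"""CREATE TABLE IF NOT EXISTS {table} (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            filename TEXT, s3_key TEXT UNIQUE, checksum_sha256 TEXT)""")

    def create(self, filename: str, s3_key: str, checksum: str) -> int:
        cur = self.conn.execute(
            f"INSERT INTO {self.table} (filename, s3_key, checksum_sha256) VALUES (?, ?, ?)",
            (filename, s3_key, checksum))
        return cur.lastrowid

    def get(self, file_id: int):
        return self.conn.execute(
            f"SELECT * FROM {self.table} WHERE id = ?", (file_id,)).fetchone()

    def list(self, limit: int = 50, offset: int = 0):
        return self.conn.execute(
            f"SELECT * FROM {self.table} ORDER BY id LIMIT ? OFFSET ?",
            (limit, offset)).fetchall()

    def get_by_checksum(self, checksum: str):
        # Supports deduplication: skip the S3 upload if the bytes already exist
        return self.conn.execute(
            f"SELECT * FROM {self.table} WHERE checksum_sha256 = ?", (checksum,)).fetchone()

    def delete(self, file_id: int) -> bool:
        cur = self.conn.execute(f"DELETE FROM {self.table} WHERE id = ?", (file_id,))
        return cur.rowcount > 0
```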

Task 7: Upload API Endpoint

Deliverable: HTTP route handler.

  • Implement upload_handler(request: Request) that handles multipart/form-data
  • Orchestrates: Task 4 (validation) → Task 5 (metadata) → Task 3 (S3 upload) → Task 6 (DB insert)
  • Return JSON: {success: bool, file_id: int, url: str} or error details
  • Handle streaming for large files (spool to disk if needed)
  • Interface: Function suitable for registration via the register_routes() plugin hook. Depends on interfaces of Tasks 2, 3, 4, 5, 6.
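The orchestration at the heart of this task, with every collaborator injected so the handler can be tested against stubs. This sketch skips the multipart parsing and streaming: in the real handler, file_bytes and filename would come from the Datasette Request, and the key layout is a made-up convention:

```python
import asyncio

async def handle_upload(file_bytes, filename, *, storage, validator, extract, repo):
    """Validate -> extract metadata -> upload to storage -> insert DB row.
    All four collaborators are the interfaces defined by Tasks 3-6."""
    result = validator(file_bytes, filename)
    if not result["ok"]:
        return {"success": False, "errors": result["errors"]}
    meta = extract(file_bytes, filename)
    key = f"uploads/{meta['checksum']}/{filename}"  # illustrative key scheme
    url = await storage.upload(file_bytes, key, meta["content_type"])
    file_id = repo.create(filename, key, meta["checksum"])
    return {"success": True, "file_id": file_id, "url": url}
```

Because the dependencies arrive as parameters, a unit test needs only a stub storage and repo, which is exactly what makes this task parallelizable with Tasks 3 and 6.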

Task 8: Frontend Upload UI

Deliverable: HTML template and JavaScript.

  • Create Jinja2 template upload_form.html with drag-and-drop zone, file list, progress bar
  • Vanilla JS (no external deps) handling: drag events, chunked upload (if large files), progress indication, error display
  • POSTs to /upload endpoint (assumed contract from Task 7)
  • Interface: Exports template string and static JS. Assumes JSON response format from Task 7.

Task 9: File Management UI

Deliverable: Admin interface templates and routes.

  • Create files_list.html template showing table of uploads with pagination, search by filename, delete buttons
  • Implement route handlers: GET /files (list view), POST /files/{id}/delete
  • Uses Task 6 (Repository) for data fetching
  • Interface: Route registration functions. Depends on Task 6 interface.

Task 10: Plugin Integration & Bootstrap

Deliverable: Entry point and wiring.

  • Create datasette_s3_uploads/__init__.py with register_routes() and startup() (the startup hook is the place to create the table; prepare_connection() is reserved for registering custom SQL functions)
  • Dependency injection: Instantiate Task 2 (Config), Task 3 (S3), Task 6 (Repo), wire them into Task 7 and Task 9 routes
  • Serve static files (Task 8 JS) from the plugin's static/ directory, which Datasette exposes automatically under /-/static-plugins/
  • Interface: Standard Datasette plugin hooks. Depends on all other tasks' public APIs.

Coordination Notes for Agents:

  • Contract-First: Tasks 3-6 should define Protocol classes or abstract base classes so they can be mocked during development.
  • Shared Types: Task 1 owns the core data structures; all other tasks import from there.
  • Testing: Each task should include unit tests using mocks for its dependencies (e.g., Task 7 mocks the S3 client).
  • Naming Convention: Use consistent table name default (s3_uploads) and route prefix (/-/s3-uploads).