I want to build a Datasette plugin that offers a UI to upload files to an S3 bucket and stores information about them in a SQLite table. Break this down into ten tasks suitable for execution by parallel coding agents.
Reasoning trace
The user wants to build a Datasette plugin that:
- Offers a UI to upload files to an S3 bucket
- Stores information about them in a SQLite table
They want this broken down into 10 tasks suitable for parallel coding agents. This means the tasks should:
- Be independent of, or minimally dependent on, each other
- Have well-defined interfaces
- Produce concrete deliverables
- Be testable
Let me think about the architecture:
Core Components:
- S3 Integration - Uploading files to S3 (boto3/aiobotocore)
- Database Schema - SQLite table structure for file metadata
- UI Components - HTML forms, templates, JavaScript for file upload
- API Endpoints - Datasette plugin hooks (register_routes, etc.)
- Configuration - AWS credentials, bucket settings, table names
- File Processing - Validation, MIME type detection, size limits
- Metadata Extraction - EXIF data, file headers, etc.
- Security/Auth - Permission checks, file validation
- Error Handling - Retry logic, error messages, logging
- Testing/Integration - Unit tests, integration tests
Task Breakdown Strategy:
Since these need to be parallel-friendly, I should define clear interfaces between them:
- Database Schema & Models - Define the SQLite table structure and Python dataclasses/Pydantic models
- Configuration System - Handle Datasette plugin configuration (AWS keys, bucket, table name)
- S3 Client Abstraction - Wrapper around boto3 for uploads, with async support
- File Validation Service - MIME type checking, size limits, extension validation
- Metadata Extraction Service - Extract file metadata (size, checksum, EXIF if image)
- Database Repository Layer - CRUD operations for the files table
- Upload API Endpoint - HTTP endpoint handling multipart uploads
- Frontend Upload UI - HTML template + JavaScript for drag-and-drop/file selection
- File Listing/Admin UI - Display uploaded files, search, pagination
- Plugin Integration & Wiring - Datasette hooks (prepare_connection, register_routes), plugin.py entry point
Dependencies Analysis:
- Task 1 (Schema) and Task 2 (Config) are independent
- Task 3 (S3) depends on Task 2 (Config) for credentials
- Task 4 (Validation) and Task 5 (Metadata) are independent utilities
- Task 6 (Repository) depends on Task 1 (Schema)
- Task 7 (API) depends on Tasks 3, 4, 5, 6
- Task 8 (UI) depends on Task 7 (API endpoint URL)
- Task 9 (Listing) depends on Task 6 (Repository)
- Task 10 (Integration) depends on everything but mainly 7, 8, 9
Actually, for true parallelism, I should define interfaces so that tasks can mock their dependencies.
Let me refine:
- Database Schema Definition - SQL schema, migration logic, Python models
- Configuration & Settings - Pydantic settings class, Datasette plugin config parsing
- S3 Storage Backend - Abstract storage interface + S3 implementation (can mock)
- File Validation Module - Pure function validators (MIME, size, extensions)
- Metadata Extraction Engine - Extract EXIF, hashes, etc. (pure functions)
- Data Access Layer - Repository pattern for SQLite operations (depends on schema interface)
- Upload HTTP Handler - FastAPI/Datasette route handler (depends on storage, validation, metadata, DAL interfaces)
- Frontend Upload Component - HTML/JS form with progress bar (assumes REST endpoint contract)
- File Management UI - Table view, delete functionality (depends on DAL interface)
- Plugin Bootstrap & Wiring - Entry point, hook implementations, dependency injection container
Interface Contracts:
Each task should define its interface:
- Schema: Provides `get_create_table_sql()`, `FileRecord` dataclass
- Config: Provides `S3Config` dataclass with `from_datasette_plugin_config()`
- Storage: `AbstractStorage` with `async def upload(file_bytes, key) -> URL` and `S3Storage` implementation
- Validation: `validate_file(file_bytes, filename) -> ValidationResult`
- Metadata: `extract_metadata(file_bytes, filename) -> FileMetadata`
- Repository: `FileRepository` with `insert()`, `get()`, `list()`, `delete()` (uses `FileRecord` from #1)
- Upload Handler: Implements Datasette `register_routes`, expects StorageInterface, ValidatorInterface, RepoInterface
- Frontend: Provides Jinja2 templates and static JS, assumes `/upload` endpoint accepts `multipart/form-data`
- Management UI: Template + route handlers, assumes RepoInterface
- Plugin: Combines all, registers routes, initializes DB
This way agents can work in parallel using mock/stub interfaces.
Let me write these out clearly with specific deliverables and interface definitions.
Here is a decomposition into 10 parallelizable tasks with defined interfaces, allowing agents to work simultaneously using contract-first development.
Task 1: Database Schema Definition
Deliverable: SQL migration file and Python dataclasses.
- Define SQLite table schema (`files` table with columns: `id`, `filename`, `s3_key`, `bucket`, `content_type`, `size_bytes`, `checksum_sha256`, `metadata_json`, `created_at`)
- Create Pydantic/dataclass models: `FileRecord`, `FileCreateRequest`
- Provide `get_migration_sql()` function that returns DDL
- Interface: Exports `FileRecord` class and schema version constant
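A minimal sketch of what Task 1 might export. The column list comes from the bullet above; the default table name `s3_uploads` follows the naming convention in the coordination notes, and the field ordering is an assumption:

```python
from dataclasses import dataclass
from typing import Optional

SCHEMA_VERSION = 1  # bump when the DDL below changes

@dataclass
class FileRecord:
    """Shared model that Tasks 6-9 import instead of redefining columns."""
    filename: str
    s3_key: str
    bucket: str
    content_type: str
    size_bytes: int
    checksum_sha256: str
    metadata_json: str = "{}"
    id: Optional[int] = None          # assigned by SQLite on insert
    created_at: Optional[str] = None  # set by the column default

def get_migration_sql(table_name: str = "s3_uploads") -> str:
    """Return idempotent DDL for the uploads table."""
    return f"""
    CREATE TABLE IF NOT EXISTS {table_name} (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        filename TEXT NOT NULL,
        s3_key TEXT NOT NULL UNIQUE,
        bucket TEXT NOT NULL,
        content_type TEXT,
        size_bytes INTEGER,
        checksum_sha256 TEXT,
        metadata_json TEXT DEFAULT '{{}}',
        created_at TEXT DEFAULT (datetime('now'))
    )
    """
```

Keeping the DDL behind a function (rather than a literal SQL file import) lets Task 10 pass the configured table name through at startup.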
Task 2: Configuration & Settings
Deliverable: Settings parser and validation.
- Create `PluginConfig` Pydantic model with fields: `aws_access_key_id`, `aws_secret_access_key`, `s3_bucket`, `s3_region`, `s3_endpoint_url` (optional), `max_file_size_mb`, `allowed_mime_types`, `db_table_name`
- Implement `load_config(datasette_instance) -> PluginConfig` that reads from `datasette.metadata(["plugins", "datasette-s3-uploads"])`
- Validate AWS credentials on startup (test connection)
- Interface: Exports `PluginConfig` class and `get_config(datasette)` function
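A dependency-free sketch of the Task 2 contract, using a plain dataclass instead of Pydantic so other agents can stub it immediately. The field names match the list above; the defaults and the `from_plugin_config` helper name are assumptions:

```python
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass
class PluginConfig:
    aws_access_key_id: str
    aws_secret_access_key: str
    s3_bucket: str
    s3_region: str = "us-east-1"
    s3_endpoint_url: Optional[str] = None  # e.g. for MinIO/localstack
    max_file_size_mb: int = 100
    allowed_mime_types: Sequence[str] = ("image/png", "image/jpeg", "application/pdf")
    db_table_name: str = "s3_uploads"

    @classmethod
    def from_plugin_config(cls, raw: dict) -> "PluginConfig":
        """Parse the raw plugin-config dict, failing fast on missing keys."""
        required = ("aws_access_key_id", "aws_secret_access_key", "s3_bucket")
        missing = [k for k in required if k not in raw]
        if missing:
            raise ValueError(f"missing required settings: {missing}")
        return cls(**raw)
```

In the real plugin, `load_config(datasette)` would fetch the raw dict from Datasette and delegate to `from_plugin_config`, keeping the parsing logic testable without a Datasette instance.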
Task 3: S3 Storage Backend
Deliverable: Abstract storage interface and S3 implementation.
- Define `AbstractStorage` protocol with `async upload(file_bytes: bytes, key: str, content_type: str) -> str` (returns URL) and `async delete(key: str) -> bool`
- Implement `S3Storage` class using `aiobotocore` or `boto3` (async wrapper)
- Include retry logic with exponential backoff
- Interface: Exports `AbstractStorage` protocol and `S3Storage` class. Depends on `PluginConfig` from Task 2 for initialization.
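A sketch of the storage protocol plus an in-memory stand-in, which is what makes Tasks 7 and 10 parallelizable before the real `S3Storage` (boto3/aiobotocore) lands. The `InMemoryStorage` class and its `base_url` default are illustrative inventions:

```python
from typing import Dict, Protocol

class AbstractStorage(Protocol):
    """Contract from the bullet above; S3Storage and test doubles both satisfy it."""
    async def upload(self, file_bytes: bytes, key: str, content_type: str) -> str: ...
    async def delete(self, key: str) -> bool: ...

class InMemoryStorage:
    """Structural stand-in for S3Storage, usable in unit tests and local dev."""
    def __init__(self, base_url: str = "https://example-bucket.s3.amazonaws.com"):
        self.base_url = base_url
        self.objects: Dict[str, bytes] = {}

    async def upload(self, file_bytes: bytes, key: str, content_type: str) -> str:
        self.objects[key] = file_bytes
        return f"{self.base_url}/{key}"

    async def delete(self, key: str) -> bool:
        return self.objects.pop(key, None) is not None
```

Because `AbstractStorage` is a `typing.Protocol`, neither implementation needs to inherit from it; type checkers verify conformance structurally.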
Task 4: File Validation Module
Deliverable: Pure validation functions.
- Implement `validate_file(file_bytes: bytes, filename: str, config: ValidationRules) -> ValidationResult`
- Checks: file size limits, MIME type against magic numbers (python-magic), extension whitelist/blacklist
- Return structured error messages suitable for UI display
- Interface: Exports `ValidationRules` dataclass and `validate_file()` function. No external dependencies.
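A sketch of the Task 4 contract. Note it guesses MIME type from the filename via the stdlib `mimetypes` instead of inspecting magic numbers with `python-magic` as the bullet suggests, so the example stays dependency-free; the default allowed-types list is an assumption:

```python
import mimetypes
from dataclasses import dataclass
from typing import List, Sequence

@dataclass
class ValidationRules:
    max_size_bytes: int = 100 * 1024 * 1024
    allowed_mime_types: Sequence[str] = ("image/png", "image/jpeg", "application/pdf")

@dataclass
class ValidationResult:
    ok: bool
    errors: List[str]  # human-readable, suitable for direct UI display

def validate_file(file_bytes: bytes, filename: str, rules: ValidationRules) -> ValidationResult:
    """Pure function: no I/O, no global state, trivially parallel-testable."""
    errors: List[str] = []
    if len(file_bytes) == 0:
        errors.append("File is empty.")
    if len(file_bytes) > rules.max_size_bytes:
        errors.append(f"File exceeds the limit of {rules.max_size_bytes} bytes.")
    guessed, _ = mimetypes.guess_type(filename)  # real impl: python-magic on file_bytes
    if guessed not in rules.allowed_mime_types:
        errors.append(f"File type {guessed or 'unknown'} is not allowed.")
    return ValidationResult(ok=not errors, errors=errors)
```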
Task 5: Metadata Extraction Engine
Deliverable: Metadata extraction utilities.
- Implement `extract_metadata(file_bytes: bytes, filename: str) -> FileMetadata`
- Extract: SHA256 checksum, file size, MIME type, image dimensions (if PIL can open), EXIF data (if available)
- Return JSON-serializable dict
- Interface: Exports `FileMetadata` dataclass and `extract_metadata()` function. No external dependencies.
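A sketch of the Task 5 contract covering the stdlib-only parts (size, SHA-256, MIME type); the PIL/EXIF extraction mentioned above would be layered on optionally. Field names are assumptions:

```python
import hashlib
import mimetypes
from dataclasses import asdict, dataclass
from typing import Optional

@dataclass
class FileMetadata:
    size_bytes: int
    checksum_sha256: str
    content_type: Optional[str]
    # real impl would add: image_width, image_height, exif (all optional)

def extract_metadata(file_bytes: bytes, filename: str) -> FileMetadata:
    """Pure function; asdict(result) gives the JSON-serializable dict."""
    return FileMetadata(
        size_bytes=len(file_bytes),
        checksum_sha256=hashlib.sha256(file_bytes).hexdigest(),
        content_type=mimetypes.guess_type(filename)[0],
    )
```

The checksum doubles as the input to Task 6's `get_by_checksum()` deduplication lookup.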
Task 6: Data Access Layer
Deliverable: Database CRUD operations.
- Implement `FileRepository` class with methods: `async create(record: FileRecord)`, `async get(file_id: int)`, `async list(limit, offset)`, `async delete(file_id)`, `async get_by_checksum(checksum)`
- Handle SQLite async operations using `datasette.databases["dbname"].execute_write_fn()` or similar
- Interface: Exports `FileRepository` class. Depends on `FileRecord` from Task 1 and `PluginConfig` from Task 2 (for table name).
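A synchronous sketch of the repository over a plain `sqlite3` connection; in the plugin these bodies would be wrapped in Datasette's async write machinery, and `create` would take a `FileRecord` rather than loose arguments. Column names match the Task 1 schema:

```python
import json
import sqlite3
from typing import Optional

class FileRepository:
    """Repository pattern: the only place SQL for the uploads table lives."""
    def __init__(self, conn: sqlite3.Connection, table: str = "s3_uploads"):
        self.conn = conn
        self.table = table  # injected from PluginConfig.db_table_name

    def create(self, filename: str, s3_key: str, bucket: str, content_type: str,
               size_bytes: int, checksum: str, metadata: Optional[dict] = None) -> int:
        cur = self.conn.execute(
            f"INSERT INTO {self.table} (filename, s3_key, bucket, content_type, "
            f"size_bytes, checksum_sha256, metadata_json) VALUES (?, ?, ?, ?, ?, ?, ?)",
            (filename, s3_key, bucket, content_type, size_bytes, checksum,
             json.dumps(metadata or {})),
        )
        return cur.lastrowid

    def get(self, file_id: int):
        return self.conn.execute(
            f"SELECT * FROM {self.table} WHERE id = ?", (file_id,)).fetchone()

    def list(self, limit: int = 50, offset: int = 0):
        return self.conn.execute(
            f"SELECT * FROM {self.table} ORDER BY id DESC LIMIT ? OFFSET ?",
            (limit, offset)).fetchall()

    def delete(self, file_id: int) -> bool:
        return self.conn.execute(
            f"DELETE FROM {self.table} WHERE id = ?", (file_id,)).rowcount > 0
```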
Task 7: Upload HTTP Handler
Deliverable: HTTP route handler.
- Implement `upload_handler(request: Request)` that handles `multipart/form-data`
- Orchestrates: Task 4 (validation) → Task 5 (metadata) → Task 3 (S3 upload) → Task 6 (DB insert)
- Return JSON: `{success: bool, file_id: int, url: str}` or error details
- Handle streaming for large files (spool to disk if needed)
- Interface: Function suitable for `datasette.add_route()`. Depends on interfaces of Tasks 2, 3, 4, 5, 6.
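The orchestration step above can be sketched framework-agnostically: a hypothetical `handle_upload` that receives its four collaborators as plain callables, so each can be mocked independently. The real handler would first parse `file_bytes` and `filename` out of the multipart request; every name and signature here is an assumption:

```python
from typing import Any, Awaitable, Callable, Dict

async def handle_upload(
    file_bytes: bytes,
    filename: str,
    validate: Callable[[bytes, str], Dict[str, Any]],      # Task 4 interface
    extract: Callable[[bytes, str], Dict[str, Any]],       # Task 5 interface
    upload: Callable[[bytes, str], Awaitable[str]],        # Task 3 interface
    insert: Callable[[Dict[str, Any]], int],               # Task 6 interface
) -> Dict[str, Any]:
    """Validation -> metadata -> S3 upload -> DB insert, as in the bullet above."""
    result = validate(file_bytes, filename)
    if result["errors"]:
        return {"success": False, "errors": result["errors"]}
    meta = extract(file_bytes, filename)
    url = await upload(file_bytes, filename)
    file_id = insert({"filename": filename, **meta, "url": url})
    return {"success": True, "file_id": file_id, "url": url}
```

Taking dependencies as parameters (rather than importing them) is what lets the Task 7 agent ship tested code before Tasks 3-6 are merged.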
Task 8: Frontend Upload Component
Deliverable: HTML template and JavaScript.
- Create Jinja2 template `upload_form.html` with drag-and-drop zone, file list, progress bar
- Vanilla JS (no external deps) handling: drag events, chunked upload (if large files), progress indication, error display
- POSTs to `/upload` endpoint (assumed contract from Task 7)
- Interface: Exports template string and static JS. Assumes JSON response format from Task 7.
Task 9: File Management UI
Deliverable: Admin interface templates and routes.
- Create `files_list.html` template showing a table of uploads with pagination, search by filename, delete buttons
- Implement route handlers: `GET /files` (list view), `POST /files/{id}/delete`
- Uses Task 6 (Repository) for data fetching
- Interface: Route registration functions. Depends on Task 6 interface.
Task 10: Plugin Bootstrap & Wiring
Deliverable: Entry point and wiring.
- Create `datasette_s3_uploads/__init__.py` with `register_routes()`, `prepare_connection()` (for creating the table on startup)
- Dependency injection: instantiate Task 2 (Config), Task 3 (S3), Task 6 (Repo), wire them into Task 7 and Task 9 routes
- Serve static files (Task 8 JS) via `register_static()`
- Interface: Standard Datasette plugin hooks. Depends on all other tasks' public APIs.
Coordination Notes for Agents:
- Contract-First: Tasks 3-6 should define `Protocol` classes or abstract base classes so they can be mocked during development.
- Shared Types: Task 1 owns the core data structures; all other tasks import from there.
- Testing: Each task should include unit tests using mocks for its dependencies (e.g., Task 7 mocks the S3 client).
- Naming Convention: Use a consistent table name default (`s3_uploads`) and route prefix (`/-/s3-uploads`).
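The contract-first workflow above can be made concrete with a small sketch: a `Protocol` pins the Task 3 interface, and a consumer (here a stand-in for Task 7's handler logic, with hypothetical names throughout) is tested against an `AsyncMock` before the real implementation exists:

```python
from typing import Protocol
from unittest.mock import AsyncMock

class StorageInterface(Protocol):
    """The Task 3 contract other agents code against."""
    async def upload(self, file_bytes: bytes, key: str, content_type: str) -> str: ...

def make_mock_storage(url: str = "https://bucket.example/key") -> AsyncMock:
    """Return a mock satisfying StorageInterface, preconfigured with a URL."""
    mock = AsyncMock()
    mock.upload.return_value = url
    return mock

async def handler_under_test(storage: StorageInterface) -> str:
    # Stand-in for the part of Task 7 that calls storage.
    return await storage.upload(b"data", "key", "text/plain")
```

After the test run, `mock.upload.assert_awaited_once_with(...)` verifies the handler honored the contract, which is exactly the check that catches interface drift between parallel agents.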