Date: January 14, 2026
Status: Implementation Planning
Related: See import-analysis-and-recommendations.md for detailed data analysis
This document specifies the 7 MCP tools to build for the Custom Import MCP server. Each tool is designed to handle specific import workflows based on real CSV data analysis from Data & AI Community Day Durban events.
Total Scope: ~1,250 lines of Python code using FastMCP framework
Purpose: Fast CSV validation WITHOUT database access
Why first: Every import workflow needs validation before processing
Parameters:
{
"file_path": str (required),
"import_type": enum (required) [
"quicket_registrations",
"quicket_checkins",
"walkup",
"sessionize",
"ratings"
]
}Returns:
{
"valid": bool,
"file_info": {
"format": "csv" | "xlsx",
"rows": int,
"columns": list[str]
},
"validation_results": {
"required_columns_present": bool,
"missing_columns": list[str],
"extra_columns": list[str],
"data_quality_score": float,
"duplicate_rows": int,
"missing_data_percentage": float
},
"errors": list[{row, field, error}],
"warnings": list[{row, field, warning}]
}Implementation Notes:
- CSV parsing + format-specific rules
- No database access (fast feedback ~50ms)
- Validates column presence, data types, required fields
- Detects duplicates within file
Complexity: ~150 lines
Purpose: Dry-run showing what WOULD happen (connects to database in read-only mode)
Why second: User confidence before committing changes
Parameters:
{
"file_path": str (required),
"import_type": enum (required),
"event_id": int (required for most types),
"duplicate_strategy": enum (optional, default="skip") ["skip", "update", "fail"],
"preview_limit": int (optional, default=10)
}Returns:
{
"summary": {
"total_rows": int,
"would_create": int,
"would_update": int,
"would_skip": int,
"would_fail": int
},
"sample_actions": [
{
"row": int,
"action": "create" | "update" | "skip" | "fail",
"record_type": "contact" | "ticket" | "session" | "rating",
"identifier": str,
"data_preview": {...},
"reason": str
}
],
"duplicates_found": [{identifier, action, reason}],
"errors": [{row, field, error}],
"warnings": [{row, field, warning}]
}Implementation Notes:
- Reads database to check for existing records
- Simulates import logic without writes
- Shows sample actions (first N rows)
- Slower than validate (~500ms+) but provides detailed impact analysis
Complexity: ~200 lines
Purpose: Import ticket purchases from Quicket CSV (37 columns)
Why critical: Most complex import, handles core registration workflow
Parameters:
{
"file_path": str (required),
"event_id": int (required),
"duplicate_strategy": enum (optional, default="skip") ["skip", "update", "fail"]
}Logic:
- Parse CSV with pandas (handle UTF-8 BOM)
- For each row:
- Parse contact data (email, first_name, surname, phone, company, employment_status)
- Check if contact exists by composite key
(email, first_name, surname)- If exists: use existing contact_id
- If not: create new contact
- Parse Areas of Interest (comma-separated with HTML entities)
- Decode HTML entities:
"&"→"&" - Split by comma:
"AI Coding, Data Engineering & Stream Processing" - For each topic: find or create in
interest_topics - Create junction records in
contact_interests
- Decode HTML entities:
- Check if ticket exists by
(ticket_number, event_id)- If exists: apply duplicate_strategy
- If not: create ticket with all fields:
- purchaser_email, order_number, marketing_source
- ticket_type, is_complimentary
- Ensure contact_role exists (attendee, event_id)
Returns:
{
"success": bool,
"summary": {
"total_rows": int,
"contacts_created": int,
"contacts_updated": int,
"tickets_created": int,
"tickets_updated": int,
"tickets_skipped": int,
"interests_processed": int
},
"errors": list[{row, error}],
"warnings": list[{row, warning}]
}Key Challenges:
- Composite key deduplication:
(email, first_name, surname)- email alone NOT unique - HTML entity decoding in interests field
- Purchaser vs attendee distinction
- Marketing source attribution
Complexity: ~250 lines (most complex parser)
Purpose: Import speakers + sessions from Sessionize CSV (25 columns)
Why in Phase 1: Must come BEFORE import_speaker_ratings due to FK dependency (speaker_ratings.session_id → speaker_sessions.id)
Parameters:
{
"file_path": str (required),
"event_id": int (required),
"duplicate_strategy": enum (optional, default="skip") ["skip", "update", "fail"]
}Logic:
- Parse CSV with pandas (handle multi-line bio fields)
- Group rows by
Session Id- Multi-speaker structure: One row per session-speaker pair
- First speaker row: Has complete session data
- Subsequent speaker rows: Have minimal session info
- For each session group:
- Create session record from first row:
- Store in
speaker_sessions: title, description, track, level, room, scheduled_at, duration_minutes, session_format - Store:
sessionize_session_idfor re-import tracking
- Store in
- For each speaker row in group:
- Find or create contact by composite key
(email, first_name, surname) - Create or update speaker enrichment in
speakerstable:- Store: sessionize_speaker_id, bio, tagline, shirt_size, profile_picture_url
- Link speaker to session via
speaker_sessions.speaker_id- Store: speaker_informed_at, speaker_confirmed_at
- Find or create contact by composite key
- Ensure contact_role exists (speaker, event_id)
- Create session record from first row:
Returns:
{
"success": bool,
"summary": {
"total_rows": int,
"sessions_created": int,
"sessions_updated": int,
"speakers_created": int,
"speakers_updated": int,
"speakers_linked": int
},
"multi_speaker_sessions": [
{
"session_id": int,
"title": str,
"speaker_count": int,
"speakers": list[str]
}
],
"errors": list[{row, error}],
"warnings": list[{row, warning}]
}Key Challenges:
- Multi-speaker grouping logic (group by Session Id)
- Session data in first row only vs speaker data in all rows
- Speaker enrichment vs basic contact data
- Real example: "Mirror-Code" session has 2 speakers (Shraddha + Pratish Neerputh)
Complexity: ~200 lines (multi-speaker grouping complexity)
Purpose: Import session evaluations with fuzzy matching
Why after Sessionize: References speaker_sessions.id via FK constraint - sessions must exist first
Parameters:
{
"file_path": str (required),
"event_id": int (required),
"fuzzy_match_threshold": float (optional, default=0.8)
}Logic:
- Parse CSV/Excel (support both formats)
- Load all sessions for this event from database
- For each rating row:
- Parse session identifier field:
- Strip number prefix:
"01. ","02. ", etc. - Strip speaker suffix:
" - Speaker Name(s)" - Result: Clean session title for matching
- Strip number prefix:
- Fuzzy match against loaded sessions (Levenshtein distance)
- Use
fuzzywuzzy.token_sort_ratio(handles word order differences) - Threshold default: 80% similarity
- Use
- If match score >= threshold:
- Get all speakers for matched session
- For EACH speaker:
- Create
speaker_ratingsrecord with SAME values:- topic_rating (informativeness: 1-7)
- presentation_rating (delivery: 1-7)
- time_investment_rating (1-7)
- improvement_suggestions (text)
- Create
- If no match: log warning with suggestions
- Parse session identifier field:
Returns:
{
"success": bool,
"summary": {
"total_rows": int,
"ratings_created": int,
"sessions_matched": int,
"sessions_unmatched": int,
"multi_speaker_sessions": int
},
"unmatched_sessions": [
{
"row": int,
"session_identifier": str,
"best_match": str,
"match_score": float,
"suggestion": str
}
],
"errors": list[{row, error}],
"warnings": list[{row, warning}]
}Key Challenges:
- Fuzzy string matching with 80%+ threshold
- Multi-speaker sessions: Create N ratings from 1 evaluation
- Rating scale is 1-7 (not 1-5 as originally assumed)
- Session identifier format variations
- FK dependency: Requires
speaker_sessionsrecords to exist
Complexity: ~200 lines (fuzzy matching + multi-speaker logic)
Purpose: Update attendance from post-event Quicket export
Parameters:
{
"file_path": str (required),
"event_id": int (required),
"update_contact_info": bool (optional, default=True)
}Logic:
- Parse CSV (SAME format as
import_quicket_registrations) - For each row:
- Find ticket by
(ticket_number, event_id) - If NOT found:
- Create contact + ticket (late registration case)
- If found:
- UPDATE:
checked_in = TRUE,checked_in_at = timestamp - If
update_contact_info=True:- Update contact fields (phone, company, employment_status)
- Update interests (if changed since registration)
- UPDATE:
- Find ticket by
Returns:
{
"success": bool,
"summary": {
"total_rows": int,
"tickets_updated": int,
"tickets_created": int,
"contact_info_updated": int
},
"errors": list[{row, error}],
"warnings": list[{row, warning}]
}Key Challenges:
- UPDATE operation vs CREATE (most tickets already exist)
- Late registration handling
- Contact info change detection
Complexity: ~150 lines (simpler - mostly updates)
Purpose: Microsoft Forms walk-ins (10 columns)
Parameters:
{
"file_path": str (required),
"event_id": int (required),
"duplicate_strategy": enum (optional, default="skip") ["skip", "update", "fail"],
"phone_cleanup": bool (optional, default=True)
}Logic:
- Parse CSV (simple 10-column format)
- For each row:
- Clean phone number (strip spaces, validate 10 digits if
phone_cleanup=True) - Find or create contact by composite key
(email, first_name, surname) - Check if contact already has ticket for this event
- If yes: apply duplicate_strategy
- If no: create ticket
- Set ticket fields:
ticket_number = -1(walk-in sentinel value)checked_in = TRUE(they're physically present)checked_in_at = completion_timepurchase_date = completion_time
- Clean phone number (strip spaces, validate 10 digits if
Returns:
{
"success": bool,
"summary": {
"total_rows": int,
"contacts_created": int,
"tickets_created": int,
"tickets_skipped": int,
"phone_numbers_cleaned": int
},
"phone_validation_warnings": list[{row, phone, issue}],
"errors": list[{row, error}],
"warnings": list[{row, warning}]
}Key Challenges:
- Phone number format variations (spaces, missing leading zeros)
- Data quality issues (duplicates, date format ambiguity)
- Simpler than Quicket (no interests, no purchaser tracking)
Complexity: ~100 lines (simplest import)
1. validate_import_data → Validate any CSV format
2. preview_import → Preview any import operation
3. import_quicket_registrations → Core registration workflow
4. import_sessionize_data → Create speaker_sessions (FK dependency!)
5. import_speaker_ratings → References speaker_sessions.id
Rationale:
- Tools 1-2 are infrastructure used by all imports
- Tool 3 handles the most common/complex use case (event registrations)
- Tool 4 MUST come before Tool 5 due to FK constraint:
speaker_ratings.session_idreferencesspeaker_sessions.id - Tool 5 provides high-value analytics (speaker/session feedback)
- After Phase 1: Can run complete workflow:
- Validate registration CSV
- Preview registration import
- Import registrations (attendees)
- Import sessions/speakers (creates speaker_sessions records)
- Import ratings (links to speaker_sessions) ✅
Estimated Lines: ~950 lines
6. import_quicket_checkins → Post-event attendance updates
7. import_walkup_registrations → On-site registration workflow
Rationale:
- Tool 6 builds on Tool 3 logic (same CSV format, UPDATE operations)
- Tool 7 is simplest import (good for quick win, no dependencies)
- After Phase 2: Complete import coverage for all data sources
Estimated Lines: ~250 lines
Critical FK constraints to respect:
-- speaker_ratings MUST reference existing speaker_sessions
ALTER TABLE speaker_ratings
ADD CONSTRAINT speaker_ratings_session_id_fkey
FOREIGN KEY (session_id) REFERENCES speaker_sessions(id);
-- tickets reference contacts (handled by all import tools)
ALTER TABLE tickets
ADD CONSTRAINT tickets_contact_id_fkey
FOREIGN KEY (contact_id) REFERENCES contacts(id);
-- speaker_sessions reference speakers (handled by sessionize import)
ALTER TABLE speaker_sessions
ADD CONSTRAINT speaker_sessions_speaker_id_fkey
FOREIGN KEY (speaker_id) REFERENCES speakers(id);Import order implications:
import_quicket_registrations→ Createscontacts,tickets✅import_sessionize_data→ Createscontacts,speakers,speaker_sessions✅import_speaker_ratings→ Createsspeaker_ratingsreferencingspeaker_sessions.id✅
This is why sessionize MUST come before ratings!
✅ Separate because:
- Different CSV formats (37 columns vs 10 columns vs 25 columns)
- Different operations (CREATE vs UPDATE vs LINK)
- Different workflows (pre-event vs post-event)
- Different complexity levels (simple walk-ins vs complex multi-speaker sessions)
- Follows MCP principle: "Atomic tools (one thing well), composable"
✅ Separate because:
validate_import_data= fast, no DB access (~50ms)preview_import= slower, reads DB (~500ms+)- Composable: Can validate multiple files, then preview one
- User can validate quickly, then decide which file to import
❌ Decision: Direct implementation instead
Reasons:
- FastMCP can't easily call other MCP servers (architectural limitation)
- csv-mcp-server provides 40+ generic operations
- We need format-specific validation (Quicket 37 columns, Sessionize multi-speaker structure)
- Direct implementation = fewer dependencies, more control
Dependencies:
[project]
dependencies = [
"fastmcp>=0.3.0", # MCP server framework
"asyncpg>=0.29.0", # Async PostgreSQL driver
"pandas>=2.0.0", # CSV parsing
"fuzzywuzzy[speedup]>=0.18.0", # Fuzzy string matching
"python-Levenshtein>=0.21.0", # Fast string similarity
]Database Access:
- Read
${NEON_DATABASE_URL}environment variable (same as Postgres MCP Pro) - Use
asyncpgconnection pool for performance - All operations in transactions (rollback on error)
Error Handling:
- Validate before import (fail fast)
- Preview before commit (user confirmation)
- Atomic transactions (all or nothing)
- Detailed error reporting (row number, field, issue)
mcp-servers/import-mcp/
├── pyproject.toml # Dependencies + FastMCP config
├── README.md # Usage guide with examples
├── src/import_mcp/
│ ├── server.py # FastMCP entry point (~100 lines)
│ ├── tools/
│ │ ├── validation.py # validate_import_data, preview_import (~350 lines)
│ │ ├── quicket.py # import_quicket_registrations/checkins (~400 lines)
│ │ ├── walkins.py # import_walkup_registrations (~100 lines)
│ │ ├── sessionize.py # import_sessionize_data (~200 lines)
│ │ └── ratings.py # import_speaker_ratings (~200 lines)
│ └── utils/
│ ├── db.py # Database connection pool (~50 lines)
│ ├── parsers.py # Interests, phone cleanup, HTML decode (~100 lines)
│ └── matching.py # Fuzzy matching algorithms (~100 lines)
└── tests/
├── test_parsers.py # Unit tests for parsing logic
├── test_integration.py # Integration tests with real CSVs
└── fixtures/ # Symlink to docs/import-files/*.csv
Total: ~1,600 lines including tests
Phase 1 Complete When:
- ✅ Can validate Quicket CSV (37 columns)
- ✅ Can preview import showing create/update counts
- ✅ Can import registrations with interests parsing
- ✅ Can import sessions/speakers (multi-speaker grouping)
- ✅ Can import ratings with 80%+ fuzzy matching (FK constraint satisfied)
- ✅ End-to-end workflow tested:
reg.csv→sessionize.csv→evals.csv
Phase 2 Complete When:
- ✅ Can update attendance from check-in CSV
- ✅ Can import walk-ins with phone cleanup
- ✅ All 7 tools tested with real CSV files
- ✅ Import MCP server added to
.mcp.jsonconfiguration
-
Setup Project Structure
- Create
mcp-servers/import-mcp/directory - Initialize
pyproject.tomlwith FastMCP + dependencies - Create module structure (server.py, tools/, utils/)
- Create
-
Build Phase 1 Tools (in order)
validate_import_data- Foundationpreview_import- User confidenceimport_quicket_registrations- Core workflowimport_sessionize_data- Create speaker_sessions (FK dependency!)import_speaker_ratings- Analytics value (depends on sessionize)
-
Test with Real Data
- Use
docs/import-files/reg.csv(50 rows) - Use
docs/import-files/sessionize.csv(25+ rows) - Use
docs/import-files/evals.csv(199 rows) - Verify FK constraints satisfied (sessions created before ratings)
- Use
-
Build Phase 2 Tools
import_quicket_checkinsimport_walkup_registrations
-
Integration & Documentation
- Add to
.mcp.jsonMCP server configuration - Update
README.mdwith usage examples - Update
CLAUDE.mdPhase 1 status → Complete
- Add to
- Detailed Data Analysis:
docs/planning/import-analysis-and-recommendations.md(32,000+ words) - Database Schema:
database/DATABASE.md+database/schema.sql - Real CSV Files:
docs/import-files/(5 files: reg, checkin, walkins, sessionize, evals) - FastMCP Documentation: https://github.com/jlowin/fastmcp
- MCP Protocol: https://modelcontextprotocol.io
Last Updated: January 14, 2026