This system aims to automate and enhance the process of researching and determining the most accurate resolution for prediction market proposals. It leverages LLM-based agents and a series of specialized modules to:
- Analyze complex proposal data, including questions, criteria, and possible outcomes.
- Generate targeted research prompts for each potential outcome.
- Conduct comprehensive, multi-faceted research using AI-powered agents and external data sources.
- Critically evaluate and compare gathered evidence using logical reasoning and syllogisms.
- Determine the most likely accurate outcome based on the analyzed evidence.
- Compare AI-derived conclusions against original proposal outcomes to resolve disputes.
- Provide detailed, well-cited reports supporting the final determination.
The system is designed to improve accuracy, reduce bias, and increase efficiency in resolving prediction market disputes. It can handle a wide range of topics, from politics and sports to business and cryptocurrency, adapting its research and analysis strategies accordingly.
See INSTALL.md
The general workflow: run the AI pipeline on a dataset, evaluate the outcomes, create a report, update the pipeline, and repeat to refine results.
Update proposals:

```sh
python src/fetch_proposals.py
```
Process a single proposal:

```sh
python src/disputer.py
```
Batch process a group of proposals:

```sh
python src/batch.py --dataset=test
```
Advanced usage:

```sh
python src/batch.py \
  --dataset=disputed_and_settled \
  --limit=800 \
  --model="openhermes" \
  --randomize \
  --cache_enabled \
  --tag="pipeline-refined-01"
```
Use Perplexity for research:

```sh
python src/batch.py \
  --dataset=disputed_and_settled \
  --limit=800 \
  --model="openhermes" \
  --randomize \
  --cache_enabled \
  --tag="perplexity20" \
  --use_perplexity \
  --prioritize_disputes
```
Use gpt-4o-mini-2024-07-18:

```sh
python src/batch.py \
  --dataset=disputed_and_settled \
  --limit=800 \
  --model="gpt-4o-mini-2024-07-18" \
  --randomize \
  --cache_enabled \
  --tag="gpt-4o-mini" \
  --prioritize_disputes
```
To run the batch processing continuously with specific parameters, use the following command:

```sh
while true; do
  python src/batch.py \
    --dataset=disputed \
    --limit=800 \
    --model="gpt-4o-mini-2024-07-18" \
    --randomize \
    --cache_enabled \
    --tag="gpt-4o-mini" \
    --prioritize_disputes
done
```
Options explained:

- --dataset: Specifies which dataset to use
- --limit: Maximum number of proposals to process
- --model: The LLM model to use (e.g., "openhermes")
- --randomize: Randomize the order of proposals
- --cache_enabled: Enable caching for faster subsequent runs
- --tag: Add a custom tag to the output for easier identification
- --use_perplexity: Substitute Perplexity for the CrewAI research agents
- --prioritize_disputes: Proposals where proposed != settlement are moved to the front of the queue
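For reference, a minimal argparse sketch of how batch.py might define these flags (hypothetical; the actual script's defaults, types, and help text may differ):

```python
import argparse

def parse_args():
    # Hypothetical reconstruction of batch.py's CLI; flag names match the
    # options documented above, but defaults and help text are assumptions.
    parser = argparse.ArgumentParser(description="Batch-process proposals.")
    parser.add_argument("--dataset", required=True, help="Which dataset to use")
    parser.add_argument("--limit", type=int, help="Max number of proposals to process")
    parser.add_argument("--model", default="openhermes", help="The LLM model to use")
    parser.add_argument("--randomize", action="store_true", help="Randomize proposal order")
    parser.add_argument("--cache_enabled", action="store_true", help="Enable caching")
    parser.add_argument("--tag", default="", help="Custom tag added to the output")
    parser.add_argument("--use_perplexity", action="store_true",
                        help="Substitute Perplexity for the CrewAI research agents")
    parser.add_argument("--prioritize_disputes", action="store_true",
                        help="Move disputed proposals to the front of the queue")
    return parser.parse_args()
```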
Analyze outcomes:

Evaluate AI pipeline outcomes with src/analysis.ipynb:

```sh
jupyter notebook src/analysis.ipynb
```

Create a report:

```sh
python src/create_report.py --name "v0.3 test"
```
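For example, agreement between the AI's answers and the originally proposed results could be computed along these lines (a hypothetical snippet; it assumes one JSON file per processed proposal with answer and proposedResult fields, which may not match the actual output layout):

```python
import json
from pathlib import Path

# Assumption: one JSON result file per processed proposal under results/.
results = [json.loads(p.read_text()) for p in Path("results").glob("*.json")]

# A run "agrees" when the AI's answer matches the originally proposed result.
agree = sum(1 for r in results if r.get("answer") == r.get("proposedResult"))
if results:
    print(f"Agreement: {agree}/{len(results)} ({agree / len(results):.1%})")
```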
Update the AI pipeline to improve outcomes (preprocessing, LLMs, prompts, workflow, etc.) and repeat.
Each proposal includes the following fields:
- title: The title of the proposal.
- description: The description of the proposal.
- res_data: The resolution data of the proposal, outlining potential outcomes.
- url: The URL to additional information or the source of the proposal.
- proposedResult: The originally proposed result (p1, p2, p3, p4). Note: this field is only used for result comparison, never for model input, to prevent bias.
For each proposal, the pipeline produces:
- Answer: The most accurate answer based on research (p1, p2, p3, p4).
- Evidence: Supporting evidence for the answer, including links to relevant sources.
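A minimal sketch of these records as Python dataclasses (illustrative naming; the actual code may represent proposals differently):

```python
from dataclasses import dataclass, field

@dataclass
class Proposal:
    title: str
    description: str
    res_data: str        # resolution data outlining potential outcomes
    url: str
    proposedResult: str  # "p1".."p4"; used only for result comparison, never as model input

@dataclass
class PipelineOutput:
    answer: str                                        # "p1".."p4"
    evidence: list[str] = field(default_factory=list)  # links to supporting sources
```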
The research portion of the workflow uses CrewAI. To use Perplexity instead, add --use_perplexity to your batch or disputer command. A CrewAI research crew consists of three agents:
- Query Constructor Agent: Constructs search queries based on the proposal details.
- Web Researcher Agent: Conducts the web search to gather relevant information.
- Evidence Collector Agent: Formats the final output according to specified criteria.
Each agent performs a corresponding task:
- Query Construction: Constructs search queries to ensure relevance.
- Web Research: Searches the web based on the constructed queries to gather evidence.
- Collect Evidence: Formats the final answer and prepares a detailed report of the findings.
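A condensed sketch of how such a crew could be wired up with CrewAI (illustrative only; the roles follow the list above, but the goals, backstories, and task wording are assumptions):

```python
from crewai import Agent, Task, Crew

query_constructor = Agent(
    role="Query Constructor",
    goal="Construct precise search queries from the proposal details",
    backstory="An expert at turning resolution criteria into search queries.",
)
web_researcher = Agent(
    role="Web Researcher",
    goal="Search the web and gather information relevant to the queries",
    backstory="A meticulous researcher who tracks and verifies sources.",
)
evidence_collector = Agent(
    role="Evidence Collector",
    goal="Format the findings according to the specified output criteria",
    backstory="An analyst who distills research into cited evidence.",
)

tasks = [
    Task(description="Generate and validate search queries.",
         expected_output="A list of search queries.", agent=query_constructor),
    Task(description="Search the web using the constructed queries.",
         expected_output="Raw research notes with sources.", agent=web_researcher),
    Task(description="Format the final answer with cited evidence.",
         expected_output="A structured evidence report.", agent=evidence_collector),
]

crew = Crew(agents=[query_constructor, web_researcher, evidence_collector], tasks=tasks)
result = crew.kickoff()
```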
The system relies on several external services:
- Ollama OpenHermes: Provides AI capabilities for generating research prompts and analyzing results.
- Google Search API (via Serper): Utilized for executing web searches based on constructed queries.
- SearXNG Search API: Used for web search.
- Browserless Scrape API: Used for scraping web page content.
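For example, a Serper search can be issued with a plain HTTP POST (the endpoint and header follow Serper's public API; the query string is illustrative):

```python
import os
import requests

# Serper's Google Search endpoint expects the API key in the X-API-KEY header.
resp = requests.post(
    "https://google.serper.dev/search",
    headers={"X-API-KEY": os.environ["SERPER_API_KEY"]},
    json={"q": "prediction market resolution criteria example"},
    timeout=30,
)
resp.raise_for_status()
for item in resp.json().get("organic", []):
    print(item["title"], "-", item["link"])
```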
The Disputer system follows a structured process from receiving a proposal to delivering a finalized conclusion.
- Prompt Generation: Based on the proposal, generate detailed prompts for each possible outcome.
- Conduct Research: Research tasks for each prompt are processed in parallel using CrewAI (or Perplexity); see the sketch after this list.
- Evidence Comparison: Using AI, compare the evidence gathered from all possible outcomes to determine the most accurate.
- Result Comparison: Compare the AI’s conclusion with the original proposal outcomes to resolve any disputes.
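A minimal sketch of the parallel research step using a thread pool (hypothetical; research_outcome stands in for whatever function runs one crew, or one Perplexity call, per prompt):

```python
from concurrent.futures import ThreadPoolExecutor

def research_outcome(prompt: str) -> str:
    # Placeholder for running one research crew (or Perplexity query)
    # against a single outcome prompt.
    return f"report for: {prompt}"

prompts = {"p1": "...", "p2": "...", "p3": "..."}

# Run one research task per outcome prompt in parallel and collect the reports.
with ThreadPoolExecutor(max_workers=len(prompts)) as pool:
    futures = {label: pool.submit(research_outcome, p) for label, p in prompts.items()}
    reports = {label: f.result() for label, f in futures.items()}
```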
The following diagram shows the high-level flow:

```mermaid
graph TD
PG[Prompt Generation]
PG --> CR[Create Research Crews]
CE[Compare Evidence]
CE --> RC[Results Comparison]
subgraph PG [Prompt Generation]
AI[Generate Prompts] --> GI[Generate Individual Prompts]
GI --> CP[Consolidate Prompts]
end
subgraph CR [Research Crews]
R1[Research Crew for p1]
R2[Research Crew for p2]
R3[Research Crew for p3]
end
subgraph CREW [Research Crew]
QCA[Query Constructor Agent]
QCA --> WRA[Web Researcher Agent]
WRA --> ECA[Evidence Collector Agent]
WRA --> QCA
ECA --> WRA
subgraph QCA [Query Constructor Agent]
GQ[Generate and Validate Queries]
end
subgraph WRA [Web Researcher Agent]
SR[Search for Relevant Information]
end
subgraph ECA [Evidence Collector Agent]
FE[Format & Analyze]
end
end
subgraph CE [Compare Evidence]
OE[Evaluate Evidence] --> DR[Determine Result]
end
subgraph RC [Results Comparison]
CD[Compare with Expected Outcome] --> DS[Dispute Status Evaluation]
end
CR --> CREW
CREW --> CE
style PG fill:#bb86fc,stroke:#333,stroke-width:1px
style CR fill:#6200ee,stroke:#333,stroke-width:1px
style CREW fill:#6200ee,stroke:#333,stroke-width:1px
style QCA fill:#3700b3,stroke:#333,stroke-width:1px
style WRA fill:#3700b3,stroke:#333,stroke-width:1px
style ECA fill:#3700b3,stroke:#333,stroke-width:1px
style CE fill:#018786,stroke:#333,stroke-width:1px
style RC fill:#018786,stroke:#333,stroke-width:1px
```
A more detailed view of the full pipeline, including external components:

```mermaid
graph TD
A[Raw Proposal Data] --> B[Resolution Criteria Refinement]
subgraph PG [Resolution Criteria Refinement]
B --> B1[Classify Resolution Type]
B1 --> B2[Generate Refined Summaries]
B2 --> B3[Consolidate Summaries]
B3 --> B4[Convert to Structured Answers]
B4 --> B5[Reformat Description]
B5 --> B6[Reformat Title]
end
B6 --> C[Prompt Generation]
subgraph CR [Prompt Generation]
C --> C1[Generate Outcome Prompts]
C --> C2[Generate General Prompt]
C1 --> C3[Critique and Refine]
C2 --> C3
C3 --> C4[Choose Best Prompts]
end
C4 --> D[Research Conduction]
subgraph CREW [Research Conduction]
D --> D1[Create Research Crews]
D1 --> D2[Parallel Research Execution]
D2 --> D3[Web Search]
D2 --> D4[Content Extraction]
D3 --> D5[Collect Results]
D4 --> D5
end
D5 --> E[Evidence Comparison]
subgraph CE [Evidence Comparison]
E --> E1[Filter Evidence]
E1 --> E2[Consolidate Evidence]
E2 --> E3[Extract Key Evidence]
E3 --> E4[Evaluate Outcomes]
E4 --> E5[Consolidate Evaluations]
E5 --> E6[Determine Likely Answer]
E6 --> E7[Generate Final Report]
end
E7 --> F[Result Comparison]
subgraph RC [Result Comparison]
F --> F1[Compare Results]
F1 --> F2[Evaluate Dispute]
F2 --> F3[Classify Proposal]
F3 --> F4[Summarize Result]
end
F4 --> G[Final Dispute Resolution]
subgraph QCA [External Components]
H[Language Models]
I[Web Search APIs]
J[CrewAI Setup]
K[Perplexity]
end
H -.-> D2
H -.-> E4
I -.-> D3
J -.-> D1
K -.-> D1
style PG fill:#bb86fc,stroke:#333,stroke-width:1px
style CR fill:#6200ee,stroke:#333,stroke-width:1px
style CREW fill:#6200ee,stroke:#333,stroke-width:1px
style QCA fill:#3700b3,stroke:#333,stroke-width:1px
style CE fill:#018786,stroke:#333,stroke-width:1px
style RC fill:#018786,stroke:#333,stroke-width:1px
```
Evidence evaluation and the fallback process:

```mermaid
graph TD
subgraph R [Research]
R1[Report 1]
R2[Report 2]
R3[Report 3]
R4[Report 4]
end
subgraph E [Evaluate]
O1[p1]
O2[p2]
O3[p3]
end
R1 --> O1 & O2 & O3
R2 --> O1 & O2 & O3
R3 --> O1 & O2 & O3
R4 --> O1 & O2 & O3
O1 --> |True/False| C[Consolidate Results]
O2 --> |True/False| C
O3 --> |True/False| C
C --> D{Clear Result?}
D -->|Yes| P[Predict Final Outcome]
D -->|No| FB[Fallback Process]
subgraph FB [Fallback Process]
KE[Extract Key Evidence]
E1[Evaluation 1]
E2[Evaluation 2]
E3[Evaluation 3]
FR[Find Most Common Result]
end
FB --> KE
KE --> E1 & E2 & E3
E1 & E2 & E3 --> FR
FR --> P
style R fill:#bb86fc,stroke:#333,stroke-width:1px
style E fill:#6200ee,stroke:#333,stroke-width:1px
style FB fill:#6200ee,stroke:#333,stroke-width:1px
style C fill:#018786,stroke:#333,stroke-width:1px
style P fill:#018786,stroke:#333,stroke-width:1px
```
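The consolidation and fallback steps amount to a vote over per-report True/False evaluations; a hypothetical sketch:

```python
from collections import Counter

def determine_result(evaluations: dict[str, list[bool]]) -> str | None:
    # evaluations maps each outcome label ("p1", "p2", ...) to the True/False
    # judgments it received across the research reports.
    winners = [label for label, votes in evaluations.items() if votes and all(votes)]
    if len(winners) == 1:
        # Clear result: exactly one outcome is unanimously judged True.
        return winners[0]

    # Fallback (a stand-in for the repeated key-evidence evaluations in the
    # diagram): count True votes per outcome and take the most common result.
    counts = Counter(label for label, votes in evaluations.items() for v in votes if v)
    return counts.most_common(1)[0][0] if counts else None
```

For instance, determine_result({"p1": [True, True], "p2": [False, True]}) returns "p1", since p1 is the only outcome judged True unanimously.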
The system is structured to run as a cohesive unit, with each component interacting seamlessly. The src/batch.py and src/disputer.py scripts serve as entry points, orchestrating the overall process and managing exceptions and logging.
For a detailed look at each component, refer to the source files in the src/workflow directory:
- resolution_refiner.py
- prompt_generator.py
- research_manager.py
- perplexity_research.py
- evidence_comparator.py
- result_comparator.py
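Conceptually, the entry point chains these modules in sequence; a hypothetical sketch with stand-in functions (the real names and signatures in src/workflow/ will differ):

```python
# Stand-ins for the workflow modules; illustrative only.
def refine_resolution(proposal): return proposal                     # resolution_refiner.py
def generate_prompts(refined): return ["p1", "p2", "p3"]             # prompt_generator.py
def run_research(prompts): return {p: "report" for p in prompts}     # research_manager.py / perplexity_research.py
def compare_evidence(reports): return "p1", ["https://example.com"]  # evidence_comparator.py
def compare_results(proposal, answer, evidence):                     # result_comparator.py
    return {"answer": answer, "evidence": evidence,
            "disputed": answer != proposal.get("proposedResult")}

def process_proposal(proposal):
    refined = refine_resolution(proposal)
    prompts = generate_prompts(refined)
    reports = run_research(prompts)
    answer, evidence = compare_evidence(reports)
    return compare_results(proposal, answer, evidence)
```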
Utility scripts:
- clean_logs.sh to clean /logs
- init.sh to set up the environment and install requirements
No License