Disputer Function

Goal

This system aims to automate and enhance the process of researching and determining the most accurate resolution for prediction market proposals. It leverages LLM-based agents and a series of specialized modules to:

  1. Analyze complex proposal data, including questions, criteria, and possible outcomes.
  2. Generate targeted research prompts for each potential outcome.
  3. Conduct comprehensive, multi-faceted research using AI-powered agents and external data sources.
  4. Critically evaluate and compare gathered evidence using logical reasoning and syllogisms.
  5. Determine the most likely accurate outcome based on the analyzed evidence.
  6. Compare AI-derived conclusions against original proposal outcomes to resolve disputes.
  7. Provide detailed, well-cited reports supporting the final determination.

The system is designed to improve accuracy, reduce bias, and increase efficiency in resolving prediction market disputes. It can handle a wide range of topics, from politics and sports to business and cryptocurrency, adapting its research and analysis strategies accordingly.

Installation

See INSTALL.md

Usage

Run the AI pipeline on a dataset, evaluate the outcomes, create reports, update the pipeline, and repeat to refine results.

  1. Update proposals:

    python src/fetch_proposals.py
  2. Process a single proposal:

    python src/disputer.py
  3. Batch process a group of proposals:

    python src/batch.py --dataset=test

    Advanced usage:

    python src/batch.py \
    --dataset=disputed_and_settled \
    --limit=800 \
    --model="openhermes" \
    --randomize \
    --cache_enabled \
    --tag="pipeline-refined-01"

    Use Perplexity for research:

    python src/batch.py --dataset=disputed_and_settled \
    --limit=800 \
    --model="openhermes" \
    --randomize \
    --cache_enabled \
    --tag="perplexity20" \
    --use_perplexity \
    --prioritize_disputes

    Use gpt-4o-mini-2024-07-18:

    python src/batch.py --dataset=disputed_and_settled \
    --limit=800 \
    --model="gpt-4o-mini-2024-07-18" \
    --randomize \
    --cache_enabled \
    --tag="gpt-4o-mini" \
    --prioritize_disputes

    To run the batch processing continuously with specific parameters, use the following command:

    while true
    do
    python src/batch.py \
        --dataset=disputed \
        --limit=800 \
        --model="gpt-4o-mini-2024-07-18" \
        --randomize \
        --cache_enabled \
        --tag="gpt-4o-mini" \
        --prioritize_disputes    
    done

    Options explained:

      • --dataset: Specifies which dataset to use
      • --limit: Maximum number of proposals to process
      • --model: The LLM model to use (e.g., "openhermes")
      • --randomize: Randomize the order of proposals
      • --cache_enabled: Enable caching for faster subsequent runs
      • --tag: Add a custom tag to the output for easier identification
      • --use_perplexity: Substitute Perplexity for the CrewAI research agents
      • --prioritize_disputes: Move proposals where proposed != settlement to the front of the queue

  4. Analyze Outcomes:

    jupyter notebook src/analysis.ipynb
    python src/create_report.py --name "v0.3 test"
  5. Update AI Pipeline to improve outcomes (Preprocessing, LLMs, prompts, workflow, etc.) and repeat.

Proposal JSON Object Input

Each proposal includes the following fields:

  • title: The title of the proposal.
  • description: The description of the proposal.
  • res_data: The resolution data of the proposal, outlining potential outcomes.
  • url: The URL to additional information or the source of the proposal.
  • proposedResult: The originally proposed result (p1, p2, p3, p4). Note: This field is only used for result comparison, never for model input, to prevent bias.
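
For illustration, a proposal object might look like this (all field values below are invented, and the exact res_data format may differ):

    {
      "title": "Will Team A win the championship?",
      "description": "This market resolves according to the official final result of the championship.",
      "res_data": "p1: Team A wins, p2: Team A does not win, p3: unknown/50-50, p4: too early to resolve",
      "url": "https://example.com/market/123",
      "proposedResult": "p1"
    }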

Expected Output

  1. Answer: The most accurate answer based on research (p1, p2, p3, p4).
  2. Evidence: Supporting evidence for the answer, including links to relevant sources.
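
An illustrative output object (the field names and shape here are assumptions, not the codebase's exact schema):

    {
      "answer": "p1",
      "evidence": "Official league results confirm Team A won the final. Sources: https://example.com/result"
    }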

System Components

The research portion of the workflow uses CrewAI.

To use Perplexity instead, add --use_perplexity to your batch or disputer command.

Agents

  • Query Constructor Agent: Constructs search queries based on the proposal details.
  • Web Researcher Agent: Conducts the web search to gather relevant information.
  • Evidence Collector Agent: Formats the final output according to specified criteria.

Tasks

  • Query Construction: Constructs search queries to ensure relevance.
  • Web Research: Searches the web based on the constructed queries to gather evidence.
  • Collect Evidence: Formats the final answer and prepares a detailed report of the findings.
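
As an illustration, the agents and tasks might be wired together with CrewAI roughly like this (the role, goal, backstory, and task strings are invented, not taken from the codebase):

    from crewai import Agent, Task, Crew

    # Illustrative agent definitions; the real prompts live in the source files.
    query_constructor = Agent(
        role="Query Constructor",
        goal="Construct targeted search queries from the proposal details",
        backstory="Expert at turning resolution criteria into search queries",
    )
    web_researcher = Agent(
        role="Web Researcher",
        goal="Gather relevant evidence for one proposed outcome",
        backstory="Thorough online researcher",
    )
    evidence_collector = Agent(
        role="Evidence Collector",
        goal="Format the findings according to the specified criteria",
        backstory="Meticulous editor and fact-checker",
    )

    tasks = [
        Task(description="Construct search queries for outcome p1",
             expected_output="A list of search queries",
             agent=query_constructor),
        Task(description="Search the web using the constructed queries",
             expected_output="Raw evidence with source links",
             agent=web_researcher),
        Task(description="Compile a cited evidence report for outcome p1",
             expected_output="A structured report supporting or refuting p1",
             agent=evidence_collector),
    ]

    crew = Crew(agents=[query_constructor, web_researcher, evidence_collector],
                tasks=tasks)
    result = crew.kickoff()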

Tools

  • Ollama OpenHermes: Provides AI capabilities for generating research prompts and analyzing results.
  • Google Search API (via Serper): Utilized for executing web searches based on constructed queries.
  • SearXNG Search API: Alternative backend for executing web searches.
  • Browserless Scrape API: Fetches and extracts content from web pages.
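
These tools could be configured along the following lines (a sketch assuming the crewai_tools and langchain_community packages; the key placeholder is illustrative):

    import os
    from crewai_tools import SerperDevTool
    from langchain_community.llms import Ollama

    # Serper-backed Google search; reads SERPER_API_KEY from the environment.
    os.environ.setdefault("SERPER_API_KEY", "<your-serper-key>")
    search_tool = SerperDevTool()

    # Local OpenHermes model served through Ollama.
    llm = Ollama(model="openhermes")

The resulting objects would then be passed to the agents via tools=[search_tool] and llm=llm.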

Workflow

The Disputer system follows a structured process from receiving a proposal to delivering a finalized conclusion.

  1. Prompt Generation: Based on the proposal, generate detailed research prompts for each possible outcome.
  2. Conduct Research: Process the research tasks for each prompt in parallel using CrewAI (or Perplexity); a minimal sketch of this fan-out follows the diagram below.
  3. Evidence Comparison: Using AI, compare the evidence gathered for each possible outcome to determine the most accurate one.
  4. Result Comparison: Compare the AI's conclusion with the original proposal outcome to resolve any disputes.
graph TD

    PG[Prompt Generation]
    PG --> CR[Create Research Crews]
    CE[Compare Evidence]
    CE --> RC[Results Comparison]

    subgraph PG [Prompt Generation]
       AI[Generate Prompts] --> GI[Generate Individual Prompts]
       GI --> CP[Consolidate Prompts]
    end

    subgraph CR [Research Crews]
       R1[Research Crew for p1]
       R2[Research Crew for p2]
       R3[Research Crew for p3]
    end

    subgraph CREW [Research Crew]
        QCA[Query Constructor Agent]
        QCA --> WRA[Web Researcher Agent]
        WRA --> ECA[Evidence Collector Agent]
        WRA --> QCA
        ECA --> WRA

        subgraph QCA [Query Constructor Agent]
            GQ[Generate and Validate Queries]
        end

        subgraph WRA [Web Researcher Agent]
            SR[Search for Relevant Information]
        end

        subgraph ECA [Evidence Collector Agent]
            FE[Format & Analyze]
        end

    end

    subgraph CE [Compare Evidence]
       OE[Evaluate Evidence] --> DR[Determine Result]
    end

    subgraph RC [Results Comparison]
       CD[Compare with Expected Outcome] --> DS[Dispute Status Evaluation]
    end

    CR --> CREW
    CREW --> CE

    style PG fill:#bb86fc,stroke:#333,stroke-width:1px
    style CR fill:#6200ee,stroke:#333,stroke-width:1px
    style CREW fill:#6200ee,stroke:#333,stroke-width:1px
    style QCA fill:#3700b3,stroke:#333,stroke-width:1px
    style WRA fill:#3700b3,stroke:#333,stroke-width:1px
    style ECA fill:#3700b3,stroke:#333,stroke-width:1px
    style CE fill:#018786,stroke:#333,stroke-width:1px
    style RC fill:#018786,stroke:#333,stroke-width:1px
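
The fan-out in step 2 can be sketched with Python's standard library (run_research and the prompt dictionary are hypothetical stand-ins for the real research crews):

    from concurrent.futures import ThreadPoolExecutor

    def run_research(outcome, prompt):
        """Hypothetical stand-in: dispatch one research crew (or one
        Perplexity call) for a single outcome's prompt."""
        return {"outcome": outcome, "report": "..."}

    # One research prompt per possible outcome (placeholder text).
    prompts = {"p1": "...", "p2": "...", "p3": "..."}

    # Run the research tasks for all outcomes concurrently.
    with ThreadPoolExecutor(max_workers=len(prompts)) as pool:
        futures = {o: pool.submit(run_research, o, p) for o, p in prompts.items()}
        reports = {o: f.result() for o, f in futures.items()}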


Detailed Pipeline

graph TD
    A[Raw Proposal Data] --> B[Resolution Criteria Refinement]
    
    subgraph PG [Resolution Criteria Refinement]
    B --> B1[Classify Resolution Type]
    B1 --> B2[Generate Refined Summaries]
    B2 --> B3[Consolidate Summaries]
    B3 --> B4[Convert to Structured Answers]
    B4 --> B5[Reformat Description]
    B5 --> B6[Reformat Title]
    end
    
    B6 --> C[Prompt Generation]
    
    subgraph CR [Prompt Generation]
    C --> C1[Generate Outcome Prompts]
    C --> C2[Generate General Prompt]
    C1 --> C3[Critique and Refine]
    C2 --> C3
    C3 --> C4[Choose Best Prompts]
    end
    
    C4 --> D[Research Conduction]
    
    subgraph CREW [Research Conduction]
    D --> D1[Create Research Crews]
    D1 --> D2[Parallel Research Execution]
    D2 --> D3[Web Search]
    D2 --> D4[Content Extraction]
    D3 --> D5[Collect Results]
    D4 --> D5
    end
    
    D5 --> E[Evidence Comparison]
    
    subgraph CE [Evidence Comparison]
    E --> E1[Filter Evidence]
    E1 --> E2[Consolidate Evidence]
    E2 --> E3[Extract Key Evidence]
    E3 --> E4[Evaluate Outcomes]
    E4 --> E5[Consolidate Evaluations]
    E5 --> E6[Determine Likely Answer]
    E6 --> E7[Generate Final Report]
    end
    
    E7 --> F[Result Comparison]
    
    subgraph RC [Result Comparison]
    F --> F1[Compare Results]
    F1 --> F2[Evaluate Dispute]
    F2 --> F3[Classify Proposal]
    F3 --> F4[Summarize Result]
    end
    
    F4 --> G[Final Dispute Resolution]
    
    subgraph QCA [External Components]
    H[Language Models]
    I[Web Search APIs]
    J[CrewAI Setup]
    K[Perplexity]
    end
    
    H -.-> D2
    H -.-> E4
    I -.-> D3
    J -.-> D1
    K -.-> D1

    style PG fill:#bb86fc,stroke:#333,stroke-width:1px
    style CR fill:#6200ee,stroke:#333,stroke-width:1px
    style CREW fill:#6200ee,stroke:#333,stroke-width:1px
    style QCA fill:#3700b3,stroke:#333,stroke-width:1px
    style CE fill:#018786,stroke:#333,stroke-width:1px
    style RC fill:#018786,stroke:#333,stroke-width:1px
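
Steps such as "Classify Resolution Type" are individual LLM calls. A hypothetical sketch using the OpenAI SDK (the function name, prompt wording, and type taxonomy are all invented):

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def classify_resolution_type(title, description):
        """Hypothetical sketch of the 'Classify Resolution Type' step."""
        response = client.chat.completions.create(
            model="gpt-4o-mini-2024-07-18",
            messages=[
                {"role": "system",
                 "content": "Classify the resolution type of this prediction "
                            "market proposal (e.g., binary yes/no, multi-outcome, "
                            "numeric). Reply with the type only."},
                {"role": "user", "content": f"{title}\n\n{description}"},
            ],
        )
        return response.choices[0].message.content.strip()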

Outcome Evaluation

graph TD
    subgraph R [Research]
        R1[Report 1]
        R2[Report 2]
        R3[Report 3]
        R4[Report 4]
    end

    subgraph E [Evaluate]
        O1[p1]
        O2[p2]
        O3[p3]
    end

    R1 --> O1 & O2 & O3
    R2 --> O1 & O2 & O3
    R3 --> O1 & O2 & O3
    R4 --> O1 & O2 & O3

    O1 --> |True/False| C[Consolidate Results]
    O2 --> |True/False| C
    O3 --> |True/False| C

    C --> D{Clear Result?}
    D -->|Yes| P[Predict Final Outcome]
    D -->|No| FB[Fallback Process]

    subgraph FB [Fallback Process]
        KE[Extract Key Evidence]
        E1[Evaluation 1]
        E2[Evaluation 2]
        E3[Evaluation 3]
        FR[Find Most Common Result]
    end

    FB --> KE
    KE --> E1 & E2 & E3
    E1 & E2 & E3 --> FR
    FR --> P

    style R fill:#bb86fc,stroke:#333,stroke-width:1px
    style E fill:#6200ee,stroke:#333,stroke-width:1px
    style FB fill:#6200ee,stroke:#333,stroke-width:1px
    style C fill:#018786,stroke:#333,stroke-width:1px
    style P fill:#018786,stroke:#333,stroke-width:1px
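
In code, the consolidation and fallback logic might look like this sketch (the "clear result" rule and the vote data are assumptions):

    from collections import Counter

    def consolidate(votes):
        """Treat an outcome as the clear result only when it is the single
        outcome that every report voted True for (an assumed rule)."""
        supported = [o for o, vs in votes.items() if all(vs)]
        return supported[0] if len(supported) == 1 else None

    def fallback(evaluations):
        """Fallback: take the most common result across repeated evaluations."""
        return Counter(evaluations).most_common(1)[0][0]

    # Four reports, each voting True/False on three outcomes (invented data).
    votes = {
        "p1": [False, False, False, False],
        "p2": [True, True, True, True],
        "p3": [False, False, False, False],
    }
    answer = consolidate(votes) or fallback(["p2", "p2", "p1"])  # -> "p2"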

System Setup and Execution

The system runs as a cohesive unit, with each component interacting seamlessly. The src/batch.py and src/disputer.py scripts serve as entry points, orchestrating the overall process and handling exceptions and logging; a rough sketch of this orchestration follows the module list below.

For a detailed look at each component, refer to the source files in the src/workflow directory:

  • resolution_refiner.py
  • prompt_generator.py
  • research_manager.py
  • perplexity_research.py
  • evidence_comparator.py
  • result_comparator.py
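
A rough, hypothetical sketch of how the entry points chain these modules (every function name below is invented; the real interfaces live in the files above):

    # Hypothetical orchestration; the real flow lives in src/batch.py and
    # src/disputer.py. Each stub stands in for one src/workflow module.

    def refine_resolution(proposal):       # resolution_refiner.py
        return proposal

    def generate_prompts(refined):         # prompt_generator.py
        return {"p1": "...", "p2": "...", "p3": "..."}

    def run_research(prompts):             # research_manager.py / perplexity_research.py
        return {outcome: "evidence report" for outcome in prompts}

    def compare_evidence(reports):         # evidence_comparator.py
        return "p1"

    def compare_result(answer, proposal):  # result_comparator.py
        return {"answer": answer,
                "disputed": answer != proposal.get("proposedResult")}

    def process_proposal(proposal):
        refined = refine_resolution(proposal)
        prompts = generate_prompts(refined)
        reports = run_research(prompts)
        answer = compare_evidence(reports)
        return compare_result(answer, proposal)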

Example Reports

Additional Scripts

Contributors

License

No License
