@simonw
Last active March 26, 2025 06:39
files-to-prompt sqlite-utils llm llm-gemini llm-anthropic -e md -e py -c | \
  llm -m gemini-2.5-pro-exp-03-25 -s '
Provide a detailed design proposal for adding tool
calling support to LLM

Different model plugins will have different ways of
executing tools, but LLM itself should provide both
a Python and a CLI abstraction over these
differences.

Tool calling involves passing tool definitions to a
model, the model then replies with tools it would
like executed, the harness code then executes those
and passes the result back to the model and
continues in a loop until the model stops
requesting tools. This should happen as part of the
existing llm.Conversation mechanism.

At the Python layer tools should be provided as
Python functions that use type hints, which can
then be converted into JSON schema using Pydantic -
similar to how the existing schema= mechanism
works.

For the CLI option tools will be provided as an
option passed to the command, similar to how the
sqlite-utils convert CLI mechanism works.'

Okay, here is a detailed design proposal for adding tool calling support to llm.

1. Goals

  1. Integrate Tool Calling: Enable LLMs that support tool calling (function calling) to request the execution of user-provided tools within a conversation.
  2. Python API: Allow users to define tools as Python functions with type hints. Automatically generate JSON Schema definitions using Pydantic. Integrate seamlessly with llm.Conversation.
  3. CLI API: Allow users to specify tools via the command line, likely by referencing Python files containing tool functions.
  4. Abstraction: The core llm library should provide a consistent interface for users, abstracting away the specific implementation details of different model providers (OpenAI, Anthropic, Gemini, etc.).
  5. Plugin System: Model plugins should be responsible for translating between LLM's internal tool representation and the specific API format of the underlying model.
  6. Conversation Context: Tool calls and their results should become part of the ongoing conversation history sent back to the model.

2. Core Concepts & Data Structures

  1. Tool Definition: A representation of a tool that can be understood by both the user's code and the LLM (after conversion).
    • Python: A standard Python function with type hints and a docstring.
    • Internal/Schema: A JSON Schema object describing the function's name, description, and parameters (types, descriptions, required fields).
  2. Tool Call Request: A structured request from the LLM to execute a specific tool with specific arguments.
    • Contains: Tool name, arguments (dictionary), potentially a unique call ID if the API provides one.
  3. Tool Result: The outcome of executing a requested tool.
    • Contains: Tool name, call ID (if applicable), the return value (JSON serializable), or an error message if execution failed.

3. Python API Design (llm library)

3.1. Defining Tools

Tools will be defined as standard Python functions using type hints and docstrings. Pydantic will be used internally to generate the JSON Schema.

import llm
from pydantic import BaseModel, Field
from typing import Literal

def get_current_weather(
    location: str = Field(..., description="The city and state, e.g. San Francisco, CA"),
    unit: Literal["celsius", "fahrenheit"] = "fahrenheit"
) -> dict:
    """
    Get the current weather in a given location.

    Args:
        location: The city and state, e.g. San Francisco, CA
        unit: The unit to use for the temperature, celsius or fahrenheit
    """
    # Implementation would go here - fetch weather data
    if "tokyo" in location.lower():
        return {"location": location, "temperature": "10", "unit": unit, "forecast": ["sunny", "windy"]}
    elif "san francisco" in location.lower():
        return {"location": location, "temperature": "72", "unit": unit, "forecast": ["cloudy", "foggy"]}
    else:
        return {"location": location, "temperature": "unknown"}

# Example using a Pydantic model for more complex arguments
class GetNthFibonacciRequest(BaseModel):
    n: int = Field(..., description="The index of the Fibonacci number to calculate (0-based)", ge=0, le=100)

def get_nth_fibonacci(req: GetNthFibonacciRequest) -> dict:
    """Calculates the nth Fibonacci number."""
    a, b = 0, 1
    if req.n == 0: return {"result": a}
    for _ in range(req.n - 1):
        a, b = b, a + b
    return {"result": b}

# List of tools to be passed to the model
my_tools = [get_current_weather, get_nth_fibonacci]

  • Docstrings: The main function docstring becomes the tool's description. Parameter descriptions can be added using Field(..., description="...") or potentially extracted from standard docstring formats (e.g., Google style) as a fallback.
  • Type Hints: Used by Pydantic to generate the JSON Schema parameter types (string, integer, number, boolean, array, object). Literal is used for enums. Pydantic models can be used for complex nested arguments.
  • Return Type: The function should return a JSON-serializable dictionary or other value.
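
As a concrete illustration of that conversion, the proposal's function_to_json_schema() helper could be sketched with inspect plus Pydantic's create_model (Pydantic v2 assumed; the details here are illustrative, not a final implementation):

import inspect
from typing import Any, Dict

from pydantic import create_model

def function_to_json_schema(func) -> Dict[str, Any]:
    """Build a tool definition (name, description, parameters) from a function."""
    fields = {}
    for name, param in inspect.signature(func).parameters.items():
        annotation = Any if param.annotation is inspect.Parameter.empty else param.annotation
        default = ... if param.default is inspect.Parameter.empty else param.default
        fields[name] = (annotation, default)  # Field(...) defaults pass through unchanged
    params_model = create_model(f"{func.__name__}_params", **fields)
    return {
        "name": func.__name__,
        "description": (func.__doc__ or "").strip(),
        "parameters": params_model.model_json_schema(),
    }

Applied to get_current_weather above, this would produce a schema with a required location string and a unit enum defaulting to "fahrenheit".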

3.2. Using Tools in Conversations

A new tools parameter will be added to Conversation.prompt() (and AsyncConversation.prompt()).

import llm

model = llm.get_model("gpt-4o") # Or another tool-capable model
conversation = model.conversation()

# The tool calling loop is handled internally by prompt()
response = conversation.prompt(
    "What's the weather like in San Francisco and what is the 5th Fibonacci number?",
    tools=my_tools
)

# The final response from the model after tool execution
print(response.text())

Internal Flow:

  1. conversation.prompt() receives the tools list.
  2. It converts each function into its JSON Schema representation using a helper function (e.g., llm.function_to_json_schema(func)).
  3. It passes the prompt text and the list of tool schemas to the underlying model.execute() method.
  4. The model.execute() implementation (within the plugin) interacts with the specific model API, providing the tool schemas.
  5. If the model API returns a request to call a tool, model.execute() yields a standardized ToolCallRequest object (defined in llm).
  6. conversation.prompt() (or its internal iterator) intercepts ToolCallRequest.
  7. It finds the corresponding Python function in the my_tools list based on the requested tool name.
  8. It validates the arguments provided by the LLM against the function's signature/schema (using Pydantic).
  9. It executes the Python function with the validated arguments.
  10. It captures the return value (or any exception).
  11. It packages the result into a standardized ToolResult object.
  12. It passes the ToolResult back to the model.execute() generator/iterator (perhaps via .send() or a new model method like model.continue_execution(tool_results)).
  13. The model.execute() implementation sends the tool result back to the model API.
  14. The loop continues: the model might respond with another tool call or with the final text response.
  15. conversation.prompt() yields/returns the final text chunks/Response object once the model stops requesting tool calls (a sketch of this harness loop follows below).
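
To make the loop concrete, here is a minimal sketch of the harness side of this generator protocol. ToolCallRequest and ToolResult are the dataclasses proposed below; the function name, error wording, and omission of streaming details are illustrative simplifications:

from llm.models import ToolCallRequest, ToolResult  # proposed in this design, not yet in llm

def run_tool_loop(execute_gen, tools_by_name):
    """Drive a model.execute() generator, executing tools and sending results back."""
    text_parts = []
    history = []  # (ToolCallRequest, ToolResult) pairs for response.tool_calls
    try:
        output = next(execute_gen)
        while True:
            if isinstance(output, ToolCallRequest):
                func = tools_by_name.get(output.name)
                if func is None:
                    result = ToolResult(id=output.id, name=output.name,
                                        error=f"Unknown tool: {output.name}")
                else:
                    try:
                        result = ToolResult(id=output.id, name=output.name,
                                            result=func(**output.arguments))
                    except Exception as ex:
                        result = ToolResult(id=output.id, name=output.name, error=str(ex))
                history.append((output, result))
                output = execute_gen.send(result)  # resume the plugin's generator
            else:
                text_parts.append(str(output))  # plain text chunk
                output = next(execute_gen)
    except StopIteration:
        pass
    return "".join(text_parts), history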

Response Object Enhancements:

The Response object will need attributes to store the history of tool calls and results made during the generation of that specific response.

from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple

@dataclass
class ToolCallRequest:
    id: Optional[str]  # Unique ID for the call, if provided by the model API
    name: str
    arguments: Dict[str, Any]

@dataclass
class ToolResult:
    id: Optional[str]  # Matching ID from the request, if available
    name: str
    result: Optional[Any] = None  # JSON serializable result
    error: Optional[str] = None   # Error message if execution failed

class Response:
    # ... existing attributes ...
    tool_calls: List[Tuple[ToolCallRequest, ToolResult]] = field(default_factory=list)
    # Internal list to track the sequence of calls and results for this response turn.

The Response.log_to_db() method will need updating to store this tool call history, likely in new database tables.

3.3. Async API

The AsyncConversation.prompt() method will mirror the synchronous API, accepting the tools parameter and handling the tool calling loop using async/await for tool execution if the tool functions themselves are async. If tool functions are synchronous, they will be run in a thread pool executor via asyncio.to_thread.
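
For example, the per-call dispatch in the async harness might look like this (a sketch; the helper name is illustrative):

import asyncio
import inspect

async def call_tool(func, arguments: dict):
    """Run a single tool function from the async harness."""
    if inspect.iscoroutinefunction(func):
        return await func(**arguments)  # async def tools are awaited directly
    return await asyncio.to_thread(func, **arguments)  # sync tools run off the event loop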

4. CLI API Design (llm command)

4.1. Specifying Tools

A new --tool option will be added to llm prompt and llm chat. It will accept the path to a Python file.

llm prompt "Weather in SF?" --tool path/to/my_tools.py
llm chat -m gemini-1.5-flash --tool path/to/my_tools.py

The specified Python file (my_tools.py) should contain one or more functions intended to be used as tools.

Discovery: By default, llm will import the file and make all functions defined directly within that module available as tools. Alternatively, a decorator could be introduced (@llm.tool) to explicitly mark functions for exposure, providing better control. Let's start with importing all functions for simplicity.
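
That discovery step could be sketched with importlib and inspect along these lines (illustrative; skipping names that start with an underscore is an extra assumption here):

import importlib.util
import inspect
from pathlib import Path

def load_tools_from_file(path: str):
    """Import a Python file and return the functions defined directly in it."""
    module_name = Path(path).stem
    spec = importlib.util.spec_from_file_location(module_name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return [
        func
        for name, func in inspect.getmembers(module, inspect.isfunction)
        if func.__module__ == module_name  # only functions defined in this file
        and not name.startswith("_")       # skip private helpers
    ]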

4.2. Execution Flow

  1. The llm prompt (or chat) command parses the --tool file.py options.
  2. For each file, it imports the module.
  3. It inspects the module, finding all function objects.
  4. For each function, it generates the JSON Schema (using the same mechanism as the Python API).
  5. It passes the list of tool schemas to the selected model via the prompt() method (which internally calls model.execute()).
  6. When the underlying model.execute() yields a ToolCallRequest, the CLI harness:
    • Finds the corresponding function object within the imported modules.
    • Validates arguments.
    • Executes the function.
    • Captures the result/error.
    • Creates the ToolResult object.
    • Sends the result back into the model execution loop.
  7. Prints the final text response to the console.

4.3. Example my_tools.py

# path/to/my_tools.py
from pydantic import Field
from typing import Literal

def get_current_weather(
    location: str = Field(..., description="The city and state, e.g. San Francisco, CA"),
    unit: Literal["celsius", "fahrenheit"] = "fahrenheit"
) -> dict:
    """Get the current weather in a given location."""
    # Implementation...
    if "tokyo" in location.lower():
        return {"location": location, "temperature": "10", "unit": unit, "forecast": ["sunny", "windy"]}
    # ... rest of implementation
    return {"location": location, "temperature": "unknown"}

# Another tool
def calculate_sum(a: int, b: int) -> dict:
    """Calculates the sum of two integers."""
    return {"sum": a + b}

5. Plugin Interface (llm.Model)

The Model (and AsyncModel, KeyModel, AsyncKeyModel) base classes need modifications to support tool calling.

  1. Capability Flag:

    class Model:
        ...
        supports_tool_calling: bool = False
        # Optional: If the model supports structured output matching a schema
        # distinct from general tool calling (like Gemini function calling vs JSON mode)
        supports_schema: bool = False

    Plugins set supports_tool_calling = True if their model supports the tool call/result loop. supports_schema remains for models that can only force JSON output matching a schema without the back-and-forth tool loop.

  2. Modified execute() Signature (Conceptual): The existing execute() method needs to handle the tool calling loop. A possible way is for it to yield special objects.

    from abc import ABC, abstractmethod
    from dataclasses import dataclass
    from typing import Iterator, Union, List, Dict, Any, Optional, AsyncGenerator
    # (Prompt, Response, Conversation, Chunk are existing llm classes)
    
    @dataclass
    class ToolCallRequest:
        id: Optional[str]
        name: str
        arguments: Dict[str, Any]
    
    @dataclass
    class ToolResult:
        id: Optional[str]
        name: str
        result: Optional[Any] = None
        error: Optional[str] = None
    
    # Standardized input/output for execute's tool interaction
    ToolInput = Union[str, ToolResult]
    ToolOutput = Union[str, Chunk, ToolCallRequest]
    
    class Model(ABC):
        # ... existing methods ...
    
        @abstractmethod
        def execute(
            self,
            prompt: Prompt, # Now includes prompt.tool_schemas list
            stream: bool,
            response: Response,
            conversation: Optional[Conversation],
        ) -> Iterator[ToolOutput]: # Yields text chunks OR ToolCallRequest
            """
            Executes the prompt. If the model supports tools, this method
            is responsible for the loop:
            1. Send prompt + tool schemas to the model API.
            2. Parse response:
               - If text response, yield text chunks.
               - If tool call request, yield ToolCallRequest object.
            3. If ToolCallRequest was yielded, expect a ToolResult to be
               sent back via the generator's .send() method.
            4. Send ToolResult to the model API.
            5. Go back to step 2.
            """
            pass
    
    class AsyncModel(ABC):
        # ... similar changes for async execute ...
        @abstractmethod
        async def execute(
            self,
            prompt: Prompt,
            stream: bool,
            response: AsyncResponse,
            conversation: Optional[AsyncConversation],
        ) -> AsyncGenerator[ToolOutput, ToolInput]: # async yield, async send
           pass

    Explanation of execute Changes:

    • The Prompt object passed to execute will now contain an additional (optional) field like prompt.tool_schemas: List[dict].
    • execute now yields ToolOutput, which can be text str, Chunk, or a ToolCallRequest.
    • The caller (the Conversation.prompt implementation) iterates the generator.
    • If it receives a ToolCallRequest, it executes the tool and uses generator.send(tool_result) to send the ToolResult back into the execute generator.
    • The execute implementation needs to handle receiving this ToolResult (e.g., via the value of a yield expression) and continue the API interaction.

    This generator-based approach allows the core llm library to manage the execution harness while the plugin focuses solely on the API communication details.
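
    To illustrate the plugin side, here is a rough sketch of how an OpenAI-style Chat Completions plugin might satisfy this contract. The message and tool_calls shapes follow OpenAI's Chat Completions API; self.client (an OpenAI client) and self.model_name are assumed attributes, and streaming plus error handling are omitted:

    import json

    def execute(self, prompt, stream, response, conversation):
        messages = [{"role": "user", "content": prompt.prompt}]
        tools = [
            {"type": "function", "function": schema}
            for schema in (prompt.tool_schemas or [])
        ]
        while True:
            kwargs = {"model": self.model_name, "messages": messages}
            if tools:
                kwargs["tools"] = tools
            completion = self.client.chat.completions.create(**kwargs)
            message = completion.choices[0].message
            if not message.tool_calls:
                yield message.content or ""
                return
            # Keep the assistant's tool-call turn in the message history
            messages.append({
                "role": "assistant",
                "content": message.content,
                "tool_calls": [
                    {"id": call.id, "type": "function",
                     "function": {"name": call.function.name,
                                  "arguments": call.function.arguments}}
                    for call in message.tool_calls
                ],
            })
            for call in message.tool_calls:
                # Yield the request; the harness sends a ToolResult back via .send()
                tool_result = yield ToolCallRequest(
                    id=call.id,
                    name=call.function.name,
                    arguments=json.loads(call.function.arguments),
                )
                messages.append({
                    "role": "tool",
                    "tool_call_id": call.id,
                    "content": json.dumps(
                        {"error": tool_result.error} if tool_result.error else tool_result.result
                    ),
                })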

6. Database Schema Changes

The logs.db schema needs updating to store tool interactions.

  1. New Table: tool_calls

    • id: TEXT PRIMARY KEY (e.g., ULID)
    • response_id: TEXT, FK to responses.id
    • request_id: TEXT (optional, the ID from the API call if provided)
    • tool_name: TEXT
    • arguments: TEXT (JSON dictionary)
    • result: TEXT (JSON representation of the return value)
    • error: TEXT (Error message if execution failed)
    • timestamp_utc: TEXT (When the tool call was processed)
  2. Modify responses table:

    • Consider adding a boolean has_tool_calls column for easier querying, although joining tool_calls is also possible.

The Response.log_to_db() method will be updated to insert records into the tool_calls table based on the content of response.tool_calls.
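
As a hedged sketch of the table creation, using sqlite-utils and assuming the @migration decorator pattern already used in llm/migrations.py (the migration name is a placeholder):

from llm.migrations import migration  # assumption: the existing migration decorator

@migration
def mXXX_tool_calls(db):  # placeholder name; numbering would follow the existing migrations
    db["tool_calls"].create(
        {
            "id": str,             # e.g. a ULID
            "response_id": str,
            "request_id": str,     # ID from the model API, if provided
            "tool_name": str,
            "arguments": str,      # JSON
            "result": str,         # JSON
            "error": str,
            "timestamp_utc": str,
        },
        pk="id",
        foreign_keys=[("response_id", "responses", "id")],
    )
    db["tool_calls"].create_index(["response_id"])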

7. Error Handling

  • Tool Not Found: If the LLM requests a tool not provided by the user, the harness returns a ToolResult with an appropriate error message.
  • Argument Validation Failure: If the LLM provides arguments that don't match the tool's schema, Pydantic raises an error during validation. The harness returns a ToolResult with the validation error details.
  • Tool Execution Error: If the user's Python function raises an exception, the harness catches it and returns a ToolResult with the exception message/traceback.
  • Model Does Not Support Tools: If tools= is passed to a model where supports_tool_calling is False, prompt() should raise a TypeError or ValueError.
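
The last case might be no more than a guard near the top of prompt(), along these lines (a sketch; the exact exception type is left open above):

def _check_tool_support(model, tools):
    if tools and not getattr(model, "supports_tool_calling", False):
        raise ValueError(f"Model {model.model_id} does not support tool calling")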

8. Implementation Steps

  1. Define ToolCallRequest and ToolResult: Add these dataclasses to llm.models.
  2. Schema Generation: Implement llm.function_to_json_schema() using Pydantic.
  3. Update Prompt: Add tool_schemas: List[dict] attribute.
  4. Update Response: Add tool_calls: List[Tuple[ToolCallRequest, ToolResult]] attribute.
  5. Modify Model.execute: Change signature/yield type as described in section 5. Update documentation.
  6. Modify Conversation.prompt:
    • Add tools parameter.
    • Implement schema generation.
    • Implement the loop that iterates over model.execute().
    • Handle ToolCallRequest: find function, validate args, execute, create ToolResult.
    • Use generator.send(tool_result) to continue execution.
    • Store (request, result) tuples in response.tool_calls.
  7. Update Async Variants: Implement the async versions of the above.
  8. CLI Implementation:
    • Add --tool option to llm prompt / llm chat.
    • Implement module import and function discovery.
    • Pass discovered tools (as functions) to the prompt() call.
  9. Database: Implement schema migrations and update Response.log_to_db().
  10. Update OpenAI Plugin: Modify the OpenAI plugin's execute() methods to handle the new tool calling flow (sending schemas, parsing tool call requests, handling results).
  11. Testing: Add comprehensive tests for Python API, CLI, plugin interactions, error handling, and database logging.
  12. Documentation: Update user docs (CLI, Python API) and plugin author docs.

This design provides a flexible and extensible way to integrate tool calling, leveraging existing patterns (Pydantic, JSON Schema) while keeping the core logic abstracted from specific model implementations.

Okay, here is a detailed design proposal for adding tool calling support to llm.

1. Goals

  1. Integrate Tool Calling: Enable LLMs that support tool calling (function calling) to request the execution of user-provided tools within a conversation.
  2. Python API: Allow users to define tools as Python functions with type hints. Automatically generate JSON Schema definitions using Pydantic. Integrate seamlessly with llm.Conversation.
  3. CLI API: Allow users to specify tools via the command line by referencing Python files containing tool functions.
  4. Abstraction: The core llm library should provide a consistent interface for users, abstracting away the specific implementation details of different model providers (OpenAI, Anthropic, Gemini, etc.).
  5. Plugin System: Model plugins should be responsible for translating between LLM's internal tool representation and the specific API format of the underlying model.
  6. Conversation Context: Tool calls and their results should become part of the ongoing conversation history sent back to the model.

2. Core Concepts & Data Structures

  1. Tool Definition: A representation of a tool that can be understood by both the user's code and the LLM (after conversion).
    • Python: A standard Python function with type hints and a docstring.
    • Internal/Schema: A JSON Schema object describing the function's name, description, and parameters (types, descriptions, required fields).
  2. Tool Call Request: A standardized structure representing a request from the LLM to execute a specific tool with specific arguments.
    • Contains: id (Optional[str], unique ID for the call, if provided by the model API), name (str, tool name), arguments (Dict[str, Any]).
  3. Tool Result: A standardized structure representing the outcome of executing a requested tool.
    • Contains: id (Optional[str], matching ID from the request), name (str, tool name), result (Optional[Any], JSON serializable return value), error (Optional[str], error message if execution failed).

3. Python API Design (llm library)

3.1. Defining Tools

Tools will be defined as standard Python functions using type hints and docstrings. Pydantic will be used internally to generate the JSON Schema. The existing schema= mechanism's infrastructure can be leveraged or extended.

# examples/tools.py
import llm
from pydantic import BaseModel, Field
from typing import Literal

def get_current_weather(
    location: str = Field(..., description="The city and state, e.g. San Francisco, CA"),
    unit: Literal["celsius", "fahrenheit"] = "fahrenheit"
) -> dict:
    """
    Get the current weather in a given location.

    Args:
        location: The city and state, e.g. San Francisco, CA
        unit: The unit to use for the temperature, celsius or fahrenheit
    """
    # Implementation would go here - fetch weather data
    print(f"*** Tool: get_current_weather called with location={location}, unit={unit} ***")
    if "tokyo" in location.lower():
        return {"location": location, "temperature": "10", "unit": unit, "forecast": ["sunny", "windy"]}
    elif "san francisco" in location.lower():
        return {"location": location, "temperature": "72", "unit": unit, "forecast": ["cloudy", "foggy"]}
    else:
        return {"location": location, "temperature": "unknown"}

class GetNthFibonacciRequest(BaseModel):
    n: int = Field(..., description="The index of the Fibonacci number to calculate (0-based)", ge=0, le=100)

def get_nth_fibonacci(req: GetNthFibonacciRequest) -> dict:
    """Calculates the nth Fibonacci number."""
    print(f"*** Tool: get_nth_fibonacci called with n={req.n} ***")
    a, b = 0, 1
    if req.n == 0: return {"result": a}
    for _ in range(req.n - 1):
        a, b = b, a + b
    return {"result": b}

# Can also define tools that raise errors
def sometimes_errors(succeed: bool = True) -> dict:
    """This tool sometimes raises an error."""
    print(f"*** Tool: sometimes_errors called with succeed={succeed} ***")
    if succeed:
        return {"status": "success"}
    else:
        raise ValueError("You asked for an error!")

# List of tools to be passed to the model
my_tools = [get_current_weather, get_nth_fibonacci, sometimes_errors]

  • Docstrings: The main function docstring becomes the tool's description. Parameter descriptions can be added using Field(..., description="...") or extracted from standard docstring formats (e.g., Google style - TBD).
  • Type Hints: Used by Pydantic to generate the JSON Schema parameter types. Pydantic models can define complex nested arguments.
  • Return Type: The function should return a JSON-serializable dictionary or other value. Non-dict values will likely be wrapped in a standard structure like {"result": value}.
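
That wrapping step could be as simple as (illustrative helper name):

def _normalize_tool_return(value):
    """Ensure tool return values are JSON objects, wrapping bare values."""
    return value if isinstance(value, dict) else {"result": value}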

3.2. Using Tools in Conversations

A new tools parameter will be added to Conversation.prompt() (and AsyncConversation.prompt()) accepting a list of callable functions.

import llm
# Assuming my_tools is defined as above
# from examples.tools import my_tools

# Use an OpenAI model that supports tool calling
# Ensure API key is configured via llm keys set openai or OPENAI_API_KEY
model = llm.get_model("gpt-4o")
conversation = model.conversation()

# The tool calling loop is handled internally by prompt()
response = conversation.prompt(
    "What's the weather like in San Francisco and what is the 5th Fibonacci number?",
    tools=my_tools,
    stream=False  # stream=False makes the tool-calling loop easier to debug initially
)

# The final response from the model after tool execution
print(response.text())

# You can inspect the tool calls made during the response generation
for request, result in response.tool_calls:
    print(f"Request: {request.name}({request.arguments})")
    if result.error:
        print(f"  Error: {result.error}")
    else:
        print(f"  Result: {result.result}")

# Example with a tool error
response_error = conversation.prompt(
    "Use the sometimes_errors tool and make it fail",
    tools=my_tools,
    stream=False
)
print(response_error.text())
for request, result in response_error.tool_calls:
    print(f"Request: {request.name}({request.arguments})")
    if result.error:
        print(f"  Error: {result.error}")
    else:
        print(f"  Result: {result.result}")

Internal Flow (revised):

  1. conversation.prompt() receives the tools list (of Python functions).
  2. It generates JSON Schema representations for each tool using an internal helper (_function_to_schema).
  3. It adds these schemas to the Prompt object (prompt.tool_schemas).
  4. The conversation.prompt() method (or its internal iterator/async handler _execute_prompt_internal) manages the tool-calling loop.
  5. It calls model.execute(prompt, ...) which returns a generator.
  6. It iterates through the generator returned by model.execute().
  7. If model.execute() yields str or Chunk, yield it to the user (if streaming). Accumulate text.
  8. If model.execute() yields a ToolCallRequest object:
    • Look up the corresponding Python function from the tools list.
    • Error Handling: If not found, create a ToolResult with an error and .send() it back.
    • Validate the arguments from ToolCallRequest.arguments against the function signature/schema using Pydantic.
    • Error Handling: If validation fails, create ToolResult with validation error and .send() it back.
    • Execute the Python function (using asyncio.to_thread if sync function in async context).
    • Error Handling: If the function raises an exception, catch it, create ToolResult with the error message/traceback and .send() it back.
    • If execution succeeds, create ToolResult with the return value.
    • Store the (ToolCallRequest, ToolResult) pair in response._tool_calls_internal.
    • Use generator.send(tool_result) to send the result back into the model.execute() generator.
  9. The loop continues until model.execute() finishes yielding (signifying the model has produced its final text response).
  10. The accumulated text and the list of tool calls are finalized on the Response object.
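
The validation in step 8 could reuse the Pydantic model built during schema generation; a minimal sketch (Pydantic v2 assumed, names illustrative):

from pydantic import ValidationError

def validate_arguments(params_model, arguments):
    """Return (validated_args, error); the error string becomes ToolResult.error."""
    try:
        return params_model.model_validate(arguments).model_dump(), None
    except ValidationError as ex:
        return None, str(ex)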

3.3. Response Object Enhancements

import datetime
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple

@dataclass
class ToolCallRequest:
    id: Optional[str] = None # Unique ID for the call, if provided by the model API
    name: str = ""
    arguments: Dict[str, Any] = field(default_factory=dict)

@dataclass
class ToolResult:
    id: Optional[str] = None # Matching ID from the request, if available
    name: str = ""
    result: Optional[Any] = None  # JSON serializable result
    error: Optional[str] = None   # Error message if execution failed

class Response:
    # ... existing attributes ...
    # Internal list, populated during generation
    _tool_calls_internal: List[Tuple[ToolCallRequest, ToolResult]] = field(default_factory=list)

    @property
    def tool_calls(self) -> List[Tuple[ToolCallRequest, ToolResult]]:
        """A list of (request, result) tuples for tool calls made during this response generation."""
        self._force() # Ensure iteration is complete
        return self._tool_calls_internal

    def log_to_db(self, db):
        # ... existing logging ...
        # Add logic to insert into the new tool_calls table
        # using self.tool_calls and the response_id
        if self.tool_calls:
            tool_call_records = []
            for request, result in self.tool_calls:
                 tool_call_records.append({
                     "response_id": self.id, # Assuming response.id is set before log_to_db
                     "request_id": request.id,
                     "tool_name": request.name,
                     "arguments": json.dumps(request.arguments),
                     "result": json.dumps(result.result) if result.result is not None else None,
                     "error": result.error,
                     "timestamp_utc": datetime.datetime.now(datetime.timezone.utc).isoformat(),
                 })
            db["tool_calls"].insert_all(tool_call_records, pk="id") # Need unique ID? Or composite?

# Similar changes needed for AsyncResponse, using await self._force()

3.4. Async API

The AsyncConversation.prompt() method will mirror the synchronous API. The internal loop handler (_execute_prompt_internal_async) will use async for and await generator.asend(tool_result). It will use asyncio.to_thread for executing synchronous tool functions. If a tool function is async def, it will be awaited directly.
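
The asend()-based driving loop might look roughly like this (a sketch; execute_tool stands in for the lookup, validation, and execution steps described in section 3.2):

from llm.models import ToolCallRequest  # proposed in this design, not yet in llm

async def drive_async_execute(gen, execute_tool):
    """Drive AsyncModel.execute(), passing ToolResults back via asend()."""
    text_parts = []
    try:
        output = await gen.__anext__()
        while True:
            if isinstance(output, ToolCallRequest):
                result = await execute_tool(output)  # returns a ToolResult
                output = await gen.asend(result)
            else:
                text_parts.append(str(output))
                output = await gen.__anext__()
    except StopAsyncIteration:
        pass
    return "".join(text_parts)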

4. CLI API Design (llm command)

4.1. Specifying Tools

A new --tool PATH option will be added to llm prompt and llm chat. It can be specified multiple times.

# Use tools defined in my_tools.py
llm prompt "Weather in SF and 5th Fib number?" --tool my_tools.py

# Use tools from multiple files
llm prompt "Combine weather and fib" --tool weather.py --tool math.py

The specified Python file (my_tools.py) should contain one or more functions intended to be used as tools.

Discovery: llm will import the file and make all functions defined directly within that module available as tools, ignoring those starting with _. Docstrings and type hints will be used for schema generation.

4.2. Execution Flow

  1. The llm prompt (or chat) command parses the --tool file.py options.
  2. For each file, it imports the module and inspects it to find candidate functions.
  3. It generates the JSON Schema for each tool function.
  4. It passes the list of functions (not just schemas) to the internal conversation.prompt() call via the tools= argument.
  5. The Python API's internal loop handles execution as described in section 3.2.
  6. The CLI prints the final text response. If tool execution occurred, potentially add a note to stderr or use a --verbose flag to show tool interactions.

5. Plugin Interface (llm.Model)

  1. Capability Flag: Add supports_tool_calling: bool = False to _BaseModel. Plugins set this to True.

  2. Modified execute(): The generator-based approach described in section 3.2 (step 8) seems most robust. execute yields str, Chunk, or ToolCallRequest, and receives ToolResult via .send().

    # In llm/models.py
    from abc import ABC, abstractmethod
    from dataclasses import dataclass
    from typing import Iterator, Union, List, Dict, Any, Optional, AsyncGenerator
    
    @dataclass
    class ToolCallRequest: ... # As defined above
    @dataclass
    class ToolResult: ... # As defined above
    
    # Define types for generator input/output
    ToolOutput = Union[str, Chunk, ToolCallRequest]
    ToolInput = Optional[ToolResult] # Can send None initially, then ToolResult
    
    # Sync Model
    class Model(ABC):
        # ...
        @abstractmethod
        def execute(
            self,
            prompt: Prompt,
            stream: bool,
            response: Response,
            conversation: Optional[Conversation],
        ) -> Iterator[ToolOutput]: # Yields text/Chunk/ToolCallRequest
            # Implementation needs to handle receiving ToolResult via yield value
            tool_result: ToolInput = yield "Initial text or first tool call request"
            while isinstance(tool_result, ToolResult):
                # Process tool_result, call API again
                # ...
                # Yield more text or another ToolCallRequest
                next_output: ToolOutput = "More text or another tool call"
                tool_result = yield next_output # Yield and wait for next .send()
            # Final processing if needed
            pass
    
    # Async Model
    class AsyncModel(ABC):
        # ...
        @abstractmethod
        async def execute(
            self,
            prompt: Prompt,
            stream: bool,
            response: AsyncResponse,
            conversation: Optional[AsyncConversation],
        ) -> AsyncGenerator[ToolOutput, ToolInput]: # Async yield/send
            # Similar logic using 'yield' and '(yield ...)' for send
            tool_result: ToolInput = yield "Initial text or first tool call request"
            while isinstance(tool_result, ToolResult):
                # Process tool_result, call API again (await)
                # ...
                # Yield more text or another ToolCallRequest
                next_output: ToolOutput = "More text or another tool call"
                tool_result = yield next_output # Yield and wait for next .asend()
            # Final processing if needed
            # Need 'yield' at least once in an async generator
            if False: yield # pragma: no cover

    • The Prompt object passed to execute will contain prompt.tool_schemas: List[dict].
    • Plugins implement the logic to:
      • Format prompt.tool_schemas for their specific API.
      • Send the prompt and schemas.
      • Parse the API response.
      • Yield text/Chunk for content, or ToolCallRequest for tool calls (parsing name, args, id).
      • Handle the ToolResult sent back via yield value = yield ....
      • Format the ToolResult for their API and send it back.
      • Repeat until the API provides a final text response.

6. Database Schema Changes

  1. Add tool_calls table:
    CREATE TABLE [tool_calls] (
       [id] INTEGER PRIMARY KEY, -- Auto-incrementing local ID for the call record
       [response_id] TEXT REFERENCES [responses]([id]),
       [request_id] TEXT, -- ID from the API if available
       [tool_name] TEXT,
       [arguments] TEXT, -- JSON
       [result] TEXT, -- JSON
       [error] TEXT,
       [timestamp_utc] TEXT
    );
    CREATE INDEX [idx_tool_calls_response_id] ON [tool_calls] ([response_id]);
  2. Migration: A new migration (m016_tool_calls.py?) will be added to llm/migrations.py to create this table.
  3. Response.log_to_db(): Update this method to iterate through response.tool_calls (or response._tool_calls_internal) and insert records into the new tool_calls table, linking them via response_id.

7. Error Handling Summary

  • Model does not support tools: conversation.prompt() raises TypeError if tools= used with incompatible model.
  • Tool Not Found (CLI): CLI harness fails during function lookup, raises click.ClickException.
  • Tool Not Found (Python API): Harness loop creates ToolResult(error=...) and sends to model via generator.
  • Argument Validation Failure: Pydantic validation in harness loop creates ToolResult(error=...) and sends to model.
  • Tool Execution Error: Harness loop catches exception, creates ToolResult(error=...) and sends to model.

8. Documentation Updates

  • User Docs:
    • New section explaining tool calling concept.
    • Python API: How to define tools, use tools= parameter, interpret response.tool_calls.
    • CLI: How to use --tool, structure of tool files.
  • Plugin Author Docs:
    • Explain supports_tool_calling flag.
    • Detail the modified execute() signature and the generator .send() mechanism.
    • Define the ToolCallRequest and ToolResult dataclasses.
    • Provide examples for handling the tool loop within execute().

This design provides a robust framework for tool calling, balancing user convenience with the flexibility needed for diverse model APIs, and integrating naturally into the existing llm structures.
