@clutterstack
Last active November 21, 2025 05:28
A minimal tool-calling Python script for a local Ollama-hosted LLM
#!/usr/bin/env python3
import subprocess

import ollama, readline

#MODEL = "phi4-mini"
MODEL = "qwen3:0.6b"
#MODEL = "qwen3:1.7b"
#MODEL = "qwen3:4b"
# MODEL = "qwen3:8b"

SYSTEM_PROMPT = """
"""

## Appending system prompt message to context.
messages = [{"role": "system", "content": SYSTEM_PROMPT}]

# The Ollama Python SDK can get this info from a function with a structured
# docstring, but to mirror the OpenAI example, let's stick with this dict schema thing
tools = [{
    "type": "function",
    "function": {
        "name": "ping",
        "description": "Ping a host on the internet.",
        "parameters": {
            "type": "object",
            "properties": {
                "host": {"type": "string"}
            },
            "required": ["host"]
        }
    }
}]

# The actual function that gets called
def run_ping(args):
    host = str(args.get("host", "")).strip()
    if not host:
        return "error: missing host"
    cmd = ["ping", "-c", "3", host]  # list form for subprocess.run()
    cmd_str = " ".join(cmd)
    try:
        r = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
            timeout=15,
            check=False,
        )
        # Format the command and output to feed to the model
        return "\n".join([cmd_str, r.stdout])
    except Exception as e:
        return f"{cmd_str}\nerror: {e}"

# Match up the tool name(s) with the function(s) to invoke.
# This looks silly with only one function, but with many it would
# avoid a long if-elif chain
tool_functions = {
    "ping": run_ping
}

# Crank the handle one turn
def process(user_text):
    # Appending user message to `messages`.
    messages.append({"role": "user", "content": user_text})
    # (Ollama docs example: https://docs.ollama.com/capabilities/tool-calling#multi-turn-tool-calling-agent-loop)
    while True:
        # Send all the context (system prompt, accumulated messages, tool schemas)
        # to the model via Ollama and get the model's completion output back
        print("\n(Sending context to Ollama API)")
        resp = ollama.chat(model=MODEL, messages=messages, tools=tools, think=False)
        # We're interested in the `message` field. That's where
        # the role and any thinking, content, and tool calls will be
        print("From the model's response:\n")
        print(">> Role: ", resp.message.role)
        print(">> Thinking: ", resp.message.thinking)
        print(">> Tool calls: ", resp.message.tool_calls)
        print(">> Content: ", resp.message.content)
        messages.append(resp.message)
        if resp.message.tool_calls:
            # A mini-loop to run all the requested functions and add their results to the context
            for tool_call in resp.message.tool_calls:
                fn_name = tool_call.function.name
                fn_args = tool_call.function.arguments
                if fn_name in tool_functions:
                    fn_result = tool_functions[fn_name](fn_args)
                    print("\nLocal function output: ")
                    print(fn_result)
                    # Append a tool message, with the tool function result in the content field.
                    messages.append({
                        "role": "tool",
                        "tool_name": fn_name,
                        "content": fn_result
                    })
        else:
            break

def main():
    while True:
        # Each iteration of this loop takes one input prompt from the user
        # but may also include a turn of the crank (API call and context additions)
        # for tool calls
        try:
            line = input("> ").strip()
            process(line)
            print()
        except KeyboardInterrupt:
            print()
            break

if __name__ == "__main__":
    main()
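The tool_functions dict above scales past one tool without growing an if-elif chain. Here's a minimal sketch of that pattern with two invented tools (run_echo and run_upper are hypothetical, purely for illustration; only ping exists in the script) and a small dispatch helper:

```python
# Hypothetical extra tools, just to show the dispatch pattern.
def run_echo(args):
    # Echo the text argument back, prefixed.
    return "echo: " + str(args.get("text", ""))

def run_upper(args):
    # Uppercase the text argument.
    return str(args.get("text", "")).upper()

# Map tool names (as the model will request them) to local functions.
tool_functions = {"echo": run_echo, "upper": run_upper}

def dispatch(name, args):
    # One dict lookup replaces a growing if-elif chain; unknown names
    # come back as an error string the model can read.
    fn = tool_functions.get(name)
    if fn is None:
        return f"error: unknown tool {name!r}"
    return fn(args)
```

For example, dispatch("echo", {"text": "hi"}) returns "echo: hi", and a name the model hallucinates falls through to the error string instead of crashing the loop.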

Note that we added think=False this time (compare with the chat-only version).
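As the comment in the script notes, the Ollama Python SDK can also derive the tool schema from a typed function with a structured docstring, so the function itself can be passed in tools instead of the dict. A hedged sketch of that shape (the exact docstring conventions the SDK parses are an assumption here; check the SDK docs):

```python
def ping(host: str) -> str:
    """Ping a host on the internet.

    Args:
        host: Hostname or IP address to ping.
    """
    # Body omitted in this sketch; run_ping above is the real implementation.
    ...

# The chat call would then take the function directly, e.g.:
# resp = ollama.chat(model=MODEL, messages=messages, tools=[ping], think=False)
```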

A simple chat:

❯ python tool_call_min.py
> What's a lagomorph

(Sending context to Ollama API)
From the model's response:

>> Role:  assistant
>> Thinking:  None
>> Tool calls:  None
>> Content:  A lagomorph is a type of mammal, specifically a group of primates, that are known for their long, narrow noses and the ability to chew their teeth.

> 

Get it to make a tool call:

❯ python tool_call_min.py
> can you ping fly.io?

(Sending context to Ollama API)
From the model's response:

>> Role:  assistant
>> Thinking:  None
>> Tool calls:  [ToolCall(function=Function(name='ping', arguments={'host': 'fly.io'}))]
>> Content:  

Local function output: 

ping -c 3 fly.io
PING fly.io (37.16.18.81): 56 data bytes
64 bytes from 37.16.18.81: icmp_seq=0 ttl=55 time=17.132 ms
64 bytes from 37.16.18.81: icmp_seq=1 ttl=55 time=19.061 ms
64 bytes from 37.16.18.81: icmp_seq=2 ttl=55 time=18.845 ms

--- fly.io ping statistics ---
3 packets transmitted, 3 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 17.132/18.346/19.061/0.863 ms


(Sending context to Ollama API)
From the model's response:

>> Role:  assistant
>> Thinking:  None
>> Tool calls:  None
>> Content:  Yes, I can ping Fly.io. The ping response confirms that Fly.io is reachable and responds with a round-trip time of approximately 18.346 milliseconds.

> 

We started with one API call, sending the context up to and including the user's prompt to the model.

The model responded with a tool_call instead of content, so we

  • added that response message to the context
  • ran the tool
  • added a message with the tool role to the context (to let the model know the result of running the tool function), and
  • fired the whole thing back to the API.

The model responded to that with just content, so we could display that and prompt the user for another round of input.
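The turn above can be replayed as plain data: each step appends one message to the context, and the roles line up system → user → assistant → tool → assistant. A minimal sketch (the content strings are placeholders, not real model output):

```python
# Start with the system prompt, as the script does.
messages = [{"role": "system", "content": ""}]

# 1. The user's prompt goes into the context; the first API call happens here.
messages.append({"role": "user", "content": "can you ping fly.io?"})

# 2. The model answers with a tool call instead of content.
messages.append({
    "role": "assistant",
    "content": "",
    "tool_calls": [{"function": {"name": "ping", "arguments": {"host": "fly.io"}}}],
})

# 3. We run the tool locally and report the result back with the tool role.
messages.append({"role": "tool", "tool_name": "ping", "content": "(ping output placeholder)"})

# 4. The second API call returns plain content, so the loop exits.
messages.append({"role": "assistant", "content": "Yes, fly.io is reachable."})

print([m["role"] for m in messages])
```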
