Large Language Models are good at thinking. They’re also good at acting — if you give them the right tools and tell them when to use them.
The problem: most tool-using LLM pipelines today rely on hand-crafted prompts and static scripts that hardcode when to call a calculator, a search engine, or a code executor. They’re brittle. They don’t adapt.
ART (Automatic Reasoning and Tool-use) is a different approach:
- Give the LLM a library of example tasks that show reasoning + tool calls.
- Freeze the model — no retraining — and let it plan solutions step-by-step.
- Whenever a tool call appears, pause the LLM, run the tool, and feed the output back into the conversation before continuing.
- Optionally, let humans fix mistakes or add tools without retraining the model.
It’s program synthesis meets orchestration.
🚦 The ART Loop
- Select Examples – Pull relevant reasoning + tool-use demonstrations from the task library (a minimal selection sketch follows this list).
- Run Program – Generate reasoning steps; pause to call tools as needed.
- Fix Mistakes (Optional) – Allow human edits or new tools to be added dynamically.
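Step 1 is plain retrieval, which the mini demo below skips. Here is a minimal sketch, assuming the same `task_library` format used in the demo and a naive keyword-overlap score as a stand-in for the similarity-based selection the paper describes; `select_examples` and `format_demos` are illustrative names, not library functions.

```python
def select_examples(task: str, library: list, k: int = 2) -> list:
    """Rank library entries by naive keyword overlap with the new task
    (a stand-in for proper similarity-based retrieval)."""
    task_words = set(task.lower().split())

    def score(entry):
        entry_words = set((entry["task"] + " " + " ".join(entry["steps"])).lower().split())
        return len(task_words & entry_words)

    return sorted(library, key=score, reverse=True)[:k]

def format_demos(demos: list) -> str:
    """Turn selected demonstrations into prompt text the frozen model can imitate."""
    return "\n\n".join(d["task"] + "\n" + "\n".join(d["steps"]) for d in demos)
```

The formatted demonstrations would be appended to the system prompt before the model is called, so the frozen model sees worked examples of the `[reason]` / `[tool:...]` step format it is expected to produce.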
Code Demo: Mini-ART with Python
Below is a minimal simulation of ART in Python. We’ll give the LLM a math problem, let it decide to use a calculator, and resume reasoning after getting the tool’s output. (The demo uses the legacy, pre-1.0 OpenAI Python SDK.)
```python
import openai

openai.api_key = "your-api-key"

# Define tool functions
def calculator(expression: str) -> str:
    # eval() keeps the demo short; don't use it on untrusted input in real code
    try:
        return str(eval(expression))
    except Exception as e:
        return f"Error: {e}"

# Example task library (not wired into the prompt in this mini demo; see the selection sketch above)
task_library = [
    {
        "task": "Math: Calculate sum",
        "steps": [
            "Q1: [reason] Identify the numbers to add.",
            "Q2: [tool:calculator] 2 + 2",
            "Q3: [reason] State the result."
        ]
    }
]

# Input problem
new_task = "What is 17 * 24 plus 10?"

# Prompt the model
system_prompt = """You are an assistant that solves problems step-by-step.
If a step requires calculation, output: [tool:calculator] <expression>
Resume reasoning after receiving the tool's output."""

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": f"Task: {new_task}"}
    ],
    max_tokens=150
)

steps = response["choices"][0]["message"]["content"].split("\n")

# Simulate the ART execution loop: emit reasoning steps, pausing to run tools when requested
for step in steps:
    if "[tool:calculator]" in step:
        expression = step.split("[tool:calculator]", 1)[1].strip()
        tool_result = calculator(expression)
        print(f"🔧 Tool Output: {tool_result}")
    else:
        print(f"🤖 LLM: {step}")
```
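The loop above only prints the tool output; in ART the output is spliced back into the conversation so the frozen model can resume reasoning from it. Here is a minimal sketch of that resume step, reusing the demo's `openai` import and legacy-SDK call; the message wording and the `resume_after_tool` helper are assumptions for illustration, not a fixed ART format.

```python
def resume_after_tool(messages: list, partial_reasoning: str, tool_result: str) -> str:
    """Feed the tool output back to the model and ask it to continue where it paused."""
    continued = messages + [
        {"role": "assistant", "content": partial_reasoning},
        {"role": "user", "content": f"Tool output: {tool_result}\nContinue reasoning from this result."},
    ]
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=continued,
        max_tokens=150,
    )
    return response["choices"][0]["message"]["content"]
```

In the execution loop, you would call this right after `calculator(expression)`, passing the original messages and the steps generated so far, then scan the continuation for further tool calls in the same way.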
What This Code Demonstrates
- Dynamic Tool Use: The model decides when to call a tool.
- Interleaved Execution: LLM pauses for tool results before continuing.
- Extensibility: Adding a new tool is just a matter of defining a function and adding a demonstration of it to the task library (see the sketch below).
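As an example of that extensibility, here is a minimal sketch of a tool registry under the demo's `[tool:<name>]` convention. The `search` entry is a stub rather than a real search API, and `TOOLS`, `TOOL_PATTERN`, and `run_step` are illustrative names.

```python
import re

# Tool registry: adding a tool = one new entry here plus a demonstration in the task library
TOOLS = {
    "calculator": calculator,  # defined in the demo above
    "search": lambda query: f"(stub) top result for: {query}",  # placeholder, not a real search API
}

TOOL_PATTERN = re.compile(r"\[tool:(\w+)\]\s*(.*)")

def run_step(step: str) -> str:
    """Dispatch one generated step: call the named tool if the step requests one,
    otherwise pass the reasoning through unchanged."""
    match = TOOL_PATTERN.search(step)
    if match and match.group(1) in TOOLS:
        name, argument = match.groups()
        return f"🔧 {name}: {TOOLS[name](argument)}"
    return f"🤖 LLM: {step}"
```

The demo's loop body then collapses to `print(run_step(step))`, and new tools never require touching the model itself.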
Why This Matters
On the BigBench and MMLU tasks used in the ART paper, it can outperform:
- Standard few-shot prompting
- Automatic Chain-of-Thought (CoT)
- Even hand-crafted CoT in some cases — especially when paired with human feedback.
For complex, multi-step reasoning tasks (math, code generation, multi-hop search), this method moves us closer to agents that adapt rather than just follow a script.