Handling failures

Tools fail, networks blip, APIs change. The patterns that keep agents useful through it all.

Real agents fail in real ways. The good news: failures inside an agent run are recoverable by the Mind, not by you. The right prompt patterns turn "the script crashed" into "the agent worked around it."

How failures surface

When a tool fails, it returns { "error": "Jettson <Tool> failed: <reason>" }. The Mind reads this on the next iteration like any other tool result — it doesn't crash the agent. Your prompt decides what happens next.

The run as a whole ends in error only if:

The Mind hits the 20-iteration cap
The agent exceeds the plan's max duration
The reasoning proxy is unavailable for the entire run (very rare)

Single-tool failures? The agent keeps going. Your job is to tell it how.

Pattern 1 — Ask the Mind to handle errors gracefully

In the task prompt, give the agent permission to recover:

text

If jettson_browser_navigate fails:
- Wait, then try once more
- If it fails again, try jettson_http_request to the same URL
- If that also fails, return { "error": "could not reach <url>", "fallback": <whatever you have so far> }

Don't error out the whole task — return partial progress.

This converts "the task failed" into "the task returned a partial answer with an error field." Way easier to handle on the caller side.
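On the caller side, that contract reduces to one branch. Here is a minimal sketch in Python. It assumes you already have the agent's final output as a JSON string (how you fetch it is not shown here) and that the agent followed the prompt above. `handle_result` is a hypothetical helper name.

```python
import json
from typing import Any


def handle_result(raw: str) -> dict[str, Any]:
    """Split an agent's final JSON answer into usable data and an optional error.

    Assumes the Pattern 1 prompt: on failure the agent returns
    {"error": "...", "fallback": <partial progress>} instead of failing the run.
    """
    result = json.loads(raw)
    if isinstance(result, dict) and "error" in result:
        # Partial answer: keep whatever the agent gathered before giving up.
        return {"ok": False, "error": result["error"], "data": result.get("fallback")}
    return {"ok": True, "error": None, "data": result}
```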

Pattern 2 — Retry at the prompt level

For transient failures (network blips, 503s), one retry is usually enough.

text

If a tool call fails with a temporary error (timeout, 5xx, "temporarily unavailable"),
wait a moment and try again ONCE. Don't retry more than once.
If the second attempt also fails, fall back to the alternative source.

The Mind is good at telling retry-worthy errors (transient infrastructure problems) apart from permanent ones (bad input).
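The prompt asks the Mind to apply this policy itself. If your own code also calls flaky services, you can write the same rule as ordinary Python. In this sketch, `is_transient` and `call_with_one_retry` are hypothetical helpers. The regex just restates the prompt's heuristic (timeouts, 5xx codes, "temporarily unavailable") and is not an official classifier.

```python
import re
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# The prompt's heuristic: timeouts, 5xx status codes, and "temporarily
# unavailable" count as transient; everything else is treated as permanent.
_TRANSIENT = re.compile(r"timeout|timed out|temporarily unavailable|\b5\d\d\b", re.IGNORECASE)


def is_transient(error: str) -> bool:
    return bool(_TRANSIENT.search(error))


def call_with_one_retry(call: Callable[[], T], delay_s: float = 1.0) -> T:
    """Run `call`; on a transient failure, wait and retry exactly once."""
    try:
        return call()
    except Exception as exc:
        if not is_transient(str(exc)):
            raise  # permanent (e.g. bad input): a retry would fail the same way
        time.sleep(delay_s)
        # No further retries: a second failure propagates so the caller can
        # move on to its fallback source.
        return call()
```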

Pattern 3 — Fallback chains

When a task can be done multiple ways, list them in order:

text

To find the company's pricing:
1. jettson_http_request to https://<domain>/pricing.json (if their public API has it)
2. jettson_browser_navigate to https://<domain>/pricing (scrape)
3. jettson_browser_navigate to https://<domain> (look for a pricing link)
4. If still nothing, return { "pricing": "unknown", "tried": ["api", "scrape", "homepage"] }

This shape is what makes the customer-research example robust: no single source is reliable, but three stacked together are.
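The Mind walks this chain on its own, but the control flow is easy to mirror in code if you want to run a similar chain yourself. This sketch assumes each source is wrapped in a hypothetical fetcher that returns data, returns None, or raises. The extra `source` field on success is this sketch's addition and is not part of the prompt.

```python
from typing import Callable, Optional

# Each step is (label, fetcher). A fetcher returns pricing data, or None when
# the source had nothing; an exception also means "try the next source".
Step = tuple[str, Callable[[], Optional[dict]]]


def first_success(steps: list[Step]) -> dict:
    tried: list[str] = []
    for label, fetch in steps:
        tried.append(label)
        try:
            data = fetch()
        except Exception:
            data = None  # treat any failure like an empty result
        if data is not None:
            return {"pricing": data, "source": label}
    # Same shape as step 4 of the prompt: report what was tried.
    return {"pricing": "unknown", "tried": tried}
```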

Pattern 4 — Tell the agent when to give up

The default is for the Mind to keep trying. That's usually right but occasionally wasteful. Bound it:

text

If after 3 distinct attempts you can't get the data, return a partial result.
Do not loop indefinitely.

The 20-iteration cap is the absolute backstop; this prompt-level guidance kicks in earlier and produces cleaner output.

Pattern 5 — Ask the user (when there's a user to ask)

For agents in interactive contexts (chat, copilot UI), the right answer to ambiguity is sometimes "ask back." Bake it in:

text

If the task lacks information you need to make a confident call, return:
{ "needs_input": "<what you need from the human>" }
Don't guess. The user is on the other end of this and they'll answer.

When the caller sees needs_input populated, it renders a follow-up prompt in the UI instead of blindly spawning a new agent.
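On the caller side this is a simple branch. This sketch assumes the agent's final output arrives as a JSON string. `next_step` and `follow_up_task` are hypothetical helpers. Appending the question and answer to the original task is just one way to give the next agent the information it asked for.

```python
import json
from typing import Any


def next_step(raw: str) -> tuple[str, Any]:
    """Return ("ask_user", question) or ("done", output) for an agent's final JSON."""
    out = json.loads(raw)
    if isinstance(out, dict) and out.get("needs_input"):
        # The agent chose not to guess: show its question to the human.
        return ("ask_user", out["needs_input"])
    return ("done", out)


def follow_up_task(original_task: str, question: str, answer: str) -> str:
    # Hypothetical convention: carry the question and the human's answer into
    # the next agent's task so it starts with the missing information.
    return (f"{original_task}\n\n"
            f"Earlier you asked: {question}\n"
            f"The user answered: {answer}")
```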

Handling agent-level errors

When status transitions to error, the agent doc has:

json

{
  "status": "error",
  "errorMessage": "Agent exceeded the maximum duration for its plan."
}

Common causes and reactions:

| errorMessage | Cause | What to do |
| --- | --- | --- |
| "Agent exceeded the maximum duration for its plan." | Hit maxAgentDurationMinutes | Trim the task; upgrade plan for longer runs; split into multiple agents |
| "Agent reached the iteration limit without completing." | 20-iter cap hit | Prompt is too open-ended; constrain the output shape |
| "Jettson Mind is temporarily unavailable. Please retry." | Reasoning proxy transient failure | Retry once with backoff |
| "Reasoning step exceeded the per-call token budget." | A tool returned too much data for one step | Add selector to extracts; use shell to filter before reading |
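You can encode the table as a lookup so that automation retries only the case that is actually transient. This minimal sketch assumes the agent doc is available as a dict with the `status` and `errorMessage` fields shown above, and that the messages match the table exactly. The action labels are invented for illustration.

```python
# Messages copied from the table above; the action labels are this sketch's own.
AGENT_ERROR_ACTIONS = {
    "Agent exceeded the maximum duration for its plan.": "split_or_trim_task",
    "Agent reached the iteration limit without completing.": "constrain_prompt",
    "Jettson Mind is temporarily unavailable. Please retry.": "retry_once_with_backoff",
    "Reasoning step exceeded the per-call token budget.": "filter_tool_output",
}


def action_for(agent_doc: dict) -> str:
    if agent_doc.get("status") != "error":
        return "none"
    # Unrecognized messages go to a human instead of being retried blindly.
    return AGENT_ERROR_ACTIONS.get(agent_doc.get("errorMessage", ""), "investigate")


def should_auto_retry(agent_doc: dict) -> bool:
    # Only the transient proxy failure is worth an automatic retry; the others
    # need a change to the task, the prompt, or the plan first.
    return action_for(agent_doc) == "retry_once_with_backoff"
```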

Handling spawn-time errors

These are returned by the POST /api/v1/agents call itself, so no agent is ever spawned:

429 rate_limited — honor Retry-After
429 concurrent_limit_reached — wait or stop another agent
402 monthly_quota_exceeded — upgrade or wait for the 1st of the month
400 invalid_task — your task body is bad, fix it
503 temporarily_unavailable — retry with backoff
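These rules fit in a small decision function. This sketch assumes you have already pulled the HTTP status, the error code string, and the raw Retry-After header out of the response. Where the code lives in the response body is an assumption; see Errors. The function only understands Retry-After given in seconds, although HTTP also allows a date form. `MAX_ATTEMPTS` and the 60-second ceiling are arbitrary choices.

```python
from typing import Optional

MAX_ATTEMPTS = 5        # arbitrary retry cap for this sketch
MAX_BACKOFF_S = 60.0    # arbitrary ceiling for exponential backoff


def spawn_retry_delay(status: int, code: str, retry_after: Optional[str],
                      attempt: int) -> Optional[float]:
    """Seconds to wait before retrying a failed spawn, or None to stop.

    `attempt` counts retries already made (0 for the first failure).
    """
    if attempt >= MAX_ATTEMPTS:
        return None
    backoff = min(2.0 ** attempt, MAX_BACKOFF_S)
    if status == 429 and code == "rate_limited":
        # Honor Retry-After when it is given as delta-seconds; otherwise back off.
        if retry_after is not None and retry_after.strip().isdigit():
            return float(retry_after.strip())
        return backoff
    if status == 503 and code == "temporarily_unavailable":
        return backoff
    # concurrent_limit_reached, monthly_quota_exceeded, invalid_task, and
    # unknown codes need something other than a timed retry: free a slot,
    # change plan, or fix the request body.
    return None
```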

See Errors for the full catalog and retry guidance.

A note on idempotency

Spawn calls are not idempotent — two POSTs make two agents. If your code might retry due to a network hiccup on the way to Jettson (e.g. a 504 from a load balancer in front of your own service), dedupe on your side before calling.

Good shape:

text

1. Compute a deterministic ID for this task (hash the task text + user)
2. Check your DB: has this ID been spawned?
3. If yes, return the existing agent_id
4. If no, POST to Jettson, store the (id, agent_id) tuple, return

Related

Errors — full catalog with retry advice
Rate limits — back-off recipes
Tool composition patterns — fallback chains in context
