watsonx Orchestrate ADK: How to Get Started Building Real Agents and Workflows
Earlier this year I had the chance to attend an IBM watsonx Orchestrate workshop. After the workshop I kept going – setting up Orchestrate's Agent Development Kit (ADK) locally, building a proper multi-agent workflow, and connecting it to real tools. This post is for anyone wanting to get started with watsonx Orchestrate and its ADK, or just curious about what building with an agentic platform looks like in practice.
The ADK is the pro-code layer of watsonx Orchestrate: a CLI, a local server runtime, agents defined in YAML, tools written in Python. You can run a full Orchestrate server locally and connect it to either cloud LLMs or local models via Ollama.
Under the hood the ADK is essentially two things: the orchestrate CLI and the watsonx Orchestrate Developer Edition as a locally running server instance. With the CLI you manage environments, which can be a SaaS instance on IBM Cloud or AWS, an on-prem installation, or your local Developer Edition – and you use the same commands to import agents and tools, start chats, and point the runtime at different model providers. The Developer Edition itself runs containerised via Docker/Compose or a similar runtime and gives you a complete Orchestrate environment for development and testing on your own machine, while the actual LLMs are reached via model providers: IBM-hosted models on watsonx.ai, third-party providers, or local Ollama models that you register with the AI gateway.
The setup is straightforward:
```shell
pip install ibm-watsonx-orchestrate
orchestrate server start -e server.env -f docker-compose.yml
```
Two things worth knowing before you start: the server is resource-heavy. Recommended specs are 8 cores and 32 GB RAM, and in my experience the server alone – without any LLM – uses up to 15 GB. On a well-specced machine that's fine, on a standard developer laptop that can be tight. And the ADK requires a valid watsonx Orchestrate subscription and API key. There is no fully offline free tier.
Agents are created and managed via YAML files. Tools are Python functions decorated with @tool. Both can live in version control, both deploy with CLI commands. That fits naturally into an engineering workflow – code review, CI/CD, the usual.
A typical agent definition:
```yaml
spec_version: v1
kind: native
style: react
name: some_agent
llm: groq/openai/gpt-oss-120b
instructions: |-
  You are responsible for X.
  When given Y, do Z and return the result.
collaborators:
  - other_agent
tools:
  - some_tool
```
One field in this example that is easy to miss is style. watsonx Orchestrate supports three styles for native agents: default, react and planner. The default style uses the model’s built-in reasoning in a fairly lightweight, tool-centric loop and works well for simpler, loosely ordered tasks. The react style implements an explicit “think → act → observe” pattern and is intended for more complex or ambiguous problems where each step depends on the previous outcome. The planner style goes one step further and lets the agent generate a plan first and then execute it step by step. This is useful when you want a transparent, multi-step flow that you can inspect and reason about. In the examples here I used react, because it maps naturally to multi-step workflows that depend heavily on tools and collaborators.
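For comparison, switching the same agent to the planner style is a one-line change in the YAML. This fragment is a hypothetical sketch; the instructions text is illustrative, not taken from a real workflow:

```yaml
# Hypothetical sketch: the same agent with the planner style,
# which drafts a plan first and then executes it step by step.
spec_version: v1
kind: native
style: planner
name: some_agent
llm: groq/openai/gpt-oss-120b
instructions: |-
  First outline the steps needed for X, then execute them in order
  and report the result of each step.
```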
Recently, I built a meeting workflow where this pattern becomes very concrete. A central meetingmanager agent orchestrates five specialist agents – for transcript cleaning, action item extraction, minutes writing, task creation and more. The orchestration is expressed entirely as a process description in the instructions, for example (shortened):
```yaml
name: meetingmanager
style: react
llm: groq/openai/gpt-oss-120b
instructions: |-
  You orchestrate end-to-end meeting processing with five specialist agents.

  CRITICAL Input handling
  - Transcript is expected in the first user message.
  - Do not ask for it twice.
  - If missing, ask once for transcript + project key.

  Orchestration flow
  - Step 1: transcriptcleaner – clean raw transcript (VTT/plain text).
  - Step 2: transcriptanalyst – extract action items, decisions, questions.
  - Step 3: minuteswriter – generate customer-facing minutes (Markdown).
  - Step 4: taskwriter – create one task per action item using project key.
  - Step 5: wikiwriter – create wiki page when wiki params are provided.

collaborators:
  - transcriptcleaner
  - transcriptanalyst
  - minuteswriter
  - taskwriter
  - wikiwriter
tools:
  - create_project
```
The “CRITICAL” section is not decoration, it was necessary to stop the orchestrator from getting stuck in loops and asking for the transcript again after it had already been provided. The solution is to be very explicit about what the input expectations are and what the flow should be.
Instructions are the free-form “operating manual” of an agent. In addition to that, Orchestrate gives you a more structured way to encode behaviour: guidelines. Guidelines are small, conditional rules that follow a “when ... then ...” pattern and can optionally mandate a specific tool call.
A simplified example:
```yaml
guidelines:
  - condition: "User asks for a status update on a task"
    action: "Summarise the current status briefly before asking a follow-up question if needed."
    tool: "get_task_status"
```
The idea is to use guidelines for behaviour that must be consistent and repeatable, for example, always calling a particular tool when a certain condition is met, or always avoiding certain content in specific contexts – and keep the more open-ended reasoning and tone in the instructions. In practice, this can be a cleaner alternative to stuffing every “if X then Y” rule into a giant block of instructions as a prompt, and it makes it easier to review and tweak agent behaviour over time.
The key design choice in Orchestrate is that orchestration and agent behaviour are expressed in natural-language instructions, not as a graph or rules engine. An orchestrator agent gets instructions describing the process – which sub-agents to call, in what order, what to pass between them – and the LLM resolves that at runtime. For everything predictable and clearly sequenced, this works well. Where it requires attention is making sure the instructions are unambiguous enough that the model doesn't reorder steps or skip things you didn't expect it to skip.
A practical pattern that holds up well: keep agents narrow. One agent per cognitive task, one toolset per agent. A coordinator agent on top that routes between them. This makes individual agents easier to test and their behavior easier to reason about.
Tools are where integrations live. A tool is a regular Python function decorated with @tool. The framework generates the JSON schema and exposes it to agents automatically:
```python
from ibm_watsonx_orchestrate.agent_builder.tools import tool

@tool(
    name="create_task",
    description="Creates a task in the project tracker.",
)
def create_task(project_key: str, title: str, description: str, assignee: str = "") -> str:
    # your integration logic here
    ...
```
The agent just sees a named tool with a description; the API calls, authentication, error handling, and data mapping happen in Python. This is useful when prebuilt connectors don't cover the specifics of your environment. You write a function, import it, and it's available to any agent you assign it to.
The ADK itself supports multiple tool types. Python tools like the one above give you full control over logic and error handling in code. OpenAPI tools let you turn an OpenAPI 3.0 specification for a JSON-based HTTP API into one or more tools without writing integration code at all. MCP toolkits let you expose external MCP servers, which can themselves front whole systems, as tools to your agents. And Langflow tools allow you to import visual LLM flows built in Langflow and treat them as tools within Orchestrate, which is handy when you already have complex RAG or agent chains modeled there.
As a concrete example: in one workflow I built, I needed to look up OpenProject users by display name and map natural-language priority strings to internal IDs before creating tasks. Both are a few lines of Python.
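The priority-mapping half of that can be sketched as a standalone function. This is a hypothetical version – the function name, synonym table and ID values are illustrative, not OpenProject's actual defaults:

```python
# Hypothetical sketch: normalise natural-language priority strings to
# internal IDs before creating tasks. The ID values are illustrative,
# not OpenProject's real ones.
PRIORITY_IDS = {
    "low": 7,
    "normal": 8,
    "high": 9,
    "immediate": 10,
}

def map_priority(text: str, default: str = "normal") -> int:
    """Map a free-form priority string to an internal priority ID."""
    key = text.strip().lower()
    # Accept common synonyms the model tends to produce.
    synonyms = {"urgent": "immediate", "medium": "normal", "critical": "immediate"}
    key = synonyms.get(key, key)
    return PRIORITY_IDS.get(key, PRIORITY_IDS[default])
```

Keeping this kind of normalisation in the tool, rather than hoping the model emits exact IDs, makes the behaviour deterministic and testable.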
The day-to-day cycle looks like this:
```shell
orchestrate tools import -k python -f tools/my_tool.py
orchestrate agents import -f agents/my_agent.yaml
orchestrate chat start
```
The chat UI also lets you manage agents, tools, and connections directly – you don't have to drop back to the CLI every time.
With a multi-agent setup across several files, the import step adds up. A wrapper script that imports everything in the right order with configurable LLM targets reduces that to a single command and is worth writing early.
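One way to sketch such a wrapper is to generate the import commands in dependency order – tools before the agents that reference them. This is a minimal dry-run version under assumed conventions (a `tools/` and an `agents/` directory, one orchestrate command per file); the directory layout is illustrative:

```python
# Hypothetical wrapper sketch: build the "orchestrate ... import" commands
# in dependency order. Tools must exist before the agents that use them.
from pathlib import Path

def build_import_commands(root: str) -> list[str]:
    root_path = Path(root)
    commands = []
    # Python tools first, so agent imports can resolve their tool names.
    for tool_file in sorted(root_path.glob("tools/*.py")):
        commands.append(f"orchestrate tools import -k python -f {tool_file}")
    # Agents next; a naming convention that sorts the orchestrator last
    # keeps collaborators importable before the agent that references them.
    for agent_file in sorted(root_path.glob("agents/*.yaml")):
        commands.append(f"orchestrate agents import -f {agent_file}")
    return commands
```

Printing the list before executing it (via `subprocess`, for example) gives you a cheap dry-run mode for free.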
For simple flows this “edit → import → chat” loop is enough. As soon as agents call multiple tools or collaborate, observability becomes important. watsonx Orchestrate can send traces to Langfuse, an open-source observability stack for LLM applications: each agent run becomes a trace with spans for prompts, tool calls and intermediate results, plus timing and error information. In practice this turns “the model did something weird” into “step 3 of this run called the wrong tool with these parameters”, which is exactly what you want when debugging multi-agent setups or explaining behaviour to other teams.
What worked well for me was a small set of “canonical” meeting transcripts and project keys that I regularly run through the workflow, plus Langfuse traces to verify tool selection, step order and output formats. It is not a full-blown automated test suite yet, but it is close to a regression test set that can later be driven via the Orchestrate APIs instead of the chat UI.
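The output-format half of those checks can be automated with a few lines of Python. This is a hypothetical sketch – the expected sections match my own workflow's conventions for the minuteswriter output, not an Orchestrate requirement:

```python
# Hypothetical regression check: the minuteswriter is expected to return
# Markdown with a top-level title and an action-items section. The section
# names are this workflow's own conventions, nothing platform-mandated.
def check_minutes_format(markdown: str) -> list[str]:
    """Return a list of problems; an empty list means the format looks right."""
    problems = []
    lines = [line.strip() for line in markdown.splitlines() if line.strip()]
    if not lines or not lines[0].startswith("# "):
        problems.append("missing top-level title")
    if not any(line.lower().startswith("## action items") for line in lines):
        problems.append("missing '## Action Items' section")
    return problems
```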
One other thing to account for: large unstructured inputs can strain the context window. In my case, raw VTT meeting files caused agents to loop. Preprocessing the input down to clean text resolved it, but that may be something to watch for in other scenarios. The general point is that large or noisy inputs often need a cleaning step before they reach an agent – but that's true for agentic systems generally, not specific to this platform.
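The preprocessing itself is small. A minimal sketch of stripping WebVTT structure – header, cue numbers, timing lines – down to the spoken text (the exact rules depend on how your transcripts are produced):

```python
# Minimal sketch: reduce a WebVTT transcript to plain spoken text so
# agents don't burn context on headers, cue numbers and timestamps.
import re

# Matches timing lines like "00:00:01.000 --> 00:00:04.000".
TIMESTAMP = re.compile(r"^\d{2}:\d{2}(:\d{2})?\.\d{3}\s+-->\s+")

def vtt_to_text(vtt: str) -> str:
    out = []
    for line in vtt.splitlines():
        line = line.strip()
        if not line or line == "WEBVTT" or line.isdigit():
            continue  # drop header, blank lines and cue numbers
        if TIMESTAMP.match(line):
            continue  # drop timing lines
        out.append(line)
    return "\n".join(out)
```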
Programmatic triggering. The ADK workflow runs through the Orchestrate chat UI. Triggering an agent flow from an external system, a webhook, or CI/CD requires additional work that you assemble yourself. This is worth knowing if programmatic invocation is a hard requirement.
Prompt engineering takes iteration. The flexibility of natural-language orchestration is also where things can drift if instructions aren't tight. Getting consistent step ordering and reliable tool selection across varied inputs takes real testing cycles. This isn't a surprise if you've worked with agentic systems before, but it's not a one-afternoon task either. Guidelines can help here when you discover patterns that need to be enforced reliably, for example “when the user asks about ticket status, always call the status tool first” – instead of trying to encode every such pattern in prose instructions.
Local LLM output consistency. If you're running local models via Ollama, expect more variability in output format than with a cloud LLM in JSON mode. Parsers and cleanup nodes are often necessary. Switching to a cloud model removes most of that friction.
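A typical cleanup step looks like this – a sketch that pulls the first JSON object out of a noisy response (leading prose, code fences, trailing chatter), using the standard library only:

```python
# Sketch of a common mitigation: extract the first JSON object from noisy
# local-model output before handing it to downstream systems.
import json

def extract_json(raw: str) -> dict:
    """Parse the first {...} block found in a model response."""
    start = raw.find("{")
    if start == -1:
        raise ValueError("no JSON object found in model output")
    decoder = json.JSONDecoder()
    # raw_decode parses one JSON value and ignores any trailing text.
    obj, _ = decoder.raw_decode(raw[start:])
    return obj
```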
Local vs. cloud LLMs. The Developer Edition makes it tempting to run everything locally, especially with Ollama as a quick way to spin up models on your own hardware. For experiments, internal tooling or non-critical workflows this is great: low latency, no external API traffic, and full control over where data goes. As soon as you feed larger structured outputs into downstream systems or expose flows to customers, SLAs and stability start to matter. In those cases, hosted models with a proper JSON mode and enforcement of output formats tend to be the safer choice, even if that means giving up some sovereignty and paying per token.
watsonx Orchestrate offers a 30-day free trial, and the local Developer Edition can be set up in a few steps:
```shell
pip install ibm-watsonx-orchestrate
orchestrate server start -e server.env -f docker-compose.yml
orchestrate chat start   # the chat UI opens in your browser
```
The ADK documentation covers the setup in more detail. If you want some inspiration for what to build or how to get started, you can access the full code for the meeting workflow I built or get in touch with us directly. We're here to help you boost your AI automation journey.
Are you interested in our courses, or do you simply have a question that needs answering? You can contact us at any time! We will do our best to answer all your questions.
Contact us