Scenarios define test cases for workflows. They provide simulated events (mocking LLM and tool interactions) and expectations (assertions about what should happen).

Scenario

Scenario defines a complete test case for a workflow. A scenario provides simulated events that mock external interactions (LLM calls, tool executions) and expectations that verify the workflow behaves correctly.
Example - Basic LLM response test:
name: simple_response
description: Tests basic LLM response handling
events:
  - node: call_llm
    type: llm_response
    text: "Hello!"
expect:
  outcome: completed
  reached: [call_llm]
Example - Agent loop with multiple iterations:
name: agent_two_iterations
description: Agent uses a tool then completes
events:
  - node: agent_loop.call_llm
    type: llm_response
    tool_calls: [{name: bash, input: {command: ls}}]
  - node: agent_loop.execute_tools
    type: tool_result
    tool: bash
    tool_output: {result: "file.txt"}
  - node: agent_loop.call_llm
    type: llm_response
    text: "Done!"
expect:
  outcome: completed
  reached: [agent_loop.call_llm, agent_loop.execute_tools]
Example - Partial testing with state:
name: test_from_middle
description: Test from a specific point with pre-populated state
start_at: process_result
state:
  call_llm:
    message: {role: assistant, content: "Previous response"}
events:
  - node: process_result
    output: {formatted: "Processed: Previous response"}
expect:
  outcome: completed

Fields

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| name | string | Yes | Name uniquely identifies this scenario. Required. |
| apiVersion | string | No | ApiVersion specifies the schema version. Optional, for future compatibility. |
| description | string | No | Description documents what this scenario tests. |
| events | SimulatedEvent[] | Yes | Events lists the simulated events in execution order. Required. |
| expect | Expectation | No | Expect defines the expected outcome and assertions. Optional. |
| inputs | object | No | Inputs overrides workflow inputs for this scenario. Optional. |
| start_at | string | No | StartAt begins execution at a specific node instead of the entry point. Optional. |
| state | map[string]object | No | State pre-populates node outputs before execution. Optional. |
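
The optional inputs field has no example above, so here is a minimal sketch of a scenario that overrides one workflow input. The input name `prompt` and the node ID `call_llm` are assumptions for illustration; substitute the inputs and nodes your workflow actually declares.

```yaml
name: custom_inputs
description: Overrides a workflow input for this scenario only
inputs:
  # `prompt` is a hypothetical input name, not part of the schema
  prompt: "List the files in the current directory"
events:
  - node: call_llm
    type: llm_response
    text: "Here are the files."
expect:
  outcome: completed
  reached: [call_llm]
```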

SimulatedEvent

SimulatedEvent represents a single mocked event in a test scenario. Events target specific nodes and provide mock outputs. There are two modes:
  1. Raw output mode: Set Output directly with the activity’s output structure
  2. Typed mode: Set Type with typed fields (text, tool_calls, tool_output)
Typed mode is automatically converted to the appropriate output structure.
Example - LLM response with text:
- node: call_llm
  type: llm_response
  text: "Hello, how can I help you?"
Example - LLM response with tool calls:
- node: agent_loop.call_llm
  type: llm_response
  tool_calls:
    - name: bash
      input: {command: "ls -la"}
Example - Tool result:
- node: agent_loop.execute_tools
  type: tool_result
  tool: bash
  tool_output: {result: "file1.txt\nfile2.txt"}
Example - Raw output mode:
- node: custom_node
  output:
    message:
      role: assistant
      content: "Custom response"

Fields

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| node | string | No | Node targets a specific node using dot-notation for qualified IDs. |
| output | object | No | Output is the raw mock output (mutually exclusive with Type). |
| type | string | No | Type specifies the event type for automatic conversion. |
| text | string | No | Text is the LLM response text (for type: llm_response). |
| tool_calls | SimToolCall[] | No | ToolCalls are tool invocations from the LLM (for type: llm_response). |
| tool | string | No | Tool is the tool name (for type: tool_result or tool_error). |
| tool_output | object | No | ToolOutput is the tool execution result (for type: tool_result). |
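
The Event Types table describes llm_response as returning text and/or tool_calls, which suggests both fields can appear on one event. The sketch below assumes that combination is accepted; the node ID and the command are illustrative only.

```yaml
# One typed event carrying both response text and a tool call
- node: agent_loop.call_llm
  type: llm_response
  text: "Let me check the directory."
  tool_calls:
    - name: bash
      input: {command: "ls"}
```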

SimToolCall

SimToolCall represents a tool call in a simulated LLM response.
Example:
tool_calls:
  - name: bash
    input: {command: "echo hello"}
  - name: search
    input: {query: "test results"}

Fields

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| name | string | Yes | Name is the tool name (e.g., “bash”, “search”, “edit”). |
| input | object | No | Input contains the tool’s input parameters. |

Expectation

Expectation defines what to verify after a scenario runs. All fields are optional - specify only what you want to assert. Node references support qualified IDs for inner loop nodes.
Example - Assert completion and reached nodes:
expect:
  outcome: completed
  reached: [call_llm, process_result]
  not_reached: [error_handler]
Example - Assert error state:
expect:
  outcome: error
  error_contains: "rate limit exceeded"
  error_node: call_llm
Example - Assert specific output values:
expect:
  outcome: completed
  node_outputs:
    classify:
      category: "question"
      confidence: 0.95

Fields

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| outcome | string | No | Outcome specifies whether the workflow should complete or error. |
| reached | string[] | No | Reached lists nodes that must be scheduled during the scenario (completed, skipped, or errored). |
| not_reached | string[] | No | NotReached lists nodes that must NOT be scheduled during the scenario. |
| completed | string[] | No | Completed lists nodes that must have executed successfully. |
| skipped | string[] | No | Skipped lists nodes that must have been skipped due to a false condition. |
| error_contains | string | No | ErrorContains specifies a substring that should appear in the error message. |
| error_node | string | No | ErrorNode specifies which node should produce the error. |
| node_outputs | map[string]object | No | NodeOutputs specifies expected output values for specific nodes. |
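
The completed and skipped fields are not shown in the examples above. As a sketch, assume a workflow where a hypothetical `classify` node routes to `handle_question` and a conditional `handle_other` node is skipped. Because reached covers completed, skipped, and errored nodes, the skipped node can appear in both lists.

```yaml
expect:
  outcome: completed
  # reached accepts any scheduled node, whether it completed, was skipped, or errored
  reached: [classify, handle_question, handle_other]
  # completed is stricter: these nodes must have executed successfully
  completed: [classify, handle_question]
  # skipped: these nodes must have been skipped because their condition was false
  skipped: [handle_other]
  not_reached: [error_handler]
```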

Targeting Nodes

The node field on events targets specific nodes by ID:
  • Top-level nodes: node: "call_llm"
  • Inner loop nodes: node: "loop_id.inner_node_id" (dot-separated)
  • Inner workflow nodes: node: "workflow_id.inner_node_id" (for type: workflow with inline:)
  • Nested structures: node: "outer.inner.node_id"
For inline loops and inline workflow nodes, the simulator executes each inner node individually, evaluates conditions, and tracks skipped nodes with their qualified IDs. For ref-based nodes (external workflow reference), the node is mocked as a black box using the ref name.
Event matching:
  • Events with a node field are matched to that specific node
  • Events without a node field are consumed sequentially in order
  • Multiple events with the same node are consumed in order per-node (use for multi-iteration loops)
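
To illustrate the second matching rule, the sketch below omits the node field entirely, so the events are consumed in the order listed. It assumes the workflow requests an LLM response, then a tool result, then a second LLM response. The tool result uses tool_output, following the SimulatedEvent Fields table.

```yaml
name: sequential_events
description: Events without a node field are consumed in listed order
events:
  # No node field on any event: consumption follows list order
  - type: llm_response
    tool_calls: [{name: bash, input: {command: ls}}]
  - type: tool_result
    tool: bash
    tool_output: {result: "file.txt"}
  - type: llm_response
    text: "Done!"
expect:
  outcome: completed
```

Adding a node field to each event is usually the safer choice for loops, since per-node ordering then does not depend on how the events are interleaved.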

Event Types

| Type | Description | Required Fields |
| --- | --- | --- |
| llm_response | Simulate LLM returning text and/or tool_calls | text or tool_calls |
| tool_result | Simulate tool returning output | tool, tool_output |
| tool_error | Simulate tool error | tool, tool_output (with error) |
| llm_error | Simulate LLM error | output (with error structure) |
| user_input | Simulate user message | text |
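
The user_input and tool_error types have no example elsewhere on this page. The sketch below fills in the required fields from the table. The node ID `wait_for_user` is hypothetical, and the `error` key inside tool_output is an assumed shape, since the exact error structure is not specified here.

```yaml
events:
  # Simulated user message; `wait_for_user` is a hypothetical node ID
  - node: wait_for_user
    type: user_input
    text: "Please list the files"
  # Simulated tool failure; the `error` key is an assumed structure
  - node: agent_loop.execute_tools
    type: tool_error
    tool: bash
    tool_output: {error: "command not found"}
```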

Complete Example

name: agent_tool_usage
description: Tests agent loop with tool execution

events:
  # First iteration: LLM requests a tool
  - node: agent_loop.call_llm
    type: llm_response
    tool_calls:
      - name: bash
        input: {command: "ls -la"}
  
  # Tool returns result
  - node: agent_loop.execute_tools
    type: tool_result
    tool: bash
    tool_output: {result: "file1.txt\nfile2.txt"}
  
  # Second iteration: LLM completes
  - node: agent_loop.call_llm
    type: llm_response
    text: "I found 2 files: file1.txt and file2.txt"

expect:
  outcome: completed
  reached:
    - agent_loop.call_llm
    - agent_loop.execute_tools
    - agent_loop.save_result
  not_reached:
    - error_handler
  node_outputs:
    agent_loop:
      iterations: 2