How LLM Calls Work
When Reliant calls an LLM, it sends a structured request. The response contains:
- Text response (assistant message)
- Tool calls (requests to use tools)
- Stop reason (why it stopped: `end_turn`, `tool_use`, or `max_tokens`)
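A response with these three parts might be modeled as follows. This is a minimal sketch; the class and field names are illustrative, not Reliant's actual types:

```python
from dataclasses import dataclass, field

@dataclass
class ToolCall:
    name: str    # must exactly match an available tool
    input: dict  # must conform to that tool's schema

@dataclass
class LLMResponse:
    text: str                                  # assistant message
    tool_calls: list[ToolCall] = field(default_factory=list)
    stop_reason: str = "end_turn"              # end_turn | tool_use | max_tokens

resp = LLMResponse(
    text="Let me look that up.",
    tool_calls=[ToolCall(name="get_weather", input={"city": "Oslo"})],
    stop_reason="tool_use",
)
```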
Tool Call Requirements
Critical caveat: When the LLM requests a tool, the tool must exist in the available tools list.
Tool Matching Rules
- Tool calls reference tools by name
- The name must exactly match an available tool
- Tool inputs must match the tool’s schema
- Unknown tools cause execution errors
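The matching rules above amount to an exact-name lookup plus a schema check. A rough sketch (the function and schema shapes here are hypothetical, not Reliant's internals):

```python
def validate_tool_call(name, tool_input, available_tools):
    """Return problems with a requested tool call; an empty list means valid."""
    errors = []
    if name not in available_tools:              # names must match exactly
        return [f"Tool not found: {name!r}"]
    schema = available_tools[name]               # {param_name: is_required}
    for param, required in schema.items():
        if required and param not in tool_input:
            errors.append(f"Missing required input: {param!r}")
    for param in tool_input:
        if param not in schema:
            errors.append(f"Unknown input: {param!r}")
    return errors

tools = {"get_weather": {"city": True, "units": False}}
```

Note that a case mismatch (`get_Weather` vs `get_weather`) is enough to fail the lookup.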
Best Practice
Use `tools: true` to include all default tools, or explicitly list only the tools the node needs.
System Messages
System prompts set the LLM's behavior and context. They're sent first in every request.
Where System Prompts Come From
- Workflow-level `system_prompt` input
- Preset configuration
- Node-level override on `call_llm`
System Prompt Best Practices
- Be specific about the task
- Define output format expectations
- Include examples if needed
- Keep it focused (long prompts waste tokens)
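Put together, a system prompt that follows these practices might look like the string below. The task and categories are purely illustrative:

```python
SYSTEM_PROMPT = """\
You are a support-ticket classifier.
Task: assign each ticket exactly one category: billing, bug, or feature_request.
Output: respond with only the category name, lowercase, nothing else.
Example: "I was charged twice this month" -> billing
"""
```

It is specific about the task, pins down the output format, and includes one example, all in a handful of lines.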
Context and Tokens
Context Window
Each model has a maximum context window (total tokens for input + output):

| Model | Context Window |
|---|---|
| Claude Sonnet 4 | 200K tokens |
| Claude Opus 4 | 200K tokens |
| GPT-4o | 128K tokens |
| Gemini Pro | 1M tokens |
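Because the window covers input plus output, a request must reserve room for the response. A simple budget check against the limits in the table (the model keys are illustrative; real token counts come from a tokenizer):

```python
CONTEXT_WINDOWS = {            # total tokens, input + output
    "claude-sonnet-4": 200_000,
    "claude-opus-4": 200_000,
    "gpt-4o": 128_000,
    "gemini-pro": 1_000_000,
}

def fits_in_window(model, input_tokens, max_output_tokens):
    """True if the input plus reserved output stays inside the model's window."""
    return input_tokens + max_output_tokens <= CONTEXT_WINDOWS[model]
```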
Token Counting
Reliant tracks token usage per-thread. When the context grows large:
- Compaction summarizes older messages
- Filtering reduces tool result verbosity
- Truncation as a last resort
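The three strategies can be pictured as an escalating pipeline. This is a conceptual sketch, not Reliant's implementation; messages are plain strings here, and `summarize` stands in for an LLM summarization call:

```python
def shrink_context(messages, budget, count_tokens, summarize):
    """Escalate: compact, then filter, then truncate until under budget."""
    def total(msgs):
        return sum(count_tokens(m) for m in msgs)
    # 1. Compaction: replace the older half of the thread with a summary.
    if total(messages) > budget and len(messages) > 2:
        half = len(messages) // 2
        messages = [summarize(messages[:half])] + messages[half:]
    # 2. Filtering: trim verbose tool results.
    if total(messages) > budget:
        messages = [m[:200] if m.startswith("TOOL:") else m for m in messages]
    # 3. Truncation: drop the oldest messages as a last resort.
    while total(messages) > budget and len(messages) > 1:
        messages = messages[1:]
    return messages
```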
Caching
Some providers cache repeated prefixes for efficiency:

| Provider | Caching |
|---|---|
| Anthropic | Prompt caching (automatic) |
| OpenAI | No automatic caching |
| Google | Context caching available |
Provider Differences
Anthropic (Claude)
Strengths:
- Excellent code understanding
- Strong tool use
- Good at following complex instructions

Limitations:
- Tool calls must include `thinking` for Opus models
- Rate limits vary by tier
- No image generation
OpenAI (GPT-4)
Strengths:
- Fast response times
- Good general knowledge
- Image understanding

Limitations:
- Different tool call format (function calling)
- JSON mode requires explicit enabling
- Rate limits per-minute
Google (Gemini)
Strengths:
- Large context window
- Good multimodal support
- Competitive pricing

Limitations:
- Different safety filter behavior
- Tool use syntax differences
- Regional availability varies
Common Pitfalls
1. Tool Call Without Matching Tool
2. Context Overflow
3. Infinite Loops
4. Empty Tool Calls Array
5. Stop Reason Misunderstanding
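Pitfall 3 is worth guarding against explicitly: a loop that keeps calling the LLM while it requests tools needs a hard iteration cap. A sketch with hypothetical `call_llm`/`run_tool` callables (not Reliant's API):

```python
def agent_loop(call_llm, run_tool, max_iterations=10):
    """Run LLM/tool rounds until the model stops on its own or the cap is hit."""
    history = []
    for _ in range(max_iterations):
        response = call_llm(history)
        history.append(response)
        if response["stop_reason"] != "tool_use":   # end_turn or max_tokens
            return history
        for call in response["tool_calls"]:
            history.append(run_tool(call))          # feed results back in
    raise RuntimeError(f"no stop condition reached after {max_iterations} iterations")
```

Raising at the cap surfaces the missing stop condition instead of silently burning tokens.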
Response Tools
Response tools force structured output from the LLM. Common uses:
- Routing decisions
- Classification tasks
- Structured data extraction
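One generic way to force structured output is to expose a single "response tool" whose input schema is the desired output shape, then treat the tool-call input as the answer. The tool name and schema below are illustrative:

```python
ROUTE_TOOL = {
    "name": "route",
    "input_schema": {"destination": ["billing", "bug", "feature_request"]},
}

def parse_routing_decision(tool_call):
    """Validate a routing decision returned through the response tool."""
    destination = tool_call["input"]["destination"]
    allowed = ROUTE_TOOL["input_schema"]["destination"]
    if destination not in allowed:
        raise ValueError(f"invalid route: {destination!r}")
    return destination
```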
Debugging LLM Issues
Check the Raw Response
In the UI, expand a message to see:
- Full prompt sent
- Raw LLM response
- Token counts
- Timing information
Common Fixes
| Symptom | Likely Cause | Fix |
|---|---|---|
| "Tool not found" | Tool not in available list | Add tool to `tools` or `tool_filter` |
| Truncated output | Hit max_tokens | Increase limit or use streaming |
| Repeated responses | No stop condition | Add iteration limits to loops |
| Slow responses | Large context | Enable compaction or filter results |
| Wrong tool called | Ambiguous prompt | Clarify in system prompt |
See also: CEL Expressions