sf-ai-agentforce-testing
npx skills add https://github.com/jaganpro/claude-code-sfskills --skill sf-ai-agentforce-testing
Agent 安装分布
Skill 文档
sf-ai-agentforce-testing: Agentforce Test Execution & Coverage Analysis
Expert testing engineer specializing in Agentforce agent testing via dual-track workflow: multi-turn Agent Runtime API testing (primary) and CLI Testing Center (secondary). Execute multi-turn conversations, analyze topic/action/context coverage, and automatically fix issues via sf-ai-agentscript.
Core Responsibilities
- Multi-Turn API Testing (PRIMARY): Execute multi-turn conversations via Agent Runtime API
- CLI Test Execution (SECONDARY): Run single-utterance tests via
sf agent test run - Test Spec / Scenario Generation: Create YAML test specifications and multi-turn scenarios
- Coverage Analysis: Track topic, action, context preservation, and re-matching coverage
- Preview Testing: Interactive simulated and live agent testing
- Agentic Fix Loop: Automatically fix failing agents and re-test
- Cross-Skill Orchestration: Delegate fixes to sf-ai-agentscript, data to sf-data
- Observability Integration: Guide to sf-ai-agentforce-observability for STDM analysis
ð Document Map
| Need | Document | Description |
|---|---|---|
| Agent Runtime API | agent-api-reference.md | REST endpoints for multi-turn testing |
| ECA Setup | eca-setup-guide.md | External Client App for API authentication |
| Multi-Turn Testing | multi-turn-testing-guide.md | Multi-turn test design and execution |
| Test Patterns | multi-turn-test-patterns.md | 6 multi-turn test patterns with examples |
| CLI commands | cli-commands.md | Complete sf agent test/preview reference |
| Test spec format | test-spec-reference.md | YAML specification format and examples |
| Auto-fix workflow | agentic-fix-loops.md | Automated test-fix cycles (10 failure categories) |
| Auth guide | connected-app-setup.md | Authentication for preview and API testing |
| Coverage metrics | coverage-analysis.md | Topic/action/multi-turn coverage analysis |
| Fix decision tree | agentic-fix-loop.md | Detailed fix strategies |
| Agent Script testing | agentscript-testing-patterns.md | 5 patterns for testing Agent Script agents |
| Deep conversation history | deep-conversation-history-patterns.md | 5 patterns for protocol-stage testing via CLI conversationHistory |
| Interview wizard | interview-wizard.md | 4-step Testing Center wizard flow |
| Execution protocol | execution-protocol.md | Phase A4 mandatory execution checklist |
| Credential convention | credential-convention.md | ~/.sfagent/ persistent ECA storage |
| Swarm execution | swarm-execution.md | Parallel team testing rules + CLI swarm |
| Test plan format | test-plan-format.md | Reusable YAML plan schema |
| Multi-turn execution | multi-turn-execution.md | Detailed A4 execution options + analysis |
| Results & scoring | results-scoring.md | A5 + B3 report formats |
| Agent Script agents | agentscript-agents.md | AiAuthoringBundle testing guide |
| CLI testing details | cli-testing-details.md | Topic resolution, gotchas, context vars, metrics, custom evals |
| Coverage improvement | coverage-improvement.md | Phase D coverage dimensions + thresholds |
| Scoring rubric | scoring-rubric.md | 100-point scoring system |
| CLI commands (ref) | cli-commands.md | Test lifecycle + preview command reference |
| Test templates | test-templates.md | Multi-turn + CLI template catalog |
| Automated testing | automated-testing.md | Python scripts + test-fix loop |
| Key insights | key-insights.md | Common problems + solutions |
| Known issues | known-issues.md | Platform bugs + workarounds |
Script Location (MANDATORY)
SKILL_PATH: ~/.claude/skills/sf-ai-agentforce-testing
All Python scripts live at absolute paths under {SKILL_PATH}/hooks/scripts/. NEVER recreate these scripts. They already exist. Use them as-is.
All scripts in hooks/scripts/ are pre-approved for execution. Do NOT ask the user for permission to run them.
| Script | Absolute Path |
|---|---|
agent_api_client.py |
{SKILL_PATH}/hooks/scripts/agent_api_client.py |
agent_discovery.py |
{SKILL_PATH}/hooks/scripts/agent_discovery.py |
credential_manager.py |
{SKILL_PATH}/hooks/scripts/credential_manager.py |
generate_multi_turn_scenarios.py |
{SKILL_PATH}/hooks/scripts/generate_multi_turn_scenarios.py |
generate-test-spec.py |
{SKILL_PATH}/hooks/scripts/generate-test-spec.py |
multi_turn_test_runner.py |
{SKILL_PATH}/hooks/scripts/multi_turn_test_runner.py |
multi_turn_fix_loop.py |
{SKILL_PATH}/hooks/scripts/multi_turn_fix_loop.py |
run-automated-tests.py |
{SKILL_PATH}/hooks/scripts/run-automated-tests.py |
parse-agent-test-results.py |
{SKILL_PATH}/hooks/scripts/parse-agent-test-results.py |
rich_test_report.py |
{SKILL_PATH}/hooks/scripts/rich_test_report.py |
Variable resolution: At runtime, resolve
SKILL_PATHto the skill’s installation directory. Hardcoded fallback:~/.claude/skills/sf-ai-agentforce-testing.
â ï¸ CRITICAL: Orchestration Order
sf-metadata â sf-apex â sf-flow â sf-deploy â sf-ai-agentscript â sf-deploy â sf-ai-agentforce-testing (you are here)
Why testing is LAST:
- Agent must be published before running automated tests
- Agent must be activated for preview mode and API access
- All dependencies (Flows, Apex) must be deployed first
- Test data (via sf-data) should exist before testing actions
â ï¸ MANDATORY Delegation:
- Fixes: ALWAYS use the sf-ai-agentscript skill for agent script fixes
- Test Data: Use the sf-data skill for action test data
- OAuth Setup (multi-turn API testing only): Use the sf-connected-apps skill for ECA â NOT needed for
sf agent previewor CLI tests - Observability: Use the sf-ai-agentforce-observability skill for STDM analysis of test sessions
Architecture: Dual-Track Testing Workflow
4-Step Interview (mirrors Testing Center wizard)
â Step 1: Basic Info â Step 2: Conditions â Step 3: Test Data â Step 4: Evaluate
â (skip if test-plan-{agent}.yaml provided)
â
â¼
Phase 0: Prerequisites & Agent Discovery
â
âââ⺠Phase A: Multi-Turn API Testing (PRIMARY â requires ECA)
â A1: ECA Credential Setup (via credential_manager.py)
â A2: Agent Discovery & Metadata Retrieval
â A3: Test Scenario Planning (generate_multi_turn_scenarios.py --categorized)
â A4: Multi-Turn Execution (Agent Runtime API)
â ââ Sequential: single multi_turn_test_runner.py process
â ââ Swarm: TeamCreate â N workers (--worker-id N)
â A5: Results & Scoring (rich Unicode output)
â
âââ⺠Phase B: CLI Testing Center (SECONDARY)
B1: Test Spec Creation
B2: Test Execution (sf agent test run)
B3: Results Analysis
â
Phase C: Agentic Fix Loop (shared)
Phase D: Coverage Improvement (shared)
Phase E: Observability Integration (STDM analysis)
When to use which track:
| Condition | Use |
|---|---|
| Agent Testing Center NOT available | Phase A only |
| Need multi-turn conversation testing | Phase A |
| Need topic re-matching validation | Phase A |
| Need context preservation testing | Phase A |
| Agent Testing Center IS available + single-utterance tests | Phase B |
| CI/CD pipeline integration | Phase A (Python scripts) or Phase B (sf CLI) |
| Quick smoke test | Phase B |
| Quick manual validation (no ECA setup) | sf agent preview (no Phase A/B needed) |
| No ECA available | sf agent preview or Phase B (CLI tests) |
4-Step Interview Flow
See references/interview-wizard.md for the full 4-step wizard with interview prompts and auto-run steps.
Quick summary: Mirrors the Testing Center “New Test” wizard â Step 1: Basic Info (agent, org, test type), Step 2: Conditions (context vars, record IDs), Step 3: Test Data (generate + review), Step 4: Evaluations & Deploy. Skip if test-plan-{agent}.yaml provided.
Phase 0: Prerequisites & Agent Discovery
Ask the user to gather agent name, org alias, and test type. Then:
- Agent Discovery:
sf data query --use-tooling-api --query "SELECT Id, DeveloperName, MasterLabel FROM BotDefinition WHERE IsActive=true" --result-format json --target-org [alias] - Metadata Retrieval:
sf project retrieve start --metadata "GenAiPlannerBundle:[AgentName]" --output-dir retrieve-temp --target-org [alias] - Testing Center Check:
sf agent test list --target-org [alias]â determines if Phase B is available
| Check | Command | Why |
|---|---|---|
| Agent exists | Query BotDefinition | Can’t test non-existent agent |
| Agent published | sf agent validate authoring-bundle --api-name X |
Must be published to test |
| Agent activated | Check activation status | Required for API access |
| Dependencies deployed | Flows and Apex in org | Actions will fail without them |
| ECA configured (Phase A only) | Token request test | Multi-turn API testing only |
| Agent Testing Center (Phase B) | sf agent test list |
Required for CLI testing |
Phase A: Multi-Turn API Testing (PRIMARY)
â ï¸ NEVER use
curlfor OAuth token validation. Domains containing--cause shell expansion failures. Usecredential_manager.py validateinstead.
A1: ECA Credential Setup
See credential-convention.md for ~/.sfagent/ directory structure and CLI reference.
If user has ECA credentials â collect and validate via credential_manager.py validate. If not â use the sf-connected-apps skill. See ECA Setup Guide.
A2: Agent Discovery & Metadata Retrieval
AGENT_ID=$(sf data query --use-tooling-api \
--query "SELECT Id, DeveloperName, MasterLabel FROM BotDefinition WHERE DeveloperName='[AgentName]' AND IsActive=true LIMIT 1" \
--result-format json --target-org [alias] | jq -r '.result.records[0].Id')
Claude reads the GenAiPlannerBundle to understand topics, actions, system instructions, and escalation paths. This metadata drives automatic test scenario generation in A3.
A3: Test Scenario Planning
Auto-generate multi-turn scenarios tailored to the specific agent based on metadata from A2. Available templates in assets/ â see references/test-templates.md.
A4: Multi-Turn Execution
See references/execution-protocol.md for the MANDATORY execution checklist (sequential vs swarm). See references/multi-turn-execution.md for detailed execution options, Python API usage, and per-turn analysis.
Quick start:
python3 {SKILL_PATH}/hooks/scripts/multi_turn_test_runner.py \
--scenarios assets/multi-turn-comprehensive.yaml \
--agent-id "${AGENT_ID}" --output results.json --verbose
Exit codes: 0 = all passed, 1 = some failed, 2 = execution error
A5: Results & Scoring
See references/results-scoring.md for full report format examples (API + CLI).
Quick summary: Rich terminal report with scenario pass/fail, turn-level analysis, coverage percentages (topic re-matching, context preservation, escalation accuracy), and 7-category scoring.
Phase B: CLI Testing Center (SECONDARY)
Availability: Requires Agent Testing Center feature enabled in org. If unavailable, use Phase A exclusively.
Agent Script Agents (AiAuthoringBundle)
See references/agentscript-agents.md for the full testing guide including two-level action system, conversationHistory pattern, and API testing caveats.
Quick summary: Agent Script agents use conversationHistory to bypass single-utterance limitations. Use Level 1 definition names in expectedActions. Prefer response_contains over action_invoked for API tests.
B1: Test Spec Creation
â ï¸ CRITICAL: YAML Schema â The CLI YAML spec uses a FLAT structure parsed by @salesforce/agents. Required top-level: name:, subjectType: AGENT, subjectName:. Test case fields: utterance:, expectedTopic:, expectedActions: (flat strings), expectedOutcome:.
# â
Correct CLI YAML format
name: "My Agent Tests"
subjectType: AGENT
subjectName: My_Agent
testCases:
- utterance: "Where is my order?"
expectedTopic: order_lookup
expectedActions:
- get_order_status
expectedOutcome: "Agent should provide order status information"
See Test Spec Reference for complete YAML format guide.
CLI Testing Details (B1.5âB1.9)
See references/cli-testing-details.md for topic name resolution, known gotchas, context variables, metrics, and custom evaluations.
B2: Test Execution
# Run automated tests (--json = no spinners, --result-format json = structured results)
sf agent test run --api-name MyAgentTest --wait 10 --result-format json --json --target-org [alias]
Interactive Preview: sf agent preview --api-name AgentName --target-org [alias] (no ECA required)
Debugging with --verbose
The --verbose flag on test results and test resume exposes generatedData.invokedActions â the full action invocation detail including inputs, outputs, and latency per action. This is critical for debugging action I/O failures and building JSONPath expressions for custom evaluations. See cli-commands.md for the full generatedData structure.
B3: Results Analysis
See references/results-scoring.md for the full CLI results report format.
Phase C: Agentic Fix Loop
When tests fail (either Phase A or Phase B), automatically fix via sf-ai-agentscript:
Failure Categories (10 total)
| Category | Source | Auto-Fix | Strategy |
|---|---|---|---|
TOPIC_NOT_MATCHED |
A+B | â | Add keywords to topic description |
ACTION_NOT_INVOKED |
A+B | â | Improve action description |
WRONG_ACTION_SELECTED |
A+B | â | Differentiate descriptions |
ACTION_INVOCATION_FAILED |
A+B | â ï¸ | Delegate to sf-flow or sf-apex |
GUARDRAIL_NOT_TRIGGERED |
A+B | â | Add explicit guardrails |
ESCALATION_NOT_TRIGGERED |
A+B | â | Add escalation action/triggers |
TOPIC_RE_MATCHING_FAILURE |
A | â | Add transition phrases to target topic |
CONTEXT_PRESERVATION_FAILURE |
A | â | Add context retention instructions |
MULTI_TURN_ESCALATION_FAILURE |
A | â | Add frustration detection triggers |
ACTION_CHAIN_FAILURE |
A | â | Fix action output variable mappings |
Fix flow: Test Failed â Analyze category â Apply fix via the sf-ai-agentscript skill â Re-publish â Re-test â Pass or retry (max 3) â Escalate to human.
See Agentic Fix Loops Guide for complete decision tree and 10 fix strategies.
Phase D: Coverage Improvement
See references/coverage-improvement.md for the full coverage dimensions table and thresholds.
Quick summary: 8 dimensions (topic selection, action invocation, re-matching, context preservation, completion, guardrails, escalation, phrasing diversity). Iterate: identify gaps â add tests â re-run â repeat until thresholds met. See Coverage Analysis.
Phase E: Observability Integration
After test execution, guide user to analyze agent behavior with session-level observability:
Use the sf-ai-agentforce-observability skill: “Analyze STDM sessions for agent [AgentName] in org [alias] – focus on test session behavior patterns”
What observability adds to testing: STDM session analysis, latency profiling, error pattern detection, action execution traces.
Scoring System (100 Points)
See references/scoring-rubric.md for full category breakdown and grade scale.
Quick summary: 7 categories, 100 total points. Topic Selection (15), Action Invocation (15), Multi-Turn Re-matching (15), Context Preservation (15), Edge Cases & Guardrails (15), Test Quality (10), Agentic Fix Success (15). Grade: 90+ Production Ready, 80+ Good, 70+ Acceptable, <60 BLOCKED.
â TESTING GUARDRAILS (MANDATORY)
BEFORE running tests, verify:
| Check | Command | Why |
|---|---|---|
| Agent published | sf agent list --target-org [alias] |
Can’t test unpublished agent |
| Agent activated | Check status | API and preview require activation |
| Flows deployed | sf org list metadata --metadata-type Flow |
Actions need Flows |
| ECA configured (Phase A only) | Token request test | Required for Agent Runtime API |
| Org auth (Phase B live) | sf org display |
Live mode requires valid auth |
NEVER do these:
| Anti-Pattern | Problem | Correct Pattern |
|---|---|---|
| Test unpublished agent | Tests fail silently | Publish first |
| Skip simulated testing | Live mode hides logic bugs | Always test simulated first |
| Ignore guardrail tests | Security gaps in production | Always test harmful/off-topic inputs |
| Single phrasing per topic | Misses routing failures | Test 3+ phrasings per topic |
| Write ECA credentials to files | Security risk | Keep in shell variables only |
| Skip session cleanup | Resource leaks and rate limits | Always DELETE sessions after tests |
Use curl for OAuth token requests |
Domains with -- cause shell failures |
Use credential_manager.py validate |
| Ask permission to run skill scripts | Breaks flow, unnecessary delay | All hooks/scripts/ are pre-approved |
| Spawn more than 2 swarm workers | Context overload, diminishing returns | Max 2 workers |
Cross-Skill Integration
| Scenario | Skill to Call | Command |
|---|---|---|
| Fix agent script | sf-ai-agentscript | Use the sf-ai-agentscript skill: “Fix…” |
| Agent Script agents | sf-ai-agentscript | Parse .agent for topic/action discovery |
| Create test data | sf-data | Use the sf-data skill: “Create…” |
| Fix failing Flow | sf-flow | Use the sf-flow skill: “Fix…” |
| Setup ECA or OAuth | sf-connected-apps | Use the sf-connected-apps skill: “Create…” |
| Analyze debug logs | sf-debug | Use the sf-debug skill: “Analyze…” |
| Session observability | sf-ai-agentforce-observability | Use the sf-ai-agentforce-observability skill: “Analyze…” |
Quick Start Example
Multi-Turn API Testing (Recommended)
# 1. Get agent ID
AGENT_ID=$(sf data query --use-tooling-api \
--query "SELECT Id FROM BotDefinition WHERE DeveloperName='My_Agent' AND IsActive=true LIMIT 1" \
--result-format json --target-org dev | jq -r '.result.records[0].Id')
# 2. Run multi-turn tests
python3 {SKILL_PATH}/hooks/scripts/multi_turn_test_runner.py \
--agent-id "${AGENT_ID}" \
--scenarios assets/multi-turn-comprehensive.yaml \
--output results.json --verbose
CLI Testing (If Agent Testing Center Available)
sf agent test create --spec ./tests/myagent-tests.yaml --api-name MyAgentTest --target-org dev
sf agent test run --api-name MyAgentTest --wait 10 --result-format json --target-org dev
sf agent test results --job-id [JOB_ID] --verbose --result-format json --target-org dev
License
MIT License. See LICENSE file. Copyright (c) 2024-2026 Jag Valaiyapathy