devtu-create-tool
npx skills add https://github.com/mims-harvard/tooluniverse --skill devtu-create-tool
ToolUniverse Tool Creator
Create new scientific tools for the ToolUniverse framework following established best practices.
Table of Contents
- Critical Knowledge
- Core Concepts
- Implementation Guide
- Testing Strategy
- Common Patterns
- Troubleshooting
- Reference
Critical Knowledge
Top 6 Mistakes (90% of Failures)
- Missing `default_config.py` Entry – Tools silently won't load
- Fake test_examples – Tests fail, agents get bad examples
- Single-level Testing – Misses registration bugs
- Skipping `test_new_tools.py` – Misses schema/API issues
- Tool Names > 55 chars – Breaks MCP compatibility
- Raising Exceptions – Should return error dicts instead
Tool Creator vs SDK User
| SDK User (Using) | Tool Creator (Building) |
|---|---|
| `tu.tools.ToolName()` | `@register_tool()` + JSON |
| Handle responses | Design schemas |
| One-level usage | Three-step registration |
Core Concepts
Two-Stage Architecture
```text
Stage 1: Tool Class               Stage 2: Wrappers (Auto-Generated)
Python Implementation             From JSON Configs
        ↓                                 ↓
@register_tool("MyTool")          MyAPI_list_items()
class MyTool(BaseTool):           MyAPI_search()
    def run(arguments):           MyAPI_get_details()
```
Key Points:
- One class handles multiple operations
- JSON defines individual tool wrappers
- Users call wrappers, which route to class
- Need BOTH for tools to work
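The routing from wrapper to class can be sketched with a toy registry. The names and the wrapper body below are illustrative assumptions, not ToolUniverse's actual generated code:

```python
# Toy sketch: how a JSON-defined wrapper routes to one tool class.
# _REGISTRY, register_tool, and the wrapper body are simplified stand-ins.
_REGISTRY = {}

def register_tool(name):
    def decorator(cls):
        _REGISTRY[name] = cls
        return cls
    return decorator

@register_tool("MyAPITool")
class MyAPITool:
    def run(self, arguments):
        if arguments.get("operation") == "list_items":
            return {"status": "success", "data": []}
        return {"status": "error", "error": "Unknown operation"}

def MyAPI_list_items(limit=None):
    """What a generated wrapper boils down to: a fixed operation plus routing."""
    arguments = {"operation": "list_items"}
    if limit is not None:
        arguments["limit"] = limit
    return _REGISTRY["MyAPITool"]().run(arguments)
```

Each wrapper pins one `operation` value, which is why a single class can back many tools.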
Three-Step Registration
Step 1: Class Registration
```python
@register_tool("MyAPITool")  # Decorator registers class
class MyAPITool(BaseTool):
    pass
```
Step 2: Config Registration ⚠️ MOST COMMONLY MISSED
```python
# In src/tooluniverse/default_config.py
TOOLS_CONFIGS = {
    "my_category": os.path.join(current_dir, "data", "my_category_tools.json"),
}
```
Step 3: Wrapper Generation (Automatic)
```python
tu = ToolUniverse()
tu.load_tools()  # Auto-generates wrappers in tools/
```
Verification Script:
```python
import sys
sys.path.insert(0, 'src')

# Step 1: Check class registered
from tooluniverse.tool_registry import get_tool_registry
import tooluniverse.your_tool_module
registry = get_tool_registry()
assert "YourToolClass" in registry, "❌ Step 1 FAILED"
print("✅ Step 1: Class registered")

# Step 2: Check config registered
from tooluniverse.default_config import TOOLS_CONFIGS
assert "your_category" in TOOLS_CONFIGS, "❌ Step 2 FAILED"
print("✅ Step 2: Config registered")

# Step 3: Check wrappers generated
from tooluniverse import ToolUniverse
tu = ToolUniverse()
tu.load_tools()
assert hasattr(tu.tools, 'YourCategory_operation1'), "❌ Step 3 FAILED"
print("✅ Step 3: Wrappers generated")
print("✅ All steps complete!")
```
Standard Response Format
All tools must return:
```
{
  "status": "success" | "error",
  "data": {...},       // On success
  "error": "message"   // On failure
}
```
Why: Consistent error handling, composability, user expectations
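Because every tool returns this shape, callers can branch on `status` uniformly. A minimal consumer-side helper (`unwrap` is an illustrative name, not part of ToolUniverse):

```python
def unwrap(result):
    """Return data on success, raise on error (illustrative helper)."""
    if result.get("status") == "success":
        return result.get("data")
    raise RuntimeError(result.get("error", "unknown error"))

unwrap({"status": "success", "data": [1, 2]})  # → [1, 2]
```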
Implementation Guide
File Structure
Required Files:
- `src/tooluniverse/my_api_tool.py` – Implementation
- `src/tooluniverse/data/my_api_tools.json` – Tool definitions
- `tests/unit/test_my_api_tool.py` – Tests
- `examples/my_api_examples.py` – Usage examples

Auto-Generated (don't create manually):
- `src/tooluniverse/tools/MyAPI_*.py` – Wrappers
Pattern 1: Multi-Operation Tool (Recommended)
Python Class:
```python
from typing import Dict, Any
from tooluniverse.tool import BaseTool
from tooluniverse.tool_utils import register_tool
import requests

@register_tool("MyAPITool")
class MyAPITool(BaseTool):
    """Tool for MyAPI database."""

    BASE_URL = "https://api.example.com/v1"

    def __init__(self, tool_config):
        super().__init__(tool_config)
        self.parameter = tool_config.get("parameter", {})
        self.required = self.parameter.get("required", [])

    def run(self, arguments: Dict[str, Any]) -> Dict[str, Any]:
        """Route to operation handler."""
        operation = arguments.get("operation")
        if not operation:
            return {"status": "error", "error": "Missing: operation"}
        if operation == "list_items":
            return self._list_items(arguments)
        elif operation == "search":
            return self._search(arguments)
        else:
            return {"status": "error", "error": f"Unknown: {operation}"}

    def _list_items(self, arguments: Dict[str, Any]) -> Dict[str, Any]:
        """List items with pagination."""
        try:
            params = {}
            if "limit" in arguments:
                params["limit"] = arguments["limit"]
            response = requests.get(
                f"{self.BASE_URL}/items",
                params=params,
                timeout=30
            )
            response.raise_for_status()
            data = response.json()
            return {
                "status": "success",
                "data": data.get("items", []),
                "total": data.get("total", 0)
            }
        except requests.exceptions.Timeout:
            return {"status": "error", "error": "Timeout after 30s"}
        except requests.exceptions.HTTPError as e:
            return {"status": "error", "error": f"HTTP {e.response.status_code}"}
        except Exception as e:
            return {"status": "error", "error": str(e)}

    def _search(self, arguments: Dict[str, Any]) -> Dict[str, Any]:
        """Search items by query."""
        query = arguments.get("query")
        if not query:
            return {"status": "error", "error": "Missing: query"}
        try:
            response = requests.get(
                f"{self.BASE_URL}/search",
                params={"q": query},
                timeout=30
            )
            response.raise_for_status()
            data = response.json()
            return {
                "status": "success",
                "results": data.get("results", []),
                "count": data.get("count", 0)
            }
        except requests.exceptions.RequestException as e:
            return {"status": "error", "error": f"API failed: {str(e)}"}
```
JSON Configuration:
```json
[
  {
    "name": "MyAPI_list_items",
    "class": "MyAPITool",
    "description": "List items from database with pagination. Returns item IDs and names. Supports filtering by status and type. Example: limit=10 returns first 10 items.",
    "parameter": {
      "type": "object",
      "required": ["operation"],
      "properties": {
        "operation": {
          "const": "list_items",
          "description": "Operation type (fixed)"
        },
        "limit": {
          "type": "integer",
          "description": "Max results (1-100)",
          "minimum": 1,
          "maximum": 100
        }
      }
    },
    "return": {
      "type": "object",
      "properties": {
        "status": {"type": "string", "enum": ["success", "error"]},
        "data": {"type": "array"},
        "total": {"type": "integer"},
        "error": {"type": "string"}
      },
      "required": ["status"]
    },
    "test_examples": [
      {
        "operation": "list_items",
        "limit": 10
      }
    ]
  }
]
```
Pattern 2: Async Polling (Job-Based APIs)
```python
import time

def _submit_job(self, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Submit job and poll for results."""
    try:
        # Submit
        submit_response = requests.post(
            f"{self.BASE_URL}/jobs/submit",
            json={"data": arguments.get("data")},
            timeout=30
        )
        submit_response.raise_for_status()
        job_id = submit_response.json().get("job_id")

        # Poll
        for attempt in range(60):  # 60 attempts x 2s = 2 min max
            status_response = requests.get(
                f"{self.BASE_URL}/jobs/{job_id}/status",
                timeout=30
            )
            status_response.raise_for_status()
            result = status_response.json()
            if result.get("status") == "completed":
                return {
                    "status": "success",
                    "data": result.get("results"),
                    "job_id": job_id
                }
            elif result.get("status") == "failed":
                return {
                    "status": "error",
                    "error": result.get("error"),
                    "job_id": job_id
                }
            time.sleep(2)  # Poll every 2s

        return {"status": "error", "error": "Timeout after 2 min"}
    except requests.exceptions.RequestException as e:
        return {"status": "error", "error": str(e)}
```
API Key Configuration
Tools can specify API key requirements in JSON config:
required_api_keys – Tool won’t load without these:
```json
{
  "name": "NVIDIA_ESMFold_predict",
  "required_api_keys": ["NVIDIA_API_KEY"],
  ...
}
```
optional_api_keys – Tool loads and works without keys, but with reduced performance:
```json
{
  "name": "PubMed_search_articles",
  "optional_api_keys": ["NCBI_API_KEY"],
  "description": "Search PubMed. Rate limits: 3 req/sec without key, 10 req/sec with NCBI_API_KEY.",
  ...
}
```
When to use optional_api_keys:
- APIs with rate limits that improve with a key (NCBI: 3→10 req/sec, Semantic Scholar: 1→100 req/sec)
- APIs that work anonymously but have better quotas with authentication
- Tools that should always be available, even for users without API keys
Implementation pattern for optional keys:
```python
import os

def __init__(self, tool_config):
    super().__init__(tool_config)
    # Read from environment variable only (not as a tool parameter)
    self.api_key = os.environ.get("NCBI_API_KEY", "")

def run(self, arguments):
    # Adjust behavior based on key availability
    has_key = bool(self.api_key)
    rate_limit = 0.1 if has_key else 0.4  # Seconds between requests; faster with key
    ...
```
Key rules:
- ❌ Don't add `api_key` as a tool parameter for optional keys
- ✅ Read optional keys from environment variables only
- ✅ Include rate limit info in tool description
- ✅ Tool must work (with degraded performance) when key is missing
JSON Best Practices
Tool Naming (≤55 chars for MCP):
- Template: `{API}_{action}_{target}`
- ✅ Good: `FDA_get_drug_info` (17 chars)
- ❌ Bad: `FDA_get_detailed_drug_information_with_history` (46 chars, needlessly verbose and close to the limit)
Description (150-250 chars, high-context):
```json
{
  "description": "Search database for items. Returns up to 100 results with scores. Supports wildcards (* ?) and Boolean operators (AND, OR, NOT). Example: 'protein AND membrane' finds membrane proteins."
}
```
Include: What it returns, data source, use case, input format, example
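Both guidelines above are mechanically checkable. A quick pre-submission lint might look like this (an illustrative sketch, not the official `check_tool_name_lengths.py` script):

```python
def lint_tool_config(cfg):
    """Flag configs that break the 55-char name limit or the
    150-250 char description guideline (illustrative check)."""
    problems = []
    if len(cfg.get("name", "")) > 55:
        problems.append("name exceeds 55 chars (breaks MCP)")
    desc = cfg.get("description", "")
    if not 150 <= len(desc) <= 250:
        problems.append(f"description is {len(desc)} chars; aim for 150-250")
    return problems
```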
test_examples (MUST be real):
```json
{
  "test_examples": [
    {
      "operation": "search",
      "query": "protein",   // ✅ Real, common term
      "limit": 10
    }
  ]
}
```

❌ Don't use: `"id": "XXXXX"`, `"placeholder": "example_123"`
✅ Do use: Real IDs from actual API documentation
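One cheap way to catch incomplete examples before running `test_new_tools.py` is to verify each entry supplies every required parameter. A sketch, assuming the config layout shown above:

```python
def check_test_examples(tool_cfg):
    """Return (index, missing_params) for each test_example that omits
    a required parameter from the tool's schema."""
    required = tool_cfg.get("parameter", {}).get("required", [])
    bad = []
    for i, example in enumerate(tool_cfg.get("test_examples", [])):
        missing = [r for r in required if r not in example]
        if missing:
            bad.append((i, missing))
    return bad
```

This cannot verify that IDs are real, only that examples are structurally complete; real-ID verification is what Level 3 testing is for.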
JSON Config Conventions
- The `"type"` field in JSON must match the Python class name from `@register_tool`
- Include `return_schema` to define tool output structure (not just `return`)
- Put example inputs in `test_examples` only; avoid `examples` blocks inside `parameter`/`return_schema`, which bloat configs and drift from reality
- For large allow-lists, document values in `description` and enforce in Python (avoid giant schema `enum`)
- Auto-discovery: tools placed in `src/tooluniverse/` are auto-discovered; no `__init__.py` modification needed
- Optional: implement `validate_parameters(arguments)` for custom validation
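For the large allow-list bullet, the Python-side check might look like this (the field name and values are hypothetical; a `validate_parameters` implementation could call such a helper):

```python
def validate_status(arguments, allowed=("active", "archived", "draft")):
    """Enforce an allow-list in code instead of a giant schema enum.
    Returns an error dict on failure, None when validation passes."""
    status = arguments.get("status")
    if status is not None and status not in allowed:
        return {"status": "error",
                "error": f"status must be one of: {', '.join(allowed)}"}
    return None
```

Keeping the list in code lets the error message name the valid values while the JSON schema stays small.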
For the full implementation plan, maintenance checklist, and large API expansion guidance, see references/implementation-guide.md
Testing Strategy
Final Validation with test_new_tools.py (MANDATORY)
All new tools MUST pass scripts/test_new_tools.py before submission.
This script tests tools using their test_examples from JSON configs and validates responses against return_schema.
```bash
# Test your specific tools
python scripts/test_new_tools.py your_tool_name

# Test with verbose output
python scripts/test_new_tools.py your_tool_name -v

# Test all tools (for full validation)
python scripts/test_new_tools.py

# Stop on first failure
python scripts/test_new_tools.py your_tool_name --fail-fast
```
What it validates:
- Tool execution succeeds (no exceptions)
- Response matches `return_schema` (if defined)
- 404 errors tracked separately (indicates bad test_examples)
- Tools with missing API keys are skipped gracefully
Common failures and fixes:
| Failure | Cause | Fix |
|---|---|---|
| 404 ERROR | Invalid ID in test_examples | Use real IDs from API docs |
| Schema Mismatch | Response doesn’t match return_schema | Update schema or fix response format |
| Exception | Code bug or missing dependency | Check error message, fix implementation |
Two-Level Testing (MANDATORY)
Level 1: Direct Class Testing
```python
import json
from tooluniverse.your_tool_module import YourToolClass

def test_direct_class():
    """Test implementation logic."""
    with open("src/tooluniverse/data/your_tools.json") as f:
        tools = json.load(f)
    config = next(t for t in tools if t["name"] == "YourTool_operation1")
    tool = YourToolClass(config)
    result = tool.run({"operation": "operation1", "param": "value"})
    assert result["status"] == "success"
    assert "data" in result
```
Level 2: ToolUniverse Interface Testing
```python
import pytest
from tooluniverse import ToolUniverse

class TestYourTools:
    @pytest.fixture
    def tu(self):
        tu = ToolUniverse()
        tu.load_tools()  # CRITICAL
        return tu

    def test_tools_load(self, tu):
        """Verify registration."""
        assert hasattr(tu.tools, 'YourTool_operation1')

    def test_execution(self, tu):
        """Test via ToolUniverse (how users call it)."""
        result = tu.tools.YourTool_operation1(**{
            "operation": "operation1",
            "param": "value"
        })
        assert result["status"] == "success"

    def test_error_handling(self, tu):
        """Test missing params."""
        result = tu.tools.YourTool_operation1(**{
            "operation": "operation1"
            # Missing required param
        })
        assert result["status"] == "error"
```
Level 3: Real API Testing
```python
def test_real_api():
    """Verify actual API integration."""
    tu = ToolUniverse()
    tu.load_tools()
    result = tu.tools.YourTool_operation1(**{
        "operation": "operation1",
        "param": "real_value_from_docs"
    })
    if result["status"] == "success":
        assert "data" in result
        print("✅ Real API works")
    else:
        print(f"⚠️ API error (may be down): {result['error']}")
```
Why All Three Levels:
- Level 1: Tests implementation, catches code bugs
- Level 2: Tests registration, catches config bugs
- Level 3: Tests integration, catches API issues
Common Patterns
Error Handling Checklist
- ✅ Always set timeout (30s recommended)
- ✅ Catch specific exceptions (Timeout, ConnectionError, HTTPError)
- ✅ Return error dicts, never raise in run()
- ✅ Include helpful context in error messages
- ✅ Handle JSON parsing errors
- ✅ Validate required parameters
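The "never raise" rule can be enforced wholesale with a small decorator. This is an illustrative pattern, not a ToolUniverse API:

```python
import functools

def never_raise(fn):
    """Convert any exception from a handler into a standard error dict."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        try:
            return fn(*args, **kwargs)
        except Exception as e:
            return {"status": "error", "error": str(e)}
    return wrapper

@never_raise
def flaky_handler(arguments):
    raise ValueError("boom")

flaky_handler({})  # → {"status": "error", "error": "boom"}
```

Wrapping each operation handler this way keeps unexpected failures inside the standard response format without a try/except in every method.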
Dependency Management
Check package size FIRST:
```bash
curl -s https://pypi.org/pypi/PACKAGE/json | python3 -c "
import json, sys
data = json.load(sys.stdin)
print(f'Dependencies: {len(data[\"info\"][\"requires_dist\"] or [])}')
"
```
Classification:
- Core (<100MB, universal use) → `[project.dependencies]`
- Optional (>100MB or niche) → `[project.optional-dependencies]`
In code:
```python
try:
    import optional_package
except ImportError:
    return {
        "status": "error",
        "error": "Install with: pip install optional_package"
    }
```
Pagination Pattern
```python
def _list_items(self, arguments):
    params = {}
    if "page" in arguments:
        params["page"] = arguments["page"]
    if "limit" in arguments:
        params["limit"] = arguments["limit"]
    response = requests.get(url, params=params, timeout=30)
    data = response.json()
    return {
        "status": "success",
        "data": data.get("items", []),
        "page": data.get("page", 0),
        "total_pages": data.get("total_pages", 1),
        "total_items": data.get("total", 0)
    }
```
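A caller can walk every page using the fields returned above; `list_items` here stands in for any tool following that response shape:

```python
def fetch_all(list_items, limit=100):
    """Yield items from every page of a paginated tool response."""
    page = 0
    while True:
        result = list_items({"page": page, "limit": limit})
        if result["status"] != "success":
            break
        yield from result["data"]
        page += 1
        if page >= result.get("total_pages", 1):
            break
```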
Troubleshooting
Tool Doesn’t Load (90% of Issues)
Symptoms: Tool count doesn’t increase, no error, AttributeError when calling
Cause: Missing Step 2 of registration (default_config.py)
Solution:
# Edit src/tooluniverse/default_config.py
```python
# Edit src/tooluniverse/default_config.py
TOOLS_CONFIGS = {
    # ... existing ...
    "your_category": os.path.join(current_dir, "data", "your_category_tools.json"),
}
```
Verify:
```bash
grep "your_category" src/tooluniverse/default_config.py
ls src/tooluniverse/tools/YourCategory_*.py
python3 -c "from tooluniverse import ToolUniverse; tu = ToolUniverse(); tu.load_tools(); print(hasattr(tu.tools, 'YourCategory_op1'))"
```
Tests Fail with Real APIs
Mock vs Real Testing:
- Mocks test code structure
- Real calls test API integration
- Both needed for confidence
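A mock-only test exercises code structure, nothing more. One way to keep the mock explicit is to inject the HTTP call (a simplified sketch; the real tool methods call `requests` directly):

```python
from unittest.mock import MagicMock

def list_items(http_get):
    """Simplified handler that takes its HTTP function as a dependency."""
    resp = http_get("https://api.example.com/v1/items")
    data = resp.json()
    return {"status": "success",
            "data": data.get("items", []),
            "total": data.get("total", 0)}

fake_resp = MagicMock()
fake_resp.json.return_value = {"items": [{"id": 1}], "total": 1}
result = list_items(lambda url: fake_resp)
# result == {"status": "success", "data": [{"id": 1}], "total": 1}
```

A passing mock like this says nothing about the real API's response shape, which is exactly what Level 3 testing catches.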
What Real Testing Catches:
- Response structure differences
- Parameter name mismatches
- Unexpected pagination
- Timeout issues
- Data type surprises
Reference
Complete Workflow
1. Create Python class with `@register_tool`
2. Create JSON config with realistic test_examples and `return_schema`
3. Add to `default_config.py` ← CRITICAL
4. Generate wrappers: `tu.load_tools()`
5. Test Level 1 (direct class)
6. Test Level 2 (ToolUniverse interface)
7. Test Level 3 (real API calls)
8. Run `python scripts/test_new_tools.py your_tool -v` ← MANDATORY
9. Run `python scripts/check_tool_name_lengths.py --test-shortening`
10. Create examples file
11. Verify all 3 registration steps
12. Document in verification report
For the full development checklist and maintenance phases, see references/implementation-guide.md
Quick Commands
```bash
# Validate JSON
python3 -m json.tool src/tooluniverse/data/your_tools.json

# Check Python syntax
python3 -m py_compile src/tooluniverse/your_tool.py

# Verify registration
grep "your_category" src/tooluniverse/default_config.py

# Generate wrappers
PYTHONPATH=src python3 -m tooluniverse.generate_tools --force

# List wrappers
ls src/tooluniverse/tools/YourCategory_*.py

# Run unit tests
pytest tests/unit/test_your_tool.py -v

# MANDATORY: Run test_new_tools.py validation
python scripts/test_new_tools.py your_tool -v

# Count tools
python3 << 'EOF'
from tooluniverse import ToolUniverse
tu = ToolUniverse()
tu.load_tools()
print(f"Total: {len([t for t in dir(tu.tools) if 'YourCategory' in t])} tools")
EOF
```
Critical Reminders
⚠️ ALWAYS add to default_config.py (Step 2)
⚠️ NEVER raise exceptions in run()
⚠️ ALWAYS use real test_examples
⚠️ ALWAYS test both levels
⚠️ RUN python scripts/test_new_tools.py your_tool -v before submitting
⚠️ KEEP tool names ≤55 characters
⚠️ RETURN standard response format
⚠️ SET timeout on all HTTP requests
⚠️ VERIFY all 3 registration steps
⚠️ USE optional_api_keys for rate-limited APIs that work without keys
⚠️ NEVER add api_key parameter for optional keys – use env vars only
Success Criteria
✅ All 3 registration steps verified
✅ Level 1 tests passing (direct class)
✅ Level 2 tests passing (ToolUniverse interface)
✅ Real API calls working (Level 3)
✅ test_new_tools.py passes with 0 failures ← MANDATORY
✅ Tool names ≤55 characters
✅ test_examples use real IDs
✅ Standard response format used
✅ Helpful error messages
✅ Examples file created
✅ No raised exceptions in run()

When all criteria met → Production Ready 🎉