# mixseek-evaluator-config

    npx skills add https://github.com/drillan/mixseek-plus --skill mixseek-evaluator-config
# MixSeek Evaluation Config Generation

## Overview

Generates MixSeek-Core's evaluation config file (evaluator.toml) and judgment config file (judgment.toml). These files define the evaluation criteria for Submissions in TUMIX tournaments, the scoring method, and the final judgment logic.
## Prerequisites

- The workspace has been initialized (see mixseek-workspace-init)
- The environment variable MIXSEEK_WORKSPACE is set (recommended)
## Generated Files

| File | Purpose | Location |
|---|---|---|
| evaluator.toml | Submission scoring configuration | configs/evaluators/ |
| judgment.toml | Final judgment configuration | configs/judgment/ |
## Usage

### Step 1: Gather Requirements

Confirm the following with the user:

- Evaluation focus: what to prioritize in evaluation (clarity, coverage, relevance, etc.)
- Weighting: the importance of each metric (equal or custom)
- Judgment style: deterministic (temperature=0) or diversity-oriented
### Step 2: Propose a Metric Configuration

Choose from the standard metrics:

| Metric | Description | Use Case |
|---|---|---|
| ClarityCoherence | Clarity and coherence | Tasks that prioritize readability |
| Coverage | Coverage | Tasks that prioritize comprehensiveness |
| LLMPlain | General-purpose LLM evaluation | Tasks that need custom evaluation criteria |
| Relevance | Relevance | Tasks that prioritize precision |
### Step 3: Generate the Config Files

evaluator.toml:

    default_model = "google-gla:gemini-2.5-pro"
    temperature = 0.0

    [[metrics]]
    name = "ClarityCoherence"
    weight = 0.34

    [[metrics]]
    name = "Coverage"
    weight = 0.33

    [[metrics]]
    name = "Relevance"
    weight = 0.33

judgment.toml:

    model = "google-gla:gemini-2.5-pro"
    temperature = 0.0
    timeout_seconds = 60
### Step 4: Save the Files

    $MIXSEEK_WORKSPACE/configs/evaluators/evaluator.toml
    $MIXSEEK_WORKSPACE/configs/judgment/judgment.toml

Important: when using the custom paths (configs/evaluators/, configs/judgment/), you must specify them explicitly in orchestrator.toml. Otherwise the default paths (configs/evaluator.toml, configs/judgment.toml) are searched and your configuration is not picked up.

    # orchestrator.toml
    [orchestrator]
    evaluator_config = "configs/evaluators/evaluator.toml"
    judgment_config = "configs/judgment/judgment.toml"
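The lookup behavior described above can be sketched as follows. This is an illustrative assumption about the resolution order, not MixSeek's actual implementation, and `resolve_config` is a hypothetical helper name:

```python
from pathlib import Path
from typing import Optional

# Hypothetical sketch of the assumed config lookup order: an explicit
# *_config path from orchestrator.toml always wins; otherwise only the
# default location is searched, so a file saved under configs/evaluators/
# is silently ignored unless the path is set explicitly.
def resolve_config(workspace: Path, explicit: Optional[str], default: str) -> Optional[Path]:
    if explicit is not None:
        return workspace / explicit      # explicit path from orchestrator.toml
    candidate = workspace / default      # fall back to the default path
    return candidate if candidate.exists() else None
```

This is why Step 4 stresses setting `evaluator_config` and `judgment_config`: without them, the files in the custom directories are never found.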
### Step 5: Validate the Config Files (Required)

Always run validation after generating the files.

    # Validate the evaluator config
    uv run python skills/mixseek-config-validate/scripts/validate-config.py \
        $MIXSEEK_WORKSPACE/configs/evaluators/evaluator.toml --type evaluator

    # Validate the judgment config
    uv run python skills/mixseek-config-validate/scripts/validate-config.py \
        $MIXSEEK_WORKSPACE/configs/judgment/judgment.toml --type judgment

If validation succeeds, report the results to the user. If it fails, inspect the error message and fix the configuration.
## Standard Metrics in Detail

### ClarityCoherence (Clarity and Coherence)

Evaluates the readability and logical consistency of an answer.

Evaluation criteria:

- Clarity of the document structure
- Logical flow
- Appropriate use of technical terms
- Clarity of the conclusion

Recommended for:

- Generating explanatory text
- Report writing
- Educational content
### Coverage

Evaluates how comprehensively an answer addresses the question.

Evaluation criteria:

- Coverage of all aspects of the question
- Inclusion of related topics
- Richness of examples
- Presence of supplementary information

Recommended for:

- Research tasks
- FAQ writing
- Technical documentation
### Relevance

Evaluates how precisely an answer addresses the question.

Evaluation criteria:

- Direct response to the question
- Exclusion of irrelevant information
- Maintaining focus
- Fit to the context

Recommended for:

- Q&A
- Customer support
- Evaluating search results
### LLMPlain (General-Purpose LLM Evaluation)

The LLM evaluates answers against custom criteria defined in system_instruction.

Characteristics:

- Has no predefined evaluation logic
- Fully customizable via system_instruction
- Use when special evaluation criteria are required

Recommended for:

- Domain-specific evaluation (legal, medical, etc.)
- Project-specific quality standards
- Criteria not covered by the other metrics

Configuration example:

    [[metrics]]
    name = "LLMPlain"
    weight = 0.5
    system_instruction = """
    Evaluate the answer from a security perspective:
    1. Risk of leaking confidential information
    2. Secure coding practices
    3. Presence of vulnerabilities
    Score from 0 to 100.
    """
## Examples

### Equal Weighting

User: Create an evaluation configuration

Agent: Here is a proposed evaluation configuration.

Metrics (equal weighting):

- ClarityCoherence: 33.4%
- Coverage: 33.3%
- Relevance: 33.3%

Does this configuration look good?

User: Yes

Agent: Generated the config files:

- ✓ configs/evaluators/evaluator.toml
- ✓ configs/judgment/judgment.toml
### Custom Weighting

User: Create an evaluation configuration that prioritizes accuracy

Agent: Here is a proposed configuration that prioritizes relevance (Relevance).

Metrics:

- Relevance: 50% (prioritized)
- ClarityCoherence: 30%
- Coverage: 20%

Does this configuration look good?

User: Yes
### Example Generated Config Files

evaluator.toml (custom weighting):

    # MixSeek Evaluator Configuration
    # Generated by mixseek-evaluator-config skill
    default_model = "google-gla:gemini-2.5-pro"
    temperature = 0.0
    timeout_seconds = 300
    max_retries = 3

    [[metrics]]
    name = "Relevance"
    weight = 0.5

    [[metrics]]
    name = "ClarityCoherence"
    weight = 0.3

    [[metrics]]
    name = "Coverage"
    weight = 0.2

judgment.toml:

    # MixSeek Judgment Configuration
    # Generated by mixseek-evaluator-config skill
    model = "google-gla:gemini-2.5-pro"
    temperature = 0.0
    timeout_seconds = 60
    max_retries = 3
## Weighting Rules

Weighting follows these rules:

- All or none: you cannot specify weights for only some of the metrics
- Sum to 1.0: the weights must sum to 1.0 (±0.001)
- Equal when omitted: if weights are omitted, they are distributed equally

Examples:

    # Valid: all specified
    [[metrics]]
    name = "ClarityCoherence"
    weight = 0.5

    [[metrics]]
    name = "Coverage"
    weight = 0.5

    # Valid: all omitted (equal distribution)
    [[metrics]]
    name = "ClarityCoherence"

    [[metrics]]
    name = "Coverage"

    # Invalid: only partially specified
    [[metrics]]
    name = "ClarityCoherence"
    weight = 0.5  # weight specified

    [[metrics]]
    name = "Coverage"
    # weight omitted
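The rules above can be sketched as a small validation helper. This is hypothetical code, not part of MixSeek, shown only to make the all-or-none, sum-to-1.0, and equal-distribution rules concrete:

```python
# Hypothetical helper illustrating the weighting rules above (not MixSeek code).
def resolve_weights(metrics: list) -> list:
    """Return the effective weight of each [[metrics]] entry."""
    given = [m.get("weight") for m in metrics]
    if all(w is None for w in given):
        # All omitted: distribute equally
        return [1.0 / len(metrics)] * len(metrics)
    if any(w is None for w in given):
        # Partial specification is invalid
        raise ValueError("specify weights for all metrics or for none")
    if abs(sum(given) - 1.0) > 0.001:
        raise ValueError(f"weights must sum to 1.0, got {sum(given)}")
    return given
```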
## Troubleshooting

### Weight Sum Error

    Error: Weights must sum to 1.0

Solutions:

- Adjust the weights so they sum to 1.0
- Or omit all weights to get equal distribution

### Metric Name Error

    Error: Unknown metric name

Solutions:

- Use a valid metric name: ClarityCoherence, Coverage, LLMPlain, Relevance
- Note that names are case-sensitive

### Unstable Judgments

Solutions:

- Set temperature to 0.0 in judgment.toml (deterministic)
- Set seed to a fixed value
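For example, a judgment.toml tuned for stable, repeatable judgments might look like the following. Whether a `seed` key is accepted depends on the judgment schema, so verify with the Step 5 validation before relying on it:

```toml
# judgment.toml tuned for deterministic judgments.
# NOTE: the seed key is an assumption here; confirm it against the
# judgment schema (references/TOML-SCHEMA.md) before relying on it.
model = "google-gla:gemini-2.5-pro"
temperature = 0.0    # deterministic sampling
seed = 42            # fixed seed for reproducibility
timeout_seconds = 60
```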
## References

- TOML schema details: references/TOML-SCHEMA.md
- Standard metrics: references/METRICS.md
- Orchestrator configuration: skills/mixseek-orchestrator-config/