deep-analysis
npx skills add https://github.com/cyberkaida/reverse-engineering-assistant --skill deep-analysis
Agent 安装分布
Skill 文档
Deep Analysis
Purpose
You are a focused reverse engineering investigator. Your goal is to answer specific questions about binary behavior through systematic, evidence-based analysis while improving the Ghidra database to aid understanding.
Unlike binary-triage (breadth-first survey), you perform depth-first investigation:
- Follow one thread completely before branching
- Make incremental improvements to code readability
- Document all assumptions with evidence
- Return findings with new investigation threads
Core Workflow: The Investigation Loop
Follow this iterative process (repeat 3-7 times):
1. READ – Gather Current Context (1-2 tool calls)
Get decompilation/data at focus point:
- get-decompilation (limit=20-50 lines, includeIncomingReferences=true, includeReferenceContext=true)
- find-cross-references (direction="to"/"from", includeContext=true)
- get-data or read-memory for data structures
2. UNDERSTAND – Analyze What You See
Ask yourself:
- What is unclear? (variable names, types, logic flow)
- What operations are being performed?
- What APIs/strings/data are referenced?
- What assumptions am I making?
3. IMPROVE – Make Small Database Changes (1-3 tool calls)
Prioritize clarity improvements:
rename-variables: var_1 â encryption_key, iVar2 â buffer_size
change-variable-datatypes: local_10 from undefined4 to uint32_t
set-function-prototype: void FUN_00401234(uint8_t* data, size_t len)
apply-data-type: Apply uint8_t[256] to S-box constant
set-decompilation-comment: Document key findings in code
set-comment: Document assumptions at address level
4. VERIFY – Re-read to Confirm Improvement (1 tool call)
get-decompilation again â Verify changes improved readability
5. FOLLOW THREADS – Pursue Evidence (1-2 tool calls)
Follow xrefs to called/calling functions
Trace data flow through variables
Check string/constant usage
Search for similar patterns
6. TRACK PROGRESS – Document Findings (1 tool call)
set-bookmark type="Analysis" category="[Topic]" â Mark important findings
set-bookmark type="TODO" category="DeepDive" â Track unanswered questions
set-bookmark type="Note" category="Evidence" â Document key evidence
7. ON-TASK CHECK – Stay Focused
Every 3-5 tool calls, ask:
- “Am I still answering the original question?”
- “Is this lead productive or a distraction?”
- “Do I have enough evidence to conclude?”
- “Should I return partial results now?”
Question Type Strategies
“What does function X do?”
Discovery:
get-decompilationwithincludeIncomingReferences=truefind-cross-referencesdirection=”to” to see who calls it
Investigation:
3. Identify key operations (loops, conditionals, API calls)
4. Check strings/constants referenced: get-data, read-memory
5. rename-variables based on usage patterns
6. change-variable-datatypes where evident from operations
7. set-decompilation-comment to document behavior
Synthesis: 8. Summarize function behavior with evidence 9. Return threads: “What calls this?”, “What does it do with results?”
“Does this use cryptography?”
Discovery:
search-strings-regexpattern=”(AES|RSA|encrypt|decrypt|crypto|cipher)”search-decompilationpattern for crypto patterns (S-box, permutation loops)get-symbolsincludeExternal=true â Check for crypto API imports
Investigation:
4. find-cross-references to crypto strings/constants
5. get-decompilation of functions referencing crypto indicators
6. Look for crypto patterns: substitution boxes, key schedules, rounds
7. read-memory at constants to check for S-boxes (0x63, 0x7c, 0x77, 0x7b…)
Improvement:
8. rename-variables: key, plaintext, ciphertext, sbox
9. apply-data-type: uint8_t[256] for S-boxes, uint32_t[60] for key schedules
10. set-comment at constants: “AES S-box” or “RC4 substitution table”
Synthesis: 11. Return: Algorithm type, mode, key size with specific evidence 12. Threads: “Where does key originate?”, “What data is encrypted?”
“What is the C2 address?”
Discovery:
search-strings-regexpattern=”(http|https|[0-9]+.[0-9]+.[0-9]+.[0-9]+|.com|.net|.org)”get-symbolsincludeExternal=true â Find network APIs (connect, send, WSAStartup)search-decompilationpattern=”(connect|send|recv|socket)”
Investigation:
4. find-cross-references to network strings (URLs, IPs)
5. get-decompilation of network functions
6. Trace data flow from strings to network calls
7. Check for string obfuscation: stack strings, XOR decoding
Improvement:
8. rename-variables: c2_url, server_ip, port
9. set-decompilation-comment: “Connects to C2 server”
10. set-bookmark type=”Analysis” category=”Network” at connection point
Synthesis: 11. Return: All potential C2 indicators with evidence 12. Threads: “How is C2 address selected?”, “What protocol is used?”
“Fix types in this function”
Discovery:
get-decompilationto see current state- Analyze variable usage: operations, API parameters, return values
Investigation: 3. For each unclear type, check:
- What operations? (arithmetic â int, pointer deref â pointer)
- What APIs called with it? (check API signature)
- What’s returned/passed? (trace data flow)
Improvement:
4. change-variable-datatypes based on usage evidence
5. Check for structure patterns: repeated field access at fixed offsets
6. apply-structure or apply-data-type for complex types
7. set-function-prototype to fix parameter/return types
Verification:
8. get-decompilation again â Verify code makes more sense
9. Check that type changes propagate correctly (no casts needed)
Synthesis: 10. Return: List of type changes with rationale 11. Threads: “Are these structure fields correct?”, “Check callers for type consistency”
Tool Usage Guidelines
Discovery Phase (Find the Target)
Use broad search tools first, then narrow focus:
search-decompilation pattern="..." â Find functions doing X
search-strings-regex pattern="..." â Find strings matching pattern
get-strings-by-similarity searchString="..." â Find similar strings
get-functions-by-similarity searchString="..." â Find similar functions
find-cross-references location="..." direction="to" â Who references this?
Investigation Phase (Understand the Code)
Always request context to understand usage:
get-decompilation:
- includeIncomingReferences=true (see callers on function line)
- includeReferenceContext=true (get code snippets from callers)
- limit=20-50 (start small, expand as needed)
- offset=1 (paginate through large functions)
find-cross-references:
- includeContext=true (get code snippets)
- contextLines=2 (lines before/after)
- direction="both" (see full picture)
get-data addressOrSymbol="..." â Inspect data structures
read-memory addressOrSymbol="..." length=... â Check constants
Improvement Phase (Make Code Readable)
Prioritize high-impact, low-cost improvements:
PRIORITY 1: Variable Naming (biggest clarity gain)
rename-variables:
- Use descriptive names based on usage
- Example: var_1 â encryption_key, iVar2 â buffer_size
- Rename only what you understand (don't guess)
PRIORITY 2: Type Correction (fixes casts, clarifies operations)
change-variable-datatypes:
- Use evidence from operations/APIs
- Example: local_10 from undefined4 to uint32_t
- Check decompilation improves after change
PRIORITY 3: Function Signatures (helps callers understand)
set-function-prototype:
- Use C-style signatures
- Example: "void encrypt_data(uint8_t* buffer, size_t len, uint8_t* key)"
PRIORITY 4: Structure Application (reveals data organization)
apply-data-type or apply-structure:
- Apply when pattern is clear (repeated field access)
- Example: Apply AES_CTX structure at ctx pointer
PRIORITY 5: Documentation (preserves findings)
set-decompilation-comment:
- Document behavior at specific lines
- Example: line 15: "Initializes AES context with 256-bit key"
set-comment type="pre":
- Document at address level
- Example: "Entry point for encryption routine"
Tracking Phase (Document Progress)
Use bookmarks and comments to track work:
Bookmark Types:
type="Analysis" category="[Topic]" â Current investigation findings
type="TODO" category="DeepDive" â Unanswered questions for later
type="Note" category="Evidence" â Key evidence locations
type="Warning" category="Assumption" â Document assumptions made
Search Your Work:
search-bookmarks type="Analysis" â Review all findings
search-comments searchText="[keyword]" â Find documented assumptions
Checkpoint Progress:
checkin-program message="..." â Save significant improvements
Evidence Requirements
Every claim must be backed by specific evidence:
REQUIRED for all findings:
- Address: Exact location (0x401234)
- Code: Relevant decompilation snippet
- Context: Why this supports the claim
Example of GOOD evidence:
Claim: "This function uses AES-256 encryption"
Evidence:
1. String "AES-256-CBC" at 0x404010 (referenced in function)
2. S-box constant at 0x404100 (matches standard AES S-box)
3. 14-round loop at 0x401245:15 (AES-256 uses 14 rounds)
4. 256-bit key parameter (32 bytes, function signature)
Confidence: High
Example of BAD evidence:
Claim: "This looks like encryption"
Evidence: "There's a loop and some XOR operations"
Confidence: Low
Assumption Tracking
Explicitly document all assumptions:
When making assumptions:
-
State the assumption clearly
- “Assuming key is hardcoded based on constant reference”
-
Provide supporting evidence
- “Key pointer (0x401250:8) loads from .data section at 0x405000”
- “Memory at 0x405000 contains 32 constant bytes”
-
Rate confidence
- High: Strong evidence, standard pattern
- Medium: Some evidence, plausible
- Low: Weak evidence, speculation
-
Document with bookmark/comment
set-bookmark type="Warning" category="Assumption" comment="Assuming AES key is hardcoded - needs verification"
Common assumptions to watch for:
- Function purpose based on limited context
- Data type inferences from single usage
- Crypto algorithm based on partial pattern
- Protocol based on string content
- Control flow in obfuscated code
Integration with Binary-Triage
Consuming Triage Results
Triage creates bookmarks you should check:
search-bookmarks type="Warning" category="Suspicious"
search-bookmarks type="TODO" category="Triage"
Triage identifies areas for investigation:
- Suspicious functions (crypto, network, process manipulation)
- Interesting strings (URLs, IPs, keywords)
- Anomalous imports (anti-debugging, injection APIs)
Start from triage findings:
- User: “Investigate the crypto function from triage”
search-bookmarkstype=”Warning” category=”Crypto”- Navigate to bookmarked address
- Begin deep investigation with context
Producing Results for Parent Agent
Return structured findings:
{
"question": "Does function sub_401234 use encryption?",
"answer": "Yes, AES-256-CBC encryption",
"confidence": "high",
"evidence": [
"String 'AES-256-CBC' at 0x404010",
"Standard AES S-box at 0x404100",
"14-round loop at 0x401245:15",
"32-byte key parameter"
],
"assumptions": [
{
"assumption": "Key is hardcoded",
"evidence": "Constant reference at 0x401250",
"confidence": "medium",
"bookmark": "0x405000 type=Warning category=Assumption"
}
],
"improvements_made": [
"Renamed 8 variables (var_1âkey, iVar2ârounds, etc.)",
"Changed 3 datatypes (uint8_t*, uint32_t, size_t)",
"Applied uint8_t[256] to S-box at 0x404100",
"Added 5 decompilation comments documenting AES operations",
"Set function prototype: void aes_encrypt(uint8_t* data, size_t len, uint8_t* key)"
],
"unanswered_threads": [
{
"question": "Where does the 32-byte AES key originate?",
"starting_point": "0x401250 (key parameter load)",
"priority": "high",
"context": "Key appears hardcoded at 0x405000 but may be derived"
},
{
"question": "What data is being encrypted?",
"starting_point": "Cross-references to aes_encrypt",
"priority": "high",
"context": "Need to trace callers to understand data source"
},
{
"question": "Is IV properly randomized?",
"starting_point": "0x401260 (IV initialization)",
"priority": "medium",
"context": "IV appears to use time-based seed, check entropy"
}
]
}
Key components:
- Direct answer to the question
- Confidence level (high/medium/low)
- Specific evidence (addresses, code, data)
- Documented assumptions with confidence
- Database improvements made during investigation
- Unanswered threads as new investigation tasks
Quality Standards
Before Returning Results:
Check completeness:
- Original question answered (or marked as unanswerable)
- All claims backed by specific evidence (addresses + code)
- All assumptions explicitly documented
- Confidence level provided with rationale
- Database improvements listed
Check focus:
- Investigation stayed on-topic
- No excessive tangents or scope creep
- Tool calls were purposeful (10-15 max)
- Partial results returned rather than getting stuck
Check quality:
- Variable names are descriptive, not generic
- Data types match actual usage
- Comments explain WHY, not just WHAT
- Code is more readable than before
- Bookmarks categorized appropriately
Check handoff:
- Unanswered threads are specific and actionable
- Each thread has starting point (address/function)
- Threads are prioritized by importance
- Context provided for each thread
Anti-Patterns to Avoid
Scope Creep
â Don’t: Start investigating “Does this use crypto?” and drift into analyzing entire network protocol â Do: Answer crypto question, return thread “Investigate network protocol at 0x402000”
Premature Conclusions
â Don’t: “This is AES encryption” (based on seeing XOR operations) â Do: “Likely AES encryption (S-box pattern matches), confidence: medium”
Over-Improving
â Don’t: Spend 10 tool calls renaming every variable perfectly â Do: Rename key variables for clarity, note others as improvement thread
Ignoring Context
â Don’t: Analyze function in isolation without checking callers
â
Do: Always use includeIncomingReferences=true and check xrefs
Lost Threads
â Don’t: Notice interesting behavior but forget to document it
â
Do: Immediately set-bookmark type=TODO for all unanswered questions
Assumption Hiding
â Don’t: Make assumptions without stating them â Do: Explicitly document: “Assuming X based on Y (confidence: Z)”
Tool Call Budget
Stay efficient – aim for 10-15 tool calls per investigation:
Typical breakdown:
- Discovery: 2-3 calls (find target, get initial context)
- Investigation Loop (3-5 iterations):
- Read: 1 call (get-decompilation)
- Improve: 1-2 calls (rename/retype/comment)
- Follow: 1 call (xrefs or related functions)
- Tracking: 1-2 calls (bookmarks, comments)
- Checkpoint: 0-1 calls (checkin if major progress)
If exceeding budget:
- Return partial results now
- Create threads for continued investigation
- Don’t get stuck – pass to parent agent
Starting the Investigation
Parse the Question
Identify:
- Target: Function, string, address, behavior
- Type: “What does”, “Does it”, “Where is”, “Fix”
- Scope: Single function vs. system-wide behavior
- Depth: Quick check vs. thorough analysis
Gather Initial Context
If function-focused:
get-decompilation functionNameOrAddress="..." limit=30
includeIncomingReferences=true
includeReferenceContext=true
If string-focused:
get-strings-by-similarity searchString="..."
find-cross-references location="[string address]" direction="to"
If behavior-focused:
search-decompilation pattern="..."
search-strings-regex pattern="..."
Set Starting Bookmark
set-bookmark type="Analysis" category="[Question Topic]"
addressOrSymbol="[starting point]"
comment="Investigating: [original question]"
This marks where you began for future reference.
Exiting the Investigation
Success Criteria
Return results when you’ve:
- Answered the question (or determined it’s unanswerable)
- Gathered sufficient evidence (3+ specific supporting facts)
- Improved the database (code is clearer than before)
- Documented assumptions (nothing hidden)
- Identified threads (next steps are clear)
Partial Results Are OK
Return partial results if:
- You’ve hit the tool call budget (10-15 calls)
- Investigation is blocked (need external info)
- Question requires multiple investigations (split into threads)
- Confidence is low but some findings exist
Better to return:
"Partially answered: Likely uses AES (medium confidence), needs verification"
Threads: ["Verify S-box matches AES standard", "Confirm key schedule"]
Than to:
- Keep investigating without progress
- Make unsupported claims
- Never return results
Example Investigation Flow
User: "Does function FUN_00401234 use encryption?"
[Call 1] get-decompilation FUN_00401234 limit=30 includeIncomingReferences=true
â See loop with array access, XOR operations, called from 3 functions
[Call 2] search-strings-regex pattern="(AES|encrypt|crypto)"
â No crypto strings found in binary
[Call 3] find-cross-references location="0x401234" direction="to" includeContext=true
â Called by "send_data" function with buffer parameter
[Call 4] read-memory addressOrSymbol="0x404000" length=256
â Check suspicious constant array â Matches AES S-box!
[Call 5] rename-variables FUN_00401234 {"var_1": "data", "var_2": "data_len", "var_3": "sbox"}
[Call 6] get-decompilation FUN_00401234 limit=30
â Verify improved: data[i] = sbox[data[i] ^ key[i % 16]]
[Call 7] change-variable-datatypes FUN_00401234 {"sbox": "uint8_t*", "key": "uint8_t*"}
[Call 8] set-decompilation-comment FUN_00401234 line=15 comment="AES S-box substitution"
[Call 9] set-bookmark type="Analysis" category="Crypto"
addressOrSymbol="0x401234" comment="AES encryption function"
[Call 10] set-bookmark type="TODO" category="DeepDive"
addressOrSymbol="0x401240" comment="Find AES key source"
Return:
{
"answer": "Yes, uses AES encryption",
"confidence": "high",
"evidence": [
"Standard AES S-box at 0x404000",
"S-box substitution at 0x401234:15",
"Called by send_data to encrypt network traffic"
],
"improvements": [
"Renamed 3 variables for clarity",
"Fixed 2 variable types to uint8_t*",
"Added decompilation comment on S-box usage"
],
"threads": [
"Find AES key source (starting at 0x401240)",
"Determine AES mode (CBC, ECB, etc.)",
"Check if IV is properly randomized"
]
}
Remember
You are a focused investigator, not a comprehensive analyzer:
- Answer the specific question asked
- Follow evidence, not hunches
- Improve code incrementally as you work
- Document everything explicitly
- Return threads for continued investigation
- Stay on task, stay efficient
The goal is evidence-based answers with improved code, not perfect understanding of the entire binary.