# xhs

```shell
npx skills add https://github.com/cryinglee/openclaw-skill-xhs --skill xhs
```
## 小红书 (Xiaohongshu) Research

Research tool for Chinese user-generated content: travel, food, lifestyle, local discoveries.
## When to Use
- Travel planning and itineraries
- Restaurant/cafe/bar recommendations
- Activity and weekend planning
- Product reviews and comparisons
- Local discovery and hidden gems
- Any question where Chinese perspectives help
## Recommended Model

When spawning as a sub-agent: Sonnet 4.5 (model: "claude-sonnet-4-5-20250929")

- Fast enough for the slow XHS API calls
- Good at Chinese content understanding
- More cost-effective than Opus for research grunt work
- Opus is overkill for a search → synthesize workflow
## Context Management (Always Use)

ALWAYS use dynamic context monitoring: even 5 posts with images can hit 75-300k tokens.
### The Problem
- Each post with images = 15-60k tokens
- 200k context fills fast
- Context is append-only (can’t “forget” within session)
### The Solution: Monitor + Checkpoint + Continue

1. After EACH post, do two things:

   a) Write findings to disk immediately:
      `/research/{task-id}/findings/post-{n}.md`

   b) Check context usage:
      `session_status` → look for "Context: XXXk/200k (YY%)"

2. When context hits 70%, STOP and checkpoint:

   Write state file `/research/{task-id}/state.json`:

   ```json
   {
     "processed": 15,
     "pendingUrls": ["url16", "url17", ...],
     "summaries": ["Post 1: 火塘...", ...]
   }
   ```

   Return to caller:

   ```json
   {
     "complete": false,
     "processed": 15,
     "remaining": 25,
     "statePath": "/research/{task-id}/state.json",
     "findingsDir": "/research/{task-id}/findings/"
   }
   ```

3. Caller spawns a fresh sub-agent to continue:

   ```
   spawn_subagent(
     task="Continue XHS research from /research/{task-id}/state.json",
     model="claude-sonnet-4-5-20250929"
   )
   ```
New sub-agent has fresh 200k context, reads state.json, continues from post 16.
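The monitor-checkpoint-continue loop above can be sketched in Python. This is an illustrative sketch, not the skill's actual implementation: `analyze` stands in for the fetch-and-analyze step, and `context_percent` for parsing the `session_status` output.

```python
import json
import re
from pathlib import Path

CHECKPOINT_THRESHOLD = 70  # stop at 70% to leave room for synthesis

def parse_context_percent(status_line: str) -> int:
    """Extract the usage percent from e.g. 'Context: 140k/200k (70%)'."""
    m = re.search(r"\((\d+)%\)", status_line)
    return int(m.group(1)) if m else 0

def research_loop(task_dir, urls, analyze, context_percent):
    """Process posts one by one; checkpoint to state.json at the threshold."""
    task_dir = Path(task_dir)
    (task_dir / "findings").mkdir(parents=True, exist_ok=True)
    pending = list(urls)
    summaries = []
    n = 0
    while pending:
        url = pending[0]
        n += 1
        # 1a) write findings to disk immediately (crash-safe)
        (task_dir / "findings" / f"post-{n:03d}.md").write_text(analyze(url))
        pending.pop(0)
        summaries.append(f"Post {n}: {url}")
        # 1b) check context usage; 2) checkpoint at 70% if work remains
        if context_percent() >= CHECKPOINT_THRESHOLD and pending:
            state = {"processed": n, "pendingUrls": pending, "summaries": summaries}
            (task_dir / "state.json").write_text(json.dumps(state))
            return {"complete": False, "processed": n, "remaining": len(pending),
                    "statePath": str(task_dir / "state.json")}
    return {"complete": True, "processed": n, "remaining": 0}
```

The key property is that every post is written to disk before the context check, so a crash or timeout never loses analyzed work.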
## State File Schema

```json
{
  "taskId": "kunming-food-2026-02-01",
  "query": "昆明美食",
  "searchesCompleted": ["昆明美食", "昆明美食推荐"],  // Keywords already searched
  "processedUrls": ["url1", "url2", ...],  // Explicit URL tracking (prevents duplicates)
  "pendingUrls": ["url3", "url4", ...],    // Remaining URLs to process
  "nextPostNumber": 16,                    // Next post-XXX.md number
  "summaries": [                           // 1-liner per post for final synthesis
    "Post 1: 火塘餐厅 | 🟢 | ¥80 | 本地人推荐",
    "Post 2: 野生菌火锅 | 🟢 | ¥120 | 菌子新鲜"
  ],
  "batchNumber": 1,
  "contextCheckpoint": "70%"
}
```
Critical fields for handoff:

- `processedUrls`: Prevents re-processing the same post across sub-agents
- `pendingUrls`: Exact work remaining
- `nextPostNumber`: Ensures sequential file naming
- `searchesCompleted`: Prevents duplicate searches
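A continuation sub-agent could rebuild its worklist from the checkpoint like this. A minimal sketch, assuming the state-file schema shown earlier; `load_continuation` is a hypothetical helper, not part of the skill's scripts.

```python
import json
from pathlib import Path

def load_continuation(state_path):
    """Rebuild the worklist from a checkpoint, skipping already-processed URLs."""
    state = json.loads(Path(state_path).read_text())
    processed = set(state.get("processedUrls", []))
    # pendingUrls is the exact remaining work; filter defensively anyway
    todo = [u for u in state.get("pendingUrls", []) if u not in processed]
    return {
        "todo": todo,
        "next_post_number": state.get("nextPostNumber", len(processed) + 1),
        "done_searches": set(state.get("searchesCompleted", [])),
        "summaries": state.get("summaries", []),
    }
```

Filtering `pendingUrls` against `processedUrls` guards against a checkpoint written mid-batch, where a URL might appear in both lists.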
## Workflow for Large Research

Caller should use a longer timeout:

```
sessions_spawn(
  task="...",
  model="claude-sonnet-4-5-20250929",
  runTimeoutSeconds=1800  // 30 minutes for research tasks
)
```

The default is 600s (10 min): too short for XHS research with slow API calls.
Interleave search and processing (don't collect all URLs first):

```
[XHS Sub-agent 1]
├── Check for state.json (none = fresh start)
├── Search keyword 1 → get 20 URLs
├── Process 5-10 posts immediately (writing each to disk)
├── Search keyword 2 → get more URLs (dedupe)
├── Process more posts
├── Context hits 70% → write state.json
└── Return {complete: false, remaining: N}
```

This prevents a timeout from losing all work: each post is saved as it is processed.
Full continuation pattern:

```
[Caller]
  → spawn (runTimeoutSeconds=1800)
[XHS Sub-agent 1]
├── Search + process interleaved
├── Context hits 70% → write state.json
└── Return {complete: false, remaining: 25}
[Caller sees incomplete]
  → spawn continuation (runTimeoutSeconds=1800)
[XHS Sub-agent 2]  ← fresh 200k context!
├── Read state.json (has processedUrls, pendingUrls)
├── Continue processing + more searches if needed
├── Context hits 70% → write state.json
└── Return {complete: false, remaining: 10}
[Caller sees incomplete]
  → spawn continuation
[XHS Sub-agent 3]
├── Read state.json
├── Process remaining posts
├── All done → write synthesis.md
└── Return {complete: true, synthesisPath: "..."}
```
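The caller side of this pattern is a spawn-until-complete loop. A hedged sketch: `spawn` here is a stand-in for the actual spawn tool (`sessions_spawn`/`spawn_subagent`), injected as a callable so the loop itself is plain Python.

```python
def run_research(task_id, query, spawn, max_rounds=10):
    """Spawn sub-agents until one returns complete=True (or we give up)."""
    task = f"XHS research: {query} (task-id: {task_id})"
    for round_no in range(max_rounds):
        if round_no > 0:
            # continuation rounds read the checkpoint instead of starting fresh
            task = f"Continue XHS research from /research/{task_id}/state.json"
        result = spawn(task=task, model="claude-sonnet-4-5-20250929",
                       run_timeout_seconds=1800)
        if result.get("complete"):
            return result  # synthesis.md has been written
    raise RuntimeError(f"research did not finish within {max_rounds} rounds")
```

`max_rounds` caps runaway continuation chains; each round gets a fresh sub-agent with a full context window.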
## Output Directory Structure

```
/research/{task-id}/
├── state.json          # Checkpoint for continuation
├── findings/
│   ├── post-001.md     # Full analysis + image paths
│   ├── post-002.md
│   └── ...
├── images/
│   ├── post-001/
│   │   ├── 1.jpg
│   │   └── 2.jpg
│   └── ...
├── summaries.md        # All 1-liners (for quick scan)
└── synthesis.md        # Final output (when complete)
```
## Key Rules (ALWAYS FOLLOW)

- Write after EVERY post: crash-safe, no work lost
- Check context after EVERY post: use the `session_status` tool
- Stop at 70%: leave room for synthesis + buffer
- Return a structured result: the caller decides the next step
- Read all images: they're pre-compressed (600px, q85)
- Skip videos: already marked in fetch-post

⚠️ This is not optional. Even small research tasks can overflow context with image-heavy posts.
## Scripts (Mechanical Tasks)

These scripts handle the repetitive CLI work:

| Script | Purpose |
|---|---|
| `bin/preflight` | Verify the tool is working before research |
| `bin/search "keywords" [limit] [timeout] [sort]` | Search for posts (sort: general/newest/hot) |
| `bin/get-content "url"` | Get full note content (text only) |
| `bin/get-comments "url"` | Get comments on a note |
| `bin/get-images "url" [dir]` | Download images only |
| `bin/fetch-post "url" [cache] [retries]` | Fetch content + comments + images (with retries) |

All scripts are at `/root/clawd/skills/xhs/bin/`.
### Preflight (always run first)

```shell
/root/clawd/skills/xhs/bin/preflight
```

Checks: rednote-mcp installed, cookies valid, stealth patches applied, test search. Don't proceed until preflight passes.
### Search

```shell
/root/clawd/skills/xhs/bin/search "昆明美食推荐" [limit] [timeout] [sort]
```

Returns JSON with post results.
Parameters:
| Param | Default | Description |
|---|---|---|
| keywords | (required) | Search terms in Chinese |
| limit | 10 | Max results (scroll pagination when >20) |
| timeout | 180 | Seconds before giving up |
| sort | general | Sort order (see below) |
Sort options:

| Value | XHS Label | When to use |
|---|---|---|
| general | 综合 | Default: the XHS algorithm balances relevance + engagement. Best for most research. |
| newest | 最新 | 舆情监控 (sentiment monitoring), breaking news, recent experiences, time-sensitive topics |
| hot | 最热 | Finding viral/popular posts, trending content |
Examples:

```shell
# Default sort (recommended for most research)
bin/search "昆明美食推荐" 20

# Recent posts first (舆情, current events)
bin/search "某品牌 评价" 20 180 newest

# Most popular posts
bin/search "网红打卡地" 15 180 hot
```
Scroll pagination enabled (patched): when limit > 20, the tool scrolls to load more results via XHS infinite scroll. Actual results depend on available content.

For maximum coverage, combine:

- Higher limits (e.g., `limit=50`) to scroll for more results
- Multiple keyword variations for different result sets:
  - 香蕉攀岩, 香蕉攀岩馆, 香蕉攀岩体验, 香蕉攀岩评价
  - 昆明美食, 昆明美食推荐, 昆明必吃, 昆明本地人推荐

Results vary by query: popular topics may return 30-50+, niche topics fewer.
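When interleaving multiple keyword searches, URLs must be deduped across result sets. A minimal set-based sketch (the `"url"` key is assumed from the search JSON; adjust to the actual field name):

```python
def merge_new_urls(seen, search_results):
    """Return URLs not seen in any earlier search, updating `seen` in place."""
    fresh = []
    for post in search_results:
        url = post["url"]
        if url not in seen:
            seen.add(url)
            fresh.append(url)
    return fresh
```

Passing the same `seen` set through every search round is what makes the "Search keyword 2 → get more URLs (dedupe)" step in the workflow diagram safe.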
Choosing sort order:

- Most research → `general` (default). Let XHS's algorithm surface the best content.
- 舆情监控 / sentiment tracking → `newest`. You want recent opinions, not old viral posts.
- Trend discovery → `hot`. See what's currently popular.
### Get Content

```shell
/root/clawd/skills/xhs/bin/get-content "FULL_URL_WITH_XSEC_TOKEN"
```

⚠️ Must use the full URL with xsec_token from search results.
### Get Comments

```shell
/root/clawd/skills/xhs/bin/get-comments "FULL_URL_WITH_XSEC_TOKEN"
```
### Get Images

Download all images from a post to local files:

```shell
/root/clawd/skills/xhs/bin/get-images "FULL_URL" /tmp/my-images
```
### Fetch Post (Deep Dive with Images)

Fetch content, comments, and images in one call, with built-in retries:

```shell
/root/clawd/skills/xhs/bin/fetch-post "FULL_URL" /path/to/cache [max_retries]
```

Features:

- Retries on timeout (60s → 90s → 120s)
- Clear error reporting in JSON output
- Images cached locally, bypassing CDN protection
Returns JSON:

```json
{
  "success": true,
  "postId": "abc123",
  "content": {
    "title": "...",
    "author": "...",
    "desc": "...",
    "likes": "983",
    "tags": [...],
    "postDate": "2025-09-04"  // ← Added via patch!
  },
  "comments": [{ "author": "...", "content": "...", "likes": "3" }, ...],
  "imagePaths": ["/cache/images/abc123/1.jpg", ...],
  "errors": []
}
```
Date filtering: Use postDate to filter out old posts. Skip posts older than your threshold (e.g., 6-12 months for restaurants).
Workflow:

1. fetch-post → JSON + cached images
2. Read each imagePath directly (Claude sees images natively)
3. Combine text + comments + what you see into findings

Viewing images:

```
Read("/path/to/1.jpg")  # Claude sees it directly - no special tool needed
```
Look for: visible text (addresses, prices, hours), atmosphere, food presentation, crowd levels.
## Research Methodology (Judgment Tasks)

This is where you think. Scripts do the fetching; you do the analyzing.

### Depth Levels
| Depth | Posts | When to Use |
|---|---|---|
| Minimum | 5+ | Quick checks, simple queries |
| Standard | 8-10 | Default for most research |
| Deep | 15+ | Complex topics, trip planning |
Minimum is 5, unless fewer exist. Note limited coverage if there are <5 results.
## Research Workflow

### Step 0: Preflight

Run `bin/preflight`. Don't proceed until it passes.

### Step 1: Plan Your Searches

Think: "What would a Chinese user search on 小红书?"

- Include location when relevant
- Add qualifiers: 推荐, 攻略, 测评, 探店, 打卡, 避坑
- Consider synonyms and variations
- Plan 2-3 different search angles
Date filtering:

Posts include a postDate field (e.g., "2025-09-04"). The calling agent specifies the date filter based on research type:

| Research Type | Suggested Filter | Why |
|---|---|---|
| 舆情监控 (sentiment) | 1-4 weeks | Only current discourse matters |
| Breaking news/events | 1-7 days | Time-critical |
| Travel planning | 6-12 months | Recent but reasonable window |
| Product reviews | 1-2 years | Longer product cycles |
| Trend analysis | Custom range | Compare specific periods |
| Historical/general | No limit | Want the full archive |

Caller should specify in the task description, e.g.:

- "Only posts from last 30 days" (舆情)
- "Posts from 2025 or later" (travel)
- "No date filter" (general research)

If no filter is specified: default to 12 months (a safe middle ground).

Fallback when postDate is null: use keyword hints such as 2025, 最近, 最新
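The date filter, including the null-postDate fallback, could be implemented roughly like this. A sketch only: the `title`/`desc` field names come from the fetch-post JSON above, and the hint list is illustrative.

```python
from datetime import date, timedelta

RECENT_HINTS = ("2025", "最近", "最新")  # fallback keywords when postDate is null

def passes_date_filter(post, max_age_days=365, today=None):
    """Keep a post if it's inside the window, or if undated but hinting at recency."""
    today = today or date.today()
    post_date = post.get("postDate")
    if post_date:
        y, m, d = map(int, post_date.split("-"))
        return (today - date(y, m, d)) <= timedelta(days=max_age_days)
    # postDate is null: fall back to keyword hints in title/description
    text = post.get("title", "") + post.get("desc", "")
    return any(hint in text for hint in RECENT_HINTS)
```

The default of 365 days matches the "12 months if no filter specified" rule; the caller overrides `max_age_days` per the table above.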
Language strategy:

| Location | Language | Example |
|---|---|---|
| China | Chinese | 昆明攀岩 |
| English-named venues | Both | Rock Tenet 昆明 |
| International | Chinese | 巴黎旅游 |
### Step 2: Search & Scan

Run your searches. Results are already ranked by XHS's algorithm (relevance + engagement).

Use judgment based on the preview, like a human deciding what to click.

Think: "Given my research goal, would this post likely contain useful information?"
| Research Type | What to prioritize |
|---|---|
| 舆情监控 (sentiment) | Any opinion/experience, even low engagement: complaints matter! |
| Travel planning | High engagement + detailed experiences |
| Product reviews | Mix of positive AND negative reviews |
| Trend analysis | Variety of perspectives |

| Preview Signal | Action |
|---|---|
| Relevant content in preview | ✅ Fetch |
| Matches research goal | ✅ Fetch |
| Low engagement but relevant opinion | ✅ Fetch (esp. for 舆情) |
| High engagement but off-topic | ❌ Skip |
| Official announcements only | ⚠️ Context-dependent |
| 广告/合作 (ad/sponsored) markers | ⚠️ Note as sponsored if fetching |
| Clearly off-topic | ❌ Skip |
| Duplicate content | ❌ Skip |

Key insight: for 舆情监控, a 3-like complaint post may be more valuable than a 500-like promotional post. Engagement ≠ relevance for all research types.
### Step 3: Deep Dive Each Post

For each selected post, use fetch-post to get everything:

```shell
bin/fetch-post "url_from_search" {{RESEARCH_DIR}}/xhs
```

Returns JSON with content, comments, and cached images. Has built-in retries. Then:
A. Review content

- Extract key facts from title/description
- Note the author's perspective/bias
- Check tags for categorization

B. View images (critical!)

For each imagePath in the result, just read it:

```
Read("/path/to/1.jpg")  # You see it directly
```

- Look for text overlays: addresses, prices, hours
- Note visual details: ambiance, crowd levels, food presentation

⚠️ Don't describe images in isolation. Synthesize what you see with the post content and comments to form a holistic view. An image of a crowded restaurant + the author saying "周末排队1小时" (1-hour weekend queue) + comments confirming "人超多" (super crowded) = that's your finding about crowds.

C. Review comments (gold for updates)

- "已经关门了" = already closed
- Real experiences vs sponsored hype
- Tips not in the main post

D. Return picked images

Include paths to the best/most informative images in your findings. The calling agent decides whether and how to use them (embed in reports, reference, etc.). You're curating: pick images that show something useful (venue exterior, menu with prices, actual food, atmosphere), not just decorative shots.
### Step 4: Synthesize

- What do multiple sources agree on?
- Any contradictions?
- What's the overall consensus?
- What would you actually recommend?

### Step 5: Output

Facts + Flavor: structured findings that preserve the XHS voice.
```markdown
## XHS Research: [Topic]

### Search Summary

| Search | Results | Notes |
|--------|---------|-------|
| 昆明攀岩 | 10 | Good coverage |

### Findings

#### [Venue Name] (中文名)

- **Type:** Restaurant / Activity / Attraction
- **Address:** [from post or image]
- **Price:** ¥XX/person
- **Hours:** [if found]
- **The vibe:** [atmosphere, energy, in the preserved voice]
- **Why people like it:** [opinions, impressions]
- **Watch out for:** [warnings from comments]
- **Source:** [full URL]
- **Engagement:** X likes
- **Images:** [paths for calling agent to use]
  - `/path/to/1.jpg` – exterior/entrance
  - `/path/to/3.jpg` – menu with prices

> "引用原文..." – @username

### Overall Impressions

- Consensus across posts
- Patterns in preferences
- Things only locals know
- Disagreements worth noting
```

The XHS value is the human perspective. A recommendation that says "环境一般但是味道绝了" ("ambiance is so-so but the flavor is amazing") tells you more than "Rating: 4.2/5".
Think: “What would a friend who just spent an hour on XHS tell me?”
## Quality Signals

Trustworthy:

- 100+ likes with real comments
- Detailed personal experience
- Multiple photos from an actual visit
- Specific details (prices, hours)
- Recent posts (look for date mentions in content: "上周", "昨天", "2025年X月")
- Year in title (e.g., "2025上海咖啡必喝榜")

Checking recency:

- Look for dates in the post text/title
- Check if prices seem current
- Comments asking "还在吗" or "现在还有吗" ("is it still there/open?") = might be outdated
- Comments with recent dates confirm the post is still relevant
Suspicious:

- 广告/合作/赞助 (ad/sponsored) markers
- Overly positive, no specifics
- Stock photos only
- No comments, or only generic ones
- Very old posts
## Timing & Efficiency

### XHS is SLOW: Plan Accordingly

The rednote-mcp CLI is slow (30-90s per search). Don't rapid-fire poll.

When running searches via exec:

```
# GOOD: Give it time to complete
exec(command, yieldMs: 60000)  # Wait 60s before checking
process(poll)                  # Then poll every 30s if still running
```

DON'T:

- Poll every 2-3 seconds (wastes tokens, no benefit)
- Start multiple searches simultaneously (overloads the tool)
- Wait indefinitely without writing partial results
### Write Incrementally

Don't wait until you've analyzed everything to start writing. After each batch of 3-5 posts:

- Append findings to your output file
- This protects against a timeout or termination losing all work

```markdown
## Findings (in progress)

### Batch 1: 美食搜索 (3 posts analyzed)
[findings...]

### Batch 2: 攻略搜索 (analyzing...)
```
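A minimal append-only helper in this spirit might look like the following. The function name and file layout are illustrative, not part of the skill's scripts.

```python
from pathlib import Path

def append_batch(output_path, batch_label, findings):
    """Append one batch of findings so partial work survives a timeout."""
    path = Path(output_path)
    if not path.exists():
        path.write_text("## Findings (in progress)\n")
    with path.open("a") as f:
        f.write(f"\n### {batch_label} ({len(findings)} posts analyzed)\n")
        for item in findings:
            f.write(f"- {item}\n")
```

Because each batch is a single append, a terminated run leaves a readable file ending at the last completed batch.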
### Time Budget Awareness

If you've been running 15+ minutes:

- Prioritize writing what you have
- Note incomplete searches in the output
- Better to deliver 80% of the findings than lose 100% to termination
### Retry Pattern

rednote-mcp is slow. If a command times out:

- Attempt 1: default timeout
- Attempt 2: +60s
- Attempt 3: +120s

If all fail, report the failure. Do NOT fall back to web_search: that defeats the purpose.
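The escalating-timeout schedule could be wrapped around any of the bin/ scripts like this. A sketch under stated assumptions: the command list and 180s base are taken from the search defaults above, and the helper is hypothetical.

```python
import subprocess

def run_with_retries(cmd, base_timeout=180, bumps=(0, 60, 120)):
    """Run a slow CLI command, escalating the timeout on each attempt."""
    last_error = None
    for bump in bumps:  # attempt 1: default; attempt 2: +60s; attempt 3: +120s
        try:
            return subprocess.run(cmd, capture_output=True, text=True,
                                  timeout=base_timeout + bump, check=True)
        except subprocess.TimeoutExpired as exc:
            last_error = exc
    # do NOT fall back to web_search: report the failure instead
    raise RuntimeError(f"all retries timed out: {last_error}")
```

Example: `run_with_retries(["/root/clawd/skills/xhs/bin/search", "昆明美食推荐", "20"])`.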
## Error Handling
| Error | Cause | Fix |
|---|---|---|
| Timeout | Network/XHS slow | Retry with longer timeout |
| Login/cookie error | Session expired | xvfb-run -a rednote-mcp init |
| 404 / xsec_token | Missing token | Use full URL from search |
| Empty results | No posts | Try different keywords |
## Setup & Maintenance

### First-Time Setup

```shell
npm install -g rednote-mcp
npx playwright install
/root/clawd/skills/xhs/patches/apply-all.sh
xvfb-run -a rednote-mcp init
```

### Re-login (when cookies expire)

```shell
xvfb-run -a rednote-mcp init
```

### After rednote-mcp updates

```shell
/root/clawd/skills/xhs/patches/apply-all.sh
```
## Role Clarification

This skill = a research tool that outputs structured findings.
Calling agent = synthesizes XHS + other sources into final reports, and decides which images to embed.

You return:

- Synthesized findings (text + images + comments → holistic view)
- Curated image paths (the calling agent decides how to use them)
- Preserved human voice (opinions, vibes, tips)

You don't:

- Describe images in isolation ("I see a restaurant…")
- Generate final reports (that's the caller's job)
- Decide image layout/placement
XHS is like having a Chinese-speaking friend spend an hour researching for you. They’d give you facts, but also opinions, vibes, and insider tips. That’s what you’re capturing.
Remember: research like a curious human. Explore, cross-reference, look at pictures, read comments. The "这家真的绝了" ("this place is amazing") matters as much as the address.