xhs

📁 cryinglee/openclaw-skill-xhs 📅 4 days ago

Total installs

Weekly installs

#51182

Site-wide rank

Install command

npx skills add https://github.com/cryinglee/openclaw-skill-xhs --skill xhs

Agent install distribution

replit 1

openclaw 1

Skill documentation

小红书 (Xiaohongshu, XHS) Research

Research tool for Chinese user-generated content: travel, food, lifestyle, local discoveries.

When to Use

Travel planning and itineraries
Restaurant/cafe/bar recommendations
Activity and weekend planning
Product reviews and comparisons
Local discovery and hidden gems
Any question where Chinese perspectives help

Recommended Model

When spawning as a sub-agent: Sonnet 4.5 (model: "claude-sonnet-4-5-20250929")

Fast enough for the slow XHS API calls
Good at Chinese content understanding
More cost-effective than Opus for research grunt work
Opus is overkill for a search → synthesize workflow

Context Management (Always Use)

ALWAYS use dynamic context monitoring: even 5 posts with images can hit 75-300k tokens.

The Problem

Each post with images = 15-60k tokens
200k context fills fast
Context is append-only (can’t “forget” within session)
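
As a rough worked example of the arithmetic above, the sketch below estimates how many image-heavy posts fit before a given share of the context window is used. The function name and the `cutoff_pct` parameter are illustrative only; the per-post figures are the 15-60k range quoted above.

```python
def posts_before_cutoff(tokens_per_post: int, window: int = 200_000, cutoff_pct: int = 70) -> int:
    """Number of whole posts that fit before `cutoff_pct` percent of the window is used."""
    budget = window * cutoff_pct // 100  # tokens available before a checkpoint is due
    return budget // tokens_per_post


worst = posts_before_cutoff(60_000)  # heaviest posts quoted above
best = posts_before_cutoff(15_000)   # lightest posts quoted above
print(f"worst case: {worst} posts, best case: {best} posts")
```

With a 200k window and a 70% cutoff, that is roughly 2 posts per sub-agent in the worst case and 9 in the best case, which is why checkpointing matters even for small jobs.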

The Solution: Monitor + Checkpoint + Continue

1. After EACH post, do two things:

a) Write findings to disk immediately:
   /research/{task-id}/findings/post-{n}.md

b) Check context usage:
   session_status → look for "Context: XXXk/200k (YY%)"

2. When context hits 70%, STOP and checkpoint:

Write state file:
/research/{task-id}/state.json
{
  "processed": 15,
  "pendingUrls": ["url16", "url17", ...],
  "summaries": ["Post 1: 火塘...", ...]
}

Return to caller:
{
  "complete": false,
  "processed": 15,
  "remaining": 25,
  "statePath": "/research/{task-id}/state.json",
  "findingsDir": "/research/{task-id}/findings/"
}

3. Caller spawns fresh sub-agent to continue:

sessions_spawn(
  task="Continue XHS research from /research/{task-id}/state.json",
  model="claude-sonnet-4-5-20250929"
)

New sub-agent has fresh 200k context, reads state.json, continues from post 16.
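
The monitoring step can be sketched as below. This is a minimal illustration, assuming the status text contains a line shaped like the "Context: XXXk/200k (YY%)" pattern quoted above; the actual session_status output may differ, and both function names are hypothetical.

```python
import re

# Assumed line shape, taken from the pattern quoted above: "Context: 142k/200k (71%)"
CONTEXT_RE = re.compile(r"Context:\s*(\d+(?:\.\d+)?)k/(\d+(?:\.\d+)?)k")


def context_fraction(status_text: str) -> float:
    """Return used/total context as a fraction between 0 and 1."""
    match = CONTEXT_RE.search(status_text)
    if match is None:
        raise ValueError("no 'Context: XXXk/YYYk' line found in status text")
    used, total = float(match.group(1)), float(match.group(2))
    return used / total


def should_checkpoint(status_text: str, threshold: float = 0.70) -> bool:
    """True once context usage reaches the checkpoint threshold."""
    return context_fraction(status_text) >= threshold
```

Computing the fraction from the two token counts, rather than trusting the printed percentage, keeps the check independent of how the percentage is rounded.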

State File Schema

{
  "taskId": "kunming-food-2026-02-01",
  "query": "昆明美食",
  "searchesCompleted": ["昆明美食", "昆明美食推荐"],  // Keywords already searched
  "processedUrls": ["url1", "url2", ...],             // Explicit URL tracking (prevents duplicates)
  "pendingUrls": ["url3", "url4", ...],               // Remaining URLs to process
  "nextPostNumber": 16,                                // Next post-XXX.md number
  "summaries": [                                       // 1-liner per post for final synthesis
    "Post 1: 火塘餐厅 | 🟢 | ¥80 | 本地人推荐",
    "Post 2: 野生菌火锅 | 🟢 | ¥120 | 菌子新鲜"
  ],
  "batchNumber": 1,
  "contextCheckpoint": "70%"
}

Critical fields for handoff:

processedUrls: Prevents re-processing same post across sub-agents
pendingUrls: Exact work remaining
nextPostNumber: Ensures sequential file naming
searchesCompleted: Prevents duplicate searches
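
One way to keep these handoff fields consistent is to route every update through a small state object. The sketch below is an assumption about how that could look, not the skill's actual implementation: the class and method names are hypothetical, while the field names mirror the schema above.

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class ResearchState:
    taskId: str
    query: str
    searchesCompleted: list[str] = field(default_factory=list)
    processedUrls: list[str] = field(default_factory=list)
    pendingUrls: list[str] = field(default_factory=list)
    nextPostNumber: int = 1
    summaries: list[str] = field(default_factory=list)
    batchNumber: int = 1
    contextCheckpoint: str = "70%"

    def add_search_results(self, keyword: str, urls: list[str]) -> None:
        """Queue new URLs, skipping any already processed or pending."""
        seen = set(self.processedUrls) | set(self.pendingUrls)
        for url in urls:
            if url not in seen:
                self.pendingUrls.append(url)
                seen.add(url)
        if keyword not in self.searchesCompleted:
            self.searchesCompleted.append(keyword)

    def mark_processed(self, url: str, summary: str) -> int:
        """Move a URL from pending to processed; return the post number it used."""
        self.pendingUrls.remove(url)
        self.processedUrls.append(url)
        self.summaries.append(summary)
        number = self.nextPostNumber
        self.nextPostNumber += 1
        return number

    def save(self, path: Path) -> None:
        text = json.dumps(asdict(self), ensure_ascii=False, indent=2)
        path.write_text(text, encoding="utf-8")

    @classmethod
    def load(cls, path: Path) -> "ResearchState":
        return cls(**json.loads(path.read_text(encoding="utf-8")))
```

Saving with `ensure_ascii=False` keeps Chinese keywords readable in state.json, which helps when inspecting a checkpoint by hand.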

Workflow for Large Research

Caller should use longer timeout:

sessions_spawn(
  task="...",
  model="claude-sonnet-4-5-20250929",
  runTimeoutSeconds=1800  // 30 minutes for research tasks
)

Default is 600s (10 min), which is too short for XHS research with slow API calls.

Interleave search and processing (don’t collect all URLs first):

[XHS Sub-agent 1]
    ├── Check for state.json (none = fresh start)
    ├── Search keyword 1 → get 20 URLs
    ├── Process 5-10 posts immediately (writing each to disk)
    ├── Search keyword 2 → get more URLs (dedupe)
    ├── Process more posts
    ├── Context hits 70% → write state.json
    └── Return {complete: false, remaining: N}

This prevents a timeout from losing all work: each post is saved as it is processed.

Full continuation pattern:

[Caller]
    ↓ spawn (runTimeoutSeconds=1800)
[XHS Sub-agent 1]
    ├── Search + process interleaved
    ├── Context hits 70% → write state.json
    └── Return {complete: false, remaining: 25}

[Caller sees incomplete]
    ↓ spawn continuation (runTimeoutSeconds=1800)
[XHS Sub-agent 2]  ← fresh 200k context!
    ├── Read state.json (has processedUrls, pendingUrls)
    ├── Continue processing + more searches if needed
    ├── Context hits 70% → write state.json
    └── Return {complete: false, remaining: 10}

[Caller sees incomplete]
    ↓ spawn continuation
[XHS Sub-agent 3]
    ├── Read state.json
    ├── Process remaining posts
    ├── All done → write synthesis.md
    └── Return {complete: true, synthesisPath: "..."}
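
From the caller's side, the continuation pattern above amounts to a loop over structured results. The sketch below illustrates that loop under stated assumptions: `spawn` stands in for whatever call launches a sub-agent and returns its result as a dict, and `run_until_complete` and `max_batches` are hypothetical names.

```python
from typing import Callable


def run_until_complete(spawn: Callable[[str], dict], first_task: str, max_batches: int = 10) -> dict:
    """Call `spawn` repeatedly until a result reports complete=True."""
    task = first_task
    for _ in range(max_batches):
        result = spawn(task)
        if result.get("complete"):
            return result
        # Incomplete: point the next sub-agent at the checkpoint the last one wrote.
        task = f"Continue XHS research from {result['statePath']}"
    raise RuntimeError(f"research still incomplete after {max_batches} batches")
```

The `max_batches` cap is a safety net so a sub-agent that never finishes cannot keep the caller spawning forever.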

Output Directory Structure

/research/{task-id}/
├── state.json              # Checkpoint for continuation
├── findings/
│   ├── post-001.md         # Full analysis + image paths
│   ├── post-002.md
│   └── ...
├── images/
│   ├── post-001/
│   │   ├── 1.jpg
│   │   └── 2.jpg
│   └── ...
├── summaries.md            # All 1-liners (for quick scan)
└── synthesis.md            # Final output (when complete)
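
A small helper can create this layout and produce the zero-padded file names. This is only a sketch of the structure shown above; the function names are hypothetical and the root directory is passed in rather than assumed.

```python
from pathlib import Path


def init_research_dir(root: Path, task_id: str) -> Path:
    """Create {root}/{task_id}/ with findings/ and images/ subdirectories."""
    base = root / task_id
    (base / "findings").mkdir(parents=True, exist_ok=True)
    (base / "images").mkdir(exist_ok=True)
    return base


def finding_path(base: Path, post_number: int) -> Path:
    """Path of the findings file for a post, e.g. findings/post-001.md."""
    return base / "findings" / f"post-{post_number:03d}.md"


def image_dir(base: Path, post_number: int) -> Path:
    """Directory holding the images of a post, e.g. images/post-001/."""
    return base / "images" / f"post-{post_number:03d}"
```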

Key Rules (ALWAYS FOLLOW)

Write after EVERY post: crash-safe, no work lost
Check context after EVERY post: use the session_status tool
Stop at 70%: leave room for synthesis + buffer
Return structured result: caller decides next step
Read all images: they’re pre-compressed (600px, q85)
Skip videos: already marked in fetch-post

⚠️ This is not optional. Even small research can overflow context with image-heavy posts.

Scripts (Mechanical Tasks)

These scripts handle the repetitive CLI work:

Script	Purpose
`bin/preflight`	Verify tool is working before research
`bin/search "keywords" [limit] [timeout] [sort]`	Search for posts (sort: general/newest/hot)
`bin/get-content "url"`	Get full note content (text only)
`bin/get-comments "url"`	Get comments on a note
`bin/get-images "url" [dir]`	Download images only
`bin/fetch-post "url" [cache] [retries]`	Fetch content + comments + images (with retries)

All scripts are at /root/clawd/skills/xhs/bin/

Preflight (always run first)

/root/clawd/skills/xhs/bin/preflight

Checks: rednote-mcp installed, cookies valid, stealth patches, test search. Don’t proceed until preflight passes.

Search

/root/clawd/skills/xhs/bin/search "昆明美食推荐" [limit] [timeout] [sort]

Returns JSON with post results.

Parameters:

Param	Default	Description
keywords	(required)	Search terms in Chinese
limit	10	Max results (scroll pagination when >20)
timeout	180	Seconds before giving up
sort	general	Sort order (see below)

Sort options:

Value	XHS Label	When to use
`general`	综合	Default: XHS algorithm balances relevance + engagement. Best for most research.
`newest`	最新	舆情监控 (sentiment monitoring), breaking news, recent experiences, time-sensitive topics
`hot`	最热	Finding viral/popular posts, trending content

Examples:

# Default sort (recommended for most research)
bin/search "昆明美食推荐" 20

# Recent posts first (舆情, current events)
bin/search "某品牌 评价" 20 180 newest

# Most popular posts
bin/search "网红打卡地" 15 180 hot
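
When the search is driven from code rather than typed by hand, the same positional arguments can be assembled programmatically. The sketch below assumes the argument order shown above (`keywords limit timeout sort`) and that the script prints a single JSON document on stdout. The 30-second grace period and both function names are my own illustrative choices.

```python
import json
import subprocess

SEARCH_BIN = "/root/clawd/skills/xhs/bin/search"
SORT_OPTIONS = {"general", "newest", "hot"}


def build_search_argv(keywords: str, limit: int = 10, timeout: int = 180, sort: str = "general") -> list[str]:
    """Build the argv for bin/search, validating the sort option first."""
    if sort not in SORT_OPTIONS:
        raise ValueError(f"sort must be one of {sorted(SORT_OPTIONS)}, got {sort!r}")
    return [SEARCH_BIN, keywords, str(limit), str(timeout), sort]


def run_search(keywords: str, limit: int = 10, timeout: int = 180, sort: str = "general"):
    """Run one search and parse its JSON output (assumes stdout is pure JSON)."""
    argv = build_search_argv(keywords, limit, timeout, sort)
    # Give the process a little longer than the script's own timeout before killing it.
    completed = subprocess.run(argv, capture_output=True, text=True, timeout=timeout + 30, check=True)
    return json.loads(completed.stdout)
```

Passing the arguments as a list avoids shell quoting problems with Chinese keywords and spaces.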

Scroll pagination enabled (patched): When limit > 20, the tool scrolls to load more results via XHS infinite scroll. Actual results depend on available content.

For maximum coverage, combine:

Higher limits (e.g., limit=50) to scroll for more
Multiple keyword variations for different result sets:
- 香蕉攀岩, 香蕉攀岩馆, 香蕉攀岩体验, 香蕉攀岩评价 (Banana Climbing: name, gym, experience, reviews)
- 昆明美食, 昆明美食推荐, 昆明必吃, 昆明本地人推荐 (Kunming food: general, recommendations, must-eats, local picks)

Results vary by query: popular topics may return 30-50+, niche topics fewer.
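
Combining keyword variations means the same post can come back more than once. The sketch below merges several result lists while keeping one full URL per post. It assumes that two URLs with the same host and path point at the same post even if their query strings (for example the xsec_token) differ; that is an assumption about XHS URLs, not something the tool guarantees.

```python
from urllib.parse import urlsplit


def dedupe_urls(*result_lists: list[str]) -> list[str]:
    """Merge URL lists in order, keeping the first full URL seen for each post."""
    seen = set()
    merged = []
    for urls in result_lists:
        for url in urls:
            parts = urlsplit(url)
            key = (parts.netloc, parts.path)  # ignore the query string when comparing
            if key not in seen:
                seen.add(key)
                merged.append(url)  # keep the full URL, token included, for fetching
    return merged
```

The full URL is preserved on purpose, since the fetch scripts need the token that came with the search result.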

Choosing sort order:

Most research → general (default). Let XHS’s algorithm surface the best content.
舆情监控 / sentiment tracking → newest. You want recent opinions, not old viral posts.
Trend discovery → hot. See what’s currently popular.

Get Content

/root/clawd/skills/xhs/bin/get-content "FULL_URL_WITH_XSEC_TOKEN"

⚠️ Must use full URL with xsec_token from search results.

Get Comments

/root/clawd/skills/xhs/bin/get-comments "FULL_URL_WITH_XSEC_TOKEN"

Get Images

Download all images from a post to local files:

/root/clawd/skills/xhs/bin/get-images "FULL_URL" /tmp/my-images

Fetch Post (Deep Dive with Images)

Fetch content, comments, and images in one call, with built-in retries:

/root/clawd/skills/xhs/bin/fetch-post "FULL_URL" /path/to/cache [max_retries]

Features:

Retries on timeout (60s → 90s → 120s)
Clear error reporting in JSON output
Images cached locally, bypassing CDN protection

Returns JSON:

{
  "success": true,
  "postId": "abc123",
  "content": { 
    "title": "...", 
    "author": "...", 
    "desc": "...", 
    "likes": "983", 
    "tags": [...],
    "postDate": "2025-09-04"  // Added via patch
  },
  "comments": [{ "author": "...", "content": "...", "likes": "3" }, ...],
  "imagePaths": ["/cache/images/abc123/1.jpg", ...],
  "errors": []
}

Date filtering: Use postDate to filter out old posts. Skip posts older than your threshold (e.g., 6-12 months for restaurants).

Workflow:

1. fetch-post → JSON + cached images
2. Read each imagePath directly (Claude sees images natively)
3. Combine text + comments + what you see into findings

Viewing images:

Read("/path/to/1.jpg")  # Claude sees it directly - no special tool needed

Look for: visible text (addresses, prices, hours), atmosphere, food presentation, crowd levels.
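
Before reading images, it helps to pull the useful fields out of the fetch-post result and fail loudly when the fetch did not succeed. The sketch below assumes the JSON shape shown above; the function name is hypothetical, and treating non-numeric like counts as unknown is my own choice.

```python
import json


def unpack_fetch_result(raw_json: str) -> dict:
    """Extract the fields needed for analysis from fetch-post JSON output."""
    result = json.loads(raw_json)
    if not result.get("success"):
        raise RuntimeError(f"fetch-post failed: {result.get('errors')}")
    content = result.get("content", {})
    likes_raw = str(content.get("likes", ""))
    return {
        "postId": result.get("postId"),
        "title": content.get("title"),
        "postDate": content.get("postDate"),
        # Likes arrive as a string; anything that is not a plain number becomes None.
        "likes": int(likes_raw) if likes_raw.isdecimal() else None,
        "commentCount": len(result.get("comments", [])),
        "imagePaths": result.get("imagePaths", []),
    }
```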

Research Methodology (Judgment Tasks)

This is where you think. Scripts do the fetching; you do the analyzing.

Depth Levels

Depth	Posts	When to Use
Minimum	5+	Quick checks, simple queries
Standard	8-10	Default for most research
Deep	15+	Complex topics, trip planning

Minimum is 5, unless fewer exist. Note limited coverage if <5 results.

Research Workflow

Step 0: Preflight

Run bin/preflight. Don’t proceed until it passes.

Step 1: Plan Your Searches

Think: “What would a Chinese user search on 小红书?”

Include location when relevant
Add qualifiers: 推荐 (recommendations), 攻略 (guide), 测评 (review), 探店 (shop visit), 打卡 (check-in spot), 避坑 (pitfalls to avoid)
Consider synonyms and variations
Plan 2-3 different search angles

Date filtering: Posts include postDate field (e.g., “2025-09-04”). The calling agent specifies the date filter based on research type:

Research Type	Suggested Filter	Why
舆情监控 (sentiment)	1-4 weeks	Only current discourse matters
Breaking news/events	1-7 days	Time-critical
Travel planning	6-12 months	Recent but reasonable window
Product reviews	1-2 years	Longer product cycles
Trend analysis	Custom range	Compare specific periods
Historical/general	No limit	Want the full archive

Caller should specify in task description, e.g.:

“Only posts from last 30 days” (舆情)
“Posts from 2025 or later” (travel)
“No date filter” (general research)

If no filter specified: Default to 12 months (safe middle ground).

Fallback when postDate is null: Use keyword hints: 2025, 最近 (recently), 最新 (latest)
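
The date rules above can be sketched as one filter function. This is an illustration under stated assumptions: `content` is taken to be the "content" object from fetch-post, the 12-month default is approximated as 365 days, and the function name is hypothetical. The hint list is the one given above.

```python
from datetime import date, timedelta

DEFAULT_WINDOW_DAYS = 365  # "default to 12 months" when the caller gives no filter
RECENCY_HINTS = ("2025", "最近", "最新")  # keyword hints for posts without a date


def keep_post(content: dict, max_age_days=DEFAULT_WINDOW_DAYS, today=None) -> bool:
    """Decide whether a post passes the date filter.

    Pass max_age_days=None for "no date filter".
    """
    if max_age_days is None:
        return True
    post_date = content.get("postDate")
    if post_date:
        today = today or date.today()
        return today - date.fromisoformat(post_date) <= timedelta(days=max_age_days)
    # postDate missing: fall back to recency hints in the title and description.
    text = f"{content.get('title', '')} {content.get('desc', '')}"
    return any(hint in text for hint in RECENCY_HINTS)
```

For example, a 舆情 task asking for "last 30 days" would call `keep_post(content, max_age_days=30)`, while historical research would pass `max_age_days=None`.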

Language strategy:

Location	Language	Example
China	Chinese	`昆明攀岩`
English-named venues	Both	`Rock Tenet 昆明`
International	Chinese	`巴黎旅游`

Step 2: Search & Scan

Run your searches. Results are already ranked by XHS’s algorithm (relevance + engagement).

Use judgment based on preview â like a human deciding what to click:

Think: “Given my research goal, would this post likely contain useful information?”

Research Type	What to prioritize
舆情监控 (sentiment)	Any opinion/experience, even low engagement. Complaints matter!
Travel planning	High engagement + detailed experiences
Product reviews	Mix of positive AND negative reviews
Trend analysis	Variety of perspectives

Preview Signal	Action
Relevant content in preview	✅ Fetch
Matches research goal	✅ Fetch
Low engagement but relevant opinion	✅ Fetch (esp. for 舆情)
High engagement but off-topic	❌ Skip
Official announcements only	⚠️ Context-dependent
广告/合作 (ad/collaboration) markers	⚠️ Note as sponsored if fetching
Clearly off-topic	❌ Skip
Duplicate content	❌ Skip

Key insight: For 舆情监控, a 3-like complaint post may be more valuable than a 500-like promotional post. High engagement does not imply relevance for every research type.

Step 3: Deep Dive Each Post

For each selected post, use fetch-post to get everything:

bin/fetch-post "url_from_search" {{RESEARCH_DIR}}/xhs

Returns JSON with content, comments, and cached images. Has built-in retries. Then:

A. Review content

Extract key facts from title/description
Note author’s perspective/bias
Check tags for categorization

B. View images (critical!)

For each imagePath in the result, just read it:

Read("/path/to/1.jpg")  # You see it directly

Look for text overlays: addresses, prices, hours
Note visual details: ambiance, crowd levels, food presentation

⚠️ Don’t describe images in isolation. Synthesize what you see with the post content and comments to form a holistic view. An image of a crowded restaurant + author saying “周末排队1小时” (an hour's queue on weekends) + comments confirming “人超多” (super crowded) = that’s your finding about crowds.

C. Review comments (gold for updates)

“已经关门了” = already closed
Real experiences vs sponsored hype
Tips not in main post

D. Return picked images

Include paths to the best/most informative images in your findings. The calling agent decides whether and how to use them (embed in reports, reference, etc.). You’re curating: pick images that show something useful (venue exterior, menu with prices, actual food, atmosphere), not just decorative shots.

Step 4: Synthesize

What do multiple sources agree on?
Any contradictions?
What’s the overall consensus?
What would you actually recommend?

Step 5: Output

Facts + Flavor: structured findings that preserve the XHS voice.

## XHS Research: [Topic]

### Search Summary
| Search | Results | Notes |
|--------|---------|-------|
| 昆明攀岩 | 10 | Good coverage |

### Findings

#### [Venue Name] (中文名)
- **Type:** Restaurant / Activity / Attraction
- **Address:** [from post or image]
- **Price:** ¥XX/person
- **Hours:** [if found]
- **The vibe:** [atmosphere, energy, in the preserved voice]
- **Why people like it:** [opinions, impressions]
- **Watch out for:** [warnings from comments]
- **Source:** [full URL]
- **Engagement:** X likes
- **Images:** [paths for calling agent to use]
  - `/path/to/1.jpg` - exterior/entrance
  - `/path/to/3.jpg` - menu with prices

> "引用原文..." - @username

### Overall Impressions
- Consensus across posts
- Patterns in preferences
- Things only locals know
- Disagreements worth noting

The XHS value is the human perspective. A recommendation that says “环境一般但是味道绝了” (the setting is ordinary but the flavor is incredible) tells you more than “Rating: 4.2/5”.

Think: “What would a friend who just spent an hour on XHS tell me?”

Quality Signals

Trustworthy:

100+ likes with real comments
Detailed personal experience
Multiple photos from actual visit
Specific details (prices, hours)
Year in title (e.g., “2025上海咖啡必喝榜”, a 2025 list of must-try Shanghai coffee)

Checking recency:

Look for dates in post text/title
Check if prices seem current
Comments mentioning “还在吗” (is it still there?) or “现在还有吗” (is it still available now?) = might be outdated
Comments with recent dates confirm post is still relevant

Suspicious:

Overly positive, no specifics
Stock photos only
No comments or generic ones
Very old posts

Timing & Efficiency

XHS is SLOW: Plan Accordingly

The rednote-mcp CLI is slow (30-90s per search). Don’t rapid-fire poll.

When running searches via exec:

# GOOD: Give it time to complete
exec(command, yieldMs: 60000)  # Wait 60s before checking
process(poll)  # Then poll every 30s if still running

DON’T:

Poll every 2-3 seconds (wastes tokens, no benefit)
Start multiple searches simultaneously (overloads)
Wait indefinitely without writing partial results

Write Incrementally

Don’t wait until you’ve analyzed everything to start writing. After each batch of 3-5 posts:

Append findings to your output file
This protects against timeout/termination losing all work

## Findings (in progress)

### Batch 1: 美食搜索 (3 posts analyzed)
[findings...]

### Batch 2: 攻略搜索 (analyzing...)

Time Budget Awareness

If you’ve been running 15+ minutes:

Prioritize writing what you have
Note incomplete searches in output
Better to deliver 80% findings than lose 100% to termination

Retry Pattern

rednote-mcp is slow. If a command times out:

Attempt 1: default timeout
Attempt 2: +60s
Attempt 3: +120s

If all fail, report the failure. Do NOT fall back to web_search, which defeats the purpose.
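
The retry schedule above can be sketched as a wrapper around a subprocess call. This reads "+60s" and "+120s" as additions to the default timeout, which is my interpretation. The `runner` parameter is a hypothetical hook added so the logic can be exercised without the real CLI.

```python
import subprocess


def run_with_retries(argv, base_timeout=180, extra=(0, 60, 120), runner=subprocess.run):
    """Run a command, retrying on timeout with a longer limit each time."""
    errors = []
    for attempt, added in enumerate(extra, start=1):
        timeout = base_timeout + added
        try:
            return runner(argv, capture_output=True, text=True, timeout=timeout, check=True)
        except subprocess.TimeoutExpired:
            errors.append(f"attempt {attempt}: timed out after {timeout}s")
    # Every attempt timed out: surface the failure instead of falling back silently.
    raise TimeoutError("; ".join(errors))
```

Only timeouts are retried here. A non-zero exit status raises `subprocess.CalledProcessError` straight away, so login or token errors are reported rather than retried.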

Error Handling

Error	Cause	Fix
Timeout	Network/XHS slow	Retry with longer timeout
Login/cookie error	Session expired	`xvfb-run -a rednote-mcp init`
404 / xsec_token	Missing token	Use full URL from search
Empty results	No posts	Try different keywords

Setup & Maintenance

First-Time Setup

npm install -g rednote-mcp
npx playwright install
/root/clawd/skills/xhs/patches/apply-all.sh
xvfb-run -a rednote-mcp init

Re-login (when cookies expire)

xvfb-run -a rednote-mcp init

After rednote-mcp updates

/root/clawd/skills/xhs/patches/apply-all.sh

Role Clarification

This skill = Research tool that outputs structured findings
Calling agent = Synthesizes XHS + other sources into final reports, decides which images to embed

You return:

Synthesized findings (text + images + comments → holistic view)
Curated image paths (calling agent decides how to use them)
Preserved human voice (opinions, vibes, tips)

You don’t:

Describe images in isolation (“I see a restaurant…”)
Generate final reports (that’s the caller’s job)
Decide image layout/placement

XHS is like having a Chinese-speaking friend spend an hour researching for you. They’d give you facts, but also opinions, vibes, and insider tips. That’s what you’re capturing.

Remember: Research like a curious human. Explore, cross-reference, look at pictures, read comments. The “这家真的绝了” (this place is truly amazing) matters as much as the address.
