gemini-research-browser-use
npx skills add https://github.com/grasseed/google-search-browser-use --skill gemini-research-browser-use
Agent 安装分布
Skill 文档
Gemini Research Browser Use
Overview
Perform research or queries using Google Gemini via Chrome DevTools Protocol (CDP). This method reuses the user’s existing Chrome login session to interact with the Gemini web interface (https://gemini.google.com/).
Prerequisites
-
Python + websockets Verify:
python3 --version python3 -m pip show websocketsInstall if missing:
python3 -m pip install websockets -
Google Chrome Verify:
"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" --version -
CDP Port Availability Verify Chrome is listening (after launch in Step 2):
curl -s http://localhost:9222/json | python3 -m json.tool -
Non-default user data directory (required by Chrome) Chrome CDP requires a non-default profile path. Use a cloned profile so you keep login state.
rm -rf /tmp/chrome-gemini-profile rsync -a "$HOME/Library/Application Support/Google/Chrome/" /tmp/chrome-gemini-profile/
Method Comparison
| Method | Pros | Cons | Recommended |
|---|---|---|---|
| Chrome Remote Debugging (CDP) | Uses existing login, full automation, reliable | Requires Chrome restart with debugging flag | â Yes |
browser-use --browser real |
Simple CLI | Opens new session without login | â No |
browser_subagent |
Visual feedback | Rate limited, may fail | â No |
â Recommended Method: Chrome Remote Debugging (CDP)
This is the most reliable method that uses your system Chrome with existing Google login.
Prerequisites
- Python 3 with
websocketslibrary - Google Chrome installed at
/Applications/Google Chrome.app/ - User logged into Google in Chrome
Step 1: Install websockets (if needed)
pip3 install websockets
# Or in virtual environment:
python3 -m venv .venv && ./.venv/bin/pip install websockets
Step 2: Launch Chrome with Remote Debugging (Non-default profile)
Important: Close any existing Chrome windows first, or use a different debugging port.
"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" \
--remote-debugging-port=9222 \
--user-data-dir="/tmp/chrome-gemini-profile" \
"https://gemini.google.com/" &
Parameters explained:
--remote-debugging-port=9222: Enables CDP on port 9222--user-data-dir: Points to your existing Chrome profile (with login session)- The URL opens Gemini directly
Step 3: Verify Connection (CDP)
curl -s http://localhost:9222/json | python3 -m json.tool
Look for the Gemini page entry:
{
"title": "Google Gemini",
"url": "https://gemini.google.com/app",
"webSocketDebuggerUrl": "ws://localhost:9222/devtools/page/XXXXXXXX"
}
Note: If URL shows /app instead of just /, it means you’re logged in.
Step 4: Send Query to Gemini
Save this as gemini_query.py or run inline:
import asyncio
import websockets
import json
import subprocess
import sys
async def query_gemini(query_text, wait_seconds=30):
# Get the Gemini page WebSocket URL
result = subprocess.run(
["curl", "-s", "http://localhost:9222/json"],
capture_output=True, text=True
)
pages = json.loads(result.stdout)
# Find Gemini page
gemini_page = None
for page in pages:
if page.get("type") == "page" and "gemini.google.com" in page.get("url", ""):
gemini_page = page
break
if not gemini_page:
print("Error: Gemini page not found. Make sure Chrome is open with Gemini.")
return None
ws_url = gemini_page["webSocketDebuggerUrl"]
print(f"Connecting to: {ws_url}")
async with websockets.connect(ws_url) as ws:
# Step 1: Input the query
input_js = f'''
const editor = document.querySelector('div[contenteditable="true"]');
if(editor) {{
editor.focus();
document.execCommand('insertText', false, `{query_text}`);
editor.dispatchEvent(new Event('input', {{bubbles: true}}));
'success';
}} else {{
'editor not found';
}}
'''
await ws.send(json.dumps({
"id": 1,
"method": "Runtime.evaluate",
"params": {"expression": input_js}
}))
response = await ws.recv()
result = json.loads(response)
print(f"Input result: {result.get('result', {}).get('result', {}).get('value', 'unknown')}")
# Step 2: Click send button
await asyncio.sleep(1)
click_js = '''
const btn = document.querySelector('button[aria-label="å³éè¨æ¯"]');
if(btn) { btn.click(); 'clicked'; } else { 'button not found'; }
'''
await ws.send(json.dumps({
"id": 2,
"method": "Runtime.evaluate",
"params": {"expression": click_js}
}))
response = await ws.recv()
result = json.loads(response)
print(f"Click result: {result.get('result', {}).get('result', {}).get('value', 'unknown')}")
# Step 3: Wait for response
print(f"Waiting {wait_seconds} seconds for Gemini to respond...")
await asyncio.sleep(wait_seconds)
# Step 4: Extract the response
extract_js = '''
const markdownEls = document.querySelectorAll('.markdown');
if(markdownEls.length > 0) {
markdownEls[markdownEls.length - 1].innerText;
} else {
'No response found';
}
'''
await ws.send(json.dumps({
"id": 3,
"method": "Runtime.evaluate",
"params": {"expression": extract_js}
}))
response = await ws.recv()
result = json.loads(response)
content = result.get('result', {}).get('result', {}).get('value', 'No content')
return content
# Main execution
if __name__ == "__main__":
query = sys.argv[1] if len(sys.argv) > 1 else "ç¯ä¾åé¡ï¼è«ç¨ç¹é«ä¸æåçä»éº¼æ¯åå¡éï¼"
result = asyncio.run(query_gemini(query, wait_seconds=30))
print("\n" + "="*50)
print("GEMINI RESPONSE:")
print("="*50)
print(result)
Step 5: Run the Query
python3 gemini_query.py "ç¯ä¾åé¡ï¼ä½ çæ¥è©¢åé¡"
Or inline for simple queries:
python3 << 'EOF'
import asyncio
import websockets
import json
async def send_to_gemini():
# Get WebSocket URL
import subprocess
result = subprocess.run(["curl", "-s", "http://localhost:9222/json"], capture_output=True, text=True)
pages = json.loads(result.stdout)
ws_url = next(p["webSocketDebuggerUrl"] for p in pages if "gemini.google.com" in p.get("url", ""))
async with websockets.connect(ws_url) as ws:
# Input query
await ws.send(json.dumps({
"id": 1,
"method": "Runtime.evaluate",
"params": {"expression": '''
const editor = document.querySelector('div[contenteditable="true"]');
editor.focus();
document.execCommand('insertText', false, 'ç¯ä¾åé¡ï¼è«åææ¯ç¹å¹£æªä¾ç广 ¼èµ°å¢');
editor.dispatchEvent(new Event('input', {bubbles: true}));
'''}
}))
await ws.recv()
# Click send
await asyncio.sleep(1)
await ws.send(json.dumps({
"id": 2,
"method": "Runtime.evaluate",
"params": {"expression": '''document.querySelector('button[aria-label="å³éè¨æ¯"]').click()'''}
}))
await ws.recv()
# Wait and extract
await asyncio.sleep(30)
await ws.send(json.dumps({
"id": 3,
"method": "Runtime.evaluate",
"params": {"expression": '''
document.querySelectorAll('.markdown')[document.querySelectorAll('.markdown').length - 1].innerText
'''}
}))
response = await ws.recv()
print(json.loads(response)['result']['result']['value'])
asyncio.run(send_to_gemini())
EOF
Alternative Method: browser-use CLI
This method is simpler but does not use your existing Chrome login. You’ll need to log in manually each time.
Prerequisites
# Create virtual environment
python3 -m venv .venv
# Install browser-use
./.venv/bin/pip install browser-use
Workflow
1) Open Gemini
./.venv/bin/browser-use --browser real open "https://gemini.google.com/"
2) Get Page State
./.venv/bin/browser-use --browser real state
Look for:
- The input textbox:
contenteditable=true role=textbox - The send button:
aria-label=å³éè¨æ¯
3) Input Text via JavaScript eval
./.venv/bin/browser-use --browser real eval "const editor = document.querySelector('div[contenteditable=\"true\"]'); editor.focus(); document.execCommand('insertText', false, 'YOUR QUERY HERE'); editor.dispatchEvent(new Event('input', {bubbles: true}));"
4) Click Send Button
# Get current state to find button index
./.venv/bin/browser-use --browser real state
# Click the send button (replace INDEX with actual number)
./.venv/bin/browser-use --browser real click INDEX
5) Close Session
./.venv/bin/browser-use close
Troubleshooting
Chrome Remote Debugging Issues
| Problem | Cause | Solution |
|---|---|---|
curl: (7) Failed to connect |
Chrome not running with debugging | Restart Chrome with --remote-debugging-port=9222 |
| WebSocket connection refused | Page ID changed | Re-fetch /json to get new WebSocket URL |
| “editor not found” | Page not fully loaded | Wait a few seconds before running script |
| “button not found” | Send button not visible | Check if text was actually input first |
| Login page instead of app | Wrong user-data-dir path | Verify path: "$HOME/Library/Application Support/Google/Chrome" |
DevTools remote debugging requires a non-default data directory |
Chrome disallows default profile for CDP | Launch with a cloned profile: /tmp/chrome-gemini-profile |
curl shows connection refused even though Chrome is running |
CDP not listening due to profile path | Ensure --user-data-dir is not default and the port is free |
No Gemini page found via CDP |
Gemini not loaded or not logged in | Open https://gemini.google.com/ in the launched Chrome and wait for /app |
browser-use Issues
| Problem | Cause | Solution |
|---|---|---|
| Not logged in | browser-use creates isolated session | Use Chrome Remote Debugging method instead |
Unknown key: "è«" error |
CLI doesn’t support Unicode | Use eval with JavaScript execCommand |
| Click doesn’t work | Element index changed | Re-run state before each click |
Best Practices
- Always use Chrome Remote Debugging for queries requiring authentication
- Wait 30+ seconds for complex queries (Gemini’s “Deep Think” mode takes longer)
- Check for
.markdownelements to verify response is complete - Use inline Python for one-off queries; use the full script for automation
- Close Chrome debugging session when done to avoid port conflicts
- Keep profile cloned in
/tmp/chrome-gemini-profileto avoid CDP blocking the default profile
Complete Example: Crypto Price Analysis
宿´å·¥ä½æµç¨
# Step 1: æºå Chrome è¨å®æªå¯æ¬ (é¿å
CDP é è¨ç®ééå¶)
rm -rf /tmp/chrome-gemini-profile
rsync -a "$HOME/Library/Application Support/Google/Chrome/" /tmp/chrome-gemini-profile/
# Step 2: åå Chrome é 端é¤é¯æ¨¡å¼
"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" \
--remote-debugging-port=9222 \
--user-data-dir="/tmp/chrome-gemini-profile" \
"https://gemini.google.com/" > /dev/null 2>&1 &
# Step 3: çå¾
é é¢è¼å
¥ä¸¦é©è飿¥
sleep 8
curl -s http://localhost:9222/json | python3 -c "import sys, json; pages = json.load(sys.stdin); gemini = [p for p in pages if p.get('type') == 'page' and 'gemini.google.com' in p.get('url', '')]; print(f\"æ¾å° Gemini é é¢: {gemini[0]['url'] if gemini else 'æªæ¾å°'}\")"
æ¹æ³ 1: 宿´æ¥è©¢è ³æ¬ (query_gemini.py)
å°ä»¥ä¸å
§å®¹å²åçº query_gemini.py:
import asyncio
import websockets
import json
import subprocess
import sys
async def query_gemini(query_text, wait_seconds=60):
# Get the Gemini page WebSocket URL
result = subprocess.run(
["curl", "-s", "http://localhost:9222/json"],
capture_output=True, text=True
)
pages = json.loads(result.stdout)
# Find Gemini page
gemini_page = None
for page in pages:
if page.get("type") == "page" and "gemini.google.com" in page.get("url", ""):
gemini_page = page
break
if not gemini_page:
print("é¯èª¤:æ¾ä¸å° Gemini é é¢ãè«ç¢ºä¿ Chrome å·²éå Geminiã")
return None
ws_url = gemini_page["webSocketDebuggerUrl"]
print(f"æ£å¨é£æ¥å°: {ws_url}")
async with websockets.connect(ws_url) as ws:
# Step 1: Input the query
input_js = f'''
const editor = document.querySelector('div[contenteditable="true"]');
if(editor) {{
editor.focus();
document.execCommand('insertText', false, `{query_text}`);
editor.dispatchEvent(new Event('input', {{bubbles: true}}));
'success';
}} else {{
'editor not found';
}}
'''
await ws.send(json.dumps({
"id": 1,
"method": "Runtime.evaluate",
"params": {"expression": input_js}
}))
response = await ws.recv()
result = json.loads(response)
print(f"輸å
¥çµæ: {result.get('result', {}).get('result', {}).get('value', 'unknown')}")
# Step 2: Click send button
await asyncio.sleep(1)
click_js = '''
const btn = document.querySelector('button[aria-label="å³éè¨æ¯"]');
if(btn) { btn.click(); 'clicked'; } else { 'button not found'; }
'''
await ws.send(json.dumps({
"id": 2,
"method": "Runtime.evaluate",
"params": {"expression": click_js}
}))
response = await ws.recv()
result = json.loads(response)
print(f"é»æçµæ: {result.get('result', {}).get('result', {}).get('value', 'unknown')}")
# Step 3: Wait for response
print(f"çå¾
{wait_seconds} ç§è® Gemini åæ...")
await asyncio.sleep(wait_seconds)
# Step 4: Extract the response - try to get complete content
extract_js = '''
const markdownEls = document.querySelectorAll('.markdown');
if(markdownEls.length > 0) {
const lastMarkdown = markdownEls[markdownEls.length - 1];
// Get all text content including nested elements
lastMarkdown.innerText || lastMarkdown.textContent || 'Empty response';
} else {
'No response found';
}
'''
await ws.send(json.dumps({
"id": 3,
"method": "Runtime.evaluate",
"params": {"expression": extract_js}
}))
response = await ws.recv()
result = json.loads(response)
content = result.get('result', {}).get('result', {}).get('value', 'No content')
return content
# Main execution
if __name__ == "__main__":
query = """ç¯ä¾åé¡ï¼è«è©³ç´°åæ BTCãETH ç广 ¼é 測走å¢ã
éå
å«ç¸éå°æ¥ææ¨ï¼ä¸¦ç¨ç¹é«ä¸æåçã"""
result = asyncio.run(query_gemini(query, wait_seconds=60))
print("\n" + "="*50)
print("GEMINI åæ:")
print("="*50)
print(result)
å·è¡æ¹å¼:
python3 query_gemini.py
æ¹æ³ 2: ç²åå·²åå¨çåæ (get_gemini_response.py)
妿 Gemini é é¢å·²ç¶æåæ,å¯ä»¥ä½¿ç¨æ¤è ³æ¬ç´æ¥æå:
import asyncio
import websockets
import json
import subprocess
async def get_all_gemini_content():
# Get the Gemini page WebSocket URL
result = subprocess.run(
["curl", "-s", "http://localhost:9222/json"],
capture_output=True, text=True
)
pages = json.loads(result.stdout)
# Find Gemini page
gemini_page = None
for page in pages:
if page.get("type") == "page" and "gemini.google.com" in page.get("url", ""):
gemini_page = page
break
if not gemini_page:
print("é¯èª¤:æ¾ä¸å° Gemini é é¢ã")
return None
ws_url = gemini_page["webSocketDebuggerUrl"]
print(f"æ£å¨é£æ¥å°: {ws_url}\n")
async with websockets.connect(ws_url) as ws:
# Extract all markdown content from the page
extract_js = '''
(function() {
const markdownEls = document.querySelectorAll('.markdown');
console.log('Found markdown elements:', markdownEls.length);
if(markdownEls.length === 0) {
return 'No markdown elements found';
}
// Get the last two markdown elements (user query and AI response)
const responses = [];
const startIdx = Math.max(0, markdownEls.length - 2);
for(let i = startIdx; i < markdownEls.length; i++) {
const text = markdownEls[i].innerText || markdownEls[i].textContent || '';
if(text.trim()) {
responses.push(`[åæ ${i+1}]:\\n${text}`);
}
}
return responses.join('\\n\\n' + '='.repeat(80) + '\\n\\n');
})()
'''
await ws.send(json.dumps({
"id": 1,
"method": "Runtime.evaluate",
"params": {"expression": extract_js, "returnByValue": True}
}))
response = await ws.recv()
result = json.loads(response)
content = result.get('result', {}).get('result', {}).get('value', 'No content')
return content
# Main execution
if __name__ == "__main__":
result = asyncio.run(get_all_gemini_content())
print("="*80)
print("GEMINI å°è©±å
§å®¹:")
print("="*80)
print(result)
å·è¡æ¹å¼:
python3 get_gemini_response.py
實é使ç¨ç¯ä¾
# 宿´æµç¨
rm -rf /tmp/chrome-gemini-profile && \
rsync -a "$HOME/Library/Application Support/Google/Chrome/" /tmp/chrome-gemini-profile/ && \
"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" \
--remote-debugging-port=9222 \
--user-data-dir="/tmp/chrome-gemini-profile" \
"https://gemini.google.com/" > /dev/null 2>&1 &
# çå¾
並å·è¡æ¥è©¢
sleep 8 && python3 query_gemini.py
æ¸ çè³æº
宿æ¥è©¢å¾,å»ºè°æ¸ çè¨ææä»¶åè³æº:
# 1. éé Chrome é¤é¯æè©±
pkill -9 "Google Chrome"
# 2. æ¸
çè¨æè¨å®æª (å¯é¸,éæ¾ç£ç¢ç©ºé)
rm -rf /tmp/chrome-gemini-profile
# 3. æ¸
çæ¸¬è©¦éç¨ä¸çæçè¨æè
³æ¬åè¼¸åºæä»¶
rm -f query_gemini.py get_gemini_response.py get_all_gemini_content.py
rm -f gemini_response.txt gemini_full_response.txt
æä½³å¯¦è¸:
- æ¯æ¬¡ä½¿ç¨å¾éé Chrome – é¿å ä½ç¨ 9222 端å£
- 宿æ¸
çè¨æè¨å®æª –
/tmp/chrome-gemini-profileå¯è½ä½ç¨æ¸ç¾ MB - ä¿æå·¥ä½ç®éæ´æ½ – åªé¤æ¸¬è©¦è ³æ¬,å°å¸¸ç¨è ³æ¬æ´åå°å°æ¡ä¸
- 使ç¨å®æ´è
³æ¬ – å°ä¸è¿°
query_gemini.pyå²åçºå°æ¡æä»¶,èéæ¯æ¬¡éæ°å»ºç«
注æäºé
- çå¾
æéèª¿æ´ – è¤éæ¥è©¢(妿·±åº¦åæ)建è°
wait_seconds=60ææ´é· - åææªæ·åé¡ – 妿åæå¾é·,å¯è½éè¦å¤æ¬¡æåæä½¿ç¨
get_all_gemini_content.pyæ¹æ³ - ç»å ¥çæ – ç¢ºä¿ Chrome è¨å®æªä¸å·²ç»å ¥ Google 帳è
- ç¶²è·¯ç©©å®æ§ – CDP 飿¥éè¦ç©©å®ç網路ç°å¢
- 並ç¼éå¶ – é¿å åæéåå¤å Chrome é¤é¯æè©±å¨åä¸ç«¯å£