alicloud-ai-multimodal-qwen-vl
Install command
npx skills add https://github.com/cinience/alicloud-skills --skill alicloud-ai-multimodal-qwen-vl
Category: provider
Model Studio Qwen VL (Image Understanding)
Use Qwen VL models for image-understanding tasks (image input, text output) through the DashScope compatible-mode API.
Prerequisites
- Install dependencies (recommended in a venv):
python3 -m venv .venv
. .venv/bin/activate
python -m pip install requests
- Set DASHSCOPE_API_KEY in your environment, or add dashscope_api_key to ~/.alibabacloud/credentials.
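The lookup order described above (environment variable first, then the credentials file) could be sketched as follows. This is a minimal illustration, not the skill's own loader: the function name resolve_api_key is hypothetical, and the INI-style layout of ~/.alibabacloud/credentials is an assumption, since the document does not show the file format.

```python
import configparser
import os
from pathlib import Path


def resolve_api_key(credentials_path=None, env=None):
    """Return the DashScope API key, preferring the environment variable."""
    env = os.environ if env is None else env
    key = (env.get("DASHSCOPE_API_KEY") or "").strip()
    if key:
        return key

    # Assumption: the credentials file is INI-style; the key may sit in any section.
    path = Path(credentials_path or Path.home() / ".alibabacloud" / "credentials")
    if not path.is_file():
        return None
    parser = configparser.ConfigParser(interpolation=None)
    try:
        parser.read(path, encoding="utf-8")
    except configparser.Error:
        return None  # not parseable as INI; treat as "no key found"

    candidates = [parser.defaults()] + [parser[name] for name in parser.sections()]
    for values in candidates:
        value = (values.get("dashscope_api_key") or "").strip()
        if value:
            return value
    return None
```

Returning None instead of raising lets the caller decide how to report a missing key.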
Critical model names
Prefer the Qwen3 VL family:
- qwen3-vl-plus
- qwen3-vl-flash
When you need explicit "latest" routing or reproducible snapshots, use supported aliases or snapshots from the official model list, such as:
- qwen3-vl-plus-latest
- qwen3-vl-plus-2025-12-19
- qwen3-vl-flash-latest
Legacy names still seen in some workloads:
- qwen-vl-max-latest
- qwen-vl-plus-latest
Normalized interface (multimodal.chat)
Request
- prompt (string, required): user question or instruction about the image.
- image (string, required): HTTPS URL, local path, or data: URL.
- model (string, optional): default qwen3-vl-plus.
- max_tokens (int, optional): default 512.
- temperature (float, optional): default 0.2.
- detail (string, optional): auto / low / high, default auto.
- json_mode (bool, optional): return a JSON-only response when possible.
- schema (object, optional): JSON Schema for structured extraction.
- max_retries (int, optional): retry count for 429/5xx, default 2.
- retry_backoff_s (float, optional): exponential backoff base in seconds, default 1.5.
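As a rough illustration of how a normalized request might map onto a compatible-mode chat-completions body, the sketch below applies the documented defaults and inlines a local file as a base64 data: URL. The helper names to_image_url and build_payload are hypothetical, and inlining local files this way is an assumption about how such a script could work. The detail, json_mode, schema, and retry fields are left out because the document does not show how they are sent to the backend.

```python
import base64
import mimetypes
from pathlib import Path


def to_image_url(image):
    """Pass HTTPS and data: URLs through; inline a local file as a data: URL."""
    if image.startswith(("https://", "data:")):
        return image
    path = Path(image)
    mime = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def build_payload(request):
    """Map a normalized multimodal.chat request to a chat-completions body."""
    content = [
        {"type": "image_url", "image_url": {"url": to_image_url(request["image"])}},
        {"type": "text", "text": request["prompt"]},
    ]
    return {
        "model": request.get("model", "qwen3-vl-plus"),   # documented default
        "messages": [{"role": "user", "content": content}],
        "max_tokens": request.get("max_tokens", 512),      # documented default
        "temperature": request.get("temperature", 0.2),    # documented default
    }
```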
Response
- text (string): primary model answer.
- model (string): model actually used.
- usage (object): token usage, if returned by the backend.
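The three response fields could be derived from a raw chat-completions body roughly as shown here. This is a sketch under the assumption that the backend returns the usual OpenAI-style shape (choices[0].message.content, model, usage); normalize_response is a hypothetical name.

```python
def normalize_response(raw):
    """Reduce an OpenAI-style chat-completions body to text / model / usage."""
    choices = raw.get("choices") or []
    message = choices[0].get("message", {}) if choices else {}
    content = message.get("content") or ""
    if isinstance(content, list):
        # Content may arrive as a list of parts; keep only the text pieces.
        content = "".join(
            part.get("text", "") for part in content if isinstance(part, dict)
        )
    result = {"text": content, "model": raw.get("model")}
    if raw.get("usage") is not None:
        result["usage"] = raw["usage"]  # only present when the backend reports it
    return result
```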
Quickstart
python skills/ai/multimodal/alicloud-ai-multimodal-qwen-vl/scripts/analyze_image.py \
--request '{"prompt":"Summarize the main content of this image","image":"https://example.com/demo.jpg"}' \
--print-response
Using a local image:
python skills/ai/multimodal/alicloud-ai-multimodal-qwen-vl/scripts/analyze_image.py \
--request '{"prompt":"Extract the key information from the image","image":"./samples/invoice.png","model":"qwen3-vl-plus"}' \
--print-response
Structured extraction (JSON mode):
python skills/ai/multimodal/alicloud-ai-multimodal-qwen-vl/scripts/analyze_image.py \
--request '{"prompt":"Extract fields: title, amount, date","image":"./samples/invoice.png"}' \
--json-mode \
--print-response
Structured extraction (JSON Schema):
python skills/ai/multimodal/alicloud-ai-multimodal-qwen-vl/scripts/analyze_image.py \
--request '{"prompt":"Extract the invoice fields","image":"./samples/invoice.png"}' \
--schema skills/ai/multimodal/alicloud-ai-multimodal-qwen-vl/references/examples/invoice.schema.json \
--print-response
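Model output in structured mode is still text, so it can help to parse and check it before using it downstream. The sketch below is illustrative only: INVOICE_SCHEMA is a hypothetical schema built from the title, amount, and date fields mentioned above (the real invoice.schema.json may differ), and check_against_schema is a deliberately small checker covering only top-level required and type keywords, not a full JSON Schema validator.

```python
import json

# Hypothetical schema for illustration; the real invoice.schema.json may differ.
INVOICE_SCHEMA = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "amount": {"type": "number"},
        "date": {"type": "string"},
    },
    "required": ["title", "amount", "date"],
}

_JSON_TYPES = {
    "string": str,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
}


def check_against_schema(text, schema):
    """Parse model output as JSON and return a list of problems (empty means OK)."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]

    problems = [
        f"missing field: {name}"
        for name in schema.get("required", [])
        if name not in data
    ]
    for name, spec in schema.get("properties", {}).items():
        expected = _JSON_TYPES.get(spec.get("type"))
        if name not in data or expected is None:
            continue
        value = data[name]
        # bool is a subclass of int in Python, so reject it for non-boolean types.
        is_stray_bool = isinstance(value, bool) and spec["type"] != "boolean"
        if not isinstance(value, expected) or is_stray_bool:
            problems.append(f"wrong type for {name}: expected {spec['type']}")
    return problems
```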
cURL (compatible mode)
curl -sS https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model":"qwen3-vl-plus",
"messages":[
{
"role":"user",
"content":[
{"type":"image_url","image_url":{"url":"https://example.com/demo.jpg"}},
{"type":"text","text":"Describe this image and list actionable items"}
]
}
],
"max_tokens":512,
"temperature":0.2
}'
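For callers who prefer Python to cURL, the same request can be sketched with the standard library. The function names build_http_request and send are hypothetical. The example uses urllib rather than requests only to stay dependency-free, and the assumed response shape (choices[0].message.content) follows the OpenAI-style convention.

```python
import json
import urllib.request

ENDPOINT = "https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions"


def build_http_request(api_key, image_url, prompt, model="qwen3-vl-plus"):
    """Build the same POST request as the cURL example, without sending it."""
    body = {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": image_url}},
                    {"type": "text", "text": prompt},
                ],
            }
        ],
        "max_tokens": 512,
        "temperature": 0.2,
    }
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and return the first choice's text (assumed response shape)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    return payload["choices"][0]["message"]["content"]
```

To run it, read DASHSCOPE_API_KEY from the environment, pass it to build_http_request together with an image URL and a prompt, and hand the result to send.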
Output location
- If --output is set, the JSON response is saved to that file.
- Default output dir convention: output/ai-multimodal-qwen-vl/.
Smoke test
python tests/ai/multimodal/alicloud-ai-multimodal-qwen-vl-test/scripts/smoke_test_qwen_vl.py \
--image output/ai-image-qwen-image/images/vl_test_cat.png
Error handling
| Error | Likely cause | Action |
|---|---|---|
| 401/403 | Missing or invalid key | Check DASHSCOPE_API_KEY and account permissions. |
| 400 | Invalid request schema or unsupported image source | Validate messages content and image URL/path format. |
| 429 | Rate limit | Retry with exponential backoff and lower concurrency. |
| 5xx | Temporary backend issue | Retry with backoff and idempotent request design. |
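The retry policy in the table (retry 429 and 5xx, fail fast on other errors) might look like the sketch below. HTTPStatusError and call_with_retries are hypothetical names, and the delay formula retry_backoff_s * 2**attempt is an assumption about what "exponential backoff base seconds" means; the skill's script may compute delays differently.

```python
import time


class HTTPStatusError(Exception):
    """Hypothetical error type carrying the HTTP status of a failed call."""

    def __init__(self, status):
        super().__init__(f"HTTP {status}")
        self.status = status


def is_retryable(status):
    """429 and 5xx are worth retrying; 400/401/403 need a fix, not a retry."""
    return status == 429 or 500 <= status <= 599


def call_with_retries(send, max_retries=2, retry_backoff_s=1.5, sleep=time.sleep):
    """Call send() and retry retryable failures with exponential backoff."""
    attempt = 0
    while True:
        try:
            return send()
        except HTTPStatusError as exc:
            if not is_retryable(exc.status) or attempt >= max_retries:
                raise
            # Assumed schedule: base * 2**attempt, i.e. 1.5 s then 3 s by default.
            sleep(retry_backoff_s * (2 ** attempt))
            attempt += 1
```

Passing sleep as a parameter keeps the function testable without real delays.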
Operational guidance
- For stable production behavior, pin snapshot model IDs instead of relying on -latest aliases alone.
- Compress very large images before upload to reduce latency and cost.
- Add explicit extraction constraints in prompt (fields, JSON shape, language).
- For OCR-like output, ask for confidence notes and unresolved text markers.
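One way to enforce the snapshot-pinning advice is a small check at startup or in CI. The sketch below assumes that snapshot IDs end in a YYYY-MM-DD date, as in qwen3-vl-plus-2025-12-19; classify_model_id is a hypothetical helper, and the official model list remains the authority on which IDs exist.

```python
import re

# Assumption: snapshot IDs end in a date suffix such as -2025-12-19.
_SNAPSHOT_SUFFIX = re.compile(r"-\d{4}-\d{2}-\d{2}$")


def classify_model_id(model):
    """Return 'latest', 'snapshot', or 'base' for a model ID string."""
    if model.endswith("-latest"):
        return "latest"
    if _SNAPSHOT_SUFFIX.search(model):
        return "snapshot"
    return "base"


def is_pinned(model):
    """True only for dated snapshot IDs, which should not change behavior over time."""
    return classify_model_id(model) == "snapshot"
```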
References
- Source list: references/sources.md
- API notes: references/api_reference.md