silicon-paddle-ocr
Install command
npx skills add https://github.com/aotenjou/silicon-paddleocr --skill silicon-paddle-ocr
Skill Documentation
OCR – Image Text Recognition
Use PaddleOCR to extract text from images. Supports both single-image and batch processing.
Overview
This skill provides optical character recognition (OCR) capabilities using the PaddlePaddle/PaddleOCR-VL-1.5 model via the SiliconFlow API. Extract text from JPG, PNG, WebP, BMP, and GIF images.
When to Use
Invoke this skill when:
- User wants to extract text from an image
- User asks to OCR a screenshot or photo
- User needs to read text from an image file
- User mentions text recognition from images
How to Use
Prerequisites
Ensure the SILICONFLOW_API_KEY environment variable is set:
export SILICONFLOW_API_KEY="your_api_key"
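A script that depends on this variable can check for it before doing any work. The following is a minimal sketch of such a check, not code taken from scripts/ocr_skill.py; the helper name `get_api_key` is hypothetical, and the precedence (explicit key first, then the environment) is an assumption based on the documented `--api-key` default.

```python
import os


def get_api_key(explicit_key=None, env=None):
    """Return the API key, preferring an explicit value over the environment.

    Hypothetical helper for illustration; not part of the skill's script.
    """
    env = os.environ if env is None else env
    key = (explicit_key or env.get("SILICONFLOW_API_KEY", "")).strip()
    if not key:
        # Fail early with an actionable message instead of a later HTTP error.
        raise RuntimeError(
            "SILICONFLOW_API_KEY is not set; export it or pass --api-key"
        )
    return key
```

Failing at startup keeps a missing key from surfacing later as a less obvious authentication error.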
Basic Usage
Execute the OCR script:
python3 scripts/ocr_skill.py [options] image_path
Arguments
| Argument | Description |
|---|---|
| images | Image file path(s) or glob pattern (required) |
| -k, --api-key | API key (default: from SILICONFLOW_API_KEY env) |
| -m, --model | OCR model name (default: PaddlePaddle/PaddleOCR-VL-1.5) |
| -p, --prompt | Recognition prompt for custom behavior |
| -j, --json | Output results in JSON format |
| -o, --output | Save results to specified file |
| --max-tokens | Maximum tokens in response (default: 300) |
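To make the argument list concrete, here is a sketch of how a command-line interface with these options could be declared using Python's standard `argparse` module. It mirrors the documented flags and defaults only; how scripts/ocr_skill.py actually parses its arguments is an assumption, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Declare the documented options (illustrative, not the script's own code)."""
    parser = argparse.ArgumentParser(description="Extract text from images")
    parser.add_argument("images", nargs="+",
                        help="Image file path(s) or glob pattern")
    parser.add_argument("-k", "--api-key", default=None,
                        help="API key (default: from SILICONFLOW_API_KEY env)")
    parser.add_argument("-m", "--model", default="PaddlePaddle/PaddleOCR-VL-1.5",
                        help="OCR model name")
    parser.add_argument("-p", "--prompt", default=None,
                        help="Recognition prompt for custom behavior")
    parser.add_argument("-j", "--json", action="store_true",
                        help="Output results in JSON format")
    parser.add_argument("-o", "--output", default=None,
                        help="Save results to specified file")
    parser.add_argument("--max-tokens", type=int, default=300,
                        help="Maximum tokens in response")
    return parser


# Parse a sample command line instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["--json", "-o", "results.json", "a.jpg", "b.png"])
print(args.images, args.json, args.output, args.max_tokens)
```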
Examples
Single image:
python3 scripts/ocr_skill.py /path/to/image.jpg
Multiple images with glob:
python3 scripts/ocr_skill.py /path/to/images/*.png
JSON output format:
python3 scripts/ocr_skill.py --json /path/to/image.jpg
Custom prompt for table extraction:
python3 scripts/ocr_skill.py -p "Please identify and format table content as Markdown" /path/to/table.jpg
Save to file:
python3 scripts/ocr_skill.py --json --output results.json /path/to/images/*.jpg
Output Format
Text output (default):
--- image.jpg ---
Recognized text content
JSON output:
{
  "image.jpg": "Recognized text content",
  "image2.png": "Text from the second image"
}
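Because the JSON form is a flat mapping from image name to recognized text, downstream code can load it with the standard `json` module. The sketch below uses an inline sample in the documented shape rather than a real results file; the `summarize` helper is hypothetical.

```python
import json

# Sample payload in the documented shape: {image name: recognized text}.
sample = '{"image.jpg": "first line\\nsecond line", "image2.png": "text from the second image"}'


def summarize(results_json):
    """Return (image name, character count) pairs, sorted by image name."""
    results = json.loads(results_json)
    return [(name, len(text)) for name, text in sorted(results.items())]


for name, n_chars in summarize(sample):
    print(f"{name}: {n_chars} characters")
```

For a file written with `--json --output results.json`, the same function could be fed the file's contents read as UTF-8 text.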
Supported Image Formats
- JPG/JPEG
- PNG
- WebP
- BMP
- GIF
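A caller that collects images before invoking the skill may want to filter by these formats first. The following sketch checks file extensions against the list above and expands glob patterns with the standard `glob` module. The names `SUPPORTED_EXTENSIONS`, `is_supported`, and `expand_images` are hypothetical, and extension matching is an assumption; the script itself may detect formats differently.

```python
import glob
from pathlib import Path

# Extensions corresponding to the formats listed above (case-insensitive).
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp", ".bmp", ".gif"}


def is_supported(path):
    """True if the file extension matches one of the listed formats."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


def expand_images(patterns):
    """Expand glob patterns, keep supported files, and drop duplicates in order."""
    selected = []
    for pattern in patterns:
        # Keep a literal path that matches nothing so a later step can report it.
        matches = sorted(glob.glob(pattern)) or [pattern]
        for match in matches:
            if is_supported(match) and match not in selected:
                selected.append(match)
    return selected
```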
Error Handling
If processing fails:
- Check that the image file exists
- Verify the SILICONFLOW_API_KEY is valid
- Ensure the API endpoint is reachable
If an image fails to process, an error message is shown for that image and the remaining images are still processed.
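The continue-on-failure behavior can be pictured as a loop that catches errors per image. This is a minimal sketch under stated assumptions, not the script's implementation: `process_images` and `fake_recognize` are hypothetical, the `ERROR:` prefix is an invented convention, and the real recognition call is replaced by a stand-in.

```python
def process_images(paths, recognize):
    """Run `recognize` on each path; record an error string instead of aborting."""
    results = {}
    for path in paths:
        try:
            results[path] = recognize(path)
        except Exception as exc:
            # One bad image should not stop the rest of the batch.
            results[path] = f"ERROR: {exc}"
    return results


def fake_recognize(path):
    """Stand-in for the real API call (hypothetical); fails for one known path."""
    if path == "missing.jpg":
        raise FileNotFoundError(f"{path} not found")
    return f"text from {path}"


print(process_images(["a.jpg", "missing.jpg", "b.png"], fake_recognize))
```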
Additional Resources
Reference Files
references/api-configuration.md – API configuration details
Example Files
examples/sample-usage.sh – Example usage script
Scripts
scripts/ocr_skill.py – The main OCR implementation