silicon-paddle-ocr

📁 aotenjou/silicon-paddleocr 📅 4 days ago
1
总安装量
1
周安装量
#50420
全站排名
安装命令
npx skills add https://github.com/aotenjou/silicon-paddleocr --skill silicon-paddle-ocr

Agent 安装分布

amp 1
opencode 1
kimi-cli 1
codex 1
github-copilot 1
gemini-cli 1

Skill 文档

OCR – Image Text Recognition

Use PaddleOCR to extract text content from images. Supports single image or batch processing.

Overview

This skill provides optical character recognition (OCR) capabilities using the PaddlePaddle/PaddleOCR-VL-1.5 model via the SiliconFlow API. Extract text from JPG, PNG, WebP, BMP, and GIF images.

When to Use

Invoke this skill when:

  • User wants to extract text from an image
  • User asks to OCR a screenshot or photo
  • User needs to read text from an image file
  • User mentions text recognition from images

How to Use

Prerequisites

Ensure the SILICONFLOW_API_KEY environment variable is set:

export SILICONFLOW_API_KEY="your_api_key"

Basic Usage

Execute the OCR script:

python3 scripts/ocr_skill.py [options] image_path

Arguments

Argument Description
images Image file path(s) or glob pattern (required)
-k, --api-key API key (default: from SILICONFLOW_API_KEY env)
-m, --model OCR model name (default: PaddlePaddle/PaddleOCR-VL-1.5)
-p, --prompt Recognition prompt for custom behavior
-j, --json Output results in JSON format
-o, --output Save results to specified file
--max-tokens Maximum tokens in response (default: 300)

Examples

Single image:

python3 scripts/ocr_skill.py /path/to/image.jpg

Multiple images with glob:

python3 scripts/ocr_skill.py /path/to/images/*.png

JSON output format:

python3 scripts/ocr_skill.py --json /path/to/image.jpg

Custom prompt for table extraction:

python3 scripts/ocr_skill.py -p "Please identify and format table content as Markdown" /path/to/table.jpg

Save to file:

python3 scripts/ocr_skill.py --json --output results.json /path/to/images/*.jpg

Output Format

Text output (default):

--- image.jpg ---
识别到的文字内容

JSON output:

{
  "image.jpg": "识别到的文字内容",
  "image2.png": "第二张图片的文字"
}

Supported Image Formats

  • JPG/JPEG
  • PNG
  • WebP
  • BMP
  • GIF

Error Handling

If processing fails:

  • Check that the image file exists
  • Verify the SILICONFLOW_API_KEY is valid
  • Ensure the API endpoint is reachable

Images that fail to process will show an error message, and other images will continue processing.

Additional Resources

Reference Files

  • references/api-configuration.md – API configuration details

Example Files

  • examples/sample-usage.sh – Example usage script

Scripts

  • scripts/ocr_skill.py – The main OCR implementation