hf-papers-reporter
Install command
npx skills add https://github.com/xdrshjr/jr-openclaw-skills --skill hf-papers-reporter
Skill documentation
Hugging Face Daily Papers Reporter
Generate professional Word reports from Hugging Face Daily Papers with full text extraction and image capture.
What This Skill Does
- Scrapes huggingface.co/papers for the top papers
- Downloads PDFs from arXiv
- Extracts Abstract and Introduction sections
- Extracts figures/images from PDFs
- Generates a formatted Word document (.docx) with:
  - Paper titles and arXiv links
  - Cover images from HF
  - Full abstracts
  - Introduction sections
  - Extracted figures from papers
Quick Start
Run the main script to generate today’s report:
cd /path/to/hf-papers-reporter
python3 scripts/process_papers.py
Output will be saved to output/HF_Daily_Papers_Report.docx
Dependencies
Install required packages:
pip3 install PyMuPDF python-docx Pillow beautifulsoup4 requests
How It Works
Step 1: Fetch Paper List
- Scrapes huggingface.co/papers
- Extracts arXiv IDs, titles, and cover image URLs
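A minimal sketch of the list-parsing step, not the skill's actual code. It assumes each paper on the listing page is linked as `/papers/<arXiv id>`, and it uses only the standard library so it runs without extra packages (the skill itself lists beautifulsoup4 and requests as dependencies). The names `PaperLinkParser` and `parse_paper_list` are hypothetical, and cover image URLs are left out because their markup is not described here.

```python
import re
from html.parser import HTMLParser

# Assumption: paper links look like <a href="/papers/2401.00001">Title</a>.
ARXIV_LINK = re.compile(r"^/papers/(\d{4}\.\d{4,5})$")


class PaperLinkParser(HTMLParser):
    """Collects arXiv ids and titles from anchors pointing at /papers/<id>."""

    def __init__(self):
        super().__init__()
        self.papers = {}         # arxiv_id -> title, in page order
        self._current_id = None  # id of the anchor currently being read
        self._chunks = []        # text fragments inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            match = ARXIV_LINK.match(dict(attrs).get("href") or "")
            if match:
                self._current_id = match.group(1)
                self._chunks = []

    def handle_data(self, data):
        if self._current_id is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_id is not None:
            title = " ".join("".join(self._chunks).split())
            # Image-only links carry no text; keep the first real title per id.
            if title and self._current_id not in self.papers:
                self.papers[self._current_id] = title
            self._current_id = None


def parse_paper_list(html, limit=10):
    """Return up to `limit` papers as dicts with id, title, and PDF URL."""
    parser = PaperLinkParser()
    parser.feed(html)
    parser.close()
    papers = [
        {"arxiv_id": pid, "title": title,
         "pdf_url": f"https://arxiv.org/pdf/{pid}.pdf"}
        for pid, title in parser.papers.items()
    ]
    return papers[:limit]
```

In a real run, the `html` argument would be the body of a request to huggingface.co/papers. Keeping the parser separate from the download makes it easy to check against a saved copy of the page.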
Step 2: Download & Process (per paper)
Download PDF from arxiv.org/pdf/{id}.pdf
↓
Extract text (first 5 pages)
  - Abstract (regex match)
  - Introduction (regex match)
↓
Extract images (first 5 pages, max 3 per page)
  - Compress to 600x400
↓
Download cover image from HF CDN
  - Compress to 800x600
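The regex-based text extraction in this step could look roughly like the sketch below. The patterns and the helper `first_match` are illustrative assumptions, not the patterns used in scripts/process_papers.py. The input is assumed to be plain text already pulled from the first five pages of the PDF.

```python
import re

# Hypothetical patterns, tried in order from most to least specific.
ABSTRACT_PATTERNS = [
    # Abstract text that runs up to an "Introduction" heading.
    r"(?is)\babstract\b[\s:.\-]*(.+?)(?=\n\s*(?:1\.?\s+)?introduction\b)",
    # Fallback: abstract text up to the next blank line.
    r"(?is)\babstract\b[\s:.\-]*(.+?)(?=\n\s*\n)",
]
INTRO_PATTERNS = [
    # Introduction text that runs up to a heading numbered 2.
    r"(?is)\n\s*(?:1\.?\s+)?introduction\b[\s:.\-]*(.+?)(?=\n\s*2\.?\s+\w)",
    # Fallback: everything after the "Introduction" heading.
    r"(?is)\n\s*(?:1\.?\s+)?introduction\b[\s:.\-]*(.+)",
]


def first_match(patterns, text):
    """Return the first captured section with whitespace collapsed, or None."""
    for pattern in patterns:
        match = re.search(pattern, text)
        if match:
            return " ".join(match.group(1).split())
    return None


def extract_sections(text):
    """Pull the abstract and introduction out of raw PDF text."""
    return {
        "abstract": first_match(ABSTRACT_PATTERNS, text),
        "introduction": first_match(INTRO_PATTERNS, text),
    }
```

Ordering the patterns from strict to loose means a well-structured paper gets a clean section boundary, while an unusual layout still yields some text instead of nothing.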
Step 3: Generate Word Document
- Title page with report name and date
- Each paper as a section with:
  - Cover image (centered)
  - Abstract section
  - Introduction section
  - Extracted figures (up to 4)
Output Structure
hf_papers/
├── pdfs/ # Downloaded PDFs
├── images/ # Cover images + extracted figures
└── output/
    ├── HF_Daily_Papers_Report.docx
    └── papers_data.json
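As a rough illustration of this layout, the sketch below creates the three working directories and writes the metadata file. The function names are hypothetical. Two things are assumptions: that papers_data.json sits inside output/, and the fields stored in it, since the file's schema is not documented here.

```python
import json
from pathlib import Path


def prepare_workspace(base_dir):
    """Create pdfs/, images/ and output/ under base_dir and return their paths."""
    base = Path(base_dir)
    dirs = {name: base / name for name in ("pdfs", "images", "output")}
    for path in dirs.values():
        path.mkdir(parents=True, exist_ok=True)
    return dirs


def save_metadata(output_dir, papers):
    """Write the paper metadata as UTF-8 JSON (assumed location: output/)."""
    target = Path(output_dir) / "papers_data.json"
    target.write_text(
        json.dumps(papers, ensure_ascii=False, indent=2),
        encoding="utf-8",
    )
    return target
```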
Known Issues & Solutions
| Issue | Cause | Fix |
|---|---|---|
| XML encoding error | PDF text contains control characters | The script automatically strips characters in the 0x00-0x1F range |
| No abstract found | PDF structure varies | The script tries multiple regex patterns |
| Large PDFs | Some papers are 20MB+ | Only the first 5 pages are processed |
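The control-character fix can be sketched as follows. This is an assumption about how such cleaning might be done, not the script's own code. XML 1.0 permits tab, newline, and carriage return, so this version keeps those three and drops the rest of the 0x00-0x1F range; the actual script may strip the whole range. The name `clean_for_docx` is hypothetical.

```python
import re

# Control characters that XML 1.0 does not allow in document text.
# Tab (0x09), newline (0x0A) and carriage return (0x0D) are legal and kept.
_ILLEGAL_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def clean_for_docx(text):
    """Remove characters that would make the .docx XML invalid."""
    return _ILLEGAL_XML_CHARS.sub("", text)
```

Running every extracted string through a helper like this before adding it to the document avoids the encoding error at save time.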
Customization
To modify the number of papers (default: 10), edit the PAPERS list in scripts/process_papers.py.
To change image sizes, modify the thumbnail() calls in the script.
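If the thumbnail() calls are Pillow's `Image.thumbnail`, the size passed in is a bounding box, not an exact output size: the image keeps its aspect ratio and is never enlarged. The sketch below mirrors that arithmetic in plain Python to help pick new values. The helper `fit_within` is hypothetical, and its rounding may differ from Pillow's by a pixel.

```python
def fit_within(size, box):
    """Approximate the output size of a thumbnail bounded by `box`.

    size: (width, height) of the source image
    box:  (max_width, max_height) passed to thumbnail()
    """
    width, height = size
    max_w, max_h = box
    # Images that already fit are left untouched (no upscaling).
    if width <= max_w and height <= max_h:
        return (width, height)
    # Scale by the tighter of the two constraints to preserve aspect ratio.
    scale = min(max_w / width, max_h / height)
    return (max(1, round(width * scale)), max(1, round(height * scale)))
```

For example, a square 1200x1200 figure bounded by 600x400 comes out at about 400x400, because the height limit is the tighter constraint.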