ai-paper-reader
npx skills add https://github.com/frostant/awesome-claude-skills --skill ai-paper-reader
# AI Paper Deep-Reading Notes Generator

## Core Goal

Generate paper deep-reading notes ready for direct publication to technical communities (Zhihu, Juejin, WeChat Official Accounts, etc.).

Note requirements:
- Complete content: no core technical detail is omitted; innovations are explained in depth
- Professional yet readable: technical-blog style, deep but easy to follow
- Objective and accurate: analysis grounded in the paper itself, without subjective speculation
- Deep thinking: a Q&A section helps readers build deeper understanding
## Writing Guidelines

### Do

- **Precise, professional wording**
  - Use the standard terminology of the field
  - Keep formulas and symbols strictly consistent with the paper
  - Describe technical details clearly and unambiguously
- **Explanations that go from intuition to detail**
  - For complex concepts, give the intuition first, then the details
  - Use analogies to make abstract concepts concrete
  - Explain every variable in a formula, term by term
- **Clear organization**
  - Distinct logical hierarchy
  - Key points highlighted
  - Figures and tables used where they genuinely help
- **Valuable in-depth analysis**
  - Analyze the reasons behind design choices
  - Compare with related work
  - Point out the method's scope of applicability and limitations
### Avoid

- **AI clichés and template phrasing**
  - ❌ "The core contribution of this paper is…"
  - ❌ "The advantage of this method lies in…"
  - ❌ "In summary…"
  - ❌ "It is worth noting that…"
  - ❌ "…is of great significance / has broad application prospects"
- **Empty summaries and evaluations**
  - ❌ "This is an important piece of work"
  - ❌ "It provides new ideas for the field"
  - ❌ Generalities without concrete analysis
- **Excessive formatting decoration**
  - ❌ Lots of emoji
  - ❌ Bolding every sentence
  - ❌ Too many nesting levels
- **Unnecessary first person**
  - ❌ "I think…"
  - ❌ "My understanding is…"
  - Keep an objective narrative voice
## Note Structure

### 0. Metadata (note header)

Every note should open with the following information, so readers can quickly decide whether to keep reading:

> **Paper**: Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations
> **Authors**: Meta AI
> **Venue**: ICML 2024
> **Reading time**: ~15 minutes
> **Difficulty**: ★★★★ (requires Transformer and recommender-system background)
> **Prerequisites**: attention mechanisms, DLRM, the concept of scaling laws

Difficulty levels:
- ★ Beginner: no specialized background needed
- ★★ Basic: familiar with deep-learning fundamentals
- ★★★ Intermediate: familiar with the related field
- ★★★★ Advanced: requires deeper domain knowledge
- ★★★★★ Expert: involves complex mathematics or frontier research
### 1. TL;DR

Summarize the paper's core innovation in 2-3 sentences, so readers without time for a deep read can grasp the key point quickly.

## TL;DR

Traditional DLRM-style recommendation models rely on heavy manual feature engineering and do not scale. This paper reformulates recommendation as a sequence-generation problem. The core innovation is the HSTU architecture, which replaces Softmax with pointwise attention to preserve the absolute magnitude of user preferences, making recommender systems exhibit an LLM-like scaling law for the first time.

Requirements:
- 2-3 sentences, under 100 words
- Must include: problem background + core approach + key innovation
- Avoid generalities; name concrete technical points
### 2. Paper Overview

Answer three questions concisely:
- What problem it solves: one sentence
- Core approach: one sentence
- Main contributions: 2-3 bullet points

## Paper Overview

**Problem**: large-scale recommender systems cannot keep improving quality by adding compute the way LLMs do
**Approach**: reformulate recommendation from "feature engineering + discriminative model" to "sequence modeling + generative model"
**Contributions**:
1. Propose the Generative Recommenders (GRs) paradigm, bringing a scaling law to recommender systems
2. Design the HSTU architecture, replacing Softmax with pointwise attention to preserve magnitude information
3. Propose the M-FALCON inference algorithm for efficient candidate scoring
### 3. Background and Motivation

Explain the problems with existing methods and why a new one is needed:
- How existing methods work
- What problems/bottlenecks they have
- What the root cause of the problem is
### 4. Core Method (the focus)

This is the heart of the note; it must be complete, deep, and leave nothing out.

Organization:
- **Overall architecture**
  - Include the architecture diagram
  - Describe the data flow
  - Label the key modules
- **Key modules in detail** (for each one)
  - Input/output description
  - Core formulas, explained term by term
  - Pseudocode / code implementation
  - Analysis of the reasons behind design choices
- **Key technical details**
  - Training strategy
  - Hyperparameter settings
  - Implementation tricks
Example format:

## Core Method

### Overall Architecture

[architecture diagram]

Data flow: user history sequence → Embedding → HSTU Layers × L → prediction heads

### HSTU Layer in Detail

#### Input/Output

- Input: $X \in \mathbb{R}^{N \times d}$, where $N$ is the sequence length and $d$ the embedding dimension
- Output: $Y \in \mathbb{R}^{N \times d}$

#### Core Formulas

**Pointwise Projection**:

$$U, V, Q, K = \text{Split}(\phi_1(f_1(X)))$$

where:
- $\phi_1$: the SiLU activation function
- $f_1$: a single linear transformation
- Split divides the output into four components

**Spatial Aggregation**:

$$A(X)V(X) = \phi_2(Q(X)K(X)^T + r_{ab})\, V(X)$$

Key point: SiLU is used instead of Softmax, preserving the absolute magnitude of the attention scores.
#### Code Implementation

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HSTULayer(nn.Module):
    def __init__(self, d_model, seq_len):
        super().__init__()
        self.proj_in = nn.Linear(d_model, 4 * d_model)   # fused U, V, Q, K projection
        self.proj_out = nn.Linear(d_model, d_model)
        self.norm = nn.LayerNorm(d_model)
        self.rel_bias = nn.Parameter(torch.zeros(seq_len, seq_len))

    def forward(self, x):
        # Pointwise Projection: one linear layer, split into four components
        projected = F.silu(self.proj_in(x))
        u, v, q, k = projected.chunk(4, dim=-1)
        # Spatial Aggregation (SiLU, not Softmax!)
        attn = F.silu(q @ k.transpose(-2, -1) + self.rel_bias)
        out = self.norm(attn @ v) * u
        return x + self.proj_out(out)
```

Design analysis:

Why SiLU rather than Softmax? Softmax normalizes attention into a probability distribution, which loses important information in recommendation scenarios…
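The normalization argument can be made concrete with a toy calculation (illustrative numbers only, not from the paper):

```python
import math

def softmax(xs):
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def silu(x):
    return x / (1 + math.exp(-x))

# Hypothetical raw attention scores: user B has 10x the accumulated history,
# so the raw scores are 10x larger in magnitude.
scores_a = [1.0, 2.0, 3.0]
scores_b = [10.0, 20.0, 30.0]

# Softmax normalizes both into probability distributions that sum to 1,
# erasing the magnitude gap between the two users.
print(sum(softmax(scores_a)), sum(softmax(scores_b)))  # both 1.0 (up to rounding)

# SiLU keeps absolute magnitudes, so B's higher activity stays visible.
print(sum(silu(x) for x in scores_a))  # ≈ 5.35
print(sum(silu(x) for x in scores_b))  # ≈ 60.0
```

Either way the relative ordering within a user's history is preserved; what Softmax discards is the cross-user scale.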
### 5. Experimental Analysis

Don't just list numbers; distill the key conclusions:
- **Main results**: the core findings versus baselines
- **Ablations**: the contribution of each component
- **Scaling analysis**: the compute-versus-performance relationship (if any)
- **Limitations**: where the method performs poorly
### 6. Deep-Understanding Q&A

Use carefully designed questions to help readers understand the paper's key points. **Show the Q&A directly; do not use collapsible sections.**

## Deep-Understanding Q&A

### Q1: Why is Softmax attention a poor fit for recommendation?

Recommendation needs to predict the **absolute magnitude** of a user's preference (e.g., watch time), not just a **relative ranking**.

Consider two users:
- User A: 10 historical interactions
- User B: 100 historical interactions

With Softmax, both users' attention weights are normalized into [0, 1], losing the information that "user B is more active". Pointwise attention keeps the raw accumulated magnitude, so the model can learn the difference in activity.
### Q2: How does HSTU replace a Transformer's 6 linear layers with 2?

A standard Transformer layer needs:
- Q, K, V projections: 3 linear layers
- Output projection: 1 linear layer
- FFN: 2 linear layers (expand + contract)

HSTU's simplification:
1. **Fuse the Q, K, V, U projections**: one linear layer produces all four components at once
2. **Replace the FFN with U-gating**: `output * U` provides a comparable nonlinear transformation

The trade-off is lower per-layer expressiveness, which stacking more layers can compensate for.
### Q3: Why can Stochastic Length training drop 70% of tokens with almost no quality loss?

The key lies in the **statistical properties** of user behavior:
1. **Temporal redundancy**: users interact repeatedly with similar content, so the information is highly redundant
2. **Low-rank interests**: 10,000 interactions may involve only ~20 main interest clusters
3. **Recency priority**: sampling up-weights recent behavior, keeping the most relevant information

As long as the number of samples exceeds the number of interest categories by a modest factor, all interests are covered with high probability.
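The coverage claim can be checked with a small calculation under an idealized model (uniform sampling over k equally likely interest categories — an assumption of this sketch, not something the paper states):

```python
from math import comb

def p_cover_all(k, n):
    """P(n uniform samples over k categories hit every category),
    computed by inclusion-exclusion over the set of missed categories."""
    return sum((-1) ** i * comb(k, i) * ((k - i) / k) ** n
               for i in range(k + 1))

# With 20 interest clusters, even 100 retained tokens cover all interests
# with probability ~0.89; at 3,000 retained tokens (30% of 10,000) the
# coverage probability is indistinguishable from 1.
print(round(p_cover_all(20, 100), 2))    # 0.89
print(p_cover_all(20, 3000) > 0.999999)  # True
```

Real interaction streams are skewed rather than uniform, but recency weighting pushes the sample toward the interests that matter for the next prediction, so the qualitative conclusion survives.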
### 7. Summary and Reflections

Summarize the paper's contributions and limitations objectively:

## Summary

### Core Contributions
- Demonstrated that recommender systems can follow a scaling law
- Proposed an attention variant suited to recommendation scenarios

### Limitations
- Cold start: the advantage fades when history sequences are too short
- Compute cost: requires substantial GPU resources
- Latency: long-sequence inference is challenging for real-time serving

### When to Use
- Scenarios with rich user history (>100 interactions)
- Ample compute resources
- Latency requirements that are not extremely strict
## Directory Structure Convention

Papers and their reading notes should live under a unified subdirectory, for easy management and retrieval:

```
paper-notes/
├── hstu/                          # one directory per paper, short name
│   ├── paper.pdf                  # original paper PDF
│   ├── README.md                  # reading note (main file)
│   └── images/                    # extracted figures
│       ├── fig1_architecture.png
│       ├── fig2_method.png
│       └── fig3_scaling.png
│
├── attention-is-all-you-need/
│   ├── paper.pdf
│   ├── README.md
│   └── images/
│
└── din-deep-interest-network/
    ├── paper.pdf
    ├── README.md
    └── images/
```

Naming rules:
- Directory name: the paper's short name or keywords (lowercase, hyphen-separated)
- Note file: always named README.md (so GitHub previews it directly)
- Image directory: always named images/
Image naming convention:

```
fig{index}_{type}_{description}.png
```

Types:
- arch: architecture diagram
- method: method flow
- result: experimental results
- ablation: ablation study
- compare: comparison chart

Examples:
- fig1_arch_overall.png
- fig2_method_attention.png
- fig3_result_scaling.png
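A small helper can enforce this convention programmatically (a hypothetical utility sketched here, not part of the skill itself):

```python
VALID_TYPES = {"arch", "method", "result", "ablation", "compare"}

def figure_filename(index, fig_type, description, ext="png"):
    """Build a filename following fig{index}_{type}_{description}.{ext}."""
    if fig_type not in VALID_TYPES:
        raise ValueError(f"unknown figure type: {fig_type}")
    # Lowercase the description and hyphenate spaces for a clean slug
    slug = description.lower().replace(" ", "-")
    return f"fig{index}_{fig_type}_{slug}.{ext}"

print(figure_filename(1, "arch", "overall"))    # fig1_arch_overall.png
print(figure_filename(3, "result", "scaling"))  # fig3_result_scaling.png
```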
## Q&A Section Design Guide

Question types:
- **Principle understanding**
  - Why is it designed this way?
  - What advantage does it have over alternatives?
- **Detail discrimination**
  - The precise meaning of a particular symbol or operation
  - Distinguishing easily confused concepts
- **Boundary conditions**
  - Under what conditions does the method fail?
  - What are its assumptions?
- **Extension**
  - Can it transfer to other scenarios?
  - What are possible directions for improvement?

Answer requirements:
- Show directly: no collapsible sections, so readers can read straight through
- Well argued: answers must be reasoned, not mere assertions
- Use examples: concrete examples aid understanding
- Admit uncertainty: parts not stated in the paper may be labeled "speculation"
## Figure Handling

Figures that must be extracted:
- The overall architecture diagram
- The core method flow diagram
- Key experimental results (scaling-law curves, etc.)

Figure caption convention:



**What it shows**: the overall HSTU architecture, with a DLRM comparison on the left
**Key information**:
- The input is a unified interleaved item-action sequence
- HSTU layers can be stacked without limit
- The output feeds multi-task prediction heads
**Corresponding text**: described in detail in Section 3.2
## Figure Extraction Tools

Figures in papers come in two types, which call for different extraction approaches:

| Type | Characteristics | Extraction method |
|---|---|---|
| Embedded bitmaps | PNG/JPEG images inserted by the authors | get_images() |
| Vector graphics | Drawn figures such as architecture and flow diagrams | cluster_drawings() |

### Method 1: Extract Embedded Bitmaps

Suitable for bitmaps inserted directly into the paper (experiment screenshots, photos, etc.):
```python
import fitz  # PyMuPDF
import os

def extract_embedded_images(pdf_path, output_dir):
    """Extract bitmaps embedded in a PDF."""
    os.makedirs(output_dir, exist_ok=True)
    doc = fitz.open(pdf_path)
    for page_num in range(len(doc)):
        page = doc[page_num]
        images = page.get_images(full=True)
        for img_idx, img in enumerate(images):
            xref = img[0]
            base = doc.extract_image(xref)
            image_bytes = base["image"]
            image_ext = base["ext"]
            # Skip tiny images (likely icons or decoration)
            if base["width"] > 100 and base["height"] > 100:
                output_path = f"{output_dir}/page{page_num+1}_img{img_idx+1}.{image_ext}"
                with open(output_path, "wb") as f:
                    f.write(image_bytes)
    doc.close()
```
### Method 2: Extract Vector Graphics (recommended)

Suitable for vector figures drawn in the paper: architecture diagrams, flowcharts, plots, etc.:
```python
import fitz
import os

def extract_vector_figures(pdf_path, output_dir, dpi=200, min_size=100):
    """
    Detect vector-figure regions with cluster_drawings() and screenshot them.

    Args:
        pdf_path: path to the PDF file
        output_dir: output directory
        dpi: output resolution (default 200; raise to 300 for sharper images)
        min_size: minimum size threshold, filters decorative strokes (default 100pt)
    """
    os.makedirs(output_dir, exist_ok=True)
    doc = fitz.open(pdf_path)
    figures = []
    for page_num in range(len(doc)):
        page = doc[page_num]
        # Cluster vector drawings into regions;
        # x_tolerance/y_tolerance control how close elements must be to merge
        try:
            drawing_rects = page.cluster_drawings(
                x_tolerance=3,
                y_tolerance=3
            )
        except Exception:
            # Some PDFs are unsupported; skip them
            continue
        for idx, rect in enumerate(drawing_rects):
            # Skip tiny regions (likely rules or decoration)
            if rect.width < min_size or rect.height < min_size:
                continue
            # Pad the bounds so the crop is not too tight
            rect = rect + (-10, -10, 10, 10)
            # Clamp to the page bounds
            rect = rect & page.rect
            # High-resolution screenshot
            zoom = dpi / 72
            mat = fitz.Matrix(zoom, zoom)
            pix = page.get_pixmap(matrix=mat, clip=rect)
            output_path = f"{output_dir}/page{page_num+1}_fig{idx+1}.png"
            pix.save(output_path)
            figures.append({
                "page": page_num + 1,
                "path": output_path,
                "rect": rect
            })
    doc.close()
    return figures
```
### Method 3: Manually Crop a Specified Region

When automatic detection falls short, specify coordinates by hand:
```python
import fitz

def crop_figure(pdf_path, page_num, rect, output_path, dpi=200):
    """
    Crop a specific region from a given PDF page.

    Args:
        pdf_path: PDF path
        page_num: page number (1-based)
        rect: (x0, y0, x1, y1) coordinates in points (pt); 72 pt = 1 inch
        output_path: output image path
        dpi: resolution
    """
    doc = fitz.open(pdf_path)
    page = doc[page_num - 1]
    clip = fitz.Rect(rect)
    zoom = dpi / 72
    mat = fitz.Matrix(zoom, zoom)
    pix = page.get_pixmap(matrix=mat, clip=clip)
    pix.save(output_path)
    doc.close()

# Usage: crop a region from page 2.
# Find coordinates in a PDF viewer, or run Method 2 first and fine-tune.
crop_figure(
    "paper.pdf",
    page_num=2,
    rect=(50, 100, 550, 400),  # top-left (50,100) to bottom-right (550,400)
    output_path="./images/fig1_architecture.png"
)
```
### Smart Extraction (combined approach)

Try multiple methods automatically and extract every figure:
```python
import fitz
import os

def smart_extract_figures(pdf_path, output_dir, dpi=200):
    """
    Extract all figures from a paper:
    1. First detect vector graphics with cluster_drawings
    2. Then extract embedded bitmaps
    3. Filter and deduplicate automatically
    """
    os.makedirs(output_dir, exist_ok=True)
    doc = fitz.open(pdf_path)
    results = {"vector": [], "embedded": []}
    for page_num in range(len(doc)):
        page = doc[page_num]
        # 1. Vector graphics
        try:
            rects = page.cluster_drawings(x_tolerance=3, y_tolerance=3)
            for idx, rect in enumerate(rects):
                if rect.width > 100 and rect.height > 100:
                    rect = (rect + (-10, -10, 10, 10)) & page.rect
                    zoom = dpi / 72
                    pix = page.get_pixmap(matrix=fitz.Matrix(zoom, zoom), clip=rect)
                    path = f"{output_dir}/p{page_num+1}_vec{idx+1}.png"
                    pix.save(path)
                    results["vector"].append(path)
        except Exception:
            pass
        # 2. Embedded bitmaps
        for img_idx, img in enumerate(page.get_images(full=True)):
            xref = img[0]
            base = doc.extract_image(xref)
            if base["width"] > 100 and base["height"] > 100:
                path = f"{output_dir}/p{page_num+1}_img{img_idx+1}.{base['ext']}"
                with open(path, "wb") as f:
                    f.write(base["image"])
                results["embedded"].append(path)
    doc.close()
    print(f"Done: {len(results['vector'])} vector figures, {len(results['embedded'])} bitmaps")
    return results

# Usage
results = smart_extract_figures("paper.pdf", "./images/")
```
常è§é®é¢
Q: æåçå¾çå å«å¤ä¸ª Figure åå¨ä¸èµ·ï¼
è°å° x_tolerance å y_tolerance åæ°ï¼å¦ 1-2ï¼ï¼ä½¿èç±»æ´ä¸¥æ ¼ã
Q: åä¸ä¸ª Figure 被åæå¤åï¼
è°å¤§å®¹å·®åæ°ï¼å¦ 10-20ï¼ï¼ä½¿ç¸é»å ç´ åå¹¶ã
Q: æäº Figure 没æè¢«è¯å«ï¼
- å¯è½æ¯åµå
¥å¼å¾çï¼å°è¯
get_images()æ¹æ³ - ä½¿ç¨æå¨æå®åºåçæ¹æ³
Q: å¾ç模ç³ï¼
æé« dpi åæ°å° 300 ææ´é«ã
## Usage

### Basic

Read this paper and generate a professional reading note suitable for publishing to a technical community.

### Specifying Focus

Read this paper, focusing on:
1. The differences between HSTU and a standard Transformer
2. The setup and conclusions of the scaling-law experiments
3. The feasibility of deployment in industrial scenarios

### Comparative Analysis

Compare and analyze how these two papers solve the XXX problem differently.
## Technical Requirements

Content completeness:
- Core formulas must be included and explained term by term
- Key algorithms come with pseudocode implementations
- Important hyperparameters and training details are not omitted
- Key conclusions from ablation studies are distilled

Depth:
- Analyze "why it is designed this way"
- Connect the work to related research
- Point out the method's boundaries and limitations

Readability:
- Intuition first, then details
- Code and formulas complement each other
- Avoid piling up formulas without explanation
## Dependencies

```shell
# Figure extraction
pip install pymupdf

# PDF-to-image conversion (optional)
pip install pdf2image
```

### MCP Configuration (optional)
```json
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@anthropic/mcp-server-filesystem", "/path/to/papers"]
    },
    "notion": {
      "command": "npx",
      "args": ["-y", "@notionhq/notion-mcp-server"],
      "env": {
        "OPENAPI_MCP_HEADERS": "{\"Authorization\": \"Bearer YOUR_TOKEN\", \"Notion-Version\": \"2022-06-28\"}"
      }
    }
  }
}
```