nlp-engineer

📁 404kidwiz/claude-supercode-skills 📅 Jan 24, 2026

Site-wide rank: #5787

Install command

npx skills add https://github.com/404kidwiz/claude-supercode-skills --skill nlp-engineer

Installs by agent

opencode 28

claude-code 27

gemini-cli 25

cursor 22

windsurf 18

Skill documentation

NLP Engineer

Purpose

Provides expertise in Natural Language Processing systems design and implementation. Specializes in text classification, named entity recognition, sentiment analysis, and integrating modern LLMs using frameworks like Hugging Face, spaCy, and LangChain.

When to Use

Building text classification systems
Implementing named entity recognition (NER)
Creating sentiment analysis pipelines
Fine-tuning transformer models
Designing LLM-powered features
Implementing text preprocessing pipelines
Building search and retrieval systems
Creating text generation applications

Quick Start

Invoke this skill when:

Building NLP pipelines (classification, NER, sentiment)
Fine-tuning transformer models
Implementing text preprocessing
Integrating LLMs for text tasks
Designing semantic search systems

Do NOT invoke when:

RAG architecture design → use /ai-engineer
LLM prompt optimization → use /prompt-engineer
ML model deployment → use /mlops-engineer
General data processing → use /data-engineer

Decision Framework

NLP Task Type?
├── Classification
│   ├── Simple → Fine-tuned BERT/DistilBERT
│   └── Zero-shot → LLM with prompting
├── NER
│   ├── Standard entities → spaCy
│   └── Custom entities → Fine-tuned model
├── Generation
│   └── LLM (GPT, Claude, Llama)
└── Semantic Search
    └── Embeddings + Vector store
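
The tree above can also be read as a lookup table. The sketch below is only an illustration: the names `RECOMMENDATIONS` and `recommend_approach`, and the key spellings such as "zero-shot" and "semantic-search", are hypothetical and not part of the skill itself.

```python
# Hypothetical lookup table mirroring the decision tree above.
# Keys are (task, variant); variant is None where the tree has a single branch.
RECOMMENDATIONS = {
    ("classification", "simple"): "Fine-tuned BERT/DistilBERT",
    ("classification", "zero-shot"): "LLM with prompting",
    ("ner", "standard"): "spaCy",
    ("ner", "custom"): "Fine-tuned model",
    ("generation", None): "LLM (GPT, Claude, Llama)",
    ("semantic-search", None): "Embeddings + vector store",
}


def recommend_approach(task, variant=None):
    """Return the approach the decision tree suggests for a task and variant."""
    key = (task.lower(), variant.lower() if variant else None)
    if key not in RECOMMENDATIONS:
        raise ValueError(f"No recommendation for task={task!r}, variant={variant!r}")
    return RECOMMENDATIONS[key]
```

For example, `recommend_approach("NER", "standard")` returns `"spaCy"`, and an unknown combination raises `ValueError` rather than guessing.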

Core Workflows

1. Text Classification Pipeline

Collect and label training data
Preprocess text (tokenization, cleaning)
Select base model (BERT, RoBERTa)
Fine-tune on labeled dataset
Evaluate with appropriate metrics
Deploy with inference optimization
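
As a rough illustration of the preprocessing step, here is a minimal standard-library sketch. The function names and the specific cleaning rules (URL stripping, lowercasing) are assumptions chosen for the example; a fine-tuned transformer would normally use its own tokenizer, and lowercasing is not appropriate for cased models.

```python
import re
import unicodedata

URL_RE = re.compile(r"https?://\S+")
# Words, optionally with one internal apostrophe ("don't").
TOKEN_RE = re.compile(r"\w+(?:'\w+)?")


def clean_text(text):
    """Normalize Unicode, drop URLs, lowercase, and collapse whitespace."""
    text = unicodedata.normalize("NFKC", text)
    text = URL_RE.sub(" ", text)
    text = text.lower()
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text):
    """Split cleaned text into word tokens (a stand-in for a model tokenizer)."""
    return TOKEN_RE.findall(clean_text(text))


print(clean_text("  Check   THIS out: https://example.com/x  NOW "))
print(tokenize("Don't  STOP"))
```

Which rules to keep depends on the domain: URLs, casing, or punctuation may carry signal for some tasks and should then be preserved.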

2. NER System

Define entity types for domain
Create labeled training data
Choose framework (spaCy, Hugging Face)
Train custom NER model
Evaluate precision, recall, F1
Integrate with post-processing rules
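
The evaluation step can be sketched as entity-level scoring, assuming entities are represented as `(start, end, label)` tuples and that only exact span-and-label matches count as correct. Both the representation and the function name `entity_prf` are assumptions for this example; partial-match schemes would score differently.

```python
def entity_prf(gold, predicted):
    """Entity-level precision, recall, and F1 with exact span+label matching.

    gold, predicted: iterables of (start, end, label) tuples.
    """
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


gold = [(0, 5, "PER"), (10, 16, "ORG"), (20, 26, "LOC")]
pred = [(0, 5, "PER"), (10, 16, "LOC")]  # second span has the wrong label
precision, recall, f1 = entity_prf(gold, pred)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

Here one of two predictions is correct and one of three gold entities is found, so precision is 0.50, recall is 0.33, and F1 is 0.40.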

3. Embedding-Based Search

Select embedding model (sentence-transformers)
Generate embeddings for corpus
Index in vector database
Implement query embedding
Add hybrid search (keyword + semantic)
Tune similarity thresholds
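
The scoring side of this workflow can be sketched without any external library. The example assumes embeddings are already computed (the three-dimensional vectors below are toy values, not model output), replaces the vector database with a linear scan, and uses a simple token-overlap score as the keyword component. The names `hybrid_search`, `alpha`, and `threshold` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def keyword_overlap(query, doc):
    """Fraction of query tokens that also appear in the document."""
    query_tokens = set(query.lower().split())
    doc_tokens = set(doc.lower().split())
    return len(query_tokens & doc_tokens) / len(query_tokens) if query_tokens else 0.0


def hybrid_search(query, query_vec, corpus, alpha=0.7, threshold=0.3):
    """Rank (text, vector) pairs by a weighted semantic + keyword score.

    alpha weights the semantic part; results below threshold are dropped.
    """
    results = []
    for text, vec in corpus:
        score = alpha * cosine(query_vec, vec) + (1 - alpha) * keyword_overlap(query, text)
        if score >= threshold:
            results.append((score, text))
    return sorted(results, reverse=True)


# Toy corpus with made-up vectors standing in for real embeddings.
corpus = [
    ("reset your password", [0.9, 0.1, 0.0]),
    ("billing and invoices", [0.1, 0.9, 0.0]),
    ("office opening hours", [0.0, 0.1, 0.9]),
]
hits = hybrid_search("password reset", [1.0, 0.0, 0.0], corpus)
print(hits)
```

With these toy values only "reset your password" clears the threshold. In practice `alpha` and `threshold` would be tuned against labeled query-document pairs.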

Best Practices

Start with pretrained models, fine-tune as needed
Use domain-specific preprocessing
Evaluate with task-appropriate metrics
Consider inference latency for production
Implement proper text cleaning pipelines
Use batching for efficient inference
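
The batching advice can be sketched as a small helper. `predict_batch` is a hypothetical callable standing in for whatever model is in use; it is assumed to take a list of texts and return one prediction per text.

```python
def batched(items, batch_size):
    """Yield consecutive slices of items with at most batch_size elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def predict_in_batches(texts, predict_batch, batch_size=32):
    """Run predict_batch over texts in chunks and return predictions in order."""
    predictions = []
    for batch in batched(texts, batch_size):
        predictions.extend(predict_batch(batch))
    return predictions
```

The right batch size depends on the model, sequence lengths, and available memory, so it is worth measuring rather than assuming 32.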

Anti-Patterns

Anti-Pattern	Problem	Correct Approach
Training from scratch	Wastes data and compute	Fine-tune a pretrained model
No preprocessing	Noisy inputs hurt performance	Clean and normalize text
Wrong metrics	Misleading evaluation	Choose task-appropriate metrics (e.g. F1 rather than accuracy on imbalanced data)
Ignoring class imbalance	Biased predictions	Balance or weight classes
Overfitting to eval set	Poor generalization	Proper train/val/test splits
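
For the class-imbalance row, one common way to weight classes is inverse-frequency weighting, where each class gets n_samples / (n_classes * class_count). The sketch below is one option among several (resampling is another); the function name is hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


labels = ["neg"] * 90 + ["pos"] * 10
weights = balanced_class_weights(labels)
print(weights)
```

With 90 negative and 10 positive examples, the minority class receives a weight of 5.0 and the majority class about 0.56, so errors on rare examples count for more in a weighted loss.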
