Install command
npx skills add https://github.com/xiaodong-wu/code-dedup-skills --skill code-dedup
Skill Documentation
Code Dedup
An AI-friendly code analysis platform that detects duplicate code, identifies dead code, and suggests structural improvements.
Use Cases
- Code Review: Automatically find duplicate code patterns during PR reviews
- Refactoring: Identify unused functions and variables before cleanup
- Quality Gates: Integrate into CI/CD to enforce code quality standards
- Technical Debt: Track and prioritize code quality improvements
- Code Audit: Analyze large codebases for maintainability issues
Installation
npm install
Usage
Analyze a Project
# Run all analyses
node src/cli.js analyze ./src
# Run specific analysis
node src/cli.js analyze ./src --analyses dedup,deadCode
# Generate JSON report for AI processing
node src/cli.js analyze ./src --format json --output report.json
Programmatic API
import { analyzeCode, checkDuplicates, checkDeadCode, checkStructure } from './index.js';
// Comprehensive analysis
const result = await analyzeCode('./src', {
analyses: ['dedup', 'deadCode', 'structure'],
format: 'json'
});
console.log(result.summary);
// { status: 'success', filesAnalyzed: 42, issuesFound: 15, durationMs: 234 }
// Individual checks
const duplicates = await checkDuplicates('./src', {
minSimilarity: 0.85
});
const deadCode = await checkDeadCode('./src', {
ignoreExports: true,
ignoreTestFiles: true
});
const structure = await checkStructure('./src', {
maxComplexity: 10,
maxFunctionLength: 50
});
Output Format
JSON Output
{
"summary": {
"status": "success",
"filesAnalyzed": 42,
"issuesFound": 15,
"durationMs": 234
},
"results": {
"duplicates": [
{
"file1": "src/auth.js",
"file2": "src/user.js",
"similarity": 0.92,
"type": "approximate",
"lines": 45
}
],
"deadCode": [
{
"symbol": "unusedFunction",
"file": "src/utils.js",
"line": 23,
"type": "function"
}
],
"structure": {
"complexFunctions": [
{
"name": "processData",
"file": "src/api.js",
"complexity": 15,
"line": 45
}
]
}
},
"recommendations": [
{
"type": "refactor",
"priority": "high",
"description": "Extract duplicate validation logic",
"files": ["src/auth.js", "src/user.js"]
}
]
}
Configuration
Duplicate Detection
- minLines: Minimum lines for duplicate detection (default: 5)
- minSimilarity: Similarity threshold, 0-1 (default: 0.85)
- ignoreWhitespace: Ignore whitespace differences (default: true)
- ignoreComments: Ignore comments (default: true)
Dead Code Detection
- ignoreExports: Ignore exported symbols (default: true)
- ignoreTestFiles: Ignore test files (default: true)
- testPatterns: Custom test file patterns (default: ['**/*.test.js', '**/*.spec.js'])
- whitelistPatterns: Symbol patterns to ignore (default: [])
Structure Analysis
- maxComplexity: Max cyclomatic complexity (default: 10)
- maxFunctionLength: Max function lines (default: 50)
- maxParameters: Max function parameters (default: 5)
- maxNestingDepth: Max nesting depth (default: 4)
CLI Commands
# Analyze with custom options
node src/cli.js analyze ./src \
--min-similarity 0.9 \
--max-complexity 8 \
--format json \
--output report.json
# Quick duplicate check
node src/cli.js check-dup ./src
# Quick dead code check
node src/cli.js check-dead ./src
# Quick structure check
node src/cli.js check-struct ./src
# Format code
node src/cli.js format ./src --output formatted.md
Features
O(n) Duplicate Detection
Uses MinHash LSH algorithm for fast approximate duplicate detection:
- Exact matching: MD5 hashing for identical code blocks
- Approximate matching: MinHash + LSH for similar code (Type-2 clones)
- Scalable: Analyzes 1000+ files in seconds, not minutes
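To illustrate how MinHash approximates similarity, here is a minimal self-contained sketch. It is illustrative only, not the library's actual implementation: it tokenizes code into shingles, simulates k independent hash functions by salting an FNV-1a hash, and estimates Jaccard similarity as the fraction of matching signature slots. All function names here (shingles, minhashSignature, estimateSimilarity) are hypothetical.

```javascript
// FNV-1a 32-bit hash, salted by seed to simulate k independent hash functions.
function fnv1a(str, seed) {
  let h = (0x811c9dc5 ^ seed) >>> 0;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// Token shingles: overlapping windows of `size` whitespace-separated tokens.
function shingles(code, size = 3) {
  const tokens = code.split(/\s+/).filter(Boolean);
  const out = new Set();
  for (let i = 0; i + size <= tokens.length; i++) {
    out.add(tokens.slice(i, i + size).join(' '));
  }
  return out;
}

// Signature: for each of k hash functions, keep the minimum hash over the set.
function minhashSignature(set, k = 64) {
  const sig = new Array(k).fill(Infinity);
  for (const item of set) {
    for (let seed = 0; seed < k; seed++) {
      const h = fnv1a(item, seed);
      if (h < sig[seed]) sig[seed] = h;
    }
  }
  return sig;
}

// Fraction of matching signature slots approximates the Jaccard similarity
// of the two shingle sets.
function estimateSimilarity(a, b) {
  const sigA = minhashSignature(shingles(a));
  const sigB = minhashSignature(shingles(b));
  let same = 0;
  for (let i = 0; i < sigA.length; i++) if (sigA[i] === sigB[i]) same++;
  return same / sigA.length;
}
```

The LSH step (not shown) buckets signatures so only candidate pairs with matching bands are compared, which is what keeps the detection linear in the number of files.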
AST-Based Dead Code Detection
Static analysis to find unused code:
- Detects unused functions, variables, and imports
- Builds reference graph to track symbol usage
- Configurable whitelist patterns
- Smart export/test file filtering
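The reference-graph idea can be sketched with a deliberately simplified scan. The real tool walks the AST; this regex-based version only shows the principle (declare symbols, count references, report symbols referenced nowhere but their declaration). The function name findUnusedFunctions is hypothetical.

```javascript
// Simplified dead-function scan: a symbol whose name appears only once
// (its own declaration) has no references and is reported as unused.
function findUnusedFunctions(source) {
  const declared = [...source.matchAll(/function\s+(\w+)/g)].map(m => m[1]);
  const unused = [];
  for (const name of declared) {
    // Count every whole-word appearance; the declaration itself counts once.
    const refs = (source.match(new RegExp('\\b' + name + '\\b', 'g')) || []).length;
    if (refs <= 1) unused.push(name);
  }
  return unused;
}
```

An AST walk avoids the false positives a text match produces (names in strings or comments), which is why the actual analysis is AST-based.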
Structure Metrics
Code complexity and maintainability analysis:
- Cyclomatic complexity: Decision points per function
- Function length: Lines of code per function
- Nesting depth: Maximum indentation levels
- Parameter count: Number of function parameters
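Cyclomatic complexity is 1 plus the number of decision points in a function. A keyword-matching sketch conveys the idea (the real analysis counts branch nodes in the AST rather than matching text; estimateComplexity is a hypothetical name):

```javascript
// Rough cyclomatic complexity estimate: 1 + number of decision points
// (branches, loops, case labels, catch clauses, short-circuit operators,
// and ternaries).
function estimateComplexity(fnSource) {
  const decisions = fnSource.match(/\b(if|for|while|case|catch)\b|&&|\|\||\?/g) || [];
  return 1 + decisions.length;
}
```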
Performance
| Project Size | Files | Duration |
|---|---|---|
| Small | <100 | <2s |
| Medium | 100-1K | <10s |
| Large | >1K | <30s |
Supported Languages
- JavaScript (ES6+)
- TypeScript
- JSX/TSX
- Python (experimental)
Examples
Find High-Similarity Duplicates
const duplicates = await checkDuplicates('./src', {
minSimilarity: 0.95 // Only near-duplicates
});
duplicates.forEach(dup => {
console.log(`${dup.file1} ↔ ${dup.file2}: ${(dup.similarity * 100).toFixed(1)}%`);
});
Find Unused Exports
const deadCode = await checkDeadCode('./src', {
ignoreExports: false // Include exports in analysis
});
const unusedExports = deadCode.filter(item => item.exported);
console.log(`Found ${unusedExports.length} unused exports`);
Enforce Complexity Limits
const structure = await checkStructure('./src', {
maxComplexity: 8,
maxFunctionLength: 30
});
const violations = structure.complexFunctions.filter(
fn => fn.complexity > 8 || fn.length > 30
);
if (violations.length > 0) {
console.error('Complexity violations found:', violations);
process.exit(1);
}
Integration
CI/CD Pipeline
# .github/workflows/code-quality.yml
- name: Check code quality
run: |
node src/cli.js analyze ./src --format json --output report.json
if [ $(jq '.summary.issuesFound' report.json) -gt 0 ]; then
echo "Code quality issues found"
exit 1
fi
Pre-commit Hook
// package.json
{
"husky": {
"hooks": {
"pre-commit": "node src/cli.js analyze ./src --format json"
}
}
}
Tips
- Start with defaults: The default settings work well for most projects
- Adjust thresholds: Tighten similarity thresholds to reduce false positives
- Use whitelist: Add framework patterns to avoid false positives
- Regular analysis: Run weekly to track technical debt
- Combine with formatters: Use with Prettier/ESLint for best results
Limitations
- Static analysis: May miss dynamically accessed code
- Best effort: Dead code detection has ~90% accuracy
- Language support: Currently optimized for JavaScript/TypeScript
- Large files: Files >10MB are skipped for performance