code-dedup

📁 xiaodong-wu/code-dedup-skills 📅 Jan 23, 2026
Install command
npx skills add https://github.com/xiaodong-wu/code-dedup-skills --skill code-dedup


Skill documentation

Code Dedup

An AI-friendly code analysis platform that detects duplicate code, identifies dead code, and suggests structural improvements.

Use Cases

  • Code Review: Automatically find duplicate code patterns during PR reviews
  • Refactoring: Identify unused functions and variables before cleanup
  • Quality Gates: Integrate into CI/CD to enforce code quality standards
  • Technical Debt: Track and prioritize code quality improvements
  • Code Audit: Analyze large codebases for maintainability issues

Installation

npm install

Usage

Analyze a Project

# Run all analyses
node src/cli.js analyze ./src

# Run specific analysis
node src/cli.js analyze ./src --analyses dedup,deadCode

# Generate JSON report for AI processing
node src/cli.js analyze ./src --format json --output report.json

Programmatic API

import { analyzeCode, checkDuplicates, checkDeadCode, checkStructure } from './index.js';

// Comprehensive analysis
const result = await analyzeCode('./src', {
  analyses: ['dedup', 'deadCode', 'structure'],
  format: 'json'
});

console.log(result.summary);
// { status: 'success', filesAnalyzed: 42, issuesFound: 15, durationMs: 234 }

// Individual checks
const duplicates = await checkDuplicates('./src', {
  minSimilarity: 0.85
});

const deadCode = await checkDeadCode('./src', {
  ignoreExports: true,
  ignoreTestFiles: true
});

const structure = await checkStructure('./src', {
  maxComplexity: 10,
  maxFunctionLength: 50
});

Output Format

JSON Output

{
  "summary": {
    "status": "success",
    "filesAnalyzed": 42,
    "issuesFound": 15,
    "durationMs": 234
  },
  "results": {
    "duplicates": [
      {
        "file1": "src/auth.js",
        "file2": "src/user.js",
        "similarity": 0.92,
        "type": "approximate",
        "lines": 45
      }
    ],
    "deadCode": [
      {
        "symbol": "unusedFunction",
        "file": "src/utils.js",
        "line": 23,
        "type": "function"
      }
    ],
    "structure": {
      "complexFunctions": [
        {
          "name": "processData",
          "file": "src/api.js",
          "complexity": 15,
          "line": 45
        }
      ]
    }
  },
  "recommendations": [
    {
      "type": "refactor",
      "priority": "high",
      "description": "Extract duplicate validation logic",
      "files": ["src/auth.js", "src/user.js"]
    }
  ]
}
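
A report in this shape can be condensed before it is handed to an AI agent or a dashboard. The following is a minimal sketch, assuming the field names shown above; `summarizeReport` is a hypothetical helper and not part of the code-dedup API.

```javascript
// Trimmed sample shaped like the JSON output above.
const report = {
  summary: { status: 'success', filesAnalyzed: 42, issuesFound: 15, durationMs: 234 },
  results: {
    duplicates: [{ file1: 'src/auth.js', file2: 'src/user.js', similarity: 0.92 }],
    deadCode: [{ symbol: 'unusedFunction', file: 'src/utils.js', line: 23 }],
    structure: { complexFunctions: [{ name: 'processData', complexity: 15 }] }
  },
  recommendations: [
    { type: 'refactor', priority: 'high', files: ['src/auth.js', 'src/user.js'] }
  ]
};

// Hypothetical helper: reduce a report to counts plus the files that
// high-priority recommendations point at.
function summarizeReport(input) {
  const { summary, results, recommendations = [] } = input;
  const highPriority = recommendations.filter(rec => rec.priority === 'high');
  return {
    ok: summary.status === 'success' && summary.issuesFound === 0,
    duplicatePairs: results.duplicates.length,
    deadSymbols: results.deadCode.length,
    complexFunctions: results.structure.complexFunctions.length,
    highPriorityFiles: [...new Set(highPriority.flatMap(rec => rec.files))]
  };
}

const digest = summarizeReport(report);
console.log(digest);
```

Deduplicating `highPriorityFiles` with a `Set` keeps the digest short when several recommendations touch the same file.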

Configuration

Duplicate Detection

  • minLines: Minimum lines for duplicate detection (default: 5)
  • minSimilarity: Similarity threshold 0-1 (default: 0.85)
  • ignoreWhitespace: Ignore whitespace differences (default: true)
  • ignoreComments: Ignore comments (default: true)
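
To illustrate what `ignoreWhitespace` and `ignoreComments` imply, here is a rough sketch of a normalization pass that could run before code blocks are compared. The `normalize` function is hypothetical, and the regex-based comment stripping is a simplification (it would also strip `//` inside string literals); the tool's actual behavior may differ.

```javascript
// Hypothetical normalization step, not taken from code-dedup itself.
function normalize(code, { ignoreWhitespace = true, ignoreComments = true } = {}) {
  let text = code;
  if (ignoreComments) {
    // Remove block comments first, then line comments (naive: ignores string context).
    text = text.replace(/\/\*[\s\S]*?\*\//g, '').replace(/\/\/.*$/gm, '');
  }
  if (ignoreWhitespace) {
    // Trim each line, collapse runs of whitespace, and drop empty lines.
    text = text
      .split('\n')
      .map(line => line.trim().replace(/\s+/g, ' '))
      .filter(line => line.length > 0)
      .join('\n');
  }
  return text;
}

const snippetA = 'function add(a, b) {\n  // sum two numbers\n  return a + b;\n}';
const snippetB = 'function add(a, b) {\n\treturn a  +  b; /* sum */\n}\n';

// With both options on, the two snippets normalize to the same text.
console.log(normalize(snippetA) === normalize(snippetB)); // true
```

With `ignoreComments: false`, the same two snippets no longer compare equal, which is why the defaults tend to surface more duplicates.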

Dead Code Detection

  • ignoreExports: Ignore exported symbols (default: true)
  • ignoreTestFiles: Ignore test files (default: true)
  • testPatterns: Custom test file patterns (default: ['**/*.test.js', '**/*.spec.js'])
  • whitelistPatterns: Symbol patterns to ignore (default: [])
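
As an illustration of how these options interact, the sketch below filters a list of dead-code findings. It assumes that `whitelistPatterns` entries are regular-expression strings matched against symbol names and that test files are recognized by a `.test.js` or `.spec.js` suffix; both are assumptions, and `applyDeadCodeFilters` is a hypothetical helper.

```javascript
// Hypothetical post-filter for dead-code findings, not the tool's own code.
function applyDeadCodeFilters(items, options = {}) {
  const { ignoreTestFiles = true, whitelistPatterns = [] } = options;
  const whitelist = whitelistPatterns.map(pattern => new RegExp(pattern));
  const isTestFile = file => /\.(test|spec)\.js$/.test(file);

  return items.filter(item => {
    if (ignoreTestFiles && isTestFile(item.file)) return false;
    // Drop any symbol that matches a whitelist pattern.
    return !whitelist.some(re => re.test(item.symbol));
  });
}

const findings = [
  { symbol: 'unusedFunction', file: 'src/utils.js' },
  { symbol: 'useLegacyHook', file: 'src/hooks.js' },
  { symbol: 'mockUser', file: 'src/user.test.js' }
];

// Whitelist React-style hook names; test files are ignored by default.
const kept = applyDeadCodeFilters(findings, { whitelistPatterns: ['^use[A-Z]'] });
console.log(kept.map(item => item.symbol)); // [ 'unusedFunction' ]
```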

Structure Analysis

  • maxComplexity: Max cyclomatic complexity (default: 10)
  • maxFunctionLength: Max function lines (default: 50)
  • maxParameters: Max function parameters (default: 5)
  • maxNestingDepth: Max nesting depth (default: 4)
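
The thresholds above can be read as simple upper bounds on per-function metrics. A minimal sketch, assuming a metrics object with the hypothetical fields `complexity`, `length`, `parameters`, and `nestingDepth`, and assuming a value only counts as a violation when it is strictly greater than its limit:

```javascript
// Defaults copied from the list above.
const DEFAULT_LIMITS = {
  maxComplexity: 10,
  maxFunctionLength: 50,
  maxParameters: 5,
  maxNestingDepth: 4
};

// Hypothetical helper: list which limits a function's metrics exceed.
function findViolations(metrics, overrides = {}) {
  const limits = { ...DEFAULT_LIMITS, ...overrides };
  const checks = [
    ['maxComplexity', metrics.complexity],
    ['maxFunctionLength', metrics.length],
    ['maxParameters', metrics.parameters],
    ['maxNestingDepth', metrics.nestingDepth]
  ];
  return checks
    .filter(([limit, value]) => value > limits[limit])
    .map(([limit, value]) => ({ limit, value, max: limits[limit] }));
}

const processData = { complexity: 15, length: 40, parameters: 3, nestingDepth: 4 };
console.log(findViolations(processData));
// [ { limit: 'maxComplexity', value: 15, max: 10 } ]
```

Passing `{ maxNestingDepth: 3 }` as the second argument would additionally flag the nesting depth of 4.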

CLI Commands

# Analyze with custom options
node src/cli.js analyze ./src \
  --min-similarity 0.9 \
  --max-complexity 8 \
  --format json \
  --output report.json

# Quick duplicate check
node src/cli.js check-dup ./src

# Quick dead code check
node src/cli.js check-dead ./src

# Quick structure check
node src/cli.js check-struct ./src

# Format code
node src/cli.js format ./src --output formatted.md

Features

O(n) Duplicate Detection

Uses the MinHash LSH algorithm for fast approximate duplicate detection:

  • Exact matching: MD5 hashing for identical code blocks
  • Approximate matching: MinHash + LSH for similar code (Type-2 clones)
  • Scalable: Analyzes projects with more than 1,000 files in under 30 seconds
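
The approximate-matching idea can be sketched as follows. This is an illustrative MinHash implementation, not the tool's code: all function names are hypothetical, a seeded 32-bit FNV-1a hash stands in for whatever hash family the tool uses, and the LSH banding step that limits comparisons to likely candidates is omitted.

```javascript
// Seeded 32-bit FNV-1a string hash (stand-in hash family, an assumption).
function fnv1a(text, seed) {
  let hash = (0x811c9dc5 ^ seed) >>> 0;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// Split code into overlapping k-token shingles (whitespace-insensitive).
function shingles(code, k = 3) {
  const tokens = code.split(/\s+/).filter(Boolean);
  const result = new Set();
  for (let i = 0; i + k <= tokens.length; i++) {
    result.add(tokens.slice(i, i + k).join(' '));
  }
  return result;
}

// For each seeded hash function, keep the minimum hash over all shingles.
function minhashSignature(shingleSet, numHashes = 64) {
  const signature = new Array(numHashes).fill(0xffffffff);
  for (const shingle of shingleSet) {
    for (let seed = 0; seed < numHashes; seed++) {
      const h = fnv1a(shingle, seed);
      if (h < signature[seed]) signature[seed] = h;
    }
  }
  return signature;
}

// The fraction of matching signature slots estimates Jaccard similarity.
function estimateSimilarity(sigA, sigB) {
  let matches = 0;
  for (let i = 0; i < sigA.length; i++) {
    if (sigA[i] === sigB[i]) matches++;
  }
  return matches / sigA.length;
}

const original = 'function add(a, b) { return a + b; }';
const reindented = 'function   add(a, b) {\n  return a + b;\n}';
const unrelated = 'const total = items.reduce((sum, item) => sum + item.price, 0);';

const sig = code => minhashSignature(shingles(code));
console.log(estimateSimilarity(sig(original), sig(reindented))); // 1 (same tokens)
console.log(estimateSimilarity(sig(original), sig(unrelated)));
```

Each signature has a fixed length regardless of file size, so two files can be compared in constant time once their signatures are built.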

AST-Based Dead Code Detection

Static analysis to find unused code:

  • Detects unused functions, variables, and imports
  • Builds reference graph to track symbol usage
  • Configurable whitelist patterns
  • Smart export/test file filtering
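
The reference-graph step can be pictured as a reachability search: anything not reachable from an entry point is a dead-code candidate. The sketch below uses a hand-written graph in place of one built from an AST; `findDeadSymbols` and the symbol names are hypothetical.

```javascript
// Hypothetical sketch: symbols map to the symbols they reference.
function findDeadSymbols(references, entryPoints) {
  const reachable = new Set();
  const stack = [...entryPoints];
  // Depth-first walk from the entry points.
  while (stack.length > 0) {
    const symbol = stack.pop();
    if (reachable.has(symbol)) continue;
    reachable.add(symbol);
    for (const used of references[symbol] ?? []) stack.push(used);
  }
  return Object.keys(references).filter(name => !reachable.has(name));
}

const references = {
  main: ['parseArgs', 'run'],
  run: ['formatOutput'],
  parseArgs: [],
  formatOutput: [],
  unusedFunction: ['legacyHelper'],
  legacyHelper: []
};

const dead = findDeadSymbols(references, ['main']);
console.log(dead); // [ 'unusedFunction', 'legacyHelper' ]
```

Note that `legacyHelper` is reported even though something references it, because its only caller is itself unreachable. A plain "has zero references" check would miss it.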

Structure Metrics

Code complexity and maintainability analysis:

  • Cyclomatic complexity: Decision points per function
  • Function length: Lines of code per function
  • Nesting depth: Maximum indentation levels
  • Parameter count: Number of function parameters
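
For intuition, two of these metrics can be approximated on raw source text. This is a rough token-based sketch, not the AST-based analysis the tool performs: it miscounts keywords inside strings or comments, ignores ternaries, and counts the function body itself as one nesting level. Both function names are hypothetical.

```javascript
// Approximate cyclomatic complexity: 1 + number of decision points.
function approximateComplexity(source) {
  const decisionPoints = source.match(/\b(?:if|for|while|case|catch)\b|&&|\|\|/g) ?? [];
  return 1 + decisionPoints.length;
}

// Approximate nesting depth: deepest level of curly braces.
function maxNestingDepth(source) {
  let depth = 0;
  let max = 0;
  for (const ch of source) {
    if (ch === '{') {
      depth++;
      max = Math.max(max, depth);
    } else if (ch === '}') {
      depth--;
    }
  }
  return max;
}

const sample = `
function processData(items) {
  for (const item of items) {
    if (item.active && item.value > 0) {
      handle(item);
    }
  }
}`;

console.log(approximateComplexity(sample)); // 4 (for, if, && plus 1)
console.log(maxNestingDepth(sample));       // 3 (function, for, if)
```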

Performance

Project Size    Files     Duration
Small           <100      <2s
Medium          100-1K    <10s
Large           >1K       <30s

Supported Languages

  • JavaScript (ES6+)
  • TypeScript
  • JSX/TSX
  • Python (experimental)

Examples

Find High-Similarity Duplicates

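import { checkDuplicates } from './index.js';

const duplicates = await checkDuplicates('./src', {
  minSimilarity: 0.95  // Only near-duplicates
});

duplicates.forEach(dup => {
  // toFixed avoids floating-point noise such as 56.99999999999999%
  console.log(`${dup.file1} ↔ ${dup.file2}: ${(dup.similarity * 100).toFixed(1)}%`);
});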

Find Unused Exports

import { checkDeadCode } from './index.js';

const deadCode = await checkDeadCode('./src', {
  ignoreExports: false  // Include exports in analysis
});

// Keep only the findings that are exported symbols
const unusedExports = deadCode.filter(item => item.exported);
console.log(`Found ${unusedExports.length} unused exports`);

Enforce Complexity Limits

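import { checkStructure } from './index.js';

const structure = await checkStructure('./src', {
  maxComplexity: 8,
  maxFunctionLength: 30
});

// Note: fn.length (function length in lines) is not shown in the sample
// JSON output above; entries without it fail the second comparison.
const violations = structure.complexFunctions.filter(
  fn => fn.complexity > 8 || fn.length > 30
);

if (violations.length > 0) {
  console.error('Complexity violations found:', violations);
  process.exit(1);  // Non-zero exit fails the CI job
}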

Integration

CI/CD Pipeline

# .github/workflows/code-quality.yml
- name: Check code quality
  run: |
    node src/cli.js analyze ./src --format json --output report.json
    if [ "$(jq '.summary.issuesFound' report.json)" -gt 0 ]; then
      echo "Code quality issues found"
      exit 1
    fi

Pre-commit Hook

In package.json (Husky v4-style configuration):
{
  "husky": {
    "hooks": {
      "pre-commit": "node src/cli.js analyze ./src --format json"
    }
  }
}

Tips

  1. Start with defaults: The default settings work well for most projects
  2. Adjust thresholds: Raise minSimilarity (for example, from 0.85 to 0.9) to reduce false positives
  3. Use whitelist: Add framework patterns to avoid false positives
  4. Regular analysis: Run weekly to track technical debt
  5. Combine with formatters and linters: Use alongside Prettier and ESLint for best results

Limitations

  • Static analysis: May miss dynamically accessed code
  • Best effort: Dead code detection has ~90% accuracy
  • Language support: Currently optimized for JavaScript/TypeScript
  • Large files: Files >10MB are skipped for performance