identifying

📁 florianbuetow/claude-code 📅 Today

Install command

npx skills add https://github.com/florianbuetow/claude-code --skill identifying

Agent install distribution

mcpjam 1

claude-code 1

replit 1

junie 1

windsurf 1

zencoder 1

Skill documentation

Identifiability Analysis (LINDDUN I)

Analyze source code for identifiability threats where individuals can be identified from supposedly anonymous data. Combinations of quasi-identifiers (zip code, birth date, gender) can uniquely identify individuals. Re-identification attacks on “anonymized” data are the primary concern.

Supported Flags

Read ../../shared/schemas/flags.md for full flag documentation. This skill supports all cross-cutting flags.

| Flag | Identifiability-Specific Behavior |
|------|-----------------------------------|
| `--scope` | Default `changed`. Focuses on files handling user data, anonymization logic, data exports, analytics pipelines, and API responses. |
| `--depth quick` | Grep patterns only: scan for PII in logs, quasi-identifiers in exports, and missing anonymization. |
| `--depth standard` | Full code read: analyze data fields returned in APIs and stored in databases for re-identification risk. |
| `--depth deep` | Trace data flows from collection to storage to export. Assess quasi-identifier combinations across the system. |
| `--depth expert` | Deep + re-identification risk modeling: estimate k-anonymity violations and uniqueness of attribute combinations. |
| `--severity` | Filter output. Identifiability findings range from `low` (theoretical) to `critical` (direct PII exposure). |
| `--fix` | Generate anonymization, generalization, and suppression replacements. |
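
The kind of replacement `--fix` refers to can be illustrated with a minimal sketch of suppression and generalization. This is not the skill's actual output: the field names (`name`, `email`, `phone`, `ssn`, `zip_code`, `birth_date`) and the chosen granularity (3-digit zip prefix, birth year) are assumptions for illustration only.

```python
from datetime import date

# Assumed field names; a real codebase will use its own.
DIRECT_IDENTIFIERS = {"name", "email", "phone", "ssn"}

def generalize_record(record: dict) -> dict:
    """Return a copy with direct identifiers suppressed and quasi-identifiers coarsened."""
    # Suppression: drop direct identifiers entirely.
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    # Generalization: keep only the 3-digit zip prefix.
    if "zip_code" in out:
        out["zip_code"] = str(out["zip_code"])[:3] + "**"
    # Generalization: replace the full birth date with the birth year.
    if "birth_date" in out:
        out["birth_year"] = out.pop("birth_date").year
    return out

record = {
    "name": "Alex Example",
    "email": "alex@example.org",
    "zip_code": "90210",
    "birth_date": date(1990, 5, 17),
    "gender": "F",
}
print(generalize_record(record))
```

Coarsening alone does not guarantee anonymity; whether the result is sufficient depends on how many people share each remaining attribute combination.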

Framework Context

LINDDUN I — Identifiability

Identifiability occurs when a person can be identified from data that is supposed to be anonymous or pseudonymous. Read ../../shared/frameworks/linddun.md for the full LINDDUN framework reference including re-identification attack patterns and regulatory definitions.

Privacy Property Violated: Anonymity / Pseudonymity

STRIDE Mapping: Information Disclosure (identifiability focuses specifically on re-identification of anonymized data rather than general data access)

Workflow

Step 1 — Determine Scope

1. Parse the `--scope` flag (default: `changed`).
2. Resolve to a concrete file list.
3. Filter to relevant files: data models, API handlers, data export logic, analytics pipelines, logging configuration, database schemas, and anonymization utilities.
4. Prioritize files containing: user data structures, data export endpoints, log statements with user context, report generation, and data sharing logic.
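
As a rough illustration of the filtering step, the sketch below keeps paths whose name tokens match a small keyword set. The skill does not specify how relevance is decided, so the `RELEVANT_TOKENS` set and the `filter_relevant` helper are hypothetical.

```python
import re

# Hypothetical keyword set; the skill does not define how relevance is detected.
RELEVANT_TOKENS = {
    "model", "models", "schema", "schemas", "handler", "handlers",
    "export", "exports", "analytics", "logging", "logger",
    "anonymize", "anonymizer", "report", "reports",
}

def filter_relevant(paths: list[str]) -> list[str]:
    """Keep paths that contain at least one relevant token in a directory or file name."""
    relevant = []
    for path in paths:
        # Split on anything that is not a letter or digit: "api/export_csv.py" -> api, export, csv, py
        tokens = set(re.split(r"[^a-z0-9]+", path.lower()))
        if tokens & RELEVANT_TOKENS:
            relevant.append(path)
    return relevant

print(filter_relevant([
    "src/models/user.py",
    "README.md",
    "api/export_csv.py",
    "static/logo.png",
]))
```

Matching whole tokens rather than substrings avoids false hits such as `logo.png` matching `log`.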

Step 2 — Analyze for Identifiability Patterns

Read each scoped file and assess re-identification risk:

1. Identify direct identifiers: Find fields like name, email, phone, SSN, or national ID that should not appear in anonymous contexts.
2. Identify quasi-identifiers: Find combinations of fields (zip code, age, gender, job title) that together may uniquely identify individuals.
3. Check anonymization logic: Verify that anonymization techniques are actually applied and are sufficient (not just removing the name field).
4. Assess API responses: Check whether endpoints return more personal attributes than the consumer needs.
5. Examine logs and error messages: Look for PII appearing in log output, stack traces, or debug messages.

At --depth deep or --depth expert, model quasi-identifier combinations and estimate uniqueness across the population.
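
One way to estimate uniqueness is to group records by their quasi-identifier values and look at the group sizes: the smallest group size is the dataset's k (in the k-anonymity sense), and groups of size 1 are uniquely identifiable records. The sketch below assumes records are available as dictionaries; the `k_anonymity` helper and the field names are illustrative, not part of the skill.

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Return (k, unique_count) for the given quasi-identifier columns.

    k is the size of the smallest equivalence class; unique_count is the
    number of records that are alone in their class.
    """
    classes = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    if not classes:
        return 0, 0
    k = min(classes.values())
    unique_count = sum(1 for size in classes.values() if size == 1)
    return k, unique_count

# Sample data with assumed field names.
records = [
    {"zip": "90210", "birth_year": 1990, "gender": "F"},
    {"zip": "90210", "birth_year": 1990, "gender": "F"},
    {"zip": "90210", "birth_year": 1985, "gender": "M"},
    {"zip": "10001", "birth_year": 1990, "gender": "F"},
]
print(k_anonymity(records, ["zip", "birth_year", "gender"]))
```

A result with k equal to 1 means at least one person is singled out by the combination, even though no direct identifier is present.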

Step 3 — Report Findings

Output findings per ../../shared/schemas/findings.md. Each finding needs: IDENT-NNN id, title, severity (based on directness of identification and data sensitivity), location with snippet, description of what enables identification, impact (re-identification harm), fix (anonymization, generalization, or suppression), and CWE/LINDDUN references.
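
For orientation, a finding with the fields listed above might be assembled as follows. The authoritative layout is defined in `../../shared/schemas/findings.md`, which is not reproduced here, so every key name and the `make_finding` helper are assumptions made for illustration.

```python
import json

ALLOWED_SEVERITIES = {"low", "medium", "high", "critical"}

def make_finding(n, title, severity, file, line, snippet, description, impact, fix):
    """Assemble one finding. Key names are illustrative, not the shared schema."""
    if severity not in ALLOWED_SEVERITIES:
        raise ValueError(f"unknown severity: {severity}")
    return {
        "id": f"IDENT-{n:03d}",  # IDENT-NNN, zero-padded
        "title": title,
        "severity": severity,
        "location": {"file": file, "line": line, "snippet": snippet},
        "description": description,
        "impact": impact,
        "fix": fix,
        "references": {"cwe": "CWE-359", "linddun": "I"},
    }

finding = make_finding(
    1,
    "Email address written to application log",
    "high",
    "app/auth.py",  # hypothetical path
    42,
    'log.info("signup for %s", user.email)',
    "The log line records a direct identifier next to the user's action.",
    "Anyone with log access can link actions to a named individual.",
    "Log an internal opaque user ID instead of the email address.",
)
print(json.dumps(finding, indent=2))
```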

Analysis Checklist

- Are direct identifiers (name, email, phone, SSN) present in data exports or analytics?
- Do API responses return more user attributes than the consumer actually needs?
- Are quasi-identifiers (zip code, birth date, gender) combined in any output?
- Is anonymization actually implemented, or just assumed in comments?
- Do logs contain IP addresses, user agents, or device identifiers alongside actions?
- Can database queries return single-user results from “anonymous” tables?
- Are email addresses or phone numbers used as primary keys or foreign keys?
- Do error messages or stack traces expose personal data fields?

What to Look For

- PII in log statements: Personal data written to application logs.
  - Grep: `log\.\w+\(.*email|logger\.\w+\(.*name|console\.log\(.*phone|print\(.*ssn`
- Email or phone as primary key: Using direct identifiers as database keys.
  - Grep: `PRIMARY KEY.*email|primary_key.*email|@Column.*email.*unique|findByEmail|findByPhone`
- IP address logging: Recording IP addresses without anonymization.
  - Grep: `req\.ip|request\.remote_addr|X-Forwarded-For|ip_address|ipAddress|getRemoteAddr`
- Over-fetched API responses: SELECT * or returning full user objects.
  - Grep: `SELECT \*.*FROM.*user|\.findAll\(|\.find\(\{\}\)|res\.json\(user\)|JSON\.stringify\(user`
- Insufficient anonymization: Removing names but keeping detailed attributes.
  - Grep: `anonymize|anonymise|deidentify|de_identify|pseudonymize|mask.*data`
- Quasi-identifier combinations: Multiple demographic fields in the same record.
  - Grep: `zip_code.*birth_date|zipCode.*gender|age.*location|dateOfBirth.*address`
- User agent collection: Storing full browser fingerprint strings.
  - Grep: `user-agent|userAgent|navigator\.userAgent|req\.headers\[.user-agent.\]`
- Data exports without scrubbing: Export endpoints that dump raw user data.
  - Grep: `export.*user|download.*report|csv.*user|toCSV|toJSON.*user`
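
The grep patterns above can also be applied programmatically. The sketch below runs three of them over a source string with Python's `re` module and reports the matching line numbers; the `scan` helper and the sample source are illustrative, and the remaining patterns could be added to `PATTERNS` the same way.

```python
import re

# Three patterns copied from the list above; matching is case-sensitive, as with plain grep.
PATTERNS = {
    "PII in logs": r"log\.\w+\(.*email|logger\.\w+\(.*name|console\.log\(.*phone|print\(.*ssn",
    "IP address logging": r"req\.ip|request\.remote_addr|X-Forwarded-For|ip_address|ipAddress|getRemoteAddr",
    "Over-fetched API responses": r"SELECT \*.*FROM.*user|\.findAll\(|\.find\(\{\}\)|res\.json\(user\)|JSON\.stringify\(user",
}

def scan(source: str):
    """Return (line_number, pattern_name) pairs for every matching line."""
    compiled = {name: re.compile(rx) for name, rx in PATTERNS.items()}
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for name, rx in compiled.items():
            if rx.search(line):
                hits.append((lineno, name))
    return hits

sample = "\n".join([
    'log.info("signup for %s", user.email)',
    "ip = request.remote_addr",
    'rows = db.execute("SELECT * FROM users")',
    "total = 1",
])
for lineno, name in scan(sample):
    print(f"line {lineno}: {name}")
```

A match is only a candidate: each hit still needs a manual read to decide whether the data is actually personal and whether it reaches an anonymous context.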

Regulatory Mapping

| Regulation | Provision | Relevance |
|------------|-----------|-----------|
| GDPR Recital 26 | Identifiability test | Data is personal if the subject can be identified by means reasonably likely to be used |
| GDPR Art. 4(5) | Pseudonymization definition | Pseudonymized data is still personal data |
| GDPR Art. 25 | Data protection by design | Anonymization must be effective by design |
| HIPAA Safe Harbor | 18 identifier categories | All 18 must be removed for de-identification |
| CCPA 1798.140(h) | Deidentified information | Reasonably cannot be linked to a consumer |
| CCPA 1798.140(o) | Personal information | Includes information that identifies or could be linked |

Output Format

Use finding ID prefix IDENT (e.g., IDENT-001, IDENT-002).

All findings follow the schema in ../../shared/schemas/findings.md with:

- `references.cwe`: CWE-359 or CWE-200 as appropriate
- `references.owasp`: A02:2021 (Cryptographic Failures: weak anonymization)
- `metadata.tool`: `"identifying"`
- `metadata.framework`: `"linddun"`
- `metadata.category`: `"I"`

Summary table after all findings:

| Identifiability Pattern     | Critical | High | Medium | Low |
|-----------------------------|----------|------|--------|-----|
| Direct PII exposure         |          |      |        |     |
| PII in logs                 |          |      |        |     |
| Quasi-identifier combos     |          |      |        |     |
| Insufficient anonymization  |          |      |        |     |
| Over-fetched API responses  |          |      |        |     |
| IP / device tracking        |          |      |        |     |

Followed by: top 3 priorities, re-identification risk assessment, and overall assessment.
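
A minimal sketch of how the summary table could be filled from a list of findings. The row labels are taken from the table above; the `pattern` key on each finding and the `summary_table` helper are hypothetical, since the shared findings schema is not reproduced here.

```python
from collections import Counter

PATTERN_ROWS = [
    "Direct PII exposure",
    "PII in logs",
    "Quasi-identifier combos",
    "Insufficient anonymization",
    "Over-fetched API responses",
    "IP / device tracking",
]
SEVERITIES = ["critical", "high", "medium", "low"]

def summary_table(findings):
    """Render a markdown table counting findings per pattern and severity."""
    # "pattern" is an assumed key naming the table row a finding belongs to.
    counts = Counter((f["pattern"], f["severity"]) for f in findings)
    lines = [
        "| Identifiability Pattern | Critical | High | Medium | Low |",
        "|---|---|---|---|---|",
    ]
    for pattern in PATTERN_ROWS:
        cells = " | ".join(str(counts[(pattern, s)]) for s in SEVERITIES)
        lines.append(f"| {pattern} | {cells} |")
    return "\n".join(lines)

findings = [
    {"pattern": "PII in logs", "severity": "high"},
    {"pattern": "PII in logs", "severity": "high"},
    {"pattern": "Direct PII exposure", "severity": "critical"},
]
print(summary_table(findings))
```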
