identifying
npx skills add https://github.com/florianbuetow/claude-code --skill identifying
Skill Documentation
Identifiability Analysis (LINDDUN I)
Analyze source code for identifiability threats where individuals can be identified from supposedly anonymous data. Combinations of quasi-identifiers (zip code, birth date, gender) can uniquely identify individuals. Re-identification attacks on “anonymized” data are the primary concern.
Supported Flags
Read ../../shared/schemas/flags.md for full flag
documentation. This skill supports all cross-cutting flags.
| Flag | Identifiability-Specific Behavior |
|---|---|
| `--scope` | Default `changed`. Focuses on files handling user data, anonymization logic, data exports, analytics pipelines, and API responses. |
| `--depth quick` | Grep patterns only: scan for PII in logs, quasi-identifiers in exports, and missing anonymization. |
| `--depth standard` | Full code read, analyze data fields returned in APIs and stored in databases for re-identification risk. |
| `--depth deep` | Trace data flows from collection to storage to export. Assess quasi-identifier combinations across the system. |
| `--depth expert` | Deep + re-identification risk modeling: estimate k-anonymity violations and uniqueness of attribute combinations. |
| `--severity` | Filter output. Identifiability findings range from low (theoretical) to critical (direct PII exposure). |
| `--fix` | Generate anonymization, generalization, and suppression replacements. |
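As a rough illustration of the kind of replacement `--fix` might propose, the sketch below suppresses direct identifiers and generalizes two quasi-identifiers. The field names (`name`, `email`, `zip_code`, `birth_date`) and the helper `generalize_record` are hypothetical and are not part of the skill itself.

```python
from datetime import date

# Hypothetical field names; adjust to the schema under review.
DIRECT_IDENTIFIERS = {"name", "email", "phone", "ssn"}


def generalize_record(record: dict) -> dict:
    """Return a copy with direct identifiers suppressed and quasi-identifiers coarsened."""
    # Suppression: drop direct identifiers entirely.
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if "zip_code" in out:
        # Generalization: keep only the first three characters of the zip code.
        out["zip_code"] = str(out["zip_code"])[:3] + "**"
    if "birth_date" in out:
        # Generalization: replace the full birth date with the birth year.
        out["birth_year"] = out.pop("birth_date").year
    return out


record = {
    "name": "Jane Doe",
    "email": "jane@example.com",
    "zip_code": "90210",
    "birth_date": date(1985, 7, 14),
    "gender": "F",
}
print(generalize_record(record))
```

Generalizing fields this way reduces uniqueness but does not by itself guarantee anonymity; the remaining combination still needs to be checked against the dataset.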
Framework Context
LINDDUN I — Identifiability
Identifiability occurs when a person can be identified from data that is supposed
to be anonymous or pseudonymous. Read
../../shared/frameworks/linddun.md for the
full LINDDUN framework reference including re-identification attack patterns and
regulatory definitions.
Privacy Property Violated: Anonymity / Pseudonymity
STRIDE Mapping: Information Disclosure (identifiability focuses specifically on re-identification of anonymized data rather than general data access)
Workflow
Step 1 — Determine Scope
- Parse the `--scope` flag (default: `changed`).
- Resolve to a concrete file list.
- Filter to relevant files: data models, API handlers, data export logic, analytics pipelines, logging configuration, database schemas, and anonymization utilities.
- Prioritize files containing: user data structures, data export endpoints, log statements with user context, report generation, and data sharing logic.
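The filtering step above could be approximated with a simple path heuristic. This is only a sketch: the keyword list `RELEVANT_HINTS` and the function `filter_relevant` are assumptions made for illustration, and a real implementation would likely also inspect file contents.

```python
# Hypothetical keywords hinting at files that handle personal data.
RELEVANT_HINTS = (
    "model", "schema", "export", "analytics",
    "log", "anonymi", "handler", "report",
)


def filter_relevant(paths: list[str]) -> list[str]:
    """Keep files whose path suggests user data handling."""
    selected = []
    for path in paths:
        lowered = path.lower()
        if any(hint in lowered for hint in RELEVANT_HINTS):
            selected.append(path)
    return selected


changed = [
    "src/models/user.py",
    "docs/README.md",
    "src/export/csv_report.py",
    "src/ui/theme.css",
]
print(filter_relevant(changed))
```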
Step 2 — Analyze for Identifiability Patterns
Read each scoped file and assess re-identification risk:
- Identify direct identifiers: Find fields like name, email, phone, SSN, or national ID that should not appear in anonymous contexts.
- Identify quasi-identifiers: Find combinations of fields (zip code, age, gender, job title) that together may uniquely identify individuals.
- Check anonymization logic: Verify that anonymization techniques are actually applied and are sufficient (not just removing the name field).
- Assess API responses: Check whether endpoints return more personal attributes than the consumer needs.
- Examine logs and error messages: Look for PII appearing in log output, stack traces, or debug messages.
At --depth deep or --depth expert, model quasi-identifier combinations and
estimate uniqueness across the population.
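One way to approximate the uniqueness estimate is to group records by their quasi-identifier values and look at the smallest group. The sketch below assumes records are available as dictionaries; the function `k_anonymity` and the sample field names are illustrative. It measures uniqueness within the sample only, which is a lower bound on the effort an attacker with population data would need, not a full risk model.

```python
from collections import Counter


def k_anonymity(records: list[dict], quasi_identifiers: list[str]) -> tuple[int, int]:
    """Return (k, singletons): the smallest equivalence-class size and
    the number of quasi-identifier combinations held by exactly one record."""
    classes = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    if not classes:
        return 0, 0
    k = min(classes.values())
    singletons = sum(1 for size in classes.values() if size == 1)
    return k, singletons


records = [
    {"zip_code": "90210", "birth_year": 1985, "gender": "F"},
    {"zip_code": "90210", "birth_year": 1985, "gender": "F"},
    {"zip_code": "90210", "birth_year": 1990, "gender": "M"},
    {"zip_code": "10001", "birth_year": 1985, "gender": "F"},
]
k, singletons = k_anonymity(records, ["zip_code", "birth_year", "gender"])
print(f"k = {k}, unique combinations = {singletons}")
```

A result of `k = 1` means at least one record is unique on the chosen attributes and is a candidate for re-identification.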
Step 3 — Report Findings
Output findings per ../../shared/schemas/findings.md.
Each finding needs: IDENT-NNN id, title, severity (based on directness of
identification and data sensitivity), location with snippet, description of what
enables identification, impact (re-identification harm), fix (anonymization,
generalization, or suppression), and CWE/LINDDUN references.
Analysis Checklist
- Are direct identifiers (name, email, phone, SSN) present in data exports or analytics?
- Do API responses return more user attributes than the consumer actually needs?
- Are quasi-identifiers (zip code, birth date, gender) combined in any output?
- Is anonymization actually implemented, or just assumed in comments?
- Do logs contain IP addresses, user agents, or device identifiers alongside actions?
- Can database queries return single-user results from “anonymous” tables?
- Are email addresses or phone numbers used as primary keys or foreign keys?
- Do error messages or stack traces expose personal data fields?
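For the checklist item on IP addresses in logs, a common mitigation is to truncate the address before it is written. The sketch below uses Python's standard `ipaddress` module; the function name `truncate_ip` and the chosen prefix lengths (/24 for IPv4, /48 for IPv6) are assumptions, and a truncated address can still act as a quasi-identifier when combined with other fields.

```python
import ipaddress


def truncate_ip(address: str) -> str:
    """Zero the host portion of an IP address before logging it."""
    ip = ipaddress.ip_address(address)
    # Assumed prefix lengths; stricter policies may keep fewer bits.
    prefix = 24 if ip.version == 4 else 48
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


print(truncate_ip("203.0.113.57"))
print(truncate_ip("2001:db8:abcd:12::1"))
```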
What to Look For
- PII in log statements: Personal data written to application logs.
  - Grep: `log\.\w+\(.*email|logger\.\w+\(.*name|console\.log\(.*phone|print\(.*ssn`
- Email or phone as primary key: Using direct identifiers as database keys.
  - Grep: `PRIMARY KEY.*email|primary_key.*email|@Column.*email.*unique|findByEmail|findByPhone`
- IP address logging: Recording IP addresses without anonymization.
  - Grep: `req\.ip|request\.remote_addr|X-Forwarded-For|ip_address|ipAddress|getRemoteAddr`
- Over-fetched API responses: SELECT * or returning full user objects.
  - Grep: `SELECT \*.*FROM.*user|\.findAll\(|\.find\(\{\}\)|res\.json\(user\)|JSON\.stringify\(user`
- Insufficient anonymization: Removing names but keeping detailed attributes.
  - Grep: `anonymize|anonymise|deidentify|de_identify|pseudonymize|mask.*data`
- Quasi-identifier combinations: Multiple demographic fields in the same record.
  - Grep: `zip_code.*birth_date|zipCode.*gender|age.*location|dateOfBirth.*address`
- User agent collection: Storing full browser fingerprint strings.
  - Grep: `user-agent|userAgent|navigator\.userAgent|req\.headers\[.user-agent.\]`
- Data exports without scrubbing: Export endpoints that dump raw user data.
  - Grep: `export.*user|download.*report|csv.*user|toCSV|toJSON.*user`
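The patterns above can also be applied programmatically. The following sketch runs three of them line by line with Python's `re` module; the `PATTERNS` dictionary, the `scan` function, and the sample source are illustrative, and matches are only leads for manual review since the patterns will produce false positives.

```python
import re

# Three of the grep patterns listed above, keyed by the pattern they detect.
PATTERNS = {
    "PII in log statements": r"log\.\w+\(.*email|logger\.\w+\(.*name|console\.log\(.*phone|print\(.*ssn",
    "IP address logging": r"req\.ip|request\.remote_addr|X-Forwarded-For|ip_address|ipAddress|getRemoteAddr",
    "Quasi-identifier combinations": r"zip_code.*birth_date|zipCode.*gender|age.*location|dateOfBirth.*address",
}
COMPILED = {label: re.compile(pattern) for label, pattern in PATTERNS.items()}


def scan(source: str) -> list[tuple[int, str]]:
    """Return (line number, pattern label) for every matching line."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for label, pattern in COMPILED.items():
            if pattern.search(line):
                hits.append((lineno, label))
    return hits


sample = '''logger.info("signup", user.name)
client_ip = request.remote_addr
total = a + b
'''
for lineno, label in scan(sample):
    print(f"line {lineno}: {label}")
```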
Regulatory Mapping
| Regulation | Provision | Relevance |
|---|---|---|
| GDPR Recital 26 | Identifiability test | Data is personal if the subject can be identified by any means reasonably likely to be used |
| GDPR Art. 4(5) | Pseudonymization definition | Pseudonymized data is still personal data |
| GDPR Art. 25 | Data protection by design | Anonymization must be effective by design |
| HIPAA Safe Harbor | 18 identifier categories | All 18 must be removed to de-identify under the Safe Harbor method |
| CCPA 1798.140(h) | Deidentified information | Reasonably cannot be linked to a consumer |
| CCPA 1798.140(o) | Personal information | Includes information that identifies or could be linked |
Output Format
Use finding ID prefix IDENT (e.g., IDENT-001, IDENT-002).
All findings follow the schema in
../../shared/schemas/findings.md with:
- `references.cwe`: `CWE-359` or `CWE-200` as appropriate
- `references.owasp`: `A02:2021` (Cryptographic Failures: weak anonymization)
- `metadata.tool`: `"identifying"`
- `metadata.framework`: `"linddun"`
- `metadata.category`: `"I"`
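To make the field values above concrete, here is a sketch of how a single finding might be assembled. The authoritative schema lives in `../../shared/schemas/findings.md` and is not reproduced here, so the nested layout is inferred from the dotted field names and the helper `make_finding` is hypothetical.

```python
def make_finding(number: int, title: str, severity: str,
                 location: str, description: str) -> dict:
    """Build a finding dict; the layout is an assumption, not the real schema."""
    return {
        "id": f"IDENT-{number:03d}",  # IDENT prefix with a zero-padded counter
        "title": title,
        "severity": severity,
        "location": location,
        "description": description,
        "references": {"cwe": "CWE-359", "owasp": "A02:2021"},
        "metadata": {"tool": "identifying", "framework": "linddun", "category": "I"},
    }


finding = make_finding(
    1,
    "Email address written to application log",
    "high",
    "src/auth/signup.py:42",
    "The signup handler logs the full email address of each new user.",
)
print(finding["id"], finding["metadata"])
```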
Summary table after all findings:
| Identifiability Pattern | Critical | High | Medium | Low |
|-----------------------------|----------|------|--------|-----|
| Direct PII exposure | | | | |
| PII in logs | | | | |
| Quasi-identifier combos | | | | |
| Insufficient anonymization | | | | |
| Over-fetched API responses | | | | |
| IP / device tracking | | | | |
Followed by: top 3 priorities, re-identification risk assessment, and overall assessment.
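The summary table could be filled in by counting findings per pattern and severity. This sketch assumes each finding carries a `pattern` key naming its row, which is an illustrative field and not something the findings schema is known to define.

```python
from collections import Counter

SEVERITIES = ("critical", "high", "medium", "low")


def summary_rows(findings: list[dict]) -> list[str]:
    """Return one markdown table row per pattern with counts per severity."""
    counts = Counter((f["pattern"], f["severity"]) for f in findings)
    rows = []
    for pattern in sorted({f["pattern"] for f in findings}):
        cells = " | ".join(str(counts[(pattern, s)]) for s in SEVERITIES)
        rows.append(f"| {pattern} | {cells} |")
    return rows


findings = [
    {"pattern": "PII in logs", "severity": "high"},
    {"pattern": "PII in logs", "severity": "high"},
    {"pattern": "Direct PII exposure", "severity": "critical"},
]
print("\n".join(summary_rows(findings)))
```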