data-backup
npx skills add https://github.com/delphine-l/claude_global --skill data-backup
Smart Backup System with Skill Integration
When to Use This Skill
Use this skill when:
- Working on any project with files that change over time
- Jupyter notebooks, data files (CSV/TSV), HackMD presentations, or mixed projects
- Need intelligent cleanup before backup (clear outputs, remove debug code)
- Want to track what changed when (data provenance)
- Need professional backup workflow for collaboration or publication
- Want context-aware backups that use other skills intelligently
The Problem
Long-running data enrichment projects risk:
- Losing days of work from accidental overwrites
- Unable to revert to previous data states
- No documentation of what changed when
- Running out of disk space from manual backups
- Confusion about which version is current
Solution: Smart Two-Tier Backup System with Skill Integration
Core Features
- Intelligent Detection – Automatically detects project type and files to backup
- Skill Integration – Uses jupyter-notebook, hackmd, and other skills for pre-backup cleanup
- Daily backups – Rolling 7-day window (auto-cleanup)
- Milestone backups – Permanent, compressed (gzip ~80% reduction)
- CHANGELOG – Automatic documentation of all changes
- Session Integration – Prompts for backup when exiting Claude Code session
Smart Detection & Integration
The backup system automatically detects your project type and applies appropriate cleanup:
Jupyter Notebooks (uses jupyter-notebook skill):
- Detects: `*.ipynb` files
- Pre-backup cleanup:
  - Clear all cell outputs
  - Remove cells tagged `debug` or `remove`
  - Validate notebooks are syntactically correct
- Result: Smaller backups, clean for sharing
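The "clear all cell outputs" step can be approximated even without Jupyter installed, since notebooks are plain JSON. A minimal sketch using `python3`'s `json` module (an illustration only; the skill itself delegates to the jupyter-notebook skill, and `sample.ipynb` is a stand-in file created for the demo):

```shell
# Create a tiny sample notebook to operate on (illustration only).
cat > sample.ipynb <<'EOF'
{"cells": [{"cell_type": "code", "execution_count": 3,
            "source": ["print('hi')"], "outputs": [{"text": "hi"}],
            "metadata": {}}],
 "metadata": {}, "nbformat": 4, "nbformat_minor": 5}
EOF

# Strip outputs and execution counts in place, as the pre-backup cleanup does.
python3 - sample.ipynb <<'PY'
import json, sys

path = sys.argv[1]
with open(path) as fh:
    nb = json.load(fh)
for cell in nb.get("cells", []):
    if cell.get("cell_type") == "code":
        cell["outputs"] = []
        cell["execution_count"] = None
with open(path, "w") as fh:
    json.dump(nb, fh, indent=1)
PY
```

This is also why cleaned backups are smaller: cell outputs (plots, tables, tracebacks) often dominate a notebook's size.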
HackMD/Presentations (uses hackmd skill):
- Detects: `*.md` files with `slideOptions:` frontmatter
- Pre-backup cleanup:
  - Validate SVG elements (remove unsupported filters)
  - Check slide separators are correct
  - Verify YAML frontmatter
- Result: Backup-ready presentations
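Detecting presentation files can be as simple as scanning the frontmatter for the `slideOptions:` key. A hedged sketch (assumed logic, not necessarily what the skill runs; `deck.md` and `notes.md` are demo files created here):

```shell
# Sample files: one presentation, one plain note (illustration only).
printf -- '---\nslideOptions:\n  theme: white\n---\n# Deck\n' > deck.md
printf -- '# Notes\nplain markdown\n' > notes.md

# List markdown files whose frontmatter declares slideOptions.
for f in *.md; do
  if head -n 10 "$f" | grep -q '^slideOptions:'; then
    echo "presentation: $f"
  fi
done
```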
Data Files (native handling):
- Detects: `*.csv`, `*.tsv`, `*.xlsx` files
- Pre-backup cleanup:
  - Validate file integrity
  - Check for corruption
- Result: Safe data backups
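A lightweight integrity check for delimited files is verifying that every row has the header's column count. A sketch with `awk` (an assumed check for illustration; `sample.tsv` is a demo file created here):

```shell
# Sample TSV with a consistent column count (illustration only).
printf 'id\tname\tsize\n1\tfoo\t10\n2\tbar\t20\n' > sample.tsv

# Exit non-zero if any row's field count differs from the header's.
awk -F'\t' 'NR==1 {n = NF} NF != n {bad++} END {exit bad > 0}' sample.tsv \
  && echo "sample.tsv: column counts consistent"
```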
Python Projects (uses managing-environments skill):
- Detects: `requirements.txt`, `environment.yml`, `venv/`, `.venv/`
- Pre-backup cleanup:
  - Remove `.pyc`, `__pycache__`, `.pytest_cache`
  - Clean build artifacts
  - Include environment specifications
- Result: Clean, reproducible backups
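The detection step itself can be sketched as a bash function that probes for each marker (assumed logic; the installed script may differ, and `scratch/` is a demo directory created here):

```shell
# Report the project types found in the directory given as $1
# (assumed detection logic; illustration only).
detect_project_types() {
  local dir=${1:-.} types=""
  compgen -G "$dir/*.ipynb" > /dev/null && types="$types notebooks"
  { compgen -G "$dir/*.csv" > /dev/null || compgen -G "$dir/*.tsv" > /dev/null; } \
    && types="$types data"
  { [ -f "$dir/requirements.txt" ] || [ -f "$dir/environment.yml" ] \
    || [ -d "$dir/venv" ] || [ -d "$dir/.venv" ]; } && types="$types python"
  echo "detected:${types:- none}"
}

# Demo against a scratch directory (illustration only).
mkdir -p scratch
touch scratch/analysis.ipynb scratch/data.csv
detect_project_types scratch   # → detected: notebooks data
```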
Mixed Projects:
- Detects all of the above
- Applies appropriate cleanup for each file type
- Creates organized backup structure
Directory Structure
For data-only projects:
project/
├── your_data_file.csv    # Main working file
├── backup_project.sh     # Smart backup script
└── backups/
    ├── daily/            # Rolling 7-day backups
    │   ├── backup_2026-01-17.csv
    │   ├── backup_2026-01-18.csv
    │   └── backup_2026-01-23.csv
    ├── milestones/       # Permanent compressed backups
    │   ├── milestone_2026-01-20_initial_enrichment.csv.gz
    │   └── milestone_2026-01-23_recovered_accessions.csv.gz
    ├── CHANGELOG.md      # Auto-generated change log
    └── README.md         # User documentation
For mixed projects (notebooks + data):
project/
├── analysis.ipynb        # Jupyter notebooks
├── data.csv              # Data files
├── backup_project.sh     # Smart backup script
└── backups/
    ├── daily/            # Rolling 7-day backups
    │   ├── backup_2026-01-17/
    │   │   ├── notebooks/
    │   │   │   └── analysis.ipynb   # Cleaned (no outputs)
    │   │   └── data/
    │   │       └── data.csv
    │   └── backup_2026-01-23/
    ├── milestones/       # Permanent compressed backups
    │   └── milestone_2026-01-23_analysis_complete.tar.gz
    ├── CHANGELOG.md      # Auto-generated change log
    └── README.md         # User documentation
Storage Efficiency
- Daily backups: ~5.4 MB (7 days × 770 KB)
- Milestone backups: ~200KB each compressed (80% size reduction with gzip)
- Total: <10 MB for complete project history
- Auto-cleanup: Old daily backups delete after 7 days
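The auto-cleanup can be a single `find` invocation over the daily directory. A sketch assuming backups named `backup_*` (the `-exec rm -rf` form also handles directory-per-day layouts; the two demo files are created here for illustration):

```shell
mkdir -p backups/daily
# Simulate one fresh and one stale backup (illustration only).
touch backups/daily/backup_fresh.csv
touch -t 202001010000 backups/daily/backup_stale.csv

# Remove daily backups older than 7 days.
find backups/daily -maxdepth 1 -name 'backup_*' -mtime +7 -exec rm -rf {} +
ls backups/daily   # → backup_fresh.csv
```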
Implementation
Quick Start with /backup Command
First time – Setup the backup system:
/backup
This will:
- Detect your project type (notebooks, data files, presentations, etc.)
- Set up appropriate backup scripts with smart cleanup
- Create backup directory structure
- Optionally configure automated backups
Daily usage – Create backups:
/backup # Daily backup with smart cleanup
/backup milestone "desc" # Milestone backup
/backup list # View all backups
/backup restore DATE # Restore from backup
What Happens During Backup
Smart cleanup before backup:
- Detects file types in your project
- Applies skill-specific cleanup:
- Notebooks: Clear outputs, remove debug cells
- HackMD: Validate SVG, check formatting
- Python: Remove `.pyc`, `__pycache__`
- Data: Validate integrity
- Creates organized backup with cleaned files
- Updates CHANGELOG with what was backed up
Example output:
/backup
🔍 Detected: 3 notebooks, 2 data files

🧹 Pre-backup cleanup:
  ✓ Cleared outputs from 3 notebooks
  ✓ Removed 5 debug cells
  ✓ Validated 2 data files

💾 Creating backup:
  ✓ backups/daily/backup_2026-01-24/
    ├── notebooks/ (3 files, cleaned)
    └── data/ (2 files)

✓ Backup complete: 2026-01-24
✓ Old backups cleaned (>7 days)
✓ CHANGELOG updated
Manual Script Usage (Alternative)
If you prefer to use the backup script directly:
./backup_project.sh # Daily backup
./backup_project.sh milestone "description" # Milestone
./backup_project.sh list # List backups
./backup_project.sh restore 2026-01-23 # Restore
When to Create Milestones
- After adding new data sources (GenomeScope, karyotypes, external APIs)
- Before major data transformations or filtering
- When completing analysis sections
- Before submitting/publishing
- Before sharing with collaborators
- After recovering missing data
Key Features
Safety Features
- ✅ Never overwrites without asking – Prompts before overwriting existing backups
- ✅ Safety backup before restore – Creates backup of current state before any restore
- ✅ Automatic cleanup – Old daily backups auto-delete (configurable)
- ✅ Complete audit trail – CHANGELOG tracks everything
- ✅ Milestone protection – Important versions preserved forever (compressed)
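The "safety backup before restore" behavior can be sketched in a few lines (illustrative; `MAIN_TABLE` and `RESTORE_DATE` mirror the script's variables, and the staged files are created here for the demo):

```shell
MAIN_TABLE="your_data_file.csv"
RESTORE_DATE="2026-01-23"

# Illustration: stage a current file and a backup to restore from.
mkdir -p backups/daily
echo "current" > "$MAIN_TABLE"
echo "wednesday" > "backups/daily/backup_${RESTORE_DATE}.csv"

# 1. Snapshot the current state before touching anything.
cp "$MAIN_TABLE" "backups/daily/pre_restore_$(date +%F_%H%M%S).csv"
# 2. Only then restore the requested backup.
cp "backups/daily/backup_${RESTORE_DATE}.csv" "$MAIN_TABLE"
```

If the restore turns out to be a mistake, the `pre_restore_*` snapshot lets you undo the undo.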
CHANGELOG Tracking
The CHANGELOG.md automatically documents:
- Date of each backup
- Type (daily vs milestone)
- Description of changes (for milestones)
- Major modifications made to data
Example CHANGELOG:
## 2026-01-23
- **MILESTONE**: Recovered VGP accessions (backup created)
- Added columns: `accession_recovered`, `accession_recovered_all`
- Recovered 5 VGP accessions from NCBI
- Searched AWS and NCBI for 17 species missing accessions
- Daily backup created at 2026-01-23 15:00:00
## 2026-01-22
- Enriched GenomeScope data for 21 species from AWS repository
- Added column: `genomescope_path` with direct links to summary files
Using /backup Command
The /backup command is available in all projects to set up and manage backups.
Setup mode (first run):
/backup
- Detects project type automatically
- Sets up appropriate backup scripts
- Creates directory structure
- Prompts for configuration (retention days, auto-backup)
Daily backup mode:
/backup # Quick daily backup
Milestone mode:
/backup milestone "description of changes"
Examples:
/backup milestone "added heterozygosity data"
/backup milestone "enriched with genomescope results"
/backup milestone "recovered missing accessions"
List and restore:
/backup list # Show all available backups
/backup restore 2026-01-23 # Restore from specific date
Configuration:
The backup script can be customized by editing backup_project.sh:
- Change retention days (default: 7)
- Modify backup directory location
- Add custom cleanup rules
Benefits for Data Analysis
Data Provenance
- CHANGELOG documents every modification
- Clear audit trail for methods sections in papers
- Know exactly what changed when
Confidence to Experiment
- Easy rollback encourages trying different approaches
- No fear of breaking working analyses
- Can test aggressive transformations safely
Professional Workflow
- Matches publication standards
- Reviewers can verify data processing steps
- Reproducible research practices
Collaboration-Ready
- Team members can understand data history
- New collaborators can see evolution of dataset
- Clear documentation of enrichment process
Session Integration with /safe-exit
When you end a Claude Code session with /safe-exit, the system automatically:
- Detects if a backup system exists in the current project
- Prompts for backup if the system is configured:

  💾 Backup system detected. Would you like to create a backup before exiting?
  Options:
    1. Daily backup (quick)
    2. Milestone backup (with description)
    3. Skip backup
    4. Cancel exit
  Choice [1-4]:

- Performs cleanup and backup if requested
- Prompts for an Obsidian session summary (if the obsidian skill is available):
  - Asks for session theme
  - Generates a succinct summary of accomplishments, decisions, and remaining tasks
  - Saves to a project-specific subdirectory in the Obsidian vault
- Exits the session cleanly
This ensures you never forget to backup AND document your work at the end of your session!
Example Workflow
Monday Morning
/backup # Daily backup with smart cleanup
# Work on notebooks and data enrichment all day
/backup milestone "added karyotype data for 50 new species"
Tuesday
/backup # Daily backup
# Continue work...
End of session
/safe-exit
💾 Backup system detected. Would you like to create a backup before exiting?
Choice: 1 (daily backup)

🧹 Cleaning 3 notebooks...
💾 Creating backup...
✓ Backup complete

📝 Save session summary to Obsidian?
Save summary? (y/n): y
Brief theme/topic of today's work: karyotype data enrichment
✍️ Generating session summary...
✓ Session summary saved to: project-name/2026-01-24_karyotype-data-enrichment.md

Session ended. Goodbye!
Friday (oops, made a mistake!)
/backup list # Check available backups
/backup restore 2026-01-23 # Restore from Wednesday
Advanced Usage
Custom Backup Script Template
The backup script can be customized for different file types or naming conventions:
#!/bin/bash
# Backup script for PROJECT_NAME
MAIN_TABLE="your_data_file.csv"
DAILY_DIR="backups/daily"
MILESTONE_DIR="backups/milestones"
CHANGELOG="backups/CHANGELOG.md"
DAYS_TO_KEEP=7
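A minimal daily-backup body that these variables feed into might look like the following. This is a sketch under the data-only layout assumption (the installed script is more complete); the `echo "demo"` line just stands in for a real working file:

```shell
#!/bin/bash
MAIN_TABLE="your_data_file.csv"
DAILY_DIR="backups/daily"
CHANGELOG="backups/CHANGELOG.md"
DAYS_TO_KEEP=7

DATE=$(date +%F)
mkdir -p "$DAILY_DIR"
echo "demo" > "$MAIN_TABLE"   # illustration: stand-in data file

# Copy today's working file into the rolling window.
cp "$MAIN_TABLE" "${DAILY_DIR}/backup_${DATE}.csv"
# Drop dailies older than the retention window.
find "$DAILY_DIR" -name 'backup_*.csv' -mtime +"$DAYS_TO_KEEP" -delete
# Record the backup in the changelog.
printf '## %s\n- Daily backup created\n\n' "$DATE" >> "$CHANGELOG"
```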
Viewing Compressed Milestones
# View without decompressing
gunzip -c milestone_file.csv.gz | less
# Decompress permanently
gunzip milestone_file.csv.gz
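Creating such a compressed milestone for a single data file is symmetric. A sketch (the description slug and the one-line data file are placeholders for the demo):

```shell
mkdir -p backups/milestones
echo "id,value" > your_data_file.csv        # illustration only
desc="recovered_accessions"                 # hypothetical description slug

# Compress the working file into a permanent, dated milestone.
gzip -c your_data_file.csv \
  > "backups/milestones/milestone_$(date +%F)_${desc}.csv.gz"

# Round-trip check: the compressed copy matches the original.
gunzip -c "backups/milestones/milestone_$(date +%F)_${desc}.csv.gz" | head -1   # → id,value
```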
Multiple File Backups
For projects with multiple related data files, create separate backup scripts or modify the script to handle multiple files:
# Create separate backups
./backup_main_table.sh
./backup_metadata.sh
# Or modify the script to back up multiple files
DATE=$(date +%F)
for file in *.csv; do
  cp "$file" "backups/daily/backup_${DATE}_$(basename "$file")"
done
Token Efficiency
This backup system is token-efficient because:
- No need to read large files just to create backups (uses `cp`)
- Automated logging reduces manual documentation
- Quick restore prevents wasted time re-implementing lost work
- CHANGELOG serves as lightweight documentation
Real-World Example
VGP Phase 1 Enrichment Project:
- Main file: 716 assemblies, 127 columns, ~770KB
- Daily backups: 7 files = ~5.4 MB
- Milestones: 3 compressed files = ~600KB
- Total: ~6 MB for complete project history
- Tracked: 2 weeks of data enrichment, 5 major milestones
- Prevented: Multiple accidental overwrites during NCBI searches
Best Practices
- Create daily backups at session start – Make it a habit
- Milestone after every major change – Don’t rely on memory
- Use descriptive milestone names – “added genomescope” not “updates”
- Check CHANGELOG before sharing – Verify data provenance is clear
- List backups periodically – Ensure auto-cleanup is working
- Test restore once – Verify you know how to recover
Full Project Backups (vs Data-Only)
Problem
Data-only backups (single CSV file) don’t capture the complete project state. But backing up EVERYTHING creates bloated backups with old/irrelevant files.
Solution: Selective Full Project Backup
What to Include:
- ✅ Main analysis notebook (e.g., `Analysis.ipynb`)
- ✅ Primary data file (e.g., `data.csv`)
- ✅ Current figure generation scripts only (e.g., `python_scripts/`)
- ✅ Current figures only (e.g., `figures/*.png` – root level)
- ✅ Active documentation (`.md` files, excluding backups)
- ✅ Utility scripts (`.sh` files)
What to Exclude:
- ❌ Backup notebooks (`*backup*.ipynb`, `*Copy*.ipynb`)
- ❌ Exploratory scripts in `scripts/` (only keep figure generators)
- ❌ Old figure versions (only current in `figures/`)
- ❌ Jupyter checkpoints (`.ipynb_checkpoints/`)
- ❌ Python cache (`__pycache__/`, `*.pyc`)
Implementation Pattern
Bash script with rsync + selective copy:
# Copy specific directory with exclusions
if [ -d "python_scripts" ]; then
rsync -a --exclude='__pycache__' --exclude='*.pyc' \
"python_scripts/" "${BACKUP_DIR}/python_scripts/"
fi
# Copy only current figures (root level PNG files)
if [ -d "figures" ]; then
if ls figures/*.png 1> /dev/null 2>&1; then
cp figures/*.png "${BACKUP_DIR}/figures/"
fi
fi
# Copy docs, excluding backups
shopt -s nullglob
for file in *.md *.sh; do
if [[ ! "$file" =~ (backup|BACKUP|Copy) ]]; then
cp "$file" "${BACKUP_DIR}/"
fi
done
shopt -u nullglob
Archive with tar:
# Daily: uncompressed (fast restore)
tar -cf "backup_${DATE}.tar" "${PROJECT_NAME}/"
# Milestone: compressed (space efficient)
tar -czf "milestone_${DATE}_${NAME}.tar.gz" "${PROJECT_NAME}/"
Backup Strategy
| Type | Format | Retention | Purpose |
|---|---|---|---|
| Daily | `.tar` (uncompressed) | 7 days | Quick recovery from recent mistakes |
| Milestone | `.tar.gz` (compressed) | Forever | Preserve major versions |
Size Comparison
Real project example:
- Data-only backup: 211 KB (compressed CSV)
- Full project backup: 17 MB (notebook + data + scripts + 43 figures + docs)
- 7-day daily backups: ~120 MB total
When This Matters
- Projects with evolving analyses where both code and data change
- Jupyter notebook workflows with generated figures
- Research projects needing reproducibility (code + data + outputs)
Path Verification for Backups
Before creating milestone backups, verify that files use relative paths.
Why this matters:
- Backups may be restored to different locations
- Notebooks shared from backups must work for others
- Absolute paths break when directory structure changes
For complete path verification procedures and automated checking scripts, see the folder-organization skill.
Quick check:
# Check for absolute paths in notebooks
grep -l "/Users/" *.ipynb
grep -l "C:\\\\" *.ipynb
# Check in Python scripts
grep -l "/Users/" python_scripts/*.py
What to look for:
- ❌ `/Users/yourname/project/data.csv` (absolute)
- ✅ `data/data.csv` (relative)
- ❌ `Image('/Users/you/figures/fig.png')` (absolute)
- ✅ `Image('figures/fig.png')` (relative)
Best practice:
- Run path check before milestone backups (see folder-organization skill)
- Fix any absolute paths found
- Test notebook runs from backup directory
- Then create milestone backup
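The quick check above can be turned into a gate that flags absolute paths before a milestone backup proceeds. A hedged sketch (the demo files and the `/Users/` pattern are illustrative; a wrapper script could `exit 1` when offenders are found):

```shell
# Illustration: one clean file and one with an absolute path.
mkdir -p checkdemo
printf "df = read_csv('data/data.csv')\n" > checkdemo/good.py
printf "df = read_csv('/Users/you/project/data.csv')\n" > checkdemo/bad.py

# List offenders before creating a milestone backup.
if grep -rl --include='*.py' --include='*.ipynb' '/Users/' checkdemo; then
  echo 'Absolute paths found: fix before creating a milestone backup' >&2
fi
```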
Troubleshooting
Backup script not found
# Check if backup system is set up
ls -l backup_project.sh
# Set up if needed
/backup
Disk space running low
# Check backup sizes
du -sh backups/
# Reduce retention days (edit backup_project.sh)
DAYS_TO_KEEP=3 # Instead of 7
# Manually clean old milestones (rare)
rm backups/milestones/milestone_old_file.csv.gz
CHANGELOG getting too large
# Archive old entries (manual)
tail -100 backups/CHANGELOG.md > backups/CHANGELOG_recent.md
mv backups/CHANGELOG.md backups/CHANGELOG_archive.md
mv backups/CHANGELOG_recent.md backups/CHANGELOG.md
Summary
- Two-tier system: Daily rolling + permanent milestones
- Storage efficient: Gzip compression (~80% reduction)
- Auto-cleanup: 7-day rolling window for dailies
- Complete audit trail: CHANGELOG tracks all changes
- Safety first: Never overwrites without confirmation
- Global installer: Use across all projects
- Professional workflow: Publication-ready data provenance