infrastructure-monitoring-setup

📁 dawiddutoit/custom-claude 📅 Jan 26, 2026

总安装量

周安装量

#50919

全站排名

安装命令

npx skills add https://github.com/dawiddutoit/custom-claude --skill infrastructure-monitoring-setup

Agent 安装分布

mcpjam 4

neovate 4

gemini-cli 4

antigravity 4

windsurf 4

zencoder 4

Skill 文档

Works with infrastructure-monitor.sh script, systemd timer, ntfy.sh push notifications,

Infrastructure Monitoring Setup Skill

Complete setup and configuration of automated infrastructure monitoring with mobile push notifications and auto-recovery capabilities.

Quick Start

Quick setup for monitoring (5 minutes):

# 1. Create unique ntfy topic
TOPIC="infra-$(openssl rand -hex 8)"
echo "Your topic: $TOPIC"

# 2. Add to .env
echo "ALERT_ENABLED=true" >> /home/dawiddutoit/projects/network/.env
echo "NTFY_SERVER=https://ntfy.sh" >> /home/dawiddutoit/projects/network/.env
echo "NTFY_TOPIC=$TOPIC" >> /home/dawiddutoit/projects/network/.env
echo "AUTO_RECOVER=true" >> /home/dawiddutoit/projects/network/.env

# 3. Install systemd service
sudo cp /home/dawiddutoit/projects/network/systemd/infrastructure-monitor.* /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable --now infrastructure-monitor.timer

# 4. Test
/home/dawiddutoit/projects/network/scripts/infrastructure-monitor.sh

Then install ntfy app on phone and subscribe to your topic.

When to Use This Skill
What This Skill Does
Instructions
- 3.1 Install ntfy Mobile App
- 3.2 Configure Monitoring in .env
- 3.3 Install Systemd Timer
- 3.4 Test Monitoring and Alerts
- 3.5 Configure Home Assistant Integration (Optional)
- 3.6 Verify Auto-Recovery
- 3.7 View Monitoring Logs
Supporting Files
Expected Outcomes
Requirements
Red Flags to Avoid

When to Use This Skill

Explicit Triggers:

“Setup monitoring”
“Configure mobile alerts”
“Enable auto-recovery”
“Setup ntfy notifications”
“Configure Home Assistant alerts”

Implicit Triggers:

Want to be notified of infrastructure failures
Need automated recovery for common issues
Infrastructure has been down without detection
Want proactive monitoring

Debugging Triggers:

“Why am I not getting alerts?”
“Is monitoring working?”
“How to test notifications?”

What This Skill Does

Mobile Alerts – Configures ntfy.sh push notifications to phone
Auto-Recovery – Enables automatic fixes for common failures
HA Integration – Optional Home Assistant notification integration
Systemd Service – Installs timer to run monitoring every 5 minutes
Tests Setup – Verifies notifications and recovery work
Logs Access – Shows how to view monitoring logs
Troubleshooting – Diagnoses alert delivery issues

Instructions

3.1 Install ntfy Mobile App

Install app:

Subscribe to topic:

Open ntfy app
Tap “+” to add subscription
Enter topic: infra-YOUR-RANDOM-ID (you’ll generate this in step 3.2)
Server: https://ntfy.sh
Tap “Subscribe”

Note: You need the topic ID from step 3.2 before subscribing. Come back here after generating it.

3.2 Configure Monitoring in .env

Generate unique topic ID:

TOPIC="infra-$(openssl rand -hex 8)"
echo "Your unique topic: $TOPIC"

Save this topic ID – you’ll use it in the ntfy app.

Add monitoring configuration to .env:

# Navigate to project directory
cd /home/dawiddutoit/projects/network

# Add monitoring variables
cat >> .env << EOF

# Monitoring & Alerts
ALERT_ENABLED=true
NTFY_SERVER=https://ntfy.sh
NTFY_TOPIC=$TOPIC
AUTO_RECOVER=true
EOF

Verify configuration:

grep -A4 "Monitoring & Alerts" /home/dawiddutoit/projects/network/.env

Expected:

# Monitoring & Alerts
ALERT_ENABLED=true
NTFY_SERVER=https://ntfy.sh
NTFY_TOPIC=infra-a3f7d92b4c8e1f56
AUTO_RECOVER=true

Configuration options:

Variable	Purpose	Default
`ALERT_ENABLED`	Enable mobile push notifications	`false`
`NTFY_SERVER`	ntfy.sh server URL	`https://ntfy.sh`
`NTFY_TOPIC`	Unique topic for your alerts	None (required)
`AUTO_RECOVER`	Enable automatic recovery	`true`

To disable auto-recovery but keep alerts:

# Edit .env
nano /home/dawiddutoit/projects/network/.env
# Change: AUTO_RECOVER=false

3.3 Install Systemd Timer

Install systemd service and timer to run monitoring every 5 minutes:

# Copy service files
sudo cp /home/dawiddutoit/projects/network/systemd/infrastructure-monitor.service /etc/systemd/system/
sudo cp /home/dawiddutoit/projects/network/systemd/infrastructure-monitor.timer /etc/systemd/system/

# Reload systemd
sudo systemctl daemon-reload

# Enable and start timer
sudo systemctl enable infrastructure-monitor.timer
sudo systemctl start infrastructure-monitor.timer

Verify timer is active:

# Check timer status
systemctl list-timers infrastructure-monitor.timer

# Check service status
sudo systemctl status infrastructure-monitor.timer

Expected:

â infrastructure-monitor.timer - Run infrastructure monitoring every 5 minutes
     Loaded: loaded (/etc/systemd/system/infrastructure-monitor.timer; enabled)
     Active: active (waiting) since...

Timer configuration:

Runs every 5 minutes
Starts 1 minute after boot
Persistent (survives reboots)

3.4 Test Monitoring and Alerts

Test monitoring script:

# Run monitoring manually
/home/dawiddutoit/projects/network/scripts/infrastructure-monitor.sh

Expected output shows:

Docker containers checked
Tunnel connectivity tested
Service health verified
Network interface status
Alert sent to ntfy topic

Test alert delivery:

Within 30 seconds, you should receive push notification on phone with infrastructure status.

If no notification received:

Check ntfy topic subscription:

# Test sending to topic directly
curl -d "Test from infrastructure monitoring" https://ntfy.sh/$TOPIC

If direct curl works but monitoring doesn’t:

Check ALERT_ENABLED=true in .env
Verify NTFY_TOPIC matches app subscription
Check script has network access

3.5 Configure Home Assistant Integration (Optional)

Why use Home Assistant integration:

Centralized home automation alerts
Can trigger automations based on infrastructure status
Redundancy with ntfy.sh
Integration with existing HA notifications

Prerequisites:

Home Assistant running and accessible
HA mobile app installed (for notify.mobile_app_* service)

Step 1: Create Long-Lived Access Token

Go to Home Assistant: http://192.168.68.123:8123
Click your profile (bottom left)
Scroll to “Long-Lived Access Tokens”
Click “Create Token”
Name: “Infrastructure Monitoring”
Copy token (shown only once)

Step 2: Find Notification Service Name

In Home Assistant: Developer Tools â Services
Filter by “notify”
Find your mobile app service: notify.mobile_app_your_phone

Step 3: Add to .env

# Edit .env
nano /home/dawiddutoit/projects/network/.env

# Add HA configuration
HA_NOTIFICATIONS_ENABLED=true
HA_BASE_URL=http://192.168.68.123:8123
HA_ACCESS_TOKEN=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...
HA_NOTIFY_SERVICE=notify.mobile_app_your_phone

Step 4: Test HA Notifications

# Run monitoring (should send to both ntfy and HA)
/home/dawiddutoit/projects/network/scripts/infrastructure-monitor.sh

Check you receive notification in Home Assistant companion app.

Troubleshooting HA notifications:

# Test HA API access
curl -H "Authorization: Bearer YOUR_TOKEN" \
  http://192.168.68.123:8123/api/

# Test notification service
curl -X POST \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"message": "Test from infrastructure monitoring"}' \
  http://192.168.68.123:8123/api/services/notify/mobile_app_your_phone

3.6 Verify Auto-Recovery

Monitor logs to see auto-recovery in action:

# View live monitoring logs
sudo journalctl -u infrastructure-monitor.service -f

# Or check persistent log
tail -f /var/log/infrastructure-monitor.log

Auto-recovery capabilities:

Issue	Detection	Recovery Action
Stuck cloudflared	No registrations in 10 min	Restart cloudflared container
Docker network isolation	Ping fails between containers	Recreate bridge network
Inactive Ethernet	WiFi used instead of eth0	Activate Ethernet connection
Service failures	HTTP health checks fail	Restart affected containers

Test auto-recovery:

# Simulate stuck tunnel
docker stop cloudflared

# Wait 5 minutes (next monitoring run)
# Check logs - should show tunnel restarted

# Verify tunnel recovered
docker ps | grep cloudflared
docker logs cloudflared | grep "Registered tunnel"

3.7 View Monitoring Logs

View systemd service logs:

# Live monitoring logs
sudo journalctl -u infrastructure-monitor.service -f

# Last 50 lines
sudo journalctl -u infrastructure-monitor.service -n 50

# Logs from today
sudo journalctl -u infrastructure-monitor.service --since today

# Logs with timestamps
sudo journalctl -u infrastructure-monitor.service -o short-iso

View persistent log file:

# Live tail
tail -f /var/log/infrastructure-monitor.log

# Last 100 lines
tail -100 /var/log/infrastructure-monitor.log

# Search for errors
grep -i error /var/log/infrastructure-monitor.log

# Search for recoveries
grep -i "recovered" /var/log/infrastructure-monitor.log

Check timer schedule:

# Show next run time
systemctl list-timers infrastructure-monitor.timer

# Show timer configuration
systemctl cat infrastructure-monitor.timer

Monitoring controls:

# Stop monitoring temporarily
sudo systemctl stop infrastructure-monitor.timer

# Restart monitoring
sudo systemctl start infrastructure-monitor.timer

# Disable monitoring (survives reboot)
sudo systemctl disable infrastructure-monitor.timer

# Re-enable monitoring
sudo systemctl enable infrastructure-monitor.timer

Supporting Files

File	Purpose
`references/reference.md`	Monitoring architecture, recovery strategies, ntfy.sh details
`examples/examples.md`	Example configurations, alert formats, log outputs
`scripts/test-notifications.sh`	Test script for alert delivery

Expected Outcomes

Success:

ntfy app receives push notifications
Monitoring runs every 5 minutes
Auto-recovery fixes common failures within 5 minutes
Logs show monitoring activity
Home Assistant notifications working (if configured)

Partial Success:

Monitoring runs but alerts not received (check topic subscription)
Alerts received but auto-recovery disabled (set AUTO_RECOVER=true)

Failure Indicators:

No notifications received after 10 minutes
Timer not running (check systemctl status)
Script fails with errors (check logs)
HA notifications not working (check token/service name)

Requirements

Infrastructure server running Linux with systemd
Mobile device with ntfy app installed
Internet connectivity for ntfy.sh
.env file with monitoring configuration
Home Assistant (optional, for HA integration)

Red Flags to Avoid

Do not use public/guessable ntfy topic (security risk)
Do not share ntfy topic publicly (anyone can subscribe)
Do not disable monitoring without alternative alerting
Do not ignore persistent alerts (investigate root cause)
Do not run monitoring script too frequently (causes noise)
Do not commit .env with ntfy topic to git (privacy)
Do not use AUTO_RECOVER=false without manual monitoring

Notes

Monitoring checks run every 5 minutes via systemd timer
ntfy.sh is free and doesn’t require account
Topic ID should be random and private (security by obscurity)
Auto-recovery attempts fixes before alerting as critical
Alert levels: ð´ Critical (manual intervention), â ï¸ Warning (recovery in progress)
HA integration is optional and works alongside ntfy.sh
Logs persist across reboots at /var/log/infrastructure-monitor.log
Maximum detection time: 5 minutes (timer interval)
Monitoring survives server reboots (systemd timer enabled)
Use infrastructure-health-check skill for manual on-demand checks

GitHub 仓库 ↗ ← 返回陌讯 Skills 聚合平台