infrastructure-health-check
npx skills add https://github.com/dawiddutoit/custom-claude --skill infrastructure-health-check
Works with docker-compose, Caddy, Pi-hole, and Cloudflare services.
Infrastructure Health Check
Comprehensive health verification for all network infrastructure services.
Quick Start
Run a full infrastructure health check:
cd /home/dawiddutoit/projects/network && ./scripts/health-check.sh
Or invoke this skill with: “Check infrastructure health” or “Is everything running?”
Table of Contents
- When to Use This Skill
- What This Skill Does
- Instructions
- 3.1 Docker Container Status
- 3.2 Caddy HTTPS Verification
- 3.3 Pi-hole DNS Check
- 3.4 Cloudflare Tunnel Status
- 3.5 Webhook Endpoint Test
- 3.6 SSL Certificate Validity
- 3.7 Cloudflare Access Verification
- 3.8 Generate Health Report
- Supporting Files
- Expected Outcomes
- Requirements
- Red Flags to Avoid
When to Use This Skill
Explicit Triggers:
- “Check infrastructure health”
- “Is everything running?”
- “Check service status”
- “Verify SSL certificates”
- “Check tunnel connection”
- “Diagnose network issues”
Implicit Triggers:
- After restarting Docker services
- After network configuration changes
- Before deploying new services
- When services seem unresponsive
Debugging Triggers:
- “Why can’t I access pihole.temet.ai?”
- “Services are not responding”
- “SSL certificate errors”
- “Authentication not working”
What This Skill Does
Performs seven health checks, then aggregates the results into a status report:
- Docker Containers – Verifies all containers are running and healthy
- Caddy HTTPS – Tests reverse proxy is serving HTTPS correctly
- Pi-hole DNS – Confirms DNS resolution is working
- Cloudflare Tunnel – Checks tunnel connectivity to Cloudflare
- Webhook Endpoint – Tests GitHub webhook accessibility
- SSL Certificates – Validates certificate validity and expiration
- Cloudflare Access – Verifies authentication is configured
- Overall Status – Aggregates results into pass/fail summary
Instructions
3.1 Docker Container Status
Check all containers are running:
cd /home/dawiddutoit/projects/network && docker compose ps --format "table {{.Name}}\t{{.Status}}\t{{.Health}}"
Expected containers:
| Container | Status | Purpose |
|---|---|---|
| pihole | Up (healthy) | DNS + Ad blocking |
| caddy | Up | Reverse proxy |
| cloudflared | Up | Cloudflare Tunnel |
| webhook | Up | GitHub auto-deploy |
Check for issues:
docker compose ps --filter "status=exited"
# docker compose ps only supports the status filter; use docker ps to filter on health
docker ps --filter "health=unhealthy"
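The container check can also be scripted so that a missing container is reported by name. The following is a minimal sketch, assuming the Compose service names match the container names in the table above; `missing_containers` is a hypothetical helper and is not taken from `scripts/health-check.sh`.

```shell
# Hypothetical helper: print every expected name that is absent from the
# list of running names read on stdin (one name per line).
# The function body runs in a subshell, so its variables stay local.
missing_containers() (
  running="$(cat)"
  for expected in "$@"; do
    # -x matches the whole line, -F treats the name as a fixed string
    if ! printf '%s\n' "$running" | grep -qxF -- "$expected"; then
      echo "$expected"
    fi
  done
)

# Usage (assumes it is run from the compose project directory):
#   docker compose ps --services --filter "status=running" \
#     | missing_containers pihole caddy cloudflared webhook
```

Empty output means all four expected containers are running; any printed name is a candidate for `docker compose up -d`.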
3.2 Caddy HTTPS Verification
Test Caddy is serving HTTPS for each domain:
# Test Pi-hole
curl -sI https://pihole.temet.ai --max-time 5 | head -1
# Test Jaeger
curl -sI https://jaeger.temet.ai --max-time 5 | head -1
# Test Langfuse
curl -sI https://langfuse.temet.ai --max-time 5 | head -1
Expected: HTTP/2 200 or HTTP/2 302 (redirect to auth)
Check Caddy logs for errors:
docker logs caddy --tail 20 2>&1 | grep -iE "error|warn|fail"
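To turn the status line into a pass/fail result, something like the sketch below could be used. It treats 200 and 302 as healthy, as described above; `classify_http_status` is a hypothetical name introduced only for illustration.

```shell
# Hypothetical helper: map the first line of a `curl -sI` response to PASS/FAIL.
classify_http_status() (
  # Strip the trailing carriage return curl leaves on header lines, then
  # take the second whitespace-separated field, e.g. "HTTP/2 200" -> "200".
  code="$(printf '%s\n' "$1" | tr -d '\r' | awk '{print $2}')"
  case "$code" in
    200|302) echo "PASS" ;;
    "")      echo "FAIL (no response)" ;;
    *)       echo "FAIL (HTTP $code)" ;;
  esac
)

# Usage:
#   classify_http_status "$(curl -sI https://pihole.temet.ai --max-time 5 | head -1)"
```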
3.3 Pi-hole DNS Check
Verify DNS resolution is working:
# Check Pi-hole can resolve local domains
docker exec pihole dig +short @127.0.0.1 pihole.temet.ai
# Check from host
dig @localhost pihole.temet.ai +short
# Check external DNS
dig @1.1.1.1 pihole.temet.ai +short
Expected: Returns IP address (192.168.68.135 for local, Cloudflare IP for external)
Check Pi-hole status:
docker exec pihole pihole status
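The local DNS answer can be compared against the expected address automatically. This is a rough sketch; `check_dns_answer` and the `[FAIL]` tag are assumptions chosen to mirror the `[PASS]` lines in the report format.

```shell
# Hypothetical helper: compare a `dig +short` answer with the expected address.
# Arguments: label, expected address, actual answer (may be empty).
check_dns_answer() (
  label="$1"; expected="$2"; answer="$3"
  if [ -z "$answer" ]; then
    echo "[FAIL] $label: no answer"
  elif [ "$answer" = "$expected" ]; then
    echo "[PASS] $label: $answer"
  else
    echo "[FAIL] $label: got $answer, expected $expected"
  fi
)

# Usage:
#   check_dns_answer "Local DNS" 192.168.68.135 \
#     "$(dig @localhost pihole.temet.ai +short | head -1)"
```

The external answer is a Cloudflare address that can change, so an exact comparison is only suggested for the local lookup.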
3.4 Cloudflare Tunnel Status
Verify tunnel is connected:
# Check tunnel logs for connection status
docker logs cloudflared --tail 30 2>&1 | grep -iE "connected|registered|error|failed"
# Check tunnel process is running
docker exec cloudflared pgrep -f cloudflared
Expected output contains:
- Registered tunnel connection – Tunnel is connected
- Connection ... registered – Healthy connection
Warning signs:
- connection failed – Network issues
- error – Configuration problems
- No recent log entries – Process may be stuck
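The expected output and warning signs can be folded into a single classification step. The sketch below is deliberately conservative: any error in the log tail counts as degraded, even if a later line shows a successful registration. `tunnel_state` is a hypothetical helper, and the patterns are the ones used in the grep command above.

```shell
# Hypothetical helper: classify cloudflared log text read on stdin.
tunnel_state() (
  logs="$(cat)"
  if [ -z "$logs" ]; then
    # No recent entries: the process may be stuck
    echo "unknown (no log entries)"
  elif printf '%s\n' "$logs" | grep -qiE 'connection failed|error'; then
    echo "degraded"
  elif printf '%s\n' "$logs" | grep -qiE 'registered tunnel connection|connection .* registered'; then
    echo "connected"
  else
    echo "unknown"
  fi
)

# Usage:
#   docker logs cloudflared --tail 30 2>&1 | tunnel_state
```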
3.5 Webhook Endpoint Test
Verify webhook is accessible:
# Test webhook health endpoint locally
curl -s http://localhost:9000/hooks/health
# Test via domain (if local)
curl -sI https://webhook.temet.ai/hooks/health --max-time 5 | head -1
Expected: OK response or HTTP/2 200
3.6 SSL Certificate Validity
Check certificate details for each domain:
for domain in pihole jaeger langfuse ha code; do
echo "=== $domain.temet.ai ==="
echo | openssl s_client -servername $domain.temet.ai \
-connect $domain.temet.ai:443 2>/dev/null | \
openssl x509 -noout -dates -issuer 2>/dev/null || echo "FAILED"
echo
done
Expected output:
notBefore=<date>
notAfter=<date>
issuer=C = US, O = Let's Encrypt, CN = R11
Check certificate expiration:
# Get days until expiration
for domain in pihole jaeger langfuse; do
echo -n "$domain.temet.ai: "
echo | openssl s_client -servername $domain.temet.ai \
-connect $domain.temet.ai:443 2>/dev/null | \
openssl x509 -noout -checkend 2592000 && echo "OK (>30 days)" || echo "RENEW SOON"
done
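The report format shows a remaining-days figure, which `-checkend` alone does not provide. One way to derive it is sketched below, assuming GNU `date` (the `-d` flag; BSD and macOS `date` need different options). `days_until_expiry` is a hypothetical helper.

```shell
# Hypothetical helper: whole days between "now" and a certificate's notAfter
# timestamp. Assumes GNU date for parsing the openssl date format.
days_until_expiry() (
  not_after="$1"                  # e.g. "Jan 11 00:00:00 2025 GMT"
  now="${2:-$(date +%s)}"         # optional epoch override, useful for testing
  expiry="$(date -d "$not_after" +%s)" || exit 1
  echo $(( (expiry - now) / 86400 ))
)

# Usage:
#   end="$(echo | openssl s_client -servername pihole.temet.ai \
#            -connect pihole.temet.ai:443 2>/dev/null \
#            | openssl x509 -noout -enddate | cut -d= -f2)"
#   echo "pihole.temet.ai: expires in $(days_until_expiry "$end") days"
```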
3.7 Cloudflare Access Verification
Check Access is configured for protected services:
# Test that Access is intercepting (should redirect to login)
curl -sI https://pihole.temet.ai --max-time 5 | grep -E "^(HTTP|location|cf-)"
Expected for protected services:
- HTTP/2 302 with redirect to cloudflareaccess.com login
- OR HTTP/2 200 if already authenticated
Check Access configuration via API:
source /home/dawiddutoit/projects/network/.env
curl -s "https://api.cloudflare.com/client/v4/accounts/${CLOUDFLARE_ACCOUNT_ID}/access/apps" \
-H "Authorization: Bearer ${CLOUDFLARE_ACCESS_API_TOKEN}" | \
python3 -c "import sys,json; apps=json.load(sys.stdin).get('result',[]); print('\n'.join([f\"{a['name']}: {a['domain']}\" for a in apps]))"
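The header check can be reduced to one label per domain. In this sketch, a 302 whose `location` header points at cloudflareaccess.com is read as protected, and a 200 as either an authenticated session or a bypass policy. `access_state` is a hypothetical helper, and the labels are illustrative only.

```shell
# Hypothetical helper: classify `curl -sI` response headers read on stdin.
access_state() (
  # Drop carriage returns so header matching works on plain lines
  headers="$(tr -d '\r')"
  code="$(printf '%s\n' "$headers" | awk 'NR==1 {print $2}')"
  if [ "$code" = "302" ] &&
     printf '%s\n' "$headers" | grep -qiE '^location:.*cloudflareaccess\.com'; then
    echo "protected"
  elif [ "$code" = "200" ]; then
    echo "open (authenticated session or bypass policy)"
  else
    echo "unknown (HTTP ${code:-none})"
  fi
)

# Usage:
#   curl -sI https://pihole.temet.ai --max-time 5 | access_state
```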
3.8 Generate Health Report
Aggregate all checks into a summary report:
========================================
Infrastructure Health Report
Generated: $(date)
========================================
DOCKER CONTAINERS
-----------------
[PASS] pihole: running (healthy)
[PASS] caddy: running
[PASS] cloudflared: running
[PASS] webhook: running
HTTPS ENDPOINTS
---------------
[PASS] pihole.temet.ai: HTTP/2 200
[PASS] jaeger.temet.ai: HTTP/2 200
[PASS] langfuse.temet.ai: HTTP/2 200
DNS RESOLUTION
--------------
[PASS] Local DNS: 192.168.68.135
[PASS] External DNS: resolving via Cloudflare
CLOUDFLARE TUNNEL
-----------------
[PASS] Tunnel: connected
WEBHOOK
-------
[PASS] Endpoint: responding
SSL CERTIFICATES
----------------
[PASS] pihole.temet.ai: valid, expires in 67 days
[PASS] jaeger.temet.ai: valid, expires in 67 days
[PASS] langfuse.temet.ai: valid, expires in 67 days
CLOUDFLARE ACCESS
-----------------
[PASS] pihole.temet.ai: protected
[PASS] jaeger.temet.ai: protected
[PASS] langfuse.temet.ai: protected
[PASS] webhook.temet.ai: bypass (public)
========================================
Overall Status: ALL CHECKS PASSED
========================================
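The final status line can be derived from the report body itself. The sketch below assumes failing checks are written with a `[FAIL]` prefix, which the sample report does not show; `overall_status` is a hypothetical helper.

```shell
# Hypothetical helper: read report lines on stdin, print the summary line,
# and return non-zero when any check failed.
overall_status() (
  # grep -c prints the number of matching lines; it exits 1 when that is 0
  failed="$(grep -c '^\[FAIL\]' || true)"
  if [ "$failed" -eq 0 ]; then
    echo "Overall Status: ALL CHECKS PASSED"
  else
    echo "Overall Status: $failed CHECK(S) FAILED"
    exit 1
  fi
)

# Usage (report.txt is an assumed file name):
#   overall_status < report.txt
```

Returning a non-zero exit code lets a cron job or deploy step act on the result without parsing the text.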
Supporting Files
| File | Purpose |
|---|---|
| scripts/health-check.sh | Automated health check script |
| references/troubleshooting.md | Common issues and solutions |
| examples/examples.md | Example health check outputs |
Expected Outcomes
Success (All Checks Pass):
- All 4 containers running
- HTTPS endpoints responding with 200/302
- DNS resolving correctly
- Tunnel connected to Cloudflare
- Webhook accessible
- Certificates valid with >30 days remaining
- Access configured for protected services
Partial Failure:
- One or more containers down -> Restart with docker compose up -d
- Certificate expiring soon -> Will auto-renew, monitor
- Access misconfigured -> Run ./scripts/cf-access-setup.sh setup
Critical Failure:
- Multiple containers down -> Check Docker daemon, disk space
- Tunnel disconnected -> Check internet, tunnel token
- DNS not resolving -> Check Pi-hole container, router DNS settings
- All certificates invalid -> Check Cloudflare API token
Requirements
Environment:
- Docker and Docker Compose running
- Access to /home/dawiddutoit/projects/network
- .env file with Cloudflare credentials
- Network connectivity
Services:
- pihole container
- caddy container
- cloudflared container
- webhook container
Red Flags to Avoid
- Do not ignore certificate expiration warnings
- Do not skip DNS checks when troubleshooting access issues
- Do not assume tunnel is connected without checking logs
- Do not run health checks without network connectivity
- Do not ignore container health status (unhealthy state)
- Do not forget to check both local and external DNS resolution
- Do not assume HTTP 302 is a failure (it is the authentication redirect)
Notes
- Health checks should be run from the Pi (192.168.68.135) for accurate local results
- Remote access testing requires being outside the home network
- Certificate auto-renewal happens 30 days before expiration
- Cloudflare Tunnel reconnects automatically after brief disconnections
- Pi-hole DNS may cache results for up to 5 minutes
- Run ./scripts/health-check.sh for automated checking