infrastructure-health-check

📁 dawiddutoit/custom-claude 📅 Jan 26, 2026
4
总安装量
4
周安装量
#53895
全站排名
安装命令
npx skills add https://github.com/dawiddutoit/custom-claude --skill infrastructure-health-check

Agent 安装分布

mcpjam 4
neovate 4
gemini-cli 4
antigravity 4
windsurf 4
zencoder 4

Skill 文档

Works with docker-compose, Caddy, Pi-hole, and Cloudflare services.

Infrastructure Health Check

Comprehensive health verification for all network infrastructure services.

Quick Start

Run a full infrastructure health check:

cd /home/dawiddutoit/projects/network && ./scripts/health-check.sh

Or invoke this skill with: “Check infrastructure health” or “Is everything running?”

Table of Contents

  1. When to Use This Skill
  2. What This Skill Does
  3. Instructions
    • 3.1 Docker Container Status
    • 3.2 Caddy HTTPS Verification
    • 3.3 Pi-hole DNS Check
    • 3.4 Cloudflare Tunnel Status
    • 3.5 Webhook Endpoint Test
    • 3.6 SSL Certificate Validity
    • 3.7 Cloudflare Access Verification
    • 3.8 Generate Health Report
  4. Supporting Files
  5. Expected Outcomes
  6. Requirements
  7. Red Flags to Avoid

When to Use This Skill

Explicit Triggers:

  • “Check infrastructure health”
  • “Is everything running?”
  • “Check service status”
  • “Verify SSL certificates”
  • “Check tunnel connection”
  • “Diagnose network issues”

Implicit Triggers:

  • After restarting Docker services
  • After network configuration changes
  • Before deploying new services
  • When services seem unresponsive

Debugging Triggers:

  • “Why can’t I access pihole.temet.ai?”
  • “Services are not responding”
  • “SSL certificate errors”
  • “Authentication not working”

What This Skill Does

Performs 8 health checks and generates a comprehensive status report:

  1. Docker Containers – Verifies all containers are running and healthy
  2. Caddy HTTPS – Tests reverse proxy is serving HTTPS correctly
  3. Pi-hole DNS – Confirms DNS resolution is working
  4. Cloudflare Tunnel – Checks tunnel connectivity to Cloudflare
  5. Webhook Endpoint – Tests GitHub webhook accessibility
  6. SSL Certificates – Validates certificate validity and expiration
  7. Cloudflare Access – Verifies authentication is configured
  8. Overall Status – Aggregates results into pass/fail summary

Instructions

3.1 Docker Container Status

Check all containers are running:

cd /home/dawiddutoit/projects/network && docker compose ps --format "table {{.Name}}\t{{.Status}}\t{{.Health}}"

Expected containers:

Container Status Purpose
pihole Up (healthy) DNS + Ad blocking
caddy Up Reverse proxy
cloudflared Up Cloudflare Tunnel
webhook Up GitHub auto-deploy

Check for issues:

docker compose ps --filter "status=exited"
docker compose ps --filter "health=unhealthy"

3.2 Caddy HTTPS Verification

Test Caddy is serving HTTPS for each domain:

# Test Pi-hole
curl -sI https://pihole.temet.ai --max-time 5 | head -1

# Test Jaeger
curl -sI https://jaeger.temet.ai --max-time 5 | head -1

# Test Langfuse
curl -sI https://langfuse.temet.ai --max-time 5 | head -1

Expected: HTTP/2 200 or HTTP/2 302 (redirect to auth)

Check Caddy logs for errors:

docker logs caddy --tail 20 2>&1 | grep -iE "error|warn|fail"

3.3 Pi-hole DNS Check

Verify DNS resolution is working:

# Check Pi-hole can resolve local domains
docker exec pihole dig +short @127.0.0.1 pihole.temet.ai

# Check from host
dig @localhost pihole.temet.ai +short

# Check external DNS
dig @1.1.1.1 pihole.temet.ai +short

Expected: Returns IP address (192.168.68.135 for local, Cloudflare IP for external)

Check Pi-hole status:

docker exec pihole pihole status

3.4 Cloudflare Tunnel Status

Verify tunnel is connected:

# Check tunnel logs for connection status
docker logs cloudflared --tail 30 2>&1 | grep -iE "connected|registered|error|failed"

# Check tunnel process is running
docker exec cloudflared pgrep -f cloudflared

Expected output contains:

  • Registered tunnel connection – Tunnel is connected
  • Connection ... registered – Healthy connection

Warning signs:

  • connection failed – Network issues
  • error – Configuration problems
  • No recent log entries – Process may be stuck

3.5 Webhook Endpoint Test

Verify webhook is accessible:

# Test webhook health endpoint locally
curl -s http://localhost:9000/hooks/health

# Test via domain (if local)
curl -sI https://webhook.temet.ai/hooks/health --max-time 5 | head -1

Expected: OK response or HTTP/2 200

3.6 SSL Certificate Validity

Check certificate details for each domain:

for domain in pihole jaeger langfuse ha code; do
  echo "=== $domain.temet.ai ==="
  echo | openssl s_client -servername $domain.temet.ai \
    -connect $domain.temet.ai:443 2>/dev/null | \
    openssl x509 -noout -dates -issuer 2>/dev/null || echo "FAILED"
  echo
done

Expected output:

notBefore=<date>
notAfter=<date>
issuer=C = US, O = Let's Encrypt, CN = R11

Check certificate expiration:

# Get days until expiration
for domain in pihole jaeger langfuse; do
  echo -n "$domain.temet.ai: "
  echo | openssl s_client -servername $domain.temet.ai \
    -connect $domain.temet.ai:443 2>/dev/null | \
    openssl x509 -noout -checkend 2592000 && echo "OK (>30 days)" || echo "RENEW SOON"
done

3.7 Cloudflare Access Verification

Check Access is configured for protected services:

# Test that Access is intercepting (should redirect to login)
curl -sI https://pihole.temet.ai --max-time 5 | grep -E "^(HTTP|location|cf-)"

Expected for protected services:

  • HTTP/2 302 with redirect to cloudflareaccess.com login
  • OR HTTP/2 200 if already authenticated

Check Access configuration via API:

source /home/dawiddutoit/projects/network/.env
curl -s "https://api.cloudflare.com/client/v4/accounts/${CLOUDFLARE_ACCOUNT_ID}/access/apps" \
  -H "Authorization: Bearer ${CLOUDFLARE_ACCESS_API_TOKEN}" | \
  python3 -c "import sys,json; apps=json.load(sys.stdin).get('result',[]); print('\n'.join([f\"{a['name']}: {a['domain']}\" for a in apps]))"

3.8 Generate Health Report

Aggregate all checks into a summary report:

========================================
  Infrastructure Health Report
  Generated: $(date)
========================================

DOCKER CONTAINERS
-----------------
[PASS] pihole: running (healthy)
[PASS] caddy: running
[PASS] cloudflared: running
[PASS] webhook: running

HTTPS ENDPOINTS
---------------
[PASS] pihole.temet.ai: HTTP/2 200
[PASS] jaeger.temet.ai: HTTP/2 200
[PASS] langfuse.temet.ai: HTTP/2 200

DNS RESOLUTION
--------------
[PASS] Local DNS: 192.168.68.135
[PASS] External DNS: resolving via Cloudflare

CLOUDFLARE TUNNEL
-----------------
[PASS] Tunnel: connected

WEBHOOK
-------
[PASS] Endpoint: responding

SSL CERTIFICATES
----------------
[PASS] pihole.temet.ai: valid, expires in 67 days
[PASS] jaeger.temet.ai: valid, expires in 67 days
[PASS] langfuse.temet.ai: valid, expires in 67 days

CLOUDFLARE ACCESS
-----------------
[PASS] pihole.temet.ai: protected
[PASS] jaeger.temet.ai: protected
[PASS] langfuse.temet.ai: protected
[PASS] webhook.temet.ai: bypass (public)

========================================
  Overall Status: ALL CHECKS PASSED
========================================

Supporting Files

File Purpose
scripts/health-check.sh Automated health check script
references/troubleshooting.md Common issues and solutions
examples/examples.md Example health check outputs

Expected Outcomes

Success (All Checks Pass):

  • All 4 containers running
  • HTTPS endpoints responding with 200/302
  • DNS resolving correctly
  • Tunnel connected to Cloudflare
  • Webhook accessible
  • Certificates valid with >30 days remaining
  • Access configured for protected services

Partial Failure:

  • One or more containers down -> Restart with docker compose up -d
  • Certificate expiring soon -> Will auto-renew, monitor
  • Access misconfigured -> Run ./scripts/cf-access-setup.sh setup

Critical Failure:

  • Multiple containers down -> Check Docker daemon, disk space
  • Tunnel disconnected -> Check internet, tunnel token
  • DNS not resolving -> Check Pi-hole container, router DNS settings
  • All certificates invalid -> Check Cloudflare API token

Requirements

Environment:

  • Docker and Docker Compose running
  • Access to /home/dawiddutoit/projects/network
  • .env file with Cloudflare credentials
  • Network connectivity

Services:

  • pihole container
  • caddy container
  • cloudflared container
  • webhook container

Red Flags to Avoid

  • Do not ignore certificate expiration warnings
  • Do not skip DNS checks when troubleshooting access issues
  • Do not assume tunnel is connected without checking logs
  • Do not run health checks without network connectivity
  • Do not ignore container health status (unhealthy state)
  • Do not forget to check both local and external DNS resolution
  • Do not assume HTTP 302 is a failure (it’s auth redirect)

Notes

  • Health checks should be run from the Pi (192.168.68.135) for accurate local results
  • Remote access testing requires being outside the home network
  • Certificate auto-renewal happens 30 days before expiration
  • Cloudflare Tunnel reconnects automatically after brief disconnections
  • Pi-hole DNS may cache results for up to 5 minutes
  • Run ./scripts/health-check.sh for automated checking