pueue-job-orchestration

Repository: terrylica/cc-skills (updated 6 days ago)

Install command

npx skills add https://github.com/terrylica/cc-skills --skill pueue-job-orchestration

Pueue Job Orchestration

Manage long-running tasks on the BigBlack/LittleBlack GPU workstations using the Pueue job queue.

Overview

Pueue is a Rust CLI tool for managing shell command queues. It provides:

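Daemon persistence – Jobs run under the pueued daemon, so they survive SSH disconnects; with the systemd unit described under Systemd Auto-Start, the daemon is also restarted after a crash or reboot
Disk-backed queue – Queue state is saved to disk and restored when the daemon restarts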
Group-based parallelism – Control concurrent jobs per group
Easy failure recovery – Restart failed jobs with one command

When to Use This Skill

Use this skill when the user mentions:

| Trigger | Example |
| --- | --- |
| Running tasks on BigBlack/LittleBlack | “Run this on bigblack” |
| Long-running data processing | “Populate the cache for all symbols” |
| Batch/parallel operations | “Process these 70 jobs” |
| SSH remote execution | “Execute this overnight on the GPU server” |
| Cache population | “Fill the ClickHouse cache” |

Quick Reference

Check Status

# Local
pueue status

# Remote (BigBlack)
ssh bigblack "~/.local/bin/pueue status"

Queue a Job

# Local
pueue add -- python long_running_script.py

# Remote (BigBlack)
# Quote the whole task so that && runs inside the Pueue job, not in the SSH shell
ssh bigblack "~/.local/bin/pueue add -- 'cd ~/project && uv run python script.py'"

# With group (for parallelism control)
pueue add --group p1 --label "BTCUSDT@1000" -- python populate.py --symbol BTCUSDT
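
A remote task string passes through two shells: the one SSH starts on the remote host, and the one Pueue starts for the job. If the task contains operators such as `&&`, it has to reach Pueue as a single quoted argument. The following is a minimal sketch of a quoting helper; `shell_quote`, `build_remote_add`, and `REMOTE_PUEUE` are hypothetical names introduced here for illustration, not part of Pueue or of this skill.

```shell
# Hypothetical helpers (not part of Pueue): build a remote command line that
# queues a whole task string, operators included, as ONE Pueue job.

# Path to pueue on the remote host; kept literal so the remote shell expands "~".
REMOTE_PUEUE='~/.local/bin/pueue'

# Wrap $1 in single quotes, escaping any embedded single quotes.
shell_quote() {
  printf "'%s'" "$(printf '%s' "$1" | sed "s/'/'\\\\''/g")"
}

# Print the command line to hand to ssh.
build_remote_add() {
  printf '%s add -- %s\n' "$REMOTE_PUEUE" "$(shell_quote "$1")"
}

# Show what would be sent to the remote host (nothing is executed here).
build_remote_add 'cd ~/project && uv run python script.py'
# prints: ~/.local/bin/pueue add -- 'cd ~/project && uv run python script.py'
```

To submit the job, pass the result to SSH: `ssh bigblack "$(build_remote_add 'cd ~/project && uv run python script.py')"`.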

Monitor Jobs

pueue follow <id>         # Watch job output in real-time
pueue log <id>            # View completed job output
pueue log <id> --full     # Full output (not truncated)

Manage Jobs

pueue restart <id>        # Restart failed job
pueue restart --all-failed # Restart ALL failed jobs
pueue kill <id>           # Kill running job
pueue clean               # Remove completed jobs from list
pueue reset               # Clear all jobs (use with caution)

Host Configuration

| Host | Pueue Location | Parallelism Groups (max parallel jobs) |
| --- | --- | --- |
| BigBlack | `~/.local/bin/pueue` | p1 (4), p2 (2), p3 (3), p4 (1) |
| LittleBlack | `~/.local/bin/pueue` | default (2) |
| Local (macOS) | `/opt/homebrew/bin/pueue` | default |
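
The host table can be expressed as a small lookup so that scripts do not hard-code binary paths. This is a sketch only: `pueue_path_for_host` and `pueue_on` are hypothetical helpers, and the SSH alias `littleblack` is an assumption (only `bigblack` appears in this document).

```shell
# Hypothetical helper: return the pueue binary path for a host name.
# "bigblack" matches the SSH alias used in this document; "littleblack" is assumed.
pueue_path_for_host() {
  case "$1" in
    bigblack|littleblack) printf '%s\n' '~/.local/bin/pueue' ;;      # Linux workstations
    local)                printf '%s\n' '/opt/homebrew/bin/pueue' ;; # macOS (Homebrew)
    *) printf 'unknown host: %s\n' "$1" >&2; return 1 ;;
  esac
}

# Run a pueue subcommand locally or over SSH, depending on the host name.
# Over SSH the arguments are joined with spaces and re-parsed by the remote
# shell, so task strings containing operators must already be quoted.
pueue_on() {
  host="$1"; shift
  bin="$(pueue_path_for_host "$host")" || return 1
  if [ "$host" = local ]; then
    "$bin" "$@"
  else
    ssh "$host" "$bin $*"
  fi
}
```

For example, `pueue_on bigblack status` would run `~/.local/bin/pueue status` on BigBlack, while `pueue_on local status` calls the Homebrew binary directly.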

Workflows

1. Queue Single Remote Job

# Step 1: Verify daemon is running
ssh bigblack "~/.local/bin/pueue status"

# Step 2: Queue the job (the inner quotes keep "cd ... && ..." together as one Pueue task)
ssh bigblack "~/.local/bin/pueue add --label 'my-job' -- 'cd ~/project && uv run python script.py'"

# Step 3: Monitor progress
ssh bigblack "~/.local/bin/pueue follow <id>"

2. Batch Job Submission (Multiple Symbols)

For rangebar cache population or similar batch operations:

# Use the pueue-populate.sh script
ssh bigblack "cd ~/rangebar-py && ./scripts/pueue-populate.sh setup"   # One-time
ssh bigblack "cd ~/rangebar-py && ./scripts/pueue-populate.sh phase1"  # Queue Phase 1
ssh bigblack "cd ~/rangebar-py && ./scripts/pueue-populate.sh status"  # Check progress

3. Configure Parallelism Groups

# Create groups with different parallelism limits
pueue group add fast      # Create 'fast' group
pueue parallel 4 --group fast  # Allow 4 parallel jobs

pueue group add slow
pueue parallel 1 --group slow  # Sequential execution

# Queue jobs to specific groups
pueue add --group fast -- echo "fast job"
pueue add --group slow -- echo "slow job"
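
When a host needs several groups, the two commands above can be driven from a list of `name:limit` pairs. The sketch below assumes nothing beyond `pueue group add` and `pueue parallel`; the function name `setup_groups` and the `PUEUE` override variable are hypothetical.

```shell
# Binary to call; override PUEUE to use a full path such as ~/.local/bin/pueue.
PUEUE="${PUEUE:-pueue}"

# Create each group and set its parallel limit from "name:limit" arguments.
setup_groups() {
  for spec in "$@"; do
    name="${spec%%:*}"    # text before the first colon
    limit="${spec##*:}"   # text after the last colon
    # Adding a group that already exists may report an error; ignore it and continue.
    "$PUEUE" group add "$name" 2>/dev/null || true
    "$PUEUE" parallel "$limit" --group "$name" || return 1
  done
}

# Example (not executed here): the BigBlack layout from Host Configuration.
# setup_groups p1:4 p2:2 p3:3 p4:1
```

Keeping the limits in one argument list makes it easier to reproduce the same group layout on a freshly installed host.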

4. Handle Failed Jobs

# Check what failed
pueue status | grep Failed

# View error output
pueue log <id>

# Restart specific job
pueue restart <id>

# Restart all failed jobs
pueue restart --all-failed

Installation

macOS (Local)

brew install pueue
pueued -d  # Start daemon

Linux (BigBlack/LittleBlack)

# Download from GitHub releases (see https://github.com/Nukesor/pueue/releases for latest)
curl -sSL https://raw.githubusercontent.com/terrylica/rangebar-py/main/scripts/setup-pueue-linux.sh | bash

# Or manually:
# SSoT-OK: Version from GitHub releases page
PUEUE_VERSION="v4.0.2"
mkdir -p ~/.local/bin  # curl -o does not create missing directories
# -f makes curl fail on HTTP errors instead of saving the error page as the binary
curl -fsSL "https://github.com/Nukesor/pueue/releases/download/${PUEUE_VERSION}/pueue-x86_64-unknown-linux-musl" -o ~/.local/bin/pueue
curl -fsSL "https://github.com/Nukesor/pueue/releases/download/${PUEUE_VERSION}/pueued-x86_64-unknown-linux-musl" -o ~/.local/bin/pueued
chmod +x ~/.local/bin/pueue ~/.local/bin/pueued

# Start daemon
~/.local/bin/pueued -d

Systemd Auto-Start (Linux)

mkdir -p ~/.config/systemd/user
cat > ~/.config/systemd/user/pueued.service << 'EOF'
[Unit]
Description=Pueue Daemon
After=network.target

[Service]
ExecStart=%h/.local/bin/pueued -v
Restart=on-failure

[Install]
WantedBy=default.target
EOF

systemctl --user daemon-reload
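systemctl --user enable --now pueued

# Optional: let the user service start at boot without an interactive login
loginctl enable-linger "$USER"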

Integration with rangebar-py

The rangebar-py project has Pueue integration scripts:

| Script | Purpose |
| --- | --- |
| `scripts/pueue-populate.sh` | Queue cache population jobs with group-based parallelism |
| `scripts/setup-pueue-linux.sh` | Install Pueue on Linux servers |
| `scripts/populate_full_cache.py` | Python script for individual symbol/threshold jobs |

Phase-Based Execution

# Phase 1: 1000 dbps (fast, 4 parallel)
./scripts/pueue-populate.sh phase1

# Phase 2: 250 dbps (moderate, 2 parallel)
./scripts/pueue-populate.sh phase2

# Phase 3: 500, 750 dbps (3 parallel)
./scripts/pueue-populate.sh phase3

# Phase 4: 100 dbps (resource intensive, 1 at a time)
./scripts/pueue-populate.sh phase4
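
The phase layout can be pictured as a mapping from phase to thresholds and group. The sketch below is not the contents of `pueue-populate.sh`; it only prints the group and label each job might get. It assumes that phase N uses group pN (suggested by the matching parallel limits) and that labels follow the `SYMBOL@threshold` form seen in Quick Reference. `phase_thresholds`, `phase_plan`, and the symbol ETHUSDT are illustrative.

```shell
# Thresholds (in dbps) for each phase, as listed above.
phase_thresholds() {
  case "$1" in
    phase1) echo "1000" ;;
    phase2) echo "250" ;;
    phase3) echo "500 750" ;;
    phase4) echo "100" ;;
    *) return 1 ;;
  esac
}

# Print one "group label" line per symbol/threshold pair for a phase.
# ASSUMPTION: phaseN jobs go to group pN; the real script may differ.
phase_plan() {
  phase="$1"; shift
  thresholds="$(phase_thresholds "$phase")" || return 1
  group="p${phase#phase}"           # phase3 -> p3
  for symbol in "$@"; do
    for t in $thresholds; do        # unquoted on purpose: split on spaces
      printf '%s %s@%s\n' "$group" "$symbol" "$t"
    done
  done
}

phase_plan phase3 BTCUSDT ETHUSDT
# prints:
# p3 BTCUSDT@500
# p3 BTCUSDT@750
# p3 ETHUSDT@500
# p3 ETHUSDT@750
```

Each printed pair could then feed `pueue add --group <group> --label <label> -- <command>`, with the actual population command taken from the project's own scripts.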

Troubleshooting

| Issue | Cause | Solution |
| --- | --- | --- |
| `pueue: command not found` | Not in PATH | Use full path: `~/.local/bin/pueue` |
| `Connection refused` | Daemon not running | Start with `pueued -d` |
| Jobs stuck in Queued | Group paused or at limit | Check `pueue status`, then resume with `pueue start` |
| SSH disconnect kills jobs | Not using Pueue | Queue via Pueue instead of direct SSH |
| Job fails immediately | Wrong working directory | Quote the whole task: `pueue add -- 'cd /path && command'` |
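
The daemon check behind the `Connection refused` row can be scripted: `pueue status` exits with an error when it cannot reach the daemon, so its exit code doubles as a health check. A minimal sketch follows; `ensure_daemon` and the `PUEUE`/`PUEUED` override variables are hypothetical names.

```shell
# Binaries to call; override with full paths such as ~/.local/bin/pueue if needed.
PUEUE="${PUEUE:-pueue}"
PUEUED="${PUEUED:-pueued}"

# Start the daemon only if the client cannot reach it.
ensure_daemon() {
  if "$PUEUE" status >/dev/null 2>&1; then
    echo "daemon already running"
  else
    # -d detaches the daemon into the background
    "$PUEUED" -d && echo "daemon started"
  fi
}
```

Calling `ensure_daemon` before queueing jobs avoids a failed `pueue add` on a host where the daemon was never started. On hosts that use the systemd unit, `systemctl --user start pueued` would be the more consistent way to start it.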

Hook: itp-hooks/posttooluse-reminder.ts – Reminds to use Pueue for detected long-running commands
Reference: Pueue GitHub (https://github.com/Nukesor/pueue)
Issue: rangebar-py#77 – Original implementation
