funsloth-runpod

📁 chrisvoncsefalvay/funsloth 📅 Jan 28, 2026
Install command
npx skills add https://github.com/chrisvoncsefalvay/funsloth --skill funsloth-runpod

Skill documentation

RunPod Training Manager

Run Unsloth training on RunPod GPU instances.

Prerequisites

  1. RunPod API key: confirm it is set with echo $RUNPOD_API_KEY (create one at runpod.io/console/user/settings)
  2. RunPod SDK: pip install runpod
  3. Training notebook/script: produced by funsloth-train

Workflow

1. Select GPU

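| GPU       | VRAM | Cost      | Best For     |
|-----------|------|-----------|--------------|
| RTX 3090  | 24GB | ~$0.35/hr | Budget 7-14B |
| RTX 4090  | 24GB | ~$0.55/hr | Fast 7-14B   |
| A100 40GB | 40GB | ~$1.50/hr | 14-34B       |
| A100 80GB | 80GB | ~$2.00/hr | 70B          |
| H100      | 80GB | ~$3.50/hr | Fastest      |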

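RunPod's hourly rates are typically lower than HF Jobs for comparable GPUs. The prices above are approximate, so check current rates before launching.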
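The table can be turned into a small selection helper. The sketch below is illustrative only: the rates are the approximate figures from the table, the size ceilings are read off the "Best For" column, and the 70B ceiling for the H100 is an assumption (the table only says "Fastest"). The names `GpuOption`, `cheapest_gpu`, and `estimate_cost` are hypothetical and not part of any RunPod API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GpuOption:
    name: str
    vram_gb: int
    usd_per_hour: float  # approximate rate from the table
    max_model_b: int     # largest model size (billions of parameters) from "Best For"


# Rows copied from the table. The H100 ceiling of 70B is an assumption.
GPU_OPTIONS = [
    GpuOption("RTX 3090", 24, 0.35, 14),
    GpuOption("RTX 4090", 24, 0.55, 14),
    GpuOption("A100 40GB", 40, 1.50, 34),
    GpuOption("A100 80GB", 80, 2.00, 70),
    GpuOption("H100", 80, 3.50, 70),
]


def cheapest_gpu(model_size_b: float) -> Optional[GpuOption]:
    """Return the lowest-cost option whose ceiling covers the model, or None."""
    candidates = [g for g in GPU_OPTIONS if g.max_model_b >= model_size_b]
    if not candidates:
        return None
    return min(candidates, key=lambda g: g.usd_per_hour)


def estimate_cost(gpu: GpuOption, hours: float) -> float:
    """Approximate GPU cost in USD for a run of the given length."""
    return round(gpu.usd_per_hour * hours, 2)
```

For example, `cheapest_gpu(7)` picks the RTX 3090, and `estimate_cost(cheapest_gpu(7), 4)` gives a rough figure for a four-hour run. Storage and network volume charges are not included.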

2. Choose Deployment

  • Pod (Recommended): Persistent, SSH access, network storage
  • Serverless: Pay per second, complex setup (better for inference)

3. Configure Network Volume (Recommended)

import os
import runpod
runpod.api_key = os.environ["RUNPOD_API_KEY"]  # SDK calls authenticate with this key
volume = runpod.create_network_volume(name="funsloth-training", size_gb=50, region="US")

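A network volume lets you resume training after a pod stops, download checkpoints later, and share data between pods. Create it in the same region as the pod. If your installed runpod SDK does not expose create_network_volume, create the volume in the RunPod console and note its ID.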

4. Launch Pod

Use the official Unsloth Docker image for a pre-configured environment:

import runpod

pod = runpod.create_pod(
    name="funsloth-training",
    image_name="unsloth/unsloth",  # Official image, supports all GPUs incl. Blackwell
    gpu_type_id="{gpu_type}",
    volume_in_gb=50,
    network_volume_id="{volume_id}",
    env={
        "HF_TOKEN": "{token}",
        "WANDB_API_KEY": "{key}",
        "JUPYTER_PASSWORD": "unsloth",  # placeholder; replace with a strong password
    },
    ports="8888/http,22/tcp",
)

The Unsloth image includes Jupyter Lab (port 8888) and example notebooks in /workspace/unsloth-notebooks/.
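Rather than pasting tokens into the call, the keyword arguments can be assembled from environment variables. This is a minimal sketch, assuming the same arguments as the create_pod call above. `build_pod_kwargs` is a hypothetical helper, and treating WANDB_API_KEY as optional is an assumption.

```python
import os
from typing import Mapping, Optional


def build_pod_kwargs(
    gpu_type_id: str,
    network_volume_id: str,
    environ: Optional[Mapping[str, str]] = None,
) -> dict:
    """Build keyword arguments for runpod.create_pod without hard-coding secrets."""
    environ = os.environ if environ is None else environ

    # Fail early if a required secret is missing, before any pod is billed.
    required = ("HF_TOKEN", "JUPYTER_PASSWORD")
    missing = [key for key in required if not environ.get(key)]
    if missing:
        raise ValueError("Missing environment variables: " + ", ".join(missing))

    env = {key: environ[key] for key in required}
    # Assumption: Weights & Biases logging is optional, so only pass the key if set.
    if environ.get("WANDB_API_KEY"):
        env["WANDB_API_KEY"] = environ["WANDB_API_KEY"]

    return {
        "name": "funsloth-training",
        "image_name": "unsloth/unsloth",
        "gpu_type_id": gpu_type_id,
        "volume_in_gb": 50,
        "network_volume_id": network_volume_id,
        "env": env,
        "ports": "8888/http,22/tcp",
    }
```

The result can then be unpacked into the call, for example `runpod.create_pod(**build_pod_kwargs("{gpu_type}", "{volume_id}"))`.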

5. Upload and Run

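# Upload script (run from your local machine)
scp train.py root@{pod_ip}:/workspace/

# SSH into pod
ssh root@{pod_ip}

# Run training inside tmux so it survives SSH disconnects
tmux new -s training
cd /workspace && python train.py 2>&1 | tee training.log
# Ctrl+B, D to detach; reattach with: tmux attach -t training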

6. Monitor

# SSH monitoring
tail -f /workspace/training.log
nvidia-smi -l 1

# Dashboard
https://runpod.io/console/pods/{pod_id}
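For scripted monitoring, nvidia-smi can emit machine-readable CSV instead of the interactive view. The sketch below parses that output. The function names and the returned field names are hypothetical, and `read_gpu_stats` only works on a machine where nvidia-smi is installed, such as the pod itself.

```python
import subprocess
from typing import Dict, List

# Query utilization and memory as bare CSV numbers (no header, no units).
QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_stats(csv_text: str) -> List[Dict[str, float]]:
    """Parse one CSV line per GPU into a list of stat dictionaries."""
    stats = []
    for line in csv_text.strip().splitlines():
        util, used, total = (float(field) for field in line.split(","))
        stats.append(
            {
                "util_pct": util,
                "mem_used_mib": used,
                "mem_total_mib": total,
                "mem_pct": round(100 * used / total, 1),
            }
        )
    return stats


def read_gpu_stats() -> List[Dict[str, float]]:
    """Run nvidia-smi on the current machine and return parsed stats."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_stats(result.stdout)
```

A sustained low `util_pct` during training usually points to a data-loading bottleneck, and a `mem_pct` close to 100 is a warning that the next larger batch may run out of memory.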

7. Retrieve Checkpoints

# Save to network volume
cp -r /workspace/outputs /runpod-volume/

# Download via SCP
scp -r root@{pod_ip}:/workspace/outputs ./

# Or push to HF Hub from pod

8. Stop Pod

runpod.stop_pod(pod_id)    # Can resume later
runpod.terminate_pod(pod_id)  # Deletes pod, keeps volume

9. Handoff

Offer funsloth-upload for Hub upload with model card.

Best Practices

  1. Always use network volumes – pod storage is ephemeral
  2. Use spot instances for lower costs (risk of preemption)
  3. Set up SSH keys before creating pods
  4. Stop pods when not training – charges per minute
  5. Save checkpoints frequently with save_steps
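Frequent checkpoints only help if the next run picks up the newest one. The sketch below assumes the training script uses the Hugging Face Trainer convention of writing `checkpoint-<step>` directories under the output directory. `latest_checkpoint` is a hypothetical helper.

```python
import re
from pathlib import Path
from typing import Optional

_CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def latest_checkpoint(output_dir: str) -> Optional[Path]:
    """Return the checkpoint directory with the highest step number, or None."""
    root = Path(output_dir)
    if not root.is_dir():
        return None

    best: Optional[Path] = None
    best_step = -1
    for child in root.iterdir():
        match = _CHECKPOINT_RE.match(child.name)
        # Compare steps numerically: checkpoint-1000 must beat checkpoint-900.
        if match and child.is_dir() and int(match.group(1)) > best_step:
            best, best_step = child, int(match.group(1))
    return best
```

With a Trainer-based script, the result can be passed as `trainer.train(resume_from_checkpoint=str(path))` when it is not None.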

Error Handling

| Error               | Resolution                       |
|---------------------|----------------------------------|
| Pod creation failed | Try different GPU type or region |
| SSH refused         | Wait 1-2 min, check IP           |
| Out of disk         | Increase volume or clean up      |
| Volume not mounting | Check same region as pod         |
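The "SSH refused" case can be handled by polling the SSH port until it accepts connections instead of retrying by hand. This is a sketch using only the standard library. `wait_for_port` and its default timings are assumptions, not part of the RunPod SDK.

```python
import socket
import time


def wait_for_port(
    host: str,
    port: int,
    timeout_s: float = 120.0,
    interval_s: float = 5.0,
    connect_timeout_s: float = 5.0,
) -> bool:
    """Poll host:port until a TCP connection succeeds or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with socket.create_connection((host, port), timeout=connect_timeout_s):
                return True
        except OSError:
            # Refused or unreachable: give up if another wait would pass the deadline.
            if time.monotonic() + interval_s > deadline:
                return False
            time.sleep(interval_s)
```

Calling `wait_for_port("{pod_ip}", 22)` before the first ssh or scp covers the 1-2 minute startup window. If it returns False, check the pod's IP and exposed port in the dashboard.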

Bundled Resources