funsloth-runpod

📁 chrisvoncsefalvay/funsloth 📅 Jan 28, 2026

Install command

npx skills add https://github.com/chrisvoncsefalvay/funsloth --skill funsloth-runpod

Skill documentation

RunPod Training Manager

Run Unsloth training on RunPod GPU instances.

Prerequisites

RunPod API key: verify it is set with echo $RUNPOD_API_KEY (create one at runpod.io/console/user/settings)
RunPod SDK: pip install runpod
Training notebook/script: From funsloth-train

Workflow

1. Select GPU

GPU	VRAM	Cost	Best For
RTX 3090	24GB	~$0.35/hr	Budget 7-14B
RTX 4090	24GB	~$0.55/hr	Fast 7-14B
A100 40GB	40GB	~$1.50/hr	14-34B
A100 80GB	80GB	~$2.00/hr	70B
H100	80GB	~$3.50/hr	Fastest

RunPod is typically cheaper per GPU-hour than HF Jobs. The prices above are approximate, so check current rates before launching.
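
The table above can be turned into a small lookup. The sketch below is illustrative only: the prices are the approximate figures from the table, the `max_params_b` cutoffs are one reading of the "Best For" column, the H100 cutoff is an assumption (the table only says "Fastest"), and `pick_gpu` and `estimate_cost` are hypothetical helper names, not part of the RunPod SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuOption:
    name: str
    vram_gb: int
    usd_per_hour: float  # approximate price from the table above
    max_params_b: int    # largest model size (billions of parameters) suggested


GPU_OPTIONS = [
    GpuOption("RTX 3090", 24, 0.35, 14),
    GpuOption("RTX 4090", 24, 0.55, 14),
    GpuOption("A100 40GB", 40, 1.50, 34),
    GpuOption("A100 80GB", 80, 2.00, 70),
    GpuOption("H100", 80, 3.50, 70),  # assumption: same size limit as A100 80GB
]


def pick_gpu(model_params_b: float) -> GpuOption:
    """Return the cheapest listed GPU whose suggested range covers the model."""
    candidates = [g for g in GPU_OPTIONS if g.max_params_b >= model_params_b]
    if not candidates:
        raise ValueError(f"No listed GPU covers a {model_params_b}B model")
    return min(candidates, key=lambda g: g.usd_per_hour)


def estimate_cost(gpu: GpuOption, hours: float) -> float:
    """Rough cost in USD for a run of the given length."""
    return round(gpu.usd_per_hour * hours, 2)
```

For example, `estimate_cost(pick_gpu(7), 4)` prices a four-hour run of a 7B model on the cheapest option that fits it.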

2. Choose Deployment

Pod (Recommended): Persistent, SSH access, network storage
Serverless: Pay per second, complex setup (better for inference)

3. Configure Network Volume (Recommended)

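import runpod
# Assumes the installed SDK exposes create_network_volume; if not, create the volume in the RunPod console
volume = runpod.create_network_volume(name="funsloth-training", size_gb=50, region="US")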

A network volume lets you resume training, download checkpoints, and share data between pods.

4. Launch Pod

Use the official Unsloth Docker image for a pre-configured environment:

import os
import runpod

# Authenticate explicitly with the key from the Prerequisites step
runpod.api_key = os.environ["RUNPOD_API_KEY"]

pod = runpod.create_pod(
    name="funsloth-training",
    image_name="unsloth/unsloth",  # Official image, supports all GPUs incl. Blackwell
    gpu_type_id="{gpu_type}",
    volume_in_gb=50,
    network_volume_id="{volume_id}",
    env={
        "HF_TOKEN": "{token}",
        "WANDB_API_KEY": "{key}",
        "JUPYTER_PASSWORD": "unsloth",  # Replace with a strong password
    },
    ports="8888/http,22/tcp",
)

The Unsloth image includes Jupyter Lab (port 8888) and example notebooks in /workspace/unsloth-notebooks/.
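
Once the pod is created, you need its public SSH address before you can connect. The sketch below assumes that the dict returned by `runpod.get_pod(pod_id)` carries a `runtime.ports` list whose entries have `ip`, `isIpPublic`, `privatePort` and `publicPort` keys; check this against your SDK version. `find_ssh_endpoint` is a hypothetical helper name.

```python
from typing import Optional, Tuple


def find_ssh_endpoint(pod: dict) -> Optional[Tuple[str, int]]:
    """Return (ip, public_port) for the pod's SSH port, or None if not ready.

    Assumes the pod dict has the shape described in the lead-in above.
    """
    runtime = pod.get("runtime") or {}  # runtime is empty until the pod starts
    for port in runtime.get("ports") or []:
        if port.get("privatePort") == 22 and port.get("isIpPublic"):
            return port["ip"], port["publicPort"]
    return None
```

A typical use would be to call `find_ssh_endpoint(runpod.get_pod(pod["id"]))` in a loop with a short sleep until it returns a value. Note that the public port is often not 22, so pass it to `ssh -p` and `scp -P`.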

5. Upload and Run

# Upload script (run from your local machine)
scp train.py root@{pod_ip}:/workspace/

# SSH into pod
ssh root@{pod_ip}

# Run training (use tmux for persistence)
tmux new -s training
cd /workspace && python train.py 2>&1 | tee /workspace/training.log
# Ctrl+B, D to detach

6. Monitor

# SSH monitoring
tail -f /workspace/training.log
nvidia-smi -l 1

# Dashboard
https://runpod.io/console/pods/{pod_id}

7. Retrieve Checkpoints

# Save to network volume
cp -r /workspace/outputs /runpod-volume/

# Download via SCP
scp -r root@{pod_ip}:/workspace/outputs ./

# Or push to HF Hub from pod
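
Pushing to the Hub from the pod can be sketched as follows. This assumes the training script writes `checkpoint-<step>` directories (the Hugging Face Trainer convention), that `huggingface_hub` is installed on the pod, and that `HF_TOKEN` is set in the environment. `latest_checkpoint` and `push_checkpoint` are hypothetical helper names, and the repository is created as private by choice here.

```python
import re
from pathlib import Path
from typing import Optional


def latest_checkpoint(output_dir: str) -> Optional[Path]:
    """Return the checkpoint-<step> directory with the highest step, if any."""
    best, best_step = None, -1
    for path in Path(output_dir).glob("checkpoint-*"):
        match = re.fullmatch(r"checkpoint-(\d+)", path.name)
        if path.is_dir() and match and int(match.group(1)) > best_step:
            best, best_step = path, int(match.group(1))
    return best


def push_checkpoint(output_dir: str, repo_id: str) -> None:
    """Upload the newest checkpoint folder to a model repo on the Hub."""
    from huggingface_hub import HfApi  # imported lazily; only needed on the pod

    checkpoint = latest_checkpoint(output_dir)
    if checkpoint is None:
        raise FileNotFoundError(f"No checkpoint-* directory in {output_dir}")
    api = HfApi()  # picks up HF_TOKEN from the environment
    api.create_repo(repo_id, private=True, exist_ok=True)
    api.upload_folder(folder_path=str(checkpoint), repo_id=repo_id)
```

On the pod this would be called as `push_checkpoint("/workspace/outputs", "{username}/{model_name}")`, with the placeholders filled in.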

8. Stop Pod

runpod.stop_pod(pod_id)       # Stops the pod; it can be resumed later
runpod.terminate_pod(pod_id)  # Deletes the pod but keeps the network volume
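
To avoid paying for a pod that was left running after a failed step, the stop call can be wrapped in a context manager. This is a minimal sketch: `stop_pod_on_exit` is a hypothetical helper, and the stop function is passed in so that either `runpod.stop_pod` or `runpod.terminate_pod` can be used.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def stop_pod_on_exit(pod_id: str, stop_fn: Callable[[str], object]) -> Iterator[str]:
    """Yield the pod id and always call stop_fn(pod_id) afterwards."""
    try:
        yield pod_id
    finally:
        # Runs on normal exit and when the body raises
        stop_fn(pod_id)
```

Usage would look like `with stop_pod_on_exit(pod["id"], runpod.stop_pod): ...`, with the upload, training and checkpoint retrieval steps inside the block.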

9. Handoff

Offer funsloth-upload for Hub upload with model card.

Best Practices

Always use network volumes – pod storage is ephemeral
Use spot instances for lower costs (risk of preemption)
Set up SSH keys before creating pods
Stop pods when not training – charges per minute
Save checkpoints frequently with save_steps
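
For the last point, a checkpoint configuration might look like the fragment below. It assumes the training script uses Hugging Face `TrainingArguments` (or a subclass such as TRL's `SFTConfig`) and that the network volume is mounted at `/runpod-volume`. The step count and limit are illustrative values, not recommendations from the source.

```python
# Keyword arguments to pass to TrainingArguments(**CHECKPOINT_KWARGS, ...)
CHECKPOINT_KWARGS = {
    "output_dir": "/runpod-volume/outputs",  # assumption: network volume mount path
    "save_strategy": "steps",
    "save_steps": 100,       # illustrative: write a checkpoint every 100 steps
    "save_total_limit": 3,   # illustrative: keep only the 3 newest checkpoints
}

# After a preemption or restart, training can continue from the newest
# checkpoint in output_dir with: trainer.train(resume_from_checkpoint=True)
```

Writing checkpoints straight to the network volume means a preempted spot pod loses at most the steps since the last save.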

Error Handling

Error	Resolution
Pod creation failed	Try different GPU type or region
SSH refused	Wait 1-2 min, check IP
Out of disk	Increase volume or clean up
Volume not mounting	Check same region as pod
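
The first row ("try different GPU type") can be automated with a simple fallback loop. This is a sketch under stated assumptions: `create_with_fallback` is a hypothetical helper, and the creation function is injected so the same loop works with `runpod.create_pod` or with a stub in tests.

```python
from typing import Any, Callable, Iterable, List


def create_with_fallback(
    create_fn: Callable[[str], Any], gpu_type_ids: Iterable[str]
) -> Any:
    """Try each GPU type in order and return the first successful result."""
    errors: List[str] = []
    for gpu_type_id in gpu_type_ids:
        try:
            return create_fn(gpu_type_id)
        except Exception as exc:  # e.g. no capacity for this GPU type
            errors.append(f"{gpu_type_id}: {exc}")
    raise RuntimeError("Pod creation failed for all GPU types: " + "; ".join(errors))
```

In practice `create_fn` would be something like `lambda gpu: runpod.create_pod(name="funsloth-training", image_name="unsloth/unsloth", gpu_type_id=gpu)`. The valid GPU type ids for your account can be listed with `runpod.get_gpus()`.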

Bundled Resources

scripts/train_sft.py – Training script template
scripts/estimate_cost.py – Cost estimation
references/PLATFORM_COMPARISON.md – RunPod vs alternatives
references/TROUBLESHOOTING.md – Common issues
