funsloth-local

📁 chrisvoncsefalvay/funsloth 📅 Jan 28, 2026
Install command
npx skills add https://github.com/chrisvoncsefalvay/funsloth --skill funsloth-local

Skill Documentation

Local GPU Training Manager

Run Unsloth training on your local GPU.

Prerequisites Check

1. Verify CUDA

import torch
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"GPU: {torch.cuda.get_device_name(0)}")
print(f"VRAM: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")

If CUDA is not available:

  • Check NVIDIA drivers: nvidia-smi
  • Check CUDA: nvcc --version
  • Reinstall PyTorch: pip install torch --index-url https://download.pytorch.org/whl/cu121

2. Check VRAM

See references/HARDWARE_GUIDE.md for requirements:

VRAM    Recommended setup
8 GB    7B model, 4-bit, batch=1, LoRA r=8
12 GB   7B model, 4-bit, batch=2, LoRA r=16
16 GB   7-13B model, 4-bit, batch=2, LoRA r=16-32
24 GB   7-14B model, 4-bit, batch=4, LoRA r=32
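
As a rough illustration, you can pick a starting configuration from detected VRAM. This helper is hypothetical (not part of funsloth) and simply encodes the table above:

import torch

vram_gb = torch.cuda.get_device_properties(0).total_memory / 1e9
if vram_gb < 12:
    cfg = {"batch_size": 1, "lora_r": 8}    # 8 GB tier
elif vram_gb < 16:
    cfg = {"batch_size": 2, "lora_r": 16}   # 12 GB tier
elif vram_gb < 24:
    cfg = {"batch_size": 2, "lora_r": 32}   # 16 GB tier
else:
    cfg = {"batch_size": 4, "lora_r": 32}   # 24 GB+ tier
print(f"{vram_gb:.1f} GB detected -> {cfg}")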

3. Check Dependencies

pip install unsloth torch transformers trl peft datasets accelerate bitsandbytes
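
After installing, a quick sanity check that everything imports (a minimal sketch; versions will differ on your machine):

import unsloth  # import first so Unsloth's patches apply
import torch, transformers, trl, peft, datasets, bitsandbytes

for mod in (torch, transformers, trl, peft, datasets, bitsandbytes):
    print(f"{mod.__name__}: {mod.__version__}")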

Docker Option

Use the official Unsloth Docker image for a pre-configured environment (supports all GPUs including Blackwell/50-series):

docker run -d \
  -e JUPYTER_PASSWORD="unsloth" \
  -p 8888:8888 \
  -v $(pwd)/work:/workspace/work \
  --gpus all \
  unsloth/unsloth

Access Jupyter at http://localhost:8888. Example notebooks are in /workspace/unsloth-notebooks/.

Environment variables:

  • JUPYTER_PASSWORD – Jupyter auth (default: unsloth)
  • JUPYTER_PORT – Port (default: 8888)
  • USER_PASSWORD – User/sudo password (default: unsloth)

Run Training

Option 1: Notebook

jupyter notebook notebooks/sft_template.ipynb

Option 2: Script

# Edit configuration in script, then run
python scripts/train_sft.py
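
For reference, the core of such a script usually follows the standard Unsloth + TRL flow. A minimal sketch, assuming a placeholder model name, a JSONL dataset with a text field, and an older-style SFTTrainer signature (newer TRL versions move some of these arguments into SFTConfig):

from unsloth import FastLanguageModel  # import unsloth before transformers/trl
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",  # placeholder; use your model
    max_seq_length=2048,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,  # LoRA rank from the VRAM table above
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
)

dataset = load_dataset("json", data_files="data/train.jsonl", split="train")  # placeholder path

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        output_dir="outputs",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        learning_rate=2e-4,
        num_train_epochs=1,
        logging_steps=10,
    ),
)
trainer.train()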

GPU Selection (Multi-GPU)

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # Use first GPU; set before importing torch

Monitor Training

Terminal

# Watch GPU usage
watch -n 1 nvidia-smi

# Or use nvitop (more detailed)
pip install nvitop && nvitop
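
From inside Python you can also query PyTorch's allocator directly:

import torch

print(f"allocated: {torch.cuda.memory_allocated() / 1e9:.2f} GB")
print(f"reserved:  {torch.cuda.memory_reserved() / 1e9:.2f} GB")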

WandB (Optional)

export WANDB_API_KEY="your-key"
# Add report_to="wandb" in TrainingArguments
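
In code, that amounts to something like the following (run_name is illustrative):

import os
os.environ["WANDB_API_KEY"] = "your-key"  # or export it in the shell as above

from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="outputs",
    report_to="wandb",
    run_name="funsloth-sft",  # illustrative run name
)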

Troubleshooting

OOM Error

Try in order:

  1. Reduce per_device_train_batch_size (down to 1)
  2. Increase gradient_accumulation_steps to preserve the effective batch size (see the sketch below)
  3. Reduce max_seq_length
  4. Reduce the LoRA rank r
  5. Call torch.cuda.empty_cache() between runs
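
Steps 1 and 2 go together: the effective batch size is per_device_train_batch_size × gradient_accumulation_steps, so you can trade memory for wall-clock time without changing it. A minimal sketch:

from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="outputs",
    per_device_train_batch_size=1,   # step 1: smallest micro-batch
    gradient_accumulation_steps=8,   # step 2: effective batch size stays 8
)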

Loss Not Decreasing

  1. Check learning rate (try higher or lower)
  2. Verify chat template matches model
  3. Inspect data format

Training Too Slow

  1. Enable bf16 if supported (see the check below)
  2. Use packing=True for short sequences
  3. Reduce logging_steps
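
PyTorch can tell you whether your GPU supports bf16 (item 1 above):

import torch

if torch.cuda.is_bf16_supported():
    print("bf16 supported: set bf16=True in TrainingArguments")
else:
    print("bf16 not supported: fall back to fp16=True")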

See references/TROUBLESHOOTING.md for more solutions.

Resume from Checkpoint

Pass resume_from_checkpoint to trainer.train() (not to TrainingArguments):

trainer.train(
    resume_from_checkpoint=True,  # Auto-find latest checkpoint in output_dir
    # Or: resume_from_checkpoint="outputs/checkpoint-500"
)
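
Resuming only works if checkpoints were written during training; make sure TrainingArguments enables them, e.g.:

from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="outputs",
    save_strategy="steps",
    save_steps=500,        # write a checkpoint every 500 steps
    save_total_limit=2,    # keep only the two most recent checkpoints
)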

Save Model

The training script automatically saves:

  • outputs/lora_adapter/ – LoRA weights
  • outputs/merged_16bit/ – Merged model (optional)
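
If you save manually instead, the usual calls are save_pretrained for the adapter and Unsloth's save_pretrained_merged helper for a merged export (a sketch; paths match the defaults above):

model.save_pretrained("outputs/lora_adapter")      # LoRA adapter only
tokenizer.save_pretrained("outputs/lora_adapter")

# Optional merged 16-bit export via Unsloth's helper
model.save_pretrained_merged("outputs/merged_16bit", tokenizer,
                             save_method="merged_16bit")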

Test Inference

from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained("outputs/lora_adapter")
FastLanguageModel.for_inference(model)  # enable Unsloth's fast inference path

messages = [{"role": "user", "content": "Hello!"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to("cuda")
outputs = model.generate(inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
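
Continuing the example above, you can stream tokens as they are generated with transformers' TextStreamer:

from transformers import TextStreamer

streamer = TextStreamer(tokenizer, skip_prompt=True)
_ = model.generate(inputs, streamer=streamer, max_new_tokens=100)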

Handoff

Offer funsloth-upload for Hub upload with model card.

Tips

  1. Close other GPU apps before training
  2. Monitor temperatures – keep the GPU under 85 °C (see the sketch below)
  3. Use a UPS for long runs
  4. Save checkpoints frequently with save_steps
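
For tip 2, GPU temperature can be read programmatically; a minimal sketch assuming the nvidia-ml-py package (imported as pynvml) is installed:

import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
temp_c = pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU)
print(f"GPU temperature: {temp_c} °C")
pynvml.nvmlShutdown()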

Bundled Resources