unsloth-fft
Total installs: 1
Weekly installs: 1
Site-wide rank: #51050
Install command
npx skills add https://github.com/cuba6112/skillfactory --skill unsloth-fft
Agent install distribution
- mcpjam: 1
- claude-code: 1
- junie: 1
- windsurf: 1
- zencoder: 1
- crush: 1
Skill Documentation
Overview
Full Fine-Tuning (FFT) in Unsloth updates 100% of the model's weights exactly, rather than relying on the low-rank approximations used by LoRA. By using Unsloth's optimized gradient checkpointing, FFT can fit significantly larger batch sizes while still modifying the entire model.
When to Use
- When performing base model pre-training or continued pre-training on large datasets.
- When model-wide behaviors need modification that adapters (LoRA) cannot fully capture.
- When sufficient VRAM is available to handle full model gradients.
Decision Tree
- Do you need to modify 100% of the model weights?
  - Yes: Proceed with FFT.
  - No: Use [[unsloth-lora]].
- Is VRAM limited (e.g., < 24GB for a 7B model)?
  - Yes: Enable `use_gradient_checkpointing = 'unsloth'` and `adamw_8bit` (see the sketch after this list).
  - No: Use standard BF16 and high batch sizes.
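One way to mechanize this check is a small helper that reads total VRAM via `torch.cuda.get_device_properties`; a minimal sketch, assuming a single CUDA device at index 0 and using the 24GB rule of thumb above as the cutoff (`pick_fft_memory_settings` is a hypothetical name, not an Unsloth API):

```python
import torch

def pick_fft_memory_settings(vram_threshold_gb: float = 24.0) -> dict:
    """Hypothetical helper mirroring the decision tree above (not part of Unsloth)."""
    # Assumes a single CUDA device at index 0; total_memory is reported in bytes.
    total_vram_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
    if total_vram_gb < vram_threshold_gb:
        # VRAM-limited: lean on Unsloth's checkpointing and 8-bit optimizer states.
        return {"use_gradient_checkpointing": "unsloth", "optim": "adamw_8bit"}
    # Ample VRAM: standard BF16 training with higher batch sizes.
    return {"use_gradient_checkpointing": False, "optim": "adamw_torch", "bf16": True}
```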
Workflows
Initializing Full Fine-tuning
- Load the model using `FastLanguageModel.from_pretrained` with `load_in_4bit=False` and `load_in_8bit=False`.
- Pass `full_finetuning=True` in the initialization call to unlock all weight updates.
- Apply the 'unsloth' gradient checkpointing via `FastLanguageModel.get_peft_model(model, use_gradient_checkpointing='unsloth')`, as sketched below.
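A minimal sketch of these steps: the `full_finetuning=True` flag is the one quoted in the Evidence section, the `get_peft_model` call mirrors the workflow step above, and the model name and sequence length are placeholders.

```python
from unsloth import FastLanguageModel

# Load the base model in full precision: quantized loading is disabled so that
# every weight can receive exact gradient updates.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b",  # placeholder; any supported base model
    max_seq_length=2048,              # placeholder sequence length
    load_in_4bit=False,
    load_in_8bit=False,
    full_finetuning=True,             # unlock updates to 100% of the weights
)

# Workflow step above: apply Unsloth's gradient checkpointing
# (~30% less VRAM, ~2x larger batches than the standard implementation).
model = FastLanguageModel.get_peft_model(
    model,
    use_gradient_checkpointing="unsloth",
)
```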
FFT Memory Management
- Set `per_device_train_batch_size` to 1 to accommodate full weight gradients.
- Increase `gradient_accumulation_steps` to simulate effective batch sizes of 16-32.
- Use the `adamw_8bit` optimizer to reduce memory consumption by optimizer states (see the configuration sketch below).
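These three settings map directly onto a Hugging Face `TrainingArguments` configuration; a sketch, assuming a recent transformers version where `optim="adamw_8bit"` selects the bitsandbytes 8-bit AdamW, with the output path, learning rate, and epoch count as placeholders:

```python
from transformers import TrainingArguments

# Memory-oriented settings from the list above: micro-batch of 1, gradient
# accumulation for an effective batch of 16, and 8-bit optimizer states.
training_args = TrainingArguments(
    output_dir="outputs",            # placeholder path
    per_device_train_batch_size=1,   # full-weight gradients are large; keep the micro-batch tiny
    gradient_accumulation_steps=16,  # effective batch size of 16 (raise toward 32 if needed)
    optim="adamw_8bit",              # 8-bit AdamW cuts optimizer-state memory
    learning_rate=2e-5,              # placeholder; FFT typically uses lower rates than LoRA
    bf16=True,                       # assumes an Ampere-or-newer GPU
    logging_steps=10,
    num_train_epochs=1,              # placeholder
)
```

The resulting arguments can then be handed to the trainer of your choice (e.g., TRL's `SFTTrainer`) together with the model and tokenizer loaded in the initialization workflow above.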
Non-Obvious Insights
- Unsloth’s gradient checkpointing implementation is critical for FFT, as it uses 30% less VRAM and allows for 2x larger batch sizes compared to standard Hugging Face implementations.
- Full fine-tuning is the preferred method for base model pre-training where the objective is knowledge injection rather than task instruction.
- FFT in Unsloth is strictly 100% exact; there is no numerical drift compared to standard training, only efficiency gains.
Evidence
- “To enable full fine-tuning (FFT), set full_finetuning = True.” Source
- “use_gradient_checkpointing = ‘unsloth’ uses 30% less VRAM, fits 2x larger batch sizes!” Source
Scripts
- `scripts/unsloth-fft_tool.py`: Script to initialize a full fine-tuning session with Unsloth.
- `scripts/unsloth-fft_tool.js`: Node.js utility to generate FFT training parameters.
Dependencies
- unsloth
- torch
- transformers
References
- [[references/README.md]]