model-fine-tuning
Install command
npx skills add https://github.com/gitwalter/cursor-agent-factory --skill model-fine-tuning
Skill Documentation
Model Fine-Tuning Skill
Fine-tune large language models efficiently using LoRA, QLoRA, and PEFT techniques with Hugging Face transformers and TRL.
When to Use
- Fine-tuning LLMs for domain-specific tasks
- Adapting models to specific datasets or use cases
- Reducing memory requirements for fine-tuning
- Training on consumer hardware with limited VRAM
- Creating task-specific model adapters
- Quantizing models for deployment
Prerequisites
pip install transformers peft accelerate bitsandbytes trl datasets
# For GPU support
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
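Before loading a quantized model, it can help to confirm that PyTorch actually sees a CUDA device. A small sanity-check sketch:
import torch

# Confirm CUDA is visible before attempting a 4-bit load
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
    print("VRAM (GB):", torch.cuda.get_device_properties(0).total_memory / 1e9)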
Process
Step 1: Dataset Preparation
Format and tokenize your dataset for fine-tuning:
from datasets import load_dataset
from transformers import AutoTokenizer

# Load or create dataset
dataset = load_dataset("json", data_files="training_data.jsonl")

# Initialize tokenizer
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
tokenizer.pad_token = tokenizer.eos_token

def format_prompt(example):
    """Format an instruction-following prompt."""
    return {
        "text": f"### Instruction:\n{example['instruction']}\n\n### Response:\n{example['response']}"
    }

def tokenize_function(examples):
    """Tokenize the formatted text."""
    return tokenizer(
        examples["text"],
        truncation=True,
        max_length=512,
        padding="max_length"
    )

# Format and tokenize
dataset = dataset.map(format_prompt)
tokenized_dataset = dataset.map(
    tokenize_function,
    batched=True,
    remove_columns=dataset["train"].column_names
)
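The loader above expects training_data.jsonl to contain one JSON object per line with instruction and response fields. An illustrative record (the content is made up):
{"instruction": "Explain what LoRA is in one sentence.", "response": "LoRA fine-tunes a model by training small low-rank update matrices while the base weights stay frozen."}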
Step 2: QLoRA Configuration
Set up QLoRA with 4-bit quantization and LoRA adapters:
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit quantization config (NF4 with nested quantization)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True
)

# Load model with quantization
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True
)

# Prepare model for k-bit training (casts norms, enables gradient checkpointing)
model = prepare_model_for_kbit_training(model)

# LoRA configuration
lora_config = LoraConfig(
    r=16,            # Rank of the update matrices
    lora_alpha=32,   # Scaling parameter
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

# Apply LoRA
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
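print_trainable_parameters() reports how small the adapter is relative to the base model. The same figure can be computed directly with plain PyTorch if you need it programmatically:
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Trainable params: {trainable:,} of {total:,} ({100 * trainable / total:.2f}%)")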
Step 3: Training Configuration
Configure TRL SFTTrainer for supervised fine-tuning:
from transformers import TrainingArguments
from trl import SFTTrainer

training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    bf16=True,  # matches bnb_4bit_compute_dtype=torch.bfloat16 above
    logging_steps=10,
    save_steps=500,
    save_total_limit=2,
    optim="paged_adamw_8bit",
    lr_scheduler_type="cosine",
    warmup_steps=100,
    report_to="tensorboard"
)

# The model is already PEFT-wrapped in Step 2, so peft_config is not passed
# again here. Note: in newer TRL releases, packing, max_seq_length and
# dataset_text_field moved into SFTConfig.
trainer = SFTTrainer(
    model=model,
    train_dataset=tokenized_dataset["train"],
    tokenizer=tokenizer,
    args=training_args,
    packing=False,
    max_seq_length=512,
    dataset_text_field="text"
)
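trainer.evaluate() in Step 4 needs an eval_dataset. If no separate validation file exists, one option is to hold out part of the training data before building the trainer (a sketch; the 10% ratio and seed are assumptions):
# Hold out 10% of the training data so trainer.evaluate() has data to run on
split = tokenized_dataset["train"].train_test_split(test_size=0.1, seed=42)

# Then build the trainer with train_dataset=split["train"] and
# eval_dataset=split["test"] instead of the full training split.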
Step 4: Training and Evaluation
Train the model, evaluate on the held-out split, and save the adapter:
import math

# Train model
trainer.train()

# Evaluate (requires an eval_dataset, e.g. the held-out split above)
eval_results = trainer.evaluate()
print(f"Eval loss: {eval_results['eval_loss']:.2f}")
# Perplexity is exp(loss), not the loss itself
print(f"Perplexity: {math.exp(eval_results['eval_loss']):.2f}")

# Save adapter weights and tokenizer
trainer.save_model("./adapter")
tokenizer.save_pretrained("./adapter")
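A quick smoke test of the tuned adapter before merging (a sketch; the prompt and generation settings are illustrative):
# Reuse the prompt template from Step 1
prompt = "### Instruction:\nExplain LoRA in one sentence.\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=100, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))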
Step 5: Adapter Merging and Export
Merge LoRA adapter with base model for inference:
from peft import PeftModel

# Load base model in half precision (unquantized, so weights can be merged)
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    torch_dtype=torch.float16,
    device_map="auto"
)

# Load adapter
model = PeftModel.from_pretrained(base_model, "./adapter")

# Merge LoRA weights into the base model and save
merged_model = model.merge_and_unload()
merged_model.save_pretrained("./merged_model")
tokenizer.save_pretrained("./merged_model")
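The merged directory then behaves like any standalone checkpoint and loads without PEFT. A minimal usage sketch:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "./merged_model",
    torch_dtype=torch.float16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("./merged_model")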
Step 6: Quantization Options
Alternatively, load pre-quantized checkpoints (GPTQ or AWQ) for inference:
# GPTQ: load a pre-quantized checkpoint
from auto_gptq import AutoGPTQForCausalLM

model_gptq = AutoGPTQForCausalLM.from_quantized(
    "TheBloke/Llama-2-7b-GPTQ",
    device="cuda:0",
    use_triton=True
)

# AWQ: load a pre-quantized checkpoint
from awq import AutoAWQForCausalLM

model_awq = AutoAWQForCausalLM.from_quantized(
    "TheBloke/Llama-2-7b-AWQ",
    device_map="auto"
)
Output
- Fine-tuned model adapter (LoRA weights)
- Merged model (optional, for standalone inference)
- Training metrics and logs
- Evaluation results
- Tokenizer configuration
Best Practices
- Use QLoRA for memory-efficient fine-tuning on consumer GPUs
- Start with small rank (r=8-16) and increase if needed
- Monitor training loss to prevent overfitting
- Use gradient checkpointing for very large models (see the sketch after this list)
- Save checkpoints regularly during training
- Evaluate on held-out validation set
- Use appropriate learning rates (1e-5 to 5e-4)
- Apply LoRA to attention layers (q_proj, v_proj, k_proj, o_proj)
- Use 4-bit quantization with double quantization for best memory savings
- Merge adapters only when needed for deployment
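For the gradient-checkpointing point above: prepare_model_for_kbit_training in Step 2 already enables it by default, but for models loaded without that helper the manual setup looks roughly like this (a sketch):
# Recompute activations during backprop instead of storing them (saves VRAM)
model.gradient_checkpointing_enable()
# Keep gradients flowing into the LoRA layers when the base model is frozen
model.enable_input_require_grads()
# Or equivalently, set gradient_checkpointing=True in TrainingArguments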
Related
- Knowledge: {directories.knowledge}/llm-fine-tuning-patterns.json
- Skill: model-training for training infrastructure
- Skill: data-pipeline for dataset preparation