Pretrained language models are impressive generalists. They can write code, explain concepts, translate languages, and summarize documents — all from a single set of weights. But “impressive generalist” and “expert in your specific domain” are different things. If you need a model that consistently uses your company’s terminology, follows your specific output format, matches your brand’s tone, or performs well on a narrow task like classifying customer support tickets by urgency — fine-tuning is how you get there.
Hugging Face has become the standard infrastructure layer for working with open-source models. The transformers library provides a unified API for hundreds of model architectures. The datasets library handles data loading and preprocessing. The Trainer class wraps the training loop with gradient accumulation, mixed precision, and evaluation built in. Together, they mean you can fine-tune a model with far less boilerplate than PyTorch alone would require.
This tutorial covers the complete fine-tuning workflow: setting up a dataset, loading a pretrained model, configuring training with the Trainer API, evaluating the results, and saving/loading your fine-tuned model. We’ll work through two examples — sentiment classification (a classification task) and instruction tuning (a text generation task).
Fine-tuning with Hugging Face: load a pretrained model with
AutoModelForSequenceClassification.from_pretrained(), tokenize your dataset with AutoTokenizer, define TrainingArguments, create a Trainer, call trainer.train(). For LLMs, use SFTTrainer from TRL with LoRA (PEFT) to reduce memory requirements. Save with trainer.save_model().
What Is Fine-Tuning and When Should You Do It?
A pretrained model has learned general language understanding from billions of tokens of text. Fine-tuning continues training on a smaller, task-specific dataset to specialize those general capabilities. The pretrained weights provide a head start — you need far less data and compute than training from scratch.
Fine-tuning is the right choice when: a general model gives inconsistent results on your specific task; you need the model to follow a specific output format reliably; you have domain-specific terminology the general model handles poorly; you need to embed task-specific knowledge that’s expensive to inject via prompting; or you need a smaller, faster model that’s specialized for one task rather than a large general model.
Fine-tuning is NOT the right choice when: the task can be solved with good prompting alone; you have fewer than a few hundred examples; the task requires knowledge that changes frequently (use RAG instead); or you don’t have the compute budget even for fine-tuning.
| Approach | Data Needed | Compute | Best For |
|---|---|---|---|
| Prompting | 0–10 examples | None | General tasks, quick iteration |
| Few-shot prompting | 10–100 examples | None | Pattern following with small models |
| Fine-tuning (full) | 1K–100K examples | High (multiple GPUs) | Small models, max performance |
| Fine-tuning (LoRA/PEFT) | 100–10K examples | Moderate (1 GPU) | LLMs, memory-constrained hardware |
| RAG | Any amount | Low (just embeddings) | Knowledge that updates frequently |
Installing Dependencies
pip install transformers datasets accelerate evaluate scikit-learn
# For LLM fine-tuning with LoRA:
pip install peft trl bitsandbytes
# If you have a GPU:
pip install torch --index-url https://download.pytorch.org/whl/cu121
The transformers library is the core Hugging Face library for models and tokenizers. datasets provides efficient data loading and processing. accelerate handles distributed training and mixed precision automatically. evaluate provides standardized metrics. peft (Parameter-Efficient Fine-Tuning) provides LoRA and other memory-efficient adaptation methods. trl (Transformer Reinforcement Learning) includes SFTTrainer for supervised fine-tuning of LLMs.
Part 1: Fine-Tuning for Text Classification
Text classification is the most common fine-tuning task: given a text, predict one of N categories. Sentiment analysis (positive/negative/neutral), intent classification, and topic categorization all use the same training approach.
Loading and Preparing the Dataset
from datasets import load_dataset, DatasetDict
from transformers import AutoTokenizer
# Load the IMDB sentiment dataset from Hugging Face Hub
dataset = load_dataset("imdb")
print(dataset)
# DatasetDict with 'train' (25,000 examples), 'test' (25,000 examples),
# and an unlabeled 'unsupervised' split (50,000 examples)
# Each example: {'text': '...', 'label': 0 or 1}
# For demonstration, work with a smaller subset
small_dataset = DatasetDict({
'train': dataset['train'].select(range(2000)),
'test': dataset['test'].select(range(500))
})
# Load the tokenizer for our base model
model_name = "distilbert-base-uncased" # Fast, small, good baseline
tokenizer = AutoTokenizer.from_pretrained(model_name)
def tokenize_function(examples):
"""Tokenize text examples with truncation and padding."""
return tokenizer(
examples["text"],
truncation=True,
padding="max_length",
max_length=512
)
# Apply tokenization to the entire dataset
tokenized_dataset = small_dataset.map(
tokenize_function,
batched=True, # Process in batches for speed
remove_columns=["text"] # Remove the raw text column (we have tokens now)
)
print(f"Training examples: {len(tokenized_dataset['train'])}")
print(f"Test examples: {len(tokenized_dataset['test'])}")
print(f"Features: {tokenized_dataset['train'].features}")
The tokenizer converts raw text into token IDs that the model understands. truncation=True cuts sequences longer than max_length. padding="max_length" pads shorter sequences to the same length so they can be batched. batched=True in map() processes multiple examples at once, which is significantly faster than one-at-a-time processing.
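To make the truncation and padding behavior concrete, here is a toy illustration using plain lists of token IDs. The IDs and pad value are made up for demonstration; the real tokenizer also handles subword splitting, special tokens, and attention masks.

```python
def truncate_and_pad(ids: list[int], max_length: int, pad_id: int = 0) -> list[int]:
    """Cut sequences longer than max_length; pad shorter ones up to max_length."""
    ids = ids[:max_length]                      # what truncation=True does
    ids += [pad_id] * (max_length - len(ids))   # what padding="max_length" does
    return ids

# Two illustrative token ID sequences of different lengths
batch = [[101, 7592, 2088, 102], [101, 7592, 102]]
padded = [truncate_and_pad(ids, max_length=5) for ids in batch]
print(padded)  # [[101, 7592, 2088, 102, 0], [101, 7592, 103, ...]] -> all length 5
```

Once every sequence has the same length, the batch can be stacked into a single tensor for the model.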
Loading the Model and Configuring Training
from transformers import (
AutoModelForSequenceClassification,
TrainingArguments,
Trainer
)
import evaluate
import numpy as np
# Load pretrained model with a classification head
# num_labels=2 for binary sentiment (positive/negative)
model = AutoModelForSequenceClassification.from_pretrained(
model_name,
num_labels=2,
id2label={0: "NEGATIVE", 1: "POSITIVE"},
label2id={"NEGATIVE": 0, "POSITIVE": 1}
)
# Load evaluation metric
accuracy_metric = evaluate.load("accuracy")
f1_metric = evaluate.load("f1")
def compute_metrics(eval_pred):
"""Compute accuracy and F1 during evaluation."""
logits, labels = eval_pred
predictions = np.argmax(logits, axis=-1)
accuracy = accuracy_metric.compute(predictions=predictions, references=labels)
f1 = f1_metric.compute(predictions=predictions, references=labels, average="binary")
return {**accuracy, **f1}
# Configure training
training_args = TrainingArguments(
output_dir="./sentiment-model", # Where to save checkpoints
num_train_epochs=3, # Full passes through the training data
per_device_train_batch_size=16, # Batch size per GPU/CPU
per_device_eval_batch_size=32,
warmup_steps=100, # Gradual LR increase at start
weight_decay=0.01, # L2 regularization
learning_rate=2e-5, # Key hyperparameter for fine-tuning
evaluation_strategy="epoch", # Evaluate at end of each epoch (newer transformers versions call this eval_strategy)
save_strategy="epoch",
load_best_model_at_end=True, # Keep the best checkpoint
metric_for_best_model="f1",
logging_steps=50,
fp16=True, # Mixed precision (faster on GPU; set to False for CPU-only training)
report_to="none" # Disable wandb/tensorboard for simplicity
)
# Create the Trainer
trainer = Trainer(
model=model,
args=training_args,
train_dataset=tokenized_dataset["train"],
eval_dataset=tokenized_dataset["test"],
compute_metrics=compute_metrics,
)
print(f"Model parameters: {model.num_parameters():,}")
print(f"Trainable parameters: {sum(p.numel() for p in model.parameters() if p.requires_grad):,}")
Training and Evaluating
# Train the model
print("Starting training...")
train_result = trainer.train()
print(f"\nTraining complete!")
print(f"Train loss: {train_result.training_loss:.4f}")
# Evaluate on test set
eval_results = trainer.evaluate()
print(f"\nTest accuracy: {eval_results['eval_accuracy']:.4f}")
print(f"Test F1: {eval_results['eval_f1']:.4f}")
# Save the fine-tuned model
trainer.save_model("./sentiment-model-final")
tokenizer.save_pretrained("./sentiment-model-final")
print("\nModel saved to ./sentiment-model-final")
The Trainer handles the entire training loop — forward pass, loss calculation, backpropagation, optimizer step — for every batch across all epochs. The load_best_model_at_end=True setting means that if epoch 2 had the best F1 score but epoch 3 regressed slightly, you get the epoch 2 weights, not epoch 3. After training, trainer.save_model() writes the model weights and config to disk; saving the tokenizer alongside them with tokenizer.save_pretrained(), as the code above does, makes the output directory self-contained — you can copy it to any machine and run inference without needing to know which base model it started from.
Using the Fine-Tuned Model
from transformers import pipeline
# Load the fine-tuned model with the high-level pipeline API
classifier = pipeline(
"text-classification",
model="./sentiment-model-final",
tokenizer="./sentiment-model-final",
device=0 # First GPU; use device=-1 for CPU
)
# Test on new examples
test_texts = [
"This movie was absolutely brilliant! One of the best I've seen.",
"Complete waste of time. Boring from start to finish.",
"It was okay, nothing special but not terrible either.",
"An unexpected masterpiece. I was completely captivated."
]
results = classifier(test_texts)
for text, result in zip(test_texts, results):
print(f"Text: {text[:50]}...")
print(f"Label: {result['label']} (confidence: {result['score']:.3f})\n")
The pipeline API is the simplest way to run inference on a saved model. It handles tokenization, tensor conversion, the forward pass, and converting logits back to human-readable labels — all in one call. The device=0 argument moves the model to your first GPU; use device=-1 for CPU-only inference. For production deployments where latency matters, you’d typically load the model once at startup and keep it in memory, batching incoming requests rather than processing them one at a time.
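The batching idea can be sketched without any model at all. Below, batched_inference and fake_classify are hypothetical names for illustration; in practice the classify function would be the pipeline you loaded once at startup (the pipeline also accepts a batch_size argument directly).

```python
from typing import Callable

def batched_inference(texts: list[str],
                      classify_batch: Callable[[list[str]], list[dict]],
                      batch_size: int = 8) -> list[dict]:
    """Run classification over texts in fixed-size batches instead of one by one."""
    results: list[dict] = []
    for i in range(0, len(texts), batch_size):
        results.extend(classify_batch(texts[i:i + batch_size]))
    return results

# Stub classifier standing in for a real pipeline loaded at startup
def fake_classify(batch: list[str]) -> list[dict]:
    return [{"label": "POSITIVE", "score": 0.99} for _ in batch]

labels = batched_inference([f"ticket {i}" for i in range(20)], fake_classify, batch_size=8)
print(len(labels))  # 20
```

The same structure works for a request queue in a web service: buffer incoming texts briefly, flush them through the model as one batch, and fan the results back out to callers.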
Part 2: Fine-Tuning an LLM with LoRA
Fine-tuning full LLMs (7B+ parameters) requires significant GPU memory — too much for most developers. LoRA (Low-Rank Adaptation) is a parameter-efficient approach that freezes the original model weights and adds small trainable rank decomposition matrices to each layer. Instead of updating 7 billion parameters, you update 5-10 million. The quality loss is minimal for most tasks.
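The core idea fits in a few lines of linear algebra. This is a minimal NumPy sketch of the LoRA update, not the PEFT implementation: the frozen weight W stays fixed, only the low-rank factors A and B train, and the adapted layer computes h = Wx + (alpha/r)·B(Ax). The dimensions here are toy values.

```python
import numpy as np

d_out, d_in, r, alpha = 64, 64, 8, 16
rng = np.random.default_rng(0)

W = rng.normal(size=(d_out, d_in))     # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01  # trainable, small random init
B = np.zeros((d_out, r))               # trainable, zero init

x = rng.normal(size=(d_in,))
h = W @ x + (alpha / r) * (B @ (A @ x))

# Because B starts at zero, the adapted layer initially matches the base layer
assert np.allclose(h, W @ x)

# Trainable parameters: r*(d_in + d_out) instead of d_in*d_out
print(r * (d_in + d_out), "vs", d_in * d_out)  # 1024 vs 4096
```

The zero initialization of B is what makes LoRA safe to bolt on: at step zero the model behaves exactly like the base model, and the adapters drift away from zero only as training demands.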
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from peft import LoraConfig, get_peft_model, TaskType
from trl import SFTTrainer
from datasets import Dataset
import torch
# Load a smaller model for this demo (use llama or mistral for production)
base_model = "microsoft/phi-2" # 2.7B parameters, fits in ~8GB VRAM with LoRA
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token # Phi-2 doesn't have a pad token
model = AutoModelForCausalLM.from_pretrained(
base_model,
torch_dtype=torch.float16, # Use float16 to save memory
device_map="auto" # Automatically assign to GPU if available
)
# LoRA configuration
lora_config = LoraConfig(
r=16, # Rank — higher = more parameters, better quality
lora_alpha=32, # Scaling factor (usually 2x rank)
target_modules=["q_proj", "v_proj"], # Which layers to adapt (model-specific)
lora_dropout=0.05,
bias="none",
task_type=TaskType.CAUSAL_LM
)
# Apply LoRA to the model
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# Example output (exact numbers depend on the model and LoRA config):
# trainable params: 3,145,728 || all params: 2,782,765,056 (0.11% trainable)
The r=16 rank determines LoRA’s capacity. Higher rank means more trainable parameters and better adaptation, but more memory. For most tasks, ranks between 8 and 64 work well. target_modules specifies which layers get LoRA adapters — this varies by model architecture. For LLaMA models it’s typically ["q_proj", "k_proj", "v_proj", "o_proj"].
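When you are unsure which names a new architecture uses, you can enumerate its Linear layers and pick the attention projections from the result. Here a toy module stands in for the real model purely to show the filtering trick; with a real model you would run the same comprehension over model.named_modules().

```python
import torch.nn as nn

# Toy stand-in for one transformer attention block (illustrative names only)
class ToyAttention(nn.Module):
    def __init__(self, dim: int = 32):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)
        self.dense = nn.Linear(dim, dim)

model = ToyAttention()

# Collect the leaf names of all Linear submodules -- candidate target_modules
linear_names = sorted({name.split(".")[-1]
                       for name, module in model.named_modules()
                       if isinstance(module, nn.Linear)})
print(linear_names)  # ['dense', 'k_proj', 'q_proj', 'v_proj']
```

Running this against the actual base model tells you exactly which projection names exist, so you don't have to guess between conventions like q_proj/v_proj and query/value.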
Preparing Instruction Data
from datasets import Dataset
# Instruction-following dataset format
# The model learns to follow instructions in this format
raw_data = [
{
"instruction": "Explain what a Python decorator is.",
"input": "",
"output": "A Python decorator is a function that takes another function as input and returns a modified version of that function. Decorators allow you to add functionality to existing functions without modifying them directly, using the @decorator_name syntax."
},
{
"instruction": "Write a Python function to check if a number is prime.",
"input": "",
"output": "def is_prime(n: int) -> bool:\n if n < 2:\n return False\n if n == 2:\n return True\n if n % 2 == 0:\n return False\n for i in range(3, int(n**0.5) + 1, 2):\n if n % i == 0:\n return False\n return True"
},
{
"instruction": "What does the following Python code do?",
"input": "result = [x**2 for x in range(10) if x % 2 == 0]",
"output": "This list comprehension creates a list of squares of even numbers from 0 to 9. It iterates through numbers 0-9, filters for even numbers (x % 2 == 0), squares each one (x**2), and collects them in a list. The result is [0, 4, 16, 36, 64]."
},
# ... add hundreds or thousands more examples for real training
]
def format_instruction(example):
"""Format into a single instruction-following string."""
if example.get("input"):
return f"""### Instruction:
{example['instruction']}
### Input:
{example['input']}
### Response:
{example['output']}"""
else:
return f"""### Instruction:
{example['instruction']}
### Response:
{example['output']}"""
# Convert to Dataset and format
dataset = Dataset.from_list(raw_data)
dataset = dataset.map(
lambda x: {"text": format_instruction(x)},
remove_columns=dataset.column_names
)
print(dataset[0]["text"])
Training with SFTTrainer
training_args = TrainingArguments(
output_dir="./python-tutor-lora",
num_train_epochs=3,
per_device_train_batch_size=4,
gradient_accumulation_steps=4, # Effective batch size = 4 * 4 = 16
warmup_steps=50,
learning_rate=2e-4, # LoRA uses higher LR than full fine-tuning
fp16=True,
logging_steps=10,
save_strategy="epoch",
report_to="none"
)
# Note: recent TRL releases move dataset_text_field and max_seq_length into
# SFTConfig; this call matches the older SFTTrainer signature.
trainer = SFTTrainer(
model=model,
train_dataset=dataset,
args=training_args,
tokenizer=tokenizer,
dataset_text_field="text",
max_seq_length=512,
peft_config=lora_config
)
print("Training with LoRA...")
trainer.train()
# Save the LoRA adapter (NOT the full model -- much smaller!)
trainer.save_model("./python-tutor-lora-adapters")
print("LoRA adapters saved (small file, just the delta weights)")
The gradient_accumulation_steps=4 setting simulates a larger batch size by accumulating gradients over multiple forward passes before updating weights. This is essential when GPU memory limits your batch size: an effective batch size of 16 typically trains more stably than a batch size of 4.
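This sketch shows why accumulation is equivalent to a bigger batch: accumulating gradients over four micro-batches of 4 reproduces (up to floating point) the gradient of one batch of 16, provided each micro-batch loss is divided by the number of accumulation steps, exactly what the Trainer does internally.

```python
import torch

torch.manual_seed(0)
w = torch.zeros(3, requires_grad=True)
X, y = torch.randn(16, 3), torch.randn(16)

# One big batch of 16
loss = ((X @ w - y) ** 2).mean()
loss.backward()
big_batch_grad = w.grad.clone()

# Four accumulated micro-batches of 4
w.grad = None
accum_steps = 4
for i in range(accum_steps):
    xb, yb = X[i * 4:(i + 1) * 4], y[i * 4:(i + 1) * 4]
    micro_loss = ((xb @ w - yb) ** 2).mean() / accum_steps  # scale before backward
    micro_loss.backward()  # gradients add up across backward() calls

assert torch.allclose(w.grad, big_batch_grad, atol=1e-6)
```

Only the optimizer step is deferred; memory usage stays at the micro-batch level because each backward pass frees its activations.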
Loading and Using LoRA Fine-Tuned Models
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch
# Load base model + LoRA adapters
base_model_name = "microsoft/phi-2"
adapter_path = "./python-tutor-lora-adapters"
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
base_model = AutoModelForCausalLM.from_pretrained(
base_model_name,
torch_dtype=torch.float16,
device_map="auto"
)
# Load and merge LoRA adapters into the base model
model = PeftModel.from_pretrained(base_model, adapter_path)
model = model.merge_and_unload() # Merge adapters into weights for faster inference
# Generate a response
def ask_model(question: str, max_tokens: int = 300) -> str:
prompt = f"""### Instruction:
{question}
### Response:
"""
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
outputs = model.generate(
**inputs,
max_new_tokens=max_tokens,
temperature=0.7,
do_sample=True,
pad_token_id=tokenizer.eos_token_id
)
# Decode only the new tokens (skip the prompt)
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
return tokenizer.decode(new_tokens, skip_special_tokens=True)
# Test the fine-tuned model
response = ask_model("Explain list comprehensions in Python with an example.")
print(response)
merge_and_unload() permanently fuses the LoRA adapter weights back into the base model's weight matrices. The result is a single merged model with no LoRA overhead during inference — same speed as the original base model, but with your task-specific improvements baked in. This is the deployment-ready form. Alternatively, keep the adapter separate with PeftModel.from_pretrained() at runtime if you need to hot-swap between different adapters for the same base model without reloading the full weights each time.
Real-Life Example: Customer Support Ticket Classifier
Here's a complete fine-tuning workflow for a realistic business use case — classifying customer support tickets into categories:
from datasets import Dataset
from transformers import (
AutoTokenizer, AutoModelForSequenceClassification,
TrainingArguments, Trainer, DataCollatorWithPadding
)
import evaluate
import numpy as np
# Sample training data (in practice you'd have thousands of examples)
ticket_data = [
{"text": "My payment failed but I was still charged", "label": 0}, # billing
{"text": "Can't log into my account, password reset not working", "label": 1}, # auth
{"text": "The app crashes every time I open the dashboard", "label": 2}, # bug
{"text": "How do I export my data as a CSV file?", "label": 3}, # howto
{"text": "Invoice shows wrong amount for last month", "label": 0}, # billing
{"text": "Two-factor auth code not arriving via SMS", "label": 1}, # auth
{"text": "Search results are empty even though I have data", "label": 2}, # bug
{"text": "Can I change my billing cycle from monthly to annual?", "label": 3}, # howto
# ... add many more
]
label_names = ["billing", "authentication", "bug_report", "how_to"]
num_labels = len(label_names)
model_name = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
dataset = Dataset.from_list(ticket_data)
# Train/test split
split = dataset.train_test_split(test_size=0.2, seed=42)
def tokenize(examples):
    # No padding here: DataCollatorWithPadding pads each batch dynamically
    return tokenizer(examples["text"], truncation=True)
tokenized = split.map(tokenize, batched=True)
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
# Load model
model = AutoModelForSequenceClassification.from_pretrained(
model_name,
num_labels=num_labels,
id2label={i: name for i, name in enumerate(label_names)},
label2id={name: i for i, name in enumerate(label_names)}
)
# Training
accuracy = evaluate.load("accuracy")
def compute_metrics(eval_pred):
preds, labels = eval_pred
preds = np.argmax(preds, axis=1)
return accuracy.compute(predictions=preds, references=labels)
args = TrainingArguments(
output_dir="./ticket-classifier",
num_train_epochs=5,
per_device_train_batch_size=8,
per_device_eval_batch_size=8,
evaluation_strategy="epoch", # newer transformers versions call this eval_strategy
save_strategy="epoch",
load_best_model_at_end=True,
metric_for_best_model="accuracy",
report_to="none"
)
trainer = Trainer(
model=model,
args=args,
train_dataset=tokenized["train"],
eval_dataset=tokenized["test"],
tokenizer=tokenizer,
data_collator=data_collator,
compute_metrics=compute_metrics
)
trainer.train()
trainer.save_model("./ticket-classifier-final")
# Production inference
from transformers import pipeline
classifier = pipeline("text-classification", model="./ticket-classifier-final")
new_tickets = [
"I was charged twice for the same subscription this month",
"Getting 404 error on the reports page",
"How do I add a team member to my account?"
]
for ticket in new_tickets:
result = classifier(ticket)[0]
print(f"Ticket: {ticket}")
print(f"Category: {result['label']} (confidence: {result['score']:.2%})\n")
Frequently Asked Questions
How much data do I need for fine-tuning?
For classification with a pretrained language model, 500-2000 labeled examples per class is a reasonable starting point. With more data you'll get better results up to a point of diminishing returns (usually 10K-100K examples). For instruction tuning LLMs, high-quality datasets of 1000-10000 examples often outperform low-quality datasets of 100K examples. Quality matters more than quantity.
Do I need a GPU for fine-tuning?
For small models (DistilBERT, BERT-base): CPU works but is slow (hours instead of minutes). For medium models (7B LLMs with LoRA): a single consumer GPU with 8-16GB VRAM (e.g., an RTX 3080 or 4080) is sufficient; Apple Silicon (M1/M2 Pro) can also work via the MPS backend, though CUDA-only tools such as bitsandbytes won't run there. For large models without LoRA: multiple high-VRAM GPUs or cloud compute (A100s).
What's the difference between fine-tuning and RLHF?
Fine-tuning (supervised) trains on (input, correct output) pairs — you need labeled data with known correct answers. RLHF (Reinforcement Learning from Human Feedback) trains the model to maximize human preference scores — you need human raters to rank model outputs. RLHF is how models like ChatGPT learn to be helpful and harmless. For most custom task fine-tuning, supervised fine-tuning is sufficient and much simpler.
How do I prevent catastrophic forgetting during fine-tuning?
Catastrophic forgetting is when fine-tuning on new data degrades performance on the original task. Solutions: use LoRA (fine-tuning a tiny fraction of parameters preserves the base model's capabilities); use a low learning rate (2e-5 for full fine-tuning, 2e-4 for LoRA); train for fewer epochs; include some original task data in your training mix.
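The last mitigation, mixing original-task data back in, is often called a replay mix. This is a minimal sketch with made-up example lists; the 10% ratio is a starting point to tune, not a recommendation from any specific paper.

```python
import random

random.seed(42)
task_examples = [{"text": f"task example {i}"} for i in range(900)]
general_examples = [{"text": f"general example {i}"} for i in range(5000)]

# Choose enough general examples that they make up ~10% of the final mix
replay_fraction = 0.10
n_replay = round(len(task_examples) * replay_fraction / (1 - replay_fraction))
mixed = task_examples + random.sample(general_examples, n_replay)
random.shuffle(mixed)
print(len(mixed), n_replay)  # 1000 100
```

With Hugging Face datasets you would do the same thing with concatenate_datasets followed by shuffle, but the arithmetic for the mixing ratio is identical.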
When should I use LoRA vs full fine-tuning?
Use LoRA when: the model has more than 1B parameters; you're memory-constrained (consumer GPU or CPU); you want to keep multiple specialized adapters for different tasks; you need fast switching between tasks. Use full fine-tuning when: the model is small (DistilBERT, BERT-base); you have significant compute budget; you need maximum performance on a single specific task.
Summary
You've fine-tuned a model for both classification (DistilBERT on sentiment) and instruction following (LoRA adapters on an LLM). The Hugging Face ecosystem handles the messy parts — gradient accumulation, mixed precision, checkpoint saving, evaluation — so you can focus on data quality and hyperparameter choices, which are the real levers for fine-tuning success.
The most important lesson: data quality beats model size almost every time. A fine-tuned small model on clean, well-labeled data usually outperforms a large pretrained model on your specific task. Invest in your dataset before spending compute. For using your fine-tuned model in a conversational interface, see How To Build a Chatbot with Ollama. For serving it behind an API, see Building a REST API with FastAPI.