Fine-tuning Tools

LLaMA-Factory

Web UI + CLI framework for fine-tuning 100+ model architectures with built-in datasets and evaluation. Supports LoRA, QLoRA, full fine-tuning, DPO, and RLHF workflows with minimal setup.

100+ models: Llama, Qwen, Mistral...
Web UI: LLaMA Board
DPO/RLHF: preference training

SECTION 01

LLaMA-Factory overview

LLaMA-Factory (Zheng et al. 2024) is a unified fine-tuning framework that supports 100+ LLM architectures including Llama, Mistral, Qwen, Gemma, Falcon, Baichuan, and more. It offers: (1) LLaMA Board — a web UI for configuring and launching fine-tuning without any code; (2) CLI interface for scriptable pipelines; (3) built-in support for 50+ datasets; (4) alignment training methods including DPO, ORPO, and SimPO in addition to SFT; (5) FlashAttention 2, unsloth, and DeepSpeed integration for efficiency.

SECTION 02

Quick start with Web UI

pip install llamafactory
llamafactory-cli webui
# Opens LLaMA Board at http://localhost:7860

# Or with Docker:
docker run -it --gpus all \
    -p 7860:7860 \
    hiyouga/llamafactory:latest \
    llamafactory-cli webui

In LLaMA Board: (1) select model name + path; (2) choose fine-tuning method (LoRA, QLoRA, full); (3) select a built-in dataset or upload your own; (4) configure hyperparameters; (5) click "Start" — training launches with real-time loss curves.

SECTION 03

CLI fine-tuning

# SFT with LoRA via CLI
llamafactory-cli train \
    --model_name_or_path meta-llama/Llama-3-8B-Instruct \
    --stage sft \
    --do_train \
    --finetuning_type lora \
    --dataset alpaca_en \
    --template llama3 \
    --lora_rank 8 \
    --lora_alpha 16 \
    --output_dir ./llama3-lora \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --learning_rate 1e-4 \
    --num_train_epochs 3 \
    --bf16
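
As a sanity check on the batch settings above: the effective batch size is the product of the per-device batch size, the gradient accumulation steps, and the GPU count.

```python
# Effective batch size implied by the flags above (single GPU assumed).
per_device_train_batch_size = 4
gradient_accumulation_steps = 4
num_gpus = 1

effective_batch = per_device_train_batch_size * gradient_accumulation_steps * num_gpus
print(effective_batch)  # 16 sequences per optimizer step
```
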
# The same run expressed as a YAML config, generated here from Python
# (see the examples/ directory in the LLaMA-Factory repo for reference configs)
import yaml, subprocess

config = {
    "model_name_or_path": "meta-llama/Llama-3-8B-Instruct",
    "stage": "sft",
    "do_train": True,
    "finetuning_type": "lora",
    "dataset": "alpaca_en",
    "template": "llama3",
    "lora_rank": 8,
    "output_dir": "./llama3-lora",
    "per_device_train_batch_size": 4,
    "num_train_epochs": 3,
    "bf16": True,
}
with open("train_config.yaml", "w") as f:
    yaml.dump(config, f)
subprocess.run(["llamafactory-cli", "train", "train_config.yaml"])

SECTION 04

DPO preference training

# Step 1: SFT on base model
llamafactory-cli train \
    --model_name_or_path meta-llama/Llama-3-8B \
    --stage sft \
    --do_train \
    --finetuning_type lora \
    --dataset alpaca_en \
    --template llama3 \
    --output_dir ./sft-checkpoint

# Step 2: DPO alignment on preference data
# Dataset format: {"prompt": "...", "chosen": "...", "rejected": "..."}
# (the beta flag is pref_beta in recent releases, dpo_beta in older ones)
llamafactory-cli train \
    --model_name_or_path meta-llama/Llama-3-8B \
    --adapter_name_or_path ./sft-checkpoint \
    --stage dpo \
    --do_train \
    --finetuning_type lora \
    --dataset dpo_en_demo \
    --pref_beta 0.1 \
    --template llama3 \
    --output_dir ./dpo-checkpoint

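
A preference file in the prompt/chosen/rejected shape noted in the comment above can be produced with a few lines of Python; the file name and contents here are illustrative.

```python
import json

# Illustrative preference pairs for DPO training.
pairs = [
    {
        "prompt": "Explain what LoRA does in one sentence.",
        "chosen": "LoRA freezes the base weights and trains small low-rank matrices, cutting memory use.",
        "rejected": "LoRA is a thing for models.",
    },
]
with open("my_prefs.jsonl", "w") as f:
    for pair in pairs:
        f.write(json.dumps(pair) + "\n")
```

To use it, register my_prefs.jsonl in dataset_info.json and pass its name via --dataset.
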
SECTION 05

Built-in dataset support

LLaMA-Factory ships with 50+ pre-configured datasets that can be referenced by name in a config; for example, alpaca_en for instruction tuning and dpo_en_demo for preference pairs, both used in the commands above.

For custom datasets, add an entry to dataset_info.json pointing to your local JSONL file and specifying the column mapping (instruction, input, output fields).
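As a sketch, a minimal dataset_info.json entry for a local file might look like the following; the dataset and file names are illustrative, and the LLaMA-Factory data documentation describes the full schema.

```python
import json

# Hypothetical registration of a local JSONL file, mapping its columns
# to the roles LLaMA-Factory expects (prompt / query / response).
entry = {
    "my_dataset": {
        "file_name": "my_data.jsonl",
        "columns": {
            "prompt": "instruction",
            "query": "input",
            "response": "output",
        },
    }
}
with open("dataset_info.json", "w") as f:
    json.dump(entry, f, indent=2)
```

The dataset can then be referenced in training commands as --dataset my_dataset.
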

SECTION 06

Evaluation and inference

# Evaluate on MMLU after training
llamafactory-cli eval \
    --model_name_or_path meta-llama/Llama-3-8B-Instruct \
    --adapter_name_or_path ./llama3-lora \
    --task mmlu \
    --template llama3 \
    --lang en \
    --n_shot 5

# Interactive inference
llamafactory-cli chat \
    --model_name_or_path meta-llama/Llama-3-8B-Instruct \
    --adapter_name_or_path ./llama3-lora \
    --template llama3

# Deploy OpenAI-compatible API
llamafactory-cli api \
    --model_name_or_path meta-llama/Llama-3-8B-Instruct \
    --adapter_name_or_path ./llama3-lora \
    --template llama3 \
    --port 8000
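
Once the API server is running, any OpenAI-compatible client can talk to it. A minimal request using only the standard library might look like this (the model name in the payload is illustrative; the request is left commented out since it requires the server to be up):

```python
import json
import urllib.request

# Chat completion request against the llamafactory-cli api server
# started above (assumed to be listening on localhost:8000).
payload = {
    "model": "llama3-lora",  # illustrative model name
    "messages": [{"role": "user", "content": "Summarize LoRA in one sentence."}],
    "temperature": 0.7,
}
req = urllib.request.Request(
    "http://localhost:8000/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# Uncomment once the server is running:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```
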
SECTION 07

Multi-modal support and practical notes

LLaMA-Factory extends beyond text-only models: it supports fine-tuning vision-language models and handles multi-modal inputs and mixed fine-tuning objectives. This enables training models that understand both image and text inputs, or models specialized in following complex multi-step instructions, and the unified interface makes it straightforward to compare tuning approaches on the same model architecture.

Instruction tuning specifically improves a model's ability to follow directions and infer intent. Fine-tuning on diverse instruction-response pairs enhances the model's helpfulness and alignment with human preferences, and LLaMA-Factory streamlines the process through built-in dataset loading and training integration.
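Instruction data for SFT is typically a list of instruction/input/output records. A toy Alpaca-style file (contents illustrative) can be written like this:

```python
import json

# Two illustrative instruction-response records in the Alpaca layout
# used by LLaMA-Factory's built-in alpaca_en dataset.
records = [
    {
        "instruction": "Classify the sentiment of the sentence.",
        "input": "The update made the app noticeably faster.",
        "output": "positive",
    },
    {
        "instruction": "Translate to French.",
        "input": "Good morning",
        "output": "Bonjour",
    },
]
with open("my_instructions.json", "w") as f:
    json.dump(records, f, indent=2)
```
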

# Programmatic training entry point; the module path and argument names
# below reflect recent LLaMA-Factory versions and may differ across releases.
from llamafactory.train.tuner import run_exp

train_args = {
    "model_name_or_path": "meta-llama/Llama-2-7b-hf",
    "stage": "sft",
    "do_train": True,
    "finetuning_type": "lora",
    "dataset": "multi_modal_instructions",  # custom dataset registered in dataset_info.json
    "template": "llama2",
    "output_dir": "./output",
    "overwrite_output_dir": True,
    "per_device_train_batch_size": 4,
    "gradient_accumulation_steps": 4,
    "learning_rate": 5e-5,
    "num_train_epochs": 3,
}

run_exp(train_args)

Technique        Description                       When to use
Standard SFT     Full-parameter fine-tuning        Unlimited compute
LoRA             Low-rank adaptation               Limited GPU memory
QLoRA            Quantized LoRA                    Consumer hardware
DPO              Direct preference optimization    Preference alignment
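The memory gap in the table comes largely from trainable-parameter counts. A rough back-of-envelope estimate for LoRA on a Llama-style model, assuming rank-8 adapters on the query and value projections only and simplified 4096x4096 weight shapes:

```python
# Rough LoRA trainable-parameter estimate (simplified: both adapted
# matrices treated as 4096x4096; the real v_proj is smaller under GQA).
hidden = 4096
layers = 32
rank = 8
adapted_matrices_per_layer = 2  # q_proj and v_proj

# Each adapter adds A (rank x d_in) plus B (d_out x rank) parameters.
params_per_matrix = rank * (hidden + hidden)
lora_params = params_per_matrix * adapted_matrices_per_layer * layers
base_params = 8_000_000_000  # ~8B-parameter base model

print(lora_params)               # 4194304 trainable parameters
print(lora_params / base_params) # roughly 0.05% of the base model
```

Training a fraction of a percent of the weights is what lets LoRA fit in a single consumer GPU where full fine-tuning cannot.
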

Several practical strengths round out the framework. The web UI meaningfully lowers the barrier to entry: it walks users through dataset selection, model choice, and hyperparameter configuration, so teams can customize models without low-level PyTorch or ML-infrastructure expertise. Integration with the Hugging Face Hub means models and datasets referenced by name are pulled automatically, keeping the framework compatible with thousands of community-contributed models and datasets.

Evaluation is built in: fine-tuned models can be assessed directly within the framework using benchmarks like MMLU and generation metrics such as BLEU and ROUGE, rather than exported into a separate pipeline. Supported dataset formats include Alpaca, ShareGPT, and custom layouts registered in dataset_info.json, so most existing datasets can be used with minimal preprocessing.

Parameter-efficient methods like LoRA and QLoRA dramatically reduce memory requirements, making fine-tuning feasible on consumer hardware rather than enterprise-scale clusters. More advanced techniques are also covered, including knowledge distillation, where a smaller model learns from a larger one for deployment on resource-constrained devices. On the serving side, the built-in chat and OpenAI-compatible API commands take a fine-tuned model from training to inference without separate infrastructure.

Finally, an active community contributes new datasets, training techniques, and model support, so the framework keeps pace with emerging methods without users reimplementing them from scratch. The combination of web interface, CLI, and programmatic API accommodates different preferences and skill levels, which has contributed significantly to its adoption.