Axolotl, Unsloth, LlamaFactory, and Hugging Face TRL — when to use each and how to configure them
Fine-tuning tools vary by abstraction level, ease of use, performance, and customization depth. No single tool is best for everyone — the choice depends on your model size, GPU budget, and expertise.
Trainer libraries (TRL, Unsloth): lower-level, with more control and less abstraction; you write Python training code.
Configuration-first tools (Axolotl, LlamaFactory): higher-level, with declarative YAML configs; you specify what to train and the tool handles how.
Tradeoff: Axolotl is easier for standard workflows but harder to customize. TRL is more flexible but requires more Python knowledge.
| Framework | Ease | Speed | LoRA | Multi-GPU | Custom Datasets |
|---|---|---|---|---|---|
| Axolotl | Easy (YAML) | Good | ✓ | ✓ FSDP | ✓ Flexible |
| Unsloth | Medium (Python) | Very Fast | ✓ | Limited | ✓ |
| LlamaFactory | Easy (YAML + UI) | Good | ✓ | ✓ | ✓ |
| TRL (SFTTrainer) | Hard (Python) | Good | ✓ | ✓ | ✓ Full control |
| LitGPT | Medium | Good | ✓ | ✓ | ✓ |
Quick decision: Use Axolotl for most teams. Use Unsloth if speed is critical. Use TRL if you need custom training loops. Use LlamaFactory if you want a UI.
Axolotl is a community-maintained fine-tuning framework built on Hugging Face Transformers. Declarative YAML configs specify the model, datasets, training parameters, and LoRA settings. It is popular because it handles distributed training with minimal setup.
Run training with a single command: `accelerate launch -m axolotl.cli.train config.yaml`. Axolotl handles multi-GPU and multi-node training automatically via Accelerate.
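A minimal LoRA config gives a feel for the declarative style. This is a hedged sketch: the field names follow Axolotl's documented schema, but the model name and dataset path are placeholders and exact keys can vary between versions.

```yaml
# Sketch of an Axolotl LoRA fine-tuning config (placeholder model/dataset).
base_model: NousResearch/Llama-2-7b-hf   # placeholder base model
load_in_8bit: true                       # quantize weights to save memory

datasets:
  - path: ./data/train.jsonl             # placeholder dataset path
    type: alpaca                         # instruction/input/output format

adapter: lora
lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
lora_target_modules:
  - q_proj
  - v_proj

sequence_len: 2048
micro_batch_size: 2
gradient_accumulation_steps: 8
num_epochs: 3
learning_rate: 0.0002
bf16: true
output_dir: ./outputs/lora-run
```

Passing this file to `accelerate launch -m axolotl.cli.train` is all the "training loop" a user writes.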
Unsloth patches Transformers models for 2–4× speedup with lower memory via custom kernels. Integrates with Hugging Face trainers. Best for resource-constrained environments (Colab, limited GPUs).
Gradient checkpointing: trades compute for memory by recomputing activations during the backward pass instead of storing them. FlashAttention-2: avoids materializing the full attention matrix, so attention memory scales linearly rather than quadratically with sequence length. BFloat16: halves weight memory versus float32 with minimal precision loss.
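The bfloat16 saving can be made concrete with rough arithmetic. This is a simplified estimate that counts only parameter storage, ignoring optimizer state, gradients, and activations:

```python
def param_memory_gib(n_params: float, bytes_per_param: int) -> float:
    """Approximate memory needed to hold model weights alone, in GiB."""
    return n_params * bytes_per_param / 1024**3

SEVEN_B = 7e9
fp32 = param_memory_gib(SEVEN_B, 4)  # float32: 4 bytes per parameter
bf16 = param_memory_gib(SEVEN_B, 2)  # bfloat16: 2 bytes per parameter

print(f"7B weights, fp32: {fp32:.1f} GiB")  # ~26.1 GiB
print(f"7B weights, bf16: {bf16:.1f} GiB")  # ~13.0 GiB
```

Halving bytes per parameter halves the footprint, which is often the difference between fitting a 7B model on a single 24 GB GPU or not.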
Hugging Face TRL (Transformers Reinforcement Learning) provides SFTTrainer for supervised fine-tuning and PPO/DPO trainers for preference-based alignment. It is lower-level than the YAML tools: you write the training script in Python.
DPO training: for preference-based alignment, use TRL's DPOTrainer with (prompt, chosen, rejected) triplets. It is simpler to run than PPO-based RLHF because it needs no separate reward model and no online sampling.
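The DPO objective itself is small enough to write out. Below is a minimal sketch of the per-example loss, assuming the four sequence log-probabilities have already been computed by the policy and the frozen reference model (beta is the usual DPO temperature; in practice DPOTrainer computes all of this for you):

```python
import math

def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float,
             beta: float = 0.1) -> float:
    """Per-example DPO loss from sequence log-probabilities.

    policy_*/ref_* are log p(completion | prompt) under the policy and
    the frozen reference model; beta scales the implicit reward margin.
    """
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    # -log(sigmoid(beta * margin)), written stably as log1p(exp(-x))
    return math.log1p(math.exp(-beta * margin))

# Loss shrinks as the policy prefers 'chosen' more than the reference does.
print(dpo_loss(-10.0, -12.0, -11.0, -11.0))  # positive margin: low loss
print(dpo_loss(-12.0, -10.0, -11.0, -11.0))  # negative margin: high loss
```

The gradient of this loss pushes the policy to increase the likelihood gap between chosen and rejected completions relative to the reference model.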
Different frameworks prefer different formats. Most convert to a standard at runtime.
| Format | Structure | Best For | Example |
|---|---|---|---|
| Alpaca | instruction, input, output | Instruction-following | {"instruction": "...", "output": "..."} |
| ShareGPT | conversations array with role/content | Multi-turn dialogue | {"conversations": [{"from": "human", "value": "..."}]} |
| ChatML | <|im_start|>system/user/assistant<|im_end|> | Models whose chat template uses ChatML (e.g., Qwen) | Special tokens for role separation |
| Instruction-Response | Paired (prompt, completion) | Basic SFT | {"prompt": "...", "completion": "..."} |
Recommendation: use ShareGPT for multi-turn dialogue, Alpaca for single-turn instruction-following, and ChatML when the target model's chat template expects it.
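To make the format differences concrete, here is a small converter from a ShareGPT-style record to a ChatML transcript. The `from` → role mapping follows common convention but is an assumption, not a fixed standard, so check your dataset's actual role names:

```python
# Conventional ShareGPT role names mapped to ChatML roles (assumption).
ROLE_MAP = {"system": "system", "human": "user", "gpt": "assistant"}

def sharegpt_to_chatml(record: dict) -> str:
    """Render a ShareGPT conversations array as a ChatML transcript."""
    parts = []
    for turn in record["conversations"]:
        role = ROLE_MAP[turn["from"]]
        parts.append(f"<|im_start|>{role}\n{turn['value']}<|im_end|>")
    return "\n".join(parts)

example = {"conversations": [
    {"from": "human", "value": "What is LoRA?"},
    {"from": "gpt", "value": "A parameter-efficient fine-tuning method."},
]}
print(sharegpt_to_chatml(example))
```

Most frameworks perform a conversion like this internally via the tokenizer's chat template, so you rarely need to render the special tokens by hand.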