Model and dataset registry with 500K+ public models, Git-LFS versioning, and programmatic access via huggingface_hub. The central distribution platform for open-source LLMs.

The Hub hosts open-source AI models, datasets, and Spaces (demo apps): 500K+ models and 100K+ datasets, all versioned with Git and Git-LFS. Models load directly into transformers from a model ID string, and the huggingface_hub Python library provides programmatic access: downloading, uploading, searching, and managing repositories. Public repos are free to host and access; private repos are subject to per-account storage quotas.
```python
from huggingface_hub import snapshot_download, hf_hub_download
import os

# Set your token for gated models (Llama 3, Gemma, etc.)
os.environ["HF_TOKEN"] = "hf_..."  # or: huggingface-cli login

# Download an entire model repo to the local cache
# (cache_dir defaults to ~/.cache/huggingface/hub)
local_dir = snapshot_download(
    repo_id="meta-llama/Meta-Llama-3-8B-Instruct",
    ignore_patterns=["*.gguf", "original/*"],  # skip large alternate formats
)
print(f"Downloaded to: {local_dir}")

# Download a single file
config_path = hf_hub_download(
    repo_id="meta-llama/Meta-Llama-3-8B-Instruct",
    filename="config.json",
)

# Load with transformers (handles caching automatically)
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
# First call downloads; subsequent calls load from cache
```
```python
from huggingface_hub import HfApi, create_repo
import os

api = HfApi(token=os.environ["HF_TOKEN"])

# Create a new repository
repo_url = create_repo(
    repo_id="your-username/my-finetuned-llama3",
    private=True,  # keep private until ready to share
    exist_ok=True,
)

# Upload an entire directory
api.upload_folder(
    folder_path="./fine-tuned-llama3/merged",
    repo_id="your-username/my-finetuned-llama3",
    repo_type="model",
    commit_message="Upload fine-tuned Llama 3 8B",
    ignore_patterns=["*.tmp", "__pycache__/*"],
)

# Or use transformers' push_to_hub directly on loaded objects
model.push_to_hub("your-username/my-finetuned-llama3")
tokenizer.push_to_hub("your-username/my-finetuned-llama3")
```
```python
from huggingface_hub import ModelCard, ModelCardData, EvalResult

# Create a model card programmatically
card_data = ModelCardData(
    language=["en"],
    license="llama3",
    base_model="meta-llama/Meta-Llama-3-8B-Instruct",
    tags=["llama", "fine-tuned", "qlora"],
    datasets=["my-org/my-dataset"],
    model_name="my-finetuned-llama3",  # required when eval_results is set
    eval_results=[
        EvalResult(
            task_type="text-generation",
            dataset_type="my-eval",
            dataset_name="my-eval",
            metric_type="accuracy",
            metric_value=0.87,
        )
    ],
)
card = ModelCard(
    "---\n"
    + card_data.to_yaml()
    + "\n---\n\n"
    "# My Fine-tuned Llama 3 8B\n\n"
    "Fine-tuned on domain-specific data using QLoRA.\n\n"
    "## Training details\n\n"
    "- Base: meta-llama/Meta-Llama-3-8B-Instruct\n"
    "- Method: QLoRA r=16\n"
    "- Dataset: 10k instruction pairs\n"
)
card.push_to_hub("your-username/my-finetuned-llama3")
```
```python
import os
from huggingface_hub import login

# Login (stores token in ~/.cache/huggingface/token)
login(token=os.environ["HF_TOKEN"])

# For gated models (Llama 3, Gemma): first accept the license on the model
# page, then your token grants download access
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    token=os.environ["HF_TOKEN"],  # explicit token if not logged in
)

# Make your own model gated (users must request access to download):
from huggingface_hub import HfApi

api = HfApi()
api.update_repo_settings(
    repo_id="your-org/proprietary-model",
    gated="auto",  # "auto" approves requests automatically; "manual" requires review
)
```
```python
import os
from huggingface_hub import HfApi

def push_to_hub_on_eval_pass(model_path: str, eval_score: float, threshold: float = 0.85):
    if eval_score < threshold:
        print(f"Eval {eval_score:.2f} below threshold {threshold} — skipping push")
        return
    api = HfApi(token=os.environ["HF_TOKEN"])
    api.upload_folder(
        folder_path=model_path,
        repo_id="my-org/production-model",
        commit_message=f"Auto-push: eval_score={eval_score:.3f}",
        create_pr=True,  # open a PR rather than pushing directly to main
    )
    print(f"Pushed to Hub with eval score {eval_score:.2f}")

# In your CI pipeline (GitHub Actions, etc.):
#   HF_TOKEN: ${{ secrets.HF_TOKEN }}
#   python -c "from train import evaluate, push_to_hub_on_eval_pass; push_to_hub_on_eval_pass('./model', evaluate())"
```
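A minimal GitHub Actions workflow wiring this together might look like the sketch below. The workflow name, Python version, and the assumption that both `evaluate` and `push_to_hub_on_eval_pass` live in a `train.py` at the repo root are illustrative, not prescribed by the Hub:

```yaml
name: push-model
on:
  workflow_dispatch:

jobs:
  evaluate-and-push:
    runs-on: ubuntu-latest
    env:
      HF_TOKEN: ${{ secrets.HF_TOKEN }}  # repo secret holding a write token
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install huggingface_hub transformers
      - run: |
          python -c "from train import evaluate, push_to_hub_on_eval_pass; \
            push_to_hub_on_eval_pass('./model', evaluate())"
```

Using `create_pr=True` in the push function means a failed eval never touches `main`, and a passing one still gets human review before merge.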
~/.cache/huggingface/hub can grow to hundreds of GB if you download many models. Set HF_HOME (the legacy TRANSFORMERS_CACHE variable also works) to point at a larger drive, and periodically run huggingface-cli delete-cache to reclaim space.

Set HF_HUB_OFFLINE=1 and TRANSFORMERS_OFFLINE=1 to prevent any network calls: all model loads will use the local cache. Essential for air-gapped deployments.

The Hugging Face Hub hosts over 500,000 models with metadata including task type, training data, evaluation scores, and license information. The Hub's filtering and sorting capabilities enable discovery of models for specific tasks: filtering by task (text-generation, token-classification, translation), language, library (transformers, diffusers), and license (apache-2.0, mit, llama) identifies candidate models efficiently. Leaderboards such as the Open LLM Leaderboard aggregate evaluation scores across benchmarks, enabling direct quality comparison before downloading models.
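As a simplified illustration of how the offline flags gate network access, the helper below mirrors the truthy values huggingface_hub accepts ("1", "ON", "YES", "TRUE"); the function name is ours, not part of the library:

```python
import os

_TRUE = {"1", "ON", "YES", "TRUE"}  # values huggingface_hub treats as enabled

def hub_offline() -> bool:
    """True when offline mode is requested via either environment flag."""
    return any(
        os.environ.get(var, "").upper() in _TRUE
        for var in ("HF_HUB_OFFLINE", "TRANSFORMERS_OFFLINE")
    )

os.environ.pop("TRANSFORMERS_OFFLINE", None)  # start from a known state
os.environ["HF_HUB_OFFLINE"] = "1"
print(hub_offline())  # True
```

A deployment script can assert `hub_offline()` at startup to fail fast if an air-gapped host is misconfigured.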
| Resource type | Hub URL pattern | Content |
|---|---|---|
| Model | huggingface.co/org/model-name | Weights, config, tokenizer, model card |
| Dataset | huggingface.co/datasets/org/name | Data files, dataset card, viewer |
| Space | huggingface.co/spaces/org/name | Demo app (Gradio/Streamlit) |
| Collection | huggingface.co/collections/... | Curated model/dataset groups |
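The URL patterns in the table can be composed with a small helper; the function is a hypothetical convenience, not a huggingface_hub API:

```python
def hub_url(repo_id: str, repo_type: str = "model") -> str:
    """Build the huggingface.co URL for a repo, per the patterns above."""
    prefixes = {"model": "", "dataset": "datasets/", "space": "spaces/"}
    return f"https://huggingface.co/{prefixes[repo_type]}{repo_id}"

print(hub_url("meta-llama/Meta-Llama-3-8B-Instruct"))
# https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct
print(hub_url("squad", repo_type="dataset"))
# https://huggingface.co/datasets/squad
```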
Model caching behavior in the Hub client affects both storage efficiency and download performance. By default, huggingface_hub downloads models to ~/.cache/huggingface/hub/ and stores files in a content-addressed layout (blobs referenced by per-revision symlinks), so identical files shared across model versions are stored only once. Setting the HF_HOME environment variable redirects the cache to a custom path, which is necessary when the default home directory lacks sufficient space for large model weights. The cache can be shared across containers or VMs by mounting the same cache directory, avoiding repeated downloads of the same model in distributed serving environments.
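The cache-path resolution order can be sketched as follows. This is a simplified stand-in for the library's own logic, assuming the documented precedence (HF_HUB_CACHE overrides HF_HOME, which overrides the default); the helper name is ours:

```python
import os
from pathlib import Path

def resolve_hub_cache() -> Path:
    """Mirror huggingface_hub's cache-path resolution (simplified sketch)."""
    if "HF_HUB_CACHE" in os.environ:    # most specific: the hub cache itself
        return Path(os.environ["HF_HUB_CACHE"])
    if "HF_HOME" in os.environ:         # relocates all HF state; cache lives under hub/
        return Path(os.environ["HF_HOME"]) / "hub"
    return Path.home() / ".cache" / "huggingface" / "hub"

os.environ.pop("HF_HUB_CACHE", None)    # start from a known state
os.environ["HF_HOME"] = "/mnt/big_disk/hf"
print(resolve_hub_cache())  # /mnt/big_disk/hf/hub
```

Mounting the resolved directory into every container in a serving fleet means each model snapshot is downloaded once, then symlink-resolved locally.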
Hugging Face Hub's search interface (huggingface.co/models) filters 500K+ models by task (text-generation, image-classification, sentiment-analysis), framework (pytorch, tensorflow), language, and license. Programmatic discovery via huggingface_hub.list_models() supports the same filters as keyword arguments, e.g. list_models(task="text-generation", library="transformers", sort="downloads", direction=-1); sorting by downloads returns popular models first, useful for finding well-maintained alternatives to obscure checkpoints. Hub metadata (download counts, last update date, model card completeness) signals active maintenance; as a rough heuristic, models with <100 downloads or no update in six months are often abandoned. Full-text search (list_models(search="legal document classification")) finds domain-specific models; combining search results with license filters (license="apache-2.0" or license="mit") identifies permissive models suitable for commercial deployment. Leaderboards (e.g., the Open LLM Leaderboard) rank models by benchmark performance, surfacing SOTA checkpoints; LMSYS Chatbot Arena provides crowdsourced rankings of instruction-tuned models, often more reflective of real-world performance than standardized benchmarks.
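The abandonment heuristic above can be made explicit as a screening predicate to run over ModelInfo results from list_models(). The function and its thresholds are illustrative, not a Hub feature:

```python
from datetime import datetime, timedelta, timezone

def looks_maintained(downloads: int, last_modified: datetime,
                     min_downloads: int = 100, max_age_days: int = 180) -> bool:
    """Screening heuristic: flag models with too few downloads or no
    recent update as likely abandoned. Thresholds are illustrative."""
    age = datetime.now(timezone.utc) - last_modified
    return downloads >= min_downloads and age <= timedelta(days=max_age_days)

# e.g. applied to fields of huggingface_hub.list_models(...) results
recent = datetime.now(timezone.utc) - timedelta(days=30)
stale = datetime.now(timezone.utc) - timedelta(days=400)
print(looks_maintained(5_000, recent))  # True
print(looks_maintained(12, stale))      # False
```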
The huggingface-cli command-line tool provides efficient model management: huggingface-cli download meta-llama/Llama-2-7b-hf --include "*.md" downloads just the model card and README for quick review, while huggingface-cli scan-cache and huggingface-cli delete-cache inspect and prune the local cache. Full repo metadata (architecture, license, download counts) is available via the Python API's HfApi.model_info("meta-llama/Llama-2-7b-hf"). Model cards (standardized markdown files in each repo) document intended use, training procedures, known limitations, and benchmark results; required reading before deployment. Well-maintained models ship PyTorch and SafeTensors weights, a detailed README with code examples, and active community discussions; poor-quality repos lack these signals. Bulk operations are easiest from Python: iterating over HfApi.list_models(task="text-generation") and calling model_info() on each automates compliance review across a model set. Every repo is a Git repository backed by Git LFS (Large File Storage), so standard Git tooling applies: GIT_LFS_SKIP_SMUDGE=1 git clone --depth 1 <repo-url> fetches only the latest revision's metadata and LFS pointer files without downloading multi-GB weights, useful in CI/CD pipelines.
Hugging Face models are cached in ~/.cache/huggingface/hub by default; setting HF_HOME=/mnt/fast_ssd redirects the cache to faster storage during iterative development. For production inference, pre-populating local storage eliminates network latency: python -c "from transformers import AutoModel; AutoModel.from_pretrained('meta-llama/Llama-2-7b-hf', cache_dir='/model_cache')" downloads the model ahead of time, avoiding lazy loading on the first inference request. Offline mode (export HF_HUB_OFFLINE=1 HF_DATASETS_OFFLINE=1) uses only cached models, critical for air-gapped deployments; missing models raise informative errors instead of attempting network access. For teams, huggingface_hub.HfApi(token="hf_xxx", endpoint="https://hub-internal.company.com") targets a private Hub instance (Hugging Face Enterprise) for access control and audit logging. Cache statistics are queryable: python -c "from huggingface_hub import scan_cache_dir; info = scan_cache_dir(); print(f'Total cache: {info.size_on_disk_str}')" shows total cache size, with per-repo and per-revision detail on the returned object; periodic pruning (removing revisions not accessed in 30 days) conserves disk space in shared cluster environments where dozens of models are cached.
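The 30-day pruning policy can be sketched as a pure selection step over (commit_hash, last_accessed) pairs gathered from scan_cache_dir(); the resulting hashes can then be fed to the cache API's delete_revisions(...).execute(). The function and the example hashes below are illustrative:

```python
from datetime import datetime, timedelta, timezone

def stale_revisions(revisions, now=None, max_age_days=30):
    """Return commit hashes whose last access is older than max_age_days.
    `revisions` is an iterable of (commit_hash, last_accessed) pairs,
    e.g. collected from scan_cache_dir()'s repo/revision objects."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    return [commit for commit, accessed in revisions if accessed < cutoff]

revs = [
    ("abc123", datetime.now(timezone.utc) - timedelta(days=2)),   # recent, kept
    ("def456", datetime.now(timezone.utc) - timedelta(days=90)),  # stale, pruned
]
print(stale_revisions(revs))  # ['def456']
```

Running this from a cron job on shared cluster nodes keeps the mounted cache bounded without touching recently used models.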