Infrastructure

HF Spaces

Hugging Face Spaces is a free hosting platform for ML demos and APIs — Gradio, Streamlit, or Docker apps running on CPUs or GPUs, with zero infrastructure management.

Free tier
Free tier: CPU + 16GB RAM
GPU tiers: T4 to A100
Frameworks: Gradio / Streamlit / Docker

SECTION 01

What Is HF Spaces?

Hugging Face Spaces is a platform for hosting ML demos and lightweight APIs. Deploy a Gradio or Streamlit app by pushing code to a git repository — no Kubernetes, no CI/CD, no infrastructure config. Free tier: CPU instance (2 vCPUs, 16GB RAM). Paid tiers: NVIDIA T4 ($0.60/hr), A10G ($3.15/hr), A100 ($4.13/hr). The community hosts hundreds of thousands of model demos on Spaces.
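A Space is configured through YAML frontmatter at the top of its README.md. A minimal sketch (the title, emoji, and version values here are placeholders for your own app):

```yaml
---
title: GPT-2 Demo
emoji: 🚀
sdk: gradio          # or streamlit / docker
sdk_version: 4.44.0  # pin for reproducible builds
app_file: app.py     # entry point Spaces will run
---
```

Changing `sdk` here is all it takes to switch between the Gradio, Streamlit, and Docker templates.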

SECTION 02

Gradio Quick Start

Gradio is the simplest way to build an ML demo UI in Python.

import gradio as gr
from transformers import pipeline
pipe = pipeline("text-generation", model="gpt2")
def generate(prompt: str, max_tokens: int = 100) -> str:
    result = pipe(prompt, max_new_tokens=max_tokens, do_sample=True)
    return result[0]["generated_text"]
demo = gr.Interface(
    fn=generate,
    inputs=[gr.Textbox(label="Prompt"), gr.Slider(50, 500, value=100, label="Max tokens")],
    outputs=gr.Textbox(label="Generated text"),
    title="GPT-2 Text Generation",
)
demo.launch()  # locally; push to HF Spaces to deploy
SECTION 03

Streamlit Apps

For more complex dashboards, use Streamlit. Set `sdk: streamlit` in the README.md frontmatter.

import streamlit as st
from openai import OpenAI
client = OpenAI(api_key=st.secrets["OPENAI_API_KEY"])
st.title("RAG Demo")
query = st.text_input("Ask a question:")
if query:
    with st.spinner("Thinking..."):
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": query}],
        )
        st.write(response.choices[0].message.content)
SECTION 04

Docker Spaces

For full control, use Docker Spaces: push a Dockerfile and run any application. FastAPI, Flask, or custom servers all work. Expose port 7860 (the Spaces default). Use this for: custom APIs, applications with complex dependencies, or anything that doesn't fit the Gradio/Streamlit templates.
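As a sketch, a Docker Space serving a FastAPI app might use a Dockerfile like this (the `app:app` module path and the uvicorn server are assumptions about your project layout, not a Spaces requirement):

```dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
# Spaces routes external traffic to port 7860 by default
EXPOSE 7860
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860"]
```

Any server that listens on 7860 works the same way; the port is the only Spaces-specific detail here.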

SECTION 05

GPU Spaces

Upgrade a Space to a GPU runtime in the Space settings. T4 Spaces ($0.60/hr): good for 7B inference, demos. A10G Spaces ($3.15/hr): 24GB VRAM, production-like inference. A100 Spaces ($4.13/hr): 80GB, for large models. GPU Spaces are billed by actual uptime — they sleep after inactivity to reduce cost. ZeroGPU (free, community-allocated): shared GPU for short inference bursts.

SECTION 06

Limitations & Production Use

Spaces are designed for demos, not production traffic. Limitations: no persistent storage by default (files are lost on restart unless you enable the paid storage add-on), cold-start latency (sleeping instances take 20–60 s to wake), and no SLA guarantees. For production, use HF Inference Endpoints (managed API) or self-host on cloud. Spaces are ideal for model demos, internal tools, hackathon projects, and rapid prototyping.
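Because a sleeping Space can take up to a minute to wake, clients should retry their first request. A minimal retry-with-backoff sketch (the helper name is illustrative, not part of any HF client library):

```python
import time

def call_with_retry(fn, retries=5, base_delay=2.0):
    """Call fn(), retrying with exponential backoff on failure.

    Useful when the first request hits a sleeping Space that is
    still waking up (typically 20-60 s).
    """
    for attempt in range(retries):
        try:
            return fn()
        except Exception:
            if attempt == retries - 1:
                raise  # out of retries: surface the real error
            time.sleep(base_delay * 2 ** attempt)  # 2s, 4s, 8s, ...
```

Wrap whatever HTTP call you make to the Space in `call_with_retry` so the wake-up delay is absorbed transparently.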

SECTION 07

Computing Resources and Hardware

Hugging Face Spaces can run on CPU or GPU hardware. The free tier defaults to CPU (slow but free). Paid tiers offer GPUs: NVIDIA T4 ($0.60/hr), A10G ($3.15/hr), A100 ($4.13/hr). Choose based on inference speed requirements: CPU works for simple text models; a GPU is needed for image models and large language models. Monitor costs: a T4 running continuously is roughly $430/month at $0.60/hr, so let idle Spaces sleep, use model quantization, or batch requests to keep the bill down.
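The cost figures above can be sanity-checked with a little arithmetic; letting the Space sleep outside active hours is what keeps GPU tiers affordable:

```python
def monthly_cost(rate_per_hour, active_hours_per_day, days=30):
    """Estimated monthly bill for a GPU Space billed by uptime."""
    return rate_per_hour * active_hours_per_day * days

# A T4 at $0.60/hr: running 24/7 vs. sleeping outside an 8-hour workday
always_on = monthly_cost(0.60, 24)  # ~ $432/month
workday = monthly_cost(0.60, 8)     # ~ $144/month
```

The same function applies to any tier: swap in $3.15 for an A10G or $4.13 for an A100.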

Docker containers enable custom environments. Pre-built base images support Python, Node.js, and more; install dependencies in your Dockerfile. Spaces rebuilds and redeploys the container on every push. Container size is limited (~10GB), so optimize by removing unnecessary dependencies and using lightweight base images.

FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY app.py .
CMD ["python", "app.py"]

Resource management: monitor CPU/GPU usage and adjust instance size accordingly. Set up auto-restart on failure for production reliability. Use health checks in your app to detect and report issues to Spaces monitoring. Implement graceful degradation: if a dependency fails, return a reasonable default instead of crashing.
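The graceful-degradation idea above can be sketched as a small wrapper that logs the failure and returns a safe default instead of crashing (all names here are illustrative):

```python
import logging

logger = logging.getLogger(__name__)

def with_fallback(fn, default):
    """Wrap fn so any failure logs a traceback and returns a
    safe default instead of taking down the whole Space."""
    def wrapped(*args, **kwargs):
        try:
            return fn(*args, **kwargs)
        except Exception:
            logger.exception("dependency failed, returning fallback")
            return default
    return wrapped

# e.g. wrap a flaky external call so the UI still renders something
safe_summarize = with_fallback(lambda text: text[:100], "Summary unavailable")
```

The same wrapper works around any dependency call: a model, a vector store, or a third-party API.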

import psutil
def check_resources():
    cpu_pct = psutil.cpu_percent(interval=1)
    memory_pct = psutil.virtual_memory().percent
    if cpu_pct > 90 or memory_pct > 85:
        print(f"Warning: CPU {cpu_pct}%, Memory {memory_pct}%")
    return {"cpu": cpu_pct, "memory": memory_pct}
| Hardware | Cost/Hour | Typical Use                    | Speed     |
|----------|-----------|--------------------------------|-----------|
| CPU      | Free      | Text inference, small models   | Slow      |
| T4 GPU   | $0.60     | Image models, small LLMs       | Fast      |
| A10G GPU | $3.15     | Production-like inference      | Very fast |
| A100 GPU | $4.13     | Large LLMs, video              | Very fast |
SECTION 08

Deployment Best Practices

Version control your Space by linking to a GitHub repository. Every push to main automatically redeploys. This enables reproducible deployments and easy rollback. Use git tags for releases. Store secrets (API keys) in Spaces Settings, not in code. Access them via environment variables in your app.
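Secrets stored in Space Settings reach your app as environment variables. A defensive read might look like this (the variable name is just an example):

```python
import os

def get_secret(name: str) -> str:
    """Read a secret configured in the Space's Settings tab.

    Failing early with a clear message beats a cryptic auth
    error deep inside an API client.
    """
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing secret: set {name} in Space Settings")
    return value

# e.g. api_key = get_secret("OPENAI_API_KEY")
```

Streamlit apps can use `st.secrets` as shown earlier; plain environment variables work in every SDK.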

Error handling and monitoring: implement comprehensive logging. Spaces provides stdout/stderr logs visible in the UI. Log all errors, warnings, and important events. Set up external monitoring (e.g., Sentry) for production Spaces. Test locally with `huggingface_hub` before deploying.

Performance optimization: cache model weights and tokenizers to avoid repeated downloads. Use model quantization (INT8, FP16) to reduce memory and speed up inference. Batch requests when possible. Profile your app with Python profilers to identify bottlenecks. A well-optimized Space can serve 10-100 requests per second.
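Loading the model once per process is the single biggest win; a sketch using `functools.lru_cache` (the `load_model` body is a stand-in for your real pipeline load):

```python
from functools import lru_cache

@lru_cache(maxsize=1)
def load_model():
    """Load the model exactly once per process; every later call
    returns the cached object instead of reloading weights."""
    # stand-in for e.g. pipeline("text-generation", model="gpt2")
    return {"name": "gpt2", "loaded": True}

def generate(prompt: str) -> str:
    model = load_model()  # cheap after the first call
    return f"[{model['name']}] {prompt}"
```

Loading at module import time achieves the same thing; `lru_cache` just makes the intent explicit and testable.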

Hugging Face Spaces lowers the barrier to ML deployment: from prototype to a shareable demo in minutes, with no infrastructure knowledge required, and with hosting and basic monitoring included. It is cost-effective for hobby projects and lightweight services; heavier production traffic belongs on Inference Endpoints or self-hosted infrastructure. Whether you're building a demo or a small API, Spaces has you covered.

Hugging Face Spaces ecosystem: publish your Space and share its URL. Spaces are discoverable in the Hub, and genuinely useful ones can trend; some get 10K+ daily visitors. The community builds impressive demos: music generation, image editing, text summarization, code review. Start simple, iterate based on user feedback, and scale as needed.

Integration possibilities: link Spaces with GitHub Actions for automated retraining. Trigger model updates on schedule or when new data arrives. Build pipelines that flow from data collection to model training to Spaces deployment. This automation enables rapid iteration without manual steps.

Collaboration features: invite team members to edit your Space. Version control keeps history; review changes before merging. Teams can build sophisticated ML applications together, with Spaces handling permissions and access control.

SECTION 09

Advanced Features and Production Setup

Private Spaces: keep your model deployments private and share them only with authorized users. Set access control to your organization or to specific users. This is essential for proprietary models and sensitive applications, and private Spaces get the same managed hosting and hardware options as public ones.

Persistent storage: Spaces can mount persistent storage volumes. Save uploads, model outputs, or training checkpoints. Storage persists across restarts. Integrate with Hugging Face Datasets Hub for centralized data management. Build data pipelines that feed into your Space automatically.
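On Spaces, the persistent-storage add-on is mounted at /data; writes anywhere else vanish on restart. A hedged sketch, with the path parameterized so the same code also runs locally:

```python
import json
from pathlib import Path

def save_result(record: dict, data_dir: str = "/data") -> Path:
    """Append a record to a JSONL file under the persistent mount.

    On a Space with the storage add-on, /data survives restarts;
    locally (or without the add-on), pass a different data_dir.
    """
    path = Path(data_dir) / "results.jsonl"
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a") as f:
        f.write(json.dumps(record) + "\n")
    return path
```

Append-only JSONL keeps restarts safe: each line is an independent record, so a partial write can't corrupt earlier data.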

API endpoints: every Gradio Space also exposes a programmatic API. Clients send HTTP requests and receive structured responses, and Gradio generates API documentation for each app. This turns a demo into a lightweight service; note that rate limiting and authentication are up to your application code.
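For example, a Gradio Space's legacy REST endpoint accepts a JSON body of the form `{"data": [...]}`. A stdlib sketch of building such a request (the Space URL is a placeholder for your own deployment):

```python
import json
import urllib.request

SPACE_URL = "https://your-user-your-space.hf.space"  # placeholder

def build_predict_request(prompt: str) -> urllib.request.Request:
    """Build a POST to a Gradio Space's /api/predict endpoint.

    Gradio expects {"data": [<inputs in order>]} and responds with
    {"data": [<outputs>]}; send with urllib.request.urlopen().
    """
    payload = json.dumps({"data": [prompt]}).encode("utf-8")
    return urllib.request.Request(
        f"{SPACE_URL}/api/predict",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

In practice the `gradio_client` package wraps this for you; the raw request is shown here only to make the wire format concrete.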

Gradio and Streamlit: Spaces supports both frameworks. Gradio for quick ML demos with minimal code. Streamlit for data apps and dashboards. Choose based on use case. Both deploy instantly. Switch frameworks by changing code without infrastructure changes. This flexibility enables rapid iteration.

Best practices: keep dependencies minimal to reduce startup time. Use model caching to avoid redownloading. Monitor logs for errors. Set up alerts for failures. Test locally before pushing to production. Use CI/CD with GitHub Actions for automated testing and deployment. These practices ensure reliable production Spaces.
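A common CI pattern is a GitHub Actions workflow that mirrors main to the Space's git remote; a sketch (the user/space names and the HF_TOKEN secret are placeholders you'd configure yourself):

```yaml
name: Sync to HF Space
on:
  push:
    branches: [main]
jobs:
  sync:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # full history is required for the push
      - name: Push to Hugging Face
        env:
          HF_TOKEN: ${{ secrets.HF_TOKEN }}
        run: |
          git push https://user:$HF_TOKEN@huggingface.co/spaces/user/my-space main
```

Add a test job before the sync step and broken commits never reach the deployed Space.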

Performance tuning: profile your app to find bottlenecks. Is it model loading, inference, or I/O? Optimize accordingly. Use batching for concurrent requests. Cache results when possible. Monitor resource usage. Scale up hardware if needed. A well-tuned Space can handle massive traffic efficiently and cost-effectively while delivering fast responses to users worldwide.

Real-world examples: a Space for image upscaling received 50K daily visitors; another for text summarization serves 10K API requests per day. These deployments handle real user traffic, yet their developers didn't write a single line of infrastructure code: Spaces handled hosting, monitoring, and security. That is the appeal of managed platforms: focus on your model, deploy instantly, and pay only for the hardware you use. It is one of the fastest paths from idea to a working ML deployment.

Hugging Face Spaces represents a real shift in ML deployment. Historically, shipping a model required DevOps expertise: containerization, Kubernetes, load balancing, monitoring. Spaces removes most of that complexity: define your interface (Gradio or Streamlit), push your code, and you are live. Sleeping and waking keep idle costs down; logs and monitoring catch errors. This infrastructure-as-a-service approach opens ML deployment to researchers and developers without ops expertise, and the platform keeps improving with new features, better hardware, and finer-grained controls. For sharing models and collaborating on demos, there is very little friction between an idea and a public URL.

Hugging Face Spaces continues to improve: new hardware options, tighter ecosystem integration, better performance. The community keeps building impressive applications, going from research to a shareable demo in minutes. By removing the infrastructure barrier, Spaces lets anyone share their models and applications with the world: focus on the model, and let the platform handle the rest.