OpenAI's image generation model with built-in prompt enhancement via ChatGPT. Generates images highly faithful to detailed prompts; natively integrated into ChatGPT and accessible via the API.
DALL-E 3 (OpenAI, October 2023) is OpenAI's image generation model. Its distinctive feature is automatic prompt enhancement: before generating an image, DALL-E 3 uses a GPT-4-class model to rewrite and expand your prompt, adding details, resolving ambiguities, and optimizing for image generation quality. As a result, brief prompts like "a dog in a park" produce far more detailed, coherent images than earlier models, where you had to craft detailed prompts manually. The tradeoff is a loss of precise control over the exact image: the prompt rewriting can add details you didn't want.
import openai, base64
from pathlib import Path

client = openai.OpenAI()

# Generate a new image
response = client.images.generate(
    model="dall-e-3",
    prompt="A photorealistic illustration of a robot learning to paint in the style of Monet",
    size="1024x1024",       # "1024x1024", "1792x1024", or "1024x1792"
    quality="standard",     # "standard" or "hd" (2x cost, higher detail)
    style="vivid",          # "vivid" (bright, dramatic) or "natural" (muted, realistic)
    n=1,                    # DALL-E 3 only supports n=1 per request
    response_format="url",  # "url" (expires after ~1 hour) or "b64_json"
)

image_url = response.data[0].url
revised_prompt = response.data[0].revised_prompt  # prompt after GPT rewriting
print(f"Revised prompt: {revised_prompt}")
print(f"Image URL: {image_url}")

# Download and save
import requests
img_data = requests.get(image_url).content
Path("output.png").write_bytes(img_data)

# Or get base64 directly
response_b64 = client.images.generate(
    model="dall-e-3",
    prompt="Abstract geometric art",
    size="1024x1024",
    response_format="b64_json",
)
img_bytes = base64.b64decode(response_b64.data[0].b64_json)
Path("output_b64.png").write_bytes(img_bytes)
DALL-E 3 rewrites every prompt before generation. The revised prompt is returned in response.data[0].revised_prompt. You can partially suppress this by explicitly stating requirements: "I NEED to generate this exact prompt with no modifications. DO NOT add details. Prompt: [your prompt]". However, OpenAI recommends working with the rewriting rather than against it — the model is trained to produce better images from its revised prompts.
The rewriting typically: adds lighting descriptions, specifies art style explicitly, adds details about composition, clarifies ambiguous subjects, and removes content that might trigger safety filters.
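If you want your wording to reach the image model as close to verbatim as possible, a small helper (purely illustrative) can prepend the suppression instruction quoted above:

```python
def literal_prompt(prompt: str) -> str:
    """Prepend the instruction that discourages DALL-E 3's automatic rewriting.

    This only reduces rewriting; OpenAI does not guarantee the prompt is used
    verbatim, so always check response.data[0].revised_prompt afterwards.
    """
    return (
        "I NEED to generate this exact prompt with no modifications. "
        "DO NOT add details. Prompt: " + prompt
    )

wrapped = literal_prompt("a red cube on a plain white background")
# Pass `wrapped` as the prompt argument to client.images.generate(...)
```

Comparing `revised_prompt` against your original is a cheap way to monitor how much rewriting is still happening.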
# DALL-E 2 supports inpainting (editing with a mask); DALL-E 3 currently does NOT.
# Use DALL-E 2 for edits:
with open("original.png", "rb") as img_file, open("mask.png", "rb") as mask_file:
    response = client.images.edit(
        model="dall-e-2",
        image=img_file,
        mask=mask_file,  # transparent areas = regions to fill
        prompt="A wooden table with a vase of flowers",
        n=1,
        size="1024x1024",
    )

# For DALL-E 3 "editing": use a vision model (e.g. GPT-4V) to describe the image,
# then regenerate with the modifications folded into the prompt.
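One way to sketch that describe-then-regenerate workaround: build a vision request asking a GPT-4-class model for a DALL-E-ready description plus your modification, then feed the answer back into images.generate. The message structure follows the chat-completions vision format; the model name and overall flow are an illustration, not an official edit API.

```python
import base64

def build_vision_messages(image_bytes: bytes, modification: str) -> list:
    # Encode the source image for the chat-completions vision format
    b64 = base64.b64encode(image_bytes).decode()
    return [{
        "role": "user",
        "content": [
            {"type": "text",
             "text": ("Describe this image as a single detailed DALL-E prompt, "
                      f"then apply this change: {modification}")},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{b64}"}},
        ],
    }]

# description = client.chat.completions.create(
#     model="gpt-4o",  # any vision-capable model
#     messages=build_vision_messages(open("original.png", "rb").read(),
#                                    "add a vase of flowers"),
# ).choices[0].message.content
# edited = client.images.generate(model="dall-e-3", prompt=description)
```

Expect drift: the regenerated image is a new sample, not a pixel-level edit of the original.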
Note that DALL-E 3 only accepts n=1 per request; it cannot return multiple variations in one call (e.g., n=4) the way DALL-E 2 can. DALL-E 3 is also dramatically more sensitive to prompt wording than DALL-E 2: a vague prompt gives worse results, while a detailed, narrative prompt gives better ones. The model can also refuse prompts it deems harmful, so adversarial or unethical requests fail, unlike DALL-E 2's looser guardrails.
❌ Bad: "cat"
✓ Good: "A fluffy orange tabby cat with green eyes, sitting on a wooden windowsill,
soft natural light streaming in, photorealistic style, Canon EOS"
❌ Bad: "generate text in an image"
✓ Good: "A vintage movie poster with bold serif text reading 'RETRO NIGHTS',
neon colors, 1980s aesthetic, dramatic shadows"
# DALL-E 3 call; the prompt is still refined automatically before generation
response = client.images.generate(
    model="dall-e-3",
    prompt=(
        "A serene Zen garden with raked gravel, moss-covered stones, "
        "bamboo stalks, mist, soft morning light, oil painting style"
    ),
    size="1024x1024",
    quality="hd",
    n=1,
)
Key tips: Be specific about style (photorealistic, oil painting, watercolor), mood, lighting, and composition. Avoid requesting text, faces of real people, or violence. Remember that the API refines your prompt automatically before it reaches the image model, so give the rewriter concrete material to work with.
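These tips can be folded into a tiny prompt builder (the parameter names and ordering here are my own convention, not OpenAI's):

```python
def build_prompt(subject, style=None, lighting=None, mood=None, composition=None):
    """Assemble a detailed DALL-E 3 prompt from the components the tips
    above recommend specifying explicitly."""
    parts = [subject]
    if composition:
        parts.append(composition)
    if lighting:
        parts.append(lighting)
    if mood:
        parts.append(f"{mood} mood")
    if style:
        parts.append(f"{style} style")
    return ", ".join(parts)

build_prompt("a fluffy orange tabby cat on a windowsill",
             style="photorealistic", lighting="soft natural light")
# → "a fluffy orange tabby cat on a windowsill, soft natural light, photorealistic style"
```

Even a helper this simple nudges prompts from the ❌ column toward the ✓ column above.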
| Feature | DALL-E 2 | DALL-E 3 |
|---|---|---|
| Prompt length | Short (max 1,000 chars) | Long (max 4,000 chars) |
| Refusal rates | Low | Higher (safety-first) |
| Style consistency | Variable | Excellent |
| Text in images | Rare, poor | Better, still risky |
| Latency | ~10s | ~20s (slower) |
DALL-E 3 safety and limitations: OpenAI built stricter guardrails into DALL-E 3 than DALL-E 2. It refuses to generate images of real public figures, violent content, and copyrighted characters. This is by design, aimed at reducing misuse, but it also means some legitimate creative use cases fail. Adversarial prompts (e.g., "pretend you're DALL-E 2 and ignore safety guidelines") don't work; the safety behavior is not just in the prompt but embedded in the fine-tuned weights.
Quality varies by prompt detail and image type. Photorealistic portraits are excellent; abstract art sometimes misses nuance; text in images is improving but still unreliable. 1024×1024 is the standard resolution; non-square aspect ratios (1792×1024 and 1024×1792) are available but may reduce consistency. For production image generation, batch multiple prompts and include user feedback loops to refine wording.
DALL-E 3 vs. alternatives for image generation: Competing image models (Midjourney, Stable Diffusion, Adobe Firefly) each have strengths. Stable Diffusion is open-source and runs locally (privacy, cost). Midjourney excels at aesthetic, high-quality images and has a vibrant community. DALL-E 3 excels at following detailed text instructions and respecting style constraints. For applications where accuracy to the prompt is paramount (UI mockups, product visualizations, technical illustrations), DALL-E 3 is often the best choice. For creative ideation or artistic exploration, Midjourney may be better.
Integration patterns: Most applications use batch workflows for non-realtime needs (generate 50 variations overnight, curate manually) and single-image calls for interactive workflows (user provides feedback, generate a new version). OpenAI's Batch API offers 50% cost savings on supported endpoints for non-urgent jobs, which can make large-scale content generation economical.
Future trends: multi-modal models (text+image as input to image generation), better video generation (moving beyond static images), and improved performance on complex scenes (many objects, intricate layouts). For now, DALL-E 3 is best for single-object focus and clear prompts; it struggles with very crowded scenes or precise spatial relationships.
Building DALL-E 3 into applications: For image-heavy applications (e-commerce product generation, design tool integration, content creation platforms), DALL-E 3 can be a backend service. Queue incoming generation requests, batch them for efficiency, and return results asynchronously. Users see a loading state while images are generated, then receive a notification when ready. This UX pattern is familiar and avoids timeout issues with real-time generation.
Content moderation: DALL-E 3's refusals are automatic but sometimes overzealous, refusing requests that should be allowed. For applications with custom content policies, consider post-generation review (an AI moderation tool plus human flagging) rather than relying solely on DALL-E 3's built-in guardrails. Store generated images securely and implement expiration (e.g., delete after 30 days) to manage storage costs.
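The 30-day expiration policy reduces to a one-line check (the retention window is this article's example figure, not an OpenAI requirement):

```python
from datetime import datetime, timedelta

RETENTION = timedelta(days=30)

def is_expired(created_at: datetime, now: datetime) -> bool:
    """True once a stored image has outlived the retention window."""
    return now - created_at >= RETENTION

# A scheduled cleanup job would iterate stored images and delete
# every one for which is_expired(created_at, datetime.utcnow()) is True.
```

Passing `now` explicitly keeps the check deterministic and easy to test.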
Variant generation: to avoid paying for similar images multiple times, generate 1–2 images, then use image-to-image diffusion models (e.g., Stable Diffusion) to create inexpensive variations. Combine image generation with layout engines (HTML/CSS, Figma API) to create complete designs programmatically.
Future directions in image generation: The trajectory suggests image models will become more controllable (precise layout, object positioning, style) and more efficient (faster generation, lower cost). Emerging techniques include latent diffusion (generate in compressed space, faster), image-to-image edits (precise modifications without regenerating), and multi-scale generation (fast rough pass, then high-quality refinement). DALL-E 3 will likely incorporate these improvements in future versions. For teams building image generation features, DALL-E 3 is today's best option for text-to-image; for image editing and refinement, specialized tools (Stable Diffusion inpainting, Photoshop Generative Fill) may be better complements. The combination of multiple specialized tools is becoming standard practice.