Run Stable Diffusion, SDXL, FLUX, and video generation models at any scale. Velar provisions GPUs on demand, handles model loading, and scales to zero between requests — so you only pay for actual inference time.
Image generation workloads are bursty by nature — you might need to generate 10 images per hour or 10,000. Velar's serverless model means you never over-provision or under-provision GPU capacity.
Velar works with the full Hugging Face diffusers ecosystem as well as custom pipelines and ComfyUI workflows.
Text-to-image (SDXL, SD 1.5)
Generate images from text prompts. Most common use case.
Image-to-image
Transform or edit existing images with diffusion.
ControlNet
Guided generation with depth maps, pose, or edge inputs.
Video generation (AnimateDiff, SVD)
Generate short video clips from text or image prompts.
Single image generation or bulk parallel workloads.
Single image — SDXL
import velar
app = velar.App("image-gen")
image = velar.Image.from_registry(
    "pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime"
).pip_install("diffusers", "transformers", "accelerate")

@app.function(gpu="L40S", image=image)
def generate_image(prompt: str, steps: int = 30):
    from diffusers import StableDiffusionXLPipeline
    import torch, io

    pipe = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        torch_dtype=torch.float16,
        use_safetensors=True,
    ).to("cuda")
    result = pipe(prompt, num_inference_steps=steps)

    buf = io.BytesIO()
    result.images[0].save(buf, format="PNG")
    return buf.getvalue()
app.deploy()
png_bytes = generate_image.remote("a cyberpunk city at sunset")

Bulk generation — parallel GPUs
import velar
app = velar.App("bulk-generation")
image = velar.Image.from_registry(
    "pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime"
).pip_install("diffusers", "transformers", "accelerate")

@app.function(gpu="L4", image=image)
def generate_batch(prompts: list[str]):
    from diffusers import StableDiffusionPipeline
    import torch, io

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",
        torch_dtype=torch.float16,
    ).to("cuda")
    # The pipeline accepts a list of prompts and returns one image per prompt.
    images = pipe(prompts).images

    encoded = []
    for img in images:
        buf = io.BytesIO()
        img.save(buf, format="PNG")
        encoded.append(buf.getvalue())
    return encoded
# Generate 500 images across multiple GPUs in parallel
prompts = load_prompts() # your list
chunks = [prompts[i:i+4] for i in range(0, len(prompts), 4)]
results = [generate_batch.remote(chunk) for chunk in chunks]

Diffusion models are VRAM-intensive. Use the smallest GPU that fits your model.
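As a rough rule of thumb, an fp16 checkpoint occupies about two bytes per parameter, plus headroom for activations and intermediate latents. A back-of-the-envelope sketch (the 1.5× overhead factor is our assumption, not a published Velar figure):

```python
def min_vram_gb(num_params: float, bytes_per_param: int = 2, overhead: float = 1.5) -> float:
    """Rough VRAM floor for inference: weight bytes times an activation-headroom factor."""
    return num_params * bytes_per_param * overhead / 1e9

# SDXL's UNet, text encoders, and VAE total roughly 3.5B parameters:
print(min_vram_gb(3.5e9))  # 10.5 — comfortably inside an L40S's 48 GB
```

Batch size, resolution, and ControlNet adapters all push the real requirement above this floor, which is why the table below leaves generous margins.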
| Model | Recommended GPU | Notes |
|---|---|---|
| Stable Diffusion XL (SDXL) | L40S 48GB | Best quality, higher VRAM |
| Stable Diffusion 1.5 / 2.1 | L4 24GB | Fast, cost-effective |
| FLUX.1 | H100 80GB | State of the art quality |
| AnimateDiff / SVD | A100 80GB | Video generation |
| ControlNet variants | L4 24GB | Guided image generation |
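If a service routes requests across several models, the table above can be encoded as a lookup for the `gpu=` argument. A minimal sketch — the mapping keys and fallback choice are illustrative, not a Velar API:

```python
# GPU picks from the table above, keyed by Hugging Face model ID (illustrative).
GPU_FOR_MODEL = {
    "stabilityai/stable-diffusion-xl-base-1.0": "L40S",
    "runwayml/stable-diffusion-v1-5": "L4",
    "black-forest-labs/FLUX.1-dev": "H100",
    "stabilityai/stable-video-diffusion-img2vid": "A100",
}

def recommended_gpu(model_id: str, default: str = "L4") -> str:
    """Fall back to the cheapest listed GPU for models not in the table."""
    return GPU_FOR_MODEL.get(model_id, default)

print(recommended_gpu("stabilityai/stable-diffusion-xl-base-1.0"))  # L40S
```

The returned string can be passed straight to the `gpu=` parameter of `@app.function`, so the routing logic stays in one place.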