NVIDIA L4 24GB on-demand from $0.66/hr. Per-second billing, scale to zero, no idle costs. Deploy your first workload in under 60 seconds.
Run Stable Diffusion, SDXL, and video models. The L4 delivers great throughput for image workloads at the lowest cost per image on Velar.
Deploy Llama 3 8B, Mistral 7B, Qwen 2.5 7B, and similar models with vLLM or TGI. 24 GB of VRAM comfortably fits 7–8B parameter models in fp16; 13B-class models fit with 8-bit quantization.
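As a rough sizing check, model weights alone need about parameter count × bytes per parameter; this back-of-envelope sketch ignores KV cache and activation overhead, which real deployments must also budget for:

```python
def weight_gb(params_billion: float, bytes_per_param: float) -> float:
    """Approximate VRAM needed for model weights alone, in GB."""
    return params_billion * bytes_per_param

# On a 24 GB L4, at fp16 (2 bytes/param) and int8 (1 byte/param):
print(weight_gb(8, 2))   # 16.0 GB -- an 8B model in fp16 leaves headroom
print(weight_gb(13, 2))  # 26.0 GB -- 13B in fp16 is over budget
print(weight_gb(13, 1))  # 13.0 GB -- 13B quantized to int8 fits comfortably
```

The gap between weight size and total VRAM is what serving frameworks like vLLM use for the KV cache, so leaving a few GB of headroom directly improves batch throughput.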
Process large datasets cost-efficiently. Run Whisper, text embedding models, or custom inference pipelines at $0.000183/sec.
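The per-second rate lines up with the hourly price, and per-second billing means short jobs pay only for their runtime. A quick sanity check on the numbers above (the 5-minute job is an illustrative example, not a benchmark):

```python
per_second = 0.000183           # quoted per-second rate, USD
per_hour = per_second * 3600    # 3600 seconds per hour
print(round(per_hour, 4))       # 0.6588 -> listed as $0.66/hr

# A 5-minute transcription batch bills for 300 seconds, not a full hour:
job_cost = per_second * 5 * 60
print(round(job_cost, 4))       # 0.0549 -- about five and a half cents
```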
| Spec | Value |
| --- | --- |
| GPU | NVIDIA L4 |
| VRAM | 24 GB |
| Architecture | Ada Lovelace |
| CUDA Cores | 7,680 |
| Tensor Cores | 240 |
| Memory Bandwidth | 300 GB/s |
Serverless Jobs
Scale to zero · per-second billing
$0.66/hr
No Dockerfile. No Kubernetes. Just a Python decorator and a GPU string.
```python
import velar

app = velar.App("l4-inference")

image = velar.Image.from_registry(
    "pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime"
).pip_install("transformers", "accelerate")

@app.function(gpu="L4", image=image)
def run(prompt: str) -> str:
    from transformers import pipeline
    pipe = pipeline("text-generation", model="meta-llama/Llama-3.2-1B")
    return pipe(prompt)[0]["generated_text"]

app.deploy()
print(run.remote("Hello from L4!"))
```

Start with $10 in free GPU credits. No credit card required. First workload live in under 60 seconds.