Available on Velar

Rent L4 24GB GPU Cloud

NVIDIA L4 24GB on-demand from $0.66/hr. Per-second billing, scale to zero, no idle costs. Deploy your first workload in under 60 seconds.

VRAM: 24 GB
Architecture: Ada Lovelace
Per second: $0.000183/sec

Best for L4 24GB workloads

Image & video generation

Run Stable Diffusion, SDXL, and video models. The L4 delivers high throughput for diffusion workloads at the lowest cost per image on Velar.

Small LLM inference

Deploy Llama 3 8B, Mistral 7B, Qwen 2.5 7B, and similar models with vLLM or TGI. 24 GB of VRAM comfortably fits models up to ~8B parameters in fp16; 13B-class models fit with 8-bit quantization.
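As a rough back-of-envelope check (an illustrative sketch, not part of the Velar API), model weights need about 2 bytes per parameter in fp16 and 1 byte in int8, plus headroom for the KV cache and activations:

```python
def fits_in_vram(params_billions: float, bytes_per_param: float,
                 vram_gb: float = 24.0, overhead_gb: float = 4.0) -> bool:
    """Rough check: do the weights plus a fixed KV-cache/activation
    allowance fit in GPU memory? Real usage varies with batch size
    and context length, so treat this as an estimate only."""
    weights_gb = params_billions * bytes_per_param  # 1e9 params x bytes/param ~ GB
    return weights_gb + overhead_gb <= vram_gb

print(fits_in_vram(8, 2.0))   # Llama 3 8B in fp16: 16 GB weights + headroom -> fits
print(fits_in_vram(13, 2.0))  # 13B in fp16: 26 GB weights alone -> does not fit
print(fits_in_vram(13, 1.0))  # 13B in int8: 13 GB weights + headroom -> fits
```

The `overhead_gb` allowance is an assumption; long contexts or large batches need more.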

Batch embeddings & transcription

Process large datasets cost-efficiently. Run Whisper, text embedding models, or custom inference pipelines at $0.000183/sec.

L4 24GB specifications

GPU: NVIDIA L4 24GB
VRAM: 24 GB
Architecture: Ada Lovelace
CUDA Cores: 7,424
Tensor Cores: 232
Memory Bandwidth: 300 GB/s

L4 24GB pricing on Velar

Serverless Jobs

Scale to zero · per-second billing

$0.66/hr

  • $0.000183/sec — billed to the second
  • Scale to zero when idle
  • No reserved capacity
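The per-second rate is simply the hourly rate divided by 3,600, so job cost is easy to estimate. A standalone sketch using the rate quoted above:

```python
HOURLY_RATE = 0.66  # L4 24GB on-demand, USD/hr

def job_cost(seconds: float, hourly_rate: float = HOURLY_RATE) -> float:
    """Cost of a job under per-second billing: no minimum, no idle charge."""
    return seconds * hourly_rate / 3600

print(round(HOURLY_RATE / 3600, 6))  # per-second rate: 0.000183
print(round(job_cost(90), 4))        # a 90-second batch job: 0.0165 (USD)
```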

Deploy on L4 24GB in 12 lines of Python

No Dockerfile. No Kubernetes. Just a Python decorator and a GPU string.

inference.py
import velar

app = velar.App("l4-inference")
# Container image: PyTorch base plus the Python deps the function needs
image = velar.Image.from_registry(
    "pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime"
).pip_install("transformers", "accelerate")

@app.function(gpu="L4", image=image)
def run(prompt: str) -> str:
    # Imports inside the body resolve in the remote container, not locally
    from transformers import pipeline
    pipe = pipeline("text-generation", model="meta-llama/Llama-3.2-1B")
    return pipe(prompt)[0]["generated_text"]

app.deploy()
print(run.remote("Hello from L4!"))

Deploy on L4 24GB today

Start with $10 in free GPU credits. No credit card required. First workload live in under 60 seconds.