NVIDIA L4 24GB on-demand from $0.66/hr. Per-second billing, scale to zero, no idle costs. Deploy your first workload in under 60 seconds.
Run Stable Diffusion, SDXL, and video models. The L4 delivers great throughput for image workloads at the lowest cost per image on Velar.
Deploy Llama 3 8B, Mistral 7B, Qwen 2.5 7B, and similar models with vLLM or TGI. 24 GB of VRAM comfortably fits 7–8B parameter models in fp16; 13B-class models fit with 8-bit quantization.
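As a rough sizing check, model weights alone need about parameter count × bytes per parameter; this back-of-envelope sketch ignores KV cache and activation overhead, which real deployments must also budget for:

```python
def weight_gb(params_billion: float, bytes_per_param: float) -> float:
    """Approximate VRAM needed for model weights alone, in GB."""
    return params_billion * bytes_per_param

# On a 24 GB L4, at fp16 (2 bytes/param) and int8 (1 byte/param):
print(weight_gb(8, 2))   # 16.0 GB -- an 8B model in fp16 leaves headroom
print(weight_gb(13, 2))  # 26.0 GB -- 13B in fp16 is over budget
print(weight_gb(13, 1))  # 13.0 GB -- 13B quantized to int8 fits comfortably
```

The gap between weight size and total VRAM is what serving frameworks like vLLM use for the KV cache, so leaving a few GB of headroom directly improves batch throughput.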
Process large datasets cost-efficiently. Run Whisper, text embedding models, or custom inference pipelines at $0.000183/sec.
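The per-second rate lines up with the hourly price, and per-second billing means short jobs pay only for their runtime. A quick sanity check on the numbers above (the 5-minute job is an illustrative example, not a benchmark):

```python
per_second = 0.000183           # quoted per-second rate, USD
per_hour = per_second * 3600    # 3600 seconds per hour
print(round(per_hour, 4))       # 0.6588 -> listed as $0.66/hr

# A 5-minute transcription batch bills for 300 seconds, not a full hour:
job_cost = per_second * 5 * 60
print(round(job_cost, 4))       # 0.0549 -- about five and a half cents
```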
| Spec | Value |
| --- | --- |
| GPU | NVIDIA L4 |
| VRAM | 24 GB |
| Architecture | Ada Lovelace |
| CUDA Cores | 7,680 |
| Tensor Cores | 240 |
| Memory Bandwidth | 300 GB/s |
Serverless Jobs
Scale to zero · per-second billing
$0.66/hr
No Dockerfile. No Kubernetes. Just a Python decorator and a GPU string.
```python
import velar

app = velar.App("l4-inference")

image = velar.Image.from_registry(
    "pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime"
).pip_install("transformers", "accelerate")

@app.function(gpu="L4", image=image)
def run(prompt: str) -> str:
    from transformers import pipeline
    pipe = pipeline("text-generation", model="meta-llama/Llama-3.2-1B")
    return pipe(prompt)[0]["generated_text"]

app.deploy()
print(run.remote("Hello from L4!"))
```

Start with $10 in free GPU credits. No credit card required. First workload live in under 60 seconds.