Available on Velar

Rent RTX 4090 GPU Cloud

NVIDIA RTX 4090 24GB on-demand from $1.00/hr. Per-second billing, scale to zero, no idle costs. Deploy your first workload in under 60 seconds.

VRAM: 24 GB
Architecture: Ada Lovelace
Per-second rate: $0.000278/sec

Best for RTX 4090 workloads

LLM inference

With 16,384 CUDA cores and just over 1 TB/s of memory bandwidth, the 4090 delivers fast token generation. Run 7–13B models with low latency at $1.00/hr.
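At $1.00/hr, per-token cost is easy to estimate. A minimal sketch, assuming an illustrative throughput of ~100 tokens/sec for a 7B model (an assumption for the arithmetic, not a measured benchmark):

```python
RATE_PER_SEC = 0.000278  # $1.00/hr billed to the second

def cost_per_million_tokens(tokens_per_sec: float) -> float:
    """Dollar cost to generate 1M tokens at a given throughput."""
    seconds = 1_000_000 / tokens_per_sec
    return round(seconds * RATE_PER_SEC, 2)

print(cost_per_million_tokens(100))  # ≈ $2.78 per million tokens
```

Double the throughput (batching, speculative decoding) and the per-token cost halves, since billing is purely time-based.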

Image generation

1 TB/s+ memory bandwidth enables fast Stable Diffusion and SDXL generation. Ideal for high-throughput image workloads.
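A minimal sketch of SDXL on a 24 GB card, assuming the Hugging Face diffusers library is installed (model name and parameters are illustrative, not Velar-specific):

```python
# Hypothetical sketch: SDXL inference with diffusers on a 24 GB GPU.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,  # fp16 weights keep SDXL well under 24 GB
)
pipe.to("cuda")

image = pipe(
    "a lighthouse at dusk, oil painting",
    num_inference_steps=30,
).images[0]
image.save("out.png")
```

With per-second billing, a batch of images that finishes in 40 seconds costs about a penny.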

Fine-tuning with LoRA/QLoRA

24 GB VRAM is enough for fine-tuning 7–13B models with quantization. Pay per second for training runs — no idle cost.
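A minimal QLoRA setup sketch, assuming the peft and bitsandbytes libraries alongside transformers (hyperparameters like `r=16` and the target modules are illustrative defaults, not tuned values):

```python
# Hypothetical sketch: 4-bit quantized base model + LoRA adapters (QLoRA).
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb = BitsAndBytesConfig(
    load_in_4bit=True,                       # base weights in 4-bit NF4
    bnb_4bit_compute_dtype=torch.bfloat16,   # compute in bf16
)
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    quantization_config=bnb,
    device_map="auto",
)
lora = LoraConfig(
    r=16, lora_alpha=32,
    target_modules=["q_proj", "v_proj"],     # adapt attention projections only
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # only the small adapter matrices train
```

The quantized 7B base fits in roughly 4–5 GB, leaving ample headroom in 24 GB for activations and optimizer state over the adapter weights.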

RTX 4090 specifications

GPU: NVIDIA RTX 4090 24GB
VRAM: 24 GB
Architecture: Ada Lovelace
CUDA Cores: 16,384
Tensor Cores: 512
Memory Bandwidth: 1,008 GB/s

RTX 4090 pricing on Velar

Serverless Jobs

Scale to zero · per-second billing

$1.00/hr

  • $0.000278/sec — billed to the second
  • Scale to zero when idle
  • No reserved capacity
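The per-second math works out like this quick sketch:

```python
RATE_PER_SEC = 0.000278  # $1.00/hr / 3600 seconds

def job_cost(seconds: int) -> float:
    """Cost of a job billed to the second at the RTX 4090 rate."""
    return round(seconds * RATE_PER_SEC, 4)

print(job_cost(180))   # a 3-minute job costs about $0.05
print(job_cost(3600))  # a full hour is $1.00 (to the tenth of a cent)
```

Because idle time scales to zero, a burst of short jobs costs only the seconds they actually run.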

Persistent Endpoint

Always-on · flat monthly rate

$600/mo

  • $0.83/hr effective — 17% cheaper than on-demand
  • Zero cold-start latency
  • Always available, guaranteed GPU

Pro plan required · See plans
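Choosing between the two tiers comes down to utilization. A quick break-even sketch (using ~730 hours in an average month):

```python
SERVERLESS_PER_HR = 1.00   # on-demand rate
PERSISTENT_PER_MO = 600.0  # flat monthly rate
HOURS_PER_MONTH = 730      # average month

breakeven_hours = PERSISTENT_PER_MO / SERVERLESS_PER_HR  # 600 GPU-hours
utilization = breakeven_hours / HOURS_PER_MONTH          # ≈ 82%

print(f"Break-even at {breakeven_hours:.0f} GPU-hours/month "
      f"({utilization:.0%} utilization)")
```

Below roughly 82% utilization, serverless is cheaper; above it, or when cold starts matter, the persistent endpoint wins.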

Deploy on RTX 4090 in 12 lines of Python

No Dockerfile. No Kubernetes. Just a Python decorator and a GPU string.

inference.py
import velar

app = velar.App("rtx-inference")
image = velar.Image.from_registry(
    "pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime"
).pip_install("transformers", "accelerate")

@app.function(gpu="RTX4090", image=image)
def generate(prompt: str) -> str:
    from transformers import pipeline
    pipe = pipeline("text-generation", model="mistralai/Mistral-7B-v0.1", torch_dtype="auto", device_map="auto")  # bf16 weights fit in 24 GB; fp32 would not
    return pipe(prompt, max_new_tokens=256)[0]["generated_text"]

app.deploy()
print(generate.remote("Explain transformers in one paragraph"))

Deploy on RTX 4090 today

Start with $10 in free GPU credits. No credit card required. First workload live in under 60 seconds.