NVIDIA RTX 4090 24GB on-demand from $1.00/hr. Per-second billing, scale to zero, no idle costs. Deploy your first workload in under 60 seconds.
With 16,384 CUDA cores and fourth-generation Tensor Cores, the 4090 is excellent for fast token generation. Run 7–13B models with low latency at $1.00/hr.
1 TB/s+ memory bandwidth enables fast Stable Diffusion and SDXL generation. Ideal for high-throughput image workloads.
24 GB VRAM is enough for fine-tuning 7–13B models with quantization. Pay per second for training runs — no idle cost.
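A back-of-envelope check on that VRAM claim (a rough sketch, not a Velar API example): weight memory is roughly parameter count × bytes per parameter, so a 13B model quantized to 4 bits needs about 6.5 GB for weights, leaving room in 24 GB for adapters, activations, and optimizer state — while full fp16 weights for the same model would not fit.

```python
def weight_memory_gb(params_billions: float, bits_per_param: float) -> float:
    """Approximate model weight memory in GB: parameters x bytes per parameter."""
    return params_billions * 1e9 * (bits_per_param / 8) / 1e9

# 13B model quantized to 4-bit: ~6.5 GB of weights, well under 24 GB
print(weight_memory_gb(13, 4))   # 6.5

# The same model in fp16: ~26 GB of weights alone -- exceeds 24 GB VRAM
print(weight_memory_gb(13, 16))  # 26.0
```

This ignores activation and optimizer memory, which is why quantized fine-tuning (e.g. LoRA-style adapters on 4-bit weights) is the practical path at this VRAM size.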
| Spec | Value |
| --- | --- |
| GPU | NVIDIA RTX 4090 24GB |
| VRAM | 24 GB |
| Architecture | Ada Lovelace |
| CUDA Cores | 16,384 |
| Tensor Cores | 512 |
| Memory Bandwidth | 1,008 GB/s |
| Plan | Billing | Price |
| --- | --- | --- |
| Serverless Jobs | Scale to zero · per-second billing | $1.00/hr |
| Persistent Endpoint | Always-on · flat monthly rate | $600/mo |
Pro plan required · See plans
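The two pricing modes imply a simple break-even (using the prices above): at $1.00/hr billed per second, the $600/mo persistent endpoint only becomes cheaper once you run more than 600 GPU-hours a month. A quick sketch:

```python
SERVERLESS_PER_HOUR = 1.00    # $/hr, billed per second
ENDPOINT_PER_MONTH = 600.00   # $/mo, flat rate

# Monthly GPU-hours at which the flat endpoint becomes cheaper
break_even_hours = ENDPOINT_PER_MONTH / SERVERLESS_PER_HOUR
print(break_even_hours)  # 600.0

# Per-second billing in practice: a 90-second inference job
job_seconds = 90
job_cost = job_seconds / 3600 * SERVERLESS_PER_HOUR
print(round(job_cost, 4))  # 0.025 -> 2.5 cents
```

Below roughly 600 hours/month of actual GPU time, serverless jobs with scale-to-zero are the cheaper option; sustained 24/7 traffic favors the persistent endpoint.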
No Dockerfile. No Kubernetes. Just a Python decorator and a GPU string.
import velar

app = velar.App("rtx-inference")

# Container image: official PyTorch CUDA runtime plus Hugging Face libraries
image = velar.Image.from_registry(
    "pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime"
).pip_install("transformers", "accelerate")

@app.function(gpu="RTX4090", image=image)
def generate(prompt: str) -> str:
    from transformers import pipeline

    pipe = pipeline("text-generation", model="mistralai/Mistral-7B-v0.1")
    return pipe(prompt, max_new_tokens=256)[0]["generated_text"]

app.deploy()
print(generate.remote("Explain transformers in one paragraph"))

Start with $10 in free GPU credits. No credit card required. First workload live in under 60 seconds.