NVIDIA H100 SXM 80GB on-demand from $3.77/hr. Per-second billing, scale to zero, no idle costs. Deploy your first workload in under 60 seconds.
3.35 TB/s of memory bandwidth, roughly 64% more than the A100 80GB's 2.0 TB/s. Serve more requests per second with lower latency. Ideal for production APIs under heavy load.
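For the curious, the 64% figure is simple arithmetic against the published specs (assuming the A100 80GB SXM's 2,039 GB/s peak bandwidth as the baseline):

```python
# Back-of-envelope check of the bandwidth comparison.
# Published peak memory bandwidth in GB/s.
h100_bw = 3350  # H100 SXM
a100_bw = 2039  # A100 80GB SXM (assumed baseline)

speedup = h100_bw / a100_bw - 1
print(f"H100 is {speedup:.0%} faster")  # → "H100 is 64% faster"
```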
NVLink interconnect enables fast multi-GPU communication. Run distributed training jobs across multiple H100s with near-linear scaling.
When every millisecond matters, the H100 SXM delivers. Deploy mission-critical inference APIs with the fastest available GPU on Velar.
| Spec | Value |
| --- | --- |
| GPU | NVIDIA H100 SXM 80GB |
| VRAM | 80 GB |
| Architecture | Hopper |
| CUDA Cores | 16,896 |
| Tensor Cores | 528 |
| Memory Bandwidth | 3,350 GB/s |
Serverless Jobs
Scale to zero · per-second billing
$3.77/hr
Persistent Endpoint
Always-on · flat monthly rate
$2,500/mo
Pro plan required · See plans
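Which billing mode is cheaper depends on utilization. A quick break-even sketch using the list prices above (a 30-day month assumed):

```python
# Break-even between per-second serverless billing and a flat monthly endpoint.
serverless_rate = 3.77      # $/hr, billed per second
persistent_monthly = 2500   # $/mo flat

breakeven_hours = persistent_monthly / serverless_rate
hours_in_month = 24 * 30

print(f"Break-even: {breakeven_hours:.0f} GPU-hours/month "
      f"({breakeven_hours / hours_in_month:.0%} utilization)")
```

Below roughly 663 GPU-hours a month (about 92% utilization), serverless comes out cheaper; above that, the flat-rate endpoint wins.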
No Dockerfile. No Kubernetes. Just a Python decorator and a GPU string.
import velar

app = velar.App("h100-inference")

image = velar.Image.from_registry(
    "pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime"
).pip_install("vllm")

@app.function(gpu="H100", image=image)
def serve(prompt: str) -> str:
    from vllm import LLM, SamplingParams

    # Llama 3 8B fits comfortably in a single H100's 80 GB of VRAM.
    # (The 70B model needs multiple GPUs: raise tensor_parallel_size
    # and request more H100s.)
    llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct", tensor_parallel_size=1)
    params = SamplingParams(max_tokens=1024, temperature=0.7)
    output = llm.generate([prompt], params)
    return output[0].outputs[0].text

app.deploy()
print(serve.remote("Write a technical overview of GPU memory bandwidth"))

Start with $10 in free GPU credits. No credit card required. First workload live in under 60 seconds.