Available on Velar

Rent H100 SXM GPU Cloud

NVIDIA H100 SXM 80GB on-demand from $3.77/hr. Per-second billing, scale to zero, no idle costs. Deploy your first workload in under 60 seconds.

VRAM: 80 GB
Architecture: Hopper
Per second: $0.001047/sec

Best for H100 SXM workloads

High-throughput LLM serving

3.35 TB/s memory bandwidth — 64% faster than the A100. Serve more requests per second with lower latency. Ideal for production APIs under heavy load.
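The 64% figure comes from comparing peak memory bandwidth: the A100 SXM 80GB tops out at 2,039 GB/s versus 3,350 GB/s for the H100 SXM. A quick sanity check:

```python
# Peak memory bandwidth in GB/s (vendor-published specs).
H100_SXM_BANDWIDTH = 3350
A100_SXM_80GB_BANDWIDTH = 2039

# Relative speedup of H100 SXM over A100 SXM 80GB.
speedup = H100_SXM_BANDWIDTH / A100_SXM_80GB_BANDWIDTH - 1
print(f"{speedup:.0%}")  # ~64%
```

Memory bandwidth is the binding constraint for LLM decoding, so this ratio translates fairly directly into tokens-per-second throughput.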

Multi-GPU training runs

NVLink interconnect enables fast multi-GPU communication. Run distributed training jobs across multiple H100s with linear scaling.

Low-latency production inference

When every millisecond matters, the H100 SXM delivers. Deploy mission-critical inference APIs with the fastest available GPU on Velar.

H100 SXM specifications

GPU: NVIDIA H100 SXM 80GB
VRAM: 80 GB
Architecture: Hopper
CUDA Cores: 16,896
Tensor Cores: 528
Memory Bandwidth: 3,350 GB/s

H100 SXM pricing on Velar

Serverless Jobs

Scale to zero · per-second billing

$3.77/hr

  • $0.001047/sec — billed to the second
  • Scale to zero when idle
  • No reserved capacity
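With per-second billing, a job's cost is just its duration times the per-second rate. A small sketch using the rates above:

```python
RATE_PER_SEC = 0.001047  # H100 SXM serverless rate, $/sec

def job_cost(seconds: float) -> float:
    """Dollar cost of a job billed to the second."""
    return seconds * RATE_PER_SEC

print(f"${job_cost(90):.4f}")    # a 90-second inference burst
print(f"${job_cost(3600):.2f}")  # one full hour, ~$3.77
```

A short burst costs pennies, and because instances scale to zero, idle time between bursts costs nothing.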

Persistent Endpoint

Always-on · flat monthly rate

$2,500/mo

  • $3.47/hr effective — 8% cheaper than on-demand
  • Zero cold-start latency
  • Always available, guaranteed GPU

Pro plan required · See plans
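Which plan is cheaper depends on utilization: the flat monthly rate wins once your GPU-hours per month pass the break-even point. A rough sketch from the rates above:

```python
ON_DEMAND_RATE = 3.77   # $/hr, serverless on-demand
MONTHLY_FLAT = 2500.0   # $/mo, persistent endpoint

# Monthly usage at which the flat rate becomes cheaper than on-demand.
break_even_hours = MONTHLY_FLAT / ON_DEMAND_RATE
print(f"{break_even_hours:.0f} hours/month")  # ~663, i.e. ~92% of a 720-hour month
```

In practice: bursty or development workloads favor serverless, while an endpoint serving traffic around the clock favors the persistent plan (which also eliminates cold starts).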

Deploy on H100 SXM in 14 lines of Python

No Dockerfile. No Kubernetes. Just a Python decorator and a GPU string.

inference.py
import velar

app = velar.App("h100-inference")
image = velar.Image.from_registry(
    "pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime"
).pip_install("vllm")

@app.function(gpu="H100", image=image)
def serve(prompt: str) -> str:
    from vllm import LLM, SamplingParams
    # 8B fits comfortably in 80 GB of VRAM; a 70B model in fp16 (~140 GB of
    # weights) would need multiple GPUs and tensor_parallel_size > 1.
    llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct")
    params = SamplingParams(max_tokens=1024, temperature=0.7)
    output = llm.generate([prompt], params)
    return output[0].outputs[0].text

app.deploy()
print(serve.remote("Write a technical overview of GPU memory bandwidth"))

Deploy on H100 SXM today

Start with $10 in free GPU credits. No credit card required. First workload live in under 60 seconds.