NVIDIA H100 SXM 80GB on-demand from $3.77/hr. Per-second billing, scale to zero, no idle costs. Deploy your first workload in under 60 seconds.
3.35 TB/s of memory bandwidth, roughly 64% more than the A100 80GB's 2.0 TB/s. Serve more requests per second with lower latency. Ideal for production APIs under heavy load.
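For the curious, the 64% figure is simple arithmetic against the published specs (assuming the A100 80GB SXM's 2,039 GB/s peak bandwidth as the baseline):

```python
# Back-of-envelope check of the bandwidth comparison.
# Published peak memory bandwidth in GB/s.
h100_bw = 3350  # H100 SXM
a100_bw = 2039  # A100 80GB SXM (assumed baseline)

speedup = h100_bw / a100_bw - 1
print(f"H100 is {speedup:.0%} faster")  # → "H100 is 64% faster"
```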
NVLink interconnect enables fast multi-GPU communication. Run distributed training jobs across multiple H100s with near-linear scaling.
When every millisecond matters, the H100 SXM delivers. Deploy mission-critical inference APIs with the fastest available GPU on Velar.
| Spec | Value |
| --- | --- |
| GPU | NVIDIA H100 SXM 80GB |
| VRAM | 80 GB |
| Architecture | Hopper |
| CUDA Cores | 16,896 |
| Tensor Cores | 528 |
| Memory Bandwidth | 3,350 GB/s |
Serverless Jobs
Scale to zero · per-second billing
$3.77/hr
Persistent Endpoint
Always-on · flat monthly rate
$2,500/mo
Pro plan required · See plans
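Which billing mode is cheaper depends on utilization. A quick break-even sketch using the list prices above (a 30-day month assumed):

```python
# Break-even between per-second serverless billing and a flat monthly endpoint.
serverless_rate = 3.77      # $/hr, billed per second
persistent_monthly = 2500   # $/mo flat

breakeven_hours = persistent_monthly / serverless_rate
hours_in_month = 24 * 30

print(f"Break-even: {breakeven_hours:.0f} GPU-hours/month "
      f"({breakeven_hours / hours_in_month:.0%} utilization)")
```

Below roughly 663 GPU-hours a month (about 92% utilization), serverless comes out cheaper; above that, the flat-rate endpoint wins.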
No Dockerfile. No Kubernetes. Just a Python decorator and a GPU string.
import velar

app = velar.App("h100-inference")

image = velar.Image.from_registry(
    "pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime"
).pip_install("vllm")

@app.function(gpu="H100", image=image)
def serve(prompt: str) -> str:
    from vllm import LLM, SamplingParams

    # Llama 3 8B fits comfortably in a single H100's 80 GB of VRAM.
    # (The 70B model needs multiple GPUs: raise tensor_parallel_size
    # and request more H100s.)
    llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct", tensor_parallel_size=1)
    params = SamplingParams(max_tokens=1024, temperature=0.7)
    output = llm.generate([prompt], params)
    return output[0].outputs[0].text

app.deploy()
print(serve.remote("Write a technical overview of GPU memory bandwidth"))

Start with $10 in free GPU credits. No credit card required. First workload live in under 60 seconds.