Free to start  ·  No credit card

Run a
Llama 3 server
in 12 lines
of Python.

No Dockerfile. No Kubernetes. No YAML. Write a function, call .deploy() — Velar builds the container, provisions the GPU, and keeps it running.

$10 free GPU credits
Per-second billing
A100 from $2.36/hr
inference.py — 12 lines to production GPU
import velar

app = velar.App("llama3-inference")
image = velar.Image.from_registry(
    "pytorch/pytorch:2.1.0-cuda12.1"
).pip_install("transformers", "accelerate")

@app.function(gpu="A100", image=image)
def generate(prompt: str) -> str:
    from transformers import pipeline
    pipe = pipeline("text-generation", model="meta-llama/Meta-Llama-3-8B")
    return pipe(prompt, max_new_tokens=512)[0]["generated_text"]

# One command. Velar does the rest.
app.deploy()

# Call from anywhere — runs on GPU.
print(generate.remote("Explain backpropagation"))
12
Lines of Python to production GPU
<60s
Median cold start time
$0.020
Cost of a 30s A100 job
7
GPU types from L4 to H200

Why it matters

The old way ships YAML.
Velar ships models.

This is the actual difference between setting up GPU infra yourself and using Velar.

Without Velar
Traditional GPU deployment
AWS / GCP / bare metal
# Dockerfile (~40 lines)
FROM nvidia/cuda:12.1.0-base-ubuntu22.04
RUN apt-get update && apt-get install -y python3 ...
COPY requirements.txt .
RUN pip install -r requirements.txt

# kubernetes.yaml (~80 lines)
apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      nodeSelector:
        cloud.google.com/gke-accelerator: ...

# + ECR push · IAM roles · VPC config · autoscaling ...
Setup time: 2–6 hours · Config lines: 200+
With Velar
Pure Python deployment
velar-sdk — pip install velar
import velar

app = velar.App("my-model")
image = velar.Image.from_registry(
    "pytorch/pytorch:2.1"
).pip_install("transformers")

@app.function(gpu="A100", image=image)
def run(prompt: str):
    ...

app.deploy()  # that's it
Setup time: < 60 seconds · Config lines: 0

Features

Built for ML engineers,
not DevOps teams

01
Pure Python. Zero config.
0 Dockerfiles required

Velar generates the container from your Python function. Specify your base image and pip packages inline — no Dockerfile, no YAML, no Kubernetes. If you know Python, you know Velar.

02
Pay per second. Billed to the cent.
$0.020 / 30s on A100

No hourly minimums. No reserved capacity. Your workload runs, you pay for the exact seconds it ran. A 30-second A100 job costs $0.020. Cancel anytime — unused credits refunded automatically.
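The arithmetic behind that number is just the hourly rate divided into seconds. A quick sanity check, using the published A100 rate of $2.36/hr:

```python
# Per-second billing: hourly rate / 3600 seconds, times seconds the job ran.
hourly_rate = 2.36                      # USD per hour, A100 80GB
job_seconds = 30
cost = hourly_rate / 3600 * job_seconds
print(f"${cost:.3f}")                   # → $0.020
```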

03
Instant cached redeploys.
0s build if code unchanged

Velar uses content-addressed image caching. If your code and dependencies haven't changed, the build step is skipped entirely. Iteration cycles that took 8 minutes take 4 seconds.

What engineers say

Trusted by ML engineers
who hate managing infra

I had a Llama 3 inference endpoint running in 47 seconds. I spent more time reading the README than setting it up.

DR
Daniel R.
ML Engineer · Series A startup

We replaced a 3-week AWS SageMaker setup with 2 hours of Velar. The per-second billing alone saved us $800 the first week.

AK
Ana K.
Research Lead · AI lab

Finally, a GPU service that doesn't require a DevOps hire just to run a fine-tuning job. The Python SDK is exactly what it should be.

TC
Thomas C.
Founding Engineer · LLM startup

Pricing

GPU by the second.

No minimums. No reserved capacity. The price shown is what you pay.

GPU                  VRAM      Per hour   Per second
L4                   24 GB     $0.66      $0.000183
RTX 4090             24 GB     $1.00      $0.000278
L40S                 48 GB     $1.46      $0.000406
A100 (POPULAR)       80 GB     $2.36      $0.000656
H100 PCIe            80 GB     $4.06      $0.001128
H100 SXM             80 GB     $4.57      $0.001269
H200                 141 GB    $6.10      $0.001694

All GPUs billed per-second. See full pricing breakdown →

FAQ

Common questions

Can I use any base Docker image?
Yes — any public registry image works. Velar generates the Dockerfile from your function automatically, so most users never write one. If you need a custom base, pass it to Image.from_registry() and Velar layers your deps on top.
What if my job runs longer than expected?
Credits are reserved upfront based on a configurable timeout. When the job finishes or you cancel it, the unused portion is refunded immediately. You only pay for the seconds the GPU was actually running.
What happens after my $10 free credits?
Add a payment method and continue at the same per-second rates. No minimum spend, no reserved capacity, no contracts. You can set a hard monthly spend limit in the dashboard.
How does the image cache work?
Velar uses a content hash of your Dockerfile and handler code. If nothing changed since the last deploy, the build step is skipped entirely, and the whole warm redeploy typically completes in under 15 seconds.
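To illustrate the general idea of content-addressed caching, here is a simplified sketch (not Velar's internal implementation): hash the build inputs, and reuse the stored image whenever the hash matches.

```python
import hashlib

def cache_key(dockerfile: str, handler_source: str) -> str:
    """Same build inputs always produce the same key."""
    h = hashlib.sha256()
    h.update(dockerfile.encode())
    h.update(handler_source.encode())
    return h.hexdigest()

_images: dict[str, str] = {}  # cache_key -> built image id

def build_or_reuse(dockerfile: str, handler_source: str) -> tuple[str, bool]:
    """Return (image_id, cache_hit). A hit skips the build entirely."""
    key = cache_key(dockerfile, handler_source)
    if key in _images:
        return _images[key], True
    image_id = "img-" + key[:12]  # stand-in for an actual container build
    _images[key] = image_id
    return image_id, False
```

Deploying twice with unchanged inputs returns the cached image on the second call; changing a single character in either input yields a new key and triggers a rebuild.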

Ship your
first model
in 60 seconds.

$10 in GPU credits when you sign up. No credit card required. Cancel at any time.