Free to start  ·  No credit card

Run a
Llama 3 server
in 12 lines
of Python.

No Dockerfile. No Kubernetes. No YAML. Write a function, call .deploy() — Velar builds the container, provisions the GPU, and keeps it running.

$10 free GPU credits
Per-second billing
A100 from $2.36/hr
inference.py — 12 lines to production GPU
import velar

app = velar.App("llama3-inference")
image = velar.Image.from_registry(
    "pytorch/pytorch:2.1.0-cuda12.1"
).pip_install("transformers", "accelerate")

@app.function(gpu="A100", image=image)
def generate(prompt: str) -> str:
    from transformers import pipeline
    pipe = pipeline("text-generation", model="meta-llama/Meta-Llama-3-8B")
    return pipe(prompt, max_new_tokens=512)[0]["generated_text"]

# One command. Velar does the rest.
app.deploy()

# Call from anywhere — runs on GPU.
print(generate.remote("Explain backpropagation"))
12
Lines of Python to production GPU
<60s
Median cold start time
$0.020
Cost of a 30s A100 job
7
GPU types from L4 to H200

Why it matters

The old way ships YAML.
Velar ships models.

This is the actual difference between setting up GPU infra yourself and using Velar.

Without Velar
Traditional GPU deployment
AWS / GCP / bare metal
# Dockerfile (~40 lines)
FROM nvidia/cuda:12.1.0-base-ubuntu22.04
RUN apt-get update && apt-get install -y python3 ...
COPY requirements.txt .
RUN pip install -r requirements.txt

# kubernetes.yaml (~80 lines)
apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      nodeSelector:
        cloud.google.com/gke-accelerator: ...

# + ECR push · IAM roles · VPC config · autoscaling ...
Setup time: 2–6 hours · Config lines: 200+
With Velar
Pure Python deployment
velar-sdk — pip install velar
import velar

app = velar.App("my-model")
image = velar.Image.from_registry(
    "pytorch/pytorch:2.1"
).pip_install("transformers")

@app.function(gpu="A100", image=image)
def run(prompt: str):
    ...

app.deploy()  # that's it
Setup time: < 60 seconds · Config lines: 0

Features

Built for ML engineers,
not DevOps teams

01
Pure Python. Zero config.
0 Dockerfiles required

Velar generates the container from your Python function. Specify your base image and pip packages inline — no Dockerfile, no YAML, no Kubernetes. If you know Python, you know Velar.

02
Pay per second. Billed to the cent.
$0.020 / 30s on A100

No hourly minimums. No reserved capacity. Your workload runs, you pay for the exact seconds it ran. A 30-second A100 job costs $0.020. Cancel anytime — unused credits refunded automatically.
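The arithmetic behind that number is just the hourly rate divided into seconds. A quick sanity check, using the published A100 rate of $2.36/hr:

```python
# Per-second billing: hourly rate / 3600 seconds, times seconds the job ran.
hourly_rate = 2.36                      # USD per hour, A100 80GB
job_seconds = 30
cost = hourly_rate / 3600 * job_seconds
print(f"${cost:.3f}")                   # → $0.020
```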

03
Instant cached redeploys.
0s build if code unchanged

Velar uses content-addressed image caching. If your code and dependencies haven't changed, the build step is skipped entirely. Iteration cycles that took 8 minutes take 4 seconds.

What engineers say

Trusted by ML engineers
who hate managing infra

I had a Llama 3 inference endpoint running in 47 seconds. I spent more time reading the README than setting it up.

DR
Daniel R.
ML Engineer · Series A startup

We replaced a 3-week AWS SageMaker setup with 2 hours of Velar. The per-second billing alone saved us $800 the first week.

AK
Ana K.
Research Lead · AI lab

Finally, a GPU service that doesn't require a DevOps hire just to run a fine-tuning job. The Python SDK is exactly what it should be.

TC
Thomas C.
Founding Engineer · LLM startup

Pricing

GPU by the second.

No minimums. No reserved capacity. The price shown is what you pay.

GPU                  VRAM      Per hour   Per second
L4                   24 GB     $0.66      $0.000183
RTX 4090             24 GB     $1.00      $0.000278
L40S                 48 GB     $1.46      $0.000406
A100 (POPULAR)       80 GB     $2.36      $0.000656
H100 PCIe            80 GB     $4.06      $0.001128
H100 SXM             80 GB     $4.57      $0.001269
H200                 141 GB    $6.10      $0.001694

All GPUs billed per-second. See full pricing breakdown →

FAQ

Common questions

Can I use any base Docker image?
Yes — any public registry image works. Velar generates the Dockerfile from your function automatically, so most users never write one. If you need a custom base, pass it to Image.from_registry() and Velar layers your deps on top.
What if my job runs longer than expected?
Credits are reserved upfront based on a configurable timeout. When the job finishes or you cancel it, the unused portion is refunded immediately. You only pay for the seconds the GPU was actually running.
What happens after my $10 free credits?
Add a payment method and continue at the same per-second rates. No minimum spend, no reserved capacity, no contracts. You can set a hard monthly spend limit in the dashboard.
How does the image cache work?
Velar uses a content hash of your Dockerfile and handler code. If nothing changed since the last deploy, the build step is skipped entirely, and the whole warm redeploy typically completes in under 15 seconds.
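To illustrate the general idea of content-addressed caching, here is a simplified sketch (not Velar's internal implementation): hash the build inputs, and reuse the stored image whenever the hash matches.

```python
import hashlib

def cache_key(dockerfile: str, handler_source: str) -> str:
    """Same build inputs always produce the same key."""
    h = hashlib.sha256()
    h.update(dockerfile.encode())
    h.update(handler_source.encode())
    return h.hexdigest()

_images: dict[str, str] = {}  # cache_key -> built image id

def build_or_reuse(dockerfile: str, handler_source: str) -> tuple[str, bool]:
    """Return (image_id, cache_hit). A hit skips the build entirely."""
    key = cache_key(dockerfile, handler_source)
    if key in _images:
        return _images[key], True
    image_id = "img-" + key[:12]  # stand-in for an actual container build
    _images[key] = image_id
    return image_id, False
```

Deploying twice with unchanged inputs returns the cached image on the second call; changing a single character in either input yields a new key and triggers a rebuild.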

Ship your
first model
in 60 seconds.

$10 in GPU credits when you sign up. No credit card required. Cancel at any time.