pip install velar-sdk  ·  $10 free credits

Your Python function. Any GPU. In minutes.

No Dockerfile. No Kubernetes. No YAML. Decorate your function: Velar builds the container, provisions the GPU, and runs your workload, billed per second.

$10 free GPU credits
Per-second billing
A100 from $1.74/hr
import velar

app = velar.App("transcriber")
image = velar.Image.debian_slim().pip_install("faster-whisper")


@app.function(gpu="L4", image=image)
def transcribe(audio_url: str) -> str:
    from faster_whisper import WhisperModel

    model = WhisperModel("large-v3")
    segments, _ = model.transcribe(audio_url)
    # Segment texts carry their own leading spaces, so strip before joining.
    return " ".join(s.text.strip() for s in segments)


# Transcribe 1000 files in parallel on separate GPUs.
# audio_urls is your own list of audio file URLs.
results = list(transcribe.map(audio_urls))

Why it matters

The old way ships YAML.
Velar ships models.

This is the actual difference between setting up GPU infra yourself and using Velar.

Without Velar
Traditional GPU deployment
AWS / GCP / bare metal
# Dockerfile (~40 lines)
FROM nvidia/cuda:12.1.0-base-ubuntu22.04
RUN apt-get update && apt-get install -y python3 ...
COPY requirements.txt .
RUN pip install -r requirements.txt

# kubernetes.yaml (~80 lines)
apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      nodeSelector:
        cloud.google.com/gke-accelerator: ...

# + ECR push · IAM roles · VPC config · autoscaling ...
Setup time: 2–6 hours · Config lines: 200+
With Velar
Pure Python deployment
velar-sdk · pip install velar-sdk
import velar

app = velar.App("inference")
image = (
    velar.Image.from_registry("pytorch/pytorch:2.4.0-cuda12.4-cudnn9-runtime")
    .pip_install("transformers")
)


@app.cls(gpu="A100", image=image)  # model loads ONCE
class TextModel:
    def __enter__(self):
        from transformers import pipeline

        self.pipe = pipeline("text-generation", model="meta-llama/Llama-3.1-8B")


app.deploy()  # that's it
Setup time: < 60 seconds · Config lines: 0

Features

Built for ML engineers, not DevOps teams

01
@app.cls: Warm models, instant calls
Zero model reloads after the first call

Decorate your class with @app.cls. Its __enter__ method runs once when the container starts, so the model stays in GPU memory and is reused by every subsequent call to that container. No repeated model loads. No wasted compute.
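To make the load-once lifecycle concrete, here is a local sketch in plain Python with no Velar dependency. A `with` block plays the role of one container's lifetime: `__enter__` runs once, then many calls reuse the loaded object. The `generate` method, the `loads` counter, and the `str.upper` stand-in "model" are hypothetical illustrations, not Velar API.

```python
class TextModel:
    """Mimics a class decorated with @app.cls: heavy setup lives in __enter__."""

    loads = 0  # class-level counter: how many times the "model" was loaded

    def __enter__(self):
        # Stand-in for the expensive load (e.g. building a transformers pipeline).
        TextModel.loads += 1
        self.pipe = str.upper  # hypothetical "model": upper-cases its input
        return self

    def __exit__(self, exc_type, exc, tb):
        # Container shutdown: drop the model reference.
        self.pipe = None
        return False

    def generate(self, prompt: str) -> str:
        # Every call reuses the model loaded in __enter__.
        return self.pipe(prompt)


# One "container" lifetime: __enter__ runs once, then it serves many calls.
with TextModel() as model:
    outputs = [model.generate(p) for p in ["hello", "gpu", "world"]]

print(outputs, TextModel.loads)  # ['HELLO', 'GPU', 'WORLD'] 1
```

Three calls, one load: that gap is what keeping the model resident in GPU memory buys you.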

02
.map(): Parallel GPU fleet
50 GPUs in parallel, out of the box

Call .map() on any function with a list of inputs. Velar spreads the inputs across GPU containers (50 in parallel out of the box) and runs them concurrently. Embed 50k documents, transcribe 1000 audio files, or generate 10k images: same code, any scale.
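The fan-out semantics can be sketched locally with the standard library's `concurrent.futures`: inputs are spread over a bounded pool of workers and results come back in input order. Threads stand in for GPU containers here. `MAX_PARALLEL`, `fake_embed`, and `parallel_map` are hypothetical, and ordered results are an assumption modeled on Python's built-in `map`, not a statement about Velar's scheduler.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_PARALLEL = 50  # hypothetical cap, mirroring "50 GPUs in parallel"


def fake_embed(doc: str) -> int:
    # Stand-in for per-item GPU work; the toy "embedding" is the text length.
    return len(doc)


def parallel_map(fn, inputs, max_workers=MAX_PARALLEL):
    # Executor.map runs calls concurrently but yields results in input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fn, inputs))


docs = [f"document {i}" for i in range(1000)]
results = parallel_map(fake_embed, docs)
print(len(results), results[:3])  # 1000 [10, 10, 10]
```

With 1000 inputs and 50 workers, items queue and each worker picks up the next one as it finishes. The calling code stays the same at any input size.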

03
Volume.from_name(): Cache once, run forever
16 GB of Llama weights, downloaded exactly once

Declare a Volume and mount it on any function. Hugging Face models, datasets, and checkpoints are downloaded once to persistent storage and reused on every run. Never wait 5 minutes for a model download again.
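The cache-once idea behind a mounted Volume can be sketched in plain Python: check the persistent directory first and download only on a miss. In a real function the download step might be something like `huggingface_hub.snapshot_download(repo_id, local_dir=...)` writing into the mount path. Here a temporary directory plays the Volume so the sketch runs anywhere. `ensure_cached`, `fake_download`, and the `llama-3.1-8b` folder name are hypothetical.

```python
import tempfile
from pathlib import Path


def ensure_cached(volume_root: Path, name: str, download) -> Path:
    """Return the cached copy of `name`, downloading only if it is missing."""
    target = volume_root / name
    marker = target / ".complete"  # written last, so an interrupted download is retried
    if not marker.exists():
        target.mkdir(parents=True, exist_ok=True)
        download(target)
        marker.touch()
    return target


downloads = 0


def fake_download(dest: Path) -> None:
    # Stand-in for a multi-GB model download into the volume.
    global downloads
    downloads += 1
    (dest / "weights.bin").write_bytes(b"\x00" * 16)


with tempfile.TemporaryDirectory() as tmp:
    volume = Path(tmp)  # plays the role of the mounted Volume
    for _run in range(3):  # three separate "runs" of the function
        path = ensure_cached(volume, "llama-3.1-8b", fake_download)
    print(downloads, (path / "weights.bin").stat().st_size)  # 1 16
```

The completion marker matters. Without it, a run that died halfway through a download would leave a partial directory that later runs would mistake for a valid cache.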

04
Secret.from_name(): Zero credential leaks
Secrets never touch your source code

Store HF_TOKEN, API keys, or any other credential in Velar's encrypted vault. Reference it by name in your function; the value is injected at runtime and never appears in your code, image, or git history.
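Inside the function, the usual pattern is to read the credential at call time rather than hard-coding it. This sketch assumes the injected secret appears as an environment variable named `HF_TOKEN`. The text above does not say how Velar injects values, so treat that as an assumption. `require_secret` and `masked` are hypothetical helpers.

```python
import os


def require_secret(name: str) -> str:
    """Read an injected secret from the environment or fail with a clear error."""
    value = os.environ.get(name)
    if not value:
        # Name the missing secret, but never print or log a secret's value.
        raise RuntimeError(f"secret {name!r} is not set for this function")
    return value


def masked(value: str) -> str:
    """Log-safe form of a secret: everything but the last 4 characters hidden."""
    if len(value) <= 4:
        return "*" * len(value)
    return "*" * (len(value) - 4) + value[-4:]


# Demo only: simulate the runtime injection that would normally happen outside your code.
os.environ["HF_TOKEN"] = "hf_example_1234"
token = require_secret("HF_TOKEN")
print("using token", masked(token))
```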

Early days

Be one of the first to build on Velar

Velar is new, and that works in your favor. You get founder-level support, direct access to the team, and per-second pricing built to win you over, not to maximize margin.

Start for free

Pricing

GPU by the second.

No minimums. No reserved capacity. The price shown is what you pay.

GPU · VRAM · Per hour · Per second
L4 · 24 GB · $0.49 · $0.000135
RTX 4090 · 24 GB · $0.74 · $0.000205
L40S · 48 GB · $1.08 · $0.000299
A100 80GB (popular) · 80 GB · $1.74 · $0.000483
H100 PCIe · 80 GB · $2.99 · $0.000830
H100 SXM · 80 GB · $3.36 · $0.000934
H200 · 141 GB · $4.49 · $0.001247
All GPUs billed per second · no minimums
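Per-second billing means a job's cost is its runtime multiplied by the per-second rate in the table. A small sketch using the listed rates (the dictionary keys are just labels), with `Decimal` to avoid floating-point drift on sub-cent amounts. `job_cost` and `seconds_for_budget` are hypothetical helpers.

```python
from decimal import Decimal

# Per-second rates in USD, copied from the pricing table above.
RATE_PER_SECOND = {
    "L4": Decimal("0.000135"),
    "RTX 4090": Decimal("0.000205"),
    "L40S": Decimal("0.000299"),
    "A100 80GB": Decimal("0.000483"),
    "H100 PCIe": Decimal("0.000830"),
    "H100 SXM": Decimal("0.000934"),
    "H200": Decimal("0.001247"),
}


def job_cost(gpu: str, seconds: int) -> Decimal:
    """Cost of a job billed per second: runtime times the per-second rate."""
    return RATE_PER_SECOND[gpu] * seconds


def seconds_for_budget(gpu: str, budget: Decimal) -> int:
    """Whole GPU-seconds a budget covers at the listed rate."""
    return int(budget / RATE_PER_SECOND[gpu])


print(job_cost("A100 80GB", 90))                       # 0.043470
print(job_cost("A100 80GB", 3600))                     # 1.738800
print(seconds_for_budget("A100 80GB", Decimal("10")))  # 20703
```

On these rates, one A100-hour comes to $1.7388, which the table rounds to $1.74. The $10 signup credit covers about 20,700 A100-seconds, roughly 5.75 hours.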

FAQ

Common questions

Can I use any base Docker image?
Yes. Any public registry image works. Velar generates the Dockerfile from your function automatically, so most users never write one. If you need a custom base, pass it to Image.from_registry() and Velar layers your dependencies on top.
What if my job runs longer than expected?
Credits are reserved upfront based on a configurable timeout. When the job finishes or you cancel it, the unused portion is refunded immediately. You only pay for the seconds the GPU was actually running.
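The reserve-then-refund flow can be worked through with the listed A100 rate. This is a hypothetical model of the arithmetic only. `settle` and its parameters are illustrative, and it assumes a job is stopped at its timeout, which the answer above does not state explicitly.

```python
from decimal import Decimal

A100_PER_SECOND = Decimal("0.000483")  # from the pricing table


def settle(timeout_s: int, runtime_s: int, rate: Decimal = A100_PER_SECOND):
    """Return (reserved, charged, refunded) for one job.

    Assumption: the job is stopped at the timeout, so billed time never exceeds it.
    """
    reserved = rate * timeout_s             # held when the job starts
    billed_seconds = min(runtime_s, timeout_s)
    charged = rate * billed_seconds         # only the seconds the GPU actually ran
    refunded = reserved - charged           # returned when the job ends
    return reserved, charged, refunded


# A 1-hour timeout on an A100, but the job finishes after 10 minutes.
reserved, charged, refunded = settle(timeout_s=3600, runtime_s=600)
print(reserved, charged, refunded)  # 1.738800 0.289800 1.449000
```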
What happens after my $10 free credits?
Add a payment method and continue at the same per-second rates. No minimum spend, no reserved capacity, no contracts. You can set a hard monthly spend limit in the dashboard.
How does the image cache work?
Velar hashes the generated Dockerfile together with your handler code. If neither has changed since the last deploy, the build step is skipped entirely, so a warm redeploy typically takes under 15 seconds.
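A content-hash build cache like the one described can be sketched with `hashlib`. Hash the Dockerfile together with the handler source, and skip the build when that key was already built. Exactly what Velar feeds into its hash is not specified beyond those two inputs. `build_key`, `deploy`, and the in-memory `built_images` cache are hypothetical.

```python
import hashlib


def build_key(dockerfile: str, handler_source: str) -> str:
    """SHA-256 over the Dockerfile and handler code, length-prefixed to avoid ambiguity."""
    h = hashlib.sha256()
    for part in (dockerfile, handler_source):
        data = part.encode("utf-8")
        h.update(len(data).to_bytes(8, "big"))  # so ("ab", "c") differs from ("a", "bc")
        h.update(data)
    return h.hexdigest()


built_images = {}  # cache key -> image tag (stand-in for an image registry)
builds = 0


def deploy(dockerfile: str, handler_source: str) -> bool:
    """Build only when the content hash is new; return True if a build ran."""
    global builds
    key = build_key(dockerfile, handler_source)
    if key in built_images:
        return False  # cache hit: build step skipped
    builds += 1
    built_images[key] = f"image-{key[:12]}"
    return True


dockerfile = "FROM python:3.11-slim\nRUN pip install faster-whisper\n"
handler = "def transcribe(audio_url):\n    ...\n"

print(deploy(dockerfile, handler))             # True: first deploy builds
print(deploy(dockerfile, handler))             # False: nothing changed, build skipped
print(deploy(dockerfile, handler + "# v2\n"))  # True: handler changed
```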

Ship your first model in 60 seconds.

$10 in GPU credits when you sign up. Cancel at any time.