GPU batch processing

Process millions of records with GPU-accelerated batch jobs. Velar automatically parallelizes your work across multiple GPU instances, handles retries, and scales back to zero when the job is done. Pay only for the compute time used.

How Velar parallelizes batch jobs

The pattern is simple: split your dataset into chunks, call your GPU function with .remote() for each chunk, and Velar runs all of them concurrently across as many GPU instances as needed. Results are returned when all chunks complete.

  • Concurrent execution across multiple GPU instances
  • Scale from 1 to hundreds of parallel workers
  • Scale to zero after the job completes
  • Per-second billing — no idle GPU cost
  • No job scheduler or queue to manage
  • Works with any Python data processing library
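The split/dispatch/gather pattern above can be sketched in plain Python. In this sketch a thread pool stands in for Velar's concurrent `.remote()` dispatch; `process_chunk` and `run_batch` are illustrative names, not part of the Velar API:

```python
from concurrent.futures import ThreadPoolExecutor

def chunk(items, size):
    """Split a dataset into fixed-size chunks (the last may be shorter)."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def process_chunk(records):
    # Placeholder for a GPU function; with Velar this would be a .remote() call.
    return [r.upper() for r in records]

def run_batch(records, chunk_size=4, workers=8):
    chunks = chunk(records, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(process_chunk, chunks)  # chunks processed concurrently
    return [item for batch in results for item in batch]  # flatten, order preserved
```

The same three steps (chunk, dispatch, flatten) appear in the full GPU examples below; only the worker changes.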

Common batch workloads

Semantic embeddings

Generate vector embeddings for large document corpora to power search, RAG pipelines, and recommendation systems.

Models: BGE, E5, all-MiniLM, OpenCLIP
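Once documents are embedded, search and RAG retrieval typically rank them by cosine similarity to a query vector. A minimal pure-Python sketch of that downstream step (no Velar-specific code):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_k(query_vec, doc_vecs, k=3):
    """Return indices of the k document vectors most similar to the query."""
    scored = sorted(enumerate(doc_vecs),
                    key=lambda p: cosine(query_vec, p[1]),
                    reverse=True)
    return [i for i, _ in scored[:k]]
```

In production this lookup is usually handled by a vector database, but the ranking principle is the same.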

Audio transcription

Transcribe large audio or video archives with Whisper large-v3. Process hours of audio per minute with parallelized GPU jobs.

Models: Whisper large-v3, Whisper medium

AI data labeling

Use vision models or LLMs to label, classify, or augment training data at scale before a fine-tuning run.

Models: LLaVA, InternVL, custom classifiers

Re-ranking & scoring

Run cross-encoders or reward models over large candidate sets for retrieval augmentation or RLHF data preparation.

Models: Cross-encoders, DeBERTa, custom
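The re-ranking step reduces to scoring (query, candidate) pairs and sorting best-first. A minimal sketch, where a toy token-overlap scorer stands in for a cross-encoder's forward pass; with Velar, scoring batches of pairs would be the GPU function called via `.remote()`:

```python
def rerank(query, candidates, score_fn, top_n=5):
    """Score each (query, candidate) pair and return candidates best-first."""
    scored = [(score_fn(query, c), c) for c in candidates]
    scored.sort(key=lambda p: p[0], reverse=True)  # stable: ties keep input order
    return [c for _, c in scored[:top_n]]

def overlap_score(query, text):
    # Toy scorer: shared-token count. A real pipeline would call a cross-encoder.
    return len(set(query.lower().split()) & set(text.lower().split()))
```

Swapping `overlap_score` for a batched cross-encoder call turns this into the GPU workload described above.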

Code examples

Two end-to-end examples: an embeddings pipeline and audio transcription at scale.

Embedding 100k documents

embed.py
import velar

app = velar.App("batch-embed")

image = velar.Image.from_registry(
    "pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime"
).pip_install("sentence-transformers")

@app.function(gpu="L4", image=image)
def embed_batch(texts: list[str]) -> list[list[float]]:
    from sentence_transformers import SentenceTransformer
    model = SentenceTransformer("BAAI/bge-large-en-v1.5")
    return model.encode(texts, batch_size=64).tolist()

# Embed 100k documents — Velar runs chunks in parallel across GPUs
def embed_all(documents: list[str]):
    app.deploy()
    chunk_size = 256
    chunks = [documents[i:i+chunk_size] for i in range(0, len(documents), chunk_size)]
    # All chunks run concurrently on separate GPU instances
    results = [embed_batch.remote(chunk) for chunk in chunks]
    return [vec for batch in results for vec in batch]

Whisper transcription pipeline

transcribe.py
import velar

app = velar.App("transcription")

image = velar.Image.from_registry(
    "pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime"
).pip_install("openai-whisper", "ffmpeg-python")

@app.function(gpu="L4", image=image)
def transcribe(audio_bytes: bytes, language: str = "en") -> str:
    import os
    import tempfile

    import whisper

    model = whisper.load_model("large-v3")
    with tempfile.NamedTemporaryFile(suffix=".mp3", delete=False) as f:
        f.write(audio_bytes)
        tmp_path = f.name

    try:
        result = model.transcribe(tmp_path, language=language)
    finally:
        os.unlink(tmp_path)  # remove the temp file even if transcription fails
    return result["text"]

app.deploy()
# Transcribe 1,000 audio files in parallel
audio_files = load_audio_files()  # your own loader, returning a list of audio bytes
transcripts = [transcribe.remote(audio) for audio in audio_files]

Process your first dataset on GPU

$10 in free GPU credits. No credit card required.