How We Work — TensorTune

Scope & intro call

Free

We start with a conversation. You tell us the model(s), the hardware target, and the metric that's actually hurting — latency, throughput, cost per request, VRAM headroom. We define what "better" means in numbers, sign an NDA, and agree on the exact scope of the audit. No charge, no obligation.

You get → A defined scope, an NDA, and a fixed price & timeline for the audit.

Inference audit

Paid · fixed price

We capture a reproducible baseline, then profile your pipeline end-to-end on your hardware or ours. Using Nsight Systems, Nsight Compute, runtime telemetry and workload-specific load tests, we measure where time and hardware budget actually go — prefill vs decode, TTFT and inter-token latency, p50/p95/p99, KV-cache pressure, memory bandwidth, batching and CPU/GPU offload. No guessing; everything is instrumented.

Fixed price and fixed timeline, agreed before we begin
Typically 1–2 weeks depending on pipeline complexity
Load-tested at 1, 2, 4, 8 and 10+ concurrent clients
Run on your infrastructure, or on a rented instance we manage

You get → A complete, measured map of your performance bottlenecks.
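To make the latency terms above concrete, here is a minimal sketch of how TTFT, inter-token latency and p50/p95/p99 could be derived from per-token timestamps. It is an illustration under stated assumptions, not our audit tooling: the function names, the `(sent_at, token_times)` record shape and the nearest-rank percentile method are all hypothetical choices made for this example.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def request_metrics(sent_at, token_times):
    """Return (TTFT, mean inter-token latency) for one streamed request.

    sent_at     -- time the request was sent (assumed record field)
    token_times -- arrival time of each generated token, in order
    """
    ttft = token_times[0] - sent_at
    gaps = [later - earlier for earlier, later in zip(token_times, token_times[1:])]
    itl = sum(gaps) / len(gaps) if gaps else 0.0
    return ttft, itl


def summarize(requests):
    """Aggregate per-request metrics into p50/p95/p99 for TTFT and inter-token latency."""
    ttfts, itls = [], []
    for sent_at, token_times in requests:
        ttft, itl = request_metrics(sent_at, token_times)
        ttfts.append(ttft)
        itls.append(itl)
    return {
        name: {f"p{p}": percentile(samples, p) for p in (50, 95, 99)}
        for name, samples in (("ttft", ttfts), ("itl", itls))
    }
```

Calling `summarize` with a list of `(sent_at, token_times)` pairs returns a small dictionary keyed by metric and percentile. Tail percentiles are only meaningful with enough samples, so a real run would collect far more requests than a toy example.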

Findings, roadmap & quote

Included in audit

You receive a written report, not a slide deck: a benchmark matrix with reproducible test scripts, ranked bottlenecks with measured numbers, and the projected latency and cost improvement for each fix. Critically, it also includes a fixed quote and timeline for the optimization work itself, so you decide based on measurements rather than guesswork.

Benchmark matrix + reproducible test scripts
Profiling report: CPU/GPU/kernel bottlenecks, ranked by priority
Runtime configuration recommendations for your target hardware

You get → A report your own team could act on, plus an exact price & schedule for the next stage.
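As a rough idea of what a reproducible test script behind a benchmark matrix might look like, the sketch below sweeps a few concurrency levels and records one row per level. Everything in it is an assumption made for illustration: `fake_request` is a hypothetical stand-in that only sleeps, and the row fields are example names rather than the format of an actual report.

```python
import asyncio
import time


async def fake_request(delay_s=0.01):
    """Hypothetical stand-in for one inference call; swap in a real client here."""
    start = time.perf_counter()
    await asyncio.sleep(delay_s)
    return time.perf_counter() - start


async def run_level(concurrency, requests_per_client=3):
    """Run `concurrency` clients in parallel, each sending its requests back to back."""

    async def client():
        return [await fake_request() for _ in range(requests_per_client)]

    start = time.perf_counter()
    per_client = await asyncio.gather(*(client() for _ in range(concurrency)))
    wall_s = time.perf_counter() - start

    latencies = sorted(lat for lats in per_client for lat in lats)
    return {
        "concurrency": concurrency,
        "requests": len(latencies),
        "mean_latency_s": sum(latencies) / len(latencies),
        "max_latency_s": latencies[-1],
        "throughput_rps": len(latencies) / wall_s,
    }


async def sweep(levels=(1, 2, 4, 8)):
    """Measure each concurrency level in turn so the levels do not interfere."""
    return [await run_level(level) for level in levels]


def build_matrix(levels=(1, 2, 4, 8)):
    """Synchronous entry point: one row of measurements per concurrency level."""
    return asyncio.run(sweep(levels))
```

Running `build_matrix()` yields a list of rows that can be printed or written to CSV. Running the levels one after another, rather than all at once, keeps each row attributable to a single load level.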

Optimization

Optional · separate engagement

If you choose to proceed, we execute the fixes against the agreed scope, price and timeline — runtime and quantization tuning, serving-layer and batching changes, kernel-level work, or fine-tuning where it's in scope. We validate with before/after benchmarks (TTFT, decode latency, throughput, VRAM, power, p95/p99) and check quality impact, then hand off a technical report, recommended configuration, reproducible scripts and a next-step roadmap.

You get → The optimized system, plus measured proof of the improvement.
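A before/after validation can be as simple as comparing two sets of measurements metric by metric. The sketch below shows one possible shape for that comparison. The metric names and the `LOWER_IS_BETTER` table are hypothetical, chosen only for this example, and any numbers you pass in would come from your own benchmark runs.

```python
# Assumed metric names for illustration; True means a lower value is an improvement.
LOWER_IS_BETTER = {
    "ttft_ms": True,
    "p95_latency_ms": True,
    "vram_gb": True,
    "throughput_tok_s": False,
}


def compare(before, after):
    """Return the percent change and an improved/not-improved flag for each metric."""
    report = {}
    for metric, lower_is_better in LOWER_IS_BETTER.items():
        b, a = before[metric], after[metric]
        change_pct = (a - b) / b * 100
        improved = a < b if lower_is_better else a > b
        report[metric] = {
            "before": b,
            "after": a,
            "change_pct": round(change_pct, 1),
            "improved": improved,
        }
    return report
```

Flagging direction per metric matters because a drop is good for latency and VRAM but bad for throughput. A metric that does not move is reported as not improved, which keeps the summary conservative.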

Why the audit is paid

The audit has standalone value.

It's senior engineering time spent measuring a real system, and the report is useful on its own. Even if you never hire us for the optimization, your team walks away knowing exactly where performance and cost are being lost, and what each fix is worth.

If you proceed to the optimization stage, the full audit fee is credited toward that engagement. So the audit effectively becomes a no-risk first step.

Audit price: Fixed quote

Timeline: 1–2 weeks

Deliverable: Written report + roadmap

If you continue: Fee credited

What each side brings

What we need from you

Access to the model(s) and the inference pipeline, or a representative reproduction
The hardware target you deploy on (or want to)
Your targets: latency, concurrency, context length and the metric that matters most
Any privacy or compliance constraints on data and deployment
A point of contact who knows the system

What you walk away with

A benchmark matrix and reproducible test scripts
A measured, ranked list of bottlenecks — no guesswork
Projected savings or speed-up for every recommended fix
A fixed price and timeline for the optimization work
A report your own engineers can execute, even without us

No surprises. Just measured work.

Know exactly what your inference is costing you.
