// HOW WE WORK · A TRANSPARENT, MEASURED PROCESS

No surprises.
Just measured work.

Every engagement follows the same predictable path. We agree on scope before we touch anything, the audit is fixed-price and paid, and you only commit to optimization work once you've seen the numbers and a firm quote. You always know what you're paying for and what you'll get back.

01

Scope & intro call

Free

We start with a conversation. You tell us the model(s), the hardware target, and the metric that's actually hurting — latency, throughput, cost per request, VRAM headroom. We define what "better" means in numbers, sign an NDA, and agree on the exact scope of the audit. No charge, no obligation.

You get → A defined scope, an NDA, and a fixed price & timeline for the audit.

02

Inference audit

We capture a reproducible baseline, then profile your pipeline end-to-end on your hardware or ours. Using Nsight Systems, Nsight Compute, runtime telemetry and workload-specific load tests, we measure where time and hardware budget actually go: prefill vs decode, time to first token (TTFT) and inter-token latency, p50/p95/p99 latency, KV-cache pressure, memory bandwidth, batching, and CPU/GPU offload. No guessing; everything is instrumented.

  • Fixed price and fixed timeline, agreed before we begin
  • Typically 1–2 weeks, depending on pipeline complexity
  • Load-tested at 1, 2, 4, 8 and 10+ concurrent clients
  • Run on your infrastructure, or on a rented instance we manage
You get → A complete, measured map of your performance bottlenecks.
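To make the latency terms concrete: TTFT is the gap between sending a request and receiving its first streamed token, and inter-token latency is the gap between consecutive tokens. The minimal Python sketch below shows how p50/p95/p99 figures fall out of those timestamps. It assumes a load-test client that records a send time and per-token arrival times; the `RequestTrace` structure and the helper names are hypothetical illustrations, not our production tooling, and the nearest-rank percentile definition is one choice among several.

```python
import math
from dataclasses import dataclass


@dataclass
class RequestTrace:
    """Client-side timestamps (seconds) for one streamed request.

    Hypothetical structure for illustration: a real load-test client
    would record these as tokens arrive from the server.
    """
    sent: float               # time the request was sent
    token_times: list[float]  # arrival time of each streamed output token

    @property
    def ttft(self) -> float:
        # Time to first token: first token arrival minus send time.
        return self.token_times[0] - self.sent

    @property
    def inter_token_latencies(self) -> list[float]:
        # Gaps between consecutive streamed tokens (decode-phase latency).
        t = self.token_times
        return [later - earlier for earlier, later in zip(t, t[1:])]


def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile, with p in (0, 100]."""
    if not values:
        raise ValueError("percentile of an empty sample")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def summarize(traces: list[RequestTrace]) -> dict[str, float]:
    """p50/p95/p99 of TTFT and inter-token latency across all requests."""
    ttfts = [t.ttft for t in traces]
    itls = [gap for t in traces for gap in t.inter_token_latencies]
    summary = {}
    for p in (50, 95, 99):
        summary[f"ttft_p{p}"] = percentile(ttfts, p)
        summary[f"itl_p{p}"] = percentile(itls, p)
    return summary


if __name__ == "__main__":
    # Made-up timestamps for illustration, not measured data.
    traces = [
        RequestTrace(sent=0.0, token_times=[0.10, 0.20, 0.30]),
        RequestTrace(sent=1.0, token_times=[1.30, 1.35, 1.40]),
        RequestTrace(sent=2.0, token_times=[2.20, 2.30]),
    ]
    for name, value in summarize(traces).items():
        print(f"{name}: {value * 1000:.1f} ms")
```

With only a handful of samples, nearest-rank p95 and p99 both collapse onto the slowest request, which is one reason each concurrency level needs enough requests before tail latencies mean anything.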
03

Findings, roadmap & quote

You receive a written report, not a slide deck: a benchmark matrix with reproducible test scripts, ranked bottlenecks with measured numbers, and the projected latency and cost improvement for each fix. Critically, it also includes a firm, fixed quote and timeline for the optimization work itself, so you decide based on measurements rather than guesswork.

  • Benchmark matrix + reproducible test scripts
  • Profiling report: CPU/GPU/kernel bottlenecks, ranked by priority
  • Runtime configuration recommendations for your target hardware
You get → A report your own team could act on, plus an exact price & schedule for the next stage.

04

Optimization

Optional · separate engagement

If you choose to proceed, we carry out the fixes within the agreed scope, price and timeline: runtime and quantization tuning, serving-layer and batching changes, kernel-level work, or fine-tuning where it's in scope. We validate every change with before/after benchmarks (TTFT, decode latency, throughput, VRAM usage, power draw, p95/p99 latency) and check the impact on output quality, then hand off a technical report, the recommended configuration, reproducible scripts and a roadmap for next steps.

You get → The optimized system, plus measured proof of the improvement.
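Much of the before/after comparison comes down to simple arithmetic. The sketch below computes the signed percent change for each benchmark metric and turns a sustained throughput into cost per 1,000 requests. It assumes an hourly-billed instance that stays fully utilized; the function names and the example numbers are hypothetical illustrations, not client results.

```python
def cost_per_1k_requests(hourly_price: float, requests_per_second: float) -> float:
    """Instance cost per 1,000 requests at a sustained throughput.

    Assumes per-hour billing and a fully busy instance; idle time would
    push the real figure higher.
    """
    if requests_per_second <= 0:
        raise ValueError("throughput must be positive")
    requests_per_hour = requests_per_second * 3600
    return hourly_price / requests_per_hour * 1000


def percent_change(before: float, after: float) -> float:
    """Signed change from before to after, in percent (negative = reduction)."""
    if before == 0:
        raise ValueError("baseline must be non-zero")
    return (after - before) / before * 100


def compare(before: dict[str, float], after: dict[str, float]) -> dict[str, float]:
    """Percent change for every metric present in both benchmark runs."""
    return {name: percent_change(before[name], after[name])
            for name in before if name in after}


if __name__ == "__main__":
    # Hypothetical numbers for illustration only, not measured results.
    hourly_price = 4.00  # USD per instance-hour (assumed)
    baseline = {"ttft_p95_ms": 900.0, "throughput_rps": 12.0}
    optimized = {"ttft_p95_ms": 450.0, "throughput_rps": 30.0}

    for metric, delta in compare(baseline, optimized).items():
        print(f"{metric}: {delta:+.1f}%")
    for label, run in (("before", baseline), ("after", optimized)):
        cost = cost_per_1k_requests(hourly_price, run["throughput_rps"])
        print(f"cost per 1k requests ({label}): ${cost:.3f}")
```

Reporting the signed percent change per metric, rather than a single headline number, keeps trade-offs visible, for example a throughput gain that comes with a small latency regression.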
Why the audit is paid

The audit has standalone value.

It's senior engineering time spent measuring a real system — and the report is useful on its own. Even if you never hire us for the optimization, your team walks away knowing exactly where the performance and cost are leaking, and what each fix is worth.

If you proceed to the optimization stage, the full audit fee is credited toward that engagement, so the audit adds nothing to your total cost.

Audit price: Fixed quote
Timeline: 1–2 weeks
Deliverable: Written report + roadmap
If you continue: Fee credited

What each side brings

What we need from you

  • Access to the model(s) and the inference pipeline, or a representative reproduction
  • The hardware target you deploy on (or want to)
  • Your targets: latency, concurrency, context length and the metric that matters most
  • Any privacy or compliance constraints on data and deployment
  • A point of contact who knows the system

What you walk away with

  • A benchmark matrix and reproducible test scripts
  • A measured, ranked list of bottlenecks — no guesswork
  • Projected savings or speed-up for every recommended fix
  • A fixed price and timeline for the optimization work
  • A report your own engineers can execute, even without us

Know exactly what your
inference is costing you.

Start with a scope call. We'll tell you whether an audit makes sense — and if it doesn't, we'll say so.

Book an audit →