Latency is the only honest currency an execution engine has. A quote that arrives ten milliseconds late is, in expectation, the wrong quote. A risk check that takes a millisecond on a fast day takes ten on a hard one, and a hard day is the one you cannot afford to be slow on. For an institutional FX market maker, the entire economics of the desk live inside that observation.
This post is the engineering counterpart to The Architecture of a Fair Spread. The pricing post explained what we are pricing; this one explains how the underlying machine earns the right to price it that way. The audience here is the engineering and risk leads at counterparties evaluating Drovix — the people who want to see the boring boxes-and-arrows version before they trust the marketing.
Where the time goes
A useful exercise: account for every microsecond between a market-data update arriving at one of our edge nodes and a Drovix quote being placed in front of an institutional taker. The accounting, for our hot FX path, breaks down approximately as follows.
- Wire-arrival to kernel-bypass NIC: low single-digit microseconds, dictated by the venue line and the colocation cross-connect.
- NIC-to-process via Solarflare-style user-space stack: sub-microsecond, with care taken on cache pinning and NUMA affinity.
- Decode and book-state update inside the pricing process: low single-digit microseconds for a normalised tick on a major.
- ONNX model inference for skew adjustment: low tens of microseconds, on the same physical core as the tick handler.
- Risk pre-check and outbound encoding: low single-digit microseconds combined, including credit and exposure gating.
- Wire-departure to taker: low single-digit microseconds back through the same kernel-bypass path.
Summed, these stages come to tens of microseconds, so the end-to-end round trip fits comfortably inside the externally advertised typical figure of under 30 ms, and the internal pricing-to-quote leg is the tightest part of the budget. The slack in the overall envelope is intentional: it is the buffer that absorbs queue depth, congestion bursts, and the kind of CPI-release minute that turns competitor engines into oscillating panic loops.
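The accounting above can be written down as a small compile-time budget table. This is a minimal sketch: the per-stage nanosecond values are illustrative assumptions picked to sit inside the qualitative ranges in the list (they are not Drovix measurements), and the names `Stage`, `kHotPath`, `total_ns` and `slack_ns` are hypothetical.

```cpp
#include <array>
#include <cstdint>
#include <string_view>

// One stage of the hot path and its latency budget in nanoseconds.
struct Stage {
    std::string_view name;
    std::uint64_t budget_ns;
};

// ASSUMED values: each sits inside the range the text gives for that stage
// ("low single-digit microseconds", "sub-microsecond", "low tens").
inline constexpr std::array<Stage, 6> kHotPath{{
    {"wire arrival to NIC",      3'000},
    {"NIC to process",             800},
    {"decode and book update",   3'000},
    {"ONNX skew inference",     20'000},
    {"risk check and encode",    4'000},
    {"wire departure to taker",  3'000},
}};

// Advertised typical envelope from the text: under 30 ms.
inline constexpr std::uint64_t kEnvelopeNs = 30'000'000;

// Sum of all stage budgets.
constexpr std::uint64_t total_ns(const std::array<Stage, 6>& stages) {
    std::uint64_t sum = 0;
    for (const Stage& s : stages) sum += s.budget_ns;
    return sum;
}

// Slack left in the envelope for queueing and congestion bursts.
// Returns 0 if the stages already exceed the envelope.
constexpr std::uint64_t slack_ns(const std::array<Stage, 6>& stages) {
    const std::uint64_t used = total_ns(stages);
    return used > kEnvelopeNs ? 0 : kEnvelopeNs - used;
}

// A budget regression fails the build rather than surfacing in production.
static_assert(total_ns(kHotPath) < kEnvelopeNs, "hot path exceeds envelope");
```

Keeping the table `constexpr` means a change that pushes any stage over budget can be caught by a `static_assert` at compile time, which is one way to hold a team to the numbers it has written down.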

Architectural choices that actually buy you microseconds
Modern C++ on pinned cores
The Drovix engine is implemented in modern C++ — lock-free data structures, zero allocation on the hot path, deterministic teardown, and CPU isolation via pinned cores with interrupts masked off the trading cores. We do not run a Java tier in the price-construction loop and we do not put a Python service in front of latency-sensitive logic.
The reason is not language tribalism; it is that a garbage collector is a fundamentally non-deterministic adversary in a system whose tail latency is its product. Every minor GC is a stop-the-world event for the strategy that asked the question. You can engineer around GC pauses, but the engineering effort to do so reliably is greater than the engineering effort to write the code in a language that does not have them in the first place. Modern C++ — RAII, smart pointers, constexpr, [[likely]] hints — is more productive than its 2005 reputation, and the productivity argument against C++ no longer survives a serious 2026 engineering team.
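To make "lock-free data structures, zero allocation on the hot path" concrete, here is a minimal sketch of a single-producer, single-consumer ring buffer with preallocated storage. It is a textbook construction offered for illustration, not Drovix's implementation; the class name `SpscRing` and the 64-byte cache-line size are assumptions.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <optional>

// Single-producer, single-consumer ring. All storage lives inside the
// object, so push and pop never touch the allocator.
template <typename T, std::size_t Capacity>
class SpscRing {
    static_assert(Capacity > 0 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");

public:
    // Producer thread only. Returns false when full instead of blocking.
    bool try_push(const T& value) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (tail - head == Capacity) return false;  // ring is full
        slots_[tail & (Capacity - 1)] = value;
        tail_.store(tail + 1, std::memory_order_release);  // publish the slot
        return true;
    }

    // Consumer thread only. Returns std::nullopt when empty.
    std::optional<T> try_pop() {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head == tail) return std::nullopt;  // ring is empty
        T value = slots_[head & (Capacity - 1)];
        head_.store(head + 1, std::memory_order_release);  // free the slot
        return value;
    }

private:
    std::array<T, Capacity> slots_{};
    // Separate cache lines (64 bytes assumed) to avoid false sharing
    // between the producer's and the consumer's counters.
    alignas(64) std::atomic<std::size_t> head_{0};
    alignas(64) std::atomic<std::size_t> tail_{0};
};
```

A tick handler would call `try_push` and a downstream stage would call `try_pop`, each from its own pinned core. Neither call takes a lock, waits, or allocates, which is the property the paragraph above is arguing for.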
Aeron for transport
Inter-process and host-to-host messaging inside Drovix runs over Aeron, the reliable UDP and IPC transport built for low-latency electronic trading by a team that includes a co-creator of the LMAX Disruptor. Aeron gives us four properties we cannot get elsewhere without significant engineering investment:
- Reliable multicast within a single broadcast domain — one publisher, many subscribers, with the publisher unaware of the subscriber count.
- Archive-replay for deterministic crash recovery — every event is on disk in order, replayable bit-for-bit.
- Back-pressure semantics that survive a 10× burst without dropping a tick — slow consumers do not cause publishers to block on the fast path.
- Single-digit-microsecond IPC between processes on the same machine — fast enough that decomposing the engine into separate processes carries near-zero latency cost.
When a venue floods us, our slowest consumer does not become everyone else's problem. The publisher continues at line rate; the slow consumer falls behind and either catches up from the journal or is reset. This is the property that lets us run our compliance and analytics consumers off the same event stream as the pricing engine without compromising the pricing engine's latency.
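The slow-consumer behaviour described above can be illustrated with a toy model. The sketch below is deliberately single-threaded and is not Aeron's API or its internals; `BroadcastLog`, `Subscriber` and `Poll` are hypothetical names. It shows only the semantics: the publisher never waits on a subscriber, and a subscriber that has been overtaken learns this on its next poll and has to recover from elsewhere, for example an archive.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Toy broadcast log: a fixed ring that the publisher overwrites freely.
// Single-threaded model for illustration only.
template <std::size_t Capacity>
class BroadcastLog {
    static_assert(Capacity > 0, "Capacity must be positive");

public:
    // The publisher never blocks and never inspects subscriber positions.
    void publish(std::uint64_t payload) {
        slots_[next_seq_ % Capacity] = payload;
        ++next_seq_;
    }

    // Sequence number the next published message will receive.
    std::uint64_t next_seq() const { return next_seq_; }

    // Oldest sequence number still held in the ring.
    std::uint64_t oldest_seq() const {
        return next_seq_ > Capacity ? next_seq_ - Capacity : 0;
    }

    // Precondition: oldest_seq() <= seq < next_seq().
    std::uint64_t at(std::uint64_t seq) const { return slots_[seq % Capacity]; }

private:
    std::array<std::uint64_t, Capacity> slots_{};
    std::uint64_t next_seq_ = 0;
};

enum class Poll { Message, Empty, Lapped };

// Each subscriber owns its cursor; the publisher has no knowledge of it.
class Subscriber {
public:
    template <std::size_t C>
    Poll poll(const BroadcastLog<C>& log, std::uint64_t& out) {
        if (cursor_ < log.oldest_seq()) return Poll::Lapped;  // overtaken
        if (cursor_ == log.next_seq()) return Poll::Empty;    // caught up
        out = log.at(cursor_);
        ++cursor_;
        return Poll::Message;
    }

    // Recovery hook: rejoin the live stream at a chosen sequence number,
    // e.g. after replaying the missed range from an archive.
    void reset_to(std::uint64_t seq) { cursor_ = seq; }

private:
    std::uint64_t cursor_ = 0;
};
```

In this model a compliance or analytics consumer that returns `Poll::Lapped` pays the recovery cost itself, while the pricing-side subscriber polling the same log is unaffected.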
ONNX inference on the same machine that prints quotes
The price-construction model is exported from training as an ONNX graph and served in-process by ONNX Runtime on the same machine that emits quotes. There is no RPC to a remote inference cluster. There is no GPU you would have to wait on. The model is small (a few megabytes), latency-aware by construction, and continuously evaluated against realised execution outcomes. If a candidate replacement degrades on out-of-sample windowed evaluation, it does not get promoted; we do not run a stale champion for political reasons.
Why ONNX specifically? Because we want the runtime to be the same in research and production, and because the runtime should be a thin layer over CPU vector instructions, not a framework. ONNX Runtime gives us SIMD-vectorised execution of the operators we use (matmul, GELU, layer-norm) within the low-tens-of-microseconds inference budget for the model topologies we ship, with the same numerical results in research notebooks and on the hot path.
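The promotion rule in this section (a candidate that degrades on out-of-sample windowed evaluation is not promoted) can be sketched as a simple gate. The loss metric, the per-window tolerance and the function name `should_promote` are assumptions made for illustration; the actual evaluation criteria are not described in this post.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical champion/challenger gate. Inputs are per-window
// out-of-sample losses (lower is better), one entry per evaluation window.
// The candidate is promoted only if it
//   1. is not worse than the champion by more than `tolerance` in any
//      single window, and
//   2. has a strictly lower total loss across all windows.
inline bool should_promote(const std::vector<double>& champion_loss,
                           const std::vector<double>& candidate_loss,
                           double tolerance) {
    // Refuse to decide on empty or mismatched evaluations.
    if (champion_loss.empty() ||
        champion_loss.size() != candidate_loss.size()) {
        return false;
    }
    double champion_sum = 0.0;
    double candidate_sum = 0.0;
    for (std::size_t i = 0; i < champion_loss.size(); ++i) {
        if (candidate_loss[i] > champion_loss[i] + tolerance) {
            return false;  // degraded in this window
        }
        champion_sum += champion_loss[i];
        candidate_sum += candidate_loss[i];
    }
    return candidate_sum < champion_sum;
}
```

Requiring both conditions means a candidate cannot buy a better average by being much worse in one window, which is typically the window that contains the hard day.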

Time, kept honestly
Internal clocks are synchronised via PTP (IEEE 1588) inside each colocation, with hardware timestamping on the NIC where available. The entire system writes timestamped journal events at single-microsecond resolution. A round-trip you cannot reconstruct from the journal is a round-trip you cannot trust. When a counterparty asks 'what did your system do at 14:32:18.412 UTC last Thursday?', the answer is in the journal, not in someone's memory or a partially-flushed log.
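Answering a question of the form "what did the system do at time T?" reduces to a range query over an ordered, append-only log. The sketch below is an in-memory illustration under assumed names (`JournalEvent`, `Journal`) and an assumed encoding of timestamps as microseconds since the Unix epoch. A production journal would be on disk, and this query path sits off the hot path.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

struct JournalEvent {
    std::uint64_t ts_us;  // microseconds since Unix epoch, UTC (assumed)
    std::string what;     // free-form description of the event
};

class Journal {
public:
    // Append-only. Rejects an event whose timestamp is earlier than the
    // last one, so the log stays sorted and binary search stays valid.
    bool append(std::uint64_t ts_us, std::string what) {
        if (!events_.empty() && ts_us < events_.back().ts_us) return false;
        events_.push_back(JournalEvent{ts_us, std::move(what)});
        return true;
    }

    // All events with from_us <= ts_us < to_us, in original order.
    std::vector<JournalEvent> between(std::uint64_t from_us,
                                      std::uint64_t to_us) const {
        auto earlier = [](const JournalEvent& e, std::uint64_t t) {
            return e.ts_us < t;
        };
        auto first = std::lower_bound(events_.begin(), events_.end(),
                                      from_us, earlier);
        auto last = std::lower_bound(first, events_.end(), to_us, earlier);
        return std::vector<JournalEvent>(first, last);
    }

private:
    std::vector<JournalEvent> events_;
};
```

A query for a one-millisecond window around a disputed fill would be `journal.between(t, t + 1000)`. Events that share a timestamp are returned in the order they were appended.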
Withdraw synchronously, never partially
When the model loses confidence, all outbound prices widen or pull together. The reason — covered in The Architecture of a Fair Spread — is that asymmetric withdrawal is what gets a market maker selected against. A platform that cannot synchronise its own state across egress paths in microseconds is a platform that will leak one-sided liquidity, and one-sided liquidity is what an adversarial taker is specifically looking for in the seconds after an unscheduled headline.
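One way to get the "all together or not at all" behaviour within a single process is to have every egress path consult the same atomic word immediately before sending. The sketch below packs a quoting mode and a change counter into one 64-bit value. `QuoteGate`, `QuoteMode` and `may_send` are hypothetical names, and synchronising across hosts is outside the scope of this example.

```cpp
#include <atomic>
#include <cstdint>

enum class QuoteMode : std::uint8_t { Normal = 0, Widened = 1, Pulled = 2 };

struct QuoteState {
    QuoteMode mode;
    std::uint32_t epoch;  // incremented on every mode change
};

// Mode and epoch share one atomic word, so an egress path can never
// observe a new mode paired with an old epoch or the reverse.
class QuoteGate {
public:
    // Called by the pricing side when model confidence changes.
    void set_mode(QuoteMode mode) {
        std::uint64_t old_word = word_.load(std::memory_order_relaxed);
        std::uint64_t new_word = 0;
        do {
            const std::uint32_t epoch =
                static_cast<std::uint32_t>(old_word >> 8) + 1;
            new_word = (static_cast<std::uint64_t>(epoch) << 8) |
                       static_cast<std::uint64_t>(mode);
        } while (!word_.compare_exchange_weak(old_word, new_word,
                                              std::memory_order_release,
                                              std::memory_order_relaxed));
    }

    // Called by every egress path immediately before sending.
    QuoteState load() const {
        const std::uint64_t w = word_.load(std::memory_order_acquire);
        return QuoteState{static_cast<QuoteMode>(w & 0xFF),
                          static_cast<std::uint32_t>(w >> 8)};
    }

private:
    std::atomic<std::uint64_t> word_{0};  // low 8 bits: mode, rest: epoch
};

// A quote priced under an older epoch is stale and must not leave,
// and nothing leaves while quotes are pulled.
inline bool may_send(const QuoteGate& gate, std::uint32_t priced_epoch) {
    const QuoteState s = gate.load();
    return s.mode != QuoteMode::Pulled && s.epoch == priced_epoch;
}
```

Because each price is stamped with the epoch it was built under, a widen or pull decision invalidates every in-flight quote on every path with a single store, so no side of the book is left behind.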
What this buys the counterparty
From the outside, the engineering above produces three observable things that show up in your TCA report:
- A tight
- A fill-rate distribution with a short upper tail: most fills inside the inside quote, no fat-tailed bursts of rejects.
- A journal entry per event, replayable on demand, that lets your auditor and your TCA team agree on what happened.
Why we publish this
Institutional desks should never accept architecture as a black box. The above is not proprietary in the sense of being secret — every serious electronic market maker makes similar decisions; what differs is discipline. The reason we describe it openly is so that prospective counterparties can ask informed questions, and so that our own engineering team is held to what we have written down. If you find a gap between this article and what you observe in TCA, please send it to compliance@drovix.com.
Microseconds matter, but only because the alternative is to charge clients for sloppiness disguised as variability. Discipline in the engine is the discipline that lets the spread stay tight.
Where to read next
→ Routing Beyond the Inside Quote — what the engine does when our own price is not the best available.
→ Risk Without Friction — pre-trade gates, audit trails, and how the same engine enforces credit and exposure limits in the same microsecond budget.
→ Drovix for Hedge Funds — the institutional client portal that surfaces all of this in a single auditable view.
Analyst Desk
Drovix Research Desk
Institutional Research
Drovix Research Desk publishes institutional-grade analysis covering macro events, cross-asset correlations, and execution insights for professional market participants.
