QNSP PQC Benchmarks — Real Numbers, Reproducible

CPU	Apple M4 Max
Cores	14
OS	darwin/arm64
Memory	36 GiB
Node	v24.15.0
liboqs	0.15.0
Generated	2026-05-14 14:36:38Z
Commit	6fa08c5666c6

Key Encapsulation Mechanisms (KEMs)

p50 latency per operation. 500 iterations per algorithm with one untimed warm-up.

Algorithm	Standard	NIST cat.	Keygen p50	Encaps p50	Decaps p50	Keygen ops/s	Pub key	Ciphertext
ML-KEM-512	FIPS 203	L1	0.005 ms	0.006 ms	0.007 ms	174.6k /s	800B	768B
ML-KEM-768	FIPS 203	L3	0.009 ms	0.010 ms	0.011 ms	106.4k /s	1184B	1088B
ML-KEM-1024	FIPS 203	L5	0.014 ms	0.014 ms	0.016 ms	71.4k /s	1568B	1568B
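
The measurement procedure stated above (one untimed warm-up, a fixed number of timed iterations, report the median) can be sketched as follows. This is a minimal illustration, not the QNSP runner: `measureP50` and `standInWorkload` are hypothetical names, and the workload is a placeholder loop rather than a real keygen. Timing uses `process.hrtime.bigint()`, which the notes further down say the internal runner uses.

```javascript
// Minimal p50 harness: one untimed warm-up, then `iterations` timed calls.
// `measureP50` is a hypothetical helper, not part of any QNSP or liboqs API.
function measureP50(fn, iterations = 500) {
  fn(); // untimed warm-up, discarded

  const samplesNs = [];
  for (let i = 0; i < iterations; i++) {
    const start = process.hrtime.bigint();
    fn();
    samplesNs.push(Number(process.hrtime.bigint() - start));
  }

  // Median: middle sample, or the mean of the two middle samples.
  samplesNs.sort((a, b) => a - b);
  const mid = Math.floor(samplesNs.length / 2);
  const p50Ns = samplesNs.length % 2 === 1
    ? samplesNs[mid]
    : (samplesNs[mid - 1] + samplesNs[mid]) / 2;

  return { p50Us: p50Ns / 1000, iterations: samplesNs.length };
}

// Stand-in workload (an assumption): replace with a real keygen call.
function standInWorkload() {
  let acc = 0;
  for (let i = 0; i < 10_000; i++) acc = (acc + i * 31) % 1_000_003;
  return acc;
}

const result = measureP50(standInWorkload, 500);
console.log(`p50: ${result.p50Us.toFixed(1)} µs over ${result.iterations} iterations`);
```

To time a real algorithm, pass a closure such as `() => ml_kem768.keygen()` from @noble/post-quantum in place of `standInWorkload`. The ops/s columns do not have to equal 1 / p50, since a throughput figure is usually derived from total elapsed time rather than the median.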

Digital signatures

p50 latency per operation. Iteration counts vary by algorithm cost (ML-DSA: 100–200, Falcon: 50, SLH-DSA: 10–25). Message size: 179 bytes.

Algorithm	Standard	NIST cat.	Keygen p50	Sign p50	Verify p50	Verify ops/s	Pub key	Sig size
ML-DSA-44	FIPS 204	L2	0.034 ms	0.116 ms	0.035 ms	29.2k /s	1312B	2420B
ML-DSA-65	FIPS 204	L3	0.061 ms	0.189 ms	0.053 ms	17.9k /s	1952B	3309B
ML-DSA-87	FIPS 204	L5	0.093 ms	0.247 ms	0.092 ms	10.8k /s	2592B	4627B
Falcon-512	FN-DSA	L1	2.93 ms	0.097 ms	0.015 ms	64.0k /s	897B	650B
SLH-DSA-SHA2-128f	FIPS 205	L1	0.485 ms	11.06 ms	0.680 ms	1.5k /s	32B	17088B
SLH-DSA-SHA2-256f	FIPS 205	L5	1.85 ms	37.69 ms	1.01 ms	985 /s	64B	49856B

Cold-start vs warm-start

First-call latency on a fresh algorithm handle vs steady-state p50. This matters for serverless / lambda deployments, where cold-path latency feeds directly into tail latency. A ratio above 10 indicates a cold-start cost worth provisioning around; every algorithm measured here stays below 2.1×. Lower is better.

Algorithm	First call	Warmup median	Steady-state p50	Cold/Warm ratio
ML-KEM-512	0.010 ms	0.007 ms	0.005 ms	2.075×
ML-KEM-768	0.012 ms	0.011 ms	0.009 ms	1.366×
ML-KEM-1024	0.022 ms	0.015 ms	0.014 ms	1.536×
ML-DSA-44	0.043 ms	0.038 ms	0.034 ms	1.254×
ML-DSA-65	0.088 ms	0.066 ms	0.061 ms	1.443×
ML-DSA-87	0.103 ms	0.090 ms	0.093 ms	1.109×
Falcon-512	5.40 ms	2.73 ms	2.93 ms	1.842×
SLH-DSA-SHA2-128f	0.532 ms	0.487 ms	0.485 ms	1.097×
SLH-DSA-SHA2-256f	1.87 ms	1.75 ms	1.85 ms	1.008×
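
The ratio column is first-call latency divided by steady-state p50. A small sketch of that arithmetic, with the "ratio > 10" rule as a parameter, is below. `coldWarmRatio` and its `threshold` argument are illustrative names, not QNSP APIs.

```javascript
// Cold/warm ratio = first-call latency / steady-state p50 (same units).
// `coldWarmRatio` is a hypothetical helper written for this example.
function coldWarmRatio(firstCallMs, steadyP50Ms, threshold = 10) {
  if (!(steadyP50Ms > 0)) throw new RangeError("steady-state p50 must be > 0");
  const ratio = firstCallMs / steadyP50Ms;
  return { ratio, worthProvisioning: ratio > threshold };
}

// Falcon-512 row from the table: 5.40 ms first call, 2.93 ms steady-state p50.
const falcon = coldWarmRatio(5.40, 2.93);
console.log(falcon.ratio.toFixed(3), falcon.worthProvisioning);
```

Recomputing from the rounded table cells gives about 1.843 for Falcon-512 against the published 1.842×. The small gap is presumably because the published ratios were computed from unrounded timings.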

Memory footprint per algorithm instance (batched, N=200)

Creates 200 algorithm instances in a single process, runs one keygen per instance, then divides the RSS delta by the batch size for a stable per-instance number. Capacity planning input: concurrent-instance-count × per-instance-footprint = lower-bound platform memory. Per-instance heap is the V8-side (JavaScript object) cost; RSS includes the native-C liboqs allocation.

Algorithm	Per-instance RSS	Per-instance heap	Batch RSS Δ	Batch heap Δ
ML-KEM-512	4.6 KiB	1.4 KiB	928.0 KiB	288.1 KiB
ML-KEM-768	5.4 KiB	1.4 KiB	1.05 MiB	272.0 KiB
ML-KEM-1024	5.5 KiB	1.4 KiB	1.08 MiB	277.8 KiB
ML-DSA-44	164 B	1.4 KiB	32.0 KiB	271.4 KiB
ML-DSA-65	6.2 KiB	1.4 KiB	1.22 MiB	270.2 KiB
ML-DSA-87	8.4 KiB	1.4 KiB	1.64 MiB	276.4 KiB
Falcon-512	1.0 KiB	1.3 KiB	208.0 KiB	270.0 KiB
SLH-DSA-SHA2-128f	0 B	1.3 KiB	0 B	270.0 KiB
SLH-DSA-SHA2-256f	82 B	1.3 KiB	16.0 KiB	270.0 KiB
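
The batched method and the capacity-planning formula above can be sketched with `process.memoryUsage()`. This is a hedged illustration only: `measurePerInstanceFootprint`, `lowerBoundBytes`, and `standInFactory` are hypothetical names, the stand-in instance allocates a plain `Uint8Array` rather than a liboqs handle, and clamping negative deltas to zero is a choice made for this sketch.

```javascript
// Batched footprint estimate. `createInstance` is a caller-supplied factory
// (hypothetical); each instance must expose a keygen() method.
function measurePerInstanceFootprint(createInstance, batchSize = 200) {
  const before = process.memoryUsage();
  const instances = [];
  for (let i = 0; i < batchSize; i++) {
    const instance = createInstance();
    instance.keygen();        // one keygen per instance
    instances.push(instance); // hold references so the batch stays live
  }
  const after = process.memoryUsage();

  // Clamping negative deltas to 0 is this sketch's choice, not a documented
  // behaviour of the QNSP runner.
  const rssDelta = Math.max(0, after.rss - before.rss);
  const heapDelta = Math.max(0, after.heapUsed - before.heapUsed);
  return {
    batchSize: instances.length,
    rssPerInstanceBytes: rssDelta / batchSize,
    heapPerInstanceBytes: heapDelta / batchSize,
  };
}

// Lower-bound planning figure: concurrent instances x per-instance bytes.
function lowerBoundBytes(concurrentInstances, perInstanceBytes) {
  return concurrentInstances * perInstanceBytes;
}

// Stand-in instance (an assumption): a real run would wrap a native handle.
const standInFactory = () => ({
  keygen() { this.publicKey = new Uint8Array(1184); },
});

const footprint = measurePerInstanceFootprint(standInFactory, 200);
console.log(footprint);

// 10,000 concurrent ML-KEM-768 instances at 5.4 KiB RSS each (table value).
const planMiB = lowerBoundBytes(10_000, 5.4 * 1024) / (1024 * 1024);
console.log(`lower bound: ${planMiB.toFixed(1)} MiB`);
```

With the table's 5.4 KiB figure, 10,000 concurrent ML-KEM-768 instances come to roughly 52.7 MiB. RSS grows in whole pages, which is one plausible reason small algorithms can show a delta of 0 B across a batch.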

Multi-process concurrency (real multi-core scaling)

Aggregate keygen ops/sec when running 1 / 2 / 4 / 8 child Node processes in parallel, each running 100 keygens. Includes fork + load + warm-up overhead per child, so absolute numbers are lower than the in-process steady-state baseline — but the scaling ratio (1→8) reflects real multi-core production deployment behavior. Per-algorithm rows below.

Algorithm	1 proc	2 proc	4 proc	8 proc	1→8 ratio
ML-KEM-512	1.5k /s	2.9k /s	5.3k /s	9.3k /s	6.04×
ML-KEM-768	1.3k /s	2.8k /s	5.2k /s	9.1k /s	6.78×
ML-KEM-1024	1.5k /s	2.8k /s	5.2k /s	9.2k /s	6.27×
ML-DSA-44	1.5k /s	2.8k /s	5.0k /s	9.2k /s	6.31×
ML-DSA-65	1.2k /s	2.6k /s	4.8k /s	8.7k /s	7.29×
ML-DSA-87	1.3k /s	2.5k /s	4.7k /s	7.9k /s	6.25×
Falcon-512	259 /s	510 /s	962 /s	1.8k /s	7.03×
SLH-DSA-SHA2-128f	892 /s	1.7k /s	3.2k /s	6.0k /s	6.75×
SLH-DSA-SHA2-256f	405 /s	753 /s	1.4k /s	2.9k /s	7.09×
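
The 1→8 ratio is the 8-process aggregate divided by the 1-process aggregate; dividing that ratio by the process count gives a parallel-efficiency figure. The sketch below shows the arithmetic only. `scaling` is a hypothetical helper, and parallel efficiency is a derived figure added here for illustration rather than a column the page reports.

```javascript
// Scaling ratio and parallel efficiency from two aggregate throughputs.
// `scaling` is an illustrative helper, not part of the QNSP tooling.
function scaling(opsAt1, opsAtN, n) {
  if (!(opsAt1 > 0) || !(n > 0)) throw new RangeError("inputs must be positive");
  const ratio = opsAtN / opsAt1;
  return { ratio, efficiency: ratio / n };
}

// Falcon-512 row, using the rounded table cells: 259 /s -> 1.8k /s at 8 procs.
const falcon512 = scaling(259, 1800, 8);
console.log(
  `ratio ${falcon512.ratio.toFixed(2)}x, efficiency ${(falcon512.efficiency * 100).toFixed(0)}%`
);
```

Because the cells are rounded (1.8k /s, 1.3k /s and so on), ratios recomputed this way differ slightly from the published column: about 6.95× here against the table's 7.03× for Falcon-512.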

Native liboqs vs @noble/post-quantum (pure JavaScript)

Side-by-side keygen performance: the same algorithm executed by the native-C liboqs (0.15.0) that ships in QNSP backend services, and by the pure-JavaScript @noble/post-quantum (0.6.0) that ships in the QNSP browser SDK. Both produce byte-identical FIPS-conformant artifacts. The native path is roughly 28–36× faster for ML-KEM and ML-DSA keygen and about 6× faster for SLH-DSA, which matters for backend workloads; pure JS is the right choice for browser-side and serverless-edge deployments where native bindings aren't available.

Algorithm	liboqs p50	noble p50	liboqs ops/s	noble ops/s	Native speedup
ML-KEM-512	0.005 ms	0.178 ms	174.6k /s	5.0k /s	35.558×
ML-KEM-768	0.009 ms	0.269 ms	106.4k /s	3.7k /s	29.843×
ML-KEM-1024	0.014 ms	0.450 ms	71.4k /s	2.2k /s	32.11×
ML-DSA-44	0.034 ms	1.00 ms	29.1k /s	986 /s	29.529×
ML-DSA-65	0.061 ms	1.72 ms	15.8k /s	585 /s	28.107×
ML-DSA-87	0.093 ms	2.74 ms	10.6k /s	365 /s	29.496×
Falcon-512	2.93 ms	Falcon (FN-DSA pending FIPS 206) is not in @noble/post-quantum's catalog as of 0.6.0.
SLH-DSA-SHA2-128f	0.485 ms	2.96 ms	2.1k /s	335 /s	6.097×
SLH-DSA-SHA2-256f	1.85 ms	12.04 ms	549 /s	83 /s	6.503×

In-process concurrency (single-process saturation reference)

Aggregate ops/sec at in-process workers ∈ {1, 2, 4, 8, 16} for ML-KEM-768. liboqs native calls are synchronous, so this saturates around the workers=1 figure — this measurement exists to confirm the saturation; the multi-process numbers above are the right input for capacity sizing.

In-process workers	Total ops	Duration	Aggregate ops/sec
1	50	0.495 ms	100.9k /s
2	100	1.22 ms	82.3k /s
4	200	1.93 ms	103.6k /s
8	400	4.07 ms	98.4k /s
16	800	7.47 ms	107.1k /s
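
Why the in-process figure stays flat can be illustrated with a small sketch: when each operation is a synchronous call, concurrent async "workers" on one JavaScript thread take turns rather than overlapping. Everything here is an assumption made for illustration: `syncOp` is a placeholder busy loop standing in for a native call, and `runInProcess` is a hypothetical harness, not the QNSP benchmark.

```javascript
// Placeholder for a synchronous native call (an assumption for this sketch).
function syncOp() {
  let acc = 0;
  for (let i = 0; i < 10_000; i++) acc = (acc + i * 31) % 1_000_003;
  return acc;
}

// One logical worker: runs its ops sequentially, yielding between them.
async function worker(opsPerWorker) {
  for (let i = 0; i < opsPerWorker; i++) {
    syncOp();   // blocks the single JS thread while it runs
    await null; // yield so other workers can take a turn
  }
  return opsPerWorker;
}

// Run `workers` logical workers concurrently and report aggregate throughput.
async function runInProcess(workers, opsPerWorker = 50) {
  const start = process.hrtime.bigint();
  const counts = await Promise.all(
    Array.from({ length: workers }, () => worker(opsPerWorker))
  );
  const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
  const totalOps = counts.reduce((a, b) => a + b, 0);
  return { workers, totalOps, elapsedMs, opsPerSec: totalOps / (elapsedMs / 1000) };
}

async function main() {
  const rows = [];
  for (const w of [1, 2, 4, 8, 16]) rows.push(await runInProcess(w));
  return rows;
}

const done = main().then((rows) => {
  console.table(rows);
  return rows;
});
```

Total ops grow with the worker count (50 per worker, as in the table), while aggregate ops/sec should stay in the same range because the work is serialized on one thread.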

Verify these numbers yourself

The numbers above come from the native C liboqs build that runs inside QNSP backend services. The same FIPS 203 / 204 algorithms are also implemented in pure JavaScript by @noble/post-quantum, which is roughly 28–36× slower at keygen in the comparison above but produces byte-identical FIPS-conformant artifacts. Three independent ways to verify, no signup required:

Audit the published JSON. https://qnsp.cuilabs.io/pqc-benchmarks/pqc-latest.json — full per-algorithm p50/p95/p99/mean numbers, hardware specs, and the timestamp + git SHA from the run that produced them.
curl a live PQC operation. https://qnsp.cuilabs.io/api/sandbox/pqc-runtime runs ML-KEM-768 + ML-DSA-65 keygen / encaps / decaps / sign / verify per request, with server-side integrity flags. Sibling endpoints /conformance and /api/health/pqc-sandbox probe determinism vectors and self-canary status.

Run pure-JS noble locally — copy-paste this one-liner:

npm i @noble/post-quantum && node --input-type=module -e "import {ml_kem768} from '@noble/post-quantum/ml-kem.js'; const t=performance.now(); const kp=ml_kem768.keygen(); console.log('keygen ms:', (performance.now()-t).toFixed(3), '— pubkey bytes:', kp.publicKey.length);"

Absolute timings depend on hardware, kernel scheduler, and thermal state. Cross-algorithm ratios are stable. Schema v3 adds multi-process scaling, batched memory measurement, and side-by-side noble comparison. The internal benchmark runner uses process.hrtime.bigint() for nanosecond timing and discards the first untimed iteration to remove cold-cache and JIT warmup effects. Public SDK source is at https://github.com/cuilabs/qnsp-public.

Open-source project

Open Quantum Safe — Continuous Benchmarks

16 PQC algorithms (ML-KEM, ML-DSA, SLH-DSA, BIKE, HQC, Falcon, MAYO, SNOVA, UOV) with live keygen / encap / sign / verify visualizations.

Same liboqs base we use — apples-to-apples cross-reference for every QNSP number above.

Open-source binary

liboqs `speed_kem` / `speed_sig`

Reproducible cycle counts for keygen / encap / decap / sign / verify on any hardware. v0.15.0 released Nov 2025.

Run it yourself on the same machine you'll deploy on — no need to trust anyone else's numbers.

Academic (D.J. Bernstein et al.)

SUPERCOP / eBACS

Cycle counts and key/CT/signature sizes for ML-KEM, ML-DSA, SPHINCS+, Falcon, HQC, BIKE, McEliece across 30+ measurement machines.

Cross-CPU comparison — useful when planning deployments on a specific server SKU.

Vendor research (production-scale data)

Cloudflare — State of the Post-Quantum Internet 2025

Real-world TLS handshake slowdown measured across ~50% of Cloudflare's traffic running hybrid PQC. ML-DSA-44 / SLH-DSA / FN-DSA-512 CPU comparison.

Counter to the "PQC is too slow for production" objection — actual hyperscaler latency numbers.

Vendor (cloud provider)

AWS — ML-KEM in KMS / ACM / Secrets Manager

TPS p01–p99 across 500 runs; +80–150 µs handshake compute, +1,600 B on the wire.

Only public PQC TLS benchmark from a hyperscaler KMS service.

Academic

pqm4 — Embedded benchmarks (Cortex-M4)

Cycles, RAM, flash for ML-KEM, ML-DSA, SLH-DSA on STM32L4R5ZI.

If you're deploying QNSP at the IoT edge, our cloud numbers don't translate — start here.

Academic (peer-reviewed, 2025)

ML-KEM/ML-DSA on RP2040 (arXiv 2603.19340)

Timing, memory, energy across all NIST security levels — e.g. ML-KEM-512 = 35.7 ms / 2.83 mJ on RP2040.

First post-FIPS-203/204 isolated-algorithm energy numbers — useful for battery-bound deployments.

Open-source project

PQC-LEO — Reproducible TLS handshake harness

Push-button TLS-1.3 handshake benchmark via OpenSSL 3.6 + oqsprovider. v0.5.1 released April 2026.

Third-party harness you can point at QNSP's edge gateway to measure handshake latency yourself.

Verified May 2026. Sources are independent of QNSP. We link to neutral benchmarks because trust requires triangulation: verify our numbers, then cross-check them against theirs.

ML-KEM-768 (the FIPS 203 default for most QNSP tiers) generates a key pair in about 9 µs and runs encapsulate / decapsulate in roughly 10 µs each, which is on the order of 100,000 operations per second in a single process on the test machine above. ML-DSA-65 signatures verify about 3.6× faster than they sign (0.053 ms vs 0.189 ms), making PQC viable on the synchronous request path of services that issue many short-lived tokens. SLH-DSA is conservative and slow but produces hash-based signatures with a different security argument than lattice-based schemes, and we benchmark it so customers in regulated sectors can pick it deliberately rather than ruling it out by reputation.

These numbers are why QNSP can run a PQC-only crypto policy at the edge gateway, KMS, and audit-service tiers without extra infrastructure. The same liboqs binary that produced this JSON also handles every key, signature, and audit record on the platform.
