Benchmarked 2026-05-14
PQC benchmarks — real numbers, reproducible
These are not marketing numbers. Every value below is captured by the same liboqs build that runs in QNSP production, on the host listed in the test environment box. Re-run the script yourself; the JSON output is committed to the repo.
Key Encapsulation Mechanisms (KEMs)
p50 latency per operation. 500 iterations per algorithm with one untimed warm-up.
| Algorithm | Standard | NIST cat. | Keygen p50 | Encaps p50 | Decaps p50 | Keygen ops/s | Pub key | Ciphertext |
|---|---|---|---|---|---|---|---|---|
| ML-KEM-512 | FIPS 203 | L1 | 5.0 µs | 6.0 µs | 7.0 µs | 174.6k /s | 800B | 768B |
| ML-KEM-768 | FIPS 203 | L3 | 9.0 µs | 10 µs | 11 µs | 106.4k /s | 1184B | 1088B |
| ML-KEM-1024 | FIPS 203 | L5 | 14 µs | 14 µs | 16 µs | 71.4k /s | 1568B | 1568B |
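
The procedure described for this table (one untimed warm-up, then a fixed number of timed iterations, reporting the median) can be sketched roughly as follows. This is an illustration only, not the QNSP benchmark runner: `measureP50` and the `op` callback are hypothetical names, and the nearest-rank median used here is an assumption, since the real runner may compute p50 differently.

```javascript
// Minimal p50 latency harness (illustrative sketch, not the QNSP runner).
// `op` is any synchronous function, for example a keygen call.
function measureP50(op, iterations = 500) {
  op(); // one untimed warm-up call, discarded

  const samplesNs = [];
  for (let i = 0; i < iterations; i++) {
    const start = process.hrtime.bigint();
    op();
    samplesNs.push(Number(process.hrtime.bigint() - start));
  }

  samplesNs.sort((a, b) => a - b);
  // Nearest-rank median (assumption: the real runner may interpolate).
  const p50Ns = samplesNs[Math.ceil(iterations / 2) - 1];
  return { p50Us: p50Ns / 1000, samples: iterations };
}
```

Passing a closure such as `() => kem.keygen()` (where `kem` is whatever algorithm handle your binding exposes) would yield a p50 in microseconds comparable in kind, though not necessarily in value, to the column above.
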
Digital signatures
p50 latency per operation. Iteration counts vary by algorithm cost (ML-DSA: 100–200, Falcon: 50, SLH-DSA: 10–25). Message size: 179 bytes.
| Algorithm | Standard | NIST cat. | Keygen p50 | Sign p50 | Verify p50 | Verify ops/s | Pub key | Sig size |
|---|---|---|---|---|---|---|---|---|
| ML-DSA-44 | FIPS 204 | L2 | 0.034 ms | 0.116 ms | 0.035 ms | 29.2k /s | 1312B | 2420B |
| ML-DSA-65 | FIPS 204 | L3 | 0.061 ms | 0.189 ms | 0.053 ms | 17.9k /s | 1952B | 3309B |
| ML-DSA-87 | FIPS 204 | L5 | 0.093 ms | 0.247 ms | 0.092 ms | 10.8k /s | 2592B | 4627B |
| Falcon-512 | FN-DSA | L1 | 2.93 ms | 0.097 ms | 0.015 ms | 64.0k /s | 897B | 650B |
| SLH-DSA-SHA2-128f | FIPS 205 | L1 | 0.485 ms | 11.06 ms | 0.680 ms | 1.5k /s | 32B | 17088B |
| SLH-DSA-SHA2-256f | FIPS 205 | L5 | 1.85 ms | 37.69 ms | 1.01 ms | 985 /s | 64B | 49856B |
Cold-start vs warm-start
First-call latency on a fresh algorithm handle vs steady-state p50. Matters for serverless / lambda deployments where cold-path latency affects tail latency. Ratio > 10 indicates non-trivial cold-start cost worth provisioning around. Lower is better.
| Algorithm | First call | Warmup median | Steady-state p50 | Cold/Warm ratio |
|---|---|---|---|---|
| ML-KEM-512 | 0.010 ms | 7.0 µs | 5.0 µs | 2.075× |
| ML-KEM-768 | 0.012 ms | 0.011 ms | 9.0 µs | 1.366× |
| ML-KEM-1024 | 0.022 ms | 0.015 ms | 0.014 ms | 1.536× |
| ML-DSA-44 | 0.043 ms | 0.038 ms | 0.034 ms | 1.254× |
| ML-DSA-65 | 0.088 ms | 0.066 ms | 0.061 ms | 1.443× |
| ML-DSA-87 | 0.103 ms | 0.090 ms | 0.093 ms | 1.109× |
| Falcon-512 | 5.40 ms | 2.73 ms | 2.93 ms | 1.842× |
| SLH-DSA-SHA2-128f | 0.532 ms | 0.487 ms | 0.485 ms | 1.097× |
| SLH-DSA-SHA2-256f | 1.87 ms | 1.75 ms | 1.85 ms | 1.008× |
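
As a hedged illustration of how the ratio column can be read, the sketch below divides first-call latency by steady-state p50 and applies the "> 10" threshold from the text. That the ratio is first call over steady-state p50 is inferred from the rows, not stated by the source. `coldStartReport` is a hypothetical name. The inputs are the rounded table values, so the computed ratios differ slightly from the published column.

```javascript
// Cold/warm ratio from first-call latency and steady-state p50 (both in ms).
// Inputs are rounded values copied from the table above.
function coldStartReport(rows, threshold = 10) {
  return rows.map(({ algorithm, firstCallMs, steadyP50Ms }) => {
    const ratio = firstCallMs / steadyP50Ms;
    return { algorithm, ratio, needsProvisioning: ratio > threshold };
  });
}

const report = coldStartReport([
  { algorithm: "ML-KEM-512", firstCallMs: 0.010, steadyP50Ms: 0.005 },
  { algorithm: "Falcon-512", firstCallMs: 5.40, steadyP50Ms: 2.93 },
  { algorithm: "SLH-DSA-SHA2-256f", firstCallMs: 1.87, steadyP50Ms: 1.85 },
]);

for (const r of report) {
  console.log(`${r.algorithm}: ${r.ratio.toFixed(2)}x, provision around it: ${r.needsProvisioning}`);
}
```

None of the measured rows comes close to the threshold; the largest published ratio is 2.075× for ML-KEM-512.
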
Memory footprint per algorithm instance (batched, N=200)
Creates 200 algorithm instances in a single process, runs one keygen per instance, then divides the RSS delta by the batch size for a stable per-instance number. Capacity planning input: concurrent-instance-count × per-instance-footprint = lower-bound platform memory. Per-instance heap is the V8-side (JavaScript object) cost; RSS includes the native-C liboqs allocation.
| Algorithm | Per-instance RSS | Per-instance heap | Batch RSS Δ | Batch heap Δ |
|---|---|---|---|---|
| ML-KEM-512 | 4.6 KiB | 1.4 KiB | 928.0 KiB | 288.1 KiB |
| ML-KEM-768 | 5.4 KiB | 1.4 KiB | 1.05 MiB | 272.0 KiB |
| ML-KEM-1024 | 5.5 KiB | 1.4 KiB | 1.08 MiB | 277.8 KiB |
| ML-DSA-44 | 164 B | 1.4 KiB | 32.0 KiB | 271.4 KiB |
| ML-DSA-65 | 6.2 KiB | 1.4 KiB | 1.22 MiB | 270.2 KiB |
| ML-DSA-87 | 8.4 KiB | 1.4 KiB | 1.64 MiB | 276.4 KiB |
| Falcon-512 | 1.0 KiB | 1.3 KiB | 208.0 KiB | 270.0 KiB |
| SLH-DSA-SHA2-128f | 0 B | 1.3 KiB | 0 B | 270.0 KiB |
| SLH-DSA-SHA2-256f | 82 B | 1.3 KiB | 16.0 KiB | 270.0 KiB |
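
The capacity-planning arithmetic described above is simple enough to write down directly. The sketch below assumes a hypothetical workload of 10,000 concurrent ML-DSA-87 instances; `perInstanceFromBatch` and `lowerBoundMemoryBytes` are illustrative names. The result is a lower bound only, since it excludes the runtime baseline and any application buffers.

```javascript
const KIB = 1024;
const MIB = 1024 * KIB;

// Per-instance footprint: batch RSS delta divided by batch size (N=200 in the table).
function perInstanceFromBatch(batchDeltaBytes, batchSize = 200) {
  return batchDeltaBytes / batchSize;
}

// Lower-bound memory: concurrent instance count x per-instance footprint.
function lowerBoundMemoryBytes(concurrentInstances, perInstanceBytes) {
  return concurrentInstances * perInstanceBytes;
}

// ML-KEM-512: 928.0 KiB batch delta over 200 instances.
const mlKem512 = perInstanceFromBatch(928.0 * KIB);
console.log((mlKem512 / KIB).toFixed(2), "KiB per instance"); // 4.64 KiB per instance

// Hypothetical workload: 10,000 concurrent ML-DSA-87 instances at 8.4 KiB each.
const total = lowerBoundMemoryBytes(10_000, 8.4 * KIB);
console.log((total / MIB).toFixed(1), "MiB lower bound"); // 82.0 MiB lower bound
```

Rows with very small RSS deltas (such as 0 B for SLH-DSA-SHA2-128f) are probably best treated as below the measurement's resolution rather than as a literal zero cost.
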
Multi-process concurrency (real multi-core scaling)
Aggregate keygen ops/sec when running 1 / 2 / 4 / 8 child Node processes in parallel, each running 100 keygens. Includes fork + load + warm-up overhead per child, so absolute numbers are lower than the in-process steady-state baseline — but the scaling ratio (1→8) reflects real multi-core production deployment behavior. Per-algorithm rows below.
| Algorithm | 1 proc | 2 proc | 4 proc | 8 proc | 1→8 ratio |
|---|---|---|---|---|---|
| ML-KEM-512 | 1.5k /s | 2.9k /s | 5.3k /s | 9.3k /s | 6.04× |
| ML-KEM-768 | 1.3k /s | 2.8k /s | 5.2k /s | 9.1k /s | 6.78× |
| ML-KEM-1024 | 1.5k /s | 2.8k /s | 5.2k /s | 9.2k /s | 6.27× |
| ML-DSA-44 | 1.5k /s | 2.8k /s | 5.0k /s | 9.2k /s | 6.31× |
| ML-DSA-65 | 1.2k /s | 2.6k /s | 4.8k /s | 8.7k /s | 7.29× |
| ML-DSA-87 | 1.3k /s | 2.5k /s | 4.7k /s | 7.9k /s | 6.25× |
| Falcon-512 | 259 /s | 510 /s | 962 /s | 1.8k /s | 7.03× |
| SLH-DSA-SHA2-128f | 892 /s | 1.7k /s | 3.2k /s | 6.0k /s | 6.75× |
| SLH-DSA-SHA2-256f | 405 /s | 753 /s | 1.4k /s | 2.9k /s | 7.09× |
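
One way to interpret the 1→8 column is as parallel efficiency: the observed speedup divided by the ideal linear speedup of 8. The sketch below is an illustration using three published ratios; `scalingEfficiency` is a hypothetical helper, not part of the benchmark tooling.

```javascript
// Parallel efficiency: observed speedup divided by ideal (linear) speedup.
function scalingEfficiency(speedup, processes) {
  return speedup / processes;
}

// 1->8 ratios copied from the table above.
const ratios = { "ML-KEM-512": 6.04, "ML-DSA-65": 7.29, "Falcon-512": 7.03 };

for (const [algorithm, ratio] of Object.entries(ratios)) {
  const pct = (scalingEfficiency(ratio, 8) * 100).toFixed(1);
  console.log(`${algorithm}: ${pct}% of linear scaling at 8 processes`);
}
```

Across all rows the published ratios range from 6.04× to 7.29×, which corresponds to roughly 75% to 91% of linear scaling at 8 processes.
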
Native liboqs vs @noble/post-quantum (pure JavaScript)
Side-by-side keygen performance: the same algorithm executed by the native-C liboqs (0.15.0) that ships in QNSP backend services, vs the pure-JavaScript @noble/post-quantum (0.6.0) that ships in the QNSP browser SDK. Both produce byte-identical FIPS-conformant artifacts. The native path is roughly 28-36× faster for ML-KEM and ML-DSA keygen and about 6× faster for SLH-DSA, which matters for backend workloads; pure JS is the right choice for browser-side and serverless-edge deployments where native bindings aren't available.
| Algorithm | liboqs p50 | noble p50 | liboqs ops/s | noble ops/s | Native speedup |
|---|---|---|---|---|---|
| ML-KEM-512 | 5.0 µs | 0.178 ms | 174.6k /s | 5.0k /s | 35.558× |
| ML-KEM-768 | 9.0 µs | 0.269 ms | 106.4k /s | 3.7k /s | 29.843× |
| ML-KEM-1024 | 0.014 ms | 0.450 ms | 71.4k /s | 2.2k /s | 32.11× |
| ML-DSA-44 | 0.034 ms | 1.00 ms | 29.1k /s | 986 /s | 29.529× |
| ML-DSA-65 | 0.061 ms | 1.72 ms | 15.8k /s | 585 /s | 28.107× |
| ML-DSA-87 | 0.093 ms | 2.74 ms | 10.6k /s | 365 /s | 29.496× |
| Falcon-512 | 2.93 ms | n/a: Falcon (FN-DSA, pending FIPS 206) is not in @noble/post-quantum's catalog as of 0.6.0 | | | |
| SLH-DSA-SHA2-128f | 0.485 ms | 2.96 ms | 2.1k /s | 335 /s | 6.097× |
| SLH-DSA-SHA2-256f | 1.85 ms | 12.04 ms | 549 /s | 83 /s | 6.503× |
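
Because the latency cells mix µs and ms, recomputing the speedup column by hand requires normalizing units first. The following sketch shows one way to do that. `toMicroseconds` and `nativeSpeedup` are hypothetical helpers written for this example, and since they operate on the rounded table cells, the results differ slightly from the published column (for example 35.6 here versus 35.558 published for ML-KEM-512).

```javascript
// Parse a latency cell such as "5.0 µs" or "0.178 ms" into microseconds.
// Accepts both the micro sign (U+00B5) and Greek mu (U+03BC).
function toMicroseconds(cell) {
  const match = /^([\d.]+)\s*([\u00b5\u03bc]s|ms)$/.exec(cell.trim());
  if (!match) throw new Error(`unrecognized latency: ${cell}`);
  const value = Number(match[1]);
  return match[2] === "ms" ? value * 1000 : value;
}

// Speedup of the native path: pure-JS p50 divided by native p50.
function nativeSpeedup(liboqsCell, nobleCell) {
  return toMicroseconds(nobleCell) / toMicroseconds(liboqsCell);
}

console.log(nativeSpeedup("5.0 \u00b5s", "0.178 ms").toFixed(1)); // ML-KEM-512
console.log(nativeSpeedup("0.485 ms", "2.96 ms").toFixed(1)); // SLH-DSA-SHA2-128f
```
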
In-process concurrency (single-process saturation reference)
Aggregate ops/sec at in-process workers ∈ {1, 2, 4, 8, 16} for ML-KEM-768. liboqs native calls are synchronous, so this saturates around the workers=1 figure — this measurement exists to confirm the saturation; the multi-process numbers above are the right input for capacity sizing.
| In-process workers | Total ops | Duration | Aggregate ops/sec |
|---|---|---|---|
| 1 | 50 | 0.495 ms | 100.9k /s |
| 2 | 100 | 1.22 ms | 82.3k /s |
| 4 | 200 | 1.93 ms | 103.6k /s |
| 8 | 400 | 4.07 ms | 98.4k /s |
| 16 | 800 | 7.47 ms | 107.1k /s |
Verify these numbers yourself
The numbers above come from the native C liboqs build that runs inside QNSP backend services. The same FIPS 203 / 204 algorithms are also implemented in pure JavaScript by @noble/post-quantum, which is roughly 28-36× slower at keygen in the comparison table above but produces byte-identical FIPS-conformant artifacts. Three independent ways to verify, no signup required:
- Audit the published JSON. https://qnsp.cuilabs.io/pqc-benchmarks/pqc-latest.json — full per-algorithm p50/p95/p99/mean numbers, hardware specs, and the timestamp + git SHA from the run that produced them.
- curl a live PQC operation. https://qnsp.cuilabs.io/api/sandbox/pqc-runtime runs ML-KEM-768 + ML-DSA-65 keygen / encaps / decaps / sign / verify per request, with server-side integrity flags. Sibling endpoints /conformance and /api/health/pqc-sandbox probe determinism vectors and self-canary status.
- Run pure-JS noble locally. Copy-paste this one-liner:
npm i @noble/post-quantum && node --input-type=module -e "import {ml_kem768} from '@noble/post-quantum/ml-kem.js'; const t=performance.now(); const kp=ml_kem768.keygen(); console.log('keygen ms:', (performance.now()-t).toFixed(3), '— pubkey bytes:', kp.publicKey.length);"
Absolute timings depend on hardware, kernel scheduler, and thermal state. Cross-algorithm ratios are stable. Schema v3 adds multi-process scaling, batched memory measurement, and side-by-side noble comparison. The internal benchmark runner uses process.hrtime.bigint() for nanosecond timing and discards the first untimed iteration to remove cold-cache and JIT warmup effects. Public SDK source is at https://github.com/cuilabs/qnsp-public.
Cross-reference
Independent benchmark sources
The numbers we publish can be cross-checked against multiple third-party sources. Compare our results with these neutral sources before you trust any vendor's PQC claim, including ours.
Verified May 2026. Sources are independent of QNSP. We link to neutral benchmarks because trust requires triangulation: reproduce our numbers, then compare them against theirs.
What this proves
Production PQC isn't a research problem — it's a measurement problem
ML-KEM-768 (the FIPS 203 default for most QNSP tiers) generates a key pair in about 9 µs and runs encapsulate / decapsulate at a p50 of 10-11 µs, which is roughly 90-100 thousand operations per second in a single process on commodity hardware. ML-DSA-65 signatures verify about 3.6× faster than they sign (0.053 ms vs 0.189 ms p50), which makes PQC viable on the synchronous request path of services that issue many short-lived tokens. SLH-DSA is conservative and slow but produces hash-based signatures with a different security argument than lattice-based schemes, and we benchmark it so customers in regulated sectors can pick it deliberately rather than ruling it out by reputation.
These numbers are why QNSP can run a PQC-only crypto policy at the edge gateway, KMS, and audit-service tiers without extra infrastructure. The same liboqs binary that produced this JSON also handles every key, signature, and audit record on the platform.