QNSP

Benchmarked 2026-05-14

PQC benchmarks — real numbers, reproducible

These are not marketing numbers. Every value below is captured by the same liboqs build that runs in QNSP production, on the host listed in the test environment box. Re-run the script yourself; the JSON output is committed to the repo.

CPU: Apple M4 Max
Cores: 14
OS: darwin/arm64
Memory: 36 GiB
Node: v24.15.0
liboqs: 0.15.0
Generated: 2026-05-14 14:36:38Z
Commit: 6fa08c5666c6

Key Encapsulation Mechanisms (KEMs)

p50 latency per operation. 500 iterations per algorithm with one untimed warm-up.

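| Algorithm | Standard | NIST cat. | Keygen p50 | Encaps p50 | Decaps p50 | Keygen ops/s | Pub key | Ciphertext |
|---|---|---|---|---|---|---|---|---|
| ML-KEM-512 | FIPS 203 | L1 | 5.0 µs | 6.0 µs | 7.0 µs | 174.6k /s | 800 B | 768 B |
| ML-KEM-768 | FIPS 203 | L3 | 9.0 µs | 0.010 ms | 0.011 ms | 106.4k /s | 1184 B | 1088 B |
| ML-KEM-1024 | FIPS 203 | L5 | 0.014 ms | 0.014 ms | 0.016 ms | 71.4k /s | 1568 B | 1568 B |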

Digital signatures

p50 latency per operation. Iteration counts vary by algorithm cost (ML-DSA: 100–200, Falcon: 50, SLH-DSA: 10–25). Message size: 179 bytes.

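| Algorithm | Standard | NIST cat. | Keygen p50 | Sign p50 | Verify p50 | Verify ops/s | Pub key | Sig size |
|---|---|---|---|---|---|---|---|---|
| ML-DSA-44 | FIPS 204 | L2 | 0.034 ms | 0.116 ms | 0.035 ms | 29.2k /s | 1312 B | 2420 B |
| ML-DSA-65 | FIPS 204 | L3 | 0.061 ms | 0.189 ms | 0.053 ms | 17.9k /s | 1952 B | 3309 B |
| ML-DSA-87 | FIPS 204 | L5 | 0.093 ms | 0.247 ms | 0.092 ms | 10.8k /s | 2592 B | 4627 B |
| Falcon-512 | FN-DSA | L1 | 2.93 ms | 0.097 ms | 0.015 ms | 64.0k /s | 897 B | 650 B |
| SLH-DSA-SHA2-128f | FIPS 205 | L1 | 0.485 ms | 11.06 ms | 0.680 ms | 1.5k /s | 32 B | 17088 B |
| SLH-DSA-SHA2-256f | FIPS 205 | L5 | 1.85 ms | 37.69 ms | 1.01 ms | 985 /s | 64 B | 49856 B |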

Cold-start vs warm-start

First-call latency on a fresh algorithm handle versus steady-state p50. This matters for serverless / lambda deployments, where the cold path contributes to tail latency. The ratio is first-call latency divided by steady-state p50; a ratio above 10 indicates a cold-start cost worth provisioning around. Every algorithm measured here stays below 2.1×. Lower is better.

| Algorithm | First call | Warmup median | Steady-state p50 | Cold/Warm ratio |
|---|---|---|---|---|
| ML-KEM-512 | 0.010 ms | 7.0 µs | 5.0 µs | 2.075× |
| ML-KEM-768 | 0.012 ms | 0.011 ms | 9.0 µs | 1.366× |
| ML-KEM-1024 | 0.022 ms | 0.015 ms | 0.014 ms | 1.536× |
| ML-DSA-44 | 0.043 ms | 0.038 ms | 0.034 ms | 1.254× |
| ML-DSA-65 | 0.088 ms | 0.066 ms | 0.061 ms | 1.443× |
| ML-DSA-87 | 0.103 ms | 0.090 ms | 0.093 ms | 1.109× |
| Falcon-512 | 5.40 ms | 2.73 ms | 2.93 ms | 1.842× |
| SLH-DSA-SHA2-128f | 0.532 ms | 0.487 ms | 0.485 ms | 1.097× |
| SLH-DSA-SHA2-256f | 1.87 ms | 1.75 ms | 1.85 ms | 1.008× |
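
The ratio column can be approximated from the rounded table values. The following is a minimal sketch, assuming the ratio is first-call latency divided by steady-state p50 (this matches the published figures to within rounding). The `rows` array, `coldWarmRatio`, and `COLD_START_THRESHOLD` are illustrative names and are not part of the QNSP runner.

```javascript
// Illustrative only: values copied from the table above, in milliseconds.
const rows = [
  { algorithm: "ML-KEM-512", firstCallMs: 0.010, steadyP50Ms: 0.005 },
  { algorithm: "Falcon-512", firstCallMs: 5.40, steadyP50Ms: 2.93 },
  { algorithm: "SLH-DSA-SHA2-256f", firstCallMs: 1.87, steadyP50Ms: 1.85 },
];

// Threshold from the text: a ratio above 10 is worth provisioning around.
const COLD_START_THRESHOLD = 10;

// Assumed definition: first-call latency divided by steady-state p50.
function coldWarmRatio(row) {
  return row.firstCallMs / row.steadyP50Ms;
}

const report = rows.map((row) => {
  const ratio = coldWarmRatio(row);
  return {
    algorithm: row.algorithm,
    ratio,
    needsProvisioning: ratio > COLD_START_THRESHOLD,
  };
});

for (const r of report) {
  const flag = r.needsProvisioning ? " (provision for cold start)" : "";
  console.log(`${r.algorithm}: ${r.ratio.toFixed(2)}x${flag}`);
}
```

Because the inputs are rounded, the computed ratios differ slightly from the published column (for example about 2.00 instead of 2.075 for ML-KEM-512), which was presumably derived from unrounded timings. None of the rows crosses the threshold.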

Memory footprint per algorithm instance (batched, N=200)

Creates 200 algorithm instances in a single process, runs one keygen per instance, then divides the RSS delta by the batch size for a stable per-instance number. Capacity planning input: concurrent-instance-count × per-instance-footprint = lower-bound platform memory. Per-instance heap is the V8-side (JavaScript object) cost; RSS includes the native-C liboqs allocation.

| Algorithm | Per-instance RSS | Per-instance heap | Batch RSS Δ | Batch heap Δ |
|---|---|---|---|---|
| ML-KEM-512 | 4.6 KiB | 1.4 KiB | 928.0 KiB | 288.1 KiB |
| ML-KEM-768 | 5.4 KiB | 1.4 KiB | 1.05 MiB | 272.0 KiB |
| ML-KEM-1024 | 5.5 KiB | 1.4 KiB | 1.08 MiB | 277.8 KiB |
| ML-DSA-44 | 164 B | 1.4 KiB | 32.0 KiB | 271.4 KiB |
| ML-DSA-65 | 6.2 KiB | 1.4 KiB | 1.22 MiB | 270.2 KiB |
| ML-DSA-87 | 8.4 KiB | 1.4 KiB | 1.64 MiB | 276.4 KiB |
| Falcon-512 | 1.0 KiB | 1.3 KiB | 208.0 KiB | 270.0 KiB |
| SLH-DSA-SHA2-128f | 0 B | 1.3 KiB | 0 B | 270.0 KiB |
| SLH-DSA-SHA2-256f | 82 B | 1.3 KiB | 16.0 KiB | 270.0 KiB |
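
The capacity-planning arithmetic described above can be sketched as follows. This is a rough illustration only: the helper names are made up for this example, and the figure of 10,000 concurrent instances is a hypothetical workload, not a QNSP measurement.

```javascript
// Illustrative only: the batch figure is copied from the ML-KEM-512 row above.
const KIB = 1024;
const BATCH_SIZE = 200; // instances created per batch, as stated in the text

// Per-instance footprint = batch RSS delta / batch size.
function perInstanceBytes(batchRssDeltaBytes, batchSize = BATCH_SIZE) {
  return batchRssDeltaBytes / batchSize;
}

// Lower-bound memory = concurrent instance count x per-instance footprint.
function lowerBoundBytes(concurrentInstances, perInstance) {
  return concurrentInstances * perInstance;
}

const mlKem512PerInstance = perInstanceBytes(928.0 * KIB);

// Hypothetical workload: 10,000 concurrent ML-KEM-512 instances.
const planned = lowerBoundBytes(10_000, mlKem512PerInstance);

console.log(`per instance: ${(mlKem512PerInstance / KIB).toFixed(1)} KiB`);
console.log(`10,000 instances: ${(planned / (KIB * KIB)).toFixed(1)} MiB lower bound`);
```

The result is a floor, not a budget: it excludes the Node runtime baseline, key material held by the application, and allocator overhead.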

Multi-process concurrency (real multi-core scaling)

Aggregate keygen ops/sec when running 1 / 2 / 4 / 8 child Node processes in parallel, each running 100 keygens. Includes fork + load + warm-up overhead per child, so absolute numbers are lower than the in-process steady-state baseline — but the scaling ratio (1→8) reflects real multi-core production deployment behavior. Per-algorithm rows below.

| Algorithm | 1 proc | 2 proc | 4 proc | 8 proc | 1→8 ratio |
|---|---|---|---|---|---|
| ML-KEM-512 | 1.5k /s | 2.9k /s | 5.3k /s | 9.3k /s | 6.04× |
| ML-KEM-768 | 1.3k /s | 2.8k /s | 5.2k /s | 9.1k /s | 6.78× |
| ML-KEM-1024 | 1.5k /s | 2.8k /s | 5.2k /s | 9.2k /s | 6.27× |
| ML-DSA-44 | 1.5k /s | 2.8k /s | 5.0k /s | 9.2k /s | 6.31× |
| ML-DSA-65 | 1.2k /s | 2.6k /s | 4.8k /s | 8.7k /s | 7.29× |
| ML-DSA-87 | 1.3k /s | 2.5k /s | 4.7k /s | 7.9k /s | 6.25× |
| Falcon-512 | 259 /s | 510 /s | 962 /s | 1.8k /s | 7.03× |
| SLH-DSA-SHA2-128f | 892 /s | 1.7k /s | 3.2k /s | 6.0k /s | 6.75× |
| SLH-DSA-SHA2-256f | 405 /s | 753 /s | 1.4k /s | 2.9k /s | 7.09× |
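
One way to read the 1→8 ratio is as parallel efficiency: the ratio divided by the number of processes, where 1.0 would be perfect linear scaling. A minimal sketch using three rows from the table; `parallelEfficiency` is an illustrative helper, not part of the benchmark runner.

```javascript
// Illustrative only: 1->8 ratios copied from the table above.
const PROCESSES = 8;
const scaling = [
  { algorithm: "ML-KEM-512", ratio1to8: 6.04 },
  { algorithm: "ML-DSA-65", ratio1to8: 7.29 },
  { algorithm: "Falcon-512", ratio1to8: 7.03 },
];

// Efficiency of 1.0 means throughput grew exactly in step with process count.
function parallelEfficiency(ratio, processes) {
  return ratio / processes;
}

const efficiencies = scaling.map(({ algorithm, ratio1to8 }) => ({
  algorithm,
  efficiency: parallelEfficiency(ratio1to8, PROCESSES),
}));

for (const { algorithm, efficiency } of efficiencies) {
  console.log(`${algorithm}: ${(efficiency * 100).toFixed(1)}% of linear scaling at ${PROCESSES} processes`);
}
```

For sizing, this suggests budgeting for somewhat less than linear gain per added process, since the measured figures include fork, load, and warm-up overhead per child.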

Native liboqs vs @noble/post-quantum (pure JavaScript)

Side-by-side keygen performance: the same algorithm executed by the native C liboqs (0.15.0) that ships in QNSP backend services versus the pure-JavaScript @noble/post-quantum (0.6.0) that ships in the QNSP browser SDK. Both produce byte-identical FIPS-conformant artifacts. The native path is roughly 28–36× faster for ML-KEM and ML-DSA keygen and about 6× faster for SLH-DSA, which matters for backend workloads; pure JS is the right choice for browser-side and serverless-edge deployments where native bindings aren't available.

| Algorithm | liboqs p50 | noble p50 | liboqs ops/s | noble ops/s | Native speedup |
|---|---|---|---|---|---|
| ML-KEM-512 | 5.0 µs | 0.178 ms | 174.6k /s | 5.0k /s | 35.558× |
| ML-KEM-768 | 9.0 µs | 0.269 ms | 106.4k /s | 3.7k /s | 29.843× |
| ML-KEM-1024 | 0.014 ms | 0.450 ms | 71.4k /s | 2.2k /s | 32.11× |
| ML-DSA-44 | 0.034 ms | 1.00 ms | 29.1k /s | 986 /s | 29.529× |
| ML-DSA-65 | 0.061 ms | 1.72 ms | 15.8k /s | 585 /s | 28.107× |
| ML-DSA-87 | 0.093 ms | 2.74 ms | 10.6k /s | 365 /s | 29.496× |
| Falcon-512 | 2.93 ms | n/a | n/a | n/a | n/a |
| SLH-DSA-SHA2-128f | 0.485 ms | 2.96 ms | 2.1k /s | 335 /s | 6.097× |
| SLH-DSA-SHA2-256f | 1.85 ms | 12.04 ms | 549 /s | 83 /s | 6.503× |

Falcon (FN-DSA pending FIPS 206) is not in @noble/post-quantum's catalog as of 0.6.0.
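
The native speedup is not uniform across algorithm families. A small sketch that groups the published "Native speedup" column by family makes the spread explicit; the `family` labels and the `speedupRangeByFamily` helper are additions for this illustration.

```javascript
// Illustrative only: "Native speedup" values copied from the table above.
// ML-KEM and ML-DSA are lattice-based; SLH-DSA is hash-based.
const speedups = [
  { algorithm: "ML-KEM-512", family: "lattice", speedup: 35.558 },
  { algorithm: "ML-KEM-768", family: "lattice", speedup: 29.843 },
  { algorithm: "ML-KEM-1024", family: "lattice", speedup: 32.11 },
  { algorithm: "ML-DSA-44", family: "lattice", speedup: 29.529 },
  { algorithm: "ML-DSA-65", family: "lattice", speedup: 28.107 },
  { algorithm: "ML-DSA-87", family: "lattice", speedup: 29.496 },
  { algorithm: "SLH-DSA-SHA2-128f", family: "hash", speedup: 6.097 },
  { algorithm: "SLH-DSA-SHA2-256f", family: "hash", speedup: 6.503 },
];

// Returns { family: { min, max } } over the rows supplied.
function speedupRangeByFamily(rows) {
  const ranges = {};
  for (const { family, speedup } of rows) {
    const current = ranges[family] ?? { min: Infinity, max: -Infinity };
    ranges[family] = {
      min: Math.min(current.min, speedup),
      max: Math.max(current.max, speedup),
    };
  }
  return ranges;
}

const ranges = speedupRangeByFamily(speedups);
for (const [family, { min, max }] of Object.entries(ranges)) {
  console.log(`${family}: ${min.toFixed(1)}x to ${max.toFixed(1)}x`);
}
```

These ranges apply to keygen only, since that is the operation the comparison table measures.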

In-process concurrency (single-process saturation reference)

Aggregate ops/sec at in-process workers ∈ {1, 2, 4, 8, 16} for ML-KEM-768. liboqs native calls are synchronous, so this saturates around the workers=1 figure — this measurement exists to confirm the saturation; the multi-process numbers above are the right input for capacity sizing.

| In-process workers | Total ops | Duration | Aggregate ops/sec |
|---|---|---|---|
| 1 | 50 | 0.495 ms | 100.9k /s |
| 2 | 100 | 1.22 ms | 82.3k /s |
| 4 | 200 | 1.93 ms | 103.6k /s |
| 8 | 400 | 4.07 ms | 98.4k /s |
| 16 | 800 | 7.47 ms | 107.1k /s |
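
The saturation effect can be reproduced without liboqs. The sketch below uses a stand-in synchronous workload (`syncWork`, a hypothetical placeholder, not a PQC operation) and launches N async "workers" in one process. Because each worker's body is synchronous, the workers run one after another on the single JavaScript thread, so aggregate throughput should stay roughly flat as N grows. The 50 operations per worker mirror the totals in the table; everything else is an assumption made for illustration.

```javascript
// Stand-in for a synchronous native call (hypothetical workload, not liboqs).
function syncWork() {
  let acc = 0;
  for (let i = 0; i < 20_000; i++) acc = (acc + i * i) % 1_000_003;
  return acc;
}

// An async function with no await inside runs to completion synchronously,
// so "concurrent" workers actually execute back to back.
async function worker(opsPerWorker) {
  for (let i = 0; i < opsPerWorker; i++) syncWork();
  return opsPerWorker;
}

async function measure(workers, opsPerWorker = 50) {
  const start = process.hrtime.bigint();
  const counts = await Promise.all(
    Array.from({ length: workers }, () => worker(opsPerWorker)),
  );
  const elapsedNs = process.hrtime.bigint() - start;
  const totalOps = counts.reduce((a, b) => a + b, 0);
  const opsPerSec = totalOps / (Number(elapsedNs) / 1e9);
  return { workers, totalOps, opsPerSec };
}

async function main() {
  for (const workers of [1, 2, 4, 8, 16]) {
    const r = await measure(workers);
    console.log(`${r.workers} workers: ${r.totalOps} ops, ${Math.round(r.opsPerSec)} ops/s`);
  }
}

main();
```

Scaling beyond one core needs separate processes or worker threads, which is why the multi-process figures are the ones to use for sizing.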

Verify these numbers yourself

The numbers above come from the native C liboqs build that runs inside QNSP backend services. The same FIPS 203 / 204 algorithms are also implemented in pure JavaScript by @noble/post-quantum, which is roughly 28–36× slower for keygen in the comparison above but produces byte-identical FIPS-conformant artifacts. Three independent ways to verify, no signup required:

  1. Audit the published JSON. https://qnsp.cuilabs.io/pqc-benchmarks/pqc-latest.json — full per-algorithm p50/p95/p99/mean numbers, hardware specs, and the timestamp + git SHA from the run that produced them.
  2. curl a live PQC operation. https://qnsp.cuilabs.io/api/sandbox/pqc-runtime runs ML-KEM-768 + ML-DSA-65 keygen / encaps / decaps / sign / verify per request, with server-side integrity flags. Sibling endpoints /conformance and /api/health/pqc-sandbox probe determinism vectors and self-canary status.
  3. Run pure-JS noble locally — copy-paste this one-liner:
    npm i @noble/post-quantum && node --input-type=module -e "import {ml_kem768} from '@noble/post-quantum/ml-kem.js'; const t=performance.now(); const kp=ml_kem768.keygen(); console.log('keygen ms:', (performance.now()-t).toFixed(3), '— pubkey bytes:', kp.publicKey.length);"

Absolute timings depend on hardware, kernel scheduler, and thermal state. Cross-algorithm ratios are stable. Schema v3 adds multi-process scaling, batched memory measurement, and side-by-side noble comparison. The internal benchmark runner uses process.hrtime.bigint() for nanosecond timing and discards the first untimed iteration to remove cold-cache and JIT warmup effects. Public SDK source is at https://github.com/cuilabs/qnsp-public.
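
The internal runner is not reproduced here. As a rough sketch of the timing approach just described, the following measures a function with `process.hrtime.bigint()`, discards one untimed warm-up call, and reports p50/p95/p99/mean. The nearest-rank percentile method, the `benchmark` and `percentile` helpers, and the `standIn` workload are assumptions made for this example; swap in a real keygen call to measure an actual algorithm.

```javascript
// Nearest-rank percentile on an ascending-sorted array (assumed method).
function percentile(sorted, p) {
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.min(sorted.length - 1, Math.max(0, rank - 1))];
}

function benchmark(fn, iterations = 500) {
  fn(); // one untimed warm-up call, discarded

  const samplesNs = [];
  for (let i = 0; i < iterations; i++) {
    const start = process.hrtime.bigint();
    fn();
    samplesNs.push(Number(process.hrtime.bigint() - start));
  }

  samplesNs.sort((a, b) => a - b);
  const meanNs = samplesNs.reduce((a, b) => a + b, 0) / samplesNs.length;

  // Convert nanoseconds to microseconds for reporting.
  return {
    iterations,
    p50Us: percentile(samplesNs, 50) / 1e3,
    p95Us: percentile(samplesNs, 95) / 1e3,
    p99Us: percentile(samplesNs, 99) / 1e3,
    meanUs: meanNs / 1e3,
  };
}

// Hypothetical stand-in workload; replace with the operation under test.
function standIn() {
  let acc = 0;
  for (let i = 0; i < 10_000; i++) acc = (acc + i * 31) % 65_521;
  return acc;
}

const stats = benchmark(standIn);
console.log(stats);
```

Reporting the median rather than the mean keeps a few slow outliers, such as a garbage-collection pause, from skewing the headline number.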

Cross-reference

Independent benchmark sources

The numbers we publish can be cross-checked against multiple third-party sources. Compare our results with these neutral sources before you trust any vendor's PQC claim, including ours.

Open-source project
Open Quantum Safe: Continuous Benchmarks
16 PQC algorithms (ML-KEM, ML-DSA, SLH-DSA, BIKE, HQC, Falcon, MAYO, SNOVA, UOV) with live keygen / encap / sign / verify visualizations.
Same liboqs base we use: an apples-to-apples cross-reference for every QNSP number above.

Open-source binary
liboqs `speed_kem` / `speed_sig`
Reproducible cycle counts for keygen / encap / decap / sign / verify on any hardware. v0.15.0 released Nov 2025.
Run it yourself on the same machine you'll deploy on; no need to trust anyone else's numbers.

Academic (D.J. Bernstein et al.)
SUPERCOP / eBACS
Cycle counts and key/CT/signature sizes for ML-KEM, ML-DSA, SPHINCS+, Falcon, HQC, BIKE, McEliece across 30+ measurement machines.
Cross-CPU comparison, useful when planning deployments on a specific server SKU.

Vendor research (production-scale data)
Cloudflare: State of the Post-Quantum Internet 2025
Real-world TLS handshake slowdown measured across ~50% of Cloudflare's traffic running hybrid PQC. ML-DSA-44 / SLH-DSA / FN-DSA-512 CPU comparison.
Counter to the "PQC is too slow for production" objection: actual hyperscaler latency numbers.

Vendor (cloud provider)
AWS: ML-KEM in KMS / ACM / Secrets Manager
TPS p01–p99 across 500 runs; +80–150 µs handshake compute, +1,600 B on the wire.
Only public PQC TLS benchmark from a hyperscaler KMS service.

Academic
pqm4: Embedded benchmarks (Cortex-M4)
Cycles, RAM, flash for ML-KEM, ML-DSA, SLH-DSA on STM32L4R5ZI.
If you're deploying QNSP at the IoT edge, our cloud numbers don't translate; start here.

Academic (peer-reviewed, 2025)
ML-KEM/ML-DSA on RP2040 (arXiv 2603.19340)
Timing, memory, energy across all NIST security levels, e.g. ML-KEM-512 = 35.7 ms / 2.83 mJ on RP2040.
First post-FIPS-203/204 isolated-algorithm energy numbers, useful for battery-bound deployments.

Open-source project
PQC-LEO: Reproducible TLS handshake harness
Push-button TLS-1.3 handshake benchmark via OpenSSL 3.6 + oqsprovider. v0.5.1 released April 2026.
Third-party harness you can point at QNSP's edge gateway to measure handshake latency yourself.

Verified May 2026. Sources are independent of QNSP. We link to neutral benchmarks because trust requires triangulation: verify our numbers, then check them against theirs.

What this proves

Production PQC isn't a research problem — it's a measurement problem

ML-KEM-768 (the FIPS 203 default for most QNSP tiers) generates a key pair in about 9 µs and runs encapsulate / decapsulate in 10–11 µs each, on the order of 100,000 operations per second in a single process on the test host. ML-DSA-65 signatures verify about 3.6× faster than they sign (0.053 ms vs 0.189 ms p50), which makes PQC viable on the synchronous request path of services that issue many short-lived tokens. SLH-DSA is conservative and slow but produces hash-based signatures with a different security argument than lattice-based schemes, and we benchmark it so customers in regulated sectors can pick it deliberately rather than ruling it out by reputation.

These numbers are why QNSP can run a PQC-only crypto policy at the edge gateway, KMS, and audit-service tiers without extra infrastructure. The same liboqs binary that produced this JSON also handles every key, signature, and audit record on the platform.