Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allocate less in Fortuna #188

Merged
merged 2 commits into from
Feb 3, 2024
Merged

Allocate less in Fortuna #188

merged 2 commits into from
Feb 3, 2024

Conversation

hannesm
Copy link
Member

@hannesm hannesm commented Jan 24, 2024

@palainp observed lots of 2 byte allocations (with malloc, via bigarray). The root cause is the add function in our Fortuna module.

This PR uses instead a global pre-allocated buffer. This is likely fine for OCaml 4, but will be troublesome (a data race) when OCaml 5 (see #186) is considered, or any other multi-threading preemption setup.

It is not 100% clear how to move forward. One part is to document "mirage-crypto is not SMP safe", and if someone wants to use it in such a system, they should guard functions with mutexes.

What does the way ahead mean? We have observed that using "cstruct.t" may have been not the wisest choice from early on, and settling on bytes/string would be more wise (in terms of GC friendlyness). We also observe that a "domain-local variable" (i.e. __thread in C) is missing in OCaml AFAICT. Keeping performance in mind, maybe we should merge this as an intermediate stop-gap solution, then move on revising the API to use string/bytes (at least internally), and only then figure out a solution for the multicore stuff.

In related news, there's ongoing work for thread-local storage in OCaml 5 (ocaml/ocaml#12724 ocaml/ocaml#12889 ocaml/ocaml#12677) -- looks like 5.3 may have such a thing. Considering my time and energy, maybe mark mirage-crypto* as not available for OCaml 5 (until there's a thread local storage story and we fix the code) is the way to go? WDYT?

@palainp
Copy link
Member

palainp commented Jan 24, 2024

Thank you @hannesm ! I can confirm that with this PR I no more have 10's of 2B allocation per second 🚀 So I'm in favor to merge it as #186 is not merged yet.
(I think mirage-crypto-rng-eio could not be marked unaviable to Ocaml 5.)
I'd like to generally have less allocation and (or at least) less big-arrays allocations as the GC has trouble collecting them :(

@hannesm
Copy link
Member Author

hannesm commented Feb 3, 2024

Let's merge this, and get data race freedom in a later release..

@hannesm hannesm merged commit 24dff0d into mirage:main Feb 3, 2024
27 checks passed
@hannesm hannesm deleted the fortuna-less-alloc branch February 3, 2024 11:44
hannesm added a commit to hannesm/opam-repository that referenced this pull request Feb 26, 2024
CHANGES:

* mirage-crypto, mirage-crypto-rng{,lwt,mirage}: support CL.EXE compiler
  (mirage/mirage-crypto#137 @jonahbeckford) - mirage-crypto-pk not yet due to gmp dependency,
  mirage-crypto-ec doesn't pass testsuite
* mirage-crypto-ec: use simpler square root for ed25519 - saving 3
  multiplications and 2 squarings, details
  https://mailarchive.ietf.org/arch/msg/cfrg/qlKpMBqxXZYmDpXXIx6LO3Oznv4/
  (mirage/mirage-crypto#196 @hannesm)
* mirage-crypto-ec: use sliding window method with pre-computed calues of
  multiples of the generator point for NIST curves, speedup around 4x for P-256
  sign (mirage/mirage-crypto#191 @Firobe, review @palainp @hannesm)
* mirage-crypto-ec: documentation: warn about power timing analysis on `k` in
  Dsa.sign (mirage/mirage-crypto#195 @hannesm, as proposed by @edwintorok)
* mirage-crypto-ec: replace internal Cstruct.t by string (speedup up to 2.5x)
  (mirage/mirage-crypto#146 @dinosaure @hannesm @reynir, review @Firobe @palainp @hannesm @reynir)
* bench/speed: add EC (ECDSA & EdDSA generate/sign/verify, ECDH secret/share)
  operations (mirage/mirage-crypto#192 @hannesm)
* mirage-crypto-rng: use rdtime instead of rdcycle on RISC-V (rdcycle is
  privileged since Linux kernel 6.6) (mirage/mirage-crypto#194 @AdrianBunk, review by @edwintorok)
* mirage-crypto-rng: support Loongarch (mirage/mirage-crypto#190 @fangyaling, review @loongson-zn)
* mirage-crypto-rng: support NetBSD (mirage/mirage-crypto#189 @drchrispinnock)
* mirage-crypto-rng: allocate less in Fortuna when feeding (mirage/mirage-crypto#188 @hannesm,
  reported by @palainp)
* mirage-crypto-ec: avoid mirage-crypto-pk and asn1-combinators test dependency
  (instead, craft our own asn.1 decoder -- mirage/mirage-crypto#200 @hannesm)

### Performance differences between v0.11.2 and v0.11.3 and OpenSSL

The overall result is promising: P-256 sign operation improved 9.4 times, but
is still a 4.9 times slower than OpenSSL.

Numbers in operations per second (apart from speedup, which is a factor
v0.11.3 / v0.11.2), gathered on a Intel i7-5600U CPU 2.60GHz using FreeBSD 14.0,
OCaml 4.14.1, and OpenSSL 3.0.12.

#### P224

| op     | v0.11.2 | v0.11.3 | speedup | OpenSSL |
|--------|---------|---------|---------|---------|
| gen    | 1160    | 20609   |    17.8 |         |
| sign   | 931     | 8169    |     8.8 | 21319   |
| verify | 328     | 1606    |     4.9 | 10719   |
| dh-sec | 1011    | 12595   |    12.5 |         |
| dh-kex | 992     | 2021    |     2.0 | 16691   |

#### P256

| op     | v0.11.2 | v0.11.3 | speedup | OpenSSL |
|--------|---------|---------|---------|---------|
| gen    | 990     | 19365   |    19.6 |         |
| sign   | 792     | 7436    |     9.4 | 36182   |
| verify | 303     | 1488    |     4.9 | 13383   |
| dh-sec | 875     | 11508   |    13.2 |         |
| dh-kex | 895     | 1861    |     2.1 | 17742   |

#### P384

| op     | v0.11.2 | v0.11.3 | speedup | OpenSSL |
|--------|---------|---------|---------|---------|
| gen    | 474     | 6703    |    14.1 |         |
| sign   | 349     | 3061    |     8.8 | 900     |
| verify | 147     | 544     |     3.7 | 1062    |
| dh-sec | 378     | 4405    |    11.7 |         |
| dh-kex | 433     | 673     |     1.6 | 973     |

#### P521

| op     | v0.11.2 | v0.11.3 | speedup | OpenSSL |
|--------|---------|---------|---------|---------|
| gen    | 185     | 1996    |    10.8 |         |
| sign   | 137     | 438     |     3.2 | 2737    |
| verify | 66      | 211     |     3.2 | 1354    |
| dh-sec | 180     | 1535    |     8.5 |         |
| dh-kex | 201     | 268     |     1.3 | 2207    |

#### 25519

| op     | v0.11.2 | v0.11.3 | speedup | OpenSSL |
|--------|---------|---------|---------|---------|
| gen    | 23271   | 22345   |     1.0 |         |
| sign   | 11228   | 10985   |     1.0 | 21794   |
| verify | 8149    | 8029    |     1.0 | 7729    |
| dh-sec | 14075   | 13968   |     1.0 |         |
| dh-kex | 13487   | 14079   |     1.0 | 24824   |
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants