Fast multiplication #11

mangofromage · 2019-08-19T14:50:17Z

This is an alternative multiplication method based on column operations.
The idea comes from this video: https://www.youtube.com/watch?v=zm3tQ_BPgm8
I also added a test that compares the two methods.
The current multiplication method should be replaced with the new one.

kokke · 2019-08-19T20:34:40Z

bn.c

+
+  int usable_len = BN_ARRAY_SIZE;
+
+  /*  this section speads up algorithm by "cutting" len of bignum*/


I don't want to pull this part. Let me first comment on it, and then tell you why I would like to avoid it.

I appreciate the effort and understand why you would want this here (and in every other function). You could also maintain a bitset of the highest used bit to optimize for speed.
However, it adds to the size of both source and object code. I also fear you might miss out on vectorization optimizations from the compiler, but I wouldn't know for sure. It also makes you susceptible to leaking info via timing if used in encryption algorithms. I mostly want to take a rain check because it also breaks the beautiful simplicity of the naïve algorithm.

Thanks for the PR - I really appreciate the effort you've put into this

Vector operations are based on processor words, so don't worry.
I understand the timing problem in cryptography.
I am not sure we are talking about the same piece of code.

So the section that finds the actual length of the number can be removed.

kokke · 2019-08-19T20:38:17Z

bn.c

+  usable_len = usable_len > BN_ARRAY_SIZE ? BN_ARRAY_SIZE : usable_len;
+  //
+
+  for (int i = 0; i < usable_len; ++i)


This part is brilliant. I love how you've reduced the allocation of two bignums to none. That really cuts straight to the bone 👍

However I would like to avoid the scoped-declarations of variables inside the for-loop for stylistic reasons and to be a bit more portable against shitty/subset compilers. I will pull it as-is and then edit it myself, no worries.

Declarations in for loops are part of the C standard (since C99), so if a compiler does not support the standard, it is not a compiler but shit. It is better style to declare temporary variables such as iterators in the narrowest scope. But if you want to keep the same style, that's OK with me.
The main improvement is the reduced time complexity.

The old multiplication is O(n^3). The new one is only O(n^2).
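
A minimal sketch of how a column-wise multiplication can stay at O(n^2): each of the n output words is one column, and a column sums at most n word products, with no full-length shift or add per product. This is not the code from the PR. The name `mul_columns`, the 32-bit word size, and the little-endian word order are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch, not the PR's implementation.
 * Words are little-endian (index 0 is least significant).
 * Writes the low n words of a * b into c; higher words are dropped,
 * as they would be in a fixed-size bignum. */
void mul_columns(const uint32_t *a, const uint32_t *b, uint32_t *c, size_t n)
{
    uint64_t acc_lo = 0; /* low 64 bits of the running column sum */
    uint32_t acc_hi = 0; /* bits 64..95: counts wrap-arounds of acc_lo */

    assert(c != a && c != b); /* output must not alias the inputs */

    for (size_t k = 0; k < n; ++k) {
        /* Column k collects every product a[i] * b[j] with i + j == k. */
        for (size_t i = 0; i <= k; ++i) {
            uint64_t p = (uint64_t)a[i] * b[k - i];
            acc_lo += p;
            if (acc_lo < p) {
                ++acc_hi; /* the 64-bit sum wrapped, remember the carry */
            }
        }
        c[k] = (uint32_t)acc_lo; /* lowest word of the column is final */
        /* Shift the 96-bit accumulator right by one word for column k+1. */
        acc_lo = (acc_lo >> 32) | ((uint64_t)acc_hi << 32);
        acc_hi = 0;
    }
}
```

The two nested loops perform n(n+1)/2 word multiplications in total. The third accumulator word (`acc_hi`) is what keeps a column sum from silently losing carries when many large products land in the same column.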

mangofromage · 2019-08-23T07:58:02Z

It is OK with me if you merge it and edit it the way you like ;)

kokke · 2019-09-26T15:32:21Z

I got away from this PR @kolkil - sorry about that - I took some time off and forgot everything about software development :)

I will try and get it merged and edit whatever small details I want to change.

If I keep forgetting, please remind me again that I want to merge this PR.

mhaeuser · 2019-10-12T14:42:29Z

Good day,

First off, I currently use a heavily modified version of this library and hence would ask for someone to try to reproduce this with a "clean" version of the library. However, as the alternative multiplication function does not use any of the other functions, I am fairly certain this is a rare bug in the logic.

I'm not yet sure what pattern triggers it, but I do know this works very well in most cases. However, when trying to integrate this with a custom crypto stack, I started to get weird results and decided to compare this mul algo against the original. The first result is the incorrect result of this faster multiplication algorithm (locally modded, but unmodded seems to cause the same issue), the second result is the correct (verified against OpenSSL BN) solution of the original.

If I find out what causes this, I'll report back, but I'll need to familiarise myself with the algorithmic idea for a bit.

14e8839b4900fcfb5fc4d6dc64753740f59ae0afbd0fe9379bdbee18bd606c57d014e2d44e079dc3ffaabd4226c5d64792c1df2c24d3260567156b6caf278c7355fd299ae6f993952011488cd39c1129862edead758d1f31e6a11221babffab183f0056aaf735d8616972a07a128e79bdd64ef96e8de71bab4a968e40148f017a52350cd1296347c93d5c82f2b36bb72bf1f185622bdf1763a230c064d8a878c4d0058102ff68676d0a5a4f553de75230d4e727d51dd8c1c35d65f9b6feba90dc87a5d8f1f6ec49e156137430647679897c779601b00f97b6075029a32f81c4478f365a7e22482d3db69d6d5d3ede9ef2f648b3f56eb8d9ba57e838ee8f2e457f * 14e8839b4900fcfb5fc4d6dc64753740f59ae0afbd0fe9379bdbee18bd606c57d014e2d44e079dc3ffaabd4226c5d64792c1df2c24d3260567156b6caf278c7355fd299ae6f993952011488cd39c1129862edead758d1f31e6a11221babffab183f0056aaf735d8616972a07a128e79bdd64ef96e8de71bab4a968e40148f017a52350cd1296347c93d5c82f2b36bb72bf1f185622bdf1763a230c064d8a878c4d0058102ff68676d0a5a4f553de75230d4e727d51dd8c1c35d65f9b6feba90dc87a5d8f1f6ec49e156137430647679897c779601b00f97b6075029a32f81c4478f365a7e22482d3db69d6d5d3ede9ef2f648b3f56eb8d9ba57e838ee8f2e457f = 
fffffffefffffffefffffffcfffffffcfffffffafffffffafffffffafffffff9fffffff8fffffff5fffffff6fffffff3fffffff3fffffff1fffffff2fffffff2fffffff3fffffff3fffffff1fffffff0ffffffedffffffedffffffebffffffeeffffffecffffffeaffffffe9ffffffe7ffffffe5ffffffe6ffffffe6ffffffe3ffffffe5ffffffe1ffffffe6ffffffe2ffffffdfffffffdfffffffddffffffe0ffffffe1ffffffddffffffddffffffdeffffffdaffffffdbffffffdbffffffd5ffffffd6ffffffd5ffffffd9ffffffd4ffffffd2ffffffd4ffffffd2ffffffd2ffffffd5ffffffcfffffffd3ffffffcfffffffd1ffffffceffffffcdffffffd4fb843a8a639af780f2991b71933c8cee17bef9b931f4040732f606756c9525158cd2e892094dde117a6630df12114bec3e082b91f6f7990c4dce50536572b8f5f273e6dd1b1ec6f5daebc86ef8f88ae4b6ad81659254148cc69818bbe89af2942d2acd48da8ae6599a0821c5c7b3eaacc35b1ab4b39e923045527dfe810265d01ac1471bd344b7736d5b65dcb9e6e5c0ecb0613f269bd8906c2e49d59303adce635ad27e3010f4a335c9da561ad0154445275896affc3f882cb2a0f87f4e83c6e9b79a055389d8bf7e1ef1a38e5ca794e568d1a47cb057af41c1a139e83ed83f06cdbdbc7d92a57dc6526bd9d30157bc5449833123f0b0cbb0565ea96adab4b1 vs 
fffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffb843ab9639af7b1f2991ba0933c8d1e17bef9e431f4043432f6069f6c9525418cd2e8ba094dde397a66310a12114c113e082bb1f6f799304dce50766572b918f273e7001b1ec716daebc88df8f88b06b6ad8185925414aac69818d6e89af2b22d2acd67da8ae6749a0821dfc7b3eac9c35b1acab39e924945527e15810265e91ac14732d344b78a6d5b65f4b9e6e5d2ecb06155269bd8a36c2e49e89303ade1635ad2903010f4b135c9da631ad01552452758a2affc3f942cb2a1037f4e83cfe9b79a0d5389d8c77e1ef1ac8e5ca79be568d1a97cb057b641c1a13de83ed84406cdbdbf7d92a580c6526bdbd30157be5449833223f0b0ccb0565ea96adab4b1

mangofromage · 2019-10-12T21:51:03Z

Could you show the code that generated this?
The algorithm is explained in this video: https://www.youtube.com/watch?v=zm3tQ_BPgm8

mhaeuser · 2019-10-13T08:00:27Z

@kolkil Thanks, I will check it out asap. I'm a bit low on time right now and cannot provide you a sample caller, but you should be able to reproduce it by just parsing the operands above and calling the alternative mul.

Frankly, I still do not understand the algo, but I found it odd there was no overflow check around here: https://github.com/kolkil/tiny-bignum-c/blob/fast_multiplication/bn.c#L309
I added code to increment tmp_to_add on overflow, and I confirmed the new result is closer to the correct solution than before, but it's not quite there yet. If you find another issue (assuming this is correct), please let me know.

EDIT: Yes, what I found was correct, and this can overflow too: https://github.com/kolkil/tiny-bignum-c/blob/fast_multiplication/bn.c#L308
With both fixed, I get the correct result, so the algo seems fine.

I cannot give you a diff because, as I said, my local copy is modded, but for the first overflow it is sufficient to store the result of the addition separately and check whether it is smaller than one of the operands (i.e. tmp2 = c->array[i] + (DTYPE)tmp, then test tmp2 < (DTYPE)tmp). This works because, in two's complement terms, adding MAX_UINTx (x being the width) is the same as adding -1, so even the largest possible addend leaves a wrapped result one below the other operand. A wrapped-around result must therefore be smaller than both original operands.
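
A minimal sketch of that check in isolation, assuming a 32-bit word type. The helper name `add_with_carry_out` is hypothetical and does not appear in the library or the PR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: adds addend into *word.
 * Returns 1 if the 32-bit sum wrapped around, otherwise 0. */
uint32_t add_with_carry_out(uint32_t *word, uint32_t addend)
{
    uint32_t sum;

    assert(word != NULL);

    sum = *word + addend;           /* unsigned addition wraps modulo 2^32 */
    *word = sum;
    return (sum < addend) ? 1u : 0u; /* wrapped iff sum is below an operand */
}
```

Comparing against either operand gives the same answer, so the choice of `addend` over the old value of `*word` is arbitrary here.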

By the way, with Clang and an x64 target, this uses the carry flag (CF), so it should perform quite well. This can be applied to the add and sub operations as well to get rid of the 64-bit int dependency. If someone manages that for mul too, the array int size could be raised to 64-bit to increase performance (given the mul solution is not slow).
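
As a sketch of what that could look like for addition, here is a multi-word add that carries between words using only the word type itself, with no wider temporary. The name `add_words` and the 32-bit word size are assumptions for illustration, not the library's API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: c = a + b over n little-endian 32-bit words.
 * Returns the carry out of the most significant word (0 or 1). */
uint32_t add_words(const uint32_t *a, const uint32_t *b, uint32_t *c, size_t n)
{
    uint32_t carry = 0;

    assert(a != NULL && b != NULL && c != NULL);

    for (size_t i = 0; i < n; ++i) {
        uint32_t sum = a[i] + b[i];
        uint32_t carry_out = (sum < a[i]); /* a[i] + b[i] wrapped */

        sum += carry;
        carry_out |= (sum < carry);        /* adding the carry-in wrapped */

        c[i] = sum;
        carry = carry_out;
    }
    return carry;
}
```

At most one of the two wrap checks can fire per word, so OR-ing them is enough. Subtraction can follow the same pattern with a borrow instead of a carry.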

For the second mentioned overflow, I just raised tmp_to_add to DTYPE_TMP and instead of initialising back to zero, I left-shifted by 32 bits to not discard the high bits. I am not convinced it cannot overflow again given unlucky input, but I will think of something later, or hope you will come up with a new solution (or an argument for why it cannot overflow again).

EDIT2: Looking at the number of iterations in which tmp_to_add can be increased and the large results that multiplications can yield, I don't feel like tmp_to_add is a feasible option. It might need a function that propagates additions upwards on the BIGNUM itself.
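
A rough sketch of such a propagating helper, assuming a plain array of 32-bit little-endian words. The name `add_word_at` and its signature are hypothetical and not part of the library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: adds value into words[index] and ripples any carry
 * into the higher words. Returns 1 if the carry ran off the top of the
 * array (the result no longer fits in n words), otherwise 0.
 * Note: the loop exits early once the carry is zero, so the running time
 * depends on the data. */
uint32_t add_word_at(uint32_t *words, size_t n, size_t index, uint32_t value)
{
    uint32_t carry = value;

    assert(words != NULL);

    for (size_t i = index; i < n && carry != 0; ++i) {
        uint32_t sum = words[i] + carry;
        carry = (sum < carry); /* 1 if this word wrapped, else 0 */
        words[i] = sum;
    }
    return (carry != 0) ? 1u : 0u;
}
```

With a helper like this, the low and high halves of each partial product can be added at their own positions and no separate carry variable has to hold more than one word. The early exit is a speed shortcut and is exactly the kind of data-dependent timing raised earlier in the review, so a constant-time variant would need to always walk to the top word.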

Thanks for your work

mangofromage · 2019-10-13T12:33:15Z

I understand the problem, gonna fix it soon, thank you ;)

kokke · 2021-02-01T20:44:36Z

@kolkil I don't mean to be rude at all; have you had a chance to look at this?

mangofromage added 3 commits August 19, 2019 16:39

alternative multiplication

9566ad7

alternative fast multiplication

e00f38c

multiplication test added

68b1c86

kokke reviewed Aug 19, 2019

kokke mentioned this pull request Feb 1, 2021

#ifndef before #define BN_ARRAY_SIZE #21

Closed
