Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fast multiplication #11

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open

Fast multiplication #11

wants to merge 3 commits into from

Conversation

mangofromage
Copy link

This is alternative multiplication method based on column operations.
https://www.youtube.com/watch?v=zm3tQ_BPgm8 - idea is from this video.
I also added test that compares methods.
The current multiplication method should be replaced with new multiplication method.


int usable_len = BN_ARRAY_SIZE;

/* this section speads up algorithm by "cutting" len of bignum*/
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want to pull this part. Let me first comment it, and then tell you the reason why I would like to avoid it,

I appreciate the effort and understand why you would want this here (and in every other functions. You could also maintain a bitset of highest used bit to optimize for speed.
However it adds to the size of both source- and object code. I also fear you might miss out on vectorization optimizations from the compiler, but I wouldn't know for sure. It also makes you susceptible to leaking info via timing if used in encryption algorithms. I mostly want to take a raincheck because it also breaks the beautiful simplicity of the naïive algorithm.

Thanks for the PR - I really appreciate the effort you've put into this

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Vector operations bases on processor words so don't worry.
I understand the timing problem in cryptography.
I am not sure are you talking about the same peace of code.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the section with finding actual length of number can be removed.

usable_len = usable_len > BN_ARRAY_SIZE ? BN_ARRAY_SIZE : usable_len;
//

for (int i = 0; i < usable_len; ++i)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part is brilliant. I love how you've reduced the allocation of two bignum's into none. That really cuts straight to the bone 👍

However I would like to avoid the scoped-declarations of variables inside the for-loop for stylistic reasons and to be a bit more portable against shitty/subset compilers. I will pull it as-is and then edit it myself, no worries.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Declarations in for loops are provided by C standard, so if compiler does not support the standard it is not a compiler but shit. It is better style to use temporary variables like iterators in scopes. But if you want to keep same style it's ok for me.
The main improvement is reducing time complexity.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The old multiplication is O(n^3). New is only O(n^2).

@mangofromage
Copy link
Author

For me it is ok to merge it and edit in way you like ;)

@kokke
Copy link
Owner

kokke commented Sep 26, 2019

I got away from this PR @kolkil - sorry about that - I took some time off and forgot everything about software development :)

I will try and get it merged and edit whatever small details I want to change.

If I keep forgetting, please remind me again that I want to merge this PR.

@mhaeuser
Copy link

mhaeuser commented Oct 12, 2019

Good day,

First off, I currently use a heavily modified version of this library and hence would ask for someone to try to reproduce this with a "clean" version of the library. However, as the alternative multiplication function does not use any of the other functions, I am fairly certain this is a rare bug in the logic.

I'm not yet sure what pattern triggers it, but I do know this works very well in most cases. However, when trying to integrate this with a custom crypto stack, I started to get weird results and decided to compare this mul algo against the original. The first result is the incorrect result of this faster multiplication algorithm (locally modded, but unmodded seems to cause the same issue), the second result is the correct (verified against OpenSSL BN) solution of the original.

If I find out what causes this, I'll report back, but I'll need to familiarise myself with the algorithmic idea for a bit.

14e8839b4900fcfb5fc4d6dc64753740f59ae0afbd0fe9379bdbee18bd606c57d014e2d44e079dc3ffaabd4226c5d64792c1df2c24d3260567156b6caf278c7355fd299ae6f993952011488cd39c1129862edead758d1f31e6a11221babffab183f0056aaf735d8616972a07a128e79bdd64ef96e8de71bab4a968e40148f017a52350cd1296347c93d5c82f2b36bb72bf1f185622bdf1763a230c064d8a878c4d0058102ff68676d0a5a4f553de75230d4e727d51dd8c1c35d65f9b6feba90dc87a5d8f1f6ec49e156137430647679897c779601b00f97b6075029a32f81c4478f365a7e22482d3db69d6d5d3ede9ef2f648b3f56eb8d9ba57e838ee8f2e457f * 14e8839b4900fcfb5fc4d6dc64753740f59ae0afbd0fe9379bdbee18bd606c57d014e2d44e079dc3ffaabd4226c5d64792c1df2c24d3260567156b6caf278c7355fd299ae6f993952011488cd39c1129862edead758d1f31e6a11221babffab183f0056aaf735d8616972a07a128e79bdd64ef96e8de71bab4a968e40148f017a52350cd1296347c93d5c82f2b36bb72bf1f185622bdf1763a230c064d8a878c4d0058102ff68676d0a5a4f553de75230d4e727d51dd8c1c35d65f9b6feba90dc87a5d8f1f6ec49e156137430647679897c779601b00f97b6075029a32f81c4478f365a7e22482d3db69d6d5d3ede9ef2f648b3f56eb8d9ba57e838ee8f2e457f = fffffffefffffffefffffffcfffffffcfffffffafffffffafffffffafffffff9fffffff8fffffff5fffffff6fffffff3fffffff3fffffff1fffffff2fffffff2fffffff3fffffff3fffffff1fffffff0ffffffedffffffedffffffebffffffeeffffffecffffffeaffffffe9ffffffe7ffffffe5ffffffe6ffffffe6ffffffe3ffffffe5ffffffe1ffffffe6ffffffe2ffffffdfffffffdfffffffddffffffe0ffffffe1ffffffddffffffddffffffdeffffffdaffffffdbffffffdbffffffd5ffffffd6ffffffd5ffffffd9ffffffd4ffffffd2ffffffd4ffffffd2ffffffd2ffffffd5ffffffcfffffffd3ffffffcfffffffd1ffffffceffffffcdffffffd4fb843a8a639af780f2991b71933c8cee17bef9b931f4040732f606756c9525158cd2e892094dde117a6630df12114bec3e082b91f6f7990c4dce50536572b8f5f273e6dd1b1ec6f5daebc86ef8f88ae4b6ad81659254148cc69818bbe89af2942d2acd48da8ae6599a0821c5c7b3eaacc35b1ab4b39e923045527dfe810265d01ac1471bd344b7736d5b65dcb9e6e5c0ecb0613f269bd8906c2e49d59303adce635ad27e3010f4a335c9da561ad0154445275896affc3f882cb2a0f87f4e83c6e9b79a055389d8bf7e1ef1a38e5ca794e568d1a47cb057af41c1a139e83ed83f06cdbdbc7d92a57dc6526bd9d30157bc5449833123f0b0cbb0565ea96adab4b1 vs fffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffb843ab9639af7b1f2991ba0933c8d1e17bef9e431f4043432f6069f6c9525418cd2e8ba094dde397a66310a12114c113e082bb1f6f799304dce50766572b918f273e7001b1ec716daebc88df8f88b06b6ad8185925414aac69818d6e89af2b22d2acd67da8ae6749a0821dfc7b3eac9c35b1acab39e924945527e15810265e91ac14732d344b78a6d5b65f4b9e6e5d2ecb06155269bd8a36c2e49e89303ade1635ad2903010f4b135c9da631ad01552452758a2affc3f942cb2a1037f4e83cfe9b79a0d5389d8c77e1ef1ac8e5ca79be568d1a97cb057b641c1a13de83ed84406cdbdbf7d92a580c6526bdbd30157be5449833223f0b0ccb0565ea96adab4b1

@mangofromage
Copy link
Author

mangofromage commented Oct 12, 2019

Could you show the code that generated this?
Algorithm is explaned in this video https://www.youtube.com/watch?v=zm3tQ_BPgm8

@mhaeuser
Copy link

mhaeuser commented Oct 13, 2019

@kolkil Thanks, I will check it out asap. I'm a bit low on time right now and cannot provide you a sample caller, but you should be able to reproduce it by just parsing above's operands and calling the alternative mul.

Frankly, I still do not understand the algo, but I found it odd there was no overflow check around here: https://github.com/kolkil/tiny-bignum-c/blob/fast_multiplication/bn.c#L309
I added code to increment tmp_to_add on overflow, and I confirmed the new result is closer to the correct solution than before, but it's not quite there yet. If you find another issue (assuming this is correct), please let me know.

EDIT: Yes, what I found was correct, and this can overflow too: https://github.com/kolkil/tiny-bignum-c/blob/fast_multiplication/bn.c#L308
With both fixed, I get the correct result, so the algo seems fine.

I cannot give you a diff as, as I said, my local copy is modded, but for the first overflow it is sufficient to store the additive result separately and compare for being smaller than one of the previous operands (i.e. tmp2 = c->array[i] + (DTYPE)tmp, tmp2 < (DTYPE)tmp). This works because, when you think of the negative two's complement, -1 is MAX_UINTx (x being the width), so a wrapped around result must be smaller than both original operands.

By the way, with Clang and an X64 target, this uses CF, so it should be pretty good performing. This can be applied to the add and sub operations as well to get rid of the 64-bit int dependency. If someone manages that for mul too, the array int size could be raised to 64-bit to increase performance (given the mul solution is not slow).

For the second mentioned overflow, I just raised tmp_to_add to DTYPE_TMP and instead of initialising back to zero, I left-shifted by 32 bits to not discard the high bits. I am not convinced it cannot overflow again given unlucky input, but I will think of something later, or hope you will come up with a new solution (or an argument for why it cannot overflow again).

EDIT2: Looking at the amount of iterations where tmp_to_add can be increased and the high results multiplications can yield, I don't feel like tmp_to_add is a feasible option. It might need a function that propagates additions upwards on the BIGNUM itself

Thanks for your work

@mangofromage
Copy link
Author

I understand the problem, gonna fix it soon, thank you ;)

@kokke
Copy link
Owner

kokke commented Feb 1, 2021

@kolkil I don't mean to be rude at all; have you had a chance to look at this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants