
Upgrade @oslojs/oauth2 dependency #108

Merged: 5 commits merged into main, Aug 6, 2024
Conversation

sergiodxa (Owner) commented Aug 6, 2024:

Upgrade the docs to use the latest version of Oslo, and copy parts of old versions of Oslo into this library to keep compatibility while using the new version of Oslo.

@sergiodxa sergiodxa added the dependencies Pull requests that update a dependency file label Aug 6, 2024
@sergiodxa sergiodxa self-assigned this Aug 6, 2024
@sergiodxa sergiodxa marked this pull request as ready for review August 6, 2024 05:44
@sergiodxa sergiodxa merged commit 368e877 into main Aug 6, 2024
5 checks passed
@sergiodxa sergiodxa deleted the upgrade-oslo branch August 6, 2024 05:45
runjak (Contributor) left a comment:

Thanks for the work here. I like this better and think it's more forward-facing than my attempt in #107.

I'm mainly wondering about the usages of oslo for hashing and encoding.

For hashing, I'm under the impression that SubtleCrypto may be a better fit, as it should deliver better performance by virtue of being native code.

For encoding, my understanding is that depending on Buffer, as provided by Node and compatible runtimes, could be beneficial (again for performance) because it should also be native code.

Using Buffer would narrow the supported runtimes to ones that have it: Node and Bun do, but others may not.

Performance aside, I think it also makes sense to depend on native code where possible, to reduce dependencies.

I'll try to come up with some measurements to supplement my performance suspicions.

}

public setS256CodeChallenge(codeVerifier: string): void {
const codeChallengeBytes = sha256(new TextEncoder().encode(codeVerifier));
runjak (Contributor) commented:

I'm wondering whether this line could instead use the SubtleCrypto digest method. It would allow for the execution of native, possibly more optimised code.

runjak (Contributor) commented:

Hm - this surprises me:

I've made a fork of oslojs/crypto to check the compatibility and compare the performance using vitest bench.

To my surprise it appears that the Oslo implementation of sha256 may be faster than the SubtleCrypto one:
(screenshot: vitest bench results, 2024-08-06 11:37)

It makes me wonder whether this is because SubtleCrypto is async, or something similar; I may look into it further, but at least initially this is contrary to my intuition and claim.

runjak (Contributor) commented:

Hm - maybe a weakness of my benchmark setup was that it only ever hashed 5 bytes.

The results become different when I modify the code to look like this:

import { bench, expect } from "vitest";
import { sha256 } from "./sha256.js";

const iterations = 10_000;

const byteCount = 4096;

bench(
	"SHA256: Oslo",
	() => {
		const randomValues = crypto.getRandomValues(new Uint8Array(byteCount));
		const hash = sha256(randomValues);

		expect(hash.length).toBe(32);
	},
	{ iterations }
);

bench(
	"SHA256: SubtleCrypto",
	async () => {
		const randomValues = crypto.getRandomValues(new Uint8Array(byteCount));
		const hash = await crypto.subtle.digest("SHA-256", randomValues);

		expect(hash.byteLength).toBe(32);
	},
	{ iterations }
);

(screenshot: vitest bench results, 2024-08-06 15:02)

So there's certainly a question of input size at play here. At the same time, I see a chance for remix-auth-oauth2 to have one fewer dependency.

runjak (Contributor) commented:

Looking into Node, it seems crypto.subtle.digest is provided by digest in lib/internal/crypto/webcrypto.js, which schedules a HashJob defined in src/crypto/crypto_hash.h.

It makes sense to me that this would be faster than oslojs, but I see a possible question as to the input sizes we're expecting, too. Given that Generator.codeVerifier only provides 32 bytes of randomness, we may generally be dealing with shorter byte sequences.

For 32 bytes of input to the hash function, I've still got Oslo ahead in hash timings. Generally, hashing seems to be so fast here that it should barely make a difference, and it shouldn't matter much whether it's done sync or async.

runjak (Contributor) commented:

I've made a suggestion to drop @oslojs/crypto here: #109


public setS256CodeChallenge(codeVerifier: string): void {
const codeChallengeBytes = sha256(new TextEncoder().encode(codeVerifier));
const codeChallenge = encodeBase64urlNoPadding(codeChallengeBytes);
runjak (Contributor) commented Aug 6, 2024:

I think Buffer.from(codeChallengeBytes).toString("base64url") could do the same, with the docs saying that it also adds no padding. It, too, could be a chance for native code over hand-written encoding.

My understanding is that remix-auth-oauth2 may shy away from Buffer here for interoperability with other runtimes? If I'm reading right, Bun also has Buffer around, but Deno seems a bit different.
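For illustration, a quick sketch of that equivalence on Node or Bun (byte values chosen so the URL-safe characters actually show up):

```typescript
// Buffer's "base64url" encoding (Node and Bun) uses the URL-safe alphabet
// ('-' and '_' instead of '+' and '/') and emits no '=' padding.
const bytes = new Uint8Array([251, 255, 190]);
const encoded = Buffer.from(bytes).toString("base64url");
// → "-_--" (standard base64 would be "+/++")

const short = Buffer.from([0, 0]).toString("base64url");
// → "AAA": three chars, no '=' padding appended
```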

runjak (Contributor) commented:

I've made a fork of oslojs/encoding to check the compatibility between encodeBase64urlNoPadding and using Buffer directly and to compare the performance using vitest bench.

Executing this locally with both Node 20.7.0 and Bun 1.1.21 gave these results:
(screenshot: vitest bench results, 2024-08-06 14:24)

It seems that using Buffer could indeed be faster, or at least of similar speed. At the same time, this could allow removing the dependency on @oslojs/encoding.

runjak (Contributor) commented:

The effect here becomes even more pronounced when I increase the string sizes:

import { bench, expect } from "vitest";
import { encodeBase64urlNoPadding } from "./base64.js";

const iterations = 1_000;

const from = 4_000;
const to = from + 100;

bench(
	"encodeBase64urlNoPadding: Oslo",
	() => {
		for (let i = from; i <= to; i++) {
			const bytes = new Uint8Array(i);
			crypto.getRandomValues(bytes);

			const oslo = encodeBase64urlNoPadding(bytes);

			expect(oslo).not.toEqual("");
		}
	},
	{ iterations }
);

bench(
	"encodeBase64urlNoPadding: Node",
	() => {
		for (let i = from; i <= to; i++) {
			const bytes = new Uint8Array(i);
			crypto.getRandomValues(bytes);

			const node = Buffer.from(bytes).toString("base64url");

			expect(node).not.toEqual("");
		}
	},
	{ iterations }
);

(screenshot: vitest bench results, 2024-08-06 14:56)

I've looked into Node's code a bit and found that the Buffer encoding does indeed end up in native code; from what I could gather, it leads to encode_base64 in deps/simdutf/simdutf.cpp, which makes heavy use of SIMD instructions.

runjak (Contributor) commented:

It seems to me that Remix itself makes use of Buffer in some places. An example would be packages/remix-node/sessions/fileStorage.ts, where createFileSessionStorage uses Buffer.from.

Personally, I'd be a fan of ditching the @oslojs/encoding dependency in favour of more Node- and Bun-specific APIs, but I feel this would be your call to make, @sergiodxa :)

runjak (Contributor) commented:

Related suggestion: #110

sergiodxa (Owner, Author) commented Aug 6, 2024:

Buffer doesn't exist on Cloudflare without node_compat; in general, Cloudflare is the main reason to avoid these things, as it has a more limited set of runtime APIs.

Remix's @remix-run/node package is only for Node, in @remix-run/cloudflare they don't use Buffer.

runjak (Contributor) commented:

I see - this wasn't on my radar - thanks 🙂
