SIMD-0178: SBPF Static Syscalls #178

LucasSte · 2024-10-03T16:57:00Z

No description provided.

proposals/0176-static-syscalls.md

proposals/0178-static-syscalls.md

buffalojoec

Looks great to me! I think we just need to land on which SBPF version this goes into?

topointon-jump

This is awesome! Removing relocations is a big win. My only comment is the opcode change - this feels a bit unnecessary. But it is not a deal-breaker.

topointon-jump · 2024-10-31T22:15:26Z

proposals/0178-static-syscalls.md

+The opcode `0x9D` must represent the return instruction, which supersedes the 
+`exit` instruction. The opcode (opcode `0x95`), previously assigned to the 
+`exit` instruction, must now be interpreted as the new syscall instruction.


What is the motivation behind changing this?

Also, changing the name from exit to return when it is the same instruction could be confusing. I have already seen this confused in other SIMDs.

Side note - we should bundle large sets of proposed ISA changes together into the same SBPF version upgrade, so that clients don't have to support a mishmash of ISAs based on feature flags. I believe this is the intent of #161, but just reiterating 🙏

The motivation is that `exit` occupied the slot in the control-flow instruction class reserved for instructions with immediate values, yet it does not take an immediate value. The new syscall opcode does take one, so it took that slot.
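To make the slot argument concrete, here is a minimal sketch that splits the three opcodes discussed in this thread into their fields, assuming the standard eBPF opcode layout (class in the low three bits, source bit `0x08`, operation in the high four bits). The helper name `decode_opcode` and the field labels are illustrative, not taken from the SIMD.

```python
# Standard eBPF opcode layout: bits 0-2 = class, bit 3 = source, bits 4-7 = operation.
BPF_JMP = 0x05  # control-flow instruction class
BPF_K = 0x00    # source bit clear: the instruction uses its immediate field
BPF_X = 0x08    # source bit set: the non-immediate slot


def decode_opcode(opcode):
    """Split an opcode byte into class, source slot, and operation (hypothetical helper)."""
    return {
        "class": opcode & 0x07,
        "source": "imm (K)" if (opcode & 0x08) == BPF_K else "non-imm (X)",
        "operation": opcode & 0xF0,
    }


for name, opcode in [("call imm", 0x85), ("syscall (formerly exit)", 0x95), ("return", 0x9D)]:
    fields = decode_opcode(opcode)
    print(f"{name:24} 0x{opcode:02X} class=0x{fields['class']:02X} "
          f"op=0x{fields['operation']:02X} slot={fields['source']}")
```

Under this layout `0x95` and `0x9D` share the same class and operation bits and differ only in the source bit, which is why the instruction that carries an immediate lands on `0x95`.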

topointon-jump · 2024-11-01T18:54:18Z

proposals/0178-static-syscalls.md

+## Detailed Design
+
+The following must go into effect if and only if a program indicates the SBPF 
+version XX or higher in its ELF header e_flags field, according to the 


Should we specify which version XX is?

topointon-jump · 2024-11-01T18:56:38Z

proposals/0178-static-syscalls.md

+The resolution of syscalls during ELF loading requires relocating addresses, 
+which is a performance burden for the validator. Relocations require an entire 
+copy of the ELF file in memory to either relocate addresses we fetch from the 
+symbol table or offset addresses to after the start of the virtual machine’s 
+memory. Moreover, relocations pose security concerns, as they allow the 
+arbitrary modification of program headers and programs sections. A new 
+separate opcode for syscalls modifies the behavior of the ELF loader, allowing 
+us to resolve syscalls without relocations.


ravyu-jump · 2024-11-04T19:41:01Z

proposals/0178-static-syscalls.md

+phase. `call imm` (opcode `0x85`) instructions must only refer to internal 
+calls and its immediate field must only be interpreted as a relative address 
+to jump from the program counter. 


Does this mean there is no longer a need to hash the immediates?

It does and is the intention.

deanmlittle · 2025-01-18T07:56:02Z

proposals/0178-static-syscalls.md

+program reaches the execution stage containing the `0x9D` opcode, an 
+`EbpfError::UnsupportedInstruction` must be raised.
+
+### Syscall numbering convention


An integer index is not a great idea. SVM will continue to diverge more and more as non-mainnet SVM chains and L2s develop. If they wish to push their own syscalls, not only will they need to write their own tooling to handle it, but if Solana mainnet adds another syscall that clashes and that same binary is shipped to another chain, or vice versa, people could lose money. It would make far more sense to use the Murmur3 hash. If they decide to launch a hash collision, at least we tried to stop them.

Murmur3 is what the original implementation did, and I was against switching to indexes, but then I got busy on other stuff (I just noticed that master was switched to indexes).

Is the best argument for indexes that lookup is faster? And if that's the argument, isn't it moot considering that we JIT?

Hey @deanmlittle,
Thanks for your input. I didn't follow why teams would have to develop their own tooling. On the Solana SDK, changing from consecutive integers to a murmur hash is simply assigning another constant to the function pointers. The compiler toolchain can deal with both cases without changes.

On the other hand, I partially agree that using consecutive numbers may hinder the development of SVM chains. In Agave, we considered that using a contiguous array would add too much complexity to the code just to handle inactive or deprecated syscalls, so we are still using a BTree, as there are only 40 syscalls to look up.

Agave's implementation does not prevent SVM external users from calculating the murmur32 hash for their own syscalls, as any 32-bit integer can be used for indexing. The numbering convention they use does not need to match ours, provided that the numbers don't coincide.
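For readers who want to see what "calculating the murmur32 hash" of a syscall name involves, here is a minimal sketch of the standard 32-bit MurmurHash3 applied to a name. The seed of 0 and hashing the raw ASCII bytes of the name are assumptions made for illustration, not something this thread specifies, and `my_chain_custom_syscall` is a made-up name.

```python
MASK32 = 0xFFFFFFFF


def _rotl32(x, r):
    """Rotate a 32-bit value left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK32


def _mix_block(k):
    """Scramble one input word (up to 4 bytes) before it is folded into the state."""
    k = (k * 0xCC9E2D51) & MASK32
    k = _rotl32(k, 15)
    return (k * 0x1B873593) & MASK32


def murmur3_32(data, seed=0):
    """Standard MurmurHash3 x86 32-bit over a bytes object."""
    h = seed & MASK32
    full = len(data) // 4
    for i in range(full):
        k = int.from_bytes(data[4 * i:4 * i + 4], "little")
        h ^= _mix_block(k)
        h = _rotl32(h, 13)
        h = (h * 5 + 0xE6546B64) & MASK32
    tail = data[4 * full:]
    if tail:
        # The remaining 1-3 bytes are mixed in without the rotate/multiply-add step.
        h ^= _mix_block(int.from_bytes(tail, "little"))
    # Finalization: fold in the length and avalanche the bits.
    h ^= len(data)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16
    return h


# Assumption: names are hashed as raw bytes with seed 0.
for name in (b"sol_log_", b"my_chain_custom_syscall"):
    print(name.decode(), hex(murmur3_32(name)))
```

Any chain can derive an identifier for its own syscall this way without coordinating with anyone else, which is the compatibility property argued for above.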

Using consecutive numbers was a request from Firedancer. I believe either @topointon-jump, @ripatel-fd or @0x0ece can elaborate more on the reasons.

@LucasSte The main argument is to optimize for bytecode decode efficiency. Interpreting a syscall instruction with a hash requires at least two memory accesses, having an index requires only one. That's a significant cost saving for an instruction that may be executed up to 1 billion times per second in the future.
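The structural difference between the two schemes can be sketched as follows. Python hides real memory accesses, so this only illustrates the extra indirection, not its cost. All handler names and hash values below are made up.

```python
# Hypothetical handlers standing in for real syscall implementations.
def handle_log():
    return "log"


def handle_memcpy():
    return "memcpy"


HANDLERS = [handle_log, handle_memcpy]

# Hash-keyed dispatch: the immediate is a 32-bit hash, so the runtime first
# resolves hash -> slot (lookup 1) and then loads the handler (lookup 2).
HASH_TO_SLOT = {0x1111AAAA: 0, 0x2222BBBB: 1}  # made-up hash values


def dispatch_by_hash(imm):
    slot = HASH_TO_SLOT[imm]   # first lookup
    return HANDLERS[slot]()    # second lookup


# Index dispatch: the immediate is already the slot, so one lookup suffices.
def dispatch_by_index(imm):
    return HANDLERS[imm]()     # single lookup


print(dispatch_by_hash(0x2222BBBB), dispatch_by_index(1))
```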

@deanmlittle There's no need to lose ABI security protections or break cross-SVM program compatibility. This SIMD makes no mention of removing the symbol table. Currently, the dynamic symbol table of each program maps syscall names to the syscall hash. It would make sense to redefine `st_value` to carry the new syscall ID in this SIMD.

Then, the ELF loader can trivially reject programs that have an unknown syscall name or mismatching ID. And the bytecode verifier should reject syscall invocations that weren't verified via the symbol table. ABIv2 proposed previously by Anza similarly moves checks to bytecode verification from later stages, so this wouldn't be out of line.
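A rough sketch of the two checks described above, under heavy assumptions: the symbol table is modelled as a plain name-to-`st_value` mapping, and the registry IDs, function names, and error type are all hypothetical rather than anything defined by the SIMD or an existing loader.

```python
# Hypothetical registry of syscall names and their assigned IDs.
REGISTRY = {"sol_log_": 1, "sol_memcpy_": 2}


class LoadError(Exception):
    """Raised when a program fails the loader or verifier check."""


def load_symbols(symbols):
    """Loader step: check every (name, st_value) pair against the registry.

    Returns the set of IDs that later syscall instructions may use.
    """
    verified = set()
    for name, st_value in symbols.items():
        expected = REGISTRY.get(name)
        if expected is None:
            raise LoadError(f"unknown syscall name: {name}")
        if expected != st_value:
            raise LoadError(f"ID mismatch for {name}: got {st_value}, expected {expected}")
        verified.add(st_value)
    return verified


def verify_syscall_imm(imm, verified):
    """Verifier step: a syscall immediate must have been checked via the symbol table."""
    if imm not in verified:
        raise LoadError(f"syscall {imm} was not declared in the symbol table")


verified_ids = load_symbols({"sol_log_": 1})
verify_syscall_imm(1, verified_ids)  # accepted: declared and matching
```

A binary built against a different chain's numbering would fail in `load_symbols` rather than silently invoking the wrong syscall.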

The other concern you brought up is compatibility. A public GH repo listing IDs and their users is a common way to solve the enum problem. (Examples of other projects doing this: https://github.com/multiformats/multicodec/blob/master/table.csv https://www.iana.org/assignments/tls-extensiontype-values/tls-extensiontype-values.xhtml)

@LucasSte Could you clarify in the SIMD whether static syscalls are verified during ELF loading or bytecode verification? If we won't have these measures in place to support enums, I agree with Dean that we should keep the hash instead.

> Is the best argument for indexes that lookup is faster? And if that's the argument, isn't it moot considering that we JIT?

@alessandrod There is a case for allowing zero-copy execution out of a bytecode buffer (i.e. interpreter). With direct mapping, we've seen that the average per-instruction overhead for mainnet executions is so high that a JIT barely outperforms Firedancer's interpreter even when the compiled program is in program cache. Bytecode translation is more susceptible to DoS due to the high cost of allocating memory and JIT compiling when spam invoking cold programs. FWIW, we are beginning mainnet testing of the full Firedancer client too, which is interpreter-only.

> @alessandrod There is a case for allowing zero-copy execution out of a bytecode buffer (i.e. interpreter).

You don't have to fix up in place, right?

I understand your perspective. From the anza POV tho, there would be no overhead in having hashes since we always JIT. From the devs POV, I think we can agree that having SVM compatibility is desirable if it doesn't come at a huge cost?

So is it worth making a protocol compromise to accommodate fd's current implementation?

I don't have a hard preference, so I'm fine with reverting the spec to murmur32. @ripatel-fd and @Lichtso, are you OK with it?

Out of curiosity, who are these alternative SVMs, L2s and syscalls?
(And by the way, there's no requirement for the indexes to be contiguous; an L2 could, for example, start from 50 to avoid collisions. Or they could also add the syscall to the L1 SVM, since this is Solana and not Ethereum...)

To name a few: Soon, SonicSVM, Atlas, Eclipse, and redacted. There are many use cases not well-served by mainnet-beta due to protocol development grinding to a halt for the past 2 years for obvious reasons. People are starting to innovate elsewhere, and that's actually a very good thing, as if any of these experiments gain adoption or unlock new use cases, we have the ability to adopt them later after being tested in the wild. Making an absolute mess of the enum by padding the indices may sound like a great idea today, but the 20 different versions of the enum depending upon the chain and the inevitable 50th mainnet syscall breaking everything will eventually prove otherwise. If cross-SVM compatibility is of any importance, the Murmur hash is a much better solution for avoiding collisions.

> protocol development grinding to a halt for the past 2 years for obvious reasons.

@deanmlittle This is a joke, right? 😂 Firedancer implemented 140 feature gates in the last 24 months (82 features in Jun 2023, 220 or so today). That's about 6 feature gates per month. Solana has kept up an impressive development pace despite the network-wide focus on security after the outages.

> Making an absolute mess of the enum by padding the indices may sound like a great idea today, but the 20 different versions of the enum depending upon the chain and the inevitable 50th mainnet syscall breaking everything will eventually prove otherwise.

This problem is simply not as dramatic as you make it out to be. IPv4 addresses have the same problem space (32 bits), and there are about 1 million prefixes serving far more people today than the syscall table would ever have to.

I am also unsure of the necessity of the symbol table if we went with this design?

Probably unnecessary. With ~10K syscalls, the probability of at least one hash collision is 1%.
With ~80K it's 50%. I think it's fine to assume that there will never be more than 10K syscalls.

So while much less scalable than a fixed mapping, it should still be enough.
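The collision figures quoted above follow from the usual birthday-bound approximation for n names hashed uniformly into a 32-bit space, p ≈ 1 − exp(−n(n−1) / (2·2³²)). A quick sketch to reproduce them, where the function name is illustrative and uniform hashing is an idealizing assumption:

```python
import math


def collision_probability(n, bits=32):
    """Approximate chance that at least two of n uniformly hashed names collide."""
    space = 2 ** bits
    # expm1 keeps precision when the exponent is small.
    return -math.expm1(-n * (n - 1) / (2 * space))


for n in (40, 10_000, 80_000):
    print(f"{n:>6} syscalls: {collision_probability(n):.4%}")
```

With these assumptions the result is roughly 1.2% at 10K names and roughly 52.5% at 80K, in line with the ~1% and ~50% figures above.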

Thanks for all the feedback. I'll revert the syscalls code back to murmur32 hash. I'll update the SIMD text shortly.

LucasSte changed the title from "SIMD-XXXX: SBPF Static Syscalls" to "SIMD-0178: SBPF Static Syscalls" on Oct 3, 2024

github-actions bot mentioned this pull request Oct 7, 2024

Upstream Updates - Mon Oct 7 00:14:36 UTC 2024 smartcontractkit/chainlink-solana#880

Closed

buffalojoec reviewed Oct 7, 2024

proposals/0176-static-syscalls.md (outdated)

proposals/0176-static-syscalls.md (outdated)

LucasSte marked this pull request as ready for review October 7, 2024 13:20

This was referenced Oct 8, 2024

Introduce ebpf::RETURN instruction solana-labs/rbpf#607

Closed

Include new syscall instruction in the (dis)assembler solana-labs/rbpf#611

Merged

buffalojoec reviewed Oct 17, 2024

proposals/0178-static-syscalls.md (outdated)

buffalojoec previously approved these changes Oct 21, 2024

LucasSte dismissed buffalojoec’s stale review via 9589861 October 22, 2024 21:29

This was referenced Oct 23, 2024

Add new syscall instruction to the verifier solana-labs/rbpf#620

Merged

Add new syscall instruction to interpreter and jitter solana-labs/rbpf#621

Merged

topointon-jump reviewed Nov 1, 2024

ravyu-jump reviewed Nov 4, 2024

0x0ece reviewed Dec 9, 2024

proposals/0178-static-syscalls.md (outdated)

Lichtso reviewed Dec 16, 2024

proposals/0178-static-syscalls.md (outdated)

LucasSte mentioned this pull request Dec 16, 2024

[SOL] Implement return instruction anza-xyz/llvm-project#120

Merged

0x0ece approved these changes Dec 17, 2024

0x0ece mentioned this pull request Dec 17, 2024

simd-0178: static syscalls firedancer-io/firedancer#3711

Merged

deanmlittle reviewed Jan 18, 2025

LucasSte added 7 commits January 23, 2025 18:05

SBPF Static Syscalls

1541363

Bump SIMD number

13b36f5

Fix typo

8a3932a

Rename exit instruction to return

4d78d7d

Add syscall numbering table

1732281

Update status

df95922

Update table

13ce52d

LucasSte added 4 commits January 23, 2025 18:05

Update error message

ca19222

Fix typo in sol_log syscall

0c1dc23

Fix syscall name and header flag

5d2ea35

Represent syscalls as the murmur32 hash of ther name

16c0bed

LucasSte force-pushed the static-syscalls branch from 55b3fb5 to 16c0bed Compare January 23, 2025 21:05

This was referenced Jan 23, 2025

[SOL] Revert "Update codes for static syscalls (#24)" anza-xyz/compiler-builtins#28

Merged

[SOL] Revert "Update syscalls num (#97)" anza-xyz/rust#100

Merged

Revert "Update syscalls code in definitions (#4196)" anza-xyz/agave#4669

Merged

LucasSte commented Oct 3, 2024

buffalojoec left a comment

topointon-jump left a comment (edited)

topointon-jump Oct 31, 2024 (edited)

topointon-jump Nov 1, 2024 (edited)

topointon-jump Nov 3, 2024 (edited)

Lichtso Nov 4, 2024

topointon-jump Nov 1, 2024

topointon-jump Nov 1, 2024

ravyu-jump Nov 4, 2024

Lichtso Nov 4, 2024

deanmlittle Jan 18, 2025

alessandrod Jan 18, 2025

LucasSte Jan 18, 2025 (edited)

ripatel-fd Jan 19, 2025

alessandrod Jan 20, 2025

LucasSte Jan 22, 2025

0x0ece Jan 22, 2025

deanmlittle Jan 23, 2025 (edited)

ripatel-fd Jan 23, 2025

LucasSte Jan 23, 2025
