Commit

Represent syscalls as the murmur32 hash of their name
LucasSte committed Jan 23, 2025
1 parent 5d2ea35 commit 16c0bed
Showing 1 changed file with 20 additions and 68 deletions.
88 changes: 20 additions & 68 deletions proposals/0178-static-syscalls.md
@@ -24,7 +24,7 @@ a new `return` instruction to supersede the `exit` instruction.
The resolution of syscalls during ELF loading requires relocating addresses,
which is a performance burden for the validator. Relocations require an entire
copy of the ELF file in memory to either relocate addresses we fetch from the
symbol table or offset addresses to after the start of the virtual machines
symbol table or offset addresses to after the start of the virtual machine's
memory. Moreover, relocations pose security concerns, as they allow the
arbitrary modification of program headers and program sections. A new
separate opcode for syscalls modifies the behavior of the ELF loader, allowing
@@ -45,26 +45,30 @@ specification of SIMD-0161.
We introduce a new instruction in the SBPF instruction set, which we call
`syscall`. It must be associated with all syscalls in the SBPF format. Its
encoding consists of an opcode `0x95` and an immediate, which must refer to a
previously registered syscall. For more reference on the SBF ISA format, see
the
previously registered syscall hash code. For more reference on the SBF ISA
format, see the
[spec document](https://github.com/solana-labs/rbpf/blob/main/doc/bytecode.md).

For simplicity, syscalls must be represented as a natural number greater than
zero, so that they can be organized in a lookup table. This choice allows for
quick retrieval of syscall information from integer indexes. An instruction
`syscall 2` must represent a call to the function registered at position two
in the lookup table.
We define the hash code for a syscall as the murmur32 hash of its respective
name. The 32-bit immediate value of the new `syscall` instruction must be the
integer representation of such a hash. For instance, the code for `abort` is
given by `murmur32("abort")`, so the instruction assembly should look like
`syscall 3069975057`.
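
As an illustration of the hashing rule above, here is a minimal sketch of a 32-bit Murmur hash in Python. It assumes that "murmur32" means the MurmurHash3 x86 32-bit variant with a seed of zero, which the text does not state explicitly; the function name `murmur3_32` is hypothetical and not part of any Solana tooling.

```python
MASK32 = 0xFFFFFFFF


def _rotl32(x: int, r: int) -> int:
    """Rotate a 32-bit value left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK32


def murmur3_32(data: bytes, seed: int = 0) -> int:
    """MurmurHash3 x86 32-bit. Seed 0 is an assumption, not stated in the proposal."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & MASK32
    n = len(data)
    body_len = n - (n % 4)

    # Mix every complete 4-byte little-endian block.
    for i in range(0, body_len, 4):
        k = int.from_bytes(data[i:i + 4], "little")
        k = (k * c1) & MASK32
        k = _rotl32(k, 15)
        k = (k * c2) & MASK32
        h ^= k
        h = _rotl32(h, 13)
        h = (h * 5 + 0xE6546B64) & MASK32

    # Mix the remaining 1 to 3 tail bytes, if any.
    tail = data[body_len:]
    if tail:
        k = int.from_bytes(tail, "little")
        k = (k * c1) & MASK32
        k = _rotl32(k, 15)
        k = (k * c2) & MASK32
        h ^= k

    # Finalization (avalanche) step.
    h ^= n
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16
    return h


if __name__ == "__main__":
    # The proposal quotes 3069975057 (0xb6fc1a11) as the code for `abort`.
    print(murmur3_32(b"abort"))
```

Under these assumptions, the value printed for `abort` can be compared directly against the immediate quoted in the paragraph above.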

Consequently, system calls in the Solana SDK and in any related compiler tools
must be registered as function pointers, whose address is a natural number
greater than zero, representing their position in a syscall lookup table. The
verifier must enforce that the immediate of a syscall instruction points to a
valid syscall, and throw `VerifierError::InvalidSyscall` otherwise.
must be registered as function pointers, whose address is the murmur32 hash of
their name. The bytecode verifier must enforce that the immediate value of a
syscall instruction points to a valid syscall, and throw
`VerifierError::InvalidSyscall` otherwise.
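
The verifier rule above could be sketched roughly as follows. This is not the validator's implementation: it assumes the standard 8-byte eBPF instruction layout (opcode, register byte, 16-bit offset, 32-bit immediate, little-endian), treats the immediate as unsigned, and uses a hypothetical `InvalidSyscall` exception and `REGISTRY` dictionary as stand-ins for `VerifierError::InvalidSyscall` and the registered syscall set.

```python
import struct

SYSCALL_OPCODE = 0x95   # opcode given in the proposal text
INSN_SIZE = 8           # assumed: standard eBPF instruction slot size
INSN_FORMAT = "<BBhI"   # opcode, registers, offset, immediate (unsigned)


class InvalidSyscall(Exception):
    """Hypothetical stand-in for VerifierError::InvalidSyscall."""


def encode_syscall(imm: int) -> bytes:
    """Encode a `syscall imm` instruction with zeroed register and offset fields."""
    return struct.pack(INSN_FORMAT, SYSCALL_OPCODE, 0, 0, imm)


def verify_syscalls(program: bytes, registry: dict) -> None:
    """Reject any syscall instruction whose immediate is not a registered hash."""
    if len(program) % INSN_SIZE != 0:
        raise ValueError("program length must be a multiple of 8 bytes")
    for pc in range(len(program) // INSN_SIZE):
        opcode, _regs, _offset, imm = struct.unpack_from(
            INSN_FORMAT, program, pc * INSN_SIZE
        )
        if opcode == SYSCALL_OPCODE and imm not in registry:
            raise InvalidSyscall(f"pc {pc}: unknown syscall hash {imm:#010x}")


# Hypothetical registry keyed by hash; the value for `abort` is the one
# quoted in the proposal text.
REGISTRY = {3069975057: "abort"}

if __name__ == "__main__":
    verify_syscalls(encode_syscall(3069975057), REGISTRY)  # accepted
    try:
        verify_syscalls(encode_syscall(1), REGISTRY)
    except InvalidSyscall as err:
        print("rejected:", err)
```

Keying the registry by hash keeps the check to a single dictionary lookup per syscall instruction, with no symbol table or relocation involved.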

This new instruction comes together with modifications in the verification
phase. `call imm` (opcode `0x85`) instructions must only refer to internal
calls and its immediate field must only be interpreted as a relative address
to jump from the program counter.
This new instruction comes together with modifications in the semantics of
`call imm` (opcode `0x85`) instructions, which must only refer to internal
calls; their immediate field must only be interpreted as a relative address
to jump from the program counter.
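
To make the relative addressing concrete, here is a small sketch of how a target could be derived from a `call imm` immediate. It assumes the convention used for relative calls in Linux eBPF, where the offset is counted from the instruction following the call (`pc + 1 + imm`) and the immediate is a signed 32-bit value; the proposal text does not spell out the exact formula, and the helper name `call_target` is hypothetical.

```python
def call_target(pc: int, imm: int, instruction_count: int) -> int:
    """Resolve an internal call target, in instruction slots.

    Assumes the target is pc + 1 + imm, with imm a signed 32-bit value.
    """
    if not -(2**31) <= imm < 2**31:
        raise ValueError("immediate does not fit in a signed 32-bit field")
    target = pc + 1 + imm
    if not 0 <= target < instruction_count:
        raise ValueError(f"call at pc {pc} jumps outside the program: {target}")
    return target


if __name__ == "__main__":
    print(call_target(pc=10, imm=-3, instruction_count=20))  # backward call
    print(call_target(pc=0, imm=4, instruction_count=10))    # forward call
```

Because the immediate is only ever a relative offset under the new scheme, no lookup by name or hash is needed to resolve an internal call.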

Syscall names must NOT be present in the symbol table anymore, since the new
scheme does not require symbol relocations and obviates the need for symbols
to be referenced in the table.

### New return instruction

@@ -77,58 +81,6 @@ throw a `VerifierError::UnknowOpCode`. Likewise, if, by any means, a V1
program reaches the execution stage containing the `0x9D` opcode, an
`EbpfError::UnsupportedInstruction` must be raised.

### Syscall numbering convention

Syscalls must be represented by a unique integer to maintain a dense lookup
table data structure for indexing and dispatch. For a clear correlation
between the existing syscalls and their respective identification number,
syscalls must strictly follow the numbering below.

| Syscall name | Number |
|------------------------------------------|----------|
| abort | 1 |
| sol_panic_ | 2 |
| sol_memcpy_ | 3 |
| sol_memmove_ | 4 |
| sol_memset_ | 5 |
| sol_memcmp_ | 6 |
| sol_log_ | 7 |
| sol_log_64_ | 8 |
| sol_log_pubkey | 9 |
| sol_log_compute_units_ | 10 |
| sol_alloc_free_ | 11 |
| sol_invoke_signed_c | 12 |
| sol_invoke_signed_rust | 13 |
| sol_set_return_data | 14 |
| sol_get_return_data | 15 |
| sol_log_data | 16 |
| sol_sha256 | 17 |
| sol_keccak256 | 18 |
| sol_secp256k1_recover | 19 |
| sol_blake3 | 20 |
| sol_poseidon | 21 |
| sol_get_processed_sibling_instruction | 22 |
| sol_get_stack_height | 23 |
| sol_curve_validate_point | 24 |
| sol_curve_group_op | 25 |
| sol_curve_multiscalar_mul | 26 |
| sol_curve_pairing_map | 27 |
| sol_alt_bn128_group_op | 28 |
| sol_alt_bn128_compression | 29 |
| sol_big_mod_exp | 30 |
| sol_remaining_compute_units | 31 |
| sol_create_program_address | 32 |
| sol_try_find_program_address | 33 |
| sol_get_sysvar | 34 |
| sol_get_epoch_stake | 35 |
| sol_get_clock_sysvar | 36 |
| sol_get_epoch_schedule_sysvar | 37 |
| sol_get_last_restart_slot | 38 |
| sol_get_epoch_rewards_sysvar | 39 |
| sol_get_fees_sysvar | 40 |
| sol_get_rent_sysvar | 41 |
|------------------------------------------|----------|

## Alternatives Considered

None.