-
Notifications
You must be signed in to change notification settings - Fork 337
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add intermediate InstructionDetails struct for instruction processing #2399
feat: add intermediate InstructionDetails struct for instruction processing #2399
Conversation
ah, forgot to cherry-pick tests, convert to draft till tests are imported. |
af05e47
to
b0b1b68
Compare
compute_budget_instruction_details.process_instruction( | ||
i as u8, | ||
program_id, | ||
instruction, | ||
)?; | ||
builtin_instruction_details.process_instruction(program_id, instruction)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find this API very strange. I would prefer if we did something like add a method to SanitizedMessage
called something like try_process_compute_budget_instructions
or use a ComputeBudgetInstructionDetails::try_from_message
constructor which takes the SanitizedMessage
as an argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
continue from above discussion, say if we filtered a list of interested instructions, which might be a mix of builtins, pre-compiles, compute-budgets and perhaps some other types; It's still good to iterate through the list once, pass each instruction for different processors to collect individual info. So we are operates on per-instruction basis, instead of per SanitizedMessage
(or SVMMessage
). Here it loosely implements Visitor pattern, where different instructions are processed by different processors. Might be better to refactor to something like this:
for ix in fitlered_instructions {
ix.accept(&mut compute_budget_instruction_details)?;
ix.accept(&mut builtin_instruction_details)?;
... ...
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah the visitor pattern is fine I guess but I don't like that this interface operates over individual unfiltered instructions and does a hash for each program id as @apfitzge mentioned above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
been doing bench and profiling on iteration and hashing. Will share finding later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a782e33 adds an old-school quick builtins-filter, using the bench from #2498 , it addresses the concern: it no longer wastes time to hash non-builtin program key to check against hashmap.
bench of master
running 5 tests
test bench_process_compute_budget_instructions_builtins ... bench: 16,077 ns/iter (+/- 397)
test bench_process_compute_budget_instructions_compute_budgets ... bench: 31,168 ns/iter (+/- 952)
test bench_process_compute_budget_instructions_empty ... bench: 3,264 ns/iter (+/- 20)
test bench_process_compute_budget_instructions_mixed ... bench: 449,149 ns/iter (+/- 6,138)
test bench_process_compute_budget_instructions_non_builtins ... bench: 16,066 ns/iter (+/- 132)
bench before this commit - almost doubled time for all scenarios
running 5 tests
test bench_process_compute_budget_instructions_builtins ... bench: 42,953 ns/iter (+/- 1,501)
test bench_process_compute_budget_instructions_compute_budgets ... bench: 43,980 ns/iter (+/- 820)
test bench_process_compute_budget_instructions_empty ... bench: 6,138 ns/iter (+/- 176)
test bench_process_compute_budget_instructions_mixed ... bench: 1,353,764 ns/iter (+/- 27,608)
test bench_process_compute_budget_instructions_non_builtins ... bench: 35,780 ns/iter (+/- 1,129)
bench with this commit - non_builtins perf back t o original
running 5 tests
test bench_process_compute_budget_instructions_builtins ... bench: 40,815 ns/iter (+/- 269)
test bench_process_compute_budget_instructions_compute_budgets ... bench: 44,544 ns/iter (+/- 155)
test bench_process_compute_budget_instructions_empty ... bench: 5,770 ns/iter (+/- 49)
test bench_process_compute_budget_instructions_mixed ... bench: 389,491 ns/iter (+/- 5,886)
test bench_process_compute_budget_instructions_non_builtins ... bench: 15,358 ns/iter (+/- 515)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But this still shows the first 2 scenarios as having much worse performance, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea, these two demonstrate the added cost of locating builtin program cost from BUILTIN_INSTRUCTION_COSTS, but to builtins only
let mut compute_budget_instruction_details = ComputeBudgetInstructionDetails::default(); | ||
let mut builtin_instruction_details = BuiltinInstructionDetails::default(); | ||
|
||
for (i, (program_id, instruction)) in instructions.enumerate() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This approach of going instruction by instruction and comparing each program id is not very efficient. We should probably use a deduped program id list to find the compute budget program id, filter relevant instructions for processing, and then build the struct of the tx requested compute budget parameters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an interesting idea to explore. So the idea is to collect as many instruction related information in one iteration, so we don't have to scan tx many times. For the immediate needs, InstructionDetails
currently collects info about compute-budget and builtin instructions. We could filter just builtin instructions first. I am also planning to add pre-compiled info into instructionDetails
(such as num_signatures), that might also be able to fit into same schema. I'd like to think about it more, maybe can do this in follow up PRs? As for this PR, it can continue keep currently behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please don't resolve conversations that aren't resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you try benching this approach as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please don't resolve conversations that aren't resolved
I closed it cause I thought this conversation has merged into above one, there I added filter and bench results. Sound like you have different idea to try here. Can you explain more? By "use a deduped program id list", I am assuming you are referring to compute-budget programs specifically, but we are not just processing compute budget anymore, also want to accumulate all builtin's cost.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the idea is that a program_id_index
may appear multiple times in a single tx. We do not need to resolve it to builtin/compute-budget each time we see that index - only the first time.
pub struct InstructionDetails { | ||
builtin_instruction_details: BuiltinInstructionDetails, | ||
compute_budget_instruction_details: ComputeBudgetInstructionDetails, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think I see the value of combining different "instruction details" into a single struct. Can we keep them separate? I don't see much value in eliminating a loop since the work inside the loop is more expensive than the loop itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kind like this umbrella struct, it defines concept well: giving the list of instructions, it fetches and stores their "details". What "details" are can just be internal knowledge. Information in builtin_instruction_details
and compute_budget_instruction_details
are used together (in num_non_compute_budget_instructions()
), they probably could be flatted into one InstructionDetails
struct (as in my initial version). From call site perspective, it's easier to work with one instructionDetails
vs different "details". I broke them into separate pieces to make changing and unit testing easier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
787326d removed the "inner details" structs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer to keep the inner structs and remove the outer struct
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I tried to say was the "inner details" are not truly independent two set of datas, rather they were artificially separated.
Can you help me to understand why keeping inner structs is better than what it is?
a782e33
to
f15f7e7
Compare
…deterministic ix info from transaction; Add function to sanitize and convert InstructionDetails to ComputeBudgetLimits; Issue solana-labs#2352
Co-authored-by: Andrew Fitzgerald <[email protected]>
… bug that misses counting non-builtins, that broke tests
f15f7e7
to
787326d
Compare
let mut compute_budget_instruction_details = ComputeBudgetInstructionDetails::default(); | ||
let mut builtin_instruction_details = BuiltinInstructionDetails::default(); | ||
|
||
for (i, (program_id, instruction)) in instructions.enumerate() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you try benching this approach as well?
I think this PR should just focus on the refactor and then have another PR handle the builtin instruction stuff. I'm not actually super onboard with #2212 because some people could be counting on builtin instructions giving them 200k cu's. We should move away from the 200k cu per ix logic entirely and make sure everyone specifies a cu limit in my opinion. |
Co-authored-by: Justin Starry <[email protected]>
Co-authored-by: Justin Starry <[email protected]>
Additional bench results for to_bytes
as_ref
|
At the moment, the conversation is on handling builtin stuff. Not sure if breaking pr would make difference.
When builtin is allocated 200K budget, it creates room to execute other instructions that otherwise could fail due budget limit. This inconsistency should be fixed, it could lead to overpacking blocks.
100% agree to remove 200K per ix logic, especially after all builtins now consume budgets just like other instructions; I don't see reason for cost model (banking side code) to treat builtins differently from other instructions. But "make sure everyone specifies cu limit" might take longer. My preference is to do this one first, then update cost-model to access "instructionDetails" for cu limits etc, so cost model and compute budget are on same page. Then later if we want to remove 200K cu logic, we just need to do that in one place. |
It does because typically refactors and new features are not done in the same PR |
It was intended as feat to add
|
Problem
Process instructions (for a transaction) is really 2 steps: parse tx to gather deterministic ixs info; then (optionally) use current working bank to sanitize and convert gather instruction details to something else.
An intermediate struct between these two steps can be useful.
Summary of Changes
InstructionDetails
structprocess_compute_budget_instruction()
into two steps:try_from()
to builtResult<InstructionDetails>
, andsanitize_and_convert_to_compute_budget_limits()
to getResult<ComputeBudgetLimits>
.process_compute_budget_instructions()
as a wrapper for backward compatibility, temporarily (it will be refactored out in follow up changes)Fixes #2352