-
Notifications
You must be signed in to change notification settings - Fork 87
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reorder instructions to delay read after load #118
Comments
The composition of bvp-to-word is complicated and difficult to navigate, but we want to add a few simple optimisation passes that run right after translation into wordLang and before register allocation. The purpose of word_simp is to be an understandable entry point into this part of the compiler. The correctness requirements for word_simp are strict and simple: evaluate (prog,s) = (res,s2) /\ res <> SOME Error ==> evaluate (word_simp$compile_exp prog,s) = (res,s2) extract_labels (word_simp$compile_exp p) = extract_labels p The current implementation of word_simp is the identity function, but will be expanded to include at least: - clean up of wordLang programs: delete Skips and make Seq associate to left so that pattern matching is easier in other phases. I'm working on this. - combine certain consequtive If statements (issue #103) I'm working on this. - move load instructions upwards (issue #118 ?) Someone else might be working on this.
I agree that one should try to insert instructions between memory loads and the first use of the loaded value, i.e.:
However, I don't understand the part about "before conditionals". Looking at the link, I suspect the "before conditionals"-part might be a too literal reading of my example in the email. It's the first use of the loaded value that is important, not that it was part of a conditional expression. |
I updated the issue summary and name accordingly. |
@myreen Ramana and I took a look at this and I think this particular optimization is better done at the asm level. Anything we're missing? |
I think the asm level (I suspect you mean LabLang) is way too late for this optimisation. At LabLang you cannot see the difference between stack accesses and general memory accesses. Furthermore, the register allocator might have unknowingly picked register names that prevent the optimisation. The optimisation should be done before register allocation so that the allocator hasn't produced constraints that prevent reordering. A similar optimisation could be done in StackLang for stack loads. They should be pushed upwards. Note that this also needs to be done at StackLang before stack accesses look like general memory accesses. |
Three comments: This looks like a simple instruction scheduling optimisation to me. The typical idea is that if you do it before register allocation, it might increase live ranges and result in a worse allocation. If you do it after, there are false dependencies that impede the scheduling. I would guess on a modern out-of-order architecture that the hardware's issue buffer will easily do this optimisation for you, especially if you aren't moving the load very far. I wouldn't expect to see any performance gains on an x86 desktop/laptop. Is there a class of ARMs that are pipelined, but not out-of-order? I'd doubt it: on all of their multi-cores, speculative execution is observable. Would an embedded device be pipelined? Why does it matter if you can tell the difference between stack accesses and other accesses to memory? I guess stack accesses are likely to hit in L1 cache, but otherwise they are just normal loads, right? |
Even the simplest ARM processors (M0) have pipelines. I doubt there are processors without pipelines nowadays. I agree that on x86 this optimisation will in all likelihood be done on the fly, but small ARMs are not terrible smart with instruction reordering, as far as I know. The difference between stack and heap accesses: In the semantics of StackLang, it is obvious that you can always move a stack access independently of heap accesses. Once you are in LabLang, all of them become opaque memory accesses and thus reordering might be impossible. That's why it makes sense to do the reordering while they are still proper stack accesses in the semantics. In case @cmr wants to look at something else then here are some suggestions:
|
I have created new issues for the above suggestions, and I encourage you to create more if you have more. It's good to have simple tasks like this for newcomers to get an introduction to CakeML proofs. I will update the website too, to point to them (and we might consider an issue label to mark such issues, both so newbies can find them and so experienced hackers don't mop them up too quickly). The issues are #129, #130, and #131. |
This issue would benefit from some empirical experimentation first, e.g., looking at explorer output. |
Summary: in
wordLang
, moving one or two instructions to be after memory loads before they are first read, if possible, can improve performance on pipelined architectures.More details:
https://lists.cakeml.org/pipermail/users/2016-April/000088.html
I believe @cmr may be looking into this.
The text was updated successfully, but these errors were encountered: