[WIP] Speedup rebase #55
Force-pushed from 1deda81 to ed86b6d.
@twinaphex LMK if you want the #if USE_STRUCTS parts to just be the default and delete the other code paths.
Force-pushed from 2daa618 to d691620.
CI was failing, and I noticed a few merge issues from the rebase and a small logical comparison bug; pushed 2 updates to resolve them.
This pull request fixes 2 alerts when merging d691620 into 4dade20 (view on LGTM.com).
Pushed another commit to remove the older code.
This pull request fixes 2 alerts when merging 1c12013 into 4dade20 (view on LGTM.com).
Bugs were reported after my rebase; looking into it. Changing the PR to a draft for now.
Bit shifting and masking is expensive on ARM64 for some reason. The unions seem to greatly reduce the performance hit of these common calls.
The values for offset0 and offset1 were coming out as 63 when they should be no more than 3. I think the divide should have been a modulus? I wrote out the code with more variables to figure out what was going on.
The GPU_RUNNING macro was pretty slow on ARM for some reason. Bit-field structs are faster in my testing.
Signed-off-by: Joe Mattiello <[email protected]>
Signed-off-by: Joseph Mattello <[email protected]>
Force-pushed from 1c12013 to ab5f584.
This pull request fixes 2 alerts when merging ab5f584 into 390c44d (view on LGTM.com).
Rebase of #53, as per @twinaphex's request.
There are diffs around controller input fixes from #53 that I reverted; these should be looked into in a new PR.
Specifically, these commits:
fb695b1
f4ebb99
681a1f3
34ca42f
Original PR
I'm making a PR for some old optimizations I made for an iOS fork.
This PR is more for the devs to test and cherry-pick; I don't expect this code to be "merge quality", but it should at least be buildable and testable.
For instance, you probably don't want these hacks that were specific to iOS loading:
811e73f
The blitter really is the slowest part. I wanted to rewrite it with SIMD, better vectorization, an improved memcpy, or something similar. It was by far the slowest part when I profiled this pretty heavily a while ago (at least, it was excruciatingly slow on ARM without my C struct hacks to help the compiler optimize better).
Update:
Forgot to mention: I have these macros defined in my build (in Xcode), and they need to be added to a header or to the C compile flags.
C flags:
The inline macro may need to be customized based on architecture and compiler. macOS is using Clang with C99/gnu++11 syntax for inline.
I also have unroll loops turned off (aka -qunroll=no), but I forget why.