Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Speedup rebase rebase #55

Closed
wants to merge 23 commits into from

Conversation

JoeMatt
Copy link
Collaborator

@JoeMatt JoeMatt commented Oct 8, 2021

Rebase of #53 as per @twinaphex requests.

There are diff's around controller input fixes that I reverted from #53 that should be looked into in a new PR.

specifically these commits

fb695b1
f4ebb99
681a1f3
34ca42f

Original PR


I'm making a PR for some old optimizations I made for an iOS fork.

This PR is more for the devs to test and cherry-pick, I don't expect this code to be "merge quality", but certainly buildable/testable at least.

For instance, probably don't want these hacks that were specific to iOS loading,
811e73f

The blitter really is the slowest part. I wanted to write something with SIMD or better vectorization or improved memcpy or something. That was by far the slowest part when I profiled this pretty heavily a while ago (at least it was excruciatingly slow on ARM without my C struct hacks to help the compiler optimize better.


Update:
Forgot to mention I have these macros defined in my build (in XCode) that need to be added to a header or C compile flags

C

#define INLINE=inline 
#define USE_STRUCTS=1

flags

-DINLINE=inline -DUSE_STRUCTS=1 -D__GCCUNIX__  

Inline macro may need to be customized based on arch. Mac OS is using Clang/C99/gnu++11 syntax for inline.

I also have unroll loops aka -qunroll=no but i forget why.

@JoeMatt JoeMatt mentioned this pull request Oct 8, 2021
@JoeMatt JoeMatt force-pushed the speedup-rebase branch 2 times, most recently from 1deda81 to ed86b6d Compare October 8, 2021 05:20
@JoeMatt
Copy link
Collaborator Author

JoeMatt commented Oct 8, 2021

@twinaphex LMK if you want the #if USE_STRUCTS parts to just be the default and delete the other code paths.

@JoeMatt
Copy link
Collaborator Author

JoeMatt commented Oct 8, 2021

CI was failing and I noticed a few rebase merge issues and a small logical compare bug, pushed 2 updates to resolve.

@lgtm-com
Copy link

lgtm-com bot commented Oct 8, 2021

This pull request fixes 2 alerts when merging d691620 into 4dade20 - view on LGTM.com

fixed alerts:

  • 2 for Comparison result is always the same

@JoeMatt
Copy link
Collaborator Author

JoeMatt commented Oct 8, 2021

#define USE_STRUCTS=1
No longer needed.

Pushed another commit to remove the older code.

@lgtm-com
Copy link

lgtm-com bot commented Oct 8, 2021

This pull request fixes 2 alerts when merging 1c12013 into 4dade20 - view on LGTM.com

fixed alerts:

  • 2 for Comparison result is always the same

@JoeMatt JoeMatt marked this pull request as draft October 8, 2021 07:28
@JoeMatt
Copy link
Collaborator Author

JoeMatt commented Oct 8, 2021

Report of bugs after my rebase, looking into it. Changing PR to a draft for now.

@JoeMatt JoeMatt changed the title Speedup rebase rebase [WIP] Speedup rebase rebase Oct 13, 2021
The bit shifting and masking is expensive on ARM64 for some reason.
The unions seem to greatly reduce the perfomance hit of these common calls.
The values for offset0 and offset1 were coming out to 63 when they should be no more than 3. I think the devide should have beena modulus? I wrote out the code with more vars to figure ouit what was going on
GPU_RUNNING running macro was pretty slow on ARM for some reason. Bitswise structs are faster in my testing
Signed-off-by: Joe Mattiello <[email protected]>
Signed-off-by: Joe Mattiello <[email protected]>
Signed-off-by: Joe Mattiello <[email protected]>
Signed-off-by: Joseph Mattello <[email protected]>
Signed-off-by: Joseph Mattello <[email protected]>
Signed-off-by: Joseph Mattello <[email protected]>
@lgtm-com
Copy link

lgtm-com bot commented Oct 13, 2021

This pull request fixes 2 alerts when merging ab5f584 into 390c44d - view on LGTM.com

fixed alerts:

  • 2 for Comparison result is always the same

@JoeMatt JoeMatt mentioned this pull request Oct 13, 2021
@JoeMatt
Copy link
Collaborator Author

JoeMatt commented Oct 14, 2021

closing for individual PRs
#56 #57 #58 #59

@JoeMatt JoeMatt closed this Oct 14, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant