-
-
Notifications
You must be signed in to change notification settings - Fork 164
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize pg_Two(Ints/Floats)FromObj #3214
Optimize pg_Two(Ints/Floats)FromObj #3214
Conversation
pg_IntFromObjIndex doesn't take advantage of the fact that pg_TwoIntsFromObj already knows that obj is a sequence. Since obj is known to be a sequence, and negative indices are not needed, PySequence_ITEM can be used instead of PySequence_GetItem for better performance.
You didn't full channel though xD, ig this could've been expanded to be more in line with pg_TwoDoublesFromObj with the whole sequencefast treatment. |
Is sequencefast actually worth it for a two element sequence? |
I've seen 5-10% improvements. Fastcall methods that don't do much benefit the most out of the bunch. This might even speed up fblits or stuff that takes lots of positions since one could also pass only tuples as positions. |
a4d97a0
to
5cd46ea
Compare
2f5e2dd
to
480a472
Compare
Alright you've pushed me to be better, so I've turned this simple speedup PR into a better (but way more complicated) speedup PR. I got a lot of inspiration from pg_TwoDoublesFromObj, but I haven't copied that strategy exactly here for now, and I don't plan to in this PR. I've updated the PR description with my findings. |
Happy to report that this strategy is actually faster than the FastSequence strategy lol. |
I’m surprised that you say that, I thought the sequence fast way was still faster. I just thought this way was less complex and slightly safer code (like for example in the Python source it checks for List and Tuple exactly, and we just check for list and tuple non exactly). I’ve also been thinking about how we could optimize this in Pythons C API. Like would it be possible to do a function that returns elements and size and checks whether it’s a sequence? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PR changes run faster for me in testing under Python 3.11. Looks about 5% faster
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for the PR 🎉
This PR takes advantage of several avenues to increase the performance of pg_TwoIntsFromObj, pg_TwoFloatsFromObj, pg_IntFromObj, and pg_FloatFromObj in base.c
Approach
Taking advantage of sequence knowledge better with PySequence_ITEM:
On main, the code for pg_TwoIntsFromObj, pg_TwoFloatsFromObj doesn't take advantage of the fact that it knows
obj
is a sequence, because it passes obj into generic methods (like pg_IntFromObjIndex) that have to check again thatobj
is a sequence. When we know thatobj
is a sequence we can use a faster way of getting elements out of it, PySequence_ITEM.Faster number parsing with PyFloat_AS_DOUBLE:
If we know something is a Python float (a double), we can get the underlying double easily and with no error checking needed this way. It can then be cast to in C to whatever we want. This is the fastest way of getting a number that I tested. This is a more performant strategy than using PyFloat_AsDouble for this.
Don't do a redundant length check:
On main, if a tuple is passed in the code will use PyTuple_Size to figure out if it needs to go down the recursive codepath, then PySequence_Length to see whether it belongs at all. Instead we can use PySequence_Size (same as Length, just seems to be the preferred alias so I changed it) once at the beginning to inform those 2 other checks.
Direct Testing
I saw an average 34% improvement for pg_TwoIntsFromObj, 40% improvement for pg_TwoIntsFromObj, 17% improvement for pg_IntFromObj, and 42% improvement for pg_FloatFromObj
To test this I turned surface.get_at() into a loop that would call whatever I wanted to test with the get_at position argument 10 million times. This gave me a reliable way to test the performance of these helper functions directly and consistently.
Test script
Here is my data:
https://docs.google.com/spreadsheets/d/1WBCVvzkL9HAZJ7Yo1N86-tAFhAP0d2J72Wcp8mveCl4/edit?usp=sharing
Indirect Testing
This speedup is significant enough to be seen in actual API that uses these functions internally. Here are some random functions I tested.
This is a micro optimization, but I think it is a worthwhile thing to really scrutinize, as these functions are used all over the place. (Channeling my inner @itzpr3d4t0r on this PR)