Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize pg_Two(Ints/Floats)FromObj #3214

Merged

Conversation

Starbuck5
Copy link
Member

@Starbuck5 Starbuck5 commented Nov 4, 2024

This PR takes advantage of several avenues to increase the performance of pg_TwoIntsFromObj, pg_TwoFloatsFromObj, pg_IntFromObj, and pg_FloatFromObj in base.c

Approach

  • Taking advantage of sequence knowledge better with PySequence_ITEM:
    On main, the code for pg_TwoIntsFromObj, pg_TwoFloatsFromObj doesn't take advantage of the fact that it knows obj is a sequence, because it passes obj into generic methods (like pg_IntFromObjIndex) that have to check again that obj is a sequence. When we know that obj is a sequence we can use a faster way of getting elements out of it, PySequence_ITEM.

  • Faster number parsing with PyFloat_AS_DOUBLE:
    If we know something is a Python float (a double), we can get the underlying double easily and with no error checking needed this way. It can then be cast to in C to whatever we want. This is the fastest way of getting a number that I tested. This is a more performant strategy than using PyFloat_AsDouble for this.

  • Don't do a redundant length check:
    On main, if a tuple is passed in the code will use PyTuple_Size to figure out if it needs to go down the recursive codepath, then PySequence_Length to see whether it belongs at all. Instead we can use PySequence_Size (same as Length, just seems to be the preferred alias so I changed it) once at the beginning to inform those 2 other checks.

Direct Testing

I saw an average 34% improvement for pg_TwoIntsFromObj, 40% improvement for pg_TwoIntsFromObj, 17% improvement for pg_IntFromObj, and 42% improvement for pg_FloatFromObj

To test this I turned surface.get_at() into a loop that would call whatever I wanted to test with the get_at position argument 10 million times. This gave me a reliable way to test the performance of these helper functions directly and consistently.

Test script

import time

import pygame
pygame.init()

s = pygame.Surface((20,20))

def run_test(test_name, test_obj):
    start = time.time()
    s.get_at(test_obj)
    print(test_name + ":\t", time.time() - start)

run_test("int tuple", (2,15))
run_test("int list", [2,15])
run_test("float tuple", (2.0,15.0))
run_test("float list", [2.0,15.0])
run_test("vector2", pygame.Vector2(2,15))
run_test("mixed tuple", (2.0,15))
run_test("mixed list", [2.0,15])
run_test("int tuple x1", ((2,15),))
run_test("int list x1", ([2,15],))
run_test("float tuple x1", ((2.0,15.0),))
run_test("float list x1", ([2.0,15.0],))
run_test("mixed tuple x1", ((2.0,15),))
run_test("mixed list x1", ([2.0,15],))

#For pg_IntFromObj and pg_FloatFromObj
#run_test("int", 33)
#run_test("float", 33.2)

Here is my data:
https://docs.google.com/spreadsheets/d/1WBCVvzkL9HAZJ7Yo1N86-tAFhAP0d2J72Wcp8mveCl4/edit?usp=sharing

Indirect Testing
This speedup is significant enough to be seen in actual API that uses these functions internally. Here are some random functions I tested.

import time
import pygame
pygame.init()
iterations = 10000000

def run_test(test_name, test_obj, test_arg):
    start = time.time()
    for _ in range(iterations):
        test_obj(test_arg)
    print(test_name + ":\t", time.time() - start)

r = pygame.Rect(1,2,3,4)
fr = pygame.FRect(1,2,3,4)
run_test("r_move_ip", r.move_ip, (2,3))
run_test("fr_move_ip", fr.move_ip, (2.0,3.2))
run_test("r_collidep", r.collidepoint, (2,3))

# Before seconds -> After seconds  (Percent improvement)
# r_move_ip 0.63 -> 0.51 (19%)
# fr_move_ip 0.45 -> 0.38 (16%)
# r_collidep 0.63 -> 0.52 (17%)

This is a micro optimization, but I think it is a worthwhile thing to really scrutinize, as these functions are used all over the place. (Channeling my inner @itzpr3d4t0r on this PR)

pg_IntFromObjIndex doesn't take advantage of the fact that pg_TwoIntsFromObj already knows that obj is a sequence. Since obj is known to be a sequence, and negative indices are not needed, PySequence_ITEM can be used instead of PySequence_GetItem for better performance.
@Starbuck5 Starbuck5 added the Performance Related to the speed or resource usage of the project label Nov 4, 2024
@Starbuck5 Starbuck5 requested a review from a team as a code owner November 4, 2024 06:02
@itzpr3d4t0r
Copy link
Member

You didn't full channel though xD, ig this could've been expanded to be more in line with pg_TwoDoublesFromObj with the whole sequencefast treatment.

@Starbuck5
Copy link
Member Author

Is sequencefast actually worth it for a two element sequence?

@itzpr3d4t0r
Copy link
Member

I've seen 5-10% improvements. Fastcall methods that don't do much benefit the most out of the bunch. This might even speed up fblits or stuff that takes lots of positions since one could also pass only tuples as positions.

@Starbuck5 Starbuck5 marked this pull request as draft November 6, 2024 08:50
@Starbuck5 Starbuck5 force-pushed the optimize-pg-two-things-from-obj branch from a4d97a0 to 5cd46ea Compare November 8, 2024 08:03
@Starbuck5 Starbuck5 force-pushed the optimize-pg-two-things-from-obj branch from 2f5e2dd to 480a472 Compare November 8, 2024 08:03
@Starbuck5 Starbuck5 marked this pull request as ready for review November 8, 2024 08:35
@Starbuck5
Copy link
Member Author

Starbuck5 commented Nov 8, 2024

Alright you've pushed me to be better, so I've turned this simple speedup PR into a better (but way more complicated) speedup PR. I got a lot of inspiration from pg_TwoDoublesFromObj, but I haven't copied that strategy exactly here for now, and I don't plan to in this PR.

I've updated the PR description with my findings.

@itzpr3d4t0r
Copy link
Member

Happy to report that this strategy is actually faster than the FastSequence strategy lol.

@Starbuck5
Copy link
Member Author

I’m surprised that you say that, I thought the sequence fast way was still faster. I just thought this way was less complex and slightly safer code (like for example in the Python source it checks for List and Tuple exactly, and we just check for list and tuple non exactly).

I’ve also been thinking about how we could optimize this in Pythons C API. Like would it be possible to do a function that returns elements and size and checks whether it’s a sequence?

Copy link
Member

@MyreMylar MyreMylar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR changes run faster for me in testing under Python 3.11. Looks about 5% faster

👍

Copy link
Member

@ankith26 ankith26 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for the PR 🎉

@ankith26 ankith26 added this to the 2.5.3 milestone Dec 12, 2024
@ankith26 ankith26 merged commit 6026d1e into pygame-community:main Dec 12, 2024
25 checks passed
@Starbuck5 Starbuck5 deleted the optimize-pg-two-things-from-obj branch December 13, 2024 06:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Performance Related to the speed or resource usage of the project
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants