Optimizing trap calls and giving extra 200 fps to anyone #846
Comments
See this game-side PR for a cvar allowing the user to choose intermediate values between no prediction and heavy prediction: See this PR for a WIP attempt (not working, not building) to implement a function that would allow the client to fetch all the command backups in one |
Here is a screenshot of Orbit profiling the Unvanquished game: One can see that |
Thanks to @DolceTriade fixing my engine patch, here is how the patch to fetch all backup commands in one call behaves: Before: you can clearly see on the fps graph when I disable prediction (huge fps boost) and when I enable it (huge fps drop): After: you cannot tell when I disable and enable prediction (same high fps in both cases): |
So, now when It happens that the code of the lagometer also does one So, if I hack the engine to unlock fps above 1000, the default plat23 scene in the lowest graphics preset never goes under 1000fps on my side. If I want to be crazy and look at the sky, I get 1600fps with some peaks at 1800fps… and if I disable |
free fps for everyone! |
So, summary of what's done:
The next performance glutton is |
I never see any sky on those screenshots? Otherwise, the point is to prepare the code for an API improvement, right? What's the status of all this? |
Here, the idea of looking at the sky is to look at a place where there are no entities. Whether sky textures are rendered or not doesn't count; in fact, we had better disable sky texture rendering when comparing framerates right now, because sky rendering suffers from a strong performance issue (see #849). Here we're not comparing trap call performance against sky rendering performance, but trap call performance against no trap call performance.
My previous comment is the current status, it is up to date. |
So, the current implementations for batch calls are:
¹ The first one is less needed since we already have an alternative working implementation, but we may still want it to make the code less convoluted, see this comment and following: |
On the game side, as far as I know, each of those cvar checks does one trap call per frame:
|
With all those batched calls plus this:
In a stupid “I look at ATCSHD outside floor with lowest preset and 640×480 resolution with cg_draw2D disabled and r_smp enabled” test, the performance went from April 27: May 7: And the 3000 fps reached 10 days ago was mostly lucky, the framerate curve was far less flat, it was more averaging at Also, at the time my computer was running almost nothing but Unvanquished, while right now my system is full of other software messing with the resources (the CPU load in the top-left graph is that of the whole system). GPU is AMD Radeon PRO W7600. |
Also see: Unvanquished/Unvanquished#3157. |
So, I noticed that when I switch a local game with the lowest graphics preset from prediction off to prediction on (`cg_nopredict off` on the client or `g_synchronousClients 0` on the server), performance drops from 1000fps to 500fps.

On an online game using a public server and the ultra graphics preset, disabling prediction gave me 200fps more. I actually reached 500fps on a public game.
It happens that when client-side prediction is enabled, most of the CPU time is spent in `CG_PredictPlayerState`, and in that function, most of the time is spent in `trap_GetUserCmd`.

There is such code in `CG_PredictPlayerState`:

This code is calling `trap_GetUserCmd` 63 times per frame… 😱️

In the engine, `CMD_BACKUP` is `64`; this value was already `64` when the Quake3 source code was released. Some comments that were already there at the time also said:

So, despite the code always using the max, the comment says such max is not for everyone.
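
For scale, here is a back-of-the-envelope sketch using the local numbers above (1000fps down to 500fps, 63 calls per frame). It assumes the whole drop comes from the `trap_GetUserCmd` calls, which is only an assumption and not something measured here; the helper names are made up for the illustration.

```cpp
#include <cassert>

// Frame time in milliseconds at a given framerate.
constexpr double FrameTimeMs(double fps) {
    return 1000.0 / fps;
}

// Extra milliseconds spent per frame when the framerate drops
// from fastFps to slowFps.
constexpr double ExtraMsPerFrame(double fastFps, double slowFps) {
    return FrameTimeMs(slowFps) - FrameTimeMs(fastFps);
}

// Upper bound on the cost of one call, in microseconds, ASSUMING the
// whole per-frame difference is spent in callsPerFrame identical calls.
inline double MaxMicrosecondsPerCall(double fastFps, double slowFps,
                                     int callsPerFrame) {
    assert(callsPerFrame > 0);
    return ExtraMsPerFrame(fastFps, slowFps) * 1000.0 / callsPerFrame;
}
```

With these numbers, `ExtraMsPerFrame(1000.0, 500.0)` is 1 ms per frame, and `MaxMicrosecondsPerCall(1000.0, 500.0, 63)` is a bit under 16 µs per call. That is small for a single call, but it adds up once it is repeated 63 times every frame.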
My first attempt to save performance without entirely disabling prediction was to add a cvar that only fetches a given number of command backups. And it works. We may still use such a cvar in the graphics presets to do less prediction on the lower ones. The good thing about such a patch is that it doesn't break engine compatibility.
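
As a rough sketch of the cvar idea (not the code from the actual PR): the requested number of backups is clamped so it can never exceed what the engine keeps, and the first command number to fetch is derived from it. The function names, the minimum of 1, and passing the cvar value as a plain `int` are all assumptions made for the example.

```cpp
#include <algorithm>

// Hypothetical helper: clamp a user-requested backup count to what the
// engine can provide. maxBackups defaults to 64, the CMD_BACKUP value
// mentioned above; the lower bound of 1 is an assumption.
inline int ClampBackupCount(int requested, int maxBackups = 64) {
    return std::clamp(requested, 1, maxBackups);
}

// Hypothetical helper: first command number to fetch so that the
// clamped number of commands ends at (and includes) `current`.
inline int FirstCommandToFetch(int current, int requested,
                               int maxBackups = 64) {
    return current - ClampBackupCount(requested, maxBackups) + 1;
}
```

For instance, with `current = 100` and a requested count of 16, the prediction loop would start at command 85 instead of 37, so it would do 16 fetches per frame instead of 64.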
But, but, but. I assume doing IPC is slow, very slow, compared to just running code directly in CPU cache.
On the engine side, the function behind `trap_GetUserCmd` just does that:

I see nothing in that function that can eat 200 or 500 fps. But well, IPC is always slower than what code running in CPU cache can do.
So, I thought… What if we add a `trap_GetUserCmds` function that would fetch packets in one go? We would query the whole array of `64` commands (or only the amount we want, if using a cvar to customize this) in a single IPC, and then iterate over all the commands.

I tried to implement a `CL_GetUserCmds` function that does just that, but my code failed to build because it missed some dedicated `Write` function.

What do you think about it? To implement the "single trap" I would need some help…
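
To make the idea concrete, here is a minimal, self-contained sketch of the two access patterns. It is not the engine's code: `FakeEngine`, `UserCmd`, and the `ipcCalls` counter are hypothetical stand-ins, and the real IPC layer is only simulated by counting calls. `GetUserCmds` mirrors the proposed `trap_GetUserCmds` in spirit only.

```cpp
#include <algorithm>
#include <array>
#include <vector>

constexpr int CMD_BACKUP = 64;            // same value as in the engine
constexpr int CMD_MASK = CMD_BACKUP - 1;  // ring-buffer index mask

// Stand-in for the real user command structure.
struct UserCmd {
    int serverTime = 0;
};

// Hypothetical engine side: a ring buffer of the last CMD_BACKUP commands.
// Every public fetch counts as one simulated IPC round trip.
struct FakeEngine {
    std::array<UserCmd, CMD_BACKUP> cmds{};
    int current = 0;   // number of the most recent command (first is 1)
    int ipcCalls = 0;  // simulated round trips made so far

    void Push(const UserCmd& cmd) {
        ++current;
        cmds[current & CMD_MASK] = cmd;
    }

    // One round trip per command, like the existing per-command trap.
    bool GetUserCmd(int number, UserCmd& out) {
        ++ipcCalls;
        if (number < 1 || number > current ||
            number <= current - CMD_BACKUP) {
            return false;  // not generated yet, or already overwritten
        }
        out = cmds[number & CMD_MASK];
        return true;
    }

    // One round trip for up to `count` most recent commands, oldest first.
    std::vector<UserCmd> GetUserCmds(int count) {
        ++ipcCalls;
        std::vector<UserCmd> out;
        const int first =
            std::max(1, current - std::min(count, CMD_BACKUP) + 1);
        for (int n = first; n <= current; ++n) {
            out.push_back(cmds[n & CMD_MASK]);
        }
        return out;
    }
};
```

In this sketch, replaying 64 commands through `GetUserCmd` costs 64 simulated round trips, while `GetUserCmds(64)` returns the same commands for a single one. The game-side loop would then iterate over the returned vector instead of calling the trap on each iteration.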