
Performance in benchmarks #264

Closed
jtjeferreira opened this issue May 17, 2021 · 12 comments
@jtjeferreira

Hi!

As discussed on Gitter, I am opening an issue to document some findings about zio-grpc performance in this benchmark. I started this journey investigating why the akka-grpc results were so bad (https://discuss.lightbend.com/t/akka-grpc-performance-in-benchmarks/8236/) but then got curious about the numbers for the other implementations...

The zio-grpc implementation of the benchmark was done in this PR, and the results I got were:

Benchmark info:
37a7f8b Mon, 17 May 2021 16:06:05 +0100 João Ferreira scala zio-grpc implementatio
Benchmarks run: scala_fs2_bench scala_akka_bench scala_zio_bench java_hotspot_grpc_pgc_bench
GRPC_BENCHMARK_DURATION=50s
GRPC_BENCHMARK_WARMUP=5s
GRPC_SERVER_CPUS=3
GRPC_SERVER_RAM=512m
GRPC_CLIENT_CONNECTIONS=50
GRPC_CLIENT_CONCURRENCY=1000
GRPC_CLIENT_QPS=0
GRPC_CLIENT_CPUS=9
GRPC_REQUEST_PAYLOAD=100B
-----
Benchmark finished. Detailed results are located in: results/211705T162018
--------------------------------------------------------------------------------------------------------------------------------
| name               |   req/s |   avg. latency |        90 % in |        95 % in |        99 % in | avg. cpu |   avg. memory |
--------------------------------------------------------------------------------------------------------------------------------
| java_hotspot_grpc_pgc |   59884 |       16.19 ms |       40.65 ms |       54.12 ms |       88.15 ms |  256.21% |     204.7 MiB |
| scala_akka         |    7031 |      141.70 ms |      281.35 ms |      368.74 ms |      592.53 ms |  294.91% |    175.44 MiB |
| scala_fs2          |    7005 |      142.20 ms |      231.57 ms |      266.35 ms |      357.07 ms |  274.57% |    351.34 MiB |
| scala_zio          |    6835 |      145.74 ms |      207.45 ms |      218.25 ms |      266.37 ms |  242.61% |    241.43 MiB |
--------------------------------------------------------------------------------------------------------------------------------
@jtjeferreira
Author

The flamegraphs are available here https://drive.google.com/drive/folders/1Ef8nc_t01O8pTuD4eCiNk3J1WFPGJAms but the only obvious thing is the outside TLAB allocations caused by scalapb.zio_grpc.server.ZServerCall#sendMessage

[image: flamegraph excerpt highlighting outside-TLAB allocations from scalapb.zio_grpc.server.ZServerCall#sendMessage]

@thesamet
Contributor

Thanks! I'm trying to interpret the flamegraph. It looks like the majority of the time is spent in grpc itself, writing the responses to the output stream (from sendMessage and above). That's unexpected, given that this is a shared call path across all implementations, while grpc-java performs significantly faster.

@jtjeferreira
Author

Here are some more details from the JFR recording.

GC is definitely a problem:

[image: JFR garbage-collection view]

Threads:

[image: JFR threads view]

Memory:

[image: JFR memory view]

Top contender is zio.internal.FiberContext.evaluateNow/zio.internal.SingleThreadedRingBuffer.<init>

but also scalapb.zio_grpc.server.ZServerCall#sendMessage, which I mentioned above

[image: JFR allocation view]

@thesamet
Contributor

Possibly the benchmarks can improve substantially if the heap is more generous (via -Xmx) or a different garbage collector is used.

@jtjeferreira
Author

> Possibly the benchmarks can improve substantially if the heap is more generous (via -Xmx) or a different garbage collector is used.

True, I am running the benchmarks with GRPC_SERVER_RAM=512m and with -XX:MinRAMPercentage=70 -XX:MaxRAMPercentage=70. OTOH, the Java benchmark is running with the same GC and settings.
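For context, a quick back-of-the-envelope on what those flags imply (a sketch: the flag names are real HotSpot options, but the arithmetic and class name here are mine, for illustration only):

```java
// Rough illustration of how -XX:MaxRAMPercentage interacts with a container
// memory limit: with GRPC_SERVER_RAM=512m and MaxRAMPercentage=70, the JVM
// heap ceiling works out to roughly 358 MiB, well below the full 512 MiB.
public class HeapCeiling {
    // Heap ceiling in MiB given a container limit and a MaxRAMPercentage value.
    static long heapCeilingMiB(long containerMiB, double maxRamPercentage) {
        return (long) (containerMiB * maxRamPercentage / 100.0);
    }

    public static void main(String[] args) {
        System.out.println(heapCeilingMiB(512, 70.0) + " MiB"); // prints "358 MiB"
    }
}
```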

@jtjeferreira
Author

But maybe there are ways to avoid all those allocations:

  • SingleThreadedRingBuffer is responsible for 14% of the inside TLAB allocations...
  • and sendMessage is responsible for 80% of the outside TLAB allocations...

@thesamet
Contributor

Isn't sendMessage a shared function that the Java implementation uses too?

It's reasonable to expect the ZIO implementation to perform significantly more allocations per RPC. There are a number of grpc-java API calls per request; each call is wrapped in an effect, and these effects are chained with ZIO combinators.
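A minimal sketch of that wrapping overhead (toy code, not zio-grpc's or ZIO's actual types): each chained step allocates a fresh thunk object before any work runs, which is garbage the direct grpc-java call style never creates.

```java
// Toy model of effect-wrapping overhead. The Effect interface below is a
// stand-in for a simplified ZIO value; names are illustrative only.
import java.util.function.Function;
import java.util.function.Supplier;

public class EffectWrapping {
    // A deferred computation: nothing happens until run() is called.
    interface Effect<A> {
        A run();
        // Every flatMap allocates a new Effect object for the chained step.
        default <B> Effect<B> flatMap(Function<A, Effect<B>> f) {
            return () -> f.apply(run()).run();
        }
    }

    // Lift a thunk into the toy effect type (one allocation per lift).
    static <A> Effect<A> effect(Supplier<A> s) {
        return s::get;
    }

    public static void main(String[] args) {
        // Direct style: one method call, no wrapper allocations.
        String direct = "request".toUpperCase();

        // Effect style: two lifts plus a flatMap allocate intermediate
        // objects before the same work finally runs.
        String wrapped = effect(() -> "request")
            .flatMap(r -> effect(r::toUpperCase))
            .run();

        System.out.println(direct.equals(wrapped)); // prints "true"
    }
}
```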

@jtjeferreira
Author

So after “wasting” all these hours profiling, I noticed that the heap settings were not being applied. After changing that, the results are a bit better.

https://discuss.lightbend.com/t/akka-grpc-performance-in-benchmarks/8236/14

@thesamet
Contributor

Thanks for working on it! It looks like applying the setting also improved the plain Java benchmark so we now have a higher target to chase :)

@jtjeferreira
Author

> It looks like applying the setting also improved the plain Java benchmark so we now have a higher target to chase :)

No, this was only a problem in the Scala implementations.

@jtjeferreira
Author

Hi again,

Just wanted to share some details about the updated benchmark, now with the right heap settings.

[image: zio-grpc flamegraph with corrected heap settings]

The left side corresponds to the ServiceBuilder executor threads, the middle to the Netty threads, and the right to the ZIO threads. For comparison, here is the Java flamegraph, where we can see the Netty side is very similar, but fewer threads are used.

[image: grpc-java flamegraph]

Something I noticed is the use of unsafeRun, which I wonder could be done in another way.

This flamegraph is very similar to fs2-grpc's, and they also use unsafeRun: typelevel/fs2-grpc#386 (comment)
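For readers unfamiliar with the pattern: grpc-java drives a server through synchronous listener callbacks, so an effect-based implementation ends up running its effect eagerly inside the callback; that is the unsafeRun mentioned above. A toy sketch of that boundary, with CompletableFuture standing in for a ZIO effect (names and structure are illustrative, not zio-grpc's actual code):

```java
// Sketch of the "run the effect at the callback boundary" pattern.
import java.util.concurrent.CompletableFuture;

public class UnsafeRunBoundary {
    // A toy "effect" describing the response computation asynchronously.
    static CompletableFuture<String> handle(String request) {
        return CompletableFuture.supplyAsync(() -> "echo:" + request);
    }

    // Callback-style entry point, as grpc-java's ServerCall.Listener
    // onMessage would be: it must produce its side effects synchronously,
    // so we block here until the effect completes (the "unsafeRun").
    static String onMessage(String request) {
        return handle(request).join();
    }

    public static void main(String[] args) {
        System.out.println(onMessage("ping")); // prints "echo:ping"
    }
}
```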

@thesamet
Contributor

thesamet commented Feb 5, 2023

Closing due to inactivity. A few changes introduced recently should result in better performance. It would be great to have an update on this when you get a chance.

@thesamet thesamet closed this as completed Feb 5, 2023