optimization around get for raw usage #44

danmayer · 2025-02-04T00:33:58Z

OK, starting to hit a wall with most of the profile and benchmark data, but we will talk more with some YJIT folks next week.

This is a small win, but it does show up in the profiles and benchmarks, and it should be more significant with remote memcached servers than with local ones, since it reduces the total bytes written to and read from the socket. This performance tweak only applies to the raw: true client, which is what we use.

multi_get goes from 1.27x slower to 1.20x slower than raw socket
get goes from 1.17x slower to 1.14x slower than raw socket

multi_get before:

❯❯❯$ BENCH_TARGET=get_multi RUBY_YJIT_ENABLE=1 bundle exec bin/benchmark                                      <bundler> [faster_get]
yjit: true
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
        get 100 keys    33.000 i/100ms
get 100 keys raw sock
                        39.000 i/100ms
Calculating -------------------------------------
        get 100 keys    333.944 (± 4.8%) i/s    (2.99 ms/i) -      3.333k in  10.003634s
get 100 keys raw sock
                        422.941 (± 7.3%) i/s    (2.36 ms/i) -      4.212k in  10.009880s

Comparison:
get 100 keys raw sock:      422.9 i/s
        get 100 keys:      333.9 i/s - 1.27x  slower

multi_get after:

❯❯❯$ BENCH_TARGET=get_multi RUBY_YJIT_ENABLE=1 bundle exec bin/benchmark                                    <bundler> [faster_get ●]
yjit: true
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
        get 100 keys    33.000 i/100ms
get 100 keys raw sock
                        42.000 i/100ms
Calculating -------------------------------------
        get 100 keys    344.285 (± 5.2%) i/s    (2.90 ms/i) -      3.465k in  10.092004s
get 100 keys raw sock
                        412.262 (± 5.3%) i/s    (2.43 ms/i) -      4.116k in  10.012749s

Comparison:
get 100 keys raw sock:      412.3 i/s
        get 100 keys:      344.3 i/s - 1.20x  slower

and get before:

❯❯❯$ BENCH_TARGET=get RUBY_YJIT_ENABLE=1 bundle exec bin/benchmark                                            <bundler> [faster_get]
yjit: true
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
           get dalli   352.000 i/100ms
            get sock   399.000 i/100ms
get sock non-blocking
                       342.000 i/100ms
Calculating -------------------------------------
           get dalli      3.436k (± 5.1%) i/s  (291.01 μs/i) -     34.496k in  10.066406s
            get sock      4.016k (± 3.0%) i/s  (248.99 μs/i) -     40.299k in  10.042930s
get sock non-blocking
                          3.448k (± 5.9%) i/s  (290.00 μs/i) -     34.542k in  10.046747s

Comparison:
            get sock:     4016.2 i/s
get sock non-blocking:     3448.3 i/s - 1.16x  slower
           get dalli:     3436.3 i/s - 1.17x  slower

and get after:

❯❯❯$ BENCH_TARGET=get RUBY_YJIT_ENABLE=1 bundle exec bin/benchmark                                          <bundler> [faster_get ●]
yjit: true
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
           get dalli   353.000 i/100ms
            get sock   397.000 i/100ms
get sock non-blocking
                       356.000 i/100ms
Calculating -------------------------------------
           get dalli      3.540k (± 3.5%) i/s  (282.45 μs/i) -     35.653k in  10.082393s
            get sock      4.028k (± 3.5%) i/s  (248.25 μs/i) -     40.494k in  10.064542s
get sock non-blocking
                          3.548k (± 4.3%) i/s  (281.86 μs/i) -     35.600k in  10.050622s

Comparison:
            get sock:     4028.3 i/s
get sock non-blocking:     3547.8 i/s - 1.14x  slower
           get dalli:     3540.5 i/s - 1.14x  slower
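As a quick sanity check, the "x slower" figures quoted at the top can be recomputed from the iterations-per-second numbers in the four benchmark runs above. This is only a sketch of the arithmetic; the `results` hash and its labels are made up here for illustration, while the numbers are copied from the output.

```ruby
# Iterations per second copied from the benchmark output above.
# The hash layout and labels are illustrative, not part of the benchmark script.
results = {
  "multi_get before" => { raw_sock: 422.941, dalli: 333.944 },
  "multi_get after"  => { raw_sock: 412.262, dalli: 344.285 },
  "get before"       => { raw_sock: 4016.2,  dalli: 3436.3 },
  "get after"        => { raw_sock: 4028.3,  dalli: 3540.5 }
}

# "N.NNx slower" is the raw socket throughput divided by the Dalli throughput.
slowdowns = results.transform_values do |r|
  (r[:raw_sock] / r[:dalli]).round(2)
end

slowdowns.each do |label, ratio|
  puts format("%-17s %.2fx slower than raw socket", label, ratio)
end
```

Note that the raw socket baseline for multi_get also moved between the two runs (422.9 i/s to 412.3 i/s), so the change in the ratio reflects both the Dalli improvement and some run-to-run noise.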

danmayer · 2025-02-04T00:36:42Z

lib/dalli/protocol/meta.rb


+        post_get_req = optimized_for_raw ? "v k q\r\n" : "v f k q\r\n"


By dropping the f flag we save 2 bytes on each of the 100 get requests, and the response no longer includes the bytes for the bitflags, which in this case we do not want or need.
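A rough sketch of the request-side saving, assuming each key is sent as a memcached meta `mg <key> <flags>` line using the suffixes from the diff above. `build_multi_get_request` is a hypothetical helper written for this illustration, not Dalli's actual request formatter, and any command needed to terminate the quiet pipeline is left out.

```ruby
# Hypothetical helper (not Dalli's API): builds one "mg" line per key,
# using the same request suffixes as the diff above.
def build_multi_get_request(keys, optimized_for_raw:)
  post_get_req = optimized_for_raw ? "v k q\r\n" : "v f k q\r\n"
  keys.map { |key| "mg #{key} #{post_get_req}" }.join
end

keys = (1..100).map { |i| "key_#{i}" }

with_flags = build_multi_get_request(keys, optimized_for_raw: false)
raw_only   = build_multi_get_request(keys, optimized_for_raw: true)

# Each request loses "f " (2 bytes), so 100 keys save 200 bytes on the write.
saved = with_flags.bytesize - raw_only.bytesize
puts "request bytes saved for #{keys.size} keys: #{saved}"
```

The read side shrinks as well, because each response header no longer carries a flags token, but that saving depends on the stored flag values and is not computed here.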

danmayer · 2025-02-04T00:37:45Z

lib/dalli/protocol/meta.rb

@@ -68,25 +74,42 @@ def read_multi_req(keys)
          # VA value_length flags key
          tokens = line.split
          value = @connection_manager.read_exact(tokens[1].to_i)
+          bitflags = optimized_for_raw ? 0 : @response_processor.bitflags_from_tokens(tokens)


Since we didn't ask for the response to include bitflags, they will not be there and are treated as 0, so we skip the flag parsing across the whole batch.
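To illustrate the read side, here is a minimal sketch of parsing a `VA` response header with and without the flags token. Both `parse_header` and this `bitflags_from_tokens` are hypothetical stand-ins written for this example; they are not the methods on Dalli's response processor, and the sample header lines are assumed rather than captured from a server.

```ruby
# Hypothetical stand-in for flag parsing: looks for an "f<number>" token
# after the "VA <length>" prefix and returns its integer value (0 if absent).
def bitflags_from_tokens(tokens)
  flag_token = tokens.drop(2).find { |t| t.start_with?("f") }
  flag_token ? flag_token[1..].to_i : 0
end

# Hypothetical header parser mirroring the branch in the diff above:
# when optimized_for_raw is true, the token scan for flags is skipped entirely.
def parse_header(line, optimized_for_raw:)
  tokens = line.split
  key_token = tokens.drop(2).find { |t| t.start_with?("k") }
  {
    value_length: tokens[1].to_i,
    bitflags: optimized_for_raw ? 0 : bitflags_from_tokens(tokens),
    key: key_token && key_token[1..]
  }
end

with_flags = parse_header("VA 5 f1 kfoo", optimized_for_raw: false)
raw_only   = parse_header("VA 5 kfoo", optimized_for_raw: true)

puts with_flags.inspect
puts raw_only.inspect
```

For a raw client the value is returned as-is, so a constant 0 for the bitflags is enough and the per-key token scan can be avoided.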

grcooper

Are there tests for this?

danmayer · 2025-02-04T16:46:41Z

Mostly it was driven by the benchmark and profile; I can add a test.

optimization around get for raw usage

32a13b4

danmayer requested review from grcooper and nickamorim February 4, 2025 00:35

grcooper reviewed Feb 4, 2025

danmayer added 2 commits February 4, 2025 09:55

add tests ensuring the raw optimized code flow is used

0f47fd5

fix flaky test

7b0877e

danmayer requested a review from grcooper February 4, 2025 17:30

grcooper approved these changes Feb 4, 2025

danmayer merged commit 2351639 into main Feb 4, 2025
14 checks passed

danmayer deleted the faster_get_multi_get branch February 4, 2025 18:01
