[CELEBORN-1815] Support UnpooledByteBufAllocator #3043

pan3793 · 2024-12-31T15:11:14Z

What changes were proposed in this pull request?

This PR introduces a configuration celeborn.network.memory.allocator.pooled to allow users to disable PooledByteBufAllocator globally and always use UnpooledByteBufAllocator.
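As a rough illustration of how such a switch could be read, here is a minimal sketch in plain Java. Only the key name comes from this PR. The `AllocatorChoice` class, the `Kind` enum, the plain `Map` used as a config source, and the default of `true` (pooled) are assumptions for illustration, not Celeborn's actual code.

```java
import java.util.Map;

class AllocatorChoice {
    // Hypothetical stand-in for the two allocator families discussed in this PR.
    enum Kind { POOLED, UNPOOLED }

    static final String KEY = "celeborn.network.memory.allocator.pooled";

    // Assumption: the flag defaults to true, so pooling stays on unless disabled.
    static Kind choose(Map<String, String> conf) {
        String raw = conf.getOrDefault(KEY, "true");
        return Boolean.parseBoolean(raw) ? Kind.POOLED : Kind.UNPOOLED;
    }

    public static void main(String[] args) {
        System.out.println(choose(Map.of()));             // POOLED
        System.out.println(choose(Map.of(KEY, "false"))); // UNPOOLED
    }
}
```

Reading the flag in one place keeps the choice global, which matches the intent of disabling pooling everywhere rather than per module.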

Why are the changes needed?

In some extreme cases, Netty's PooledByteBufAllocator can hold a large number of 4MiB chunks while only a small fraction of their capacity is used by real data (see #3018). For scenarios where stability is more important than performance, it is desirable to allow users to disable PooledByteBufAllocator globally.
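The gap between reserved and used memory can be shown with simple arithmetic. This sketch does not model Netty's allocator. The `ChunkWaste` class and the per-chunk occupancy numbers are hypothetical, and only the 4MiB chunk size is taken from the description.

```java
class ChunkWaste {
    // 4 MiB chunk size, as cited in the description above.
    static final long CHUNK_BYTES = 4L * 1024 * 1024;

    // Memory held by the pool: each chunk is retained whole, however little of it is in use.
    static long reservedBytes(long[] usedPerChunk) {
        return usedPerChunk.length * CHUNK_BYTES;
    }

    // Memory actually occupied by live data across all chunks.
    static long usedBytes(long[] usedPerChunk) {
        long total = 0;
        for (long used : usedPerChunk) {
            total += used;
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical occupancy: three chunks, each holding one small live buffer.
        long[] usedPerChunk = {64 * 1024, 32 * 1024, 128 * 1024};
        System.out.println("reserved = " + reservedBytes(usedPerChunk)); // 12582912
        System.out.println("used     = " + usedBytes(usedPerChunk));     // 229376
    }
}
```

With these made-up numbers, under 2% of the reserved memory holds real data, which is the kind of imbalance the unpooled option is meant to avoid.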

Does this PR introduce any user-facing change?

Add a new feature, disabled by default.

How was this patch tested?

Passes UT to ensure correctness. Performance and memory impact still need to be verified on a production-scale cluster.

pan3793 · 2024-12-31T15:14:48Z

common/src/main/java/org/apache/celeborn/common/network/util/NettyMemoryMetrics.java

+        source.addGauge(
+            MetricRegistry.name(metricPrefix, "usedDirectMemory"),
+            labels,
+            unpooledMetric::usedDirectMemory);


UnpooledByteBufAllocator only exposes two metrics: usedHeapMemory and usedDirectMemory
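A minimal sketch of the two-gauge registration pattern follows, written against the JDK only so it runs without Netty or a metrics library. The `AllocatorMetric` interface is a stand-in that mirrors the two methods of Netty's ByteBufAllocatorMetric. The `UnpooledGauges` class, the map-based registry, and the dot-joined names are assumptions for illustration, not the code in NettyMemoryMetrics.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongSupplier;

class UnpooledGauges {
    // Stand-in mirroring the two methods of Netty's ByteBufAllocatorMetric.
    interface AllocatorMetric {
        long usedHeapMemory();
        long usedDirectMemory();
    }

    // Registers one gauge per metric; each value is read lazily when the gauge is polled.
    static Map<String, LongSupplier> register(String prefix, AllocatorMetric metric) {
        Map<String, LongSupplier> gauges = new LinkedHashMap<>();
        gauges.put(prefix + ".usedHeapMemory", metric::usedHeapMemory);
        gauges.put(prefix + ".usedDirectMemory", metric::usedDirectMemory);
        return gauges;
    }

    public static void main(String[] args) {
        // Fake metric with fixed values, for demonstration only.
        AllocatorMetric fake = new AllocatorMetric() {
            public long usedHeapMemory() { return 1024L; }
            public long usedDirectMemory() { return 4096L; }
        };
        register("demo", fake).forEach((name, gauge) ->
            System.out.println(name + " = " + gauge.getAsLong()));
    }
}
```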

pan3793 · 2024-12-31T15:20:13Z

...mmon/src/main/java/org/apache/celeborn/plugin/flink/network/FlinkTransportClientFactory.java

@@ -45,7 +45,7 @@ public FlinkTransportClientFactory(
      TransportContext context, List<TransportClientBootstrap> bootstraps, int bufferSizeBytes) {
    super(context, bootstraps);
    bufferSuppliers = JavaUtils.newConcurrentHashMap();
-    this.pooledAllocator = new UnpooledByteBufAllocator(true);
+    this.allocator = new UnpooledByteBufAllocator(true);


Interesting... Flink client overrides it to forcibly use UnpooledByteBufAllocator

This is related to #1324. The Flink client's memory is quite limited.

@FMX Thanks for the information. BTW, it would be better to leave a brief comment explaining this special logic.

cfmcgrady · 2024-12-31T16:23:14Z

common/src/main/java/org/apache/celeborn/common/network/util/NettyUtils.java

-      boolean allowDirectBufs, boolean allowCache, int numCores) {
-    if (numCores == 0) {
-      numCores = Runtime.getRuntime().availableProcessors();
+  private static ByteBufAllocator createByteBufAllocator(


The method comments should also be updated

Update: moved the existing comment to the caller side and added new comments to explain each parameter.

cfmcgrady

LGTM, except for a nit.

codecov · 2025-01-01T17:25:30Z

Codecov Report

Attention: Patch coverage is 28.39506% with 58 lines in your changes missing coverage. Please review.

Project coverage is 32.67%. Comparing base (fde6365) to head (b3d569a).
Report is 11 commits behind head on main.

Files with missing lines	Patch %	Lines
...leborn/common/network/util/NettyMemoryMetrics.java	0.00%	42 Missing ⚠️
...pache/celeborn/common/network/util/NettyUtils.java	40.75%	9 Missing and 7 partials ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3043      +/-   ##
==========================================
- Coverage   32.68%   32.67%   -0.00%     
==========================================
  Files         336      336              
  Lines       20032    20053      +21     
  Branches     1792     1796       +4     
==========================================
+ Hits         6546     6551       +5     
- Misses      13123    13137      +14     
- Partials      363      365       +2


FMX

LGTM.

pan3793 · 2025-01-02T12:55:18Z

Thanks, merged to main (0.6).

Commit e2de372: [CELEBORN-1815] Support UnpooledByteBufAllocator

pan3793 requested review from AngersZhuuuu, FMX and cfmcgrady December 31, 2024 15:11

Commit 7dc3531: fix flink

cfmcgrady reviewed Dec 31, 2024

cfmcgrady approved these changes Dec 31, 2024

Commit b3d569a: comment

FMX approved these changes Jan 2, 2025

pan3793 closed this in a318eb4 Jan 2, 2025