# Memory usage optimization: discard compute closure after evaluation in Lazy (#214)

This PR implements a memory usage optimization related to `Lazy`: once we have evaluated a `Lazy` value, we no longer need to retain the closure that performed the evaluation. Discarding closures (and, transitively, their dependencies) after lazy evaluation saves memory.

## Motivation and discovery of this issue

We have a single jsonnet file which has very high peak memory requirements during evaluation. I captured a heap dump and noticed significant memory usage due to `sjsonnet.Lazy[]`, with over half the shallow size consumed by `Lazy[]` arrays:

![image](https://github.com/user-attachments/assets/b09f5011-a81a-40fe-af66-cf364f7456de)

From the merged paths view, we see that most of these are retained from anonymous classes:

![image](https://github.com/user-attachments/assets/5d8dd12a-e0d1-4472-8cf7-0117f8baf030)

For example, here `sjsonnet.Evaluator$$anonfun$visitAsLazy$2` corresponds to the `() => visitExpr(e)` in

https://github.com/databricks/sjsonnet/blob/759cea713bc8e39f45b844a75358fd780f75d480/sjsonnet/src/sjsonnet/Evaluator.scala#L81-L84

That defines an anonymous implementation of [sjsonnet.Lazy](https://github.com/databricks/sjsonnet/blob/759cea713bc8e39f45b844a75358fd780f75d480/sjsonnet/src/sjsonnet/Val.scala#L19-L26) using the [single abstract method type](https://bargsten.org/scala/anonymous-class-and-sam/) syntax. Here, [visitExpr](https://github.com/databricks/sjsonnet/blob/759cea713bc8e39f45b844a75358fd780f75d480/sjsonnet/src/sjsonnet/Val.scala#L551-L552) takes an implicit [ValScope](https://github.com/databricks/sjsonnet/blob/759cea713bc8e39f45b844a75358fd780f75d480/sjsonnet/src/sjsonnet/ValScope.scala#L5-L20) parameter. `ValScope` is a [value class](https://docs.scala-lang.org/overviews/core/value-classes.html) wrapping a `bindings: Array[Lazy]`. Thus, our `sjsonnet.Evaluator$$anonfun$visitAsLazy$2` anonymous class captures the values needed to evaluate `visitExpr(e)`, including the `bindings` array, and thereby contributes to the high count of retained `Array[Lazy]` (see the sketch at the end of the next section). We can also see this from the object inspection in the heap dump:

![image](https://github.com/user-attachments/assets/7469bfd1-4c08-4a21-ab20-661614abf7dd)

## Insight behind the optimization: we don't need the closure after evaluation

In the heap dump that I looked at, most of these anonymous `Lazy` instances had non-null `cached` fields, meaning that their lazy values had already been computed. Once that happens, the value will never be recomputed, so we can discard the closure and, transitively, its heavyweight bindings, which in turn reference more closures and bindings, and so on.

I also drew inspiration (but not implementation) from a neat behavior of Scala `lazy val`s: class constructor parameters that are used exclusively in a `lazy val` initializer are discarded after successful lazy evaluation. For instance, the code

```scala
class Lazy(f: () => String) {
  lazy val value = f()
}
```

decompiles (via [cfr](https://www.benf.org/other/cfr/)) into:

```java
public class Lazy {
    private String value;
    private Function0<String> f;
    private volatile boolean bitmap$0;

    private String value$lzycompute() {
        Lazy lazy = this;
        synchronized (lazy) {
            if (!this.bitmap$0) {
                this.value = (String)this.f.apply();
                this.bitmap$0 = true;
            }
        }
        this.f = null;
        return this.value;
    }

    public String value() {
        if (!this.bitmap$0) {
            return this.value$lzycompute();
        }
        return this.value;
    }

    public Lazy(Function0<String> f) {
        this.f = f;
    }
}
```

demonstrating how the closure is discarded after lazy evaluation.
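To make the capture problem concrete before describing the fix, here is a self-contained sketch of the pattern above. The names mirror sjsonnet, but the types are simplified stand-ins (`String` instead of `Val`, a toy `ValScope`), not the real definitions, and it requires Scala 2.12+ for the SAM conversion:

```scala
// Simplified stand-in for sjsonnet.Lazy: one abstract method, so a lambda
// can be SAM-converted into an anonymous subclass.
abstract class Lazy {
  protected[this] var cached: String = null
  def compute(): String
  final def force: String = {
    if (cached == null) cached = compute()
    cached
  }
}

// Simplified stand-in for sjsonnet.ValScope: a value class wrapping bindings.
final class ValScope(val bindings: Array[Lazy]) extends AnyVal

object CaptureSketch {
  // SAM syntax: the lambda becomes an anonymous Lazy subclass whose closure
  // captures the implicit `scope`, and therefore its whole `bindings` array.
  def visitAsLazy(expr: String)(implicit scope: ValScope): Lazy =
    () => s"evaluated '$expr' against ${scope.bindings.length} bindings"

  def main(args: Array[String]): Unit = {
    implicit val scope: ValScope = new ValScope(new Array[Lazy](1024))
    val lazyVal = visitAsLazy("x + 1")
    println(lazyVal.force)
    // Even though the value is now cached, the anonymous Lazy still holds the
    // closure, which still holds `scope.bindings` -- this is the retention
    // that this PR eliminates.
  }
}
```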
## Implementation

This PR implements a similar optimization. I introduce a new class `LazyWithComputeFunc(@volatile private[this] var computeFunc: () => Val) extends Lazy`, which can be used in place of the anonymous classes and which discards the closure after evaluation.

The underlying implementation is slightly tricky because of a few performance considerations:

- It's really important that we optimize performance for the case where a value is computed once and read many times. Commit d26e9db took pains to ensure that `force` was monomorphic, so I couldn't change that.
- Similarly, I don't want to introduce any locking or synchronization, nor any volatile accesses, on hot paths.
- Finally, I must ensure that the code is thread-safe: it's acceptable to redundantly compute a value under concurrent initialization, but we can't NPE or otherwise crash.

Here, I have chosen to make `computeFunc` a volatile field and check it inside of `compute()`. In ordinary cases, `compute()` will only be called once because `force` checks whether `cached` has already been computed. In the rare case of concurrent calls to `compute()`, we first check whether `computeFunc` has been nulled out: if it is null, then some other thread computed and cached a value (assigned from within `compute()` itself), and that other thread's write is guaranteed to be visible to the race-condition-losing thread because the volatile read of `computeFunc` provides piggybacked visibility of writes from the other racing thread (see https://stackoverflow.com/a/8769692). A sketch of this scheme appears at the end of this description.

## Testing

This passes existing unit tests. I did some manual heap dump tests to validate that this cuts memory usage on small toy examples. I have not yet run this end-to-end on the real workload which generated the original heap dumps.
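For illustration, here is a self-contained sketch of the scheme described in the Implementation section. It follows the class signature quoted above but substitutes `String` for `Val` so it compiles on its own; the merged code may differ in details:

```scala
// Same simplified Lazy as in the earlier sketch: `force` stays monomorphic
// and only reads the plain (non-volatile) `cached` field on the hot path.
abstract class Lazy {
  protected[this] var cached: String = null
  def compute(): String
  final def force: String = {
    if (cached == null) cached = compute()
    cached
  }
}

// Holds the compute closure in a volatile field and nulls it out after the
// first successful evaluation, so everything the closure captured (e.g. the
// ValScope bindings) becomes eligible for garbage collection.
final class LazyWithComputeFunc(@volatile private[this] var computeFunc: () => String)
    extends Lazy {
  def compute(): String = {
    val f = computeFunc // volatile read
    if (f != null) {
      // Common case: we are the first (or only) thread to compute the value.
      val result = f()
      cached = result      // plain write, published by the volatile write below
      computeFunc = null   // discard the closure; volatile write publishes `cached`
      result
    } else {
      // Rare race: another thread already computed and cached the value.
      // The volatile read of `computeFunc` above guarantees that thread's
      // write to `cached` is visible here (piggybacked visibility).
      cached
    }
  }
}
```

On the hot read path nothing changes: `force` still performs a single plain field read, and the volatile access only happens inside `compute()`, which in practice runs at most once per instance.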