# Memory usage optimization: discard compute closure after evaluation in Lazy (#214)

This PR implements a memory usage optimization related to `Lazy`: once we have evaluated a `Lazy` value, we no longer need to retain the closure that performed the evaluation. Discarding closures (and, transitively, their dependencies) after lazy evaluation saves memory.

## Motivation and discovery of this issue

We have a single jsonnet file which has very high peak memory requirements during evaluation. I captured a heap dump and noticed significant memory usage due to `sjsonnet.Lazy[]`, with over half the shallow size consumed by `Lazy[]` arrays:

![image](https://github.com/user-attachments/assets/b09f5011-a81a-40fe-af66-cf364f7456de)

From the merged paths view, we see that most of these are retained from anonymous classes:

![image](https://github.com/user-attachments/assets/5d8dd12a-e0d1-4472-8cf7-0117f8baf030)

For example, here `sjsonnet.Evaluator$$anonfun$visitAsLazy$2` corresponds to the `() => visitExpr(e)` in

https://github.com/databricks/sjsonnet/blob/759cea713bc8e39f45b844a75358fd780f75d480/sjsonnet/src/sjsonnet/Evaluator.scala#L81-L84

That defines an anonymous implementation of [sjsonnet.Lazy](https://github.com/databricks/sjsonnet/blob/759cea713bc8e39f45b844a75358fd780f75d480/sjsonnet/src/sjsonnet/Val.scala#L19-L26) using the [single abstract method type](https://bargsten.org/scala/anonymous-class-and-sam/) syntax. Here, [visitExpr](https://github.com/databricks/sjsonnet/blob/759cea713bc8e39f45b844a75358fd780f75d480/sjsonnet/src/sjsonnet/Val.scala#L551-L552) takes an implicit [ValScope](https://github.com/databricks/sjsonnet/blob/759cea713bc8e39f45b844a75358fd780f75d480/sjsonnet/src/sjsonnet/ValScope.scala#L5-L20) parameter. `ValScope` is a [value class](https://docs.scala-lang.org/overviews/core/value-classes.html) wrapping a `bindings: Array[Lazy]`. Thus, our `sjsonnet.Evaluator$$anonfun$visitAsLazy$2` anonymous class captures the values needed to evaluate `visitExpr(e)`, including the `bindings` array, and thereby contributes to the high count of retained `Array[Lazy]` (see the sketch at the end of the next section). We can also see this from the object inspection in the heap dump:

![image](https://github.com/user-attachments/assets/7469bfd1-4c08-4a21-ab20-661614abf7dd)

## Insight behind the optimization: we don't need the closure after evaluation

In the heap dump that I looked at, most of these anonymous `Lazy` instances had non-null `cached` fields, meaning that their lazy values had already been computed. Once that happens, the value will never be recomputed, so we can discard the closure and, transitively, its heavyweight bindings, which in turn reference more closures and bindings, and so on.

I also drew inspiration (but not implementation) from a neat behavior of Scala `lazy val`s: class constructor parameters that are used exclusively in a `lazy val` initializer are discarded after successful lazy evaluation. For instance, the code

```scala
class Lazy(f: () => String) {
  lazy val value = f()
}
```

decompiles (via [cfr](https://www.benf.org/other/cfr/)) into:

```java
public class Lazy {
    private String value;
    private Function0<String> f;
    private volatile boolean bitmap$0;

    private String value$lzycompute() {
        Lazy lazy = this;
        synchronized (lazy) {
            if (!this.bitmap$0) {
                this.value = (String)this.f.apply();
                this.bitmap$0 = true;
            }
        }
        this.f = null;
        return this.value;
    }

    public String value() {
        if (!this.bitmap$0) {
            return this.value$lzycompute();
        }
        return this.value;
    }

    public Lazy(Function0<String> f) {
        this.f = f;
    }
}
```

demonstrating how the closure is discarded after lazy evaluation.
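To make the capture problem concrete before describing the fix, here is a self-contained sketch of the pattern above. The names mirror sjsonnet, but the types are simplified stand-ins (`String` instead of `Val`, a toy `ValScope`), not the real definitions, and it requires Scala 2.12+ for the SAM conversion:

```scala
// Simplified stand-in for sjsonnet.Lazy: one abstract method, so a lambda
// can be SAM-converted into an anonymous subclass.
abstract class Lazy {
  protected[this] var cached: String = null
  def compute(): String
  final def force: String = {
    if (cached == null) cached = compute()
    cached
  }
}

// Simplified stand-in for sjsonnet.ValScope: a value class wrapping bindings.
final class ValScope(val bindings: Array[Lazy]) extends AnyVal

object CaptureSketch {
  // SAM syntax: the lambda becomes an anonymous Lazy subclass whose closure
  // captures the implicit `scope`, and therefore its whole `bindings` array.
  def visitAsLazy(expr: String)(implicit scope: ValScope): Lazy =
    () => s"evaluated '$expr' against ${scope.bindings.length} bindings"

  def main(args: Array[String]): Unit = {
    implicit val scope: ValScope = new ValScope(new Array[Lazy](1024))
    val lazyVal = visitAsLazy("x + 1")
    println(lazyVal.force)
    // Even though the value is now cached, the anonymous Lazy still holds the
    // closure, which still holds `scope.bindings` -- this is the retention
    // that this PR eliminates.
  }
}
```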
## Implementation

This PR implements a similar optimization. I introduce a new class `LazyWithComputeFunc(@volatile private[this] var computeFunc: () => Val) extends Lazy`, which can be used in place of the anonymous classes and which discards the closure after evaluation.

The underlying implementation is slightly tricky because of a few performance considerations:

- It's really important that we optimize performance for the case where a value is computed once and read many times. Commit d26e9db took pains to ensure that `force` was monomorphic, so I couldn't change that.
- Similarly, I don't want to introduce any locking or synchronization, nor any volatile accesses, on hot paths.
- Finally, I must ensure that the code is thread-safe: it's acceptable to redundantly compute a value under concurrent initialization, but we can't NPE or otherwise crash.

Here, I have chosen to make `computeFunc` a volatile field and check it inside of `compute()`. In ordinary cases, `compute()` will only be called once because `force` checks whether `cached` has already been computed. In the rare case of concurrent calls to `compute()`, we first check whether `computeFunc` has been nulled out: if it is null, then some other thread computed and cached a value (assigned from within `compute()` itself), and that other thread's write is guaranteed to be visible to the race-condition-losing thread because the volatile read of `computeFunc` provides piggybacked visibility of writes from the other racing thread (see https://stackoverflow.com/a/8769692). A sketch of this scheme appears at the end of this description.

## Testing

This passes existing unit tests. I did some manual heap dump tests to validate that this cuts memory usage on small toy examples. I have not yet run this end-to-end on the real workload which generated the original heap dumps.
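For illustration, here is a self-contained sketch of the scheme described in the Implementation section. It follows the class signature quoted above but substitutes `String` for `Val` so it compiles on its own; the merged code may differ in details:

```scala
// Same simplified Lazy as in the earlier sketch: `force` stays monomorphic
// and only reads the plain (non-volatile) `cached` field on the hot path.
abstract class Lazy {
  protected[this] var cached: String = null
  def compute(): String
  final def force: String = {
    if (cached == null) cached = compute()
    cached
  }
}

// Holds the compute closure in a volatile field and nulls it out after the
// first successful evaluation, so everything the closure captured (e.g. the
// ValScope bindings) becomes eligible for garbage collection.
final class LazyWithComputeFunc(@volatile private[this] var computeFunc: () => String)
    extends Lazy {
  def compute(): String = {
    val f = computeFunc // volatile read
    if (f != null) {
      // Common case: we are the first (or only) thread to compute the value.
      val result = f()
      cached = result      // plain write, published by the volatile write below
      computeFunc = null   // discard the closure; volatile write publishes `cached`
      result
    } else {
      // Rare race: another thread already computed and cached the value.
      // The volatile read of `computeFunc` above guarantees that thread's
      // write to `cached` is visible here (piggybacked visibility).
      cached
    }
  }
}
```

On the hot read path nothing changes: `force` still performs a single plain field read, and the volatile access only happens inside `compute()`, which in practice runs at most once per instance.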