-
Notifications
You must be signed in to change notification settings - Fork 55
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Several performance optimizations, primarily aimed at reducing garbage object creation in common cases #258
Several performance optimizations, primarily aimed at reducing garbage object creation in common cases #258
Conversation
val valueCache = if (sup == null) { | ||
Val.Obj.getEmptyValueCacheForObjWithoutSuper(builder.size()) | ||
} else { | ||
new java.util.HashMap[Any, Val]() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
better with a size
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See the PR description and the Scaladoc of getEmptyValueCacheForObjWithoutSuper
for more details, but this is deliberate:
Because this object has a super object, computing an upper bound on the field count is not free. In this PR I'm aiming to be conservative and only down-size in cases where we have a cheap tight bound.
Also, an object might have a large number of fields but many of them might not end up being computed in a particular evaluation or materialization. In those cases, we want to avoid sizing the map larger than the previous default.
val rValue = if (r.containsVisibleKey(key)) r.valueRaw(key, r, pos)(ev) else null | ||
if (rValue == null || !rValue.isInstanceOf[Val.Null]) { | ||
if (lValue != null && rValue == null) { | ||
kvs(kvsIdx) = (key, new Val.Obj.ConstMember(false, Visibility.Normal, lValue)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change to use ConstMember
for the case where we have an already-computed value which doesn't need cleaning (which must be done lazily) ends up reducing allocations because we don't need to generate a lambda / thunk and capture state from the closure.
I'm considering adding a branch in the createMember(recSingle(rValue))
path (both here and in recSingle
) to do a similar optimization for leaf non-Obj values that come in from the RHS.
|
||
case _ => v | ||
} | ||
def distinctKeys(lKeys: Array[String], rKeys: Array[String]): Array[String] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I plan to add code comments here and throughout the PR before it merges, BTW. I'll revisit this with fresh eyes tomorrow.
@@ -221,37 +244,65 @@ object Val{ | |||
allKeys | |||
} | |||
|
|||
@inline def hasKeys = !getAllKeys.isEmpty | |||
@inline def hasKeys: Boolean = { | |||
val m = if (static || `super` != null) getAllKeys else value0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Notably, I don't call getValue0
here or in the other new branches added below: it turns out that value0
is guaranteed to be defined for non-static objects, so getValue0
is only effective for static objects.
When I started working on sjsonnet
, I found the invariants on some of the Obj
internals to be somewhat confusing. I'm considering adding a big block comment at the top of this class to document my current understanding. I think we especially need documentation for the invariants / constraints on static objects.
As a performance optimization, is there any jmh number can share, thanks, I'm planing to learn some jsonnet today. |
This PR bundles together several small optimizations, most aimed at reducing garbage object creation. Collectively, these changes result in a large performance improvement for some of our largest jsonnet inputs.
Optimizations
These are somewhat coherently split into multiple smaller intermediate commits, which can help for navigating this change. I'll describe each optimization in more detail below.
Optimizing
Obj
key lookup methods for objects withoutsuper
The
hasKeys
,containsKey
,containsVisibleKey
,allKeyNames
, andvisibleKeyNames
methods can be optimized in the common case of objects that don't havesuper
objects.We already perform an optimization for
static
objects, pre-populatingallKeys
, but for non-static objects we had to rungatherAllKeys
to populate aLinkedHashMap
of keys and a boolean indicating visibility.If a non-static object has no
super
then we can just usevalue0
to compute the keys: this avoids an additionalLinkedHashMap
allocation and also lets us pre-size the resulting string arrays, avoiding wasteful array copies from resizes or trims.In
visibleKeyNames
, I chose to pre-size the output builder based on the total key count: this is based a common-case assumption that most objects don't have hidden keys.This optimization makes a huge difference in
std.foldl(std.megePatch, listOfPatches, {})
, where the intermediate merge targets' visible are repeatedly recomputed. In these cases, the intermediate objects contain only visible keys, allowing this optimization to maximally avoid unnecessary array allocations.Pre-size various hash maps
This builds on an idea from #197 : there are multiple places where we construct hashmaps that are either over- or under-sized: an over-sized map wastes space (I saw >90% of backing array slots wasted in some heap dumps) and an under-sized map wastes time and space in re-sizing upwards during construction.
Here, I've generalized that PR's pre-sizing to apply in more contexts.
One notable special case is the
valueCache
: if an object inherits fields then it's not free to determine the map size. As a result, I've only special-sized this forsuper
-free objects. This map is a little bit different fromvalue0
orallFields
because its final size is a function of whether or not object field values are actually computed. Given this, I've chosen to start pretty conservatively by avoiding changing the size in cases where it's not an obvious win; I may revisit this further in a future followup.Change
valueCache
from a Scala map to a Java mapThis was originally necessary because the Scala 2.12 version of
mutable.HashMap
does not support capacity / load factor configuration, which got in the way with the pre-sizing work described above.But a nice secondary benefit is that Java maps let me avoid closure / anonfun allocation in
map.getOrElse(k, default)
calls: even if we don't invokedefault
, we still end up doing some allocations for the lambda / closure / thunk. I had noticed this overhead previously inObj.value
and this optimization should fix it.Remove most Scala sugar in
std.mergePatch
, plus other optimizationsThe
recMerge
andrecSingle
methods used bystd.mergePatch
contained big Scalafor
comprehensions and usedOption
for handling nulls. This improves readability but comes at a surprising performance cost. I would have naively assumed that most of those overheads would have been optimized out but empirically this was not the case in my benchmarks.Here, I've rewritten this with Java-style imperative
while
loops and explicit null checks.Optimize
std.mergePatch
's distinct key computationAfter fixing other bottlenecks, I noticed that the
step in
std.mergePatch
was very expensive. Under the hood, this constructs a combined array, allocates an ArrayBuilder, and uses an intermediate HashSet for detecting already-seen keys.Here, I've added an optimized fast implementation for the cases where
r.visibleKeyNames.length < 8
. I think it's much more common for the LHS of a merge to be large and the RHS to be small, in which case we're conceptually better off by building a hash set on the RHS and removing RHS elements as they're seen during the LHS traversal. But if the RHS is small enough then the cost of hashing and probing will be higher than a simple linear scan of a small RHS array.Here,
8
is a somewhat arbitrarily chosen threshold based on some local microbenchmarking.Special overload of
Val.Obj.mk
to skip an array copyPretty self-explanatory: we often have an
Array[(String, Val.Member)]
and we can avoid a copy by defining aVal.Obj.mk
overload which accepts the array directly.Make
PrettyNamed
implicits into constantsThis is pretty straightforward, just changing a
def
to aval
, but it makes a huge different in reducing ephemeral garabge in some parts of the evaluator.Other changes
I also added
Error.fail
calls in a couple ofcase _ =>
matches which should never be hit. We weren't actually hitting these, but it felt like a potentially dangerous pitfall to silently ignore those cases.