Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add A/B comparison support #445

Open
joaospinto opened this issue Nov 2, 2023 · 8 comments
Open

Add A/B comparison support #445

joaospinto opened this issue Nov 2, 2023 · 8 comments

Comments

@joaospinto
Copy link

Hi! It would be nice to have an A/B mode for speedscope.

Let me clarify what I mean by this. Suppose I have a program, and gather its initial profiling data. This would be the "A" output. Then I go and make a bunch of changes, and gather some new profiling data. This would be the "B" output. Then I'd want to compare what changed between A and B.

This is a very common use case, and I wonder what your thoughts are about adding some native functionality for this.

Thanks!

@zacharyfmarion
Copy link
Contributor

+1 I have also thought about this - maybe to start with a "compare" tab that allows you to upload two profiles - then shows a table where you can sort by total / self time +/-. A fancier version would be to render a profile diff like this to show which functions got cheaper or more expensive. @jlfwong how opposed are you to large UI additions to the website? Not sure what your philosophy is around keeping the app minimal

@jlfwong
Copy link
Owner

jlfwong commented Dec 22, 2023

I'm not opposed to large UI additions, with a few caveats...

  1. I'd definitely want to hash out details before going to a PR. For something of this scope, I'd want to start with both UI mocks and an eng spec. This is something on the order of a full-blown project, and I'd want a place to discuss product and eng tradeoffs before this turns into a big PR (it would almost definitely be better as a series of small PRs)
  2. As you might've noticed, the time I put into speedscope fluctuates a lot, and there are periods where I don't spend too much time focused on it. For more isolated changes to things that are purely import-based, it's easier for me to review quickly because it's easy to think through the scope of implications for a change. For something with wide-reaching implications like this, I'd expect you'd need to bear with me for much longer.

Of course even if I drag my feet, if you have a version that works, you always have the option of self-hosting a version somewhere for people to play with :)

@zacharyfmarion
Copy link
Contributor

zacharyfmarion commented Dec 31, 2023

@joaospinto @jlfwong I took some time to look into this in the past week - have a working self-hosted version here if either of you want to play around with it.

It would definitely would need some cleanup and iteration like you describe above. I'm going to collect some feedback from some engineers I work with and then maybe can start a spec + work out the UI details with you. Here are some screenshots of how it currently works:

  1. New Compare tab
Screenshot 2023-12-31 at 11 01 10 AM
  1. Table view showing frame differences
Screenshot 2023-12-31 at 11 01 37 AM
  1. Stacked call stack flamegraphs to compare function calls
Screenshot 2023-12-31 at 1 45 18 PM

Initial thoughts:

  • It's hard to be totally accurate matching before + after frames if line + col numbers are in the frame key, since if you change a function and want to compare the before and after, the keys will not match. We could potentially introduce a "compareKey" in addition to the normal key that is basically the following:
function getCompareKey(frame: Frame) {
  if (['anonymous', '(unnamed)'].includes(frame.name)) {
    return `${frame.name}:${frame.file}:${frame.line}:${frame.col}`
  }

  // Assume that function names are unique per file if not anonymous
  return `${frame.name}:${frame.file}`
}
  • Would require refactoring some global state because of some assumptions that no longer hold up. In my current code I have just introduced a new atom called compareProfileGroupAtom but there might be a more elegant approach.
  • Definitely lots of error handling / edge cases to think through. Off the top of my head: erroring if there are no matching frames between profiles, erroring if there is a significant difference in total time between profiles (maybe allow selecting a slice of a profile to dial in on a similar execution path), ensuring that we are properly aggregating frames so we don't say that a function got faster but there is another frame for the same function where it got slower, etc.
  • Supporting multi-profile viewing might be a natural extension of this if the code is already being refactored. Probably not something I would want to scope in though if I implemented this.

@jlfwong maybe you could play around with the UI and we could start discussing the UX? The code itself is probably not worth looking at at the moment. Here are some example profiles that I have been using to test.

@zacharyfmarion
Copy link
Contributor

An alternative would be to have some kind of global toggle to go between "single viewing mode" and "compare mode", and re-using the existing tabs. I did some explorations into red/green flamegraph rendering to show improvements / regressions, but the issue is that if you are looking at a non-aggregated timeseries view of a flamegraph, you really want to know if a specific function call got slower or faster, not if the total time spent in that function got slower or faster. That requires an algorithm for diffing two n-ary trees, which is hard and made my head hurt. It should theoretically be possible though. Here is an example flamegraph diff just matching on frames (showing self time change):

Screenshot 2023-12-31 at 6 19 52 PM

There is also a lot more UI complexity introduced by these kinds of graphs, for example:

  • Should we allow a user to specify a minimum threshold in ms for what is considered a change that matters?
  • Toggle between showing self change vs total change? Or just always show self change?

There is definitely a world in which frame-based flamegraph diffing is good enough (it does give useful signal) even though in certain cases it will mark something as green which actually is lower or red which actually is faster. Kind of depends on how fast and loose we want to fly with correctness / interpretability.

@jlfwong maybe we can discuss a bit here first just to get your initial thoughts and then I can flesh out an approach? Definitely going to have less time once I head back to work but would love to work with you on it if you think it's valuable. Also no rush responding, happy new year 🥳

@zacharyfmarion
Copy link
Contributor

zacharyfmarion commented Jan 1, 2024

Here is a draft of the motivation to clarify exactly why I think this would be valuable:


Often when we are doing performance optimization work, we are iteratively improving a certain part of a performance trace. This process looks like:

  1. Take one trace, see what is slow
  2. Attempt to fix
  3. Take another trace
  4. Open two speedscope windows and switch between tabs to see whether the certain area of the profile we attempted to improve actually got faster

Generally I use the flamechart to get a view of what is slow, and then the sandwich view to see exactly what the total or self time difference was for a given function. This process is suboptimal as you only see changes if they are visually obvious in the flamegraph, or if you know what you’re looking for.

This means that you could very well introduce an optimization that improves the thing you are targeting, but slows something else down. Without manually checking every part of the profile to see what got slower or faster, it’s hard to make sure that you aren’t introducing a regression. Additionally, it’s just not as convenient as having a view that tells you exactly what got slower or faster between profiles.

Ideally we could make this process simpler by integrating a view into the speedscope UI that natively allows for comparing two profiles. This would reduce the burden of switching between tabs and also surface exactly what got slower or faster, instead of just what the user visually notices.


I think there is a counterargument that says "it's not that hard just to look" and that this is kind of complicated and there are a lot of ways that incorrect / misleading information could be surfaced, so it's not worth it. From my perspective the compare tab showing the table view is a good middle ground, but open to discussion!

@cb-bradbeebe
Copy link

cb-bradbeebe commented Mar 27, 2024

@jlfwong What do you think about the proposal above? I've been playing around with its compare functionality and it has come in very handy.

@iogbole
Copy link

iogbole commented Aug 23, 2024

@jlfwong - Would you consider accepting this proposal?

@liangGTY
Copy link

Hi, I wanted to check in about the progress of the A/B comparison feature for Speedscope that was discussed earlier by @zacharyfmarion and @jlfwong .

I saw that there has been some work on a self-hosted version with a "Compare" tab and flamegraph diffing for profiling data. Is there any update on when this feature might be available in the main Speedscope project? Are there any blockers or plans for additional UI iterations?

I would also be interested in helping test or provide feedback on the feature if it's ready for review.

Looking forward to hearing more about it!

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants