-
Notifications
You must be signed in to change notification settings - Fork 370
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Maven transitive scanning is slow due to the new registry support #1447
Comments
Hmm that does line up with a change I made to the Maven resolver google/deps.dev@b3ba033 Though what benchmarks I did run on it found it should've been faster. |
I profiled again before the v1.9.0 downgrade and after, and didn't see the maven code as a hotspot in either. So it's possible we were processing a lot of Java repositories at the time. However after deploying the new revision to GKE (where the only change was the downgrade), our CPU resources dropped significantly (from ~5.0 CPU to ~1.0 CPU) Our p99 latencies for the scan also went from occasional spikes of 4000 seconds back to ~50 seconds. I'll see if I can isolate a problematic repo |
@spencerschrock thank you for your updates! I suspect #1286 might be the problematic change. I already have a draft PR which should reduce the number of Maven repositories we fetch. |
One example is github.com/SpringCloud/spring-cloud-gray which took 3-4 minutes to scan on v1.9.0 and on v1.9.1 I gave up after 15 minutes Similarly github.com/SpringCloud/spring-cloud-dubbo |
|
It is a known issue that Maven resolution is slow due to the new registry support, and I am working on that. I will change the title of this issue to reflect the root cause. |
That change also added the |
#1447 When fetching requirements of a Maven dependency, we should not keep track of the registries defined in pom.xml of dependencies. Doing that will add a lot irrelevant registries to the client and sends unnecessary requests, and finally slows down the resolution. This PR also disables `PreFetch` in Maven transitive scanning. `PreFetch` calls `MatchVersions` which sends requests to `maven-metadata.xml` for native Maven client. `maven-metadata.xml` is not necessarily needed for all dependencies. Disabling `PreFetch` will save us from making requests to these files. However, calling `PreFetch` should be fine for deps.dev client, and this can be a TODO in the future. With this fix, resolving [pom.xml](https://github.com/google/osv-scanner/blob/main/internal/remediation/fixtures/zeppelin-server/pom.xml) is now shortened to 10-20 seconds.
Scorecard's public scanning infrastructure has been having some runtime issues since upgrading to v1.9.1. I'm hoping downgrade in our deployment to confirm the issue goes away, but it lines up with when we merged the dependabot PR.
pprof
profiling shows a lot of time indeps.dev/util/semver
, specifically around the Maven dependencies.After chatting with the deps.dev team, it's know for
deps.dev/util/semver
to be a slow operation. I don't think its usage is new between v1.9.0 and v1.9.1, but perhaps it's used in a new way?v1.9.0...v1.9.1
The text was updated successfully, but these errors were encountered: