Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Type-Api support and validation speedup #218

Merged
merged 56 commits into from
Jan 24, 2025
Merged

Conversation

Pfeil
Copy link
Member

@Pfeil Pfeil commented Aug 28, 2024

I replaced the guava cache with an async parallel (no real async in java) cache with higher performance. Some tests do not succeed yet, because the details field is missing in some exception bodies. Not sure why it happens, but it is probably simple to fix and enough to do some speedup experiments.

This PR will last until we have a stable and significant performance gain (at least down to 25% or something) and have integrated all low-hanging fruits.

This requires some refactorings in very old parts of the Typed PID Maker, where I want to get rid of a lot of code.

  • Whether we really achieve a speed gain here has yet to be tested.
    • We currently get a 2x to 3x improvements on my notebook on Wi-Fi. Which is quite bad considering I experimented with different thread pool sizes.
  • Fix tests
    • The issue is likely that the exceptions are wrapped into CompletionExceptions. Fix should be easy.
  • instead of waiting for the profile, start by validating each attribute's value independently. This should give some speedup and ensures we check additional properties in any case.
  • Instead of using the JSON schemas in dtr-test, evaluate if the type API (current deployment or its current version) can replace it for us.
  • If one type is a profile, also check for mandatory attributes,
    • and for "allowAdditionalAttributes" being set. not supported by EOSC DTR, so we currently do not support it per profile until a DTR with support for this comes up.
      • If no additionalAttributesAllowed value is available, assume it to be true in the schema. Typed PID Maker configuration should not prevent this. The configuration should only lift this restriction (allow even though not allowed), not restrict it further (forbit instead of allowing).
    • Allow to override allowAdditionalAttibutes to true by configuration file or API parameter. TODO double check it does only override to true, not to false!
  • Use type-api as well as the legacy schemas from dtr-test for validation. Make sure it can be more easily extended to further schema generators.
  • fix tests
  • fix test coverage
  • ensure we have good validation tests
  • check if I should rename some of the config properties internally (or document them better)
  • double check if we documented all properties in application-default.properties
  • merge/rebase main into this branch

Benchmarking:

  • For faster and statistically more reliable testing, add an option to disable the cache from application.properties and document it in the application.properties file. DONE: We already have it! Setting pit.typeregistry.cache.maxEntries = 0 should be enough for our case.
  • Set up some proper benchmarking suite to have comparable results. Use hurl 5 for that. Proposal:
    • We have some testing scripts for the docker container. Replace them with hurl
    • and make a new benchmark test.
    • Update the README with how to run the tests and what the benchmarks parameters are we'd like to use for continuous comparison.
  • Make some tests on a wired connection to figure out how we can achieve the best performance.
    • Document the results and commit number in the readme file.
  • Compare the performance of this branch to using old school executors (also on this branch) and prefer those if the results are ok, as it seems the Java ecosystem is weird on virtual threads (1), (2).
    • Test if the old school solution runs also with zulu in the CI. Keep temurin in the CI after the test, though.
    • Report in Teams about the results (and also document them here somewhere)
  • We can do some analysis, maybe with a flame graph (like here) or VisualVM.
  • I think we documented the long validation times here and there (see Potential timeout error #136, Swagger documentation texts, application.properties, webpage, etc). Depending on the results, we should adjust this documentation to shorter times.
  • After the Java 21 update (merged, we can start here!), we can try the new virtual thread executor
  • Lazy-Type-Loading: We do collect more information than we might need (all profile attributes, even though we may only need a few of them). We can improve here: Profiles mustn't know their subtypes, we just have to compare the PIDs and request them as required. This should result in minimal validation cost and will potentially distribute the need for cache refreshes onto several requests.
  • Profile-Preloading?: Define a set of PIDs which will regularly be updated (and on application startup). For example, search for profiles containing the setting which indicates which is the profile attribute, or just resolve all DTR profiles. Now that we have async and some proper approaches, this sounds more like a hack to me.
  • Try hooking into the http clients and log a warning if a request to an external service takes too long.
  • See if these logs pop up in our benchmarks when outliers are coming in. -> they do. The hickups are caused by the requests to external sources.
  • Try some benchmarks in parallel and see how it performs. -> no measurable difference

Issues to report in external repositories:

  • The type 21.T11148/7fdada5846281ef5d461 has a wrong schema being generated. The property name in the json is unexpectedly the name of the parent type. Make an issue at the type-api repo.
  • The type 21.T11969/d15381199a44a16dc88d is getting an invalid schema. The schema says it must be an object, but all oneOf options are string or number. There is already an issue for this type, add this example as a test and an indicator this is also happening on the new EOSC dtr, not only on dtr-test.

Make issues for changes which shall be done in extra PRs:

  • evaluate switching to https://github.com/networknt/json-schema-validator for json schema validation. Metastore uses it, and the current library is in maintenance mode (though, a fork exists).
  • remove implicit profile validation and only rely on attribute validation TODO: ask Thomas if it isn't fine already if we allow zero profiles, but have implicit checks if we detect a profile? Or is this too unpredictable? Could be a configuration option (validation strategy).
  • do profile validation only explicitly, e.g. create?dryrun=true&profile=a&profile=b. Note that it must work and be well tested together with the other parameters, like dryrun and validate.

Finishing tasks:

  • Think about breaking changes (look at the tests that needed to be changed)
    • Errors for validation may occur in different order than before.
    • The new validation may be less restrictive (additionalAttributes assumed false by default previously, now is true). Before: No additional attributes was assumed.
    • Having no profile is ok, does not throw validation error. This is to support more use cases of other projects. Before:, at least one profile was required.
  • merge into dev 3.0.0 branch

@Pfeil Pfeil force-pushed the validation-speedup-experiments branch 7 times, most recently from 6bb28b6 to aad4408 Compare August 28, 2024 23:20
@Pfeil Pfeil force-pushed the validation-speedup-experiments branch from aad4408 to efb4bf5 Compare August 29, 2024 12:53
@coveralls
Copy link

coveralls commented Aug 29, 2024

Pull Request Test Coverage Report for Build #452

Details

  • 290 of 352 (82.39%) changed or added relevant lines in 18 files are covered.
  • 3 unchanged lines in 2 files lost coverage.
  • Overall coverage increased (+3.8%) to 76.246%

Changes Missing Coverage Covered Lines Changed/Added Lines %
src/main/java/edu/kit/datamanager/pit/cli/CliTaskBootstrap.java 0 1 0.0%
src/main/java/edu/kit/datamanager/pit/pidsystem/impl/HandleProtocolAdapter.java 2 3 66.67%
src/main/java/edu/kit/datamanager/pit/typeregistry/schema/SchemaInfo.java 1 3 33.33%
src/main/java/edu/kit/datamanager/pit/pitservice/impl/TypingService.java 5 10 50.0%
src/main/java/edu/kit/datamanager/pit/pitservice/impl/EmbeddedStrictValidatorStrategy.java 39 45 86.67%
src/main/java/edu/kit/datamanager/pit/typeregistry/RegisteredProfile.java 16 22 72.73%
src/main/java/edu/kit/datamanager/pit/typeregistry/schema/DtrTestSchemaGenerator.java 34 41 82.93%
src/main/java/edu/kit/datamanager/pit/typeregistry/impl/TypeApi.java 97 106 91.51%
src/main/java/edu/kit/datamanager/pit/domain/Operations.java 18 30 60.0%
src/main/java/edu/kit/datamanager/pit/typeregistry/schema/TypeApiSchemaGenerator.java 31 44 70.45%
Files with Coverage Reduction New Missed Lines %
src/main/java/edu/kit/datamanager/pit/pidsystem/impl/HandleProtocolAdapter.java 1 32.43%
src/main/java/edu/kit/datamanager/pit/pitservice/impl/EmbeddedStrictValidatorStrategy.java 2 84.31%
Totals Coverage Status
Change from base Build #441: 3.8%
Covered Lines: 918
Relevant Lines: 1204

💛 - Coveralls

@Pfeil Pfeil force-pushed the validation-speedup-experiments branch 3 times, most recently from a38cdee to b5fba64 Compare August 29, 2024 15:02
@Pfeil Pfeil force-pushed the validation-speedup-experiments branch from b5fba64 to f976bdd Compare August 29, 2024 15:30
@Pfeil Pfeil added the maintenance Not a bug, but should be done. label Oct 11, 2024
@Pfeil

This comment was marked as resolved.

@Pfeil Pfeil self-assigned this Nov 8, 2024
@Pfeil Pfeil force-pushed the validation-speedup-experiments branch from 6e64ea4 to d4317c8 Compare November 16, 2024 00:49
@Pfeil Pfeil changed the title Validation speedup experiments Type-Api support and validation speedup Nov 16, 2024
@Pfeil Pfeil force-pushed the validation-speedup-experiments branch 3 times, most recently from 4b57404 to 6b07c6f Compare November 19, 2024 19:53
- support for records without profiles
- support for records with multiple profiles
- support for multiple profile attribute keys/types
- support for additional attributes
- in general, attribute validation and profile validation are now separate tasks
@Pfeil Pfeil mentioned this pull request Jan 13, 2025
@Pfeil Pfeil linked an issue Jan 13, 2025 that may be closed by this pull request
@Pfeil Pfeil force-pushed the validation-speedup-experiments branch from 991e6ae to 0aada09 Compare January 21, 2025 23:55
@Pfeil
Copy link
Member Author

Pfeil commented Jan 22, 2025

Benchmark results

These are the first observations from the benchmark results:

  • Both, the new-virtual-thread-per-task method and the new-thread-per-task method achieve similar results of ~250ms per validation (average of 1000 runs) or even lower (e.g. 160ms). I assume the difference in average was due to network connectivity to handle.net, dtr-test or the typeapi.lab.
  • There are outliers:
    • The first validation takes longer (usually ~780ms).
      • It looks like an active cache, but it should be disabled. The size is set to 0. Some spring initialization maybe?
      • Setting the Cache to a size of 1000, the average on 100 runs is 5ms. The issue of this number is definitely not the caching, it is rather some kind of initialization. The number of the first request is in this case slightly higher, so it may be some cache initialization.
      • New theory: The HTTP clients start a session in the first call and do not close it, resulting in faster subsequent requests. BUT: Disabling this would falsify the validation times also, because we usually do send multiple requests to each service within one validation task. We'd have to add code to close it after a request, but it does not really make sense. I am fine with this finding.
    • Sometimes there are outliers of ~1s or ~6s, but sometimes also ~20s or even more. Assumption are hickups in external services, but this is yet to be tested.

The last assumption seems to hold. Some logging excerpts from the benchmarks (note: the logging is only done if a request takes more than 400ms, which is quite tolerant):

2025-01-23T14:23:52.985Z  WARN 23 --- [-4-thread-49045] e.k.d.pit.typeregistry.impl.TypeApi      : Long http request to https://typeapi.lab.pidconsortium.net/v1/types/21.T11148/b8457812905b83046284 (1032)
2025-01-23T14:23:59.571Z  WARN 23 --- [-2-thread-28708] e.k.d.p.t.schema.TypeApiSchemaGenerator  : Long http request to https://typeapi.lab.pidconsortium.net/v1/types/schema/21.T11148/aafd5fb4c7222e2d950a (5184)
2025-01-23 15:36:48 2025-01-23T14:36:48.229Z  WARN 23 --- [-2-thread-21609] e.k.d.p.t.schema.DtrTestSchemaGenerator  : Long http request to https://hdl.handle.net/21.T11148%2F076759916209e5d62bd5 (553ms)
2025-01-23 15:36:48 2025-01-23T14:36:48.234Z  WARN 23 --- [-2-thread-21608] e.k.d.p.t.schema.DtrTestSchemaGenerator  : Long http request to https://hdl.handle.net/21.T11148%2F1a73af9e7ae00182733b (558ms)
2025-01-23 15:36:48 2025-01-23T14:36:48.240Z  WARN 23 --- [-2-thread-21599] e.k.d.p.t.schema.DtrTestSchemaGenerator  : Long http request to https://hdl.handle.net/21.T11148%2F82e2503c49209e987740 (567ms)
2025-01-23 15:36:48 2025-01-23T14:36:48.241Z  WARN 23 --- [-2-thread-21597] e.k.d.p.t.schema.DtrTestSchemaGenerator  : Long http request to https://hdl.handle.net/21.T11969%2Fa00985b98dac27bd32f8 (570ms)
2025-01-23 15:36:48 2025-01-23T14:36:48.245Z  WARN 23 --- [-2-thread-21606] e.k.d.p.t.schema.DtrTestSchemaGenerator  : Long http request to https://hdl.handle.net/21.T11148%2F1c699a5d1b4ad3ba4956 (569ms)
2025-01-23 15:36:48 2025-01-23T14:36:48.248Z  WARN 23 --- [-2-thread-21612] e.k.d.p.t.schema.DtrTestSchemaGenerator  : Long http request to https://hdl.handle.net/21.T11148%2F6ae999552a0d2dca14d6 (571ms)
2025-01-23 15:36:48 2025-01-23T14:36:48.249Z  WARN 23 --- [-2-thread-21613] e.k.d.p.t.schema.DtrTestSchemaGenerator  : Long http request to https://hdl.handle.net/21.T11148%2F2f314c8fe5fb6a0063a8 (572ms)
2025-01-23 15:36:48 2025-01-23T14:36:48.250Z  WARN 23 --- [-2-thread-21605] e.k.d.p.t.schema.DtrTestSchemaGenerator  : Long http request to https://hdl.handle.net/21.T11148%2F7fdada5846281ef5d461 (575ms)
2025-01-23 15:36:48 2025-01-23T14:36:48.253Z  WARN 23 --- [-2-thread-21603] e.k.d.p.t.schema.DtrTestSchemaGenerator  : Long http request to https://hdl.handle.net/21.T11148%2Fb8457812905b83046284 (579ms)
2025-01-23 15:36:48 2025-01-23T14:36:48.255Z  WARN 23 --- [-2-thread-21601] e.k.d.p.t.schema.DtrTestSchemaGenerator  : Long http request to https://hdl.handle.net/21.T11148%2Fa753134738da82809fc1 (582ms)

@Pfeil Pfeil marked this pull request as ready for review January 23, 2025 17:13
@Pfeil Pfeil changed the base branch from master to dev-v3 January 24, 2025 13:38
@Pfeil Pfeil merged commit 471a5a3 into dev-v3 Jan 24, 2025
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
maintenance Not a bug, but should be done.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Potential timeout error Enable using the Type API instead of Cordra to get a profiles JSON Schema.
2 participants