Type-Api support and validation speedup #218

Pfeil · 2024-08-28T11:57:54Z

I replaced the Guava cache with a higher-performance ~~async~~ parallel cache (there is no real async in Java). ~~Some tests do not pass yet because the details field is missing in some exception bodies. I am not sure why this happens, but it is probably simple to fix, and the current state is good enough for some speedup experiments.~~
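The PR text does not show the new cache itself. The following stdlib-only sketch illustrates what a "parallel" (future-based) cache means here. Each key maps to a `CompletableFuture` that is loaded on an executor, for example `ForkJoinPool.commonPool()`, which is a work-stealing pool. The class name `ParallelLoadingCache` is hypothetical, and this is not the PR's implementation. Libraries such as Caffeine provide an `AsyncLoadingCache` that adds eviction and size limits on top of the same idea.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.Executor;
import java.util.function.Function;

/**
 * Hypothetical sketch of a "parallel" loading cache (not the PR's code).
 * Each key is loaded at most once on the given executor; concurrent callers
 * for the same key share the same CompletableFuture. There is no eviction,
 * and a failed load stays cached.
 */
class ParallelLoadingCache<K, V> {
    private final ConcurrentMap<K, CompletableFuture<V>> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader;
    private final Executor executor;

    ParallelLoadingCache(Function<K, V> loader, Executor executor) {
        this.loader = loader;
        this.executor = executor;
    }

    /** Returns immediately; the value is loaded in the background on first access. */
    CompletableFuture<V> get(K key) {
        // computeIfAbsent is atomic per key, so at most one load is started per key.
        return entries.computeIfAbsent(
                key, k -> CompletableFuture.supplyAsync(() -> loader.apply(k), executor));
    }
}
```

Callers can start lookups for all attribute types of a record first and join them afterwards. This lets the slow registry requests behind the loader run in parallel instead of one after another.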

This PR will stay open until we have a stable and significant performance gain (validation time down to roughly 25% of the current time or less) and have integrated all low-hanging fruit.

This requires some refactoring in very old parts of the Typed PID Maker, where I want to remove a lot of code.

Benchmarking:

Issues to report in external repositories:

The type 21.T11148/7fdada5846281ef5d461 gets a wrong schema generated: the property name in the JSON is unexpectedly the name of the parent type. Open an issue in the type-api repository.
The type 21.T11969/d15381199a44a16dc88d gets an invalid schema: the schema requires an object, but all oneOf options are strings or numbers. There is already an issue for this type; add this example as a test case and as an indication that the problem also occurs on the new EOSC DTR, not only on dtr-test.

Open issues for changes that should be done in separate PRs:

Evaluate switching to https://github.com/networknt/json-schema-validator for JSON schema validation. Metastore uses it, and the current library is in maintenance mode (though a fork exists).
Remove implicit profile validation and rely only on attribute validation. TODO: ask Thomas whether the current approach is already fine, i.e. allowing zero profiles but running implicit checks when a profile is detected. Or is this too unpredictable? This could be a configuration option (validation strategy).
Do profile validation only explicitly, e.g. create?dryrun=true&profile=a&profile=b. Note that this must work, and be well tested, together with the other parameters such as dryrun and validate.
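The planned explicit profile selection relies on repeated `profile` query parameters next to flags such as `dryrun`. In a Spring controller, repeated parameters can be bound directly with `@RequestParam` to a `List<String>`. The stdlib parser below only makes the intended semantics concrete: profiles are collected in order, and PIDs may arrive URL-encoded. The class `CreateRequestOptions` is hypothetical, and other parameters such as `validate` are left out.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical sketch: parses a query such as "dryrun=true&profile=a&profile=b"
 * into explicit request options. Repeated "profile" parameters are collected in order.
 */
final class CreateRequestOptions {
    final boolean dryrun;
    final List<String> profiles;

    private CreateRequestOptions(boolean dryrun, List<String> profiles) {
        this.dryrun = dryrun;
        this.profiles = Collections.unmodifiableList(profiles);
    }

    static CreateRequestOptions parse(String query) {
        boolean dryrun = false;
        List<String> profiles = new ArrayList<>();
        if (query != null) {
            for (String pair : query.split("&")) {
                if (pair.isEmpty()) {
                    continue;
                }
                int eq = pair.indexOf('=');
                String name = decode(eq < 0 ? pair : pair.substring(0, eq));
                String value = eq < 0 ? "" : decode(pair.substring(eq + 1));
                if (name.equals("dryrun")) {
                    dryrun = Boolean.parseBoolean(value);
                } else if (name.equals("profile") && !value.isEmpty()) {
                    profiles.add(value);
                }
                // Other parameters (e.g. "validate") would be handled here as well.
            }
        }
        return new CreateRequestOptions(dryrun, profiles);
    }

    private static String decode(String s) {
        // Profile PIDs contain '/', which clients may send URL-encoded as %2F.
        return URLDecoder.decode(s, StandardCharsets.UTF_8);
    }
}
```

An empty profile list then simply means "no explicit profile validation requested". This matches the idea of relying only on attribute validation unless profiles are given.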

Finishing tasks:

Think about breaking changes (look at the tests that needed to be changed):
- Validation errors may occur in a different order than before.
- The new validation may be less restrictive: previously, additionalAttributes was assumed to be false by default (no additional attributes allowed); now it defaults to true.
- Having no profile is OK and does not cause a validation error. This supports more use cases of other projects. Before, at least one profile was required.
Merge into the dev 3.0.0 branch.
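The behavior changes above can be illustrated with a small model. They go together with commit 5ee3e4a in this PR, which makes attribute validation and profile validation separate tasks. Everything in the sketch is hypothetical: the classes, the key-based profile model, and the use of a `Predicate` per attribute type. It only shows the ordering of the two steps and that zero profiles and additional attributes are accepted unless a profile disallows them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical profile model: required keys, optional keys, additional-attributes flag. */
final class ProfileSketch {
    final String pid;
    final Set<String> requiredKeys;
    final Set<String> optionalKeys;
    final boolean additionalAttributes;

    ProfileSketch(String pid, Set<String> requiredKeys, Set<String> optionalKeys, boolean additionalAttributes) {
        this.pid = pid;
        this.requiredKeys = requiredKeys;
        this.optionalKeys = optionalKeys;
        this.additionalAttributes = additionalAttributes;
    }
}

final class RecordValidatorSketch {
    /**
     * Step 1 checks every attribute value against its type, independent of profiles.
     * Step 2 checks each given profile; an empty profile list is valid.
     */
    static List<String> validate(Map<String, List<String>> pidRecord,
                                 Map<String, Predicate<String>> attributeTypes,
                                 List<ProfileSketch> profiles) {
        List<String> errors = new ArrayList<>();
        for (Map.Entry<String, List<String>> attribute : pidRecord.entrySet()) {
            Predicate<String> type = attributeTypes.get(attribute.getKey());
            if (type == null) {
                errors.add("Unknown attribute type: " + attribute.getKey());
                continue;
            }
            for (String value : attribute.getValue()) {
                if (!type.test(value)) {
                    errors.add("Invalid value for " + attribute.getKey() + ": " + value);
                }
            }
        }
        for (ProfileSketch profile : profiles) {
            for (String key : profile.requiredKeys) {
                if (!pidRecord.containsKey(key)) {
                    errors.add("Profile " + profile.pid + " requires " + key);
                }
            }
            // Additional attributes are only rejected if the profile explicitly disallows them.
            if (!profile.additionalAttributes) {
                for (String key : pidRecord.keySet()) {
                    if (!profile.requiredKeys.contains(key) && !profile.optionalKeys.contains(key)) {
                        errors.add("Profile " + profile.pid + " does not allow " + key);
                    }
                }
            }
        }
        return errors;
    }
}
```

Because the errors of the two steps come from independent loops, their order can differ from a validator that checks everything per profile. This is one plausible source of the "different order" change listed above.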

coveralls · 2024-08-29T13:35:47Z

Pull Request Test Coverage Report for Build #452

Details

290 of 352 (82.39%) changed or added relevant lines in 18 files are covered.
3 unchanged lines in 2 files lost coverage.
Overall coverage increased (+3.8%) to 76.246%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
src/main/java/edu/kit/datamanager/pit/cli/CliTaskBootstrap.java	0	1	0.0%
src/main/java/edu/kit/datamanager/pit/pidsystem/impl/HandleProtocolAdapter.java	2	3	66.67%
src/main/java/edu/kit/datamanager/pit/typeregistry/schema/SchemaInfo.java	1	3	33.33%
src/main/java/edu/kit/datamanager/pit/pitservice/impl/TypingService.java	5	10	50.0%
src/main/java/edu/kit/datamanager/pit/pitservice/impl/EmbeddedStrictValidatorStrategy.java	39	45	86.67%
src/main/java/edu/kit/datamanager/pit/typeregistry/RegisteredProfile.java	16	22	72.73%
src/main/java/edu/kit/datamanager/pit/typeregistry/schema/DtrTestSchemaGenerator.java	34	41	82.93%
src/main/java/edu/kit/datamanager/pit/typeregistry/impl/TypeApi.java	97	106	91.51%
src/main/java/edu/kit/datamanager/pit/domain/Operations.java	18	30	60.0%
src/main/java/edu/kit/datamanager/pit/typeregistry/schema/TypeApiSchemaGenerator.java	31	44	70.45%

Files with Coverage Reduction	New Missed Lines	%
src/main/java/edu/kit/datamanager/pit/pidsystem/impl/HandleProtocolAdapter.java	1	32.43%
src/main/java/edu/kit/datamanager/pit/pitservice/impl/EmbeddedStrictValidatorStrategy.java	2	84.31%

Totals
Change from base Build #441:	3.8%
Covered Lines:	918
Relevant Lines:	1204

💛 - Coveralls


Pfeil · 2025-01-22T15:55:19Z

Benchmark results

These are the first observations from the benchmark results:

Both the new-virtual-thread-per-task method and the new-thread-per-task method achieve similar results of ~250ms per validation (average of 1000 runs), sometimes even lower (e.g. 160ms). I assume the difference in the averages was due to network connectivity to handle.net, dtr-test, or typeapi.lab.
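The two compared methods appear to correspond to Java 21's `Executors.newVirtualThreadPerTaskExecutor()` and `Executors.newThreadPerTaskExecutor(...)`. Later commits also make the executor type changeable in one line (90539c2). A minimal sketch of such a switch follows, with a helper that fans validation tasks out on the chosen executor. The enum, class and method names are hypothetical. To stay runnable on older JDKs, the sketch uses pre-Java-21 executors and mentions the virtual-thread factory only in a comment.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Hypothetical sketch: one constant selects the executor type used for validation tasks. */
final class ValidationExecutors {
    enum Kind { WORK_STEALING, CACHED_THREADS }

    // The single line to change when benchmarking another executor type.
    static final Kind KIND = Kind.CACHED_THREADS;

    static ExecutorService create(Kind kind) {
        switch (kind) {
            case WORK_STEALING:
                return Executors.newWorkStealingPool();
            case CACHED_THREADS:
            default:
                // Creates threads on demand and reuses idle ones. On Java 21+,
                // Executors.newVirtualThreadPerTaskExecutor() is a drop-in alternative.
                return Executors.newCachedThreadPool();
        }
    }

    /** Runs one task per input on the executor and waits for all results, keeping input order. */
    static <T, R> List<R> mapInParallel(List<T> inputs, Function<T, R> task, ExecutorService executor) {
        List<CompletableFuture<R>> futures = inputs.stream()
                .map(input -> CompletableFuture.supplyAsync(() -> task.apply(input), executor))
                .collect(Collectors.toList());
        return futures.stream().map(CompletableFuture::join).collect(Collectors.toList());
    }
}
```

Since validation time is dominated by network requests, the executor mostly decides how many requests can wait concurrently. This would be consistent with the different thread-per-task variants ending up with similar averages.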
There are outliers:
- The first validation takes longer (usually ~780ms).
  - It looks like an active cache, but the cache should be disabled (its size is set to 0). Maybe some Spring initialization?
  - With the cache size set to 1000, the average over 100 runs is 5ms. So the slow first request is definitely not caused by caching; it is rather some kind of initialization. In this configuration the first request is slightly slower still, so part of it may be cache initialization.
  - New theory: the HTTP clients start a session on the first call and do not close it, which makes subsequent requests faster. BUT: disabling this would also distort the validation times, because we usually send multiple requests to each service within one validation task. We would have to add code to close the session after each request, which does not really make sense. I am fine with this finding.
- Sometimes there are outliers of ~1s or ~6s, and occasionally ~20s or even more. My assumption is that these are hiccups in external services, but this is yet to be tested.
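Separating the first (warm-up) run from the rest makes the initialization effect described above visible in the numbers. The following is a minimal sketch of such a summary, assuming durations in milliseconds and a freely chosen outlier threshold. The class is hypothetical and not part of the dockerized benchmark added in commit 0aada09.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical summary of benchmark durations: first run, averages, outliers. */
final class BenchmarkSummary {
    final long firstMs;
    final double averageMs;
    final double averageWithoutFirstMs;
    final List<Long> outliersMs;

    private BenchmarkSummary(long firstMs, double averageMs, double averageWithoutFirstMs, List<Long> outliersMs) {
        this.firstMs = firstMs;
        this.averageMs = averageMs;
        this.averageWithoutFirstMs = averageWithoutFirstMs;
        this.outliersMs = outliersMs;
    }

    /** Times `runs` sequential executions of the task, in milliseconds. */
    static List<Long> measure(Runnable task, int runs) {
        List<Long> durations = new ArrayList<>();
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            durations.add((System.nanoTime() - start) / 1_000_000);
        }
        return durations;
    }

    /** Requires at least one duration; outliers are runs slower than the threshold. */
    static BenchmarkSummary of(List<Long> durationsMs, long outlierThresholdMs) {
        if (durationsMs.isEmpty()) {
            throw new IllegalArgumentException("no durations");
        }
        long sum = 0;
        List<Long> outliers = new ArrayList<>();
        for (long d : durationsMs) {
            sum += d;
            if (d > outlierThresholdMs) {
                outliers.add(d);
            }
        }
        long first = durationsMs.get(0);
        double average = (double) sum / durationsMs.size();
        // Without the first run, the average is not skewed by one-time initialization.
        double averageRest = durationsMs.size() > 1
                ? (double) (sum - first) / (durationsMs.size() - 1)
                : Double.NaN;
        return new BenchmarkSummary(first, average, averageRest, outliers);
    }
}
```

A single ~20s outlier shifts the plain average of 1000 runs by about 20ms. Reporting the outliers separately therefore keeps the typical validation time readable.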

The last assumption seems to hold. Here are some logging excerpts from the benchmarks (note: a request is only logged if it takes more than 400ms, which is quite tolerant):

2025-01-23T14:23:52.985Z  WARN 23 --- [-4-thread-49045] e.k.d.pit.typeregistry.impl.TypeApi      : Long http request to https://typeapi.lab.pidconsortium.net/v1/types/21.T11148/b8457812905b83046284 (1032)
2025-01-23T14:23:59.571Z  WARN 23 --- [-2-thread-28708] e.k.d.p.t.schema.TypeApiSchemaGenerator  : Long http request to https://typeapi.lab.pidconsortium.net/v1/types/schema/21.T11148/aafd5fb4c7222e2d950a (5184)

2025-01-23 15:36:48 2025-01-23T14:36:48.229Z  WARN 23 --- [-2-thread-21609] e.k.d.p.t.schema.DtrTestSchemaGenerator  : Long http request to https://hdl.handle.net/21.T11148%2F076759916209e5d62bd5 (553ms)
2025-01-23 15:36:48 2025-01-23T14:36:48.234Z  WARN 23 --- [-2-thread-21608] e.k.d.p.t.schema.DtrTestSchemaGenerator  : Long http request to https://hdl.handle.net/21.T11148%2F1a73af9e7ae00182733b (558ms)
2025-01-23 15:36:48 2025-01-23T14:36:48.240Z  WARN 23 --- [-2-thread-21599] e.k.d.p.t.schema.DtrTestSchemaGenerator  : Long http request to https://hdl.handle.net/21.T11148%2F82e2503c49209e987740 (567ms)
2025-01-23 15:36:48 2025-01-23T14:36:48.241Z  WARN 23 --- [-2-thread-21597] e.k.d.p.t.schema.DtrTestSchemaGenerator  : Long http request to https://hdl.handle.net/21.T11969%2Fa00985b98dac27bd32f8 (570ms)
2025-01-23 15:36:48 2025-01-23T14:36:48.245Z  WARN 23 --- [-2-thread-21606] e.k.d.p.t.schema.DtrTestSchemaGenerator  : Long http request to https://hdl.handle.net/21.T11148%2F1c699a5d1b4ad3ba4956 (569ms)
2025-01-23 15:36:48 2025-01-23T14:36:48.248Z  WARN 23 --- [-2-thread-21612] e.k.d.p.t.schema.DtrTestSchemaGenerator  : Long http request to https://hdl.handle.net/21.T11148%2F6ae999552a0d2dca14d6 (571ms)
2025-01-23 15:36:48 2025-01-23T14:36:48.249Z  WARN 23 --- [-2-thread-21613] e.k.d.p.t.schema.DtrTestSchemaGenerator  : Long http request to https://hdl.handle.net/21.T11148%2F2f314c8fe5fb6a0063a8 (572ms)
2025-01-23 15:36:48 2025-01-23T14:36:48.250Z  WARN 23 --- [-2-thread-21605] e.k.d.p.t.schema.DtrTestSchemaGenerator  : Long http request to https://hdl.handle.net/21.T11148%2F7fdada5846281ef5d461 (575ms)
2025-01-23 15:36:48 2025-01-23T14:36:48.253Z  WARN 23 --- [-2-thread-21603] e.k.d.p.t.schema.DtrTestSchemaGenerator  : Long http request to https://hdl.handle.net/21.T11148%2Fb8457812905b83046284 (579ms)
2025-01-23 15:36:48 2025-01-23T14:36:48.255Z  WARN 23 --- [-2-thread-21601] e.k.d.p.t.schema.DtrTestSchemaGenerator  : Long http request to https://hdl.handle.net/21.T11148%2Fa753134738da82809fc1 (582ms)
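The warnings above come from commit a92bd4d, which logs requests that take longer than 400ms. The application itself logs through Spring Boot's logging setup. This stdlib sketch uses `java.util.logging` instead, and its class and method names are hypothetical. The message format mirrors the `(553ms)`-style log lines above.

```java
import java.util.function.Supplier;
import java.util.logging.Logger;

/** Hypothetical helper that times a request and warns if it exceeds a threshold. */
final class SlowRequestLogging {
    private static final Logger LOG = Logger.getLogger(SlowRequestLogging.class.getName());
    static final long THRESHOLD_MS = 400;

    /** Returns the warning text for a slow request, or null if it was fast enough. */
    static String warningFor(String url, long elapsedMs) {
        return elapsedMs > THRESHOLD_MS
                ? "Long http request to " + url + " (" + elapsedMs + "ms)"
                : null;
    }

    /** Runs the request, then logs a warning if it took longer than THRESHOLD_MS. */
    static <T> T timed(String url, Supplier<T> request) {
        long start = System.nanoTime();
        try {
            return request.get();
        } finally {
            // Measured in finally so that failing requests are timed as well.
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            String warning = warningFor(url, elapsedMs);
            if (warning != null) {
                LOG.warning(warning);
            }
        }
    }
}
```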

Pfeil force-pushed the validation-speedup-experiments branch 7 times, most recently from 6bb28b6 to aad4408 August 28, 2024 23:20

speedup: use fast, async cache

efb4bf5

Pfeil force-pushed the validation-speedup-experiments branch from aad4408 to efb4bf5 August 29, 2024 12:53

speedup: use default work stealing executor for "async" cache

160bfe0

Pfeil force-pushed the validation-speedup-experiments branch 3 times, most recently from a38cdee to b5fba64 August 29, 2024 15:02

speedup: use extra executors for validation and deserialization

f976bdd

Pfeil force-pushed the validation-speedup-experiments branch from b5fba64 to f976bdd August 29, 2024 15:30

Pfeil mentioned this pull request Aug 30, 2024

Improvements on non-atomic value validation #179

Open

Pfeil added the maintenance Not a bug, but should be done. label Oct 11, 2024

Merge branch 'master' into validation-speedup-experiments

0d8ce40

# Conflicts: # build.gradle

This comment was marked as resolved.


Pfeil self-assigned this Nov 8, 2024

chore: rename TypeRegistry to DtrTest, as it depends on dtr-test schemas and structure

c7e6cbb

Pfeil force-pushed the validation-speedup-experiments branch from 6e64ea4 to d4317c8 November 16, 2024 00:49

Pfeil changed the title ~~Validation speedup experiments~~ Type-Api support and validation speedup Nov 16, 2024

Pfeil force-pushed the validation-speedup-experiments branch 3 times, most recently from 4b57404 to 6b07c6f November 19, 2024 19:53

Pfeil added 3 commits November 20, 2024 18:48

feat: add main code base for type-api support

439e8b0

feat: more flexible validation

5ee3e4a

- support for records without profiles - support for records with multiple profiles - support for multiple profile attribute keys/types - support for additional attributes - in general, attribute validation and profile validation are now separate tasks

feat: use virtual threads for async execution

dabaa4d

cleanup: avoid log spamming in the large record tests.

70ee315

Pfeil mentioned this pull request Jan 13, 2025

Potential timeout error #136

Closed

Pfeil linked an issue Jan 13, 2025 that may be closed by this pull request

Potential timeout error #136

Closed

Pfeil added 11 commits January 14, 2025 18:23

fix(test): improve output and adjust test to new profile behavior

3ea8931

fix(test): fix invalid values before validation of real record

b12a4dd

cleanup: remove useless newline in exception message

8a9609b

cleanup: make executor types easily changeable in one line

90539c2

cleanup: rewrite unpacking of exceptions without recursion

643e2be
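Commit 643e2be rewrites the unpacking of exceptions without recursion. The PR does not show which exceptions are unpacked. The following sketch assumes the usual wrappers produced by `CompletableFuture` and executors (`CompletionException` and `ExecutionException`), and the helper name is hypothetical. An identity-based set guards against cyclic cause chains, which a naive recursive version would not terminate on.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutionException;

/** Hypothetical helper: strips async wrapper exceptions iteratively instead of recursively. */
final class ExceptionUnwrapping {
    static Throwable unwrap(Throwable throwable) {
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        Throwable current = throwable;
        // Walk down the cause chain while the current exception is only a wrapper.
        while ((current instanceof CompletionException || current instanceof ExecutionException)
                && current.getCause() != null
                && seen.add(current)) {
            current = current.getCause();
        }
        return current;
    }
}
```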

cleanup: better names for some config properties

d2b5582

cleanup: use clearer name for additional attributes

a19e2ea

docs: document config properties in application-default.properties

d9d507c

Merge remote-tracking branch 'origin/master' into validation-speedup-experiments

698392e

feat: warn via logging if the cache size is unusual low.

c1fdc5d

This is useful for deployment, but also to see in benchmarks if the parameter has been set properly.

feat: implement dockerized benchmarks

0aada09

Pfeil force-pushed the validation-speedup-experiments branch from 991e6ae to 0aada09 January 21, 2025 23:55

Pfeil added 5 commits January 22, 2025 01:45

fix: make sure test script does not influence caches or else in preparation steps.

c6fb821

cleanup: remove TODO comment

4c9350b

benchmarks: wait longer between health check and starting the requests

a3c496b

benchmarks: avoid virtual threads for now until the ecosystem is more mature

8fe47fc

benchmarks: add first results

ad429ba

Pfeil added 4 commits January 22, 2025 17:06

CI: test zulu now that we do not use virtual threads anymore

e4aa487

CI: remove zulu

81f9d23

docs: fix information about validation times

85e1ddf

feat: log warnings if a request took longer than 400ms

a92bd4d

Pfeil marked this pull request as ready for review January 23, 2025 17:13

Pfeil added 2 commits January 24, 2025 14:27

test: test for records with invalid values

0927d2b

cleanup: fix linter warnings

a29e029

Pfeil changed the base branch from master to dev-v3 January 24, 2025 13:38

Pfeil merged commit 471a5a3 into dev-v3 Jan 24, 2025
6 checks passed
