Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

✨ Improve logging in core components #3332

Draft
wants to merge 58 commits into
base: main
Choose a base branch
from

Conversation

ff137
Copy link
Contributor

@ff137 ff137 commented Nov 7, 2024

🚧 WIP - to be refactored

List of changes:

  • Expands logging within core and config components: Conductor, PluginRegistry, and DefaultContextBuilder
  • Replaces (almost) all print statements with info logs
  • General improvement of some code for readability / maintainability (mainly in the PluginRegistry)
  • Includes unit test coverage

Comparison of start up logs before and after to be included.

Partially resolves #3202

Note:

  • reviewing the logs, I can see a lot of inefficiency in the number of times that ClassLoader.load_module is called. We see the log "Module ... is already loaded" 500 times during startup. And because I've added more verbose debug logging, there are 5 logs between trying to load it and recognising that it's already loaded. So, currently my log changes will introduce 2500 unnecessary log lines. As a solution, I'll implement caching to the load_module function, so it doesn't execute unnecessarily.
  • Additionally, the logging now reveals that upon every /status check, the AskarStorage module is attempted to be re-loaded ... representing some inefficiency. Will see if I can resolve that.

@ff137 ff137 self-assigned this Nov 7, 2024
@ff137
Copy link
Contributor Author

ff137 commented Nov 7, 2024

Not sure why the following Scenario Integration Tests are failing:

FAILED examples/anoncreds_issuance_and_revocation::anoncreds_issuance_and_revocation
FAILED examples/presenting_revoked_credential::presenting_revoked_credential
FAILED examples/self_attested::self_attested

@ff137
Copy link
Contributor Author

ff137 commented Nov 7, 2024

Looks like BDD Interop tests also silently fail: ("silently" because the Github action shows a green tick)

Failing scenarios:
  features/0036-issue-credential.feature:9  Issue a credential with the Holder beginning with a proposal
  features/0036-issue-credential.feature:23  Issue a credential with the Holder beginning with a proposal with DID Exchange Connection
  features/0036-issue-credential.feature:53  Issue a credential with the Issuer beginning with an offer
  features/0037-present-proof.feature:19  Present Proof where the prover does not propose a presentation of the proof and is acknowledged -- @1.1 
  features/0037-present-proof.feature:20  Present Proof where the prover does not propose a presentation of the proof and is acknowledged -- @1.2 
  features/0037-present-proof.feature:37  Present Proof where the prover does not propose a presentation of the proof and is acknowledged with a DID Exchange Connection -- @1.1 
  features/0037-present-proof.feature:38  Present Proof where the prover does not propose a presentation of the proof and is acknowledged with a DID Exchange Connection -- @1.2 
  features/0037-present-proof.feature:55  Present Proof of specific types and proof is acknowledged with a Drivers License credential type -- @1.1 
  features/0037-present-proof.feature:56  Present Proof of specific types and proof is acknowledged with a Drivers License credential type -- @1.2 
  features/0037-present-proof.feature:74  Present Proof of specific types and proof is acknowledged with a Biological Indicators credential type -- @1.1 
  features/0037-present-proof.feature:91  Present Proof of specific types and proof is acknowledged with multiple credential types -- @1.1 
  features/0037-present-proof.feature:109  Present Proof where the prover does not propose a presentation of the proof and is acknowledged -- @1.1 
  features/0037-present-proof.feature:110  Present Proof where the prover does not propose a presentation of the proof and is acknowledged -- @1.2 
  features/0037-present-proof.feature:129  Present Proof where the prover has proposed the presentation of proof in response to a presentation request and is acknowledged -- @1.1 
  features/0037-present-proof.feature:130  Present Proof where the prover has proposed the presentation of proof in response to a presentation request and is acknowledged -- @1.2 
  features/0037-present-proof.feature:150  Present Proof where the prover has proposed the presentation of proof from a different credential in response to a presentation request and is acknowledged -- @1.1 
  features/0037-present-proof.feature:151  Present Proof where the prover has proposed the presentation of proof from a different credential in response to a presentation request and is acknowledged -- @1.2 
  features/0183-revocation.feature:79  Credential revoked and replaced with a new updated credential, holder proves claims with the updated credential with timesstamp -- @1.1 
  features/0183-revocation.feature:380  Issuer revokes a credential and sends a v1 revocation notification -- @1.1 
  features/0434-out-of-band.feature:9  Issue a v1 indy credential using connectionless out of band invitation
  features/0434-out-of-band.feature:50  Present a v1 indy proof using connectionless out of band invitation

7 features passed, 4 failed, 4 skipped
36 scenarios passed, 21 failed, 103 skipped
348 steps passed, 21 failed, 1014 skipped, 0 undefined
Took 13m2.976s

@ff137 ff137 marked this pull request as draft November 8, 2024 09:05
@ff137 ff137 marked this pull request as ready for review November 11, 2024 08:24
@dbluhm
Copy link
Contributor

dbluhm commented Nov 11, 2024

@jamshale heads up on the test failures

Self-attested

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/usr/src/app/example.py", line 180, in <module>
    asyncio.run(main())
  File "/usr/local/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/usr/src/app/example.py", line 95, in main
    alice_pres_ex = await alice.post(
  File "/usr/src/app/.venv/lib/python3.10/site-packages/acapy_controller/controller.py", line 475, in post
    return await self.request(
  File "/usr/src/app/.venv/lib/python3.10/site-packages/acapy_controller/controller.py", line 338, in request
    body = await self._handle_response(resp, data=data, json=json_)
  File "/usr/src/app/.venv/lib/python3.10/site-packages/acapy_controller/controller.py", line 303, in _handle_response
    raise ControllerError(f"Request failed: {resp.url} {body}")
acapy_controller.controller.ControllerError: Request failed: http://alice:3001/present-proof/send-request 404: Not Found

Presenting revoked cred

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/usr/src/app/example.py", line 198, in <module>
    asyncio.run(main())
  File "/usr/local/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/usr/src/app/example.py", line 113, in main
    await bob.record(topic="revocation-notification")
  File "/usr/src/app/.venv/lib/python3.10/site-packages/acapy_controller/controller.py", line 577, in record
    return await self.event(topic, select=select, event_type=record_type)
  File "/usr/src/app/.venv/lib/python3.10/site-packages/acapy_controller/controller.py", line 652, in event
    raise ControllerError(
acapy_controller.controller.ControllerError: Event from Bob with topic revocation-notification not received before timeout

AnonCreds issuance and revocation

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/usr/src/app/example.py", line 187, in <module>
    asyncio.run(main())
  File "/usr/local/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/usr/src/app/example.py", line 152, in main
    await bob.record(topic="revocation-notification")
  File "/usr/src/app/.venv/lib/python3.10/site-packages/acapy_controller/controller.py", line 577, in record
    return await self.event(topic, select=select, event_type=record_type)
  File "/usr/src/app/.venv/lib/python3.10/site-packages/acapy_controller/controller.py", line 652, in event
    raise ControllerError(
acapy_controller.controller.ControllerError: Event from Bob with topic revocation-notification not received before timeout

@ff137
Copy link
Contributor Author

ff137 commented Nov 11, 2024

Yeah, those test failures are a real head-scratcher for me... Doesn't seem to occur in other PRs. And I can't see what in this contribution will cause the problem.

Maybe there's a way to print the ACA-Py agent logs as part of those Test workflows, see if that reveals anything.

Only solution I can think of is to rebase and drop / re-add commits until I can identify what causes the problem 😅

@dbluhm
Copy link
Contributor

dbluhm commented Nov 11, 2024

Just remembered Jamie is out until Wednesday (I was hoping to just throw this his way lol); I'll take a deeper look at this. It should be possible to get some more logs on failures in the scenario tests, I think.

@jamshale
Copy link
Contributor

jamshale commented Nov 11, 2024

I'm not back until Friday but quickly looked at the logs. 2 of them are failing when trying to receive the revocation webhook notification. Perhaps the log level affected the library checking for this event.

One of the issues with using an external library for this is that we may need to get the library changed for changes like this.

For the self attest test, I can't really think of a reason off the top of my head. Running an individual test with higher logging could help find the issue.

I'd say because these tests are passing with main, this must either be an issue with the code or the acapy-controller library.

@dbluhm
Copy link
Contributor

dbluhm commented Nov 12, 2024

An unfortunate side effect of this change is that pyright/pylance does not detect logger.trace as a valid method. This results in a pretty significant amount of noise in editor, especially in modules with particularly heavy use of trace logs.

Copy link
Contributor

@dbluhm dbluhm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spotted the issue. See comments.

acapy_agent/core/plugin_registry.py Outdated Show resolved Hide resolved
acapy_agent/core/plugin_registry.py Outdated Show resolved Hide resolved
@dbluhm
Copy link
Contributor

dbluhm commented Nov 12, 2024

While testing, I couldn't get the trace logs to print from the plugin registry. I tried setting the log level to "trace" (in the docker-compose for the anoncreds scenario test) and only saw debug and higher. Did I miss something?

@ff137
Copy link
Contributor Author

ff137 commented Nov 12, 2024

An unfortunate side effect of this change is that pyright/pylance does not detect logger.trace as a valid method. This results in a pretty significant amount of noise in editor, especially in modules with particularly heavy use of trace logs.

Hmm, that sucks...

So, my initial idea was to try swap out the default python logger for loguru, which is much better and offers plenty of enhancements over the python logger. Loguru includes trace levels, and built in solutions for serialised logging and timed-file-logging config, etc, which was manually implemented here.

What that would look like is defining a centralised loguru "get_logger" method somewhere, with all the acapy-config, and importing that, instead of using logging.getLogger.

Some other nice features are colorized logging, and being able to bind context with logs. (e.g. bound_logger = logger.bind(thread_id=thread_id) and then you can implicitly include metadata in all logs when using bound_logger instead.)

I think that's still on the cards, because it really is more elegant to work with. But it does represent a big change to a core component (logging). And it'll take some effort to ensure everything remains identical. The potential bulk refactoring can cause some headaches for other maintainers with merge conflicts. And then there's effort in getting everyone on board with the change.

So, at first I wanted to scrap the loguru idea, and just go for this trace addition... Buut, what do you think? Any potential appetite for switching to loguru? It'll take some more time, but I can take a swing at it.

@ff137
Copy link
Contributor Author

ff137 commented Nov 12, 2024

While testing, I couldn't get the trace logs to print from the plugin registry. I tried setting the log level to "trace" (in the docker-compose for the anoncreds scenario test) and only saw debug and higher. Did I miss something?

Hmm, it worked on my local with ACAPY_LOG_LEVEL: trace...

Example for loading protocols.endorse_transaction

2024-11-07 19:55:39,251 acapy_agent.core.plugin_registry TRACE Processing events for plugin: acapy_agent.protocols.endorse_transaction
2024-11-07 19:55:39,251 acapy_agent.utils.classloader TRACE Attempting to load module: definition with package: acapy_agent.protocols.endorse_transaction
2024-11-07 19:55:39,251 acapy_agent.utils.classloader TRACE Preloading parent package: acapy_agent.protocols.endorse_transaction
2024-11-07 19:55:39,251 acapy_agent.utils.classloader TRACE Attempting to load module: acapy_agent.protocols.endorse_transaction with package: None
2024-11-07 19:55:39,251 acapy_agent.utils.classloader TRACE Resolved full module path: acapy_agent.protocols.endorse_transaction
2024-11-07 19:55:39,251 acapy_agent.utils.classloader TRACE Module acapy_agent.protocols.endorse_transaction is already loaded
2024-11-07 19:55:39,251 acapy_agent.utils.classloader TRACE Adjusted mod_path for relative import: .definition
2024-11-07 19:55:39,251 acapy_agent.utils.classloader TRACE Resolved full module path: acapy_agent.protocols.endorse_transaction.definition
2024-11-07 19:55:39,251 acapy_agent.utils.classloader TRACE Module acapy_agent.protocols.endorse_transaction.definition is already loaded
2024-11-07 19:55:39,251 acapy_agent.core.plugin_registry TRACE Loading versioned events for: acapy_agent.protocols.endorse_transaction
2024-11-07 19:55:39,251 acapy_agent.core.plugin_registry TRACE Loading events from: acapy_agent.protocols.endorse_transaction.v1_0.routes
2024-11-07 19:55:39,251 acapy_agent.utils.classloader TRACE Attempting to load module: acapy_agent.protocols.endorse_transaction.v1_0.routes with package: None
2024-11-07 19:55:39,251 acapy_agent.utils.classloader TRACE Resolved full module path: acapy_agent.protocols.endorse_transaction.v1_0.routes
2024-11-07 19:55:39,251 acapy_agent.utils.classloader TRACE Parent module path: acapy_agent.protocols.endorse_transaction.v1_0, Module name: routes
2024-11-07 19:55:39,251 acapy_agent.utils.classloader TRACE Attempting to load module: acapy_agent.protocols.endorse_transaction.v1_0 with package: None
2024-11-07 19:55:39,251 acapy_agent.utils.classloader TRACE Resolved full module path: acapy_agent.protocols.endorse_transaction.v1_0
2024-11-07 19:55:39,251 acapy_agent.utils.classloader TRACE Module acapy_agent.protocols.endorse_transaction.v1_0 is already loaded
2024-11-07 19:55:39,251 acapy_agent.utils.classloader TRACE Adjusted mod_path after loading parent: .routes
2024-11-07 19:55:39,251 acapy_agent.utils.classloader TRACE Finding spec for module: .routes with package: acapy_agent.protocols.endorse_transaction.v1_0
2024-11-07 19:55:39,251 acapy_agent.utils.classloader TRACE Importing module: .routes with package: acapy_agent.protocols.endorse_transaction.v1_0
2024-11-07 19:55:39,255 acapy_agent.core.plugin_registry TRACE Registering events from: acapy_agent.protocols.endorse_transaction.v1_0.routes

You can see that's why I didn't want it at debug level, because it'll clutter the existing, expected logs for debug.

@dbluhm
Copy link
Contributor

dbluhm commented Nov 12, 2024

So, at first I wanted to scrap the loguru idea, and just go for this trace addition... Buut, what do you think? Any potential appetite for switching to loguru? It'll take some more time, but I can take a swing at it.

I'm not opposed to the idea in principle but definitely agree that it could be a bit of a messy change to integrate with ongoing work streams.

How would this affect plugins? Would plugins also have to use loguru or is there a way to get loguru to integrate with Python's builtin logging?

Are there any performance implications to loguru over builtin logging module?

I think I'm also still a bit on the fence about the idea of a trace log level to begin with. Obviously it's not unheard of to have a level below debug if other loggers like loguru support it. Several of our rust based dependencies like Askar and Indy-VDR also support a trace log level. In the case of those dependencies, though, the inspect-ability of those dependencies is really low because they're compiled libraries. Python, on the other hand, being interpreted, it's pretty trivial to step into a debugger and see exactly what is going on when necessary. I'm a bit skeptical of the value of having the intense kind of detail being emitted to the trace level in these changes.

I think we can definitely up our logging game in ACA-Py but the kind of change I think I'm more interested in is probably more effective/judicious use of the INFO level. 90% of the time when I'm scanning through logs, I'm looking for what messages were sent and received or what Admin API requests came in and their responses. In other words, I usually want to filter through less information but I have to use debug level right now or I don't get what I'm after.

@swcurran swcurran requested a review from dbluhm November 12, 2024 17:03
@ff137 ff137 marked this pull request as draft November 12, 2024 18:02
@ff137
Copy link
Contributor Author

ff137 commented Nov 12, 2024

@dbluhm That makes sense. I think the trace level can be omitted... and instead, just add some sensible debug logs, and start the initiative to move more of the useful logs to info level.

The primary motivator here was to debug an issue with an agent not starting up at all. It was from testing out a fix for the RuntimeWarning from the IndyVdrLedgerPool (#3328), and there were no debug logs or errors to indicate what was causing the agent to hang.

Then I realised there is actually quite a lot of steps in the startup logic with no logging whatsoever. So, I just went at it, adding some sensible logging steps throughout. But, that made debug logging too verbose ... and figured, that is what trace is for.

Fortunately, I've already achieved the goal of isolating what was causing the issue. So, I'd say this particular way of implementing trace logging is optional. And, Loguru should be the preferred way to implement trace functionality.

I do think Loguru is worth it. The performance improvement is probably negligible (until they implement it in C), and for plugins it shouldn't be any issue.

I'm imagining we can define a central, configured logging instance, similar to how we've done here - and then all cases of

import logging
LOGGER = logging.getLogger(__name__)

can just be replace-all'd with

from acapy_agent.config.logging import get_logger
LOGGER = get_logger(__name__)

Then everything else remains the same.

Using standard python logging would still be possible - it just won't benefit from the loguru features. So it would just require letting all the devs know, hey, use get_logger instead.

I think it's one of those "nice-to-haves" that once it's in place, it's a godsend, and makes developing so much easier -- lazy logging, context binding, colorized logging, and much easier configuration. There's just the effort in proving that it's a seamless shift, and that all existing logging configuration can be ported to the new way of doing things. Difficult to forecast how much effort that'll be. But I can try chip away at it, sometime over the next several weeks.

So, over the next while I'll try clean up this PR to just contribute some reasonable addition to the startup logs, and if it proves seamless enough, loguru can come later :-)

@ff137 ff137 changed the title ✨ Add TRACE log level and improve logging in core components ✨ Improve logging in core components Nov 25, 2024
@ff137 ff137 marked this pull request as ready for review November 27, 2024 11:32
Copy link

sonarqubecloud bot commented Dec 5, 2024

Quality Gate Failed Quality Gate failed

Failed conditions
0.0% Coverage on New Code (required ≥ 80%)

See analysis details on SonarQube Cloud

@ff137
Copy link
Contributor Author

ff137 commented Jan 21, 2025

I'll wrap this one up sometime soon. I think these are worthwhile changes, which we've pulled into our agent. It helps identify some potential improvements, which, unresolved leads to some noisy debug logs. Will see if I can improve on that

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Noisy logs
3 participants