Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lazily initialize the internal CloudWatch Logs client to avoid a deadlock when using log4net #269

Merged
merged 1 commit into from
May 22, 2024

Conversation

ashovlin
Copy link
Member

@ashovlin ashovlin commented May 21, 2024

Issue #, if available:

#253, aws/aws-sdk-net#1968, DOTNET-5636

Description of changes:

There's a possible deadlock when using:

  1. AWS.Logger.Log4net
  2. AWSSDK.Core >= 3.7.1 (which introduced a configurable IMDS URI)
  3. Either EC2 IMDS credentials or no credentials

I wrote a longer description with stack traces in aws/aws-sdk-net#1968 (comment), but in summary:

  1. The application thread calls LogManager.GetLogger to activate our AWSAppender. This gets this lock in log4net's DefaultRepositorySelector, then initializes the internal CloudWatch Logs client. The client initialization needs to access the configured retry mode here, via the static FallbackInternalConfigurationFactory.RetryMode.
  2. AWSLoggerCore loads credentials which potentially kicks off this timer in DefaultInstanceProfileAWSCredentials. This starts a new thread for loading the IMDS credentials. As of AWSSDK.Core 3.7.1 when the IMDS endpoint became configurable, this relies on the static FallbackInternalConfigurationFactory.EC2MetadataServiceEndpoint for resolving the IMDS endpoint here. The static constructor for FallbackInternalConfigurationFactory will create EnvironmentVariableInternalConfiguration instance, which gets its own logger here.

This deadlocks when the static constructor for FallbackInternalConfigurationFactory is run on thread 2 because:

  1. Thread 1 is waiting on the FallbackInternalConfigurationFactory static constructor to finish while holding the log4net lock.
  2. Thread 2 is running the FallbackInternalConfigurationFactory static constructor, but is blocked on thread 1 when creating additional loggers inside the static constructor.

By delaying the initialization of the internal CloudWatch Logs client, we allow thread 1 above to finish before thread 2 starts.

Testing

  1. Existing test cases pass
  2. Manually tested with the application https://github.com/alverdal/awslog4net-example via logging via AWS.Logger.Log4net.AWSAppender hangs web app aws-sdk-net#1968 (comment)

Alternatives I considered:

  • We could remove logging from FallbackInternalConfigurationFactory - but I think it's useful, and this would be hard to enforce this permanently.
  • We could remove the dependency from DefaultInstanceProfileAWSCredentials on FallbackInternalConfigurationFactory, and have it resolve the IMDS itself from the env var or profile without FallbackInternalConfigurationFactory (and its logging) - again I think the logging is useful, and like keeping the configuration loading + caching centralized

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@ashovlin ashovlin requested review from normj and 96malhar May 21, 2024 16:34
@ashovlin ashovlin changed the base branch from master to dev May 21, 2024 16:34
@ashovlin ashovlin force-pushed the shovlia/lazy-client branch from 9227fb3 to 8c166e9 Compare May 21, 2024 16:40
Copy link
Member

@normj normj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is basically since the SDK itself will also attempt to use Log4net loggers for logging we need to avoid using the SDK until after the Log4net's initializing phase is done. In this case that means not creating clients as part of the appendders being configured for Log4net in its init phase.

@ashovlin ashovlin merged commit be4d1b6 into dev May 22, 2024
2 checks passed
@ashovlin ashovlin deleted the shovlia/lazy-client branch May 22, 2024 19:47
@ashovlin ashovlin mentioned this pull request May 22, 2024
ashovlin added a commit that referenced this pull request May 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants