MM-61881: Use os.MkdirAll, which is thread-safe, to ensure Terraform state dir #849
The same race condition does happen in the implementation of os.MkdirAll[0], but that code there takes it into account and re-checks the second error (the one on Mkdir), and it ends up returning nil if the directory is indeed created, thus avoiding the race condition and making os.MkdirAll thread-safe. [0] https://github.com/golang/go/blob/e33f7c42b084182a3a88ef79857e33c11627159a/src/os/path.go#L19-L66
Cool!
Yeah, the idea with comparison was to have completely decoupled state paths to avoid this problem altogether. But I guess we rely on the same parent directory.
@streamer45 The states are still different: they're different files, suffixed by an index. But the containing directory is the same, yes.
To be technically pedantic, I think the FS should generally ensure concurrency safety of these operations. This is more of a high-level race condition caused by applying a set of thread safe operations in a non-atomic way.
Not sure I understand your comment. What was non-thread safe (and now is) is the |
I mean, there was nothing strictly unsafe about |
But getting an error due to the order of potentially concurrent operations is a race condition! 😂 (I will stop discussing this here, just wanted to force you to be even a bit more pedantic 😛)
lol, I guess it's getting into semantics. From a high level, I agree with you that using I guess I (personally) can't fully consider at the same level a function that returns a proper error vs one that causes data corruption or undefined behaviour. Good discussion though, especially before bed time 👍
All that Rust experience is getting into your head Claudio. It's not healthy.
Summary
As explained in the ticket, we concurrently run every deployment inside a comparison run. Each of those deployments calls terraform.New(), which in turn calls ensureTerraformStateDir. This function is not thread-safe, since it first calls os.Stat to know whether the directory exists, and if it doesn’t, it then calls os.Mkdir. It can happen that:

1. One deployment calls os.Stat and sees the directory doesn’t exist
2. Another deployment calls os.Mkdir, and succeeds, creating the directory
3. The first deployment calls os.Mkdir, and fails, since the directory already exists

Technically, the same race condition does happen in the implementation of os.MkdirAll, but that code takes it into account and re-checks the second error (the one on Mkdir), and it ends up returning nil if the directory was already created, thus avoiding the race condition altogether and making it thread-safe.

Also, the code is way cleaner this way, of course :)
Ticket Link
https://mattermost.atlassian.net/browse/MM-61881