Allow_Overwrite option in History does not do what name and code suggests #2653
Comments
What can be done?
@weiyuan-jiang The issue is just more basic. History specifies a file path. We have no control over what someone may or may not have put in that path. If the user does something silly like put a file there that is wrong, empty, whatever, we cannot control that. What we can do is this in a single execution:
If we are talking multiple executions, I don't think we can/should do anything to allow the same file to be written to in multiple executions. This just gets dangerous/difficult.
@bena-nasa I'm concerned that we should be able to support a collection that has, say, 4 time slices per day but where the run executes just 12 hours in a segment. Granted this is rare at the moment, but I would not want to toss out the GEOS use case just to support external people.
@tclune, I don't understand the comment or what you are getting at here. What you are describing is not a use or capability History currently has... Right now, if you are asking for output every 6 hours but set the duration to 24 hours, and you only run 12 hours in the segment, say starting at 0z, you would get the 6z and 12z output in the file. What does not work is if you start at 0z and stop at 12z in the above scenario (6-hour frequency, 24-hour duration): we have never supported appending to an existing file, and if it somehow worked it was just dumb luck. Nor is anyone actually trying to do this as far as I know. Are you thinking about how you want this to work in the new History in 3G perhaps...?
I had assumed your intent was to extend the file during the next segment. (Regardless of how it may or may not work now.) |
I see two problems with not supporting this "extend the file":
@tclune, yes, your point 2 is right, but I don't really have an intent here beyond what was in the original issue. Extending files across segments is not what I was worrying about when opening the issue.
In theory (2) could be handled by History refusing to run if it sees that a subsequent continuation segment will reuse the same file name. Force the user to either align History with segments or ensure segments are longer than any single file's duration. (Except things that will only have one time slice, of course.)
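The refuse-to-run check proposed above could be sketched like this (a minimal Python illustration assuming fixed-length segments; the function name and hour-based units are hypothetical, not actual MAPL code):

```python
# Hypothetical sketch: decide whether a History collection's file duration
# is safe given the run's segment length.  A file would have to be extended
# across executions whenever a segment boundary falls strictly inside a
# file's time window, i.e. unless every segment boundary is also a file
# boundary.

def collection_is_segment_safe(file_duration_h: int, segment_length_h: int) -> bool:
    # Safe iff each segment boundary (a multiple of segment_length_h) lands
    # on a file boundary (a multiple of file_duration_h).
    return segment_length_h % file_duration_h == 0

# 24-hour files, 12-hour segments: the second segment would have to append
# to the first segment's file, so History should refuse to run.
print(collection_is_segment_safe(24, 12))  # False
print(collection_is_segment_safe(24, 24))  # True  (aligned)
print(collection_is_segment_safe(24, 48))  # True  (segments longer than files)
```

Collections with a single time slice per file would be exempt from this check, as noted above.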
TL;DR
Allow_Overwrite does NOT clobber an existing file if one exists, despite what the code in History would seem to imply should be happening.
Long Version
The UFS model, which uses GOCART and MAPL, exposed (or re-iterated) a long-standing issue with the interaction between History and PFIO, which was causing the UFS folks a problem we were asked to look at. The fundamental issue is: how does the server know when to create a file versus open an existing file, and if opening an existing file, how does it know the file is actually "good"? Also, how does the server know if you really do want to clobber an existing file rather than append to it? Spoiler: it can't.
The problem arose because UFS was, to my understanding, creating a symlink to an empty file before running the executable, in the path History would want to write to; or perhaps that symlink might have been broken. Either way it is bad. Let's review the current behavior.
In History
In History there is a global Allow_Overwrite option. It does two things.
First, if set to true, it sets the mode of the collection to clobber. The problem is that the server, the way it is designed, simply cannot clobber an existing file, so it is pointless to set this. And the client may not even transmit this to the server.
Second, every time History decides to write, it first checks whether this is a new file (in which case it generates the filename) or the same file as before, because it is appending multiple time steps to that file.
Either way, when it is time to write to a NEW file, History does an inquire on the file path. If the file exists, one of two things can happen: either we abort, or we go on, based on Allow_Overwrite (default false). This is basically a safety check, so we either crash if the file exists or just carry on, depending on what Allow_Overwrite is set to. Note that a broken symlink apparently returns false to an inquire about its existence, so History will always get past this point if a broken symlink were somehow created before execution.
Once past this point, History makes the request to write to a file path at a given time index with no further checks, and no instructions, such as whether to open the file or clobber/re-create it, are passed to the server.
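The client-side behavior just described can be sketched in Python (the real code is Fortran; `client_may_write` and `allow_overwrite` are hypothetical names standing in for History's inquire plus Allow_Overwrite logic). It also demonstrates the broken-symlink hole, since `os.path.exists`, like Fortran's `inquire`, reports False for a broken symlink:

```python
import os
import tempfile

# Hypothetical sketch of History's client-side pre-write check.
def client_may_write(path, allow_overwrite=False):
    # os.path.exists, like inquire, returns False for a broken symlink,
    # so a broken link sails right past this safety check.
    if os.path.exists(path) and not allow_overwrite:
        raise RuntimeError(f"{path} already exists; set Allow_Overwrite")
    # Note: no open/clobber instruction is sent along to the server.
    return True

# Demonstrate the broken-symlink hole:
with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "missing_target")   # never created
    link = os.path.join(d, "collection.nc4")
    os.symlink(target, link)          # a broken symlink
    assert os.path.lexists(link)      # the link itself is there...
    assert not os.path.exists(link)   # ...but an existence inquiry says no
    assert client_may_write(link)     # so History proceeds regardless
```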
On the Server Side
Now consider what happens on the server side. The server just knows the client requested a filename and a time index to write to, but that is it.
So the server does another inquire to see if the file exists. If the file exists, it currently assumes the file is "good", in that it has been created and populated with the right metadata. It simply opens the file and assumes it can do a PUT_VAR to it. Most likely this is because the client has requested multiple time slices per file, so the file was previously created in the current execution.
So now we see one problem. In the case where there were empty symlinks, and History was set with Allow_Overwrite true, the server just tries to write to the file, which of course fails. In general, if the server had not previously created the file with metadata consistent with the request, the write will clearly fail, and we have no mechanism to check for this, if a robust check is even possible.
A second problem: what if the user on the client side really did want to clobber any file that was there? This just won't happen with the current logic.
The DAS exposed another example, where the same file was created in a previous execution without a certain variable; then on the next execution, if the file was not moved away, the write failed when trying to write that variable, which did not exist in the file left sitting there.
Now suppose the file does not exist. The server then simply tries to create the file with a NO_CLOBBER option. If something weird like a broken symlink is in the way, netCDF is smart enough to fail at this point. This seems to be what was happening with UFS.
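Putting the two server-side branches together, here is a hypothetical Python analogue, using the POSIX `O_CREAT | O_EXCL` flags as a stand-in for netCDF's NO_CLOBBER create (`server_write` is an illustrative name, not PFIO code). Usefully for this illustration, `O_EXCL` also fails on a broken symlink, just as netCDF's NO_CLOBBER create does:

```python
import os

# Hypothetical sketch of the server's current decision logic.
def server_write(path):
    if os.path.exists(path):
        # File exists: assume it is "good" and open it for writing.
        # Nothing verifies that the metadata actually matches the request.
        fd = os.open(path, os.O_WRONLY)
        os.close(fd)
        return "opened"
    # File absent: create it, refusing to clobber (O_EXCL ~ NO_CLOBBER).
    # A broken symlink makes this create fail with EEXIST, which appears
    # to be the failure mode UFS was hitting.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    os.close(fd)
    return "created"
```

Note that the "opened" branch is where the first problem above lives: the server opens whatever is sitting at the path and assumes the subsequent PUT_VAR will succeed.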
Clearly there are a lot of ways to make things go haywire if you are not careful, as the examples referenced here demonstrate.
What can be done?
At least within a single execution, the server could track whether it has already created a file. Then, if it gets a request to write to a file it knows it did not create during this execution, it could make some decision about how to handle this (clobber it or fail).
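As a sketch of that idea (hypothetical Python; the class and its behavior illustrate the proposal and are not an existing PFIO API):

```python
import os

# Hypothetical sketch of the proposed fix: the server remembers which files
# it created during this execution, so a write request for a pre-existing
# file it never created can be handled deliberately instead of blindly.
class ServerFileRegistry:
    def __init__(self, clobber_unknown=False):
        self.created_this_run = set()
        self.clobber_unknown = clobber_unknown

    def prepare(self, path):
        if path in self.created_this_run:
            return "append"          # we created it; safe to keep writing
        # islink() catches broken symlinks that exists() misses.
        if os.path.exists(path) or os.path.islink(path):
            if not self.clobber_unknown:
                raise RuntimeError(f"refusing to write to pre-existing {path}")
            os.remove(path)          # the client asked for a real clobber
        open(path, "w").close()      # stand-in for creating the file
        self.created_this_run.add(path)
        return "create"
```

This also gives the client-side clobber request somewhere to land: `clobber_unknown` is where a transmitted Allow_Overwrite could take effect.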
We should also allow a file to be clobbered if that is what the client side said to do.
If we want to allow a file to be appended to between executions, that gets harder, is beyond the scope of this issue, and is another whole discussion.
Links to old issues
Here are links to issues that are related to this.
#1620
GEOS-ESM/GEOSadas#188
#2638