Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix general observations fmuobs for dictionaries #648

Closed

Conversation

daniel-sol
Copy link

@daniel-sol daniel-sol commented Jan 12, 2024

General observations are not written when choosing yaml option, this PR Fixes this
Fixes issue #351

Updated info 18/1 2024.
Some background info. The general observations are different to the observation types in that they are not directly contained in the observation file, but contained in separate files only referenced in the observation file. This is seemingly relatively straight forward, with these files being simple text files containing two columns (without any header). The first being the observation value, and the second being the value error, or uncertainty, in one file generally called something.obs. Here a general observation line contains one reference to a file, and this is then treated as one atomic observation with a lable, corresponding roughly to summary observation.

But there is not really any proper datastructure behind this. This is clearly seen in the case of rft data which deviates strongly from this pattern, here to get the complete picture of what is going on there are are three files that need to be accessed per observation:

  1. The observations, contained in a file with value and error pairs, e.g. observation.obs
  2. The spatial assignment to the observations, in a twin file called observations.txt i.e x y md tvd zone (But no headers)
  3. The name of the well contained in a file generally called well_dates.txt, but could be called something else entirely, see the some_field rft observations contained in the somewhere/completely/different/rft_ERT_use_MDadjusted/ folder.

There is really no logic, all this is patched together in an gen_data job in the ert config file.
To make it worse general observations can be used for absolutely anything, but there is nothing hindring people from bunging everything into the same folder as the observation file, but the convention, but not requirement from ert's side is that these datatypes are group into folders, with sometimes intelligible names. This straw to clutch is in this PR used as the key, because without it one would be completely in the dark for what the general observation is.

Hope this was somewhat informative, please get in touch if you want a chat..

@codecov-commenter
Copy link

codecov-commenter commented Jan 12, 2024

Codecov Report

Attention: 12 lines in your changes are missing coverage. Please review.

Comparison is base (6e675c1) 86.13% compared to head (d43c9d0) 86.28%.
Report is 4 commits behind head on main.

❗ Current head d43c9d0 differs from pull request most recent head 6a4c3a0. Consider uploading reports for the commit 6a4c3a0 to get more accurate results

Files Patch % Lines
src/subscript/fmuobs/gen_obs_writers.py 92.12% 10 Missing ⚠️
src/subscript/pack_sim/pack_sim.py 60.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #648      +/-   ##
==========================================
+ Coverage   86.13%   86.28%   +0.15%     
==========================================
  Files          50       51       +1     
  Lines        7069     7234     +165     
==========================================
+ Hits         6089     6242     +153     
- Misses        980      992      +12     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@daniel-sol daniel-sol force-pushed the fix_general_observations_fmuobs branch from 0cf4cf3 to 4c777e8 Compare January 12, 2024 14:42
@mferrera
Copy link
Collaborator

This PR adds a significant amount of text test data. It is a bit hard to tell because there are quite a few files, but are all of them needed?

@daniel-sol daniel-sol force-pushed the fix_general_observations_fmuobs branch from 993e3b5 to cf974dc Compare January 17, 2024 14:29
@daniel-sol
Copy link
Author

This PR adds a significant amount of text test data. It is a bit hard to tell because there are quite a few files, but are all of them needed?
Most of the files are needed, but I will remove some of them, e.g the files in folder welltest. They are not used in the testing. Regarding the other files I would rather keep them for some granularity, but just get in touch if it is unclear what they do..

@daniel-sol daniel-sol force-pushed the fix_general_observations_fmuobs branch from cf974dc to 6cf15a0 Compare January 19, 2024 10:10
Copy link
Collaborator

@alifbe alifbe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @daniel-sol

Instead of assuming observation type based on folder name, we would recommend to probably let user specify it directly. It could be in the form of dictionary file that map observation key to observation type. Or perhaps a comment in ERT observation config. This will ensure that there is no guessing which might be prone to error.

The code can be extended to give warning when observation is not defined in the dictionary or vice versa when it exist in the dictionary but not in the ERT observation file.

@daniel-sol
Copy link
Author

daniel-sol commented Jan 26, 2024

Hi @daniel-sol

Instead of assuming observation type based on folder name, we would recommend to probably let user specify it directly. It could be in the form of dictionary file that map observation key to observation type. Or perhaps a comment in ERT observation config. This will ensure that there is no guessing which might be prone to error.

The code can be extended to give warning when observation is not defined in the dictionary or vice versa when it exist in the dictionary but not in the ERT observation file.

I am a bit reluctant to do that. I do see your point, but one of my philosophies is to not add too much to the burden of the end user. This would be one of those things, what we could to is to introduce something an option to add something to your fmu-config if you want to fix it? But I don't want it to be a requirement.

See last commit. Where I have added the possibility of changing names from the fmu_config, and I have also added that if a user hasn't put the file in a folder it will end up with a key called unspecified, and with a strong warning..

Fix after deprecated types are removed
@daniel-sol daniel-sol requested a review from alifbe January 26, 2024 14:49
Missing ; made testdata incorrect
@asnyv
Copy link
Contributor

asnyv commented Jan 30, 2024

Agree with @alifbe that the test data here seems way larger than needed. Most of it seems like with repeated lines of the same formatting, and/or several files of the same type? Has to be possible to cut these drastically?
In most cases it should be possible to test the same thing with 5 lines of test data as with 15000 lines.
The length of the data also makes it cumbersome to look at whether the setup seems reasonable.
Think we should also test this locally with as many different models as possible prior to merge to see if the current solution is robust.

@alifbe
Copy link
Collaborator

alifbe commented Feb 2, 2024

@daniel-sol could you please rebase your PR to main branch? I think the workflow is not triggered anymore since your PR went stale.

@alifbe alifbe requested a review from rnyb February 2, 2024 09:14
@anders-kiaer
Copy link
Collaborator

Will have a chat with @sondreso on the GENOBS topic in February. Suggestion from Daniel and myself is that we wait with merge + adding GENOBS support until after that meeting 🙂

@anders-kiaer
Copy link
Collaborator

anders-kiaer commented Mar 20, 2024

FYI: Summary of yesterday's workshop (which included representatives from ERT, fmu-dataio, SUMO, webviz and subscript) on observations is summarized in #683 by @daniel-sol and myself.

@anders-kiaer
Copy link
Collaborator

This PR will be superseded by a new PR currently in the making.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants