Reformat BIOS ancillary information #281
Comments
@abhaasgoyal Do we want to use this issue to document the work for the gridinfo file? It probably needs some updating with our current plan. |
This work will be performed using the act9test-serial-no-luc test case. We will work from the CABLE-POP_TRENDY branch directly. We need a first control run using the original binary inputs. As we convert inputs to netcdf format, running the same test case should give us identical results. |
Current implementation plan:
@SeanBryan51 could you check whether there was anything to add there |
This looks good to me. In step 1 we can do a regression test to test the input files were converted correctly to NetCDF and in step 2 we can do a regression test on the combined input files (the gridinfo file). We will have to look more into step 3 and make sure values initialised from the gridinfo are not being overwritten by other sources (e.g. a restart file). |
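A minimal sketch of the kind of regression check described above, assuming the control run and the run with converted inputs each write an output file that can be compared byte for byte. The function names are hypothetical, and a whole-file comparison will also flag differences in metadata (for example a history attribute), so a real check may need to compare variable by variable instead.

```python
import hashlib


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def bitwise_identical(control_path, candidate_path):
    """True when both files contain exactly the same bytes."""
    return file_digest(control_path) == file_digest(candidate_path)
```

Hashing in chunks keeps memory use flat for large output files.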
@abhaasgoyal I'm happy to look through the code with you about the precedence of inputs. We might be able to get the answer quickly. Doing it with you allows you to understand where it comes from. |
Requirements:
Context
|
I'm currently looking into the regridding of a
|
Question 1: Place the centres to the offset point by half the resolution. This means we avoid having to deal with half gridcells in latitude at the poles, since the cell boundary lies on -90 and +90°. And that simplifies the regridding (see response to 2 below). Question 2: Going with the answer of question 1, all the grid cells in the new grid at 0.05° will fit entirely within a grid cell at 1°. So we can use nearest neighbour for all interpolation. All the quantities are given per m2 when they depend on the area. Obviously, if a new grid cell straddled several grid cells that would be more difficult. |
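To illustrate the two answers above, here is a small sketch (not the regridding script used in this issue) assuming the coarse and fine grids share their outer edges and the refinement ratio is an integer. Under those assumptions every fine cell lies inside exactly one coarse cell, so nearest neighbour reduces to copying the parent value. The function names are hypothetical.

```python
def cell_centres(start_edge, resolution, n_cells):
    """Centres of n_cells cells whose first edge sits at start_edge.

    Centres are offset from the edge by half the resolution, so the
    outermost cell boundaries fall exactly on the domain limits.
    """
    return [start_edge + (i + 0.5) * resolution for i in range(n_cells)]


def refine_nearest(coarse, ratio):
    """Refine a 2D grid by an integer ratio.

    Each coarse cell is replaced by ratio x ratio fine cells carrying the
    same value, which is what nearest neighbour gives for nested grids.
    """
    fine = []
    for row in coarse:
        fine_row = [value for value in row for _ in range(ratio)]
        fine.extend(list(fine_row) for _ in range(ratio))
    return fine
```

For a global latitude axis at 0.05°, `cell_centres(-90.0, 0.05, 3600)` starts near -89.975 and ends near 89.975, so no half cells are needed at the poles. Copying values is only appropriate because the area-dependent quantities are given per m2.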
Currently we have 4 functions to convert reading BIOS inputs in binary form to NetCDF:
/g/data/rp23/experiments/2024-04-17_BIOS3-merge/BIOS3_forcing/gridinfo_inprogress/convert_gridinfo_Aust.sh We see that the script preprocesses the input doing the necessary calculations. We plan to work with reading ancillary data purely from NetCDF files one function at a time considering precedence. Currently going for soil parameters in To check precedence (in increasing order), currently, the order of reading parameters is:
Now, putting all ancillary variables into the gridinfo file means that the metfile would have the highest precedence. We want to keep these values only when the parameters in met forcings are better. Hence, we decided on having an additional flag for reading Before further updating the issue, @har917 I had some questions: Q1: There is a comment in |
A couple of comments. most importantly: I suspect that On 1. BIOS requires some additional gridded inputs - it is a software engineering choice as to whether these should form part of a single BIOS_gridinfo file or as a separate netcdf input. My position is that it is better to combine into one for input management purposes (other applications of CABLE won't necessarily have to read in the additional layers so it shouldn't break them) - however this may come down to exactly where those inputs are read in by the code. Currently the separate files are read in at separate locations to the gridinfo - it may be better to retain that separation and retain separate input files. On 2. I would hope that we can avoid having to have additional QA steps on the inputs inside CABLE. It creates confusion/lack of transparency - these additional QA steps are better placed as comments in the netcdf metadata in my mind*. On 3. I don't have good visibility of what gets written into the restart file in terms of ancillary values. I wouldn't be surprised if none/not all of the BIOS gridded ancillaries make their way into the restart (NB there are gridded ancillaries and parameter values derived from gridded ancillaries to be aware of - especially in CASA). I think I would prefer if values in the restart were to take precedence over the ancillary data in the case of a difference (there are some use cases where it could be useful to go the other way round - think parameter estimation, predictability or sensitivity studies - but I think those can be handled easily outside as a preprocessing step on an existing restart) Perhaps the way forward - in the longer term - is to implement a check with a warning to log if a difference is found? *Having said that we've gone and added some new ones recently due to path-of-least-resistance issues. |
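The suggestion of a check that logs a warning when sources disagree could look roughly like the sketch below. It is written in Python for illustration only (CABLE itself is Fortran) and assumes a precedence order of gridinfo, then met file, then restart; the source names, the `resolve` function and the scalar values are all hypothetical.

```python
import logging

logger = logging.getLogger("ancillary_precedence")

# Assumed order, lowest to highest precedence.
PRECEDENCE = ("gridinfo", "met", "restart")


def resolve(name, sources, tol=0.0):
    """Return (value, source) for the highest-precedence source providing name.

    sources maps a source name to a dict of variable values. A warning is
    logged whenever a later source overrides a different earlier value.
    """
    chosen = None
    chosen_from = None
    for source in PRECEDENCE:
        values = sources.get(source, {})
        if name not in values:
            continue
        new_value = values[name]
        if chosen_from is not None and abs(new_value - chosen) > tol:
            logger.warning(
                "%s: %s value %r overrides %s value %r",
                name, source, new_value, chosen_from, chosen,
            )
        chosen, chosen_from = new_value, source
    if chosen_from is None:
        raise KeyError(name)
    return chosen, chosen_from
```

With this layout, reversing the precedence for a sensitivity study only requires reordering `PRECEDENCE`, which matches the idea that such cases are better handled outside the main read path.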
On 1:
What are you worried about? Or why do you need that capability? For the moment, we have identified the restart and the met file as potential alternate source for the data that are read after the gridinfo file but before the BIOS reading call. Ian, you agree that the restart should take precedence so changing the behaviour here would be ok. I would like to try and read the gridinfo file in one location and see if we get the same values in the corresponding CABLE variables as when reading the BIOS files. If we don't get the same values throughout the code then we know something else is overwriting these values and we can hunt down what is happening. If we get the same values, then we don't need to complicate things. On 2: great, we agree. All pre-processing in input file creation. @SeanBryan51 Ian above is saying BIOS is using a gridinfo file at 0.05°, you talk of starting from a 1° gridinfo file. Need a check? |
I don't think we need that capability per se. However, changing where/when the code is currently reading inputs risks mixing up the process of converting the inputs, and reading them in, and (in this case) porting them around the code base. @SeanBryan51 For clarity the BIOS gridinfo file referenced is at 0.5 degree resolution - it may be global (land north of >60S) - and has been set up to provide default initial conditions for a 4 tile version of CABLE (primary forest, secondary forest, grass, crop). The usual gridinfo file provides default initial conditions for 17 tiles (if memory serves). |
@abhaasgoyal here is the script I used to generate the 0.05 degree gridinfo: https://gist.github.com/SeanBryan51/f89a30ee1e22495965dccb8fdff0f8c4 The gridinfo file I used for the input is the same as what is used in the |
@SeanBryan51 I had a look at the regridding script for gridinfo. It looks alright. I've tried the script on the 0.5° gridinfo under I had a side-by-side look at the data from the original gridinfo and the new one. I selected the same region on the new one first and then plotted with ncview. All the variables seem to have the same values and same patterns. The patchfrac variable is extremely weird but that's true of the original data. The files differ because the regridding adds:
|
@ccarouge Thanks for trying out the regridding script. I have tested CABLE (act9test-serial-no-luc) with the 0.05 degree gridinfo. Model output using the new gridinfo is bitwise identical with the output from the original configuration (provided we ignore the Footnotes
|
@ccarouge @SeanBryan51 - good to hear of progress. The |
@ccarouge Some of the reads are done into the LUC code, where do we want to put these layers? Identify what is read. |
@ccarouge I think I need a bit more detail as to the question here. However, we have to be careful about interpreting routine names - this is because the PFT distribution and land use change code is interwoven (and somewhat oddly named). Quite where the actual read should take place is a different question - I would be looking to see how TRENDY_POP does this (or similar) and follow the same pathway. This may require carrying some irrelevant layers (filled with zeros) around the code base for non-BIOS simulations - but that's probably better than having multiple reads of the same file. |
Thanks Ian. Looking at the details here is what I have:
So we would want these 2 BIOS layers to get into the ClimateFile but I'm guessing this file is created by CABLE in the "do climate" phase? Is that correct @har917 ? Or would we want a way to overwrite the data from the ClimateFile? The handling of fracC4 (why the variable for the filename in the file is C4frac_file 😢 ) is also more problematic as it's not the same variable as what POP_TRENDY uses. I would think we could read fracC4 or mtemp_min20 depending if bios is on or off? The question is whether the ClimateFile should have both fields in or whether we should have 2 files: one for POP and one for BIOS?
More details on the differences between fracC4 and mtemp_min20: both these variables are used to determine whether the grass at a point is a C3 grass or a C4 grass (they have different photosynthesis). CABLE/offline/cable_LUC_EXPT.F90 Lines 650 to 654 in 7326db3
while mtemp_min20 is used here: CABLE/offline/cable_LUC_EXPT.F90 Lines 575 to 579 in 7326db3
|
I'm looking into adding the BIOS ancillaries1 to the 0.05 degree gridinfo file and it looks like there is a slight mismatch in the lat lon grids between the gridinfo and the BIOS ancillaries:
To combine the ancillary inputs with the gridinfo file, should we remap the BIOS ancillaries to the grid defined in the gridinfo file? In this case we require the BIOS ancillaries to be defined on the full Australia domain (i.e. Footnotes
|
I was overthinking it, the missing grid points in the BIOS ancillaries don't contain any land so we can set the value of each variable to its missing value over these grid points before remapping to the gridinfo grid. |
@SeanBryan51 quick responses
We may also have to look into the binpackandsample routine to see if this does different things for different variables/input layers. |
@ccarouge referring to ^^^^ You've picked up a fundamental difference in how POP_TRENDY and BIOS work - and it is complicated because of the multi-stage aspects to POP runs. This likely requires a specific CASE() to be devised during this merge.
Since mtemp_min20 and POP_TRENDY biome are climate related these vars get put into the ClimateFile not in the gridinfo file. I think my position on the way forward is (but I will have to check the sequencing) that
There is a bit of uncertainty in my head as to when these variables come from the ClimateFile and cable_bios_load_biome()/cable_bios_load_c4frac() vs when they come from one of the restart files (likely POPLUC). It could be that it's a case that i) read from ClimateFile or via MVG/C4frac then ii) if found in the ClimateFile overwrite. |
Once we have a gridinfo file and the BIOS information on the same grid (without the half-grid cell shift), do we need to ensure all the data is using the same land mask? This means a grid cell is land if both the original gridinfo and the BIOS variables have valid data for that point. That would mean the BIOS data would lose some islands and some precision in some caps. Do we want this to happen @har917 ? @har917 Do you have some output from an Australian-wide BIOS run? Just to check what the coastline and islands look like. Or @AlisonBennett . This is just to see the output grid, we don't care about the variables and values in the file. |
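For concreteness, the "land if both are valid" rule described above can be sketched as follows, assuming both fields are already on the same grid and use a common missing value. The fill value and function names are hypothetical, and this is not taken from the scripts referenced in this issue.

```python
MISSING = -9999.0  # hypothetical fill value


def is_valid(value, missing=MISSING):
    """A value is valid when it is neither NaN nor the missing value."""
    return value == value and value != missing


def common_land_mask(gridinfo_field, bios_field, missing=MISSING):
    """True where both fields hold valid data, False elsewhere."""
    return [
        [is_valid(g, missing) and is_valid(b, missing) for g, b in zip(g_row, b_row)]
        for g_row, b_row in zip(gridinfo_field, bios_field)
    ]


def apply_mask(field, mask, missing=MISSING):
    """Set every cell outside the mask to the missing value."""
    return [
        [value if keep else missing for value, keep in zip(row, mask_row)]
        for row, mask_row in zip(field, mask)
    ]
```

Because the mask is an intersection, any island or coastal cell present in only one of the two sources is dropped, which is the loss of detail the question above is asking about.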
@ccarouge As far as I know we have never run BIOS at the full 0.05 degree resolution (we have run AWAP using most/all the same inputs). This is mostly because of resourcing/set up - in that I understand it's impossible to run BIOS on gadi with enough memory/run time. From memory/discussions with Peter Briggs - the model is fairly ruthless in that it won't run if any of the inputs aren't valid (but I don't know the detail as to how it does this in practice). The main cause of concern wasn't islands/coasts but inland areas (Lake Eyre) where the input soil ancillaries were/are incomplete. We could try to run a |
|
If I remember correctly, the current approach produces NaNs. I wouldn't use the default gridinfo - that will be inconsistent (especially if we get some soil parameters from BIOS and the others from the gridinfo). What may be better to do is gap-fill using nearest neighbour from the BIOS data and somehow note which cells have been filled. |
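A rough sketch of gap-filling by nearest neighbour while recording which cells were filled, assuming a small 2D field and distances measured in grid-index space. The function name, the missing value and the optional `fill_where` mask are hypothetical, and the brute-force search is only meant to show the idea, not to scale to the full 0.05° grid.

```python
def gap_fill_nearest(field, missing=-9999.0, fill_where=None):
    """Fill missing cells from the nearest valid cell.

    Returns (filled, flags) where flags marks every cell that was filled.
    If fill_where is given, only cells where it is True are filled.
    Ties in distance are resolved by row-major order of the donor cells.
    """
    def valid(value):
        return value == value and value != missing  # value != value for NaN

    n_rows, n_cols = len(field), len(field[0])
    donors = [
        (r, c) for r in range(n_rows) for c in range(n_cols) if valid(field[r][c])
    ]
    filled = [list(row) for row in field]
    flags = [[False] * n_cols for _ in range(n_rows)]
    if not donors:
        return filled, flags
    for r in range(n_rows):
        for c in range(n_cols):
            if valid(field[r][c]):
                continue
            if fill_where is not None and not fill_where[r][c]:
                continue
            dr, dc = min(donors, key=lambda d: (d[0] - r) ** 2 + (d[1] - c) ** 2)
            filled[r][c] = field[dr][dc]
            flags[r][c] = True
    return filled, flags
```

The `flags` array could be written alongside the filled variable in the output file so that filled cells remain traceable.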
We should try and quantify the impact of changing the ancillaries due to the derived_parameters bug. |
Update: I have managed to bitwise reproduce the output created with a gridinfo containing only the BIOS |
Update: there was a bug in regrid_aus_05x05_to_005x005.sh where I was applying the wrong limits for the |
The gridinfo file with the BIOS information in it is ready. It was created with the regrid_aus_05x05_to_005x005.sh script. Next step: run Australia-wide at 0.25° resolution, or a 1000 points simulation for 1 year? We need to see how to do it. |
I did some quick testing of the BIOS gridinfo file for the 1000pts config with the BLAZE_9184 branch (commit 7dead54). CABLE errors due to changes in the number of patches and is incompatible with the existing restart files. My guess is this is likely due to the half grid-cell misalignment between the gridinfo grid and the BIOS grid resulting in ambiguous nearest neighbour interpolation of the gridinfo file within CABLE. It was probably pure luck that the shifted gridinfo file was bit reproducible for the act9 configuration. I'm thinking the next steps around this would be to apply a fix to the nearest neighbour interpolation of the gridinfo file in CABLE such that we have bit reproducibility before and after for runs using the shifted gridinfo file. Context: earlier on when I was looking into how best to shift the gridinfo file to the BIOS grid, I found running CABLE with a shifted gridinfo file (in arbitrary directions in |
It would be good to dig into this a little more deeply - is this because the half-grid move means that we get changes on the coast where we have meteorology and all soil parameters available? Or has something more fundamental happened that means we're getting a different number of patches for grid cells in the interior? The first case should be relatively easy to handle since the input meteorology files are more extensive than actually provided into the binary files - if the latter then this is more problematic (and I can't think why it could happen). The way to test would be to run without using the CABLE restart file - and let the model figure out the number of patches it wants to use and then compare. |
@har917 Here is an example to explain what happens with the grids. The gridinfo at 0.5° has these grid cell centres for latitude (similar story with longitude so just showing one here): The BIOS grid has the following grid centres: Now the way this works in CABLE:
We end up with a random choice of gridinfo grid cell for these BIOS grid cells. Some might take the point that is to the East or North-East or South or West etc. When we regrid the gridinfo file onto the BIOS grid, we have to choose one direction to shift the map towards. It will be the same direction for the whole grid, not a random direction that is different for different points like in CABLE. As a result, we end up with some points using the information from the gridinfo file from a neighbouring grid cell in the new setup compared to the old which can have a different number of tiles. And this can happen randomly throughout the whole map. |
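The ambiguity described above can be reproduced with a toy one-dimensional nearest neighbour search. The coordinates below are hypothetical and chosen only so that a target point sits exactly halfway between two source centres; this is not the CABLE search routine.

```python
def nearest_index(centres, x, prefer="first"):
    """Index of the centre nearest to x.

    Exact ties are broken by `prefer`: "first" keeps the lowest index,
    anything else keeps the highest.
    """
    best = min(abs(c - x) for c in centres)
    ties = [i for i, c in enumerate(centres) if abs(c - x) == best]
    return ties[0] if prefer == "first" else ties[-1]


# Hypothetical source centres and a target exactly halfway between two of them.
centres = [0.0, 0.5, 1.0]
halfway = 0.25

west = nearest_index(centres, halfway, prefer="first")
east = nearest_index(centres, halfway, prefer="last")

# A perturbation at rounding-error scale is enough to decide the outcome.
nudged_up = nearest_index(centres, halfway + 1e-12)
nudged_down = nearest_index(centres, halfway - 1e-12)
```

Here `west` and `east` differ although the inputs are identical, and the nudged points fall on opposite sides. A regridding script applies one tie-breaking direction everywhere, so it cannot reproduce a selection that varies from point to point.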
For BIOS (i.e. with POP active) I don't think this is true - at least in the absence of land-use change. BIOS only ever has (up to) 3 tiles per grid cell and this decision (i.e. 2 or 3 tiles per grid cell) is independent of the soil ancillaries. Coastal cells/soil ancillary gaps are a potential counter example - since if there are no valid soil ancillaries at the grid cell, BIOS will remove those tiles from the vector of land points even if there is valid meteorology available (or it will crash in a different way). Shifts in the gridinfo could then lead to a different number of tiles/land points. However - if we are testing with a non-POP configuration then, I agree, shifts can change the number of tiles. |
Update: I ran
Normally, this should not create any difference in the results since the new gridinfo should have the same ancillary data after processing inputs from binary files in
We suspect that it may be a bug in BIOS due to CABLE/offline/cable_parameters.F90 Line 1343 in e190b23
Further investigation is needed on whether this is a bug in BIOS. Once this is fixed, we can have assurance on values contained in the new gridinfo file. Addendum: I also ran the |
@abhaasgoyal I think this is a matter of definition - it appears that the gridinfo file has been set up to specify -sucs whereas the BIOS binary and the methods that specify these parameters via namelist give sucs itself. I didn't spot that change in sign in All very confusing - especially when you bring the vectorised soil properties (i.e. the I'm a bit surprised that the -ve sign didn't show up in other testing but that may be because BIOS uses a set of science options where |
After checking, this is because the gridinfo file expects positive values for So, @abhaasgoyal is going to modify the creation script for gridinfo to swap the sign of the input before writing to gridinfo. It seems like a lot of sign changes applied on top of each other but we can't remove that from CABLE as that would break things for all the other gridinfo files in use. |
Update: After updating the However, running the
I'll attempt to debug this on whether the ancillary inputs from binary files vs new gridinfo were processed to be the same, and where it differs in the pipeline. Meanwhile, @AlisonBennett if you could explain how the binary files for |
@abhaasgoyal Since these differences have shown up in an albedo field on the first time step and only in some of the 1000 points - I'm fairly confident that this has something to do with how the different approaches have established some of the parameters (i.e. there's been a slight effective grid shift when establishing the 0.05 degree gridinfo file). I would look at the soilalb in the first instance. |
@har917 We are trying to trace the error by comparing binary ancillary data in 1000pts configuration ( |
Basically we don't/wouldn't - if we (@AlisonBennett and @har917) were to need netCDF files we would go back to the source flt/hdr files that exist on CSIRO's HPC [the binaries are the end of a 1-way chain of processing]. If needed we can rsync the flt/hdr files across to rp23 (and hopefully somewhere is a record of exactly the gdal_translate call that was used) |
To further @har917's comment, once the flt/hdr files are synced across to rp23 they can then be converted to netcdf using gdal_translate.
|
Comment on how the .bin files for subdomains are created from the aust_0.05_pt .bin files. We use a Fortran utility (created by Peter Briggs) called BinFileSubsetAndPack. A copy of this program is stored at: Within the sitelists subdirectory there are csv files that contain the list of points for each of the domains. |
Update We found out that the latitudes/longitudes for the land points were being incorrectly set-up from the gridinfo at some points (mostly based on coastal areas). When we read the ancillary values from the gridinfo, there is an initial check for valid CABLE/offline/cable_parameters.F90 Line 960 in e190b23
We assume that binary ancillary locations were picked up without the additional check present above. Now, when gridinfo parameters were downscaled to 0.05 degree resolution via nearest neighbour interpolation, the coastal edges kept their coarse structure, whereas the BIOS ancillary data already resolved the coast at the finer resolution. This is confirmed by the difference in existing and non-existing points between maps of
The mismatch of existing values in BIOS and non-BIOS parameters posed a problem in some points of
Solution
Goal
Points in non-BIOS fields should exist iff they exist in BIOS fields (since their maps are already finetuned to 0.05 degree resolution including coastal regions). This is the list of non-BIOS variables which were fixed.
Algorithm
Assumption: All BIOS parameters have valid points in the same longitude/latitude. For reference, I took valid values in
Update non-BIOS mapping in 2 stages:
A sample transformation for I will test the updated version gridinfo in |
@har917 @AlisonBennett @ccarouge One weird thing I found out was that |
@abhaasgoyal I had seen Are you asking because now some of the values of patchfrac are over land and crash the run? |
@ccarouge I brought |
New 1000pts simulation with reading BIOS binary but adding call to derive_parameters after call. |
@abhaasgoyal this is the change which updates the derived parameters after setting the BIOS soil parameters via the binary inputs: bfa87d3 |
Differences in the output appear in stage 3. Need to look at differences appearing in CASA in stage 2. |
Reading restarts from stage 2 ( |
Differences were caused due to parameters being modified between reading the gridinfo vs binary ancillary data, see ( Line 2710 in e3b41aa
Moving these calls after final read in
Currently testing whether any differences come up when we read |
Confusion of variables:
|
In the BIOS setup, the ancillary information is read in 2 steps:
The binary file also contains additional variables that are not currently in the gridinfo file.
We need to decide how to deal with this additional information:
Solution 1
It keeps the gridinfo file format constant but adds a precedence rule between files. This makes it less clear to users what the final source of the input information is, and adds cases that need to be handled by CABLE instead of in the pre-processing.
Solution 2
It collates the information into one file, which clarifies the source of the information to the users. However, CABLE currently has rules of precedence between its input files: gridinfo has the lowest priority, then the met files and finally the restart file. It would also require creating different gridinfo files for different configurations, and the format of the gridinfo file would not be constant between configurations.
Problem: If we collate all the BIOS information in the gridinfo file, do we risk using different information coming from the restart file for example? Is this case unlikely and only possible through user error? In the current implementation, does BIOS overwrite the information after reading in only the gridinfo file or after reading in all the other input files?
Implementation
We tend to favour solution 2 but we need to clarify if the identified problem makes it impossible to implement it this way.
A start towards an implementation is here:
/g/data/rp23/experiments/2024-04-17_BIOS3-merge/BIOS3_forcing/gridinfo_inprogress/convert_gridinfo_Aust.sh
Input data for this script is here:
/g/data/rp23/experiments/2024-04-17_BIOS3-merge/BIOS3_forcing/australia_0.05/params/nc