Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

QA/QC - Things to fix #176

Open
24 of 36 tasks
ValentineHerr opened this issue Feb 7, 2019 · 24 comments
Open
24 of 36 tasks

QA/QC - Things to fix #176

ValentineHerr opened this issue Feb 7, 2019 · 24 comments

Comments

@ValentineHerr
Copy link
Member

ValentineHerr commented Feb 7, 2019

Here is the running list of things to fix.

----- PLOTS to revisit by hand -----
(this is warnings from this script)

  • DUKE FOREST LOBLOLLY PINE_MANAGED LOBLOLLY PINE PLANTATION
  • BLANDY EXPERIMENTAL FARM_TEMPERATE DECIDUOUS FOREST. STAND ESTABLISHED AROUND 1914
  • CAXIUANA~ NATIONAL FOREST RESERVE_TROPICAL EVERGREEN FOREST
  • CAXIUANA~ NATIONAL FOREST RESERVE_TROPICAL EVERGREEN FOREST WITH DROUGHT TREATMENT
  • CAXIUANA~ NATIONAL FOREST RESERVE_AGGRADING TROPICAL EVERGREEN FOREST
  • CAXIUANA~ NATIONAL FOREST RESERVE_AGGRADING TROPICAL EVERGREEN FOREST. STAND ESTABLISHED AROUND 1960

----- MEASUREMENTS checks -----

  • there are 0 measurement.ID that are repeated
  • There are 0 measurements with no corresponding plot record
  • There are 0 measurements with no corresponding site in SITES record
  • There are 0 measurement records with undefined PFTs
  • There are 0 measurement records with undefined varibale.name
  • There are 2 covariate_1 in measurements that are not defined in VARIABLES : "min.depth" and "max.height"
  • There are 1 covariate_2 in measurements that are not defined in VARIABLES: "%C"
  • There are 0 measurement records with no corresponding methodology record

----- SITES checks -----

  • There are 0 sites with no corresponding plot record
  • There are 0 sites with no corresponding history record
  • There are 0 sites with no corresponding measurements record

----- HISTORY checks -----

  • There are 0 history records with no corresponding measurements record
  • There are 0 history records with no corresponding sites record
  • There are 3 history records with undefined disturbance category/type combinations
  • There are 0 history records with no corresponding plot record

----- PLOTS checks -----

  • There are 0 plots with no corresponding measurement record
  • There are 0 plots with no corresponding history recordd
  • There are 0 plots with no corresponding site record

----- HISTTYPE checks -----

  • There are hist.type in HISTTYPE that don't exist in HISTORY table (not a big deal)
  • There are hist.cat in HISTTYPE that don't exist in PLOTS table (not a big deal)

----- PFT checks -----

  • There are pftcode in PFT that don't exist in MEASUREMENTS table. ("2VW", "2GW", "2FORB", "2LTR", "2RF", "2RB") (not a big deal)

----- METHODOLOGY checks -----

  • There are method.ID in METHODOLOGY that don't exist in MEASUREMENTS table. (not big deal)
  • method.notes is not in METHODOLOGY table but is mentioned in METHODOLOGY_metadata

----- VARIABLES checks -----

  • There variables not used in MEASUREMENTS table ("NPP_4_OM", "NPP_5_OM", "NPP_woody_OM", "NPP_litter_OM", "ANPP_litterfall_2_C", "BNPP_root.turnover_fine_OM", "R_het_deadwood", "R_het_ag_C",
    "R_het_C). It is probably ok to leave them, but I am still mentioning in case records have been accidentally deleted.

----- ALLOMETRIES checks -----

  • There allometric.equation not used in MEASUREMENTS table. (not a big deal)

----- MEASUREMENT MEAN VALUE check against range in VARIABLES -----

  • There is 618 measurements falling out of variable range
  • There is 58 covariate_1 value(s) falling out of variable range
  • There is 19 covariate_2 value(s) falling out of variable range
  • There is 87 value(s) from a variable that has no defined range (this is probably the first record for that variable)

---- DUPLICATE RECORDS ------

  • There are D.precedences to give manually (mostly SRDB data, but 2 records from GROA)
@teixeirak
Copy link
Member

teixeirak commented Feb 7, 2019 via email

@ValentineHerr
Copy link
Member Author

ValentineHerr commented Feb 7, 2019

  • @teixeirak, all the missing method.ID are for stand.density, and I am assuming method.ID is correctly NA there, right ? If yes, I'll adapt the check script to ignore this situation.

@ValentineHerr
Copy link
Member Author

  • @teixeirak here are the hist.type not defined in HISTTYPE table:
hist.cat hist.type
Management Warming_soil
Management Warming_air
No.disturbance No major disturbance
Regrowth Initiation of post-disturbance cohort (planted)
No.disturbance NAC

@ValentineHerr
Copy link
Member Author

@teixeirak and @beckybanbury,

I created this file (that you probably have to open in your web browser from your computer, after pulling) that shows records that are outside of the range of values previously calculated for each variable.

I haven't had a close look yet but I thought I would mention it to you right away so that you can let me know what you think and maybe check some of the records (red dots with measurement.ID mentioned)

@teixeirak
Copy link
Member

This is helpful; thank you @ValentineHerr.

@beckybanbury, we do not need to review all of these records-- only the ones that are significant outliers. I'll leave this to you, but please let me know if there are any you'd like me to look at.

@teixeirak
Copy link
Member

-Warming was in there as Warming_X, with X indicating soil or air. I fixed it to make coding easier.
-There is a No.disturbance, "No severe disturbance". These should be the same. Let's replace "major" with "severe"
-Regrowth should be one of the following:

Regrowth(_prior) | Initiation of post-disturbance cohort (planted or natural)
Regrowth(_prior) | Planted

  • How many No.disturbance NAC are there? That's not a current category, although I suppose its fine (other than messing up the code)

@teixeirak
Copy link
Member

teixeirak commented Feb 8, 2019

yes, correct (regarding question on missing ID for stand density)

@ValentineHerr
Copy link
Member Author

ValentineHerr commented May 10, 2019

@teixeirak,
here are few things that need to be done or answered.
Note for myself: I created this list after importing SRDB data and updated all files and figures.

  • give D.precedence to record that have "NAC" there (SRDB data, pending on this issue)

  • replace NEP and NPP_litter to NEP_C and NPP_litter_C for SRDB. And Ideally, find out why it did not come with the "_C" in the first place... (@ValentineHerr)

  • define hist.type not defined in HISTTYPE table (might not be necessary, see below table) (@teixeirak) :

hist.cat hist.type
No.disturbance No major disturbance
Regrowth Initiation of post-disturbance cohort (planted)
Management Warming
  • should "No major disturbance" be replaced by "No severe disturbance", which already exists ?

  • is "Initiation of post-disturbance cohort (planted)" a special case of "Initiation of post-disturbance cohort (planted or natural)" or should it have its own case ?

  • we have warming_air and warming_soil... I think all instances of Warming are from SRDB so maybe I can replace them all by warming_soil ?

  • some records are:

hist.cat hist.type
No.disturbance NAC
  • Should "NAC" be replaced by "No severe disturbance" ? (@teixeirak)

  • I dont't rememeber "Agriculture_generic" hist.type not being used in History before. Now it is coming out in the QA/QC. I might not have noticed it before bu do you know of any record with that history type that have been deleted? (@teixeirak )

  • check range of records checking_range_of_records.html (see this comment) (@beckybanbury)

  • has method.notes been removed from methodology table intentionally ? (@teixeirak)

  • this figure splits the data into tropical and extratropical. We do not have that column anymore. Can you remind me of the lat threshold to determine it again? (@teixeirak)

@teixeirak
Copy link
Member

* [this figure ](https://github.com/forc-db/ForC/blob/master/figures/Histogram_of_Stand_age.png) splits the data into tropical and extratropical. We do not have that column anymore. Can you remind me of the lat threshold to determine it again? (@teixeirak)

23.5

@teixeirak
Copy link
Member

* has method.notes been removed from methodology table intentionally ? (@teixeirak)

I'm confused... its there in the version I'm looking at. It has value NA for all SRDB records.

@teixeirak
Copy link
Member

* I dont't rememeber "Agriculture_generic" hist.type not being used in History before. Now it is coming out in the QA/QC. I might not have noticed it before bu do you know of any record with that history type that have been deleted? (@teixeirak )

I believe we added this recently-- perhaps for GROA import? You can leave it ehre.

@teixeirak
Copy link
Member

* give D.precedence to record that have "NAC" there (SRDB data, pending on [this issue](https://github.com/forc-db/ForC_private/issues/18))

By, "there", do you mean in methodology? If so, please don't automatically assign precedence there; those should get manual review (for now).

teixeirak added a commit that referenced this issue May 13, 2019
relates to issue #176
@teixeirak
Copy link
Member

  • should "No major disturbance" be replaced by "No severe disturbance", which already exists ?

YES

  • is "Initiation of post-disturbance cohort (planted)" a special case of "Initiation of post-disturbance cohort (planted or natural)" or should it have its own case ?

These are separate cases. The first indicates that we know the stand was planted. The second indicates that its ambiguous whether regen was natural or planted.

  • we have warming_air and warming_soil... I think all instances of Warming are from SRDB so maybe I can replace them all by warming_soil ?

In some cases it was ambiguous what type of warming was applied. I added the category "warming" to HISTTYPE

@ValentineHerr
Copy link
Member Author

* give D.precedence to record that have "NAC" there (SRDB data, pending on [this issue](https://github.com/forc-db/ForC_private/issues/18))

By, "there", do you mean in methodology? If so, please don't automatically assign precedence there; those should get manual review (for now).

By "there" I mean in D.precedence. I just meant that we need to give D.precedence manually to some records, once we know how to rank the different methodologies coming from SRDB.

@ValentineHerr
Copy link
Member Author

ValentineHerr commented May 13, 2019

is "Initiation of post-disturbance cohort (planted)" a special case of "Initiation of post-disturbance cohort (planted or natural)" or should it have its own case ?

These are separate cases. The first indicates that we know the stand was planted. The second indicates that its ambiguous whether regen was natural or planted.

Ok, I add a category for "Initiation of post-disturbance cohort (planted)" to HISTTYPE

Good, thanks.

@teixeirak
Copy link
Member

By "there" I mean in D.precedence. I just meant that we need to give D.precedence manually to some records, once we know how to rank the different methodologies coming from SRDB.

Okay, please incorporate Ben's feedback and let me know if there are any left for me to review.

@ValentineHerr
Copy link
Member Author

Forgot to mention I updated the list in the first comment of this issue. all issues are gone except 3 covariate variable names, imported with GROA, that need to be defined. "%C", "min.depth" and "max.height".

@ValentineHerr
Copy link
Member Author

There are also a few potential duplicate records that need to be resolved manually (mostly left over from SRDB import, and a couple that showed up with GROA, that I guess we missed).

@ValentineHerr
Copy link
Member Author

@teixeirak, I think with one of your last update there is a disturbance ("cut") that got added in the data but not defined:

image

@teixeirak
Copy link
Member

@teixeirak, I think with one of your last update there is a disturbance ("cut") that got added in the data but not defined:

Oh, shame on me! It just needs to be capitalized.

@ValentineHerr
Copy link
Member Author

@teixeirak, a few things flagged in the QA/QC.

  • There are 2 covariate_1 in measurements that are not defined in VARIABLES : "min.depth" and "max.height"
  • There are 1 covariate_2 in measurements that are not defined in VARIABLES: "%C"
  • There are history records with undefined disturbance category/type combinations: cat = "Disturbance" type = "cut"
  • measurement.ID 25308 has "NA" in mean

Also,

  • I think it would be worth reviewing the values that are outside of range in this file. I'll update the range for the easy ones so maybe you can focus on the obvious issues.

@ValentineHerr
Copy link
Member Author

I resolved the issues and deleted the previous comment.
Only fixed the sites issues but there are 3 citations missing in CITATIONS: "Grier_1977_opmc", "Bartholomew_1953_mniu", "CUEVAS_1991_aabo" that probably @mawilliams99 can work on when there is time.

I don't think those citations were missing before... the citation.id in CITATIONS may have been changed so the solution may just be to update the ciation.id in MEASUREMENTS.

@teixeirak
Copy link
Member

Thanks, @ValentineHerr ! I agree that a CI for this repo would be great, including the QA/QC checks that we have written and updating ForC simplified/ figures. It's definitely worthwhile to set up now if it won't take to much time, as that will make this whole project easier.

@ValentineHerr
Copy link
Member Author

@teixeirak and/or @mawilliams99, on the main page of this repo, there is a lit of errors and warnings that now appear in red.
It would be great to fix the errors before I update ForC_simplified. (warnings are not important).

Let me know if the error reports are self explanatory. I thought I could fix them myself but I think you would be more suited to do it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants