Skip to content

Commit

Permalink
update test data statement
Browse files Browse the repository at this point in the history
  • Loading branch information
jpiaskowski committed Oct 3, 2023
1 parent 884b6d3 commit 8a2bd47
Showing 1 changed file with 11 additions and 0 deletions.
11 changes: 11 additions & 0 deletions test-data.qmd
Original file line number Diff line number Diff line change
Expand Up @@ -19,3 +19,14 @@ Soil properties such as organic carbon can be understood through two versions of
Point data represent a particular location in space that has been excavated, soil morphology described, and sampled for laboratory characterization-perhaps describing a 3x3 foot area excavated to 4-6 foot depth. It is best understood as a measurement, one that accurately describes the point from which it is taken. There are well over 80,000 points where laboratory measurements have been taken and form an important part of the foundation upon which the U.S. Soil Survey is built. But, overall, these points are sparsely dotted across the United States, because this is a big country and laboratory sampling is expensive. The [NCSS Laboratory Characterization Database](https://ncsslabdatamart.sc.egov.usda.gov) is based on samples gathered over many decades by various organizations, and possibly for many different objectives. Some of these data are incomplete, perhaps only composed of the upper soil horizons, while other data that span all soil horizons may be missing specific characterization data such as bulk density, cation exchange capacity, or optical sand grain counts.

The spatial and tabular data delivered via [Web Soil Survey](https://websoilsurvey.nrcs.usda.gov/app/) (also known as SSURGO) are referred to as aggregated data. This version of reality (more generalized) of soil data relies on extensive field investigation, point data, and tacit models of "soil landscape relationships" to build a continuous fabric of soil geography. The level of generalization for any given area ("soil survey order" or "level of investigation") is dependent on expected land use decisions (e.g. intensive farming vs. rangeland) and available resources (staff and time). The order of mapping gives an indication of how collections of soils are grouped within a soil mapping unit (e.g. conceptual generalization) and an approximate minimum mapping unit (e.g. spatial generalization). See chapter 4 of the [Soil Survey Manual](https://www.nrcs.usda.gov/sites/default/files/2022-09/SSM-ch4.pdf) for a much more detailed accounting of survey order. Aggregate soil survey data, unlike point data, is far more complete in terms of space and available data elements, and covers most of the United States. However, the spatial and conceptual generalization of these data can be too coarse to resolve fine details within small areas (<2-10 acres, depending on survey order).

## Final Thoughts

This is why there is no "true data" or testing data set you can use to find out if your approach is correct. The point estimates are the closest to truth, but the very methods by which soil carbon is estimated across a landscape is a massive challenge. That is the purpose of this Hackathon:

- to explore alternative ways to estimate soil carbon;
- to interpolate soil carbon across a landscape;
- to combine data measured on different spatial scales relevant to soil carbon estimation; and/or
- to use this information to predict soil carbon across a landscape and to depth.

Any one of these options would be an appropriate way to address the Ag Hackathon topic.

0 comments on commit 8a2bd47

Please sign in to comment.