check precision of site coordinates #229

Open
teixeirak opened this issue Apr 6, 2021 · 30 comments

Comments

@teixeirak
Member

teixeirak commented Apr 6, 2021

@Troger4 , here are instructions for checking the precision of site coordinates. Please try these out and let me know if they make sense to you. If you have any doubts, please ask before proceeding.

  1. Ensure that you've pulled the latest version of ForC, then open ForC_sites.
  2. Focus on the column coordinates.precision (K in Excel).
  3. Find any site with NAC in this field, then find the study or studies with records at this site in measurement.refs. There may be additional info on the site in site.refs. It would also be valuable to check the original source of anything with (minutes) in this field, as that indicates that the current precision is very low (and hopefully the original source has something better). Open the reference (you already know how to link citation.ID to the PDF). If there's more than one reference, it may be helpful to look at several, but don't go overboard if there are tons of them.
  4. Find the description of a site with a name corresponding to (but not necessarily exactly matching) the one you're working on, specifically its geographic coordinates. The description is usually near the beginning of the methods, but could be in a table or even in the supplementary info.

Converting degrees-minutes-seconds to decimal degrees:

  • Decimal degrees = Degrees + (Minutes/60) + (Seconds/3600)
  • For latitude, N = positive, S = negative.
  • For longitude, E = positive, W = negative.
  • If you're not confident about part of this, google for further guidance.
  5. Verify that the coordinates entered in ForC match those given in the paper. If they don't match, please correct them.
  6. In coordinates.precision, enter the precision reported in the original pub (which should now match what's in ForC). Please enter exactly one of the following (the label before the dash):
  • degree - rounded to nearest degree or rough fraction of a degree (e.g., .167, .25, .33, .5, .67, .75, .83, or to just one decimal point);
  • minute - reported to the nearest minute in original source (increments of 0.01667);
  • second - reported to the nearest second in original source;
  • fraction of second - reported to fraction of second in original source;
  • decimal degrees to [n] digits - reported in decimal degrees to n digits (where n is the minimum precision of the two coordinates);
  • other (see geography.notes) - unusual cases (e.g., composite of multiple sites), as detailed in geography.notes.
  7. If you run into a case that doesn't make sense, skip it and make a note that it requires review.
  8. If the study has multiple sites and you can confidently link those to others reported in the study, it will be most efficient to get all those sites at the same time. You can identify them by filtering the measurement.refs field for the citation.ID you're looking at.
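
The degrees-minutes-seconds conversion above can be sketched in Python as a quick check on hand calculations. This is only an illustration: the function name `dms_to_decimal` and its arguments are hypothetical and not part of ForC.

```python
def dms_to_decimal(degrees, minutes=0.0, seconds=0.0, hemisphere="N"):
    """Convert degrees-minutes-seconds to signed decimal degrees.

    hemisphere is one of "N", "S", "E", "W"; S and W give negative values.
    (Hypothetical helper for illustration, not a ForC function.)
    """
    hemisphere = hemisphere.upper()
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError(f"unknown hemisphere: {hemisphere!r}")
    # Decimal degrees = Degrees + (Minutes/60) + (Seconds/3600)
    value = abs(degrees) + minutes / 60 + seconds / 3600
    return -value if hemisphere in ("S", "W") else value


print(round(dms_to_decimal(30, 40, 0, "N"), 4))   # 30.6667
print(round(dms_to_decimal(110, 30, 0, "W"), 4))  # -110.5
```

Rounding the result to 4 decimals is an arbitrary choice for display; how many digits to keep when entering values in ForC_sites is not specified here.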
@Troger4
Collaborator

Troger4 commented Apr 8, 2021

I'll try it out today and get back to you if I have any issues. Thanks very much for the written instructions!

teixeirak added a commit that referenced this issue Apr 16, 2021
#229
@Troger4 , it's important to check that the coordinates match what is given in the paper. In the case of Chao_2009_atdq, the paper gave coordinates to the minute, but the values entered were less precise. In cases like this, please correct the values in the spreadsheet (using the deg-min-sec to decimal degree conversion given in the instructions).

I've fixed the Chao coordinates.
@teixeirak
Member Author

@ValentineHerr , it would be helpful if you could write a script (quick, I think) to flag low-precision coordinates for review. We want to identify records at the poorest level of precision:

degree- rounded to nearest degree or rough fraction of a degree (e.g., .167, .25, .33, .5, .67, .75, .83, or to just one decimal point)

What I envision is a script that looks at lat and lon and tests whether both coordinates meet that criterion. When that is the case, fill the coordinates.precision field with "(degree)" ONLY IF its current value is "NAC".

@ValentineHerr
Member

@teixeirak,

Here is an example for each case we have (where precision is "NAC" and we have coordinates). The first two columns are the coordinates in SITES; the last two columns are the number of decimals:

image

To confirm what you are asking, I'll enter "degree" in column coordinates.precision if min(ndec_lat, ndec_lon) <= 3, correct?

@teixeirak
Member Author

Well, it's not quite so simple as looking at n decimals, because sometimes a very rough coordinate could have high n decimals. For example, if coordinates are rounded to the nearest 10 minutes, you could get repeating values after the decimal (e.g., 30°40' --> 30.6666667). I guess the rule I'd apply is:

min(ndec_lat, ndec_lon) <= 1

OR

decimal portion = .25, .75

OR

decimal portion = .167 (1/6), .33 (1/3), .67 (2/3), or .83 (5/6), where there could be any number of repeats in the 6 or 3 (e.g., we'd count .33, .333, .3333, .33333, etc.).
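
A minimal sketch of how these three criteria might be tested for a single coordinate, assuming the coordinate is available as text exactly as entered. The names `looks_rough`, `QUARTERS`, and `REPEATING` are hypothetical, and the regular expression is one possible reading of "any number of repeats in the 6 or 3". How the lat and lon results are combined is a separate question.

```python
import re

# Decimal portions that suggest rounding to a quarter of a degree.
QUARTERS = {"25", "75"}
# 1/6, 1/3, 2/3, 5/6 with any number of repeated 6s or 3s
# (e.g. 167, 1667, 33, 333, 67, 6667, 83, 8333). Assumed pattern.
REPEATING = re.compile(r"16+7|3{2,}|6+7|83+")


def decimal_part(coord_text):
    """Return the digits after the decimal point of a coordinate given as text."""
    return coord_text.strip().partition(".")[2]


def looks_rough(coord_text):
    """True if one coordinate matches any of the three criteria listed above."""
    digits = decimal_part(coord_text)
    return (
        len(digits) <= 1                           # at most one decimal
        or digits in QUARTERS                      # .25 or .75
        or REPEATING.fullmatch(digits) is not None # .167, .33, .67, .83, ...
    )


for text in ["30.6666667", "-110.5", "45.25", "12.3456"]:
    print(text, looks_rough(text))
```

With these inputs, the first three coordinates are reported as rough and `12.3456` is not.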

@ValentineHerr
Member

Ok, I think I got it... but does it seem right that this would be 1788 sites?

Also, I'll be adding "(decimal)" so that @Troger4 can change it to "decimal" once she has checked, right? See this comment.

@teixeirak
Member Author

teixeirak commented Apr 16, 2021

> but does it seem right that this would be 1788 sites?

Yikes, that's a lot--roughly 1/3! I didn't expect the number to be that high. If that's the case, maybe make two categories:

1. "(degree)"

min(ndec_lat, ndec_lon) =0

OR

decimal portion = .25, 0.5, .75

2. "(minutes rounded)"

doesn't meet criteria above

AND

min(ndec_lat, ndec_lon) <= 1

OR

decimal portion = .25, .75

OR

decimal portion = .167 (1/6), .33 (1/3), .67 (2/3), or .83 (5/6), where there could be any number of repeats in the 6 or 3 (e.g., we'd count .33, .333, .3333, .33333, etc.).

@teixeirak
Member Author

> Also, I'll be adding "(decimal)" so that @Troger4 can change it to "decimal" once she has checked, right? See this comment.

This doesn't make sense, unless you meant "(degree)". If that's what you meant, the answer is yes.

@ValentineHerr
Member

> This doesn't make sense, unless you meant "(degree)". If that's what you meant, the answer is yes.

Yes, sorry, that is what I meant.

And OK for the rest, I'll make the change.

@teixeirak
Member Author

Okay. Note that I just made a couple small edits above (so use that, not the email).

@ValentineHerr
Member

Did you really mean min(ndec_lat, ndec_lon) = 0 and not min(ndec_lat, ndec_lon) <= 1? That greatly reduces the number of issues (down to 429).

@teixeirak
Member Author

Yeah, I'd like to separate out the ones rounded to the nearest degree or quarter degree from those rounded to the nearest ten minutes or tenth of a degree. The former could easily fall in the wrong geographic.area.

@ValentineHerr
Member

ok, this part should be done

@teixeirak
Member Author

teixeirak commented Apr 16, 2021

Actually, I'm sorry, I missed something before. We want to identify sites where BOTH lat and lon meet the criteria, as opposed to just one. So, in the screenshot below, the green ones are correctly identified; the yellow ones are falsely identified because just one coordinate meets the criteria.

image

@teixeirak
Member Author

@Troger4 , I added a column at the end of the sites spreadsheet flagging sites with coordinates that have no climate data, generally because they fall in the ocean (see issue #233 ). These are high priority for review. Based on the names, I can tell that at least some are coastal, and so the coordinates may be more-or-less correct, just slightly missing what the climate database considers land. But I also noticed that at least some were flagged by Valentine's script as low precision.

@ValentineHerr
Member

> Actually, I'm sorry, I missed something before. We want to identify sites where BOTH lat and lon meet the criteria, as opposed to just one. So, in the screenshot below, the green ones are correctly identified; the yellow ones are falsely identified because just one coordinate meets the criteria.
>
> image

@teixeirak, what should happen to the orange ones (when only one coordinate meets the criteria), then? Leave NAC?

@teixeirak
Member Author

Correct. My interpretation of those is that a trailing zero on one coordinate was dropped (so, for example, lat on the first one would be 56.60).
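
To illustrate this interpretation: a trailing zero disappears whenever a coordinate is parsed as a number, so the decimal count only reflects the reported precision if it is taken from the text as entered. The sketch below is hypothetical (the helper `count_decimals` is not part of the ForC scripts) and does not claim to describe how the ForC values were actually processed.

```python
def count_decimals(coord_text):
    """Count digits after the decimal point in a coordinate kept as text.

    Hypothetical helper for illustration only.
    """
    return len(coord_text.strip().partition(".")[2])


as_reported = "56.60"
as_number = float(as_reported)         # numeric parsing keeps only the value

print(count_decimals(as_reported))     # 2
print(count_decimals(str(as_number)))  # 1, the trailing zero is gone
```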

@ValentineHerr
Member

So I need to change the statements above to the following (note that decimal portion = .25, .75 in the second part can't work with first part if we say doesn't meet criteria above ).

1. "(degree)"

(
ndec_lat = 0
AND
ndec_lon =0
) OR (
decimal portion lat = .25, 0.5, .75
AND
decimal portion lon = .25, 0.5, .75
)

2. "(minutes rounded)"

doesn't meet criteria above

AND
((
ndec_lat <= 1
AND
ndec_lon <= 1
) OR (

decimal portion of lat AND lon = .25, .75 (can't work with first statement)

OR

decimal portion lat AND lon = .167 (1/6), .33 (1/3), .67 (2/3), or .83 (5/6), where there could be any number of repeats in the 6 or 3 (e.g., we'd count .33, .333, .3333, .33333, etc.).

))

@teixeirak
Member Author

teixeirak commented Apr 19, 2021

The one modification to the above is that the two coordinates don't have to meet the same criteria. For example, for degree, lat =35, lon = -110.5 should be flagged.

So,

(
ndec_lat = 0
OR
decimal portion lat = .25, 0.5, .75
)
AND
(
ndec_lon =0
OR
decimal portion lon = .25, 0.5, .75
).

And similarly for minutes rounded.
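
Put together, the rule might look like the following sketch. It is written in Python for illustration only and is not the script that was pushed to the repo. It assumes coordinates are available as text, and all function names are hypothetical. Treating .25 and .75 as also satisfying the "minutes rounded" test for a single coordinate is an interpretation of the earlier criteria list, not something confirmed in this thread.

```python
import re

# Assumed pattern for 1/6, 1/3, 2/3, 5/6 with repeated 6s or 3s.
REPEATING = re.compile(r"16+7|3{2,}|6+7|83+")


def _digits(coord_text):
    """Digits after the decimal point, taken from the text as entered."""
    return coord_text.strip().partition(".")[2]


def is_degree(coord_text):
    """Whole degree, or a decimal portion of .25, .5, or .75."""
    return _digits(coord_text) in {"", "25", "5", "75"}


def is_minutes_rounded(coord_text):
    """At most one decimal, .25/.75, or a repeating 1/6, 1/3, 2/3, 5/6 pattern."""
    digits = _digits(coord_text)
    return (
        len(digits) <= 1
        or digits in {"25", "75"}  # assumption, see the note above
        or REPEATING.fullmatch(digits) is not None
    )


def flag_precision(lat_text, lon_text, current="NAC"):
    """Return a provisional flag, changing only values that are currently NAC."""
    if current != "NAC":
        return current
    # Both coordinates must qualify, but not necessarily by the same criterion.
    if is_degree(lat_text) and is_degree(lon_text):
        return "(degree)"
    if is_minutes_rounded(lat_text) and is_minutes_rounded(lon_text):
        return "(minutes rounded)"
    return current


print(flag_precision("35", "-110.5"))          # (degree)
print(flag_precision("30.6666667", "-110.5"))  # (minutes rounded)
print(flag_precision("56.6", "12.345"))        # NAC
```

The last case stays "NAC" because only one of the two coordinates meets a criterion.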

@ValentineHerr
Member

ok, thanks!

@ValentineHerr
Member

I pushed a fixed version.

@teixeirak
Member Author

Thanks @ValentineHerr !

@Troger4 , please be sure to pull the latest version before working on this again.

@teixeirak
Member Author

@Troger4 , also a couple clarifications:

  • "decimal degree to 0 digits" = "degree", so let's just put "degree". This is super easy to go back and change-- you can do it or I can.
  • More importantly, let's clarify how we're handling cases where the two coordinates have different levels of reported precision. Since you've been doing this, we'll go with whatever you've been doing. It looks like you're reporting on whichever is less precise, correct? (e.g., site 106: 43.8, 93 --> decimal degrees to 0 digits)
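
If the less precise coordinate is indeed the one that counts, the label for coordinates reported in decimal degrees could be derived as in this sketch. The function names are hypothetical, and the coordinates are assumed to be available as text exactly as reported.

```python
def count_decimals(coord_text):
    """Count digits after the decimal point in a coordinate kept as text."""
    return len(coord_text.strip().partition(".")[2])


def precision_label(lat_text, lon_text):
    """Label based on the less precise of the two coordinates.

    Hypothetical helper; covers only coordinates reported in decimal degrees.
    """
    n = min(count_decimals(lat_text), count_decimals(lon_text))
    # "decimal degrees to 0 digits" is recorded simply as "degree".
    return "degree" if n == 0 else f"decimal degrees to {n} digits"


print(precision_label("43.8", "93"))       # degree
print(precision_label("43.81", "93.256"))  # decimal degrees to 2 digits
```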

@Troger4
Collaborator

Troger4 commented Apr 19, 2021

> @Troger4 , also a couple clarifications:
>
>   • "decimal degree to 0 digits" = "degree", so let's just put "degree". This is super easy to go back and change-- you can do it or I can.
>   • More importantly, let's clarify how we're handling cases where the two coordinates have different levels of reported precision. Since you've been doing this, we'll go with whatever you've been doing. It looks like you're reporting on whichever is less precise, correct? (e.g., site 106: 43.8, 93 --> decimal degrees to 0 digits)

Okay, I'll fix that. And yes, exactly: I've been assigning precision based on whichever coordinate is less precise.

Also, I am unable to access the original PDFs for many of the records in ForC_sites, but I can access the publications from which the info was loaded. These seem to be studies that used data from hundreds of sites, so they're obviously not the original publications. Is it alright to confirm coordinate data using the secondary source if I am not able to access the original, or should I leave it?

Thanks!

teixeirak added a commit that referenced this issue Apr 19, 2021
#229

updated metadata description according to procedure of @Troger4
@teixeirak
Member Author

Thanks, @Troger4 !

For the site info loaded from papers that were not the original study, please enter the precision as reported in the paper you're looking at, AND put a "1" in the field lacks.info.from.ori.pub (column AJ). This will indicate to us that the site data were not loaded from the original pub.

@Troger4
Collaborator

Troger4 commented Apr 19, 2021

Will do, thanks!

@Troger4
Collaborator

Troger4 commented Apr 19, 2021

Luyssaert_2007_cbob https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1365-2486.2007.01439.x
Database containing info for 86 records in ForC_sites, but the paper doesn’t give any information on how to access the database or what the database is called. Do you happen to know the name of “the database” or how to access it? It is supposed to be publicly accessible.

@teixeirak
Member Author

I checked the Luyssaert database. Coordinates are reported in decimal degrees to 2 decimal points, although there seem to be a few with lower precision (presumably because original pub lacks that info).

@Troger4
Collaborator

Troger4 commented Apr 19, 2021

I see. How did you access it? Is the database something I can download and add to our references? Would you like me to fill "decimal degrees to 2 decimal points" for all our Luyssaert records? Most of the records from Luyssaert have at least 3 decimal points currently. Apologies for all the questions!

@teixeirak
Member Author

We have a copy here (file: Luyssaert.xlsx; just sent you invite to the repo). That said, I don't really see any point in you looking at this database. The coordinates for those sites would all be the same as entered in ForC, unless we accessed an original publication and got better coordinates.

@Troger4
Collaborator

Troger4 commented Apr 20, 2021

Thanks! I quickly double checked the coordinate data between the two databases so all of our Luyssaert_2007 (degree) records have been resolved now.

teixeirak added a commit that referenced this issue Apr 26, 2021
I forgot to add a column label for sites with coordinates flagged as suspect-- fixed now.

@Troger4 , the sites flagged "1" in `coordinates.suspect` are suspect because they lack data in a climate database, generally because they fall in the ocean. It would be very helpful if you could review these. You may have already got a few. If the coordinates are reported accurately, please record the precision and add to geography.notes "Coordinates fall in ocean according to Worldclim2 database."

#229