song links missing for T arcufolium #90

Open
klausriede opened this issue Jun 4, 2024 · 17 comments

Comments

@klausriede

http://orthoptera.archive.speciesfile.org/Common/basic/Taxa.aspx?TaxonNameID=1140042

@typophyllum
Collaborator

Thanks, fixed it. Again a duplicate OTU with missing data.
https://orthoptera.speciesfile.org/otus/803223/overview

@klausriede
Author

@MMCigliano this problem should be fixed at a higher level! I come across it frequently, without spending much time searching, just in everyday use! It should not be too difficult to design some control routines.

@typophyllum
Collaborator

If TaxonPages always picked the OTU with the lowest number whenever there are two or more coordinate OTUs, this would be solved. But apparently it is much more complicated than it seems.

And perhaps a filter for names with multiple OTUs could be integrated into Filter nomenclature.
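
A control routine of the kind suggested above could start from a simple grouping query. What follows is only a minimal sketch, not TaxonWorks code: it uses an in-memory SQLite table that mimics the `id`, `name` and `taxon_name_id` columns of `otus` seen in this thread. OTU ids 803223 and 929363 and taxon name id 910776 come from the thread; the blank name on 929363, the other rows, and the function name `names_with_multiple_otus` are assumptions for illustration.

```python
import sqlite3


def names_with_multiple_otus(conn):
    """List taxon names referenced by more than one OTU with a blank name.

    Returns a list of (taxon_name_id, [otu_id, ...]) tuples.
    """
    rows = conn.execute(
        """
        SELECT taxon_name_id, GROUP_CONCAT(id) AS otu_ids
        FROM otus
        WHERE taxon_name_id IS NOT NULL
          AND (name IS NULL OR name = '')
        GROUP BY taxon_name_id
        HAVING COUNT(*) > 1
        ORDER BY taxon_name_id
        """
    ).fetchall()
    return [(tn_id, sorted(int(i) for i in ids.split(","))) for tn_id, ids in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE otus (id INTEGER PRIMARY KEY, name TEXT, taxon_name_id INTEGER);
    -- Ids taken from this thread; the blank name on 929363 is an assumption.
    INSERT INTO otus VALUES (803223, NULL, 910776);
    INSERT INTO otus VALUES (929363, NULL, 910776);
    -- Hypothetical rows: one unnamed and one named OTU on another taxon name.
    INSERT INTO otus VALUES (1, NULL, 100);
    INSERT INTO otus VALUES (2, 'sp. 3', 100);
    """
)

duplicates = names_with_multiple_otus(conn)
print(duplicates)
```

Named OTUs are left out on purpose here, since they are expected to share a taxon name with the unnamed one.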

@LocoDelAssembly
Contributor

Has the duplicated OTU in question been deleted by now? I thought this problem was solved when TP searches started to exclude OTUs with a non-blank name field, since those were the only kind of duplicated OTUs that existed (in SF projects).

Please hold off on editing data the next time an error like this appears, so I can try to analyze the problem.

@typophyllum
Collaborator

According to the sandcastle data the deleted OTU looked like this:

[image]

@typophyllum
Collaborator

typophyllum commented Jun 4, 2024

As Klaus mentions, it's easy to find more cases, for example here:

[image]

TP shows OTU 926810, which is a duplicate lacking content, like OTU 926811. The correct one is OTU 850457.

Currently the seven sound links are missing.
https://orthoptera.speciesfile.org/otus/926810/overview

@mjy
Contributor

mjy commented Jun 4, 2024

For the record, we will not implement logic that selects the first OTU by id as a solution; that is semantically particular to SFs, not all TW data.

@LocoDelAssembly
Contributor

LocoDelAssembly commented Jun 4, 2024

Nevertheless, those morrisi examples should not appear, except for the last one. However, it is surprising that the higher-numbered one is marked valid.

https://sfg.taxonworks.org/api/v1/otus/926810?project_token=3oerVKf82_196cIECvHYNg -> https://orthoptera.speciesfile.org/otus/926810/overview (full name is Typophyllum sp. 3 Typophyllum morrisi)
https://sfg.taxonworks.org/api/v1/otus/850457?project_token=3oerVKf82_196cIECvHYNg -> https://orthoptera.speciesfile.org/otus/850457/overview (hand-made link, autocomplete does not lead here)

https://sfg.taxonworks.org/api/v1/otus/autocomplete?project_token=3oerVKf82_196cIECvHYNg&having_taxon_name_only=true&term=Typophyllum+morrisi

The having_taxon_name_only parameter is supposed to remove OTUs with a non-blank name, so is this likely a regression? Or is there some extra logic to show the "valid" OTU?

[edit] Sorry, it is actually extra logic in TP that redirects to the valid OTU using the otu_valid_id field, since autocomplete indeed shows only OTUs with a blank name. So the question would be why those temporary names are marked as the valid OTU of another.[/edit]
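
For anyone who wants to repeat the autocomplete check with a different term, here is a small sketch that only assembles the URL (it makes no request). The endpoint and the `project_token`, `having_taxon_name_only` and `term` parameters are the ones visible in the link above; the helper name `autocomplete_url` and the `YOUR_PROJECT_TOKEN` placeholder are made up for the example.

```python
from urllib.parse import urlencode

# Endpoint as it appears in the links in this thread.
BASE_URL = "https://sfg.taxonworks.org/api/v1/otus/autocomplete"


def autocomplete_url(term, project_token, having_taxon_name_only=True):
    """Build an OTU autocomplete URL; hypothetical helper, no request is sent."""
    params = {"project_token": project_token}
    if having_taxon_name_only:
        # Per the discussion, this is meant to drop OTUs with a non-blank name.
        params["having_taxon_name_only"] = "true"
    params["term"] = term
    return f"{BASE_URL}?{urlencode(params)}"


url = autocomplete_url("Typophyllum morrisi", "YOUR_PROJECT_TOKEN")
print(url)
```

Comparing the response with and without `having_taxon_name_only` should show whether the filter or the later redirection is responsible for the unexpected OTU.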

@mjy
Contributor

mjy commented Jun 4, 2024

Debug against sandbox data with a huge grain of salt; best not to go down that rabbit hole. Practice is better.

@LocoDelAssembly
Contributor

The links above all point to production.

@LocoDelAssembly
Contributor

@mjy
Contributor

mjy commented Jun 4, 2024

> According to the sandcastle data the deleted OTU looked like this:

@LocoDelAssembly right, was referencing ^

@LocoDelAssembly
Contributor

And with the previous OTU Klaus found, the problem was again that otu_valid_id points to an OTU that doesn't look valid:

https://sfg-practice.taxonworks.org/api/v1/otus/autocomplete?project_token=3oerVKf82_196cIECvHYNg&having_taxon_name_only=true&term=Tympanophyllum+(Tympanophyllum)+arcufolium (thanks for sfg-practice reminder @mjy 😄)

How can this be happening?

@LocoDelAssembly
Contributor

The origin of otu_valid_id seems to be this query:

SELECT DISTINCT ON (otus.id) otus.id, otus.name, otus.taxon_name_id, COALESCE(o2.id, otus.id) AS otu_valid_id
FROM "otus" LEFT JOIN
    taxon_names t1 ON otus.taxon_name_id = t1.id LEFT JOIN
    otus o2 ON t1.cached_valid_taxon_name_id = o2.taxon_name_id
WHERE "otus"."id" = :some_id

Would it break something if the query favored otus.id = o2.id over picking one at random when there are several OTUs referencing the same valid TN?

taxonworks_practice=# SELECT DISTINCT ON (otus.id) otus.id, otus.name, otus.taxon_name_id, COALESCE(o2.id, otus.id) AS otu_valid_id
FROM "otus" LEFT JOIN
    taxon_names t1 ON otus.taxon_name_id = t1.id LEFT JOIN
    otus o2 ON t1.cached_valid_taxon_name_id = o2.taxon_name_id
WHERE "otus"."id" = 803223;
   id   | name | taxon_name_id | otu_valid_id 
--------+------+---------------+--------------
 803223 |      |        910776 |       929363
(1 row)

taxonworks_practice=# SELECT otus.id, otus.name, otus.taxon_name_id, COALESCE(o2.id, otus.id) AS otu_valid_id
FROM "otus" LEFT JOIN
    taxon_names t1 ON otus.taxon_name_id = t1.id LEFT JOIN
    otus o2 ON t1.cached_valid_taxon_name_id = o2.taxon_name_id
WHERE "otus"."id" = 803223;
   id   | name | taxon_name_id | otu_valid_id 
--------+------+---------------+--------------
 803223 |      |        910776 |       929363
 803223 |      |        910776 |       803223 <<< This one would be the expected result
(2 rows)
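
To make the proposal concrete, here is a hedged sketch of "favor otus.id = o2.id", run against an in-memory SQLite copy of the two rows above rather than against TaxonWorks. SQLite has no `DISTINCT ON`, so the sketch uses `ORDER BY ... LIMIT 1` instead; in PostgreSQL the same preference could presumably be expressed by keeping `DISTINCT ON (otus.id)` and adding `ORDER BY otus.id, (o2.id = otus.id) DESC`. The secondary sort on `o2.id` is there only to make the sketch deterministic, it is not a suggestion to prefer the lowest id. The function name `otu_valid_id` is hypothetical.

```python
import sqlite3

# Same joins as the query above, but a self-match (o2.id = otus.id) sorts first.
PREFER_SELF_QUERY = """
SELECT otus.id, COALESCE(o2.id, otus.id) AS otu_valid_id
FROM otus
LEFT JOIN taxon_names t1 ON otus.taxon_name_id = t1.id
LEFT JOIN otus o2 ON t1.cached_valid_taxon_name_id = o2.taxon_name_id
WHERE otus.id = ?
ORDER BY (o2.id = otus.id) DESC, o2.id
LIMIT 1
"""


def otu_valid_id(conn, otu_id):
    """Return the valid OTU id for otu_id, or None if the OTU does not exist."""
    row = conn.execute(PREFER_SELF_QUERY, (otu_id,)).fetchone()
    return row[1] if row else None


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE taxon_names (id INTEGER PRIMARY KEY, cached_valid_taxon_name_id INTEGER);
    CREATE TABLE otus (id INTEGER PRIMARY KEY, name TEXT, taxon_name_id INTEGER);
    -- Ids as in the psql output above: both OTUs reference the same valid name.
    INSERT INTO taxon_names VALUES (910776, 910776);
    INSERT INTO otus VALUES (803223, NULL, 910776);
    INSERT INTO otus VALUES (929363, NULL, 910776);  -- name assumed blank
    """
)

print(otu_valid_id(conn, 803223))
print(otu_valid_id(conn, 929363))
```

With this ordering each of the two OTUs resolves to itself, so neither would be redirected to the other.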

@mjy
Contributor

mjy commented Jun 11, 2024

@LocoDelAssembly pointer to corresponding code?

@mjy
Contributor

mjy commented Jun 11, 2024

> Would it break something if the query favored otus.id = o2.id over picking one at random when there are several OTUs referencing the same valid TN?

Probably not, but we can't assume there is anything special about the match, i.e. if we suddenly switched to the last match + some order then our result should be the same. If aggregating data is an issue, then we need to resolve it at the aggregation level.
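
One way to read "resolve at the aggregation level" is that the displayed content should not depend on which coordinate OTU happens to be matched. The sketch below is only an illustration of that idea in plain Python, not how TaxonPages or TaxonWorks work: it merges links from all OTUs in a group, so the result is the same for any order. The OTU ids are the ones mentioned earlier in this thread; the link values are placeholders and `aggregate_links` is a hypothetical name.

```python
def aggregate_links(links_by_otu, coordinate_otu_ids):
    """Merge links from every coordinate OTU; the result ignores input order."""
    merged = set()
    for otu_id in coordinate_otu_ids:
        merged.update(links_by_otu.get(otu_id, []))
    return sorted(merged)


# OTU 850457 carries the seven sound links; the two duplicates carry none.
# The link strings are placeholders, not real data.
links_by_otu = {
    850457: [f"sound-{n}" for n in range(1, 8)],
    926810: [],
    926811: [],
}

first_match_order = aggregate_links(links_by_otu, [926810, 926811, 850457])
last_match_order = aggregate_links(links_by_otu, [850457, 926811, 926810])
print(len(first_match_order), first_match_order == last_match_order)
```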

@LocoDelAssembly
Contributor

> @LocoDelAssembly pointer to corresponding code?

Two places, which, if I'm not mistaken, do not decide what to show in autocomplete, only what to set as otu_valid_id. In many cases I think you won't like this redirection. Perhaps this is a problem even with AntWeb, given that they have partially identified specimens and, because of that, there are OTUs like "Genus sp. Genus" that may cause the Genus OTU to have one of those sp. OTUs as the valid OTU. (Perhaps it was not "sp." exactly, but we, or maybe Dash, added something in the importer to import partially valid scientific names in this way.)

The idea would be to not redirect the user if the selected OTU is already a valid candidate.

mjy added a commit to SpeciesFileGroup/taxonworks that referenced this issue Jun 12, 2024