-
Currently, Plazi contributors use powerful tools and workflows to extract and annotate text from publications. The proposal is to help non-Plazi contributors with specific interests and expertise plug into Plazi transcription workflows and add annotations of their own. For instance, @seltmann and students are looking to extract bee-plant associations from taxonomic literature. Ideally they would be able to reuse existing tools and workflows to help annotate these valuable associations in existing literature. @myrmoteras can you help outline the current workflow and share ideas on how to help external, non-Plazi folks leverage it?
Replies: 17 comments
-
@mguidoti @flsimoes let's discuss this and reply. How much of what you and Jeremy are teaching can be reused?
-
This was already in my pipeline to reply to, and I think @jhpoelen is not talking about regular treatment extraction but rather about the workflow created for the COVID taskforce to extract biotic interactions. I started a flowchart summarizing the steps and highlighting the tools we use, but couldn't finish it just yet.
-
@myrmoteras @mguidoti thanks for your responses and updates. I am interested in taking advantage of both the existing taxonomic literature workflows and the more customized COVID taskforce workflow. This way, both taxonomic literature and more general publications can be considered when performing specialized annotation or transcription. Something like:
I'm just thinking out loud, and am curious what you make of these ideas.
-
@jhpoelen sorry for taking so long to reply, but I'm finally here.

As you know, the workflow is a multi-step one that must consider a good number of variables along the way to accommodate the range of possibilities in biodiversity literature data extraction. It's not linear, simple, or short. In an earlier meeting we drew a flowchart specific to Katja's case, and here's the link (let @myrmoteras know if you don't have access to it). This time, I also made a more generic and broader one, covering most of the aspects we consider when dealing with treatment extraction and biotic interaction extraction, either together or separately: link. I've invited you as a guest to this board, but if you don't want to create a Whimsical account, here's the external, password-protected link too. I'm sending the password by e-mail.

At first glance this new flowchart might be daunting, but I think the color code will make it easier to understand. As your question was more generic than Katja's specific case, and focused on what non-Plazi agents can do at this point, I tried to separate the processes that externals can do from the ones that still depend heavily on our processing facility (Porto Alegre office). As you'll see, joining efforts, constantly evaluating the bibliographic corpus or corpora, and setting clear goals can do the trick.

Let me know which parts need clarification and whether this is a fair attempt to answer your questions. Don't hesitate to ask follow-up questions. Drawing a more generic and broader flowchart to explain our processing, while interconnecting treatment extraction and biotic interaction annotation at paper and treatment level, was somewhat challenging, so it's natural to have questions. Sorry once again for taking so long.

Cheers,
-
@mguidoti The workflow suggests I can "create a list of treatments." I would like to understand how I can create a list of treatments from TreatmentBank based on a DOI or a taxon. For example, can I get all Ixodidae treatments and download them so I can examine them more easily for biotic interactions?
-
@mguidoti @myrmoteras @jhpoelen The tick data class at UCSB has prepared a first pass at recording the tick-host biotic interactions: https://github.com/ParasiteTracker/tick-interaction-database You will see it is not "complete": not all of the TreatmentBank links are in place, nor are the identifiers for taxon names. This is on purpose, as I am wondering if there is an easier way to include the TreatmentBank links than manually searching and copying/pasting for each record. Interestingly, not all taxa have treatments, even though this paper has been indexed into TreatmentBank. I have also submitted this dataset for review by GloBI.
-
@seltmann the document should be complete, with all the treatments marked up. There is one that shows an error (maybe @flsimoes or @mguidoti can fix it).
-
@seltmann I'm not sure how to understand your question regarding the TreatmentBank links. Are you aware of the treatment statistics at http://tb.plazi.org/GgServer/srsStats? You can search for a list of all the treatments from this article (http://tb.plazi.org/GgServer/summary/3C7EFFACFFE9FFC2FF90FFA76543CB47) by using the article UUID in that link, and include the treatment UUID, the verbatim taxon name, and the taxonomic family in the output. You can either look at the result in the browser, where each treatment is linked to the respective treatment in TB, or download the entire result in whatever format you need.
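Once such a list is downloaded, filling in the TreatmentBank links no longer needs manual searching: the interaction rows can be joined to the treatment list on the taxon name. The sketch below assumes a CSV export with hypothetical columns `treatmentUuid`, `verbatimTaxonName` and `family`, and interaction rows with a hypothetical `sourceTaxonName` key; the real export's column names may differ. Names without a treatment are left as `None`, since not every taxon in the paper has one.

```python
import csv
import io

# Hypothetical export of the article's treatment list; the column names
# ("treatmentUuid", "verbatimTaxonName", "family") are assumptions.
TREATMENTS_CSV = """treatmentUuid,verbatimTaxonName,family
C04787D4FFA7FF8CFF07FAF965B3CC42,Amblyomma albolimbatum,Ixodidae
"""


def normalize(name):
    """Collapse whitespace and case so small formatting differences still match."""
    return " ".join(name.split()).lower()


def index_treatments(csv_text):
    """Map each normalized verbatim taxon name to its treatment UUID."""
    index = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        index[normalize(row["verbatimTaxonName"])] = row["treatmentUuid"]
    return index


def fill_treatment_ids(interactions, index):
    """Add a treatment UUID to each interaction row (None when no treatment matched)."""
    for row in interactions:
        row["treatmentUuid"] = index.get(normalize(row["sourceTaxonName"]))
    return interactions
```

Matching on exact (normalized) names is deliberately simple; synonyms or misspellings would still need a manual check or a name-resolution step.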
-
@seltmann all the treatments now also have a DOI via BLR, or in JSON. What we should discuss with @mguidoti etc. is how we could get the OBO terms so we can add them as custom metadata to the treatments in BLR, so that @jhpoelen can harvest them the way he did for the bat data. @seltmann you can now also find all the records in GBIF; see the document deposited as a dataset in GBIF (the article's GBIF Dataset ID in the list: http://www.gbif.org/dataset/344f8a86-21a1-428e-ae4f-01ea6082254a). You can't get the species right now, as the dataset is in the GBIF import queue: https://www.gbif.org/health
-
Fixed. The last subSubSection did not extend all the way to the end of the treatment. |
-
@flsimoes thanks, I'm glad you have better eyesight! Glad this is resolved.
-
@myrmoteras You're welcome!
-
@myrmoteras I was not aware of the treatment statistics. This should let me fill in the TreatmentBank identifier in a better way. Can I use the "Document UUID" as the identifier for the treatment in the spreadsheet? For example, in our spreadsheet I presently have http://treatment.plazi.org/id/C04787D4-FFA7-FF8C-FF07-FAF965B3CC42 as the TreatmentBank identifier for Amblyomma albolimbatum in the reference 10.11646/zootaxa.4871.1.1 (PMID: 33311340). Can I use the document UUID (C04787D4FFA7FF8CFF07FAF965B3CC42) instead of http://treatment.plazi.org/id/C04787D4-FFA7-FF8C-FF07-FAF965B3CC42?
-
@myrmoteras the interactions in the dataset are all http://purl.obolibrary.org/obo/RO_0002454 (has host). However, some are negated, meaning that the text stated that X does not have host Y.
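Because every record uses the same relation, the negation has to be carried as a separate field, otherwise a "does not have host" statement would be indistinguishable from a positive one once the data is harvested. A minimal sketch of such a record, assuming hypothetical field names (this is not the GloBI or Plazi schema):

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# OBO Relation Ontology "has host", the relation used for all records in the tick dataset
HAS_HOST = "http://purl.obolibrary.org/obo/RO_0002454"


@dataclass
class HostRecord:
    """One tick-host statement; field names are illustrative, not an official schema."""
    tick_name: str
    host_name: str
    negated: bool = False               # True when the text says the tick does NOT have this host
    interaction_type_id: str = HAS_HOST
    treatment_uuid: Optional[str] = None


def split_by_negation(records: List[HostRecord]) -> Tuple[List[HostRecord], List[HostRecord]]:
    """Separate asserted host records from negated ones so they are never merged downstream."""
    asserted = [r for r in records if not r.negated]
    negated = [r for r in records if r.negated]
    return asserted, negated
```

Keeping the relation fixed and the negation as a flag means the same OBO term can be reused while consumers can still filter out, or explicitly keep, the negative statements.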
-
Yes, the document UUID is it. It is also part of the TB persistent identifier: http://treatment.plazi.org/id/C04787D4-FFA7-FF8C-FF07-FAF965B3CC42. We move all the treatments into the Biodiversity Literature Repository (BLR), a stable, long-term repository where we mint a DOI: http://doi.org/10.5281/zenodo.4582519 in this case. We have some QC measures in place to minimize erroneous treatment deposits. The UUID resolves to the treatment in TB, which is much more richly marked up than what we currently have in BLR. It is also what you get when you open the document in GGI.
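Since the spreadsheet can store either form, converting between the bare document UUID and the TreatmentBank URL is mechanical: the example above inserts hyphens in the standard 8-4-4-4-12 grouping. A small sketch, assuming every treatment URL follows that same pattern and base `http://treatment.plazi.org/id/`:

```python
import re

# Base of the TreatmentBank persistent identifier, as in the example URL above
TREATMENT_BASE = "http://treatment.plazi.org/id/"


def uuid_to_treatment_url(doc_uuid):
    """Turn a 32-hex-digit document UUID into the hyphenated TreatmentBank URL."""
    hex_digits = doc_uuid.replace("-", "").upper()
    if not re.fullmatch(r"[0-9A-F]{32}", hex_digits):
        raise ValueError(f"not a 32-digit hex UUID: {doc_uuid!r}")
    # standard 8-4-4-4-12 grouping
    parts = (hex_digits[:8], hex_digits[8:12], hex_digits[12:16],
             hex_digits[16:20], hex_digits[20:])
    return TREATMENT_BASE + "-".join(parts)


def treatment_url_to_uuid(url):
    """Strip the base URL and hyphens to recover the bare document UUID."""
    return url.rsplit("/", 1)[-1].replace("-", "").upper()
```

For example, `uuid_to_treatment_url("C04787D4FFA7FF8CFF07FAF965B3CC42")` gives `http://treatment.plazi.org/id/C04787D4-FFA7-FF8C-FF07-FAF965B3CC42`, so storing only the UUID loses nothing.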
-
What we could consider is to take the annotations you make to the treatment and add them as custom metadata to the treatment in BLR, where GloBI could then harvest them, similar to what we do with the bat literature. If you are interested in this, let's get @mguidoti involved; he can help use the API to make these annotations. From our perspective, this would be a really good case of what can be done once the treatments are available as FAIR data.
-
@seltmann just looking at the treatment: would it be interesting for you if we could find a way to make use of the fact that we have taxonomic names marked up, other than the ticks, that are essentially the host species? Using these names, one could then infer all the hosts automatically. As for how to tag the non-records, as in this case, you or @jhpoelen might know. Such statements are still highly relevant, for example: "no records causing human parasitism".
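A naive version of that inference could treat every marked-up name outside the tick family as a host candidate and flag sentences that look like negated records for manual review. The sketch below assumes the marked-up names are available as hypothetical `(name, family)` pairs, and both the parasite-family set and the negation cue phrases are illustrative assumptions, not Plazi's markup or rules; its output is a list of candidates to check, not confirmed host records.

```python
import re
from typing import Iterable, List, Tuple

# Hypothetical: families that are the parasites themselves; every other
# marked-up name in the treatment is treated as a host *candidate*.
PARASITE_FAMILIES = {"Ixodidae"}

# Hypothetical cue phrases for negated statements such as
# "no records causing human parasitism".
NEGATION_CUES = re.compile(r"\bno records?\b|\bnot (?:been )?recorded\b", re.IGNORECASE)


def host_candidates(marked_up_names: Iterable[Tuple[str, str]],
                    parasite_families=PARASITE_FAMILIES) -> List[str]:
    """Return names (in order, without duplicates) whose family is not a parasite family."""
    candidates: List[str] = []
    for name, family in marked_up_names:
        if family not in parasite_families and name not in candidates:
            candidates.append(name)
    return candidates


def is_negated(sentence: str) -> bool:
    """Flag sentences that look like negated records so a person can review them."""
    return NEGATION_CUES.search(sentence) is not None
```

A keyword heuristic like this will miss differently worded negations, which is why the negated sentences would still need a human decision before being tagged.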