This project contains links to the datasets and the code used for our study: "A comprehensive analysis of the usability and archival stability of omics computational tools and resources"
Mangul, Serghei, et al. "A comprehensive analysis of the usability and archival stability of omics computational tools and resources." bioRxiv, doi: https://doi.org/10.1101/452532
We downloaded open access papers via PubMed from 10 systems and computational biology journals. Raw data in XML format is available here. Our approach to extract software links from the downloaded papers and verify the archival stability of links is described in the Methods section of the paper and Figure S1. Timeout links were manually verified.
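As a rough illustration of the kind of check described above, a link's archival stability could be probed as follows. This is a minimal sketch, not the code used in the study: the function name `check_link`, the `opener` parameter, and the 10-second default timeout are all assumptions made for illustration. Note that `urllib.request.urlopen` follows redirects by default, so this sketch reports the final status rather than a 3xx code.

```python
import socket
import urllib.error
import urllib.request


def check_link(url, timeout=10.0, opener=urllib.request.urlopen):
    """Return the HTTP status for `url`, or -1 if the request times out.

    Hypothetical helper; `opener` is injectable so the logic can be
    exercised without network access.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise HTTPError; the status code is kept.
        return err.code
    except (socket.timeout, TimeoutError):
        return -1
    except urllib.error.URLError as err:
        # A timeout while connecting is wrapped in URLError.
        if isinstance(err.reason, (socket.timeout, TimeoutError)):
            return -1
        raise
```

Links that come back as -1 would then be candidates for the manual verification mentioned above.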
Links extracted from the abstracts and the body of the surveyed papers (n=48,393) are available in CSV format here. The CSV file contains the following fields:
- The type of link. The links were classified as extracted from abstract or the body of the paper
- Name of the journal
- Year the paper was published
- URL
- HTTP status: 0-300, success; 300-400, redirection; 400, broken link; -1, timeout. See more details here
- Binary flag indicating whether the link was present in one paper or was shared across multiple papers.
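The HTTP status field can be mapped to the four outcomes listed above with a small helper. This is a sketch under stated assumptions: the function name `classify_status` is hypothetical, and because the ranges above share their endpoints, the boundaries chosen here (300 counted as redirection, 400 and above counted as broken) are an interpretation rather than the study's definition.

```python
from collections import Counter


def classify_status(code):
    """Map an HTTP status value from the CSV to an outcome label.

    Boundary handling is an assumption: [0, 300) success,
    [300, 400) redirection, >= 400 broken, -1 timeout.
    """
    if code == -1:
        return "timeout"
    if 0 <= code < 300:
        return "success"
    if 300 <= code < 400:
        return "redirection"
    if code >= 400:
        return "broken"
    raise ValueError(f"unexpected status value: {code}")


# Tally outcomes for a few made-up status values (not study data).
example_codes = [200, 301, 404, -1, 200]
summary = Counter(classify_status(code) for code in example_codes)
```

Applied to the status column of the CSV file, the same tally gives the number of links in each outcome.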
We randomly chose 99 tools across various domains of computational biology. The methodology used to select the tools and the list of domains are presented in the Methods section of our paper.
Information about the usability of the 99 tools is presented in CSV format here. The CSV file contains the following fields:
- tool ID
- Name of the package manager from which the tool was available, or "NA" if the tool was not available via a package manager
- Number of citations per year
- Number of commands executed during the installation process
- Number of commands suggested in the installation manual of the tool
- The proportion of undocumented commands (not specified in the manual)
- Binary flag indicating whether the tool passed the automatic installation test. Tools that require no manual intervention are considered to pass the automatic installation test.
- The total installation time
- Flag indicating how easy the tool was to install. We categorized a tool as ‘easy to install’ if it could be installed in 15 minutes or less; ‘complex installation’ if it required more than 15 minutes but was successfully installed before the two-hour limit; and ‘not installed’ if the tool could not be successfully installed within two hours
- Binary flag to indicate if the example dataset was provided
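The installation categories described above follow directly from the installation time and outcome. The sketch below restates that rule in code. The function name `install_category` and its arguments are hypothetical, and how the category is actually encoded in the CSV file is not stated here, so treat this as an illustration of the thresholds only.

```python
EASY_LIMIT_MIN = 15    # 'easy to install' threshold, in minutes
TIME_LIMIT_MIN = 120   # two-hour installation limit, in minutes


def install_category(minutes, installed):
    """Return the installation category for one tool.

    `minutes` is the total installation time; `installed` says whether
    the installation succeeded (both argument names are assumptions).
    """
    if not installed or minutes > TIME_LIMIT_MIN:
        return "not installed"
    if minutes <= EASY_LIMIT_MIN:
        return "easy to install"
    return "complex installation"
```

For example, a tool installed successfully in 40 minutes falls into the ‘complex installation’ category.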
We have prepared Jupyter Notebooks that utilize the raw data described above to reproduce the results and figures presented in our manuscript.
For more information about reproducing the data collection process used in the archival stability section of our study, see the README.md file in the download.parse.data/ directory.
Would you like to play with our data and code? There is no need to download or install anything: we have set this repository up to be compatible with Binder:
We thank our peer reviewers, as well as online commenters on social media, for suggesting that we make the figures colorblind-friendly. We acknowledge the following resources, which helped us achieve the final result:
- Northwestern University's Knight Lab post on Three tools to help you make colorblind-friendly graphics.
- Somersault18:24's article on designing colorblind-friendly scientific figures.
- Color Oracle, which simulates different colorblind conditions.
This repository is under MIT license. For more information, please read our LICENSE.md file.
Please do not hesitate to contact us ([email protected], [email protected], [email protected]) if you have any comments, suggestions, or clarification requests regarding the study or if you would like to contribute to this resource.