\section*{Introduction}
\subsection*{Scientific background}
Reproducibility of results is one of the most fundamental requirements for credibility in scientific research \cite{tiwari2021reproducibility}. However, in recent years it has become evident that a substantial share of study results is not reproducible. A recent systematic review found that almost 50\% of modeling studies could not be reproduced \cite{tiwari2021reproducibility}. Especially in systems biology and systems medicine, the increasing size and complexity of models and data sets pose new challenges for sharing reproducible results. Therefore, several initiatives such as the \href{https://www.go-fair.org/go-fair-initiative/}{GO \acs{fair} Initiative}, \href{https://fair-dom.org/}{FAIRDOM} or \href{http://co.mbine.org/}{COMBINE} were formed to develop and provide standards and tools that enhance reproducibility in systems biology \cite{specificationsb}.
One of these tools is the so-called \ac{combine} archive, which aims to improve the coordination of standard formats for several aspects of simulation studies, such as \ac{sbml}, CellML, \ac{sbgn}, and \ac{sbrml}. These standards are used to encode, simulate and visualize biological models \cite{combine}. Furthermore, \ac{combine} archives offer the opportunity not only to reproduce simulation results but also to access comprehensive metadata such as author information, publication identifiers (e.g. \ac{doi}) and simulation details in a single file. Usually, most of this information is stored in different data formats and locations and requires a bundle of software tools to handle. A \ac{combine} archive instead provides a single executable file that is easy to access and comes with proper provenance information. This makes complex systems medicine and systems biology data considerably more accessible to researchers and improves the reproducibility of scientific results.
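As a rough illustration of the single-file idea, a \ac{combine} archive is a ZIP container whose \texttt{manifest.xml} lists every bundled file together with its format. The following manifest is only a minimal sketch: the file names \texttt{model.xml}, \texttt{simulation.sedml} and \texttt{metadata.rdf} are hypothetical placeholders and do not reflect the actual content of the archive described in this study.

\begin{verbatim}
<?xml version="1.0" encoding="UTF-8"?>
<omexManifest
  xmlns="http://identifiers.org/combine.specifications/omex-manifest">
  <!-- All file names below are hypothetical placeholders. -->
  <!-- The archive itself and its manifest -->
  <content location="."
    format="http://identifiers.org/combine.specifications/omex"/>
  <content location="./manifest.xml"
    format="http://identifiers.org/combine.specifications/omex-manifest"/>
  <!-- Model encoded in SBML -->
  <content location="./model.xml"
    format="http://identifiers.org/combine.specifications/sbml"/>
  <!-- Simulation description in SED-ML, marked as the entry point -->
  <content location="./simulation.sedml" master="true"
    format="http://identifiers.org/combine.specifications/sed-ml"/>
  <!-- Metadata such as authors and publication identifiers -->
  <content location="./metadata.rdf"
    format="http://identifiers.org/combine.specifications/omex-metadata"/>
</omexManifest>
\end{verbatim}

In such a layout, the entry marked \texttt{master="true"} indicates which file a tool should open first, so that model, simulation setup and metadata can be processed from one archive.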
\subsection*{Rationale for this study}
\subsubsection*{Dynamic Pathway Simulations}
Dynamic pathway modeling is needed to describe complex regulatory systems of feedback regulators. To this end, Bachmann \textit{et al.} built a dual negative feedback model of \ac{jak}2/\ac{stat}5 signaling in primary erythroid progenitor cells isolated from mouse fetal livers \cite{bachmannmodel}.
Given that the dynamic pathway model of \ac{socs} family members in \ac{jak}2/\ac{stat}5 signaling from Bachmann \textit{et al.} \cite{bachmannmodel} is only partially reproducible, we were asked to create a fully featured \ac{combine} archive and reproduce the simulation content, both for educational purposes and to complete the original work scientifically. Furthermore, the resulting \ac{combine} archive could serve as a template to enhance the reproducibility of future modeling studies.
\subsubsection*{Agile working}
Agile working has been a major driver of the evolution of working environments, especially in information technology. Digital cloud solutions and co-working platforms that integrate the allocation of tasks, the versioning of content and the \textit{ad hoc} formation of teams enable new definitions of how, where, with whom and when collaboration takes place and tasks are completed. GitHub, a provider of internet-hosted software development and version control tools, has become an essential common platform for managing software projects and supporting collaborative development. Lately, some educational projects have begun to adopt it for hosting and managing course content in order to make the creation, reuse, and remixing of materials more transparent \cite{github, Knegendorf.}. In developing this \ac{combine} archive we committed ourselves to the \acs{fair} principles and therefore built a completely publicly traceable working environment on GitHub, which can be accessed via the link given in the appendices.
\subsection*{Objectives}
Since the number of modeling studies providing both data and metadata in the form of a \ac{combine} archive is limited, the main objective of this case study was to create a fully featured \acs{combine} archive that includes both scripts to reproduce all simulation figures and easy-to-access simulation data of the Bachmann model, a dynamic pathway model of \ac{jak}2/\ac{stat}5 signaling \cite{bachmannmodel}. Furthermore, we aimed to create an easy-to-use guideline on how to compile a \acs{combine} archive from an existing simulation model. In addition, we were interested in validating all generated scripts after archive compilation. Finally, we aimed to evaluate the tools used in terms of usability and accessibility, and to assess whether the data provided by the original study were sufficient to reproduce the presented results.
\subsection*{Study design}
This case study describes the implementation process of a \ac{combine} archive using agile co-working and publicly available documentation.
\subsubsection*{Research questions}
The primary research question was whether there is a comprehensive way of developing a \ac{combine} archive from publicly available data sources that allows the public to reproduce all simulation data, including the resulting graphs. As a secondary question, we tested whether this task could be performed, documented and archived in an agile working environment and according to the \acs{fair} principles.
\subsubsection*{Analysis procedures}
To analyze the function of the \ac{combine} archive, we ran several simulations of our own during the development process and compared the results with the data and graphs from the original publication.
\subsubsection*{Validity procedures}
To validate the results of our \ac{combine} archive, we conducted a systematic search for available modeling data, tested the content at the metadata level, and created \ac{sbgn} and \ac{sedml} files using various tools, which we compared against the original data in numerical and graphical analyses.