Thank you for considering contributing to tataki! Here are some guidelines to help you get started.
Table of Contents
- I want to add a module to the standalone mode
- I want to add a CWL document to the external extension mode
A template for a module is available in the parser directory.
Instructions:
- Copy the template to a new file with a name that describes the format you are going to parse.
- Rename the `Template` struct to the name of the format you are going to parse.
- Implement the `determine_from_path` method of the `Parser` trait. Detailed instructions are provided below.
- Add a `mod` statement to parser.rs so that `tataki` recognizes your module, and also add a branch to the `match` statement in the `from_str_to_parser` function.
- Write a test for the module. An input file for the test should be placed in the tests/inputs directory. If the size of the input file is large, please use Zenodo.
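The registration step can be pictured with the minimal sketch below. It is not tataki's actual code: the `Parser` trait is reduced to a simplified stand-in, `MyFormat` and the `"myformat"` key are hypothetical names, and the signature of `from_str_to_parser` is an assumption chosen so that the example compiles with the standard library alone.

```rust
use std::path::Path;

/// Simplified stand-in for the trait every module implements (assumed shape).
trait Parser {
    fn determine_from_path(&self, input_path: &Path) -> Result<String, String>;
}

// In parser.rs this would be a plain `mod myformat;` that points at the new file.
mod myformat {
    use super::Parser;
    use std::path::Path;

    /// Hypothetical module, created by copying and renaming the template.
    pub struct MyFormat;

    impl Parser for MyFormat {
        fn determine_from_path(&self, _input_path: &Path) -> Result<String, String> {
            // Placeholder: a real module inspects the file contents here.
            Ok("MyFormat".to_string())
        }
    }
}

/// Maps a module name to its parser; each new module adds one `match` arm.
fn from_str_to_parser(name: &str) -> Result<Box<dyn Parser>, String> {
    match name {
        "myformat" => Ok(Box::new(myformat::MyFormat)),
        other => Err(format!("Unsupported module name: {other}")),
    }
}

fn main() {
    let parser = from_str_to_parser("myformat").expect("module should be registered");
    let label = parser
        .determine_from_path(Path::new("unused.txt"))
        .expect("placeholder parser always succeeds");
    println!("{label}");
}
```

If the `match` arm is missing, the lookup falls through to the error branch, which is why both the `mod` statement and the new arm are needed. The only method each module has to supply is `determine_from_path`.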
The `determine_from_path` method determines whether an input is in a format that your module can interpret.
impl Parser for Template {
    fn determine_from_path(
        &self,
        input_path: &Path,
        options: &InvokeOptions,
    ) -> anyhow::Result<ModuleResult> {
        /*
        This is a dummy implementation. Replace this with the actual algorithm to determine the file format.
        */
        let file = std::fs::File::open(input_path)?;
        let reader = std::io::BufReader::new(file);
        for (count, line) in reader.lines().enumerate() {
            let line = line?;
            // Do something with the line here

            // in case of parser error
            if false {
                return Err(anyhow::anyhow!("The input is missing the required column."));
            }

            // If the tidy option is not set, the number of lines to read is limited to num_records. +2 is used as a buffer.
            if !options.tidy && count + 2 > options.num_records {
                break;
            }
        }

        Ok(ModuleResult::with_result(
            Some("EDAM label".to_string()),
            Some("http://edamontology.org/format_ EDAM id".to_string()),
        ))
    }
}
- `input_path`: The path to the input file.
- `options`: The options passed to the parser. `options.tidy` and `options.num_records` are used to control the number of lines to read from the input. `options.no_decompress` is irrelevant here.
pub struct InvokeOptions {
    /// Read full content of the input or not
    pub tidy: bool,
    /// Irrelevant for `determine_from_path` method
    pub no_decompress: bool,
    /// Number of records to read
    pub num_records: usize,
}
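To make the interplay of `tidy` and `num_records` concrete, the sketch below reuses the stop condition from the template and counts how many lines the loop inspects. The `InvokeOptions` struct is a local copy so that the example compiles on its own, and `count_inspected_lines` is a hypothetical helper that is not part of tataki.

```rust
use std::io::BufRead;

/// Local copy of the options struct so that the sketch compiles on its own.
struct InvokeOptions {
    tidy: bool,
    #[allow(dead_code)]
    no_decompress: bool,
    num_records: usize,
}

/// Counts how many lines a loop with the template's stop condition inspects.
fn count_inspected_lines<R: BufRead>(
    reader: R,
    options: &InvokeOptions,
) -> std::io::Result<usize> {
    let mut inspected = 0;
    for (count, line) in reader.lines().enumerate() {
        let _line = line?; // a real module would validate the line here
        inspected += 1;
        // Same stop condition as in the template.
        if !options.tidy && count + 2 > options.num_records {
            break;
        }
    }
    Ok(inspected)
}

fn main() -> std::io::Result<()> {
    // Ten lines of in-memory input; `&[u8]` implements `BufRead`.
    let input: String = (1..=10).map(|i| format!("line {i}\n")).collect();

    let partial = InvokeOptions { tidy: false, no_decompress: false, num_records: 3 };
    println!("without tidy: {}", count_inspected_lines(input.as_bytes(), &partial)?); // prints 3

    let full = InvokeOptions { tidy: true, no_decompress: false, num_records: 3 };
    println!("with tidy: {}", count_inspected_lines(input.as_bytes(), &full)?); // prints 10
    Ok(())
}
```

With this condition, a run without `tidy` stops after `num_records` lines (and always inspects at least one line), while a run with `tidy` reads the input to the end.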
- If the parser can successfully interpret the file, return `Ok(ModuleResult)`. Use `ModuleResult::with_result(label: Option<String>, id: Option<String>)` to construct the `ModuleResult` and show the EDAM ontology information.
  - `label`: EDAM Preferred Label
  - `id`: EDAM Class ID
// example of a successful return
Ok(ModuleResult::with_result(
    Some("BAM".to_string()),
    Some("http://edamontology.org/format_2572".to_string()),
))
- If the parser fails, return `Err(anyhow::Error)`, including an error message that specifies the reasons why the parser cannot process the file.
// example of failure return
return Err(anyhow::anyhow!("The input is missing the required column."));
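The two outcomes can be seen side by side in the following sketch. It makes several assumptions: `ModuleResult` is a simplified stand-in for tataki's type, a plain `String` replaces `anyhow::Error` so that the example builds without external crates, and `check_columns` is a hypothetical check on tab-separated input. The label and ID are the template's placeholders, to be replaced with the values looked up in EDAM.

```rust
/// Stand-in for tataki's `ModuleResult` (assumed shape, for illustration only).
#[derive(Debug, PartialEq)]
struct ModuleResult {
    label: Option<String>,
    id: Option<String>,
}

impl ModuleResult {
    fn with_result(label: Option<String>, id: Option<String>) -> Self {
        Self { label, id }
    }
}

/// Hypothetical check: every line must have exactly `expected` tab-separated columns.
fn check_columns(content: &str, expected: usize) -> Result<ModuleResult, String> {
    for (index, line) in content.lines().enumerate() {
        let found = line.split('\t').count();
        if found != expected {
            // State which line failed and why, so that the message is actionable.
            return Err(format!(
                "Line {}: expected {} columns but found {}.",
                index + 1,
                expected,
                found
            ));
        }
    }
    Ok(ModuleResult::with_result(
        Some("EDAM label".to_string()),
        Some("http://edamontology.org/format_ EDAM id".to_string()),
    ))
}

fn main() {
    for content in ["a\tb\tc\nd\te\tf\n", "a\tb\tc\nd\te\n"] {
        match check_columns(content, 3) {
            Ok(result) => println!("accepted: {:?}", result),
            Err(reason) => println!("rejected: {reason}"),
        }
    }
}
```

The second input is rejected with a message that names the offending line and the mismatch, which is the kind of detail the error message should carry.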
A template for a CWL document is available in the cwl directory.
Instructions:
- Copy the template and create a CWL document under the `cwl` directory.
- Configure the CWL document with the Docker image, base command, and EDAM ontology information.
Please make sure that your CWL document has the following:
- `requirements.DockerRequirement.dockerPull`: The Docker image that the CWL document uses.
- `baseCommand`: The base command with which the Docker image is executed to parse the input.
- `edam_id` and `label`: Describe the EDAM ontology information reported when the parse is successful. Both must have the `tataki` prefix, which is listed in the `$namespaces` section.
Example of a CWL document:
cwlVersion: v1.2
class: CommandLineTool

# Configure docker image here
requirements:
  DockerRequirement:
    dockerPull: your_docker_image
  InlineJavascriptRequirement: {}

# Configure base command here
baseCommand: [command, to, use]

successCodes: [0, 139]

inputs:
  input_file:
    type: File
    inputBinding:
      position: 1

outputs: {}

$namespaces:
  tataki: https://github.com/sapporo-wes/tataki

# Configure Edam ontology information here
tataki:edam_id: http://edamontology.org/format_edam-id
tataki:label: edam-label