Skip to content

ddbj/ddbj_curator_assistant

Repository files navigation

DDBJ Curators' Assistant

The system consists of:

  • A set of tools designed to automate and standardize, in a fast and easy way, the curation of Sequence Datasets submitted to DDBJ. It comprises four steps: validation (ddbj_mss_validation); auto-correction (ddbj_autofix); auto upload files to DDBJ Databases (ddbj_sakura2DB); and update work tracking spreadsheet (ddbj_kaeru, "kaeru 帰る" means "leave, go home" in Japanese language, in the sense that the work is done).
  • A database, named dblink_ddbj, that contains the most relevant information from DDBJ Database, designed specially for DDBJ curators.
  • An easy-to-use search engine tool, search_dblink, for quickly and simultaneously retrieving data from a wide range of accession IDs.

Mass Dataset Documentation

ag_packages_202204_MSS_workflow

  1. DDBJ Mass Validation
    • An easy command line that identifies submitted files (annotation and fasta) and checks inconsistencies based on DDBJ rules.
    • Requirement: BioSample
    • Command line (production)
    bash /home/andrea/scripts/ddbj_mss_validation
    
    • Command line beta
    bash /home/andrea/scripts/ddbj_mss_validation_beta
    
  2. DDBJ Autofix
    • A simple command line that interactively suggests corrections that have been detected by DDBJ Mss Validation and automatically fixes them.
    • Command line
    bash /home/andrea/scripts/ddbj_autofix
    
    • Command line beta (CAUTION! Use this version when running ddbj_mss_validation_beta)
    bash /home/andrea/scripts/ddbj_autofix_beta
    
  3. DDBJ Sakura2DB
    • Interactive command line that automatically: a) identifies the file type; b) runs sakura2db (test and actual) for the corrected files to upload the files to their respective databases at DDBJ (Tsunami); c) moves the files to DONE directory.
    • Command line
      bash /home/andrea/scripts/ddbj_sakura2DB
      
      • Command line beta
      bash /home/andrea/scripts/ddbj_sakura2DB_beta
      
  4. DDBJ Kaeru
    • Update work tracking spreadsheet, after running DDBJ Sakura2DB.
    • Command line
    bash /home/andrea/scripts/ddbj_kaeru
    

DBLink DDBJ and Search DBLink Documentation

  1. DBLink DDBJ
    • Comprises essential data (dblink dataset) from the major DDBJ databases: BioProject, BioSample, Sequence Read Archive (DRA), Assembled Sequences (Mass Data) and GEA.
  2. Search DBLINK
    • A simple command line tool that accesses the DBLink-DDBJ database and correlates the major DBLink dataset from DDBJ using one file with different accession IDs.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published