Release to 1.4.0 #378

Merged: 72 commits (merged Jan 28, 2025)

Commits
7f68e3b
Added Ion 540 Chip Kit
svarona Dec 30, 2024
3c349a1
Updated changelog
svarona Dec 30, 2024
4936b7c
Added dev to version to keep in logfiles
svarona Dec 30, 2024
58bdfe0
Removed (*) in laboratory_address.json
OPSergio Jan 10, 2025
6d80436
Removed (*) in the enum
OPSergio Jan 10, 2025
dd5c287
Merge pull request #365 from OPSergio/develop
OPSergio Jan 10, 2025
304ff6e
Replaced setup.py with pyproject.toml
Shettland Jan 10, 2025
7953cf4
Added dynamic-dependencies
Shettland Jan 13, 2025
0d3cba7
Solved timeout upload_to_ena
Aberdur Jan 14, 2025
2675202
Corrected blanks error
Aberdur Jan 14, 2025
1303981
Corrected blanks error
Aberdur Jan 14, 2025
c8e15b1
Corrected format error
Aberdur Jan 14, 2025
cb71b5d
Corrected format error
Aberdur Jan 14, 2025
aec7b8a
Merge branch 'develop' into develop
Aberdur Jan 14, 2025
64a54cf
Merge pull request #369 from Aberdur/develop
Aberdur Jan 14, 2025
513789e
Fixed path to files not containing sample name in basename
svarona Jan 10, 2025
a5a164f
Added batchID/date to filenames instead of date
svarona Jan 13, 2025
3d9e962
add current date to split files
svarona Jan 13, 2025
1b316d4
add year to outdir path
svarona Jan 13, 2025
7526057
test whether samples correspond to same batch date or not
svarona Jan 13, 2025
103248e
Added function to save split files to outdir
svarona Jan 13, 2025
aac4937
test if bioinfo_lab_metadata exists; if it exists, merge
svarona Jan 13, 2025
4fe8328
Created code to merge metadata if it already exists
svarona Jan 13, 2025
aa75429
created function to search files containing multiple samples
svarona Jan 13, 2025
a8a5095
test whether file exists and try to merge
svarona Jan 13, 2025
d7c7b61
fixed black
svarona Jan 13, 2025
e9af243
Fixed flake8
svarona Jan 13, 2025
13cf3e8
fixed black
svarona Jan 14, 2025
6c6a729
added creation of log in batch folder
svarona Jan 14, 2025
8f07097
fixed regex to find long table in analysis_results
svarona Jan 14, 2025
04becc0
removed log that was not working
svarona Jan 14, 2025
e86120c
split files by batch first
svarona Jan 14, 2025
a4ca40e
created unique suffix for all files in batch
svarona Jan 14, 2025
1d4f777
renamed save_splitted_files to save_merged_files
svarona Jan 14, 2025
436cb40
added code to save split long table to batch dir
svarona Jan 14, 2025
7785a11
fixed black
svarona Jan 14, 2025
13bd2db
added log errors
svarona Jan 14, 2025
7452fba
replaced batch_id with batch_date
svarona Jan 14, 2025
4ff54fb
replaced batch_id with batch_date
svarona Jan 14, 2025
8570d3e
Updated changelog
svarona Jan 14, 2025
9a94222
Added creation of analysis_results earlier for module to work
svarona Jan 15, 2025
ec97363
ignored flake error
svarona Jan 15, 2025
4417ce2
fixed black and flake
svarona Jan 15, 2025
2c61265
fixed black and flake
svarona Jan 15, 2025
687b6a2
finally fixed flake8
svarona Jan 15, 2025
030fd3a
removed print
svarona Jan 15, 2025
d444f4f
Added log functionality in module build-schema
Aberdur Jan 21, 2025
03683ba
Update CHANGELOG.md
Aberdur Jan 21, 2025
3fd902e
Corrected black_lint errors
Aberdur Jan 21, 2025
54e759c
Corrected black_lint errors
Aberdur Jan 21, 2025
85afe5b
Merge pull request #371 from Aberdur/develop
Aberdur Jan 21, 2025
42fd6d8
Updated the metadata_processing field in configuration.json
victor5lm Jan 24, 2025
de33f3f
Added an "errors" field in the json schema
victor5lm Jan 24, 2025
ded81c5
Fixed bug
victor5lm Jan 24, 2025
6b276c5
Updated CHANGELOG.md
victor5lm Jan 24, 2025
03f534a
Added other_preparation_kit, quality_control_metrics and consensus_cr…
victor5lm Jan 24, 2025
c2eb5d4
Updated CHANGELOG.md
victor5lm Jan 24, 2025
df5c673
Add QC to bioinfo_config.json
Aberdur Jan 27, 2025
bb40eb3
Add QC functionality to read_bioinfo_metadata
Aberdur Jan 27, 2025
66f936c
Add quality_control_evaluation to viralrecon.py
Aberdur Jan 27, 2025
005b1d7
Add QC fields to relecov_schema
Aberdur Jan 27, 2025
f12162d
Update CHANGELOG.md
Aberdur Jan 27, 2025
bd69996
Update read_bioinfo_metadata black_lint
Aberdur Jan 27, 2025
8291c79
Merge branch 'develop' into develop
Aberdur Jan 27, 2025
27597b0
Merge pull request #373 from Aberdur/develop
Aberdur Jan 27, 2025
287fd34
Added dropdown functionality to build-schema
Aberdur Jan 27, 2025
21b1d3c
Update CHANGELOG.md
Aberdur Jan 27, 2025
fe51b1d
Solve flake8
Aberdur Jan 27, 2025
aa6d234
Updated version to 1.4.0
OPSergio Jan 27, 2025
0b86cf1
Updated version in CHANGELOG.md
OPSergio Jan 27, 2025
94919ff
Updated link version
OPSergio Jan 27, 2025
367873e
Updated version in .toml
OPSergio Jan 28, 2025
15 changes: 14 additions & 1 deletion CHANGELOG.md
@@ -4,18 +4,31 @@ All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [1.X.0] - 202X-XX-XX : https://github.com/BU-ISCIII/relecov-tools/releases/tag/
## [1.4.0] - 2025-01-27 : https://github.com/BU-ISCIII/relecov-tools/releases/tag/v1.4.0

### Credits

Code contributions to the release:

- [Sarai Varona](https://github.com/svarona)
- [Alejandro Bernabeu](https://github.com/aberdur)
- [Victor Lopez](https://github.com/victor5lm)

### Modules

#### Added enhancements

- Added an IonTorrent flow cell for validation [#363](https://github.com/BU-ISCIII/relecov-tools/pull/363)
- Added solution to timeout in upload-to-ena module [#368](https://github.com/BU-ISCIII/relecov-tools/pull/368)
- Added log functionality to build-schema module [#340](https://github.com/BU-ISCIII/relecov-tools/pull/340)
- Updated the metadata_processing field in configuration.json and added the other_preparation_kit, quality_control_metrics and consensus_criteria fields in the json schema [#372](https://github.com/BU-ISCIII/relecov-tools/pull/372)
- Added quality control functionality to read-bioinfo-metadata [#373](https://github.com/BU-ISCIII/relecov-tools/pull/373)
- Added dropdown functionality to build-schema enums [#374](https://github.com/BU-ISCIII/relecov-tools/pull/374)

#### Fixes

- Fixed read-bioinfo-metadata module [#367](https://github.com/BU-ISCIII/relecov-tools/pull/367)

#### Changed

#### Removed
62 changes: 62 additions & 0 deletions pyproject.toml
@@ -0,0 +1,62 @@
[build-system]
requires = ["setuptools>=61.0"]
build-backend = "setuptools.build_meta"

[project]
name = "relecov-tools"
version = "1.4.0"
description = "Tools for managing and processing relecov network data."
readme = "README.md"
requires-python = ">=3.7"
authors = [
{name = "Sara Monzon", email = "[email protected]"},
{name = "Luis Chapado", email = "[email protected]"},
{name = "Isabel Cuesta", email = "[email protected]"},
{name = "Sarai Varona", email = "[email protected]"},
{name = "Daniel Valle", email = "[email protected]"},
{name = "Pablo Mata", email = "[email protected]"},
{name = "Victor Lopez", email = "[email protected]"},
{name = "Emi Arjona", email = "[email protected]"},
{name = "Jaime Ozaez", email = "[email protected]"},
{name = "Juan Ledesma", email = "[email protected]"},
{name = "Sergio Olmos", email = "[email protected]"},
{name = "Alejandro Bernabeu", email = "[email protected]"},
{name = "Alba Talavera", email = "[email protected]"}
]
maintainers = [
{name = "Sara Monzon", email = "[email protected]"},
{name = "Luis Chapado", email = "[email protected]"},
{name = "Isabel Cuesta", email = "[email protected]"},
{name = "Sarai Varona", email = "[email protected]"},
{name = "Daniel Valle", email = "[email protected]"},
{name = "Pablo Mata", email = "[email protected]"},
{name = "Victor Lopez", email = "[email protected]"},
{name = "Emi Arjona", email = "[email protected]"},
{name = "Jaime Ozaez", email = "[email protected]"},
{name = "Juan Ledesma", email = "[email protected]"},
{name = "Sergio Olmos", email = "[email protected]"},
{name = "Alejandro Bernabeu", email = "[email protected]"},
{name = "Alba Talavera", email = "[email protected]"}
]
keywords = [
"relecov",
"bioinformatics",
"pipeline",
"sequencing",
"NGS",
"next generation sequencing"
]
license = {text = "GNU GENERAL PUBLIC LICENSE v.3"}
dynamic = ["dependencies"]

[project.urls]
Homepage = "https://github.com/BU-ISCIII/relecov-tools"

[tool.setuptools.dynamic]
dependencies = {file = ["requirements.txt"]}

[tool.setuptools.packages.find]
exclude = ["docs"]

[project.scripts]
relecov-tools = "relecov_tools.__main__:run_relecov_tools"
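For context on the packaging change above: dependencies are declared as dynamic in [project] and resolved by setuptools from requirements.txt at build time via [tool.setuptools.dynamic]. A minimal sketch, assuming Python 3.11+ (standard-library tomllib) and that pyproject.toml sits in the working directory, which simply reads back the declared metadata:

```python
# Read the metadata declared in pyproject.toml (sketch; not part of relecov-tools).
import tomllib

with open("pyproject.toml", "rb") as fh:
    meta = tomllib.load(fh)

print(meta["project"]["name"])      # relecov-tools
print(meta["project"]["version"])   # 1.4.0
print(meta["project"]["dynamic"])   # ['dependencies']
# The dependency list itself lives in the file referenced here:
print(meta["tool"]["setuptools"]["dynamic"]["dependencies"]["file"])  # ['requirements.txt']
```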
2 changes: 1 addition & 1 deletion relecov_tools/__main__.py
@@ -35,7 +35,7 @@
stderr=True, force_terminal=relecov_tools.utils.rich_force_colors()
)

__version__ = "1.3.0"
__version__ = "1.4.0"


def run_relecov_tools():
95 changes: 79 additions & 16 deletions relecov_tools/assets/pipeline_utils/viralrecon.py
@@ -8,7 +8,6 @@
import os.path

from pathlib import Path
from datetime import datetime

import relecov_tools.utils
from relecov_tools.config_json import ConfigJson
@@ -135,7 +134,7 @@ def convert_to_json(self, samp_dict):
j_list = []
# Grab date from filename
result_regex = re.search(
"variants_long_table(?:_\d{8})?\.csv", os.path.basename(self.file_path)
"variants_long_table(?:_\d{14})?\.csv", os.path.basename(self.file_path)
)
if result_regex is None:
stderr.print(
@@ -153,18 +152,53 @@
j_list.append(j_dict)
return j_list
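For reference, a small illustrative sketch of the updated filename check (the filenames below are made up): the optional suffix accepted after variants_long_table is now a 14-digit timestamp rather than the previous 8-digit date.

```python
# Illustration of the regex used in convert_to_json (hypothetical filenames).
import re

pattern = r"variants_long_table(?:_\d{14})?\.csv"
print(re.search(pattern, "variants_long_table.csv") is not None)                 # True
print(re.search(pattern, "variants_long_table_20250114123045.csv") is not None)  # True
print(re.search(pattern, "variants_long_table_20250114.csv") is not None)        # False (old 8-digit form)
```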

def save_to_file(self, j_list):
def save_to_file(self, j_list, batch_date):
"""Transform the parsed data into a json file"""
date_now = datetime.now().strftime("%Y%m%d%H%M%S")
file_name = "long_table_" + date_now + ".json"
file_name = "long_table_" + batch_date + ".json"
file_path = os.path.join(self.output_directory, file_name)

try:
with open(file_path, "w") as fh:
fh.write(json.dumps(j_list, indent=4))
stderr.print("[green]\tParsed data successfully saved to file:", file_path)
except Exception as e:
stderr.print("[red]\tError saving parsed data to file:", str(e))
if os.path.exists(file_path):
stderr.print(
f"[blue]Long table {file_path} file already exists. Merging new data if possible."
)
log.info(
"Long table %s file already exists. Merging new data if possible."
% file_path
)
original_table = relecov_tools.utils.read_json_file(file_path)
samples_indict = {item["sample_name"]: item for item in original_table}
for item in j_list:
sample_name = item["sample_name"]
if sample_name in samples_indict:
if samples_indict[sample_name] != item:
stderr.print(
f"[red]Same sample {sample_name} has different data in both long tables."
)
log.error(
"Sample %s has different data in %s and new long table. Can't merge."
% (sample_name, file_path)
)
return None
else:
original_table.append(item)
try:
with open(file_path, "w") as fh:
fh.write(json.dumps(original_table, indent=4))
stderr.print(
"[green]\tParsed data successfully saved to file:", file_path
)
except Exception as e:
stderr.print("[red]\tError saving parsed data to file:", str(e))
log.error("Error saving parsed data to file: %s", e)
else:
try:
with open(file_path, "w") as fh:
fh.write(json.dumps(j_list, indent=4))
stderr.print(
"[green]\tParsed data successfully saved to file:", file_path
)
except Exception as e:
stderr.print("[red]\tError saving parsed data to file:", str(e))
log.error("Error saving parsed data to file: %s", e)
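To make the merge rule above concrete, here is a minimal standalone sketch (hypothetical helper, not part of the module): an existing long_table_<batch_date>.json is reloaded, genuinely new samples are appended, and a sample that reappears with different values aborts the merge.

```python
# Standalone sketch of the merge rule implemented in save_to_file (illustrative only).
import json


def merge_long_tables(existing_path, new_entries):
    """Return the merged sample list, or None if a sample reappears with different data."""
    with open(existing_path) as fh:
        original = json.load(fh)
    by_name = {item["sample_name"]: item for item in original}
    for item in new_entries:
        name = item["sample_name"]
        if name in by_name:
            if by_name[name] != item:
                return None  # conflicting data for the same sample: refuse to merge
        else:
            original.append(item)
    return original
```

With an existing long_table_<batch_date>.json on disk, merge_long_tables(path, j_list) returns either the extended list to be rewritten or None when the same sample_name carries different values.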

def parsing_csv(self):
"""
@@ -180,7 +214,7 @@ def parsing_csv(self):


# START util functions
def handle_pangolin_data(files_list, output_folder=None):
def handle_pangolin_data(files_list, batch_date, output_folder=None):
"""File handler to parse pangolin data (csv) into JSON structured format.

Args:
@@ -320,7 +354,7 @@ def get_pango_data_version(files_list):
return pango_data_processed


def parse_long_table(files_list, output_folder=None):
def parse_long_table(files_list, batch_date, output_folder=None):
"""File handler to retrieve data from long table files and convert it into a JSON structured format.
This function utilizes the LongTableParse class to parse the long table data.
Since this utility handles and maps data in a custom way, it returns None to avoid being transferred to method read_bioinfo_metadata.BioinfoMetadata.mapping_over_table().
@@ -349,7 +383,7 @@ def parse_long_table(files_list, output_folder=None):
# Parsing long table data and saving it
long_table_data = long_table.parsing_csv()
# Saving long table data into a file
long_table.save_to_file(long_table_data)
long_table.save_to_file(long_table_data, batch_date)
stderr.print("[green]\tProcess completed")
elif len(files_list) > 1:
method_log_report.update_log_report(
@@ -361,7 +395,7 @@
return None


def handle_consensus_fasta(files_list, output_folder=None):
def handle_consensus_fasta(files_list, batch_date, output_folder=None):
"""File handler to parse consensus data (fasta) into JSON structured format.

Args:
@@ -406,3 +440,32 @@
)
method_log_report.print_log_report(method_name, ["valid", "warning"])
return consensus_data_processed


def quality_control_evaluation(data):
"""Evaluate the quality of the samples and add the field 'qc_test' to each 'data' entry."""
conditions = {
"per_sgene_ambiguous": lambda x: float(x) < 10,
"per_sgene_coverage": lambda x: float(x) > 98,
"per_ldmutations": lambda x: float(x) > 60,
"number_of_sgene_frameshifts": lambda x: int(x) == 0,
"number_of_unambiguous_bases": lambda x: int(x) > 24000,
"number_of_Ns": lambda x: int(x) < 5000,
"qc_filtered": lambda x: int(x) > 50000,
"per_reads_host": lambda x: float(x) < 20,
}
for sample in data:
try:
qc_status = "pass"
for param, condition in conditions.items():
value = sample.get(param)
if value is None or not condition(value):
qc_status = "fail"
break
sample["qc_test"] = qc_status
except ValueError as e:
sample["qc_test"] = "fail"
print(
f"Error processing sample {sample.get('sequencing_sample_id', 'unknown')}: {e}"
)
return data
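As a usage sketch (hypothetical metric values, chosen only to illustrate the thresholds above, and assuming the module is importable from an installed relecov-tools): a sample meeting every condition is marked "pass", and any missing or out-of-range metric marks it "fail".

```python
# Hypothetical input illustrating the thresholds in quality_control_evaluation.
# Assumes relecov-tools is installed so the module path below is importable.
from relecov_tools.assets.pipeline_utils.viralrecon import quality_control_evaluation

samples = [
    {   # meets every condition -> qc_test == "pass"
        "sequencing_sample_id": "SAMPLE1",
        "per_sgene_ambiguous": "2.1",
        "per_sgene_coverage": "99.3",
        "per_ldmutations": "75.0",
        "number_of_sgene_frameshifts": "0",
        "number_of_unambiguous_bases": "29000",
        "number_of_Ns": "450",
        "qc_filtered": "120000",
        "per_reads_host": "3.5",
    },
    {   # ambiguous S-gene fraction too high (among others) -> qc_test == "fail"
        "sequencing_sample_id": "SAMPLE2",
        "per_sgene_ambiguous": "12.0",
        "per_sgene_coverage": "90.0",
        "per_ldmutations": "40.0",
        "number_of_sgene_frameshifts": "1",
        "number_of_unambiguous_bases": "20000",
        "number_of_Ns": "8000",
        "qc_filtered": "30000",
        "per_reads_host": "25.0",
    },
]

for s in quality_control_evaluation(samples):
    print(s["sequencing_sample_id"], s["qc_test"])  # SAMPLE1 pass / SAMPLE2 fail
```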