Feature/cells argument #170

Open
wants to merge 84 commits into
base: master
9cd8def
Merge tag '1.4.4' into develop
Hoohm Oct 5, 2019
db6da2b
changed mtx output for features and small printing bug
Hoohm Oct 27, 2019
480ac20
Changed some verbose output
Hoohm Oct 30, 2019
0f0ad0b
Merge branch 'feature/mtx_format' into develop
Hoohm Oct 30, 2019
2fb744b
deleted an enumerate
Hoohm Mar 31, 2020
27b6fc6
added named_tuple ref
Hoohm Apr 4, 2020
2c13410
parallel umis
Hoohm Apr 15, 2020
70e2170
Fixed tests
Hoohm May 3, 2020
ec21653
Merge branch 'feature/namedtuples' into develop
Hoohm May 3, 2020
a6f595c
some updates to CHANGELOG
Hoohm May 3, 2020
5e09e19
more changelog
Hoohm May 3, 2020
ebdda53
fixed slidin_window
Hoohm May 3, 2020
e3ba958
CHANGELOG update
Hoohm May 3, 2020
b414242
got rid of second length check
Hoohm May 4, 2020
795a1a4
integrated a pull from db for chemistry def
Hoohm Jul 5, 2020
9862a82
added remote downloading of definitions
Hoohm Aug 2, 2020
e5dc23d
a lot of code refactoring
Hoohm Aug 2, 2020
f4cd58c
refactoring, moved chunking to io
Hoohm Aug 23, 2020
e5bbe59
some more changes
Hoohm Sep 5, 2020
42d9bc4
Merge tag 'docu_error_132' into develop
Hoohm Sep 5, 2020
7c3fbc6
fixed chunking
Hoohm Sep 6, 2020
e8c0ff5
fixed sprase output
Hoohm Sep 20, 2020
343f8a2
fixed debugging
Hoohm Sep 20, 2020
f0598b5
correction in README
Hoohm Dec 10, 2020
7d20fb5
dealt with merge conflicts
Hoohm Dec 10, 2020
729db14
other merge conflicts
Hoohm Dec 10, 2020
fbf15a0
conflicts resolved
Hoohm Dec 10, 2020
8276494
resolved more conflicts
Hoohm Dec 10, 2020
1b60407
rewrote all preprocssing tests and got rid of step for tags
Hoohm Dec 14, 2020
49cfdba
docstring updates
Hoohm Dec 14, 2020
b6a6861
fixed the test file
Hoohm Dec 15, 2020
507afeb
fixed procssing and io tests
Hoohm Dec 24, 2020
e507c78
fixed mapping
Hoohm Dec 25, 2020
f309737
updated changelog
Hoohm Dec 25, 2020
47cf4e4
changed all whitelist to reference list
Hoohm Dec 28, 2020
1f8f0b6
some more tests for reference_lists
Hoohm Dec 28, 2020
45185c7
some reformatting
Hoohm Dec 28, 2020
197f4ad
fixed when failing to find knee estimate
Hoohm Dec 28, 2020
3476685
fixed test callings
Hoohm Dec 28, 2020
78fc623
fixed wrong chunking
Hoohm Dec 29, 2020
f2ba84a
fixed unmapped not working and reduced cell barcode to correct for
Hoohm Dec 30, 2020
cf72ec2
first column in barcodes.tsv is now the translated barcode
Hoohm Dec 31, 2020
2cd77f8
documentation updates
Hoohm Dec 31, 2020
8b647ed
cleaned up imports and added single thread fix
Hoohm Jan 1, 2021
b3a3e70
single thread fix
Hoohm Jan 19, 2021
df45e47
fixed MTX without translation barcodes
Hoohm Jan 23, 2021
14e8c75
formatting
Hoohm Jan 31, 2021
af48c0a
Merge branch 'feature/barcode_translation' into develop
Hoohm Jan 31, 2021
0f8a8ab
Added tempfile naming for chunks
Hoohm Mar 21, 2021
d753ea4
fixed the io tests
Hoohm May 26, 2021
68b2826
Moved some functions to io
Hoohm May 29, 2021
2d37cbc
some code and testing refactoring
Hoohm May 29, 2021
a867d44
cleaning up test_mapping.py
Hoohm May 29, 2021
5708028
changed reference to translate
Hoohm Jun 6, 2021
5c57d12
refactored the filtering and translation
Hoohm Jul 9, 2021
757a4a6
fixed translation reading
Hoohm Jul 10, 2021
1d6ebdc
added file checks for sequencing data
Hoohm Nov 21, 2021
68c8da7
added some checks
Hoohm Nov 21, 2021
b65e368
fixed writing out unknown barcodes
Hoohm Nov 21, 2021
fb0d226
reformatting
Hoohm Nov 21, 2021
564a5d0
fixed tests
Hoohm Nov 21, 2021
14a9c61
added pyymal
Hoohm Nov 21, 2021
edba219
feat: Add json template
Jul 2, 2022
3e3d529
Pyupgrade to 3.8
Jul 2, 2022
9e4cfea
code reformatting
Jul 3, 2022
9195df9
Fix: Fix testing for preprocessing
Oct 16, 2022
1548e32
Fix: formatting
Oct 30, 2022
a65f62b
Preprocessing: Code refactor with tests
Sep 6, 2023
6c5f9e2
Rewrote barcode correction
Oct 20, 2023
44bf849
feat: rewriting mapping, barcode correction in polars
Oct 31, 2023
5f76224
Fix: python version
Nov 23, 2023
53c5e5b
(feat): Barcode correction using asof_join
Dec 26, 2023
ced3e61
(feat): Mtx writing
Hoohm Dec 28, 2023
292a232
(test): Tests for IO and preprocessing
Hoohm Dec 30, 2023
b3401d8
(feat): Include yaml report again
Hoohm Dec 30, 2023
1ef8ffd
(feat): Mapping in polars only using polars-distance
Hoohm Jan 1, 2024
993f998
(feat): First attempt at UMI correction
Hoohm Jan 3, 2024
416d475
(fix): Mtx writing
Hoohm Jan 4, 2024
81396b3
(fix): duplicated read_counts writing
Hoohm Jan 4, 2024
ddfe048
(feat): Top unmapped are back
Hoohm Jan 5, 2024
5545d93
(chore): Rename pl.Utf8 to pl.String
Hoohm Jan 5, 2024
c137d5e
(Fix): Barcode correction now iterates until it finds the best mappin…
Hoohm Jan 5, 2024
67d6c82
(feat): Read fastq files using polars.
Hoohm Jan 14, 2024
c54dd60
feat: New fastq reader/writer
Hoohm Feb 25, 2024
4 changes: 3 additions & 1 deletion .gitignore
@@ -3,4 +3,6 @@ build/
CITE_seq_Count.egg-info/
__pycache__
*.pyc
.vscode
.vscode
.ruff_cache
.pytest_cache
37 changes: 34 additions & 3 deletions CHANGELOG.md
@@ -4,10 +4,41 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](http://keepachangelog.com/)
and this project adheres to [Semantic Versioning](http://semver.org/).

## [1.4.4] - 10.12.2020
## [1.5.0] - XXXX
### Added
- `CITE-seq-Count` is now compatible with trimmed data. There is a new `too_short` category in the `run_report.yaml`
that will let you know how many reads were lost for being too short. #123
- UMI correction is now also parallelized and will use the threads given.
- Added a check at the end of the mapping. If more than 99% of the reads are unmapped, CITE-seq-Count will exit. #62
- (BETA) New functionality that will fetch the chemistry definition from a remote repo to simplify usage and reduce human errors.
- Added cython dependency based on issue #117
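
The unmapped-read check from the list above can be sketched as follows. This is only an illustration of the idea, not the code in this pull request: the function name `mapping_failed`, its parameters, and the handling of an empty input are assumptions.

```python
def mapping_failed(unmapped_reads: int, total_reads: int,
                   max_unmapped_fraction: float = 0.99) -> bool:
    """Return True when more than `max_unmapped_fraction` of reads are unmapped.

    Hypothetical helper: the name, signature, and default are illustrative.
    """
    if total_reads == 0:
        # Assumption: a run with no reads at all is treated as a failure.
        return True
    return unmapped_reads / total_reads > max_unmapped_fraction


if __name__ == "__main__":
    # 995 of 1000 reads unmapped is above the 99% threshold.
    if mapping_failed(unmapped_reads=995, total_reads=1000):
        print("More than 99% of reads are unmapped, exiting.")
```

A caller would run this once after mapping and stop before UMI correction if it returns `True`.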

### Changed
- Hotfix for issues like #125, where Python 3.8 and above crash on a modified loop.
- Fixed a bug that checked the length of filenames instead of the number of files.
- The `features.tsv` now has different columns for the tag name and the tag sequence. This keeps the relevant information
in the output files as well as simplifies reading the mtx format when processing the data.
- The mapping step has been changed. It will first write chunks of reads to files and then read in the chunks in each child process.
This should solve the io bottleneck from before.
- There are new options now for parallel computing. `--chunk_size` determines how many reads will be read per chunk. This should fix issues like #99.
- `--sliding-window` now only checks for exact matches.
- The main results dict now uses an `int` as keys reducing memory footprint.
- Fixed the issue #92 with using `--bc_collapsing_dist 0`.
- Fixed issue #122 and now properly checks number of files.
- Fixed the error in the documentation pointed out by issue #132.
- The report is now a proper yaml file. Issue #133
- Removed distance checking on the whitelist because it was too slow for long whitelists.
- Tags csv file now requires a header with at least "sequence" and "feature_name".
- Updated tags file parsing to make it more reliable.
- Added new tests to help out contributions.
- If no clustered cells found, the dense output matrix will not be written.
- Barcode whitelists are now called reference lists.
- The reference list file now requires a header `reference`. There is now an optional column called `translation`. This is specific to chemistries such as 10xV3 that use different barcodes for mRNA and Antibody tag capture sequences. See more details in the documentation. #139 and #141
- Bumped UMI_tools to 1.1.1
- Changed `-cells` parameter to `-n_cells` for a more explicit argument name.
- Cell barcode correction when reference list is provided is now discarding cells with only unmapped reads reducing run time.
- Aberrant cells wording replaced by clustered cells to be more specific.
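
The header requirements listed above for the tags file (`sequence`, `feature_name`) and the reference list (`reference`, optional `translation`) could be validated along these lines. This is a sketch under assumptions: both files are taken to be comma-delimited, and the function names `read_tags` and `read_reference_list` are hypothetical, not the project's API.

```python
import csv
import io

# Column names taken from the changelog entries above.
REQUIRED_TAG_COLUMNS = {"sequence", "feature_name"}


def read_tags(handle):
    """Map tag sequence -> feature name, failing early on a bad header."""
    reader = csv.DictReader(handle)
    missing = REQUIRED_TAG_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"Tags file is missing columns: {sorted(missing)}")
    return {row["sequence"]: row["feature_name"] for row in reader}


def read_reference_list(handle):
    """Map reference barcode -> translated barcode.

    Assumption: without a `translation` column, a barcode maps to itself.
    """
    reader = csv.DictReader(handle)
    fields = set(reader.fieldnames or [])
    if "reference" not in fields:
        raise ValueError("Reference list requires a 'reference' header")
    has_translation = "translation" in fields
    return {
        row["reference"]: row["translation"] if has_translation else row["reference"]
        for row in reader
    }


if __name__ == "__main__":
    # In-memory stand-ins for files on disk; the barcodes are made up.
    tags = read_tags(io.StringIO("feature_name,sequence\nCD3,ACGT\n"))
    refs = read_reference_list(io.StringIO("reference,translation\nAAAA,CCCC\n"))
    print(tags, refs)
```

Checking the header before reading any rows gives a clear error message instead of a `KeyError` partway through the file.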

### Removed
- Unmapped reads are no longer UMI corrected, reducing run time and memory usage.
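
For the `--chunk_size` option listed under Changed, splitting reads into fixed-size chunks might look like the sketch below. It only shows the splitting step; writing each chunk to a file and reading it in a child process is left out. The name `chunk_reads` is hypothetical and this is not the code in this pull request.

```python
from itertools import islice


def chunk_reads(reads, chunk_size):
    """Yield lists of at most `chunk_size` reads from any iterable.

    Hypothetical helper: the last chunk may be shorter than `chunk_size`.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    iterator = iter(reads)
    # islice consumes up to chunk_size items per pass; an empty list ends the loop.
    while chunk := list(islice(iterator, chunk_size)):
        yield chunk


if __name__ == "__main__":
    for index, chunk in enumerate(chunk_reads(["r1", "r2", "r3", "r4", "r5"], 2)):
        print(index, chunk)
```

Because the generator pulls reads lazily, only one chunk needs to be held in memory at a time.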


## [1.4.3] - 05.10.2019