
Error: Input files have differing numbers of entries (1882 != 1677) #30

Open
Rohit-Satyam opened this issue Sep 3, 2023 · 1 comment

Comments

@Rohit-Satyam

I was trying to use fastq-sample, but I keep getting this error:

Input files have differing numbers of entries (1882 != 1677)

when running:

fastq-sample -n 100 -o sampled_60 1107_S34_L001_R1.fastq.gz 1107_S34_L001_R2.fastq.gz

I have attached the files below.

1107_S34_L001_R2.fastq.gz
1107_S34_L001_R1.fastq.gz
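One way to check whether the two mates really differ in read count, or whether the reported mismatch (1882 != 1677) is a counting problem, is to count records in the decompressed stream rather than in the raw .gz bytes. This is a minimal sketch, assuming the common 4-line FASTQ layout with no wrapped sequence lines; the helper names open_maybe_gzip, count_fastq_records and check_pair are hypothetical and not part of fastq-tools.

```python
import gzip


def open_maybe_gzip(path):
    """Open `path` as text, decompressing on the fly if it starts with the gzip magic bytes."""
    with open(path, "rb") as fh:
        magic = fh.read(2)
    if magic == b"\x1f\x8b":
        return gzip.open(path, "rt")
    return open(path, "rt")


def count_fastq_records(path):
    """Count reads, assuming 4-line records (no line-wrapped sequences)."""
    with open_maybe_gzip(path) as fh:
        n_lines = sum(1 for _ in fh)
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4


def check_pair(r1_path, r2_path):
    """Return (n1, n2); the mate files are consistent only if n1 == n2."""
    return count_fastq_records(r1_path), count_fastq_records(r2_path)
```

For example, check_pair("1107_S34_L001_R1.fastq.gz", "1107_S34_L001_R2.fastq.gz") should return two equal numbers if the attached files are a proper pair.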

@Rohit-Satyam
Author

Rohit-Satyam commented Sep 3, 2023

The error goes away when I decompress the FASTQ files first. Decompressing is feasible for small data, but my actual files are 3 to 9 GB gzipped. Is there a way to make fastq-sample handle gzipped files?
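As a stopgap, paired reads can be subsampled straight from gzipped files without decompressing them to disk. This is a minimal sketch, not how fastq-sample itself selects reads: it assumes 4-line records and that R1 and R2 list mates in the same order, and it swaps in independent Bernoulli sampling (each pair kept with probability `fraction`), so the output size is only approximately fraction × N. The names read_records and sample_pairs are hypothetical.

```python
import gzip
import random
from itertools import islice, zip_longest


def read_records(fh):
    """Yield 4-line FASTQ records as tuples of lines (assumes no wrapped sequences)."""
    while True:
        rec = tuple(islice(fh, 4))
        if not rec:
            return
        if len(rec) != 4:
            raise ValueError("truncated FASTQ record")
        yield rec


def sample_pairs(r1_in, r2_in, r1_out, r2_out, fraction, seed=None):
    """Stream gzipped mates, keeping each pair with probability `fraction`.

    Returns (kept, total). Output is gzipped; kept mates stay in the same order.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1], e.g. 0.6 for 60%")
    rng = random.Random(seed)
    kept = total = 0
    with gzip.open(r1_in, "rt") as a, gzip.open(r2_in, "rt") as b, \
            gzip.open(r1_out, "wt") as oa, gzip.open(r2_out, "wt") as ob:
        for rec1, rec2 in zip_longest(read_records(a), read_records(b)):
            if rec1 is None or rec2 is None:
                raise ValueError("R1 and R2 have differing numbers of records")
            total += 1
            if rng.random() < fraction:  # independent Bernoulli draw per pair
                oa.writelines(rec1)
                ob.writelines(rec2)
                kept += 1
    return kept, total
```

A call such as sample_pairs("1107_S34_L001_R1.fastq.gz", "1107_S34_L001_R2.fastq.gz", "sampled_60.1.fastq.gz", "sampled_60.2.fastq.gz", 0.6, seed=1234) makes a single pass in constant memory. The trade-off is that it cannot hit an exact target count; sampling exactly n reads requires knowing N up front, which is why an accurate read count on the decompressed data matters.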

Also, the proportion option (-p) does not seem to work, since the total number of reads stays the same in the sampled file:

## Want to sample 60% of total reads
fastq-sample -p 60 -o sampled_60 -s 1234 1107_S34_L001_R1.fastq 1107_S34_L001_R2.fastq

wc -l sampled_60.1.fastq
146648 sampled_60.1.fastq

wc -l 1107_S34_L001_R1.fastq
146648 1107_S34_L001_R1.fastq

Edit 1: Sorry, I realized I have to give a fraction (e.g. 0.6 rather than 60). However, when I provide a fraction with gzipped input, fastq-sample counts the total number of reads as wc -l file.gz would, rather than as zcat file.gz | wc -l, which leads to a wrong estimate of the number of reads to output.
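For reference, with 4-line records the read count is the decompressed line count divided by 4. Using the R1 line count above, a fraction of 0.6 implies roughly this target (approximate; the exact rounding fastq-sample applies is not stated here):

$$N = \frac{\text{lines}}{4} = \frac{146648}{4} = 36662, \qquad p\,N = 0.6 \times 36662 = 21997.2 \approx 21997 \text{ reads}$$

Passing 60 instead gives a proportion far above 1, which is consistent with every read being kept in the output above. Running wc -l directly on a .gz file counts newline bytes in the compressed data, which has no fixed relationship to the number of reads.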

Note: I am using v0.8.3 from Bioconda
