ABH format assumes non-missing, polymorphic SNPs between parents #3
Comments
I fixed this bug, but because I was lazy I didn't create a pull request first, and now I regret it. :-( Requesting review from @laceysanderson. All changes are in commit cc808c4.

How I fixed the problem

When looking at the genotype calls for each parent, there is now a check for whether the alleles are missing (".") or different from each other (implying heterozygosity). In either case a message is printed to the terminal indicating the SNP site and which parent had the problematic genotype. I then continue to the next iteration of the while loop, which skips processing of that line in the input file, so no conversion for that line will exist in the output file.

How to test

At present, the vcf_filter module uses the first two columns of a VCF as the maternal and paternal parents, respectively. You can upload a VCF file to the module through the administration page with known missing (./.) or heterozygous alleles (0/1 or 1/0) in either of these columns. Then select your file and select "A/B format" as your Export format. Run the Tripal job manually, and check for output about skipped SNP sites in which the first two columns had missing or heterozygous alleles. You can download the output file to confirm these sites were omitted.
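For anyone reviewing without the commit open, the check described above can be sketched roughly as follows. This is a minimal illustration in Python, not the code from cc808c4: the names `parent_problem` and `filter_sites` are hypothetical, and it assumes the two parents are the first two sample columns of the VCF (fields 9 and 10, zero-based, after the nine fixed columns).

```python
def parent_problem(gt_field):
    """Return 'missing', 'heterozygous', or None for one VCF sample field."""
    # Keep only the GT sub-field and treat phased (|) like unphased (/).
    alleles = gt_field.split(":")[0].replace("|", "/").split("/")
    if "." in alleles:
        return "missing"
    if len(set(alleles)) > 1:
        return "heterozygous"
    return None


def filter_sites(lines, report=print):
    """Keep only sites where both parents are called and homozygous."""
    kept = []
    for line in lines:
        if line.startswith("#"):
            continue  # header lines are not converted
        fields = line.rstrip("\n").split("\t")
        # Assumption: maternal parent is sample 1, paternal is sample 2.
        parents = (("maternal", fields[9]), ("paternal", fields[10]))
        problems = [(name, parent_problem(gt)) for name, gt in parents]
        bad = [(name, p) for name, p in problems if p]
        if bad:
            name, problem = bad[0]
            report(f"Skipping {fields[0]}:{fields[1]}: "
                   f"{name} parent is {problem}")
            continue  # no output row is written for this site
        kept.append(fields)
    return kept
```

Calling `filter_sites(open("input.vcf"))` on a file with a `./.` or `0/1` call in either parent column would print one message per skipped site and return only the remaining rows.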
Did some research on ABH format with the intent of deciding whether I could find an alternative to your approach of removing lines where the parents are missing or heterozygous.
I'm not sure I'm recommending either of the alternatives... Perhaps the best approach would be to let the researcher choose, by providing ABH format-specific options to either exclude sites with missing/heterozygous parents or denote non-parental alleles with X. This would definitely need an additional issue and could be considered an enhancement. In the meantime I would suggest clarifying our approach in the description of the ABH format.
This worked as expected on LR-86 and LR-70. Unfortunately both had already been filtered to be bi-allelic and neither seems to have both parents the same (expected a row with no B). When I tried to test converting the UCDavis set to ABH (which would have had both these issues :-) I ran into #7 and the job seemed to hang. Concerns (to be addressed in a pull request!):
Addresses issue #3: ABH format assumes non-missing, polymorphic SNPs between parents
Fixed in PR
If the parents are identical, or missing, the script will happily translate all SNPs across the individuals to match. For example, if Parent A is missing, all individuals that are also missing will be assigned allele A.
For now, we've ensured the files we're distributing to our group have polymorphic non-missing calls for the parents. But we should not necessarily be making this assumption and thus we should handle this situation properly in case we run into it in the future.
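To make the failure mode concrete, here is a small sketch of a naive parent-matching conversion. It is an assumption about the general shape of the logic, not the module's actual code, and `naive_abh` is a hypothetical name. It only shows why missing or identical parent calls produce misleading output.

```python
def naive_abh(parent_a, parent_b, sample):
    """Translate one genotype call by string comparison with the parents."""
    if sample == parent_a:
        return "A"
    if sample == parent_b:
        return "B"
    return "H"


# Parent A is missing, so every missing individual is reported as "A".
missing_parent = [naive_abh("./.", "1/1", s) for s in ("./.", "1/1", "0/0")]

# Parents are identical, so "B" can never be produced for this site.
identical_parents = [naive_abh("0/0", "0/0", s) for s in ("0/0", "1/1")]

print(missing_parent)     # ['A', 'B', 'H']
print(identical_parents)  # ['A', 'H']
```

In both cases the output looks like valid ABH calls, which is why the problem is easy to miss unless the parent genotypes are checked first.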