
earlier CCTCCTT prevents finding leader/D1 #25

Open
phycologia opened this issue Aug 18, 2022 · 4 comments

@phycologia
Collaborator

The code worked and found boxb, but it couldn't find the leader or D1. D1 has typical start and end sequences, so I think the problem may be an earlier "CCTCCTT" that occurs in the sequence.

Accession number: MT135015.1
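If the search for "CCTCCTT" takes only the first hit, an earlier copy of the motif will be picked up instead of the one that actually precedes the leader. A minimal sketch for listing every occurrence so the candidates can be compared; the function name `motif_positions` and the toy sequence are made up for illustration (the toy is not MT135015.1):

```python
import re

# Motif from this issue that can occur earlier than expected.
MOTIF = "CCTCCTT"

def motif_positions(seq: str, motif: str = MOTIF) -> list[int]:
    """Return the 0-based start of every occurrence of `motif` in `seq`."""
    # A zero-width lookahead lets overlapping occurrences be reported too.
    pattern = f"(?={re.escape(motif)})"
    return [m.start() for m in re.finditer(pattern, seq.upper())]

# Made-up toy sequence with two copies of the motif (NOT MT135015.1).
toy = "AAACCTCCTTGGGGGGGGCCTCCTTAGGGAGACCT"
print(re.search(MOTIF, toy).start())  # 3: a first-hit search stops here
print(motif_positions(toy))           # [3, 18]
```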

@nlabrad
Owner

nlabrad commented Sep 2, 2022

@phycologia, what are the flanking regions for this one?
I noticed a potential issue that would prevent it from finding the start, but now I'm dealing with the ends not showing up. Can you confirm what the closing flanking region should be?

I find GACAA as a start (2 occurrences), and the matching end I have for that is '[AT]TGTC', meaning either ATGTC or TTGTC. There is an ATGTC, but it is 500 bases downstream, so I don't think that's it. Am I missing an end for that start?
@callmcgovern, please also check whether this makes sense.
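The start and end patterns here are reverse complements of each other: TTGTC is the reverse complement of GACAA, and '[AT]TGTC' covers the reverse complements of both GACAA and GACAT. A small sketch for checking that and for listing every end-pattern hit downstream of a start, so distances like the 500-base one are easy to inspect; `revcomp`, `end_hits`, and the toy sequence are illustrative only:

```python
import re

# Complement table for unambiguous DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

def end_hits(seq: str, end_pattern: str, after: int = 0) -> list[tuple[int, str]]:
    """(position, matched text) for every match of `end_pattern` at or after `after`."""
    rx = re.compile(end_pattern)
    # Pattern.finditer(string, pos) starts scanning at `pos`; positions stay absolute.
    return [(m.start(), m.group()) for m in rx.finditer(seq.upper(), after)]

# '[AT]TGTC' covers the reverse complements of GACAA and GACAT.
print(revcomp("GACAA"), revcomp("GACAT"))  # TTGTC ATGTC
toy = "GACAAGGGATGTCCCTTGTC"  # made-up sequence, not MT135015.1
print(end_hits(toy, "[AT]TGTC", after=5))  # [(8, 'ATGTC'), (15, 'TTGTC')]
```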

@phycologia
Collaborator Author

@nlabrad D1 begins with GACCT and ends with AGGTC.
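GACCT and AGGTC are also reverse complements, so D1 can be located by pairing each GACCT with the nearest AGGTC downstream. A minimal sketch, assuming a hypothetical upper bound `max_len` on the D1 span (the default of 200 is a placeholder, not a value from this thread); the function name and toy sequence are made up:

```python
import re

def find_helix(seq: str, open_motif: str, close_motif: str, max_len: int = 200):
    """Return (start, end) of the first open_motif...close_motif span of at most
    max_len bases (end exclusive), or None. max_len is an illustrative guess."""
    s = seq.upper()
    for m in re.finditer(re.escape(open_motif), s):
        # Look for the closing motif only after the opening motif, within max_len.
        close = s.find(close_motif, m.end(), m.start() + max_len)
        if close != -1:
            return m.start(), close + len(close_motif)
    return None

toy = "TTGACCTAAAAAGGTCTT"  # made-up sequence, not MT135015.1
print(find_helix(toy, "GACCT", "AGGTC"))              # (2, 16)
print(find_helix(toy, "GACCT", "AGGTC", max_len=10))  # None
```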

@nlabrad
Owner

nlabrad commented Sep 2, 2022

Aha, I see it, and now I see what you mean. Maybe we need to handle the case where more than one ITS start pattern is found: for example, if there is more than one match and D1-D1 is not found, we try again. Not sure yet.
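The "try again" idea could look like the sketch below: walk every candidate ITS start in order and keep the first one that has a D1 (GACCT...AGGTC) shortly downstream. This assumes CCTCCTT is the start pattern being matched; the function name, `max_leader`, and `max_d1` are hypothetical placeholders, not values from the project:

```python
import re
from typing import Optional, Tuple

def first_start_with_d1(
    seq: str,
    its_start: str = "CCTCCTT",
    d1_open: str = "GACCT",
    d1_close: str = "AGGTC",
    max_leader: int = 30,   # hypothetical bound on leader length
    max_d1: int = 200,      # hypothetical bound on D1 length
) -> Optional[Tuple[int, int, int]]:
    """Try every ITS start candidate in order; return (its_start, d1_start, d1_end)
    for the first candidate that has a D1 shortly downstream, else None."""
    s = seq.upper()
    for m in re.finditer(re.escape(its_start), s):
        # D1 must start at most max_leader bases after this candidate.
        d1_pos = s.find(d1_open, m.end(), m.end() + max_leader + len(d1_open))
        if d1_pos == -1:
            continue  # no D1 here: fall through to the next candidate
        close = s.find(d1_close, d1_pos + len(d1_open), d1_pos + max_d1)
        if close != -1:
            return m.start(), d1_pos, close + len(d1_close)
    return None

# Made-up toy: an early CCTCCTT with no D1 nearby, then the "real" one.
toy = "CCTCCTT" + "A" * 40 + "CCTCCTT" + "AGGGAGA" + "GACCT" + "TTTT" + "AGGTC"
print(first_start_with_d1(toy))  # (47, 61, 75)
```

Taking only the first CCTCCTT here would find no D1 within the leader window, which is the failure described in this issue.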

@phycologia
Collaborator Author

@nlabrad @callmcgovern Have we ever come across a sequence that doesn't have "AGGGA" right at the beginning of the ITS region? If it's always present, what if the code searched for "CCTCCT[TA]", followed by 2 or 3 variable bases, followed by "AGGGA"? What range of leader sequence lengths have we seen? I think I've found only 7 and 8, but if the leader is sometimes longer, the allowed number of bases between CCTCCT[TA] and AGGGA would need to be wider.
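In a regular expression, "2 or 3 variable bases" can be written as `[ACGT]{2,3}`. A sketch of the proposed anchored pattern with the gap range as parameters, so it can be widened if longer leaders turn up; the defaults only mirror the 2 or 3 bases suggested here, and the function name and toy sequences are made up:

```python
import re

def its_start_regex(min_gap: int = 2, max_gap: int = 3) -> re.Pattern:
    """CCTCCT[TA], then min_gap..max_gap arbitrary bases, then AGGGA."""
    # In the f-string, '{{' and '}}' emit literal braces, giving e.g. [ACGT]{2,3}.
    return re.compile(rf"CCTCCT[TA][ACGT]{{{min_gap},{max_gap}}}AGGGA")

toy = "CCTCCTTGGGGCCTCCTTACAGGGAGACCT"  # made up, not MT135015.1
print([m.start() for m in its_start_regex().finditer(toy)])  # [11]

wide = "CCTCCTTCCCCCAGGGA"  # 5 bases between CCTCCTT and AGGGA
print(its_start_regex().search(wide))              # None
print(its_start_regex(2, 5).search(wide).start())  # 0
```

The earlier CCTCCTT at position 0 of `toy` is skipped because it is not followed by AGGGA within the allowed gap.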
