error with clean-names.bf #66

Open
00-kelvin opened this issue Dec 19, 2024 · 7 comments

@00-kelvin

Hi there,

I keep getting the following error when I try to use the clean-names.bf tool. I think my regexp is correct for my situation based on the instructions in the README, and I have tried several iterations, but please excuse me if the mistake is entirely on my end.

My alignment is formatted as follows, with >Species_name|GeneID:

>Araneus_inustus|IAHS01031913.1.p1
ATGCCTGACTACTTAGGAGACGATATGAGAAAAACAAAGGGT---GAT------AATGAAAAGGAAAGAGATGTTAAAGCTTTGGATGAAGGAGATATTGCTTTATTGAAATCATATGGTGTTGGCCAGT>
>Acroaspis_sp_IDV6688|IANO01004068.1.p1
ATGCCTGACTACTTAGGGGATGATATGAGAAAAACGAAA------GATGCTATCGAAGAAAAGGAAAAAGACGTTAAAGCTTTGGATGAAGGAGATATAGCGTTGTTGAAATCATATGGAGTTGGCCAAT>
>Eriophora_pustulosa|IBIK01018306.1.p1

and my tree has just the species names (plus annotations and bootstrap values):

((((('Ambicodamus_IDV6680':0.321072,(('Uloborus_diversus'{Foreground}:0.464614,('Deinopis_sp_IDV7365'{Foreground}:0.0983693, ...

So I am trying to match just on the species name (before the pipe)

The error:

hyphy clean-names.bf --msa ~/data/macse_171224/N5.HOG0067803/N5.HOG0067803_NT.fasta --tree ~/kelvin-scratch/data/family-tree/n5_orbweavers_all_fg.tree --regexp '^([a-zA-Z0-9_]+)\|+$' --output cleannames.nxh

Analysis Description
--------------------
 Read an alignment and a tree and rename all the sequences to conform
with HyPhy naming requirements. The result will be written as a combined
alignment, with the format specified by the DATA_FILE_PRINT_FORMAT HyPhy
environment variable 

- __Requirements__: An MSA and a tree.

- __Citation__: TBD

- __Written by__: Sergei L Kosakovsky Pond

- __Contact Information__: [email protected]

- __Analysis Version__: 0.0.1

Use the following regular expression to select a subset of leaves : regexp: ^([a-zA-Z0-9_>]+)\|+$
Error:
Could not match pattern for sequence name __iterator_end__loop__ in call to assert(None!=match.me, error_msg);

Function call stack
1 :  assert(None!=match.me, error_msg);

	Keyword arguments:
		{
		 "output":"cleannames.nxh"
		}
-------
2 :  ExecuteCommands("assert (`statement`, error_msg)", /home/crunnel2/anaconda3/envs/selection/share/hyphy/TemplateBatchFiles/libv3/);

	Keyword arguments:
		{
		 "output":"cleannames.nxh"
		}
-------
3 :  io.CheckAssertion("None!=match.me","Could not match pattern for sequence name `v`");

	Keyword arguments:
		{
		 "output":"cleannames.nxh"
		}
-------

Check errors.log for execution error details.

The line "Could not match pattern for sequence name __iterator_end__loop__" made me wonder if the error might be outside my control, since of course none of my sequences are called that.

Thanks in advance for your help!

@spond
Member

spond commented Dec 19, 2024

Dear @00-kelvin,

Yeah, that error looks odd (__iterator_end__loop__ is an internal name). Let me check.

Best,
Sergei

@spond
Member

spond commented Dec 19, 2024

Dear @00-kelvin,

Would you be able to share the complete example with me (alignment, tree)? I am not able to easily replicate the issue. Also, could you confirm your hyphy --version output?

Best,
Sergei

@00-kelvin
Author

Certainly, here they are:

(selection) crunnel2@login01:~/bin/hyphy-analyses/clean-names$ hyphy --version
HYPHY 2.5.64(MP) for Linux on x86_64 x86 SSE4 SIMD zlib (v1.3.1)
(selection) crunnel2@login01:~/bin/hyphy-analyses/clean-names$ hyphy clean-names.bf --msa ~/data/macse_171224/N5.HOG0067803/N5.HOG0067803_NT.fasta --tree ~/kelvin-scratch/data/family-tree/n5.tree --regexp '^([a-zA-Z0-9_]+)\|+$' --output ~/scratch/test.nxh

Analysis Description
--------------------
 Read an alignment and a tree and rename all the sequences to conform
with HyPhy naming requirements. The result will be written as a combined
alignment, with the format specified by the DATA_FILE_PRINT_FORMAT HyPhy
environment variable 

- __Requirements__: An MSA and a tree.

- __Citation__: TBD

- __Written by__: Sergei L Kosakovsky Pond

- __Contact Information__: [email protected]

- __Analysis Version__: 0.0.1

Use the following regular expression to select a subset of leaves : regexp: ^([a-zA-Z0-9_]+)\|+$
Error:
Could not match pattern for sequence name __iterator_end__loop__ in call to assert(None!=match.me, error_msg);

Function call stack
1 :  assert(None!=match.me, error_msg);

	Keyword arguments:
		{
		 "output":"/home/crunnel2/scratch/test.nxh"
		}
-------
2 :  ExecuteCommands("assert (`statement`, error_msg)", /home/crunnel2/anaconda3/envs/selection/share/hyphy/TemplateBatchFiles/libv3/);

	Keyword arguments:
		{
		 "output":"/home/crunnel2/scratch/test.nxh"
		}
-------
3 :  io.CheckAssertion("None!=match.me","Could not match pattern for sequence name `v`");

	Keyword arguments:
		{
		 "output":"/home/crunnel2/scratch/test.nxh"
		}
-------

Check errors.log for execution error details.

MSA and tree file attached in zip file:
example.zip
There may very well be some issue with my tree in particular; I was having some trouble with editing it to remove nodes not included in the alignment which could potentially be a source of error.
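
One way to rule out a tree/alignment mismatch before running clean-names.bf is to compare the two sets of names directly. The following is a minimal sketch, not part of HyPhy: the helper names `fasta_species` and `newick_leaves` are hypothetical, the leaf regex assumes names made only of letters, digits and underscores (as in this thread), and the toy inputs reuse names from the post with made-up branch lengths.

```python
import re

def fasta_species(fasta_text):
    """Species part (text before the first '|') of every FASTA header line."""
    return {
        line[1:].split("|", 1)[0].strip()
        for line in fasta_text.splitlines()
        if line.startswith(">")
    }

def newick_leaves(newick_text):
    """Leaf names: a name that directly follows '(' or ','.

    Internal labels and bootstrap values follow ')' and are therefore skipped.
    Assumption: names contain only letters, digits and underscores.
    """
    return set(re.findall(r"[(,]\s*'?([A-Za-z0-9_]+)'?", newick_text))

# Toy inputs (illustrative only; sequences and branch lengths are placeholders).
toy_fasta = """>Araneus_inustus|IAHS01031913.1.p1
ATGCCTGAC
>Acroaspis_sp_IDV6688|IANO01004068.1.p1
ATGCCTGAC
"""
toy_tree = ("(('Araneus_inustus':0.1,'Uloborus_diversus'{Foreground}:0.2)90:0.05,"
            "'Ambicodamus_IDV6680':0.3);")

in_alignment = fasta_species(toy_fasta)
in_tree = newick_leaves(toy_tree)

print("only in tree:     ", sorted(in_tree - in_alignment))
print("only in alignment:", sorted(in_alignment - in_tree))
```

Any name reported on either line is a candidate for pruning from the tree or for a closer look at the header format.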

@spond
Member

spond commented Dec 19, 2024

Dear @00-kelvin,

OK, so clean-names.bf was reporting the error incorrectly (it was actually failing to match the regexp).
I also added automatic tree unrooting (otherwise you may get downstream errors). I pushed a fix.

Use --regexp '^([a-zA-Z0-9_]+)\|?.*$' because it has to match both the tree and the alignment. Your tree names do not have the |... part.

Best,
Sergei
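
To see why the original pattern fails and the suggested one works, both can be tried on names taken from this thread. This is a sketch that uses Python's `re` module as a stand-in for HyPhy's own regexp engine (an assumption; these two patterns use only basic constructs, so the behaviour should carry over). The helper `captured` is hypothetical.

```python
import re

# Python's `re` stands in for HyPhy's regexp engine here (an assumption).
ORIGINAL = r'^([a-zA-Z0-9_]+)\|+$'    # requires a '|' and then end of string
FIXED = r'^([a-zA-Z0-9_]+)\|?.*$'     # '|' optional, anything may follow

names = [
    "Araneus_inustus|IAHS01031913.1.p1",       # alignment header
    "Acroaspis_sp_IDV6688|IANO01004068.1.p1",  # alignment header
    "Uloborus_diversus",                       # tree leaf
    "Deinopis_sp_IDV7365",                     # tree leaf
]

def captured(pattern, name):
    """Return capture group 1 if the pattern matches the name, else None."""
    m = re.match(pattern, name)
    return m.group(1) if m else None

for name in names:
    print(f"{name!r}: original={captured(ORIGINAL, name)!r}, "
          f"fixed={captured(FIXED, name)!r}")
```

The original pattern captures nothing for any of the four names: alignment headers have text after the pipe, so `\|+$` cannot reach the end of the string, and tree names have no pipe at all. The fixed pattern captures the species part in both cases, so the alignment and tree names map to the same key.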

@00-kelvin
Author

Thank you so much, Sergei. I misunderstood the README on how to design the regexp -- your version worked!

I hate to be the bearer of bugs, but I thought I might as well mention that now that I have it working, I can see that clean-names.bf also strips out annotations from the provided tree if they exist. Since the label-trees utility doesn't seem to work on the output of clean-names (presumably because it's not in Newick format anymore?), it would be great if clean-names could preserve those labels as well.

No urgency here though, I can certainly work around this -- I just thought you might like to be made aware, and I feel some responsibility for clean-names in particular since you actually made it in response to one of my GitHub issues earlier this year. Thank you again! :-)

@spond
Member

spond commented Dec 20, 2024

Dear @00-kelvin,

We always appreciate bug reports! In this case it's more of an oversight on my part; carrying labels through requires specific code to implement, and it simply wasn't there.

clean-names outputs the alignment and the tree in a single file, whereas label-tree expects tree files only, hence the disconnect. At any rate, I changed clean-names to preserve {} annotations when present. Try pulling the latest update and see if that helps.

Best,
Sergei
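
After pulling the update, a quick check that the {} annotations survived is to count them in the input tree and in the clean-names output. The sketch below is not a HyPhy tool: `annotation_counts` is a hypothetical helper, it assumes braces appear in the files only as branch annotations, and the two toy strings stand in for the real input tree and output file (names are from this thread, the tree shape is invented).

```python
import re
from collections import Counter

def annotation_counts(text):
    """Count each {label} occurrence in a tree string or combined output file."""
    return Counter(re.findall(r"\{([^{}]+)\}", text))

# Toy stand-ins (illustrative only): an annotated input tree and an output
# in which the annotations were dropped.
tree_in = ("(('Uloborus_diversus'{Foreground}:0.464614,"
           "'Deinopis_sp_IDV7365'{Foreground}:0.0983693):0.1,"
           "'Ambicodamus_IDV6680':0.321072);")
tree_out = tree_in.replace("{Foreground}", "")

before = annotation_counts(tree_in)
after = annotation_counts(tree_out)
missing = before - after  # Counter subtraction keeps only positive counts

print("annotations in input: ", dict(before))
print("annotations in output:", dict(after))
print("lost:                 ", dict(missing))
```

An empty "lost" entry means every annotation label appears at least as often in the output as in the input; reading the real files with `open(path).read()` in place of the toy strings is the intended use.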

@00-kelvin
Author

Thank you again, Sergei! As always I appreciate your help and speedy responses. Hope you have a happy new year!
