Fix: UpdateSymbolList incorrectly renames genes #8179
Hi Seurat Team,
This is a PR that builds on the previous fix described in #4545. It is by no means a perfect fix (explanation below), so I leave it to you to decide the path forward. If you decide a different solution is warranted, I'm happy to help with a PR.
The issue is the potential for `UpdateSymbolList` to inappropriately rename genes. In the original case, the search for alias symbols was removed from the internals of `UpdateSymbolList` by manually setting the parameter to previous symbols only. Unfortunately, that still causes issues, because a number of previous symbols are now the symbols of different genes. For instance, MCM2, MCM7, and CCNL1 are all currently approved genes, yet in its current form `UpdateSymbolList` still renames them.

Using the most recent 10X human reference genome (which is filtered, so this is not the full extent of potential issues), I found more than 100 genes that would be inappropriately swapped.
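A minimal Python sketch of how this can happen, assuming a plain previous-symbol to current-symbol lookup table with no other check. This is not Seurat's R implementation: `naive_update`, `PREVIOUS_TO_CURRENT`, `APPROVED`, and the symbols `ABC1`, `XYZ9`, `OLD7`, `NEW7`, and `DEF2` are hypothetical placeholders, not real HGNC entries. `ABC1` plays the role of a gene like MCM2, MCM7, or CCNL1: currently approved, but also listed as a previous symbol of another gene.

```python
# Hypothetical lookup table: previous symbol -> current approved symbol.
# "ABC1" is a currently approved symbol of its own gene, but it is ALSO listed
# as a previous symbol of a different gene that is now called "XYZ9".
PREVIOUS_TO_CURRENT = {
    "ABC1": "XYZ9",
    "OLD7": "NEW7",
}

# Hypothetical set of currently approved symbols.
APPROVED = {"ABC1", "DEF2", "NEW7", "XYZ9"}


def naive_update(symbols):
    """Replace every symbol that appears as a previous symbol, with no other check."""
    return [PREVIOUS_TO_CURRENT.get(s, s) for s in symbols]


updated = naive_update(["ABC1", "OLD7", "DEF2"])
print(updated)  # ['XYZ9', 'NEW7', 'DEF2']

# ABC1 was already an approved symbol, yet it was renamed anyway.
print("ABC1" in APPROVED and updated[0] != "ABC1")  # True
```

`OLD7` is a legitimate update, but `ABC1` is swapped for a different gene purely because its name also appears in the previous-symbol column.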
The "simple" solution in this PR, intended to avoid these issues, adds a parameter that requires an object to be specified, and synonyms are only used for genes not already found in that object. This limits the function to use with a Seurat object, but it protects against inappropriate renames (though not completely).
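One way to read the guard this PR adds, sketched in Python rather than R: a symbol that is already a feature of the object is left alone, and the previous-symbol lookup is consulted only for symbols the object does not contain. `guarded_update` and its `object_features` argument are hypothetical stand-ins for the new required parameter and the object's feature names, and the symbols are invented placeholders; the actual parameter name and R code in the PR may differ.

```python
# Hypothetical lookup table: previous symbol -> current approved symbol.
# "ABC1" is both an approved symbol and a previous symbol of "XYZ9".
PREVIOUS_TO_CURRENT = {"ABC1": "XYZ9", "OLD7": "NEW7"}


def guarded_update(symbols, object_features):
    """Rename a symbol only if the object does not already contain it.

    `object_features` is a hypothetical stand-in for the feature names of
    the object passed to the new required parameter.
    """
    present = set(object_features)
    updated = []
    for s in symbols:
        if s in present:
            # Already a feature of the object: trust it and keep it as-is.
            updated.append(s)
        else:
            # Not in the object: fall back to the previous-symbol lookup.
            updated.append(PREVIOUS_TO_CURRENT.get(s, s))
    return updated


features = ["ABC1", "DEF2", "NEW7"]
print(guarded_update(["ABC1", "OLD7", "DEF2"], features))  # ['ABC1', 'NEW7', 'DEF2']
```

Here `ABC1` is kept because the object already contains it, while `OLD7`, which the object lacks, is still mapped to `NEW7`.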
The reason it's not a complete solution is that most Seurat objects are filtered versions of the count matrix, which often leaves an object with only half of the genes present in the annotation file. The function can therefore still inappropriately rename a gene if that gene was filtered out during object creation. Avoiding this completely would require a different fix: Seurat would need to store the full feature list from the `counts` input somewhere in the object and check against that, rather than against the current features returned by `Features`.

Again, if the current PR solution is not desired, that is totally fine, but I wanted to make you aware of the issue.
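To make the remaining gap concrete, here is a Python sketch (hypothetical names and placeholder symbols, not Seurat code) that runs the same kind of guard against two feature lists: the filtered object's features, roughly what `Features` would return, and the full feature list of the `counts` input, which the object would need to store. `counts_features`, `object_features`, and `guarded_update` are illustrative only.

```python
# Hypothetical lookup table: previous symbol -> current approved symbol.
# "ABC1" is both an approved symbol and a previous symbol of "XYZ9".
PREVIOUS_TO_CURRENT = {"ABC1": "XYZ9", "OLD7": "NEW7"}


def guarded_update(symbols, reference_features):
    """Keep symbols found in `reference_features`; otherwise use the lookup."""
    present = set(reference_features)
    return [s if s in present else PREVIOUS_TO_CURRENT.get(s, s) for s in symbols]


# All rows of the (hypothetical) input counts matrix.
counts_features = ["ABC1", "DEF2", "NEW7"]
# Features left after filtering during object creation: ABC1 was dropped.
object_features = ["DEF2", "NEW7"]

query = ["ABC1", "OLD7"]
print(guarded_update(query, object_features))  # ['XYZ9', 'NEW7']  (ABC1 wrongly renamed)
print(guarded_update(query, counts_features))  # ['ABC1', 'NEW7']  (ABC1 kept)
```

Checking against the filtered features lets `ABC1` slip through to the lookup and be renamed; checking against the full `counts` feature list keeps it.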
Best,
Sam
Note/Edit: The CI failure appears to be caused by a BiocManager installation error, not by this PR.