Genes with multiple chromosomes #2
Do we want to split these into multiple records when creating
The X|Y ones are in the pseudoautosomal regions of the X and Y chromosomes. I would not worry about those and would not split them; they should be retained. OMS looks like a susceptibility "gene." It's not really a molecular entity, just a set of association signal regions (https://www.ncbi.nlm.nih.gov/gene/?term=619538), so it could be dropped. The others appear to be on unplaced scaffolds: https://www.ncbi.nlm.nih.gov/gene/?term=105379561

By the way: if someone picks a gene on the X or Y chromosome other than those in the X|Y set, you may want to automatically detect it and build separate male and female classifiers. Sex is a strong signal in expression data, even for unsupervised learning.
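The automatic X/Y check suggested above could look something like the following minimal sketch. The function name needs_sex_specific_models is hypothetical, and it assumes chromosome values use the "X|Y" convention discussed in this thread for pseudoautosomal genes.

```python
# Hypothetical helper: flag genes that sit on a single sex chromosome, where
# separate male and female classifiers may be worth building.
# Assumes pseudoautosomal genes are labeled "X|Y", as discussed in this thread.

SEX_CHROMOSOMES = {"X", "Y"}


def needs_sex_specific_models(chromosome: str) -> bool:
    """Return True for genes on X alone or Y alone, but not for X|Y genes."""
    parts = set(chromosome.split("|"))
    # "X|Y" splits into {"X", "Y"}: pseudoautosomal, so no special handling.
    # "X" or "Y" on its own: a single sex chromosome, so flag it.
    return len(parts) == 1 and parts <= SEX_CHROMOSOMES


if __name__ == "__main__":
    for chrom in ["X", "Y", "X|Y", "17"]:
        print(chrom, needs_sex_specific_models(chrom))
```

Treating X|Y genes as unflagged mirrors the suggestion that pseudoautosomal genes need no special treatment.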
@cgreene, we're including this file to map
So unless we split the multi-chromosome values, these genes will not map. I propose splitting them, with an optional step to also include the unsplit rows. The top row would therefore yield:
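A minimal sketch of the proposed split, assuming rows are dicts with hypothetical "symbol" and "chromosome" keys and that multiple chromosomes are joined with "|". The include_unsplit flag is the optional step that keeps the original multi-chromosome row.

```python
from typing import Dict, Iterable, Iterator


def split_chromosomes(
    rows: Iterable[Dict[str, str]], include_unsplit: bool = True
) -> Iterator[Dict[str, str]]:
    """Yield one row per chromosome for genes listing several chromosomes."""
    for row in rows:
        parts = row["chromosome"].split("|")
        if len(parts) > 1 and include_unsplit:
            # Optional step: keep the original multi-chromosome row as well.
            yield dict(row)
        for chrom in parts:
            # Copy the row, replacing only the chromosome field.
            yield {**row, "chromosome": chrom}


if __name__ == "__main__":
    # Hypothetical example rows, not taken from the real genes.tsv.
    rows = [
        {"symbol": "GENE_A", "chromosome": "X|Y"},
        {"symbol": "GENE_B", "chromosome": "17"},
    ]
    for out in split_chromosomes(rows):
        print(out)
```

Single-chromosome rows pass through unchanged, so the split can run over the whole table.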
Do you think we should even keep the last row?
You may want to open an issue in the machine-learning repository.
Okay, leaving these in will in effect drop them, because the resource being mapped won't have that symbol-chromosome combination. There is no need to filter them explicitly.
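To illustrate why no explicit filter is needed, here is a minimal sketch of mapping by (symbol, chromosome) pairs. The column names and gene IDs are hypothetical; the point is that a lookup key no resource ever uses is simply never hit.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Pair = Tuple[str, str]


def build_lookup(rows: Iterable[Dict[str, str]]) -> Dict[Pair, str]:
    """Index gene IDs by (symbol, chromosome); column names are hypothetical."""
    return {(r["symbol"], r["chromosome"]): r["gene_id"] for r in rows}


def map_resource(lookup: Dict[Pair, str], pairs: Iterable[Pair]) -> List[Optional[str]]:
    """Map (symbol, chromosome) pairs from an external resource to gene IDs.

    Pairs absent from the lookup return None. Lookup keys that no resource
    uses (e.g. an unusual multi-chromosome value) are never matched.
    """
    return [lookup.get(pair) for pair in pairs]


genes = [
    {"symbol": "GENE_A", "chromosome": "X", "gene_id": "101"},
    {"symbol": "GENE_A", "chromosome": "Y", "gene_id": "101"},
    {"symbol": "GENE_C", "chromosome": "2|7", "gene_id": "303"},  # odd row, left in
]
lookup = build_lookup(genes)
print(map_resource(lookup, [("GENE_A", "X"), ("GENE_B", "5")]))  # ['101', None]
```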
@dhimmel: for the purpose of having a resource that connects potential symbols with chromosomes, I think retaining at least the first two lines makes the most sense. Maybe the third as well; I don't know how many resources use X|Y for these regions. I don't see the harm in it, so my inclination would be to leave it too.
Genes with multiple chromosomes now receive one row per chromosome, in addition to retaining the multi-chromosome value. Genes with a missing value for chromosome are removed. See cognoma#2 (comment).
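A minimal sketch of the "missing chromosome" filter. It assumes missing values appear either as an empty string or as "-" (the placeholder NCBI gene_info files use for empty fields); the "chromosome" key and the example rows are hypothetical.

```python
from typing import Dict, List

# Assumption: "-" marks a missing value (NCBI gene_info convention) or the field is empty.
MISSING = {"", "-"}


def drop_missing_chromosome(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only rows whose chromosome field holds a real value."""
    # A row lacking the key entirely is treated as missing too.
    return [r for r in rows if r.get("chromosome", "").strip() not in MISSING]


if __name__ == "__main__":
    rows = [
        {"symbol": "GENE_A", "chromosome": "-"},
        {"symbol": "GENE_B", "chromosome": ""},
        {"symbol": "GENE_C", "chromosome": "3"},
    ]
    print(drop_missing_chromosome(rows))
```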
@cgreene, in b64fcb4 I retained all three lines. However, there is another issue: some genes have no chromosome. For example:
These genes all have type
These, to my knowledge, come from the expectation that a gene exists for the disease but nobody has found it yet. They aren't really meaningful molecular entities, and I expect you won't see them in practice.
Download and process Entrez Gene. Create gene identification guidelines for Project Cognoma. Closes cognoma#2.
What does it mean for a gene to have multiple chromosomes? Here are all the genes from genes.tsv that exhibited multiple chromosomes:
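A minimal sketch for producing such a list, assuming genes.tsv is tab-separated with a header row, that the relevant column is named "chromosome" (a hypothetical name), and that multiple chromosomes are joined with "|" as in the X|Y values discussed here.

```python
import csv
import io
from typing import Dict, List, TextIO


def multi_chromosome_genes(handle: TextIO, column: str = "chromosome") -> List[Dict[str, str]]:
    """Return the TSV rows whose chromosome field lists more than one chromosome."""
    reader = csv.DictReader(handle, delimiter="\t")
    # A missing or empty field counts as single/none, not multiple.
    return [row for row in reader if "|" in (row.get(column) or "")]


if __name__ == "__main__":
    # Against the real file (column name is an assumption):
    # with open("genes.tsv", newline="") as f:
    #     for row in multi_chromosome_genes(f):
    #         print(row)
    demo = io.StringIO("symbol\tchromosome\nGENE_A\tX|Y\nGENE_B\t17\n")
    print(multi_chromosome_genes(demo))
```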