
bug: Everything is nan when running the full list of named genes from hgnc #51

Open
lubianat opened this issue Jul 25, 2024 · 4 comments
Labels
bug Something isn't working

Comments

@lubianat added the bug (Something isn't working) label on Jul 25, 2024
@lubianat
Member Author

For biological processes it runs fine:

[Screenshot of the biological-process results]

@lubianat
Member Author

ChatGPT comments:

Scenarios for "nan" (not a number):

Division by Zero or Invalid Operations:
    In the odds_ratio calculation, if both the numerator and the denominator were zero, the result would be nan (with NumPy floats), because 0/0 is undefined. However, the code uses max(1.0 * (n - x) * (N - x), 1) in the denominator to avoid division by zero, so the odds ratio itself is an unlikely source of nan; other operations are more likely to produce it.
    np.log10(p_value) returns -inf when p_value is exactly zero and nan when p_value is negative, since the logarithm is undefined for both. Either value then propagates into combined_score, and an infinity turns into nan as soon as it meets an undefined operation such as inf * 0 or inf - inf.
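A quick NumPy check of these cases, as a sketch with made-up p-values rather than the project's actual code. Note that the logarithm of exactly zero is -inf, not nan; nan comes from negative inputs, from 0/0, or from an undefined operation on an infinity such as inf * 0:

```python
import numpy as np

# Silence NumPy's RuntimeWarnings so the demo output stays readable.
with np.errstate(divide="ignore", invalid="ignore"):
    p_values = np.array([0.0, -1e-3, 0.05])  # made-up inputs
    log_p = np.log10(p_values)  # -inf for 0.0, nan for the negative value
    neg_log_p = -log_p          # inf, nan, about 1.301

    # inf times zero is undefined in IEEE 754 arithmetic, so it yields nan.
    inf_times_zero = neg_log_p[0] * 0.0

    # 0/0 with NumPy floats gives nan (plain Python floats raise ZeroDivisionError).
    zero_over_zero = np.float64(0.0) / np.float64(0.0)

print(log_p)
print(inf_times_zero, zero_over_zero)
```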

Scenarios for "inf" (infinity):

Logarithm of a Very Small p-value:
    The -np.log10(p_value) component in combined_score becomes inf only if p_value is exactly 0.0. For any positive float64, even the smallest subnormal (about 5e-324), -np.log10(p_value) is finite (about 323.3). A p-value that is mathematically tiny but underflows to 0.0 during the calculation gives np.log10(p_value) = -inf, and the negation turns that into inf.
Extremely Large odds_ratio:
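    The denominator max(1.0 * (n - x) * (N - x), 1) is built from integer counts, so the product is an integer and, after the clamp, the denominator is always at least 1; it is never a small fraction close to zero. The odds_ratio therefore cannot exceed its numerator. It can be large when the unclamped product is 0, but as long as the numerator is also built from gene counts it stays far below the float64 overflow limit (about 1.8e308), so it is unlikely to reach inf.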
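The sketch below checks where inf can and cannot come from, using NumPy and made-up inputs. It shows that -log10 of the smallest positive double is still finite, that multiplying two tiny probabilities can underflow to exactly 0.0 (which is when -log10 becomes inf), and, with hypothetical integer counts plugged into the max(1.0 * (n - x) * (N - x), 1) expression from the thread, that the clamped denominator never drops below 1:

```python
import numpy as np

# Smallest positive (subnormal) float64, about 4.94e-324.
tiny = np.nextafter(0.0, 1.0)
finite_score = -np.log10(tiny)  # roughly 323.3, still finite

# Two small probabilities whose true product (1e-400) is below the float64 range
# underflow to exactly 0.0, and only then does -log10 return inf.
underflowed = np.float64(1e-200) * np.float64(1e-200)
with np.errstate(divide="ignore"):
    inf_score = -np.log10(underflowed)


def clamped_denominator(n: int, x: int, N: int) -> float:
    # Same expression as in the issue; with integer counts the product is an
    # integer, so after the clamp the result is always >= 1.
    return float(max(1.0 * (n - x) * (N - x), 1))


# Hypothetical (n, x, N) triples, including one where the product is 0.
triples = [(5, 5, 100), (5, 2, 100), (300, 3, 20000)]
denominators = [clamped_denominator(n, x, N) for n, x, N in triples]

print(finite_score, inf_score, denominators)
```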

Preventative Measures in Code:

max(..., 1) in the denominator: This avoids division by zero. Because the product inside max() is an integer, the clamp only takes effect when that product is 0, in which case odds_ratio equals its numerator: possibly large, but finite.
Handling log of zero or negative values: np.log10(p_value) should be handled carefully, ensuring p_value is never zero or negative, which could be addressed by checking p_value > 0 before computing the logarithm.
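A minimal sketch of the p_value > 0 check, applied to a whole array before taking the logarithm. The helper name checked_neg_log10 is hypothetical; it raises on zero, negative, nan, or greater-than-1 values so that invalid p-values surface as an error rather than as inf or nan in combined_score:

```python
import numpy as np


def checked_neg_log10(p_values):
    """Return -log10(p) after verifying that every p-value lies in (0, 1].

    Hypothetical helper: raises ValueError instead of letting 0.0 become inf
    or a negative value become nan downstream.
    """
    p = np.asarray(p_values, dtype=float)
    # Comparisons with nan are False, so nan entries are flagged as invalid too.
    invalid = ~((p > 0) & (p <= 1))
    if invalid.any():
        raise ValueError(f"{int(invalid.sum())} p-value(s) outside (0, 1]: {p[invalid][:5]}")
    return -np.log10(p)


print(checked_neg_log10([0.05, 1.0]))

try:
    checked_neg_log10([0.0, 0.5])
except ValueError as err:
    print("rejected:", err)
```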

@lubianat
Member Author

Not necessarily a bug. Maybe clamp the p-value estimate to a minimum of 1e-100.
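The suggested floor can be applied with np.maximum before taking the logarithm. A minimal sketch, assuming the p-values are in a NumPy array; P_VALUE_FLOOR and floored_neg_log10 are hypothetical names. With a floor of 1e-100, -log10(p) is capped at 100, so an underflowed p-value of 0.0 yields a large finite score instead of inf:

```python
import numpy as np

P_VALUE_FLOOR = 1e-100  # hypothetical name for the floor suggested above


def floored_neg_log10(p_values):
    # np.maximum replaces anything below the floor (including an exact 0.0)
    # with 1e-100; nan inputs still propagate as nan.
    p = np.maximum(np.asarray(p_values, dtype=float), P_VALUE_FLOOR)
    return -np.log10(p)


scores = floored_neg_log10([0.0, 1e-300, 0.05])
print(scores)
```

One trade-off: np.maximum also lifts negative values to the floor, which hides them, while nan inputs pass through unchanged, so a validity check on the inputs is still worthwhile.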
