inconsistent handling of repeated rows in cohort #287

Open
jreps opened this issue Dec 6, 2024 · 0 comments

jreps commented Dec 6, 2024

If a cohort has repeated rows (the same person appears multiple times with the same id and dates), FE returns odd values.

The SQL counts code does a distinct, so the repeated rows are removed when counting how often a concept occurs. However, the denominator, personCount, does not do a distinct; see https://github.com/OHDSI/FeatureExtraction/blob/437570aa6a955486f9a4ab5917d64ac857971ed4/R/GetCovariates.R#L154C3-L155C3

This means that if a cohort consists of the same row repeated 10 times and the person has concept 54545, the count for concept 54545 will be 1 but the person count will be 10. FE will then return 10% when it should return 100%.

I think the easiest fix would be to edit https://github.com/OHDSI/FeatureExtraction/blob/437570aa6a955486f9a4ab5917d64ac857971ed4/R/GetCovariates.R#L154C3-L155C3 so that it does a count(*) after selecting distinct *, ensuring that repeated rows are only counted once.
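
For illustration, here is a minimal sketch of the mismatch and the suggested fix, using Python's built-in sqlite3 module. The table names (cohort, concept_occurrence), their columns, and the queries are simplified assumptions made up for this example; they are not the actual FeatureExtraction SQL or schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical, simplified tables (not the real FeatureExtraction schema).
cur.execute(
    "CREATE TABLE cohort ("
    "subject_id INTEGER, cohort_start_date TEXT, cohort_end_date TEXT)"
)
cur.execute(
    "CREATE TABLE concept_occurrence (person_id INTEGER, concept_id INTEGER)"
)

# A cohort that is the same row repeated 10 times.
cur.executemany(
    "INSERT INTO cohort VALUES (?, ?, ?)",
    [(1, "2020-01-01", "2020-12-31")] * 10,
)
# The person has concept 54545.
cur.execute("INSERT INTO concept_occurrence VALUES (1, 54545)")

# Numerator: the distinct removes the repeated cohort rows.
numerator = cur.execute(
    """
    SELECT COUNT(*) FROM (
        SELECT DISTINCT c.subject_id, c.cohort_start_date
        FROM cohort c
        JOIN concept_occurrence o ON o.person_id = c.subject_id
        WHERE o.concept_id = 54545
    )
    """
).fetchone()[0]

# Denominator without a distinct: every repeated row is counted.
naive_denominator = cur.execute("SELECT COUNT(*) FROM cohort").fetchone()[0]

# Denominator as suggested above: count(*) after selecting distinct *.
distinct_denominator = cur.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT * FROM cohort)"
).fetchone()[0]

naive_proportion = numerator / naive_denominator
fixed_proportion = numerator / distinct_denominator

print(f"without distinct: {naive_proportion:.0%}")
print(f"with distinct:    {fixed_proportion:.0%}")

conn.close()
```

With the repeated rows counted in the denominator, the proportion comes out at 10%; once both sides use distinct rows, it is 100%.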

@anthonysena anthonysena added this to the v4.0.0 milestone Jan 24, 2025