330 consecutive imputation links #15
Nicely written code and unit test. Tests pass locally. Two minor comments, which don't necessarily need to be acted on.
tests/cumulative_links.csv
100,100000,,202404,3,1,2,6,1
200,100001,,202402,1,4,3,1,2
200,100001,,202403,3,0.5,3,3,0.5
200,100001,300,202404,0.5,1,4,0.5,1
I might be misunderstanding the code/method, but why isn't cumulative_forward_imputation_link 1.5 for the last row? Is that because it is in a different imputation group?
Yes, that is correct. We only want to multiply the links where there are consecutive missing values. In fact, maybe we should set all cumulative links to NaN in rows that contain a return, because they have no meaning there.
To confirm, the last row is just the same as the imputation_link column because it is a return, not a missing value. I have made a change to set the cumulative link columns to NaN if there is a return.
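As a worked check of the reviewer's question, assume (this is an inference from the test rows, not stated in the PR) that the CSV columns are strata, reference, target, period, forward link f, backward link b, imputation_group, cumulative forward link F and cumulative backward link B. For reference 100001, group 3 holds the two consecutive missing periods, so the links compound only inside that group:

$$
\begin{aligned}
F_{202402} &= f_{202402} = 1, & F_{202403} &= f_{202402}\, f_{202403} = 1 \times 3 = 3,\\
B_{202403} &= b_{202403} = 0.5, & B_{202402} &= b_{202402}\, b_{202403} = 4 \times 0.5 = 2.
\end{aligned}
$$

Period 202404 is a return in group 4, so the product $3 \times 0.5 = 1.5$ is never formed.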
Happy for the code to be merged following the minor changes.
100,100000,,202404,3,1,2,6,1
200,100001,,202402,1,4,3,1,2
200,100001,,202403,3,0.5,3,3,0.5
200,100001,300,202404,0.5,1,4,,
I think this makes more sense now that the cumulative link doesn't return anything in cases where there is a valid return :)
Summary
Multiple consecutive missing values need their imputation links multiplied together so that the correct imputation link is applied. This function adds two new columns to the input DataFrame: imputation_group, which identifies runs of consecutive missing values, and another column for the cumulative links.
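A minimal pandas sketch of the approach described in the summary, not the PR's actual implementation. The function name get_cumulative_links and the column names target, reference, period, forward_imputation_link, backward_imputation_link and cumulative_backward_imputation_link are hypothetical; only imputation_group and cumulative_forward_imputation_link appear in this PR. It assumes a run of missing values ends whenever the reference changes or a return appears.

```python
import numpy as np
import pandas as pd


def get_cumulative_links(
    df: pd.DataFrame,
    forward_link: str = "forward_imputation_link",  # hypothetical name
    backward_link: str = "backward_imputation_link",  # hypothetical name
    target: str = "target",  # hypothetical name
    reference: str = "reference",  # hypothetical name
    period: str = "period",  # hypothetical name
) -> pd.DataFrame:
    """Sketch: add imputation_group and cumulative link columns."""
    df = df.sort_values([reference, period]).reset_index(drop=True)
    is_missing = df[target].isna()

    # A new group starts whenever the reference changes or the row switches
    # between missing and returned, so each run of consecutive missing values
    # (and each run of returns) gets its own group id.
    new_reference = df[reference].ne(df[reference].shift())
    status_change = is_missing.astype(int).diff().ne(0)
    df["imputation_group"] = (new_reference | status_change).cumsum()

    # Forward links compound from the start of the run onwards.
    df["cumulative_forward_imputation_link"] = (
        df.groupby("imputation_group")[forward_link].cumprod()
    )
    # Backward links compound from the end of the run backwards: take the
    # cumulative product over reversed rows; assignment realigns by index.
    df["cumulative_backward_imputation_link"] = (
        df.iloc[::-1].groupby("imputation_group")[backward_link].cumprod()
    )

    # Cumulative links have no meaning where there is a return.
    cumulative_cols = [
        "cumulative_forward_imputation_link",
        "cumulative_backward_imputation_link",
    ]
    df.loc[~is_missing, cumulative_cols] = np.nan
    return df
```

Run on the three reference 100001 rows from tests/cumulative_links.csv, this sketch gives imputation groups 1, 1, 2 (the CSV's ids 3 and 4 come from numbering across the whole file), cumulative forward links 1, 3, NaN and cumulative backward links 2, 0.5, NaN, which matches the updated test data.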
Checklists
This pull request meets the following requirements:
If you feel some of these conditions do not apply to this pull request, please add a comment to explain why.