A groupby loop is more efficient than filtering per LSOA, so move the groupby into the weighting loop in the script (instead of filtering inside the generate_balance_sample function) and test on the EPC sample subset.
Original comment:
I'm not sure if it's worth checking and changing (I don't think it is at this stage), but FYI that I found in the evaluation that a groupby loop was more efficient than a loop in which you subset the df per lsoa within the loop. i.e.
df_select = df.select(cols)  # the groupby was also faster with less columns, so worth doing this first too
for lsoa, sample in tqdm(df_select.group_by("lsoa")):
    results = generate_balance_sample(sample, target_marginals, lsoa)
Originally posted by @lizgzil in #26 (comment)