-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add garden size estimate evaluation notebook #115
base: dev
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @crispy-wonton this looked good to me, and I thought the filtering steps made sense (and in the right order) - I commented on a few bits for clarification though.
total_avg_gardens_df = avg_gardens_df.filter( | ||
pl.col("msoa_avg_outdoor_space_property_type") == "unknown" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How come you didn't compare all the property types?
# Get counts of UPRN per garden | ||
uprn_count = gardens_df.group_by("NATIONALCADASTRALREFERENCE").agg( | ||
pl.col("UPRN").count().alias("UPRN_count") | ||
) | ||
|
||
# Assign 1 garden size per cadastral | ||
cadastral_garden_size = gardens_df.group_by("NATIONALCADASTRALREFERENCE").agg( | ||
pl.col("garden_area_m2").first().alias("cadastral_garden_size_m2") | ||
) | ||
|
||
# Join to garden size df | ||
gardens_df = gardens_df.join( | ||
uprn_count, how="left", on="NATIONALCADASTRALREFERENCE" | ||
).join(cadastral_garden_size, how="left", on="NATIONALCADASTRALREFERENCE") | ||
|
||
# Divide shared gardens equally among UPRNs sharing gardens | ||
gardens_df = gardens_df.with_columns( | ||
(pl.col("cadastral_garden_size_m2") / pl.col("UPRN_count")).alias( | ||
"divided_garden_area_m2" | ||
) | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't expecting these steps. I thought "garden_area_m2" was the estimation for each UPRN, so surprised "divided_garden_area_m2" needed to be calculated. Where have I misunderstood?
pl.col("divided_garden_area_m2").is_not_null(), | ||
pl.col("divided_garden_area_m2") > 0, | ||
pl.col("divided_garden_area_m2") < pl.col("divided_garden_area_m2").quantile(0.99), | ||
pl.col("weight").is_not_null(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't remember why a weight would be null? Was it when the value was from a dummy property?
) | ||
|
||
# Filter results to MSOAs with averages calculated from 15 or more properties | ||
results = results.filter(pl.col("n_properties") >= 15) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think here and in the weighted results you could increase this threshold - perhaps set to the 0.1 quantile? (which I think is 51). or was there a reason to use 15?
Fixes #114
Description
add garden size estimate evaluation notebook
Instructions for Reviewer
to generate the notebook, ensure
jupytext
is installed and then run the following line in terminal:jupytext --to notebook asf_heat_pump_suitability/analysis/garden_size_estimates/20250106_evaluate_garden_size_estimates.py
Please pay special attention to ...
Please could you provide some feedback on this analysis. Namely:
Checklist:
notebooks/
pre-commit
and addressed any issues not automatically fixeddev
README
s