add garden size estimate evaluation notebook #115

Draft: wants to merge 2 commits into dev


@crispy-wonton (Collaborator) commented Jan 22, 2025

Fixes #114

Description

Adds a notebook that evaluates the garden size estimates.

Instructions for Reviewer

To generate the notebook, ensure jupytext is installed, then run the following in a terminal:
jupytext --to notebook asf_heat_pump_suitability/analysis/garden_size_estimates/20250106_evaluate_garden_size_estimates.py

Please pay special attention to the points below. In particular, could you give some feedback on this analysis, namely:

  • Do you think the selection criteria are appropriate and applied in the correct order?
  • Have I accounted for the appropriate corrections to make this a fair analysis (e.g. applying weights, checking that weights sum to at least 0.9, etc.)?
  • Do you have any suggestions to make the analysis more robust or accurate?
  • Have the weights been applied correctly?
  • Is there anything we should add to the analysis?

Checklist:

  • I have refactored my code out from notebooks/
  • I have checked the code runs
  • I have tested the code
  • I have run pre-commit and addressed any issues not automatically fixed
  • I have merged any new changes from dev
  • I have documented the code
    • Major functions have docstrings
    • Appropriate information has been added to READMEs
  • I have explained this PR above
  • I have requested a code review

@lizgzil (Collaborator) left a comment


Hey @crispy-wonton, this looks good to me, and I thought the filtering steps made sense (and were in the right order). I commented on a few bits for clarification, though.

Comment on lines +241 to +243
total_avg_gardens_df = avg_gardens_df.filter(
    pl.col("msoa_avg_outdoor_space_property_type") == "unknown"
)
Collaborator

How come you didn't compare all the property types?

Comment on lines +28 to +48
# Get counts of UPRN per garden
uprn_count = gardens_df.group_by("NATIONALCADASTRALREFERENCE").agg(
    pl.col("UPRN").count().alias("UPRN_count")
)

# Assign 1 garden size per cadastral
cadastral_garden_size = gardens_df.group_by("NATIONALCADASTRALREFERENCE").agg(
    pl.col("garden_area_m2").first().alias("cadastral_garden_size_m2")
)

# Join to garden size df
gardens_df = gardens_df.join(
    uprn_count, how="left", on="NATIONALCADASTRALREFERENCE"
).join(cadastral_garden_size, how="left", on="NATIONALCADASTRALREFERENCE")

# Divide shared gardens equally among UPRNs sharing gardens
gardens_df = gardens_df.with_columns(
    (pl.col("cadastral_garden_size_m2") / pl.col("UPRN_count")).alias(
        "divided_garden_area_m2"
    )
)
Collaborator

I wasn't expecting these steps. I thought "garden_area_m2" was the estimate for each UPRN, so I was surprised that "divided_garden_area_m2" needed to be calculated. Where have I misunderstood?

    pl.col("divided_garden_area_m2").is_not_null(),
    pl.col("divided_garden_area_m2") > 0,
    pl.col("divided_garden_area_m2") < pl.col("divided_garden_area_m2").quantile(0.99),
    pl.col("weight").is_not_null(),
Collaborator

I can't remember why a weight would be null. Was it when the value came from a dummy property?

)

# Filter results to MSOAs with averages calculated from 15 or more properties
results = results.filter(pl.col("n_properties") >= 15)
Collaborator

I think here, and in the weighted results, you could increase this threshold, perhaps to the 0.1 quantile of `n_properties` (which I think is 51). Or was there a reason to use 15?

Development

Successfully merging this pull request may close these issues:

Evaluate garden size estimates
2 participants