Pull neighborhood groups from a table in the data warehouse #107
Merged
jeancochrane merged 6 commits into main from jeancochrane/pull-neighborhood-groups-from-data-warehouse on Mar 15, 2024
Changes from 5 commits
Commits (6, all by jeancochrane)
9a74de1 Pull neighborhood groups from data warehouse
62a3c44 Merge main into branch 'jeancochrane/pull-neighborhood-groups-from-da…
62a27e7 Appease formatter
db5869d Remove legacy Excel workbook with neighborhood group definitions
70af482 Remove run_date functionality from manual_flagging/flagging.py
d690b51 Join to neighborhood_group on sale.nbhd not res.nbhd in manual_flaggi…
Binary file not shown.
@@ -94,6 +94,19 @@
 WHERE condo.class IN ('297', '299', '399')
     AND NOT condo.is_parking_space
     AND NOT condo.is_common_area
+),
+
+-- Select neighborhood groups and filter for most recent versions
+neighborhood_group AS (
+    SELECT nbhd_group.nbhd, nbhd_group.group_name
+    FROM location.neighborhood_group AS nbhd_group
+    INNER JOIN (
+        SELECT nbhd, MAX(version) AS version
+        FROM location.neighborhood_group
+        GROUP BY nbhd
+    ) AS latest_group_version
+        ON nbhd_group.nbhd = latest_group_version.nbhd
+        AND nbhd_group.version = latest_group_version.version
 )
 
 -- Now, join with sale table and filters

Comment on lines +103 to +107:
Note that this join could also be extended to identify the version of the neighborhood groups that was used for a specific run; see the removed code in 70af482 for a sketch of what that would look like.
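For readers who want to try the "most recent version per neighborhood" join locally, here is a minimal sketch using Python's built-in sqlite3 module. The table contents, the version format (year strings), and the optional `as_of` filter are assumptions made for illustration. The filter is only one possible reading of "the version used for a specific run" and is not the code removed in 70af482. SQLite has no `location` schema, so the table is created unqualified.

```python
import sqlite3

# Hypothetical rows: (nbhd, version, group_name). The version format is an
# assumption; any value that sorts chronologically would behave the same way.
ROWS = [
    ("10011", "2023", "A"),
    ("10011", "2024", "B"),
    ("10012", "2024", "C"),
]

# Same shape as the CTE in the diff, plus an optional cutoff (an assumption,
# not part of the PR) that restricts which versions are considered.
LATEST_SQL = """
SELECT nbhd_group.nbhd, nbhd_group.group_name
FROM neighborhood_group AS nbhd_group
INNER JOIN (
    SELECT nbhd, MAX(version) AS version
    FROM neighborhood_group
    WHERE :as_of IS NULL OR version <= :as_of
    GROUP BY nbhd
) AS latest_group_version
    ON nbhd_group.nbhd = latest_group_version.nbhd
    AND nbhd_group.version = latest_group_version.version
ORDER BY nbhd_group.nbhd
"""


def groups_as_of(conn, as_of=None):
    """Return (nbhd, group_name) pairs for the newest version at or before as_of."""
    return conn.execute(LATEST_SQL, {"as_of": as_of}).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE neighborhood_group (nbhd TEXT, version TEXT, group_name TEXT)"
)
conn.executemany("INSERT INTO neighborhood_group VALUES (?, ?, ?)", ROWS)

latest = groups_as_of(conn)              # no cutoff: newest version per nbhd
as_of_2023 = groups_as_of(conn, "2023")  # only versions up to "2023"
```

With no cutoff, each neighborhood resolves to its newest group. With a cutoff, a neighborhood whose only version is newer than the cutoff drops out entirely, which a downstream LEFT JOIN would surface as a missing group.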
@@ -104,6 +117,7 @@
 sale.seller_name AS meta_sale_seller_name,
 sale.buyer_name AS meta_sale_buyer_name,
 sale.nbhd as nbhd,
+nbhd_group.group_name as geography_split,
 sale.sale_filter_ptax_flag AS ptax_flag_original,
 data.class,
 data.township_code,
@@ -120,6 +134,8 @@
 INNER JOIN default.vw_pin_universe universe
     ON universe.pin = data.pin
     AND universe.year = data.year
+LEFT JOIN neighborhood_group nbhd_group
+    ON res.nbhd = nbhd_group.nbhd
 WHERE {sql_time_frame}
     AND NOT sale.sale_filter_same_sale_within_365
     AND NOT sale.sale_filter_less_than_10k
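Because the group lookup is a LEFT JOIN, a sale whose neighborhood has no entry in `neighborhood_group` is kept with a NULL `geography_split` rather than dropped. The sketch below illustrates that behavior with sqlite3. The table contents and pin values are hypothetical, and the join key follows the title of commit d690b51 (`sale.nbhd`).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sale (pin TEXT, nbhd TEXT);
    CREATE TABLE neighborhood_group (nbhd TEXT, group_name TEXT);
    -- Hypothetical data: pin-3 sits in a neighborhood with no group defined.
    INSERT INTO sale VALUES ('pin-1', '10011'), ('pin-2', '10012'), ('pin-3', '99999');
    INSERT INTO neighborhood_group VALUES ('10011', 'A'), ('10012', 'B');
    """
)

rows = conn.execute(
    """
    SELECT sale.pin, nbhd_group.group_name AS geography_split
    FROM sale
    LEFT JOIN neighborhood_group AS nbhd_group
        ON sale.nbhd = nbhd_group.nbhd
    ORDER BY sale.pin
    """
).fetchall()

# Sales that survived the join but have no geography_split to group on.
unmatched = [pin for pin, split in rows if split is None]
```

A check like `unmatched` may be worth running after the query, since rows with a NULL split would otherwise pass silently into any grouping that depends on `geography_split`.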
@@ -163,27 +179,6 @@
 current_year = datetime.datetime.now().year
 df["char_bldg_age"] = current_year - df["yrblt"]
 
-"""
-Ingest and join new geographic groups for current methodology.
-
-To update our methodology with new geographic classifications, we currently
-utilize the 'geography_split' column, which is effective for uniform groupings
-across all market types, as observed in the city tri(1). For
-subsequent tris, if new classifications are consistent across markets,
-they can be appended to the 'geo_geography_split' column. However, for
-market-specific variations (e.g., condos vs. single-family homes),
-we should introduce an additional column or use a conditional join to
-ensure accurate integration of these diverse groupings.
-"""
-
-df_new_groups_tri1 = pd.read_excel(
-    os.path.join(root, "data", "res_condos_nbhd_groups_2024.xlsx"),
-    usecols=["Town Nbhd", "Town Grp 1"],
-).rename(columns={"Town Nbhd": "nbhd", "Town Grp 1": "geography_split"})
-
-df["nbhd"] = df["nbhd"].astype(int)
-df = pd.merge(df, df_new_groups_tri1, on="nbhd", how="left")
-
 
 def create_bins_and_labels(input_list):
     """
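The removed docstring mentions a "conditional join" for groupings that differ by market (e.g., condos vs. single-family homes). One way that could look, now that the groups live in a table, is sketched below with sqlite3. The `market` column on both tables is purely hypothetical; nothing in this PR indicates that `location.neighborhood_group` has such a column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical schema: a 'market' column distinguishes res from condo groups.
    CREATE TABLE sale (pin TEXT, nbhd TEXT, market TEXT);
    CREATE TABLE neighborhood_group (nbhd TEXT, market TEXT, group_name TEXT);
    INSERT INTO sale VALUES ('pin-1', '10011', 'res'), ('pin-2', '10011', 'condo');
    INSERT INTO neighborhood_group VALUES
        ('10011', 'res', 'A'),
        ('10011', 'condo', 'B');
    """
)

# Join on both neighborhood and market so the same nbhd can map to
# different groups depending on the type of property sold.
market_rows = conn.execute(
    """
    SELECT sale.pin, nbhd_group.group_name AS geography_split
    FROM sale
    LEFT JOIN neighborhood_group AS nbhd_group
        ON sale.nbhd = nbhd_group.nbhd
        AND sale.market = nbhd_group.market
    ORDER BY sale.pin
    """
).fetchall()
```

Joining on the pair of keys also avoids the row duplication that a join on `nbhd` alone would produce once a neighborhood has more than one group.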
Review comment on the removed Excel workbook:
This file is no longer necessary since the groups will be present in the data warehouse once ccao-data/data-architecture#343 lands.