Pull neighborhood groups from a table in the data warehouse #107

Merged
Changes from 5 commits
Binary file removed data/res_condos_nbhd_groups_2024.xlsx
This file is no longer necessary since the groups will be present in the data warehouse once ccao-data/data-architecture#343 lands.

Binary file not shown.
37 changes: 16 additions & 21 deletions manual_flagging/flagging.py
@@ -94,6 +94,19 @@
WHERE condo.class IN ('297', '299', '399')
AND NOT condo.is_parking_space
AND NOT condo.is_common_area
),

-- Select neighborhood groups and filter for most recent versions
neighborhood_group AS (
SELECT nbhd_group.nbhd, nbhd_group.group_name
FROM location.neighborhood_group AS nbhd_group
INNER JOIN (
SELECT nbhd, MAX(version) AS version
FROM location.neighborhood_group
GROUP BY nbhd
) AS latest_group_version
Comment on lines +103 to +107
Note that this join could also be extended to identify the version of the neighborhood groups that was used for a specific run; see the removed code in 70af482 for a sketch of what that would look like.

ON nbhd_group.nbhd = latest_group_version.nbhd
AND nbhd_group.version = latest_group_version.version
)
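
The "most recent version" filter in this CTE can be exercised without warehouse access. Below is a minimal sketch that runs the same join against an in-memory SQLite database. The rows, the column types, and the use of SQLite in place of the real warehouse engine are all assumptions made for illustration; only the table and column names come from the diff.

```python
import sqlite3

# In-memory stand-in for the warehouse. The data below is made up.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS location")
conn.execute(
    "CREATE TABLE location.neighborhood_group "
    "(nbhd TEXT, group_name TEXT, version INTEGER)"  # types are assumed
)
conn.executemany(
    "INSERT INTO location.neighborhood_group VALUES (?, ?, ?)",
    [
        ("10011", "group_a", 1),
        ("10011", "group_b", 2),  # newer version for the same nbhd
        ("10012", "group_c", 1),
    ],
)

# Same shape as the CTE in the diff: keep only the row whose version
# equals the per-nbhd maximum.
LATEST_GROUPS_SQL = """
WITH neighborhood_group AS (
    SELECT nbhd_group.nbhd, nbhd_group.group_name
    FROM location.neighborhood_group AS nbhd_group
    INNER JOIN (
        SELECT nbhd, MAX(version) AS version
        FROM location.neighborhood_group
        GROUP BY nbhd
    ) AS latest_group_version
        ON nbhd_group.nbhd = latest_group_version.nbhd
        AND nbhd_group.version = latest_group_version.version
)
SELECT nbhd, group_name FROM neighborhood_group ORDER BY nbhd
"""

latest_groups = conn.execute(LATEST_GROUPS_SQL).fetchall()
print(latest_groups)
```

With this toy data, each neighborhood appears once and `10011` resolves to its version 2 group.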

-- Now, join with sale table and filters
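
The review comment on the join mentions that it could be extended to identify the group version used for a specific run. One possible shape for that is sketched below. It is not the removed code from 70af482, which is not shown on this page. It assumes versions are ordered and that a run records a cutoff version; `groups_as_of` and `run_version` are hypothetical names, and the unqualified table name and sample rows are simplifications for SQLite.

```python
import sqlite3


def groups_as_of(conn, run_version):
    """Return {nbhd: group_name} using the newest version <= run_version.

    Hypothetical helper: assumes `version` values are comparable and that
    a run can supply the cutoff it was executed with.
    """
    rows = conn.execute(
        """
        SELECT nbhd_group.nbhd, nbhd_group.group_name
        FROM neighborhood_group AS nbhd_group
        INNER JOIN (
            SELECT nbhd, MAX(version) AS version
            FROM neighborhood_group
            WHERE version <= ?          -- cap before taking the maximum
            GROUP BY nbhd
        ) AS pinned_version
            ON nbhd_group.nbhd = pinned_version.nbhd
            AND nbhd_group.version = pinned_version.version
        """,
        (run_version,),
    ).fetchall()
    return dict(rows)


# Made-up sample data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE neighborhood_group "
    "(nbhd TEXT, group_name TEXT, version INTEGER)"
)
conn.executemany(
    "INSERT INTO neighborhood_group VALUES (?, ?, ?)",
    [
        ("10011", "group_a", 1),
        ("10011", "group_b", 2),
        ("10012", "group_c", 1),
    ],
)

groups_v1 = groups_as_of(conn, 1)
groups_v2 = groups_as_of(conn, 2)
print(groups_v1, groups_v2)
```

Capping inside the subquery keeps the outer join unchanged, so a run pinned to the newest version behaves the same as the "latest" query.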
@@ -104,6 +117,7 @@
sale.seller_name AS meta_sale_seller_name,
sale.buyer_name AS meta_sale_buyer_name,
sale.nbhd as nbhd,
nbhd_group.group_name as geography_split,
sale.sale_filter_ptax_flag AS ptax_flag_original,
data.class,
data.township_code,
@@ -120,6 +134,8 @@
INNER JOIN default.vw_pin_universe universe
ON universe.pin = data.pin
AND universe.year = data.year
LEFT JOIN neighborhood_group nbhd_group
ON res.nbhd = nbhd_group.nbhd
WHERE {sql_time_frame}
AND NOT sale.sale_filter_same_sale_within_365
AND NOT sale.sale_filter_less_than_10k
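
Because the neighborhood groups are attached with a `LEFT JOIN`, a sale whose neighborhood has no group row is kept, with `geography_split` set to NULL. The sketch below shows that behavior and one way to list the unmatched sales. The `sale` table layout, the `doc_no` column, and the sample rows are assumptions for illustration, and SQLite stands in for the warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Simplified, hypothetical tables; only nbhd/group_name mirror the diff.
    CREATE TABLE sale (doc_no TEXT, nbhd TEXT);
    CREATE TABLE neighborhood_group (nbhd TEXT, group_name TEXT);
    INSERT INTO sale VALUES ('d1', '10011'), ('d2', '10012'), ('d3', '99999');
    INSERT INTO neighborhood_group VALUES
        ('10011', 'group_a'), ('10012', 'group_c');
    """
)

rows = conn.execute(
    """
    SELECT sale.doc_no, nbhd_group.group_name AS geography_split
    FROM sale
    LEFT JOIN neighborhood_group AS nbhd_group
        ON sale.nbhd = nbhd_group.nbhd
    ORDER BY sale.doc_no
    """
).fetchall()

# Sales that kept their row but received no group from the join.
unmatched = [doc_no for doc_no, split in rows if split is None]
print(rows, unmatched)
```

A check like this can surface neighborhoods missing from the group table before they reach downstream grouping logic.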
@@ -163,27 +179,6 @@
current_year = datetime.datetime.now().year
df["char_bldg_age"] = current_year - df["yrblt"]

"""
Ingest and join new geographic groups for current methodology.

To update our methodology with new geographic classifications, we currently
utilize the 'geography_split' column, which is effective for uniform groupings
across all market types, as observed in the city tri(1). For
subsequent tris, if new classifications are consistent across markets,
they can be appended to the 'geo_geography_split' column. However, for
market-specific variations (e.g., condos vs. single-family homes),
we should introduce an additional column or use a conditional join to
ensure accurate integration of these diverse groupings.
"""

df_new_groups_tri1 = pd.read_excel(
os.path.join(root, "data", "res_condos_nbhd_groups_2024.xlsx"),
usecols=["Town Nbhd", "Town Grp 1"],
).rename(columns={"Town Nbhd": "nbhd", "Town Grp 1": "geography_split"})

df["nbhd"] = df["nbhd"].astype(int)
df = pd.merge(df, df_new_groups_tri1, on="nbhd", how="left")


def create_bins_and_labels(input_list):
"""