Commit

Remove duplicates in vars_dict and add pre-commit check to guard against dupes (#36)

* Remove duplicates in `vars_dict` and add pre-commit check to guard against future dupes

* Update vars_dict data documentation

* Rename validate-vars-dict -> check-vars-dict

* Add Rscript shebang to check-vars-dict.R hook
jeancochrane authored Jan 14, 2025
1 parent 8b6f53e commit a5449b9
Showing 6 changed files with 25 additions and 8 deletions.
5 changes: 5 additions & 0 deletions .pre-commit-config.yaml
@@ -34,6 +34,11 @@ repos:
         entry: Cannot commit .Rhistory, .RData, .Rds or .rds.
         language: fail
         files: '\.(Rhistory|RData|Rds|rds)$'
+      - id: check-vars-dict
+        name: Validate vars_dict
+        entry: Rscript scripts/check-vars-dict.R
+        files: data/vars_dict.rda
+        language: r
   - repo: https://github.com/astral-sh/ruff-pre-commit
     rev: v0.7.4
     hooks:
2 changes: 1 addition & 1 deletion R/data.R
@@ -153,7 +153,7 @@
#' to their human-readable value (ROOF_CNST = 1
#' becomes ROOF_CNST = Shingle/Asphalt).
#'
-#' @format A data frame with 509 rows and 11 variables:
+#' @format A data frame with 518 rows and 11 variables:
#' \describe{
#' \item{var_name_hie}{Column name of variable when stored in the legacy
#' ADDCHARS SQL table}
7 changes: 1 addition & 6 deletions data-raw/vars_dict.csv
@@ -488,7 +488,7 @@ qu_mlt_cd,card,card,meta_card_num,card_num,Card Number,meta,character,,,
,,parking_space_flag_reason,parking_space_flag_reason,,Reason Parcel Is Considered Parking/Garage Space or Storage Unit,meta,character,,,
,,is_common_area,is_common_area,is_common_area,Building Common Area,meta,logical,,,
,,char_building_units,char_building_units,building_units,Total Condominium Building Livable Parcels,char,numeric,,,
-,,char_building_sf ,char_building_sf ,building_sf ,Total Condominium Building Square Footage,char,numeric,,,
+,,char_building_sf,char_building_sf,building_sf,Total Condominium Building Square Footage,char,numeric,,,
,,char_unit_sf,char_unit_sf,unit_sf,Condominium Unit Square Footage,char,numeric,,,
,,char_bedrooms,char_bedrooms,bedrooms,Condominium Unit Bedrooms,char,numeric,,,
,,char_half_baths,char_half_baths,half_baths,Condominium Unit Half Baths,char,numeric,,,
@@ -509,11 +509,6 @@ qu_mlt_cd,card,card,meta_card_num,card_num,Card Number,meta,character,,,
,,ccao_is_active_exe_homeowner,ccao_is_active_exe_homeowner,is_active_exe_homeowner,Active Homeowner Exemption,ccao,logical,,,
,,ccao_n_years_exe_homeowner,ccao_n_years_exe_homeowner,n_years_exe_homeowner,Number of Years Active Homeowner Exemption,ccao,numeric,,,
,,sale_count_past_n_years,meta_sale_count_past_n_years,sale_count_past_n_years,Number of sales within previous N years of sale/lien date,meta,numeric,,,
-,,char_building_sf,char_building_sf,building_sf,Building Square Footage,char,numeric,,,
-,,char_unit_sf,char_unit_sf,unit_sf,Unit Square Footage,char,numeric,,,
-,,char_bedrooms,char_bedrooms,bedrooms,Bedrooms,char,numeric,,,
-,,char_half_baths,char_half_baths,half_baths,Half Baths,char,numeric,,,
-,,char_full_baths,char_full_baths,full_baths,Full Baths,char,numeric,,,
,,strata_1,meta_strata_1,strata_1,Condominium Building Strata 1,meta,character,,,
,,strata_2,meta_strata_2,strata_2,Condominium Building Strata 2,meta,character,,,
,,shp_parcel_centroid_dist_ft_sd,shp_parcel_centroid_dist_ft_sd,parcel_centroid_dist_ft_sd,Standard Deviation Distance From Parcel Centroid to Vertices (Feet),shp,numeric,,,
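
One `char_building_sf` row in this file carried trailing spaces in its name fields. Depending on how the CSV is read, such a value may not compare equal to its clean twin, so an exact-match duplicate check could miss it. A minimal sketch of spotting stray whitespace with base R's `trimws()`; the `toy` data frame and its values are invented for illustration and are not part of this commit:

```r
# Invented example values; only the trailing-space pattern mirrors the diff
toy <- data.frame(
  var_name_model = c("char_building_sf ", "char_building_sf", "char_unit_sf"),
  stringsAsFactors = FALSE
)

# Flag values that change when surrounding whitespace is trimmed
has_stray_ws <- toy$var_name_model != trimws(toy$var_name_model)

# After trimming, the first two names become identical, so the second
# one is reported as a duplicate
trimmed <- trimws(toy$var_name_model)
dupes_after_trim <- trimmed[duplicated(trimmed)]

print(toy$var_name_model[has_stray_ws])
print(dupes_after_trim)
```

Trimming before comparing is one possible hardening step; the hook added in this commit compares the stored values as they are.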
Binary file modified data/vars_dict.rda
Binary file not shown.
2 changes: 1 addition & 1 deletion man/vars_dict.Rd

Generated file; diff not rendered.

17 changes: 17 additions & 0 deletions scripts/check-vars-dict.R
@@ -0,0 +1,17 @@
#!/usr/bin/env Rscript
# Script to check that the `vars_dict` data object is well-formed
load("data/vars_dict.rda")

# Check for duplicate model parameters
non_na_model_vars <- subset(
  vars_dict,
  !is.na(var_name_model)
)[c("var_name_model", "var_code", "var_value")]
dupes <- non_na_model_vars[which(duplicated(non_na_model_vars)), ]

if (nrow(dupes) > 0) {
  stop(
    "Duplicate var_name_model entries in vars_dict: ",
    paste(dupes$var_name_model, collapse = ", ")
  )
}
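
To illustrate what the check above flags, here is a minimal sketch that runs the same logic on a toy data frame. The `toy_dict` object and its rows are hypothetical (including the `Tar and Gravel` value); only the three column names and the `duplicated()` approach come from the script.

```r
# Hypothetical stand-in for vars_dict; the rows are invented for illustration
toy_dict <- data.frame(
  var_name_model = c(
    "char_bedrooms", "char_bedrooms", "char_roof_cnst", "char_roof_cnst", NA
  ),
  var_code = c(NA, NA, "1", "2", NA),
  var_value = c(NA, NA, "Shingle/Asphalt", "Tar and Gravel", NA),
  stringsAsFactors = FALSE
)

# Same logic as the hook: drop rows with no model name, then compare rows
# on the (var_name_model, var_code, var_value) triple
non_na_model_vars <- subset(
  toy_dict,
  !is.na(var_name_model)
)[c("var_name_model", "var_code", "var_value")]
dupes <- non_na_model_vars[which(duplicated(non_na_model_vars)), ]

# Only the repeated char_bedrooms row is reported. The two char_roof_cnst
# rows share a name but differ in var_code, so they are not duplicates.
print(dupes$var_name_model)
```

Comparing on the full triple rather than on `var_name_model` alone matters because categorical variables legitimately repeat their name once per code/value pair.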