-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Flip outlier indicator and use longer value fields in scripts/export.csv
#115
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -20,6 +20,9 @@ | |
OUTLIER_TYPE_FIELD = "USER27" | ||
RUN_ID_FIELD = "USER28" | ||
|
||
IS_OUTLIER_INDICATOR = "N: Non arms-length" | ||
NON_OUTLIER_INDICATOR = "Y: Is arms-length" | ||
|
||
OUTLIER_TYPE_CODES = { | ||
"Not outlier": "0", | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do we also want to use the long value for the outlier type, in order to make that field read-only? I notice that the iasWorld data model does not have an "analyst determination reason" field in addition to "model determination reason", as "model determination"/"analyst determination" does, so perhaps we want this field to be writeable, but I also wonder if that was an oversight in our iasWorld schema design. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We do actually want the determination reason to be writable, as it's basically shared between the model / analyst determination fields. |
||
"Anomaly (high)": "1", | ||
|
@@ -78,8 +81,8 @@ | |
sale.sale_key AS {SALE_KEY_FIELD}, | ||
CASE | ||
WHEN flag.sv_is_outlier = TRUE | ||
THEN 'Y' | ||
ELSE 'N' | ||
THEN '{IS_OUTLIER_INDICATOR}' | ||
ELSE '{NON_OUTLIER_INDICATOR}' | ||
END AS {IS_OUTLIER_FIELD}, | ||
flag.sv_outlier_type AS {OUTLIER_TYPE_FIELD}, | ||
flag.run_id AS {RUN_ID_FIELD} | ||
|
@@ -119,19 +122,19 @@ | |
].empty, f"{OUTLIER_TYPE_FIELD} contains invalid codes" | ||
|
||
assert flag_df[ | ||
(flag_df[IS_OUTLIER_FIELD] == "Y") | ||
(flag_df[IS_OUTLIER_FIELD] == IS_OUTLIER_INDICATOR) | ||
& (flag_df[OUTLIER_TYPE_FIELD] == OUTLIER_TYPE_CODES["Not outlier"]) | ||
].empty, ( | ||
f"{OUTLIER_TYPE_FIELD} cannot be {OUTLIER_TYPE_CODES['Not outlier']} " | ||
f"when {IS_OUTLIER_FIELD} is Y" | ||
f"{OUTLIER_TYPE_FIELD} cannot be '{OUTLIER_TYPE_CODES['Not outlier']}' " | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I added enclosing quotation marks for a slightly more readable error message. |
||
f"when {IS_OUTLIER_FIELD} is '{IS_OUTLIER_INDICATOR}'" | ||
) | ||
|
||
assert flag_df[ | ||
(flag_df[IS_OUTLIER_FIELD] == "N") | ||
(flag_df[IS_OUTLIER_FIELD] == NON_OUTLIER_INDICATOR) | ||
& (flag_df[OUTLIER_TYPE_FIELD] != OUTLIER_TYPE_CODES["Not outlier"]) | ||
].empty, ( | ||
f"{OUTLIER_TYPE_FIELD} must be {OUTLIER_TYPE_CODES['Not outlier']} " | ||
f"when {IS_OUTLIER_FIELD} is N" | ||
f"{OUTLIER_TYPE_FIELD} must be '{OUTLIER_TYPE_CODES['Not outlier']}' " | ||
f"when {IS_OUTLIER_FIELD} is '{NON_OUTLIER_INDICATOR}'" | ||
) | ||
|
||
assert ( | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like this has the potential to become incredibly confusing. Let's all align on the correct indicator meaning/direction this afternoon and keep this PR open until then.