Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flip outlier indicator and use longer value fields in scripts/export.csv #115

Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
19 changes: 11 additions & 8 deletions scripts/export.py
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,9 @@
OUTLIER_TYPE_FIELD = "USER27"
RUN_ID_FIELD = "USER28"

IS_OUTLIER_INDICATOR = "N: Non arms-length"
NON_OUTLIER_INDICATOR = "Y: Is arms-length"

Comment on lines +23 to +25
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this has the potential to become incredibly confusing. Let's all align on the correct indicator meaning/direction this afternoon and keep this PR open until then.

OUTLIER_TYPE_CODES = {
"Not outlier": "0",
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we also want to use the long value for the outlier type, in order to make that field read-only? I notice that the iasWorld data model does not have an "analyst determination reason" field in addition to "model determination reason", as "model determination"/"analyst determination" does, so perhaps we want this field to be writeable, but I also wonder if that was an oversight in our iasWorld schema design.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do actually want the determination reason to be writable, as it's basically shared between the model / analyst determination fields.

"Anomaly (high)": "1",
Expand Down Expand Up @@ -78,8 +81,8 @@
sale.sale_key AS {SALE_KEY_FIELD},
CASE
WHEN flag.sv_is_outlier = TRUE
THEN 'Y'
ELSE 'N'
THEN '{IS_OUTLIER_INDICATOR}'
ELSE '{NON_OUTLIER_INDICATOR}'
END AS {IS_OUTLIER_FIELD},
flag.sv_outlier_type AS {OUTLIER_TYPE_FIELD},
flag.run_id AS {RUN_ID_FIELD}
Expand Down Expand Up @@ -119,19 +122,19 @@
].empty, f"{OUTLIER_TYPE_FIELD} contains invalid codes"

assert flag_df[
(flag_df[IS_OUTLIER_FIELD] == "Y")
(flag_df[IS_OUTLIER_FIELD] == IS_OUTLIER_INDICATOR)
& (flag_df[OUTLIER_TYPE_FIELD] == OUTLIER_TYPE_CODES["Not outlier"])
].empty, (
f"{OUTLIER_TYPE_FIELD} cannot be {OUTLIER_TYPE_CODES['Not outlier']} "
f"when {IS_OUTLIER_FIELD} is Y"
f"{OUTLIER_TYPE_FIELD} cannot be '{OUTLIER_TYPE_CODES['Not outlier']}' "
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added enclosing quotation marks for a slightly more readable error message.

f"when {IS_OUTLIER_FIELD} is '{IS_OUTLIER_INDICATOR}'"
)

assert flag_df[
(flag_df[IS_OUTLIER_FIELD] == "N")
(flag_df[IS_OUTLIER_FIELD] == NON_OUTLIER_INDICATOR)
& (flag_df[OUTLIER_TYPE_FIELD] != OUTLIER_TYPE_CODES["Not outlier"])
].empty, (
f"{OUTLIER_TYPE_FIELD} must be {OUTLIER_TYPE_CODES['Not outlier']} "
f"when {IS_OUTLIER_FIELD} is N"
f"{OUTLIER_TYPE_FIELD} must be '{OUTLIER_TYPE_CODES['Not outlier']}' "
f"when {IS_OUTLIER_FIELD} is '{NON_OUTLIER_INDICATOR}'"
)

assert (
Expand Down
Loading