You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Currently this repo has three separate flagging scripts that do slightly different things:
manual_flagging/initial_flagging.py reads params from a local YAML file and assumes no sales have been flagged, so it queries all sales and adds flags with version = 1
manual_flagging/manual_update.py reads params from a local YAML file and assumes that some sales have already been flagged but need to be re-flagged, so it queries all sales and adds flags with either version += 1 (if the sale had already been flagged by a previous flagging run) or version = 1 (if the sale had not already been flagged)
glue/sales_val_flagging.py reads params from AWS Glue context and assumes that some sales have already been flagged but do not need to be re-flagged, so it queries all unflagged sales and adds flags with version = 1
The separation of these scripts means that most changes to the flagging logic need to be repeated three times in order to be persisted to this repo, which represents a maintenance burden. I think that the differences in logic between the scripts are small enough that we could pretty easily factor out the shared logic and switch on the different logic with some simple conditional branches.
Here's a rough idea of what I'm imagining:
One script called something simple like flag.py
The script accepts command line flags using click:
--params (enum, glue or local, defaults to local)
--reflag (boolean, defaults to False)
--unflagged-only (boolean, defaults to False; if this is true, --reflag is a no-op)
Perhaps all flags other than --params should just be included in the YAML file or Glue params; I'm not sure about this yet
Command line flags (or YAML/Glue params) get used to control the behavior of the script and switch between what we currently think of as initial/update/Glue run types
And here's how these flags could replace the existing scripts:
Initial flagging: python3 flag.py --params local
Manual update: python3 flag.py --params local --reflag
Currently this repo has three separate flagging scripts that do slightly different things:
manual_flagging/initial_flagging.py
reads params from a local YAML file and assumes no sales have been flagged, so it queries all sales and adds flags withversion = 1
manual_flagging/manual_update.py
reads params from a local YAML file and assumes that some sales have already been flagged but need to be re-flagged, so it queries all sales and adds flags with eitherversion += 1
(if the sale had already been flagged by a previous flagging run) orversion = 1
(if the sale had not already been flagged)glue/sales_val_flagging.py
reads params from AWS Glue context and assumes that some sales have already been flagged but do not need to be re-flagged, so it queries all unflagged sales and adds flags withversion = 1
The README has a handy chart explaining the differences in detail.
The separation of these scripts means that most changes to the flagging logic need to be repeated three times in order to be persisted to this repo, which represents a maintenance burden. I think that the differences in logic between the scripts are small enough that we could pretty easily factor out the shared logic and switch on the different logic with some simple conditional branches.
Here's a rough idea of what I'm imagining:
flag.py
--params
(enum,glue
orlocal
, defaults tolocal
)--reflag
(boolean, defaults toFalse
)--unflagged-only
(boolean, defaults toFalse
; if this is true,--reflag
is a no-op)--params
should just be included in the YAML file or Glue params; I'm not sure about this yetAnd here's how these flags could replace the existing scripts:
python3 flag.py --params local
python3 flag.py --params local --reflag
python3 flag.py --params glue --unflagged-only
This design is just a rough sketch though, let's nail it down once you're ready to get started @wagnerlmichael!
The text was updated successfully, but these errors were encountered: