Consolidate flagging scripts #92

jeancochrane · 2024-01-24T17:54:22Z

Currently this repo has three separate flagging scripts that do slightly different things:

manual_flagging/initial_flagging.py reads params from a local YAML file and assumes no sales have been flagged, so it queries all sales and adds flags with version = 1
manual_flagging/manual_update.py reads params from a local YAML file and assumes that some sales have already been flagged but need to be re-flagged, so it queries all sales and adds flags with either version += 1 (if the sale had already been flagged by a previous flagging run) or version = 1 (if the sale had not already been flagged)
glue/sales_val_flagging.py reads params from AWS Glue context and assumes that some sales have already been flagged but do not need to be re-flagged, so it queries all unflagged sales and adds flags with version = 1

The README has a handy chart explaining the differences in detail.

The separation of these scripts means that most changes to the flagging logic need to be repeated three times in order to be persisted to this repo, which represents a maintenance burden. I think that the differences in logic between the scripts are small enough that we could pretty easily factor out the shared logic and switch on the different logic with some simple conditional branches.

Here's a rough idea of what I'm imagining:

One script called something simple like flag.py
The script accepts command line flags using click:
- --params (enum, glue or local, defaults to local)
- --reflag (boolean, defaults to False)
- --unflagged-only (boolean, defaults to False; if this is true, --reflag is a no-op)
- Perhaps all flags other than --params should just be included in the YAML file or Glue params; I'm not sure about this yet
Command line flags (or YAML/Glue params) get used to control the behavior of the script and switch between what we currently think of as initial/update/Glue run types

And here's how these flags could replace the existing scripts:

Initial flagging: python3 flag.py --params local
Manual update: python3 flag.py --params local --reflag
Glue job: python3 flag.py --params glue --unflagged-only

This design is just a rough sketch though, let's nail it down once you're ready to get started @wagnerlmichael!

The text was updated successfully, but these errors were encountered:

jeancochrane · 2024-02-28T16:44:40Z

The two types of manual flagging scripts have been consolidated as of #90; the last step is to consolidate the Glue script.

wagnerlmichael mentioned this issue Feb 6, 2024

Architecture change for variable methodology #90

Merged

jeancochrane mentioned this issue Feb 28, 2024

Get the pipeline running again in some kind of scheduled job (either Glue or another framework) #99

Open

dfsnow assigned wagnerlmichael Apr 16, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Consolidate flagging scripts #92

Consolidate flagging scripts #92

jeancochrane commented Jan 24, 2024

jeancochrane commented Feb 28, 2024

Consolidate flagging scripts #92

Consolidate flagging scripts #92

Comments

jeancochrane commented Jan 24, 2024

jeancochrane commented Feb 28, 2024