Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consolidate flagging scripts #92

Open
jeancochrane opened this issue Jan 24, 2024 · 1 comment
Open

Consolidate flagging scripts #92

jeancochrane opened this issue Jan 24, 2024 · 1 comment
Assignees

Comments

@jeancochrane
Copy link
Collaborator

Currently this repo has three separate flagging scripts that do slightly different things:

  • manual_flagging/initial_flagging.py reads params from a local YAML file and assumes no sales have been flagged, so it queries all sales and adds flags with version = 1
  • manual_flagging/manual_update.py reads params from a local YAML file and assumes that some sales have already been flagged but need to be re-flagged, so it queries all sales and adds flags with either version += 1 (if the sale had already been flagged by a previous flagging run) or version = 1 (if the sale had not already been flagged)
  • glue/sales_val_flagging.py reads params from AWS Glue context and assumes that some sales have already been flagged but do not need to be re-flagged, so it queries all unflagged sales and adds flags with version = 1

The README has a handy chart explaining the differences in detail.

The separation of these scripts means that most changes to the flagging logic need to be repeated three times in order to be persisted to this repo, which represents a maintenance burden. I think that the differences in logic between the scripts are small enough that we could pretty easily factor out the shared logic and switch on the different logic with some simple conditional branches.

Here's a rough idea of what I'm imagining:

  • One script called something simple like flag.py
  • The script accepts command line flags using click:
    • --params (enum, glue or local, defaults to local)
    • --reflag (boolean, defaults to False)
    • --unflagged-only (boolean, defaults to False; if this is true, --reflag is a no-op)
    • Perhaps all flags other than --params should just be included in the YAML file or Glue params; I'm not sure about this yet
  • Command line flags (or YAML/Glue params) get used to control the behavior of the script and switch between what we currently think of as initial/update/Glue run types

And here's how these flags could replace the existing scripts:

  • Initial flagging: python3 flag.py --params local
  • Manual update: python3 flag.py --params local --reflag
  • Glue job: python3 flag.py --params glue --unflagged-only

This design is just a rough sketch though, let's nail it down once you're ready to get started @wagnerlmichael!

@jeancochrane
Copy link
Collaborator Author

The two types of manual flagging scripts have been consolidated as of #90; the last step is to consolidate the Glue script.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants