[21Pt] PR - Re-work Inundation.py [WIP] #1392

Open · wants to merge 30 commits into dev

Conversation

GregoryPetrochenkov-NOAA
Contributor

This update is a work in progress pending the acceptance criterion of a one-to-one comparison between the new method and the old one for a UAT run. The caveat is that the old method will have to be run with a cutoff of one tenth of a foot or more counted as inundated, as opposed to any value above 0 meters.

NOTE: Accidentally mixed up some older probabilistic scripts in here and will remove them from this PR shortly.

Substantive changes:

  1. Prepared to leverage compressed data for both REM values (changed from float32 to int16) and rasterized catchment values (compressed to the last four digits, since they all share the same first four digits).
  2. Numba-optimized the inundation routine, reduced copies, and made it thread capable (see the sketch after this list).
  3. Threaded capabilities for the mosaic operation.
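
To make item 2 concrete, here is a minimal sketch of the kind of numba-parallel kernel this refers to. It is illustrative only, not the actual inundation_optimized.py code; the array names, the stage_lookup table indexed by catchment id, and the depth threshold argument are all assumptions.

```python
import numpy as np
from numba import njit, prange


@njit(parallel=True)
def inundate_kernel(rem, catchments, stage_lookup, nodata, threshold):
    """Depth grid: stage of the pixel's catchment minus its REM value.

    rem          : 2-D float32 REM raster (meters)
    catchments   : 2-D int raster of catchment ids (compressed or not)
    stage_lookup : 1-D float32 array of stages, indexed by catchment id
    threshold    : lower depth cutoff (e.g. an inch) instead of "anything above zero"
    """
    depths = np.zeros(rem.shape, dtype=np.float32)
    rows, cols = rem.shape
    for i in prange(rows):          # rows are processed in parallel
        for j in range(cols):
            cid = catchments[i, j]
            if cid == nodata:
                continue
            depth = stage_lookup[cid] - rem[i, j]
            if depth >= threshold:
                depths[i, j] = depth
    return depths
```

A Python-level wrapper would build stage_lookup from the hydrotable (a 1-D array where position = catchment id) before calling the kernel; the parallel=True/prange pair is what lets numba spread the row loop across threads.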

Additions

  • inundation_optimized.py: the newly thread-capable, numba-optimized inundation routine
  • inundation_gms_optimized.py: iteratively runs inundation, leveraging inundation_optimized.py
  • inundate_mosaic_wrapper_optimized.py: made the mosaic wrapper operation thread capable

Changes

  • overlapping_inundation: made thread capable
  • mosaic_inundation: made thread capable (see the sketch after this list)
  • inundation.py: added an inch lower limit for specifying inundation extent, as opposed to zero
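
As a rough illustration of what "thread capable" means for the mosaic step, the sketch below merges same-shape rasters with a thread pool. It is not the mosaic_inundation or overlapping_inundation code; the per-pixel maximum and the identical-extent assumption are deliberate simplifications.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import threading

import numpy as np
import rasterio


def mosaic_max(raster_paths, shape, workers=4):
    """Merge single-band rasters of identical shape by per-pixel maximum."""
    mosaic = np.full(shape, -np.inf, dtype=np.float32)
    lock = threading.Lock()               # serialize writes into the shared array

    def merge_one(path):
        with rasterio.open(path) as src:
            data = src.read(1)            # raster I/O releases the GIL, so reads overlap
        with lock:
            np.maximum(mosaic, data, out=mosaic)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(merge_one, p) for p in raster_paths]
        for fut in as_completed(futures):
            fut.result()                  # surface any exception from a worker thread
    return mosaic
```

Threads (rather than processes) fit here because the expensive part is raster I/O, and the decoded arrays can be merged in place without pickling them between workers.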

Removals

Testing

In Progress

Deployment Plan (For developer use)

How do the changes affect the product?

  • [] Code only?
  • If applicable, has a deployment plan been created with the deployment person/team?
  • Does it require new or adjusted data inputs? Does it have start, end and duration code (in UTC)?
  • If new or updated data sets, has the FIM code been updated and tested with the new/adjusted data (a subset is fine, but it must be a subset of the new data)?
  • Does it require a new pre-clip set?
  • Are there new or updated Python packages?

Issuer Checklist (For developer use)

You may update this checklist before and/or after creating the PR. If you're unsure about any of them, please ask, we're here to help! These items are what we are going to look for before merging your code.

  • Informative and human-readable title, using the format: [_pt] PR: <description>
  • Links are provided if this PR resolves an issue, or depends on another PR
  • If submitting a PR to the dev branch (the default branch), you have a descriptive Feature Branch name using the format: dev-<description-of-change> (e.g. dev-revise-levee-masking)
  • Changes are limited to a single goal (no scope creep)
  • The feature branch you're submitting as a PR is up to date (merged) with the latest dev branch
  • pre-commit hooks were run locally
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future todos are captured in comments
  • CHANGELOG updated with template version number, e.g. 4.x.x.x
  • Add yourself as an assignee in the PR as well as the FIM Technical Lead

Merge Checklist (For Technical Lead use only)

  • Update CHANGELOG with latest version number and merge date
  • Update the Citation.cff file to reflect the latest version number in the CHANGELOG
  • If applicable, update README with major alterations


    except NoForecastFound as exc:
        if log_file is not None:
            print(f"{hucCode},{branch_id},{exc.__class__.__name__}, {exc}", file=open(log_file, "a"))

Opening a file here might not be a good idea. The Python docs recommend explicitly closing opened files rather than relying on garbage collection to properly flush and close them.
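
For example, a context-manager form of the same write (a sketch of the suggestion, not necessarily the final wording) would be:

```python
if log_file is not None:
    with open(log_file, "a") as lf:   # closed (and flushed) when the block exits
        print(f"{hucCode},{branch_id},{exc.__class__.__name__}, {exc}", file=lf)
```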

Contributor

ya.. we use "with" for every IO-based write for log files.

Also.. just today, we went through a ton of our files looking more carefully at whether objects can be deleted earlier. We had a habit of loading a file and keeping it open for the entire function/method even though we only used it early on. Much of FIM is now showing memory issues, so we are cleaning those up. Good to keep an eye out for it here too. You might already be doing it, but just a heads up. :)

"""

for obj in objects_to_delete:
del obj

@groutr Jan 6, 2025

Note that this only deletes the reference to the object, not the object itself. If the reference count is greater than 0 then the object will continue to live.
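
A small, self-contained illustration of that point (hypothetical objects, not the PR's actual variables):

```python
import sys

objects_to_delete = [bytearray(10**6)]   # stand-in for one large object
alias = objects_to_delete[0]             # another reference held elsewhere

for obj in objects_to_delete:
    del obj                              # unbinds the loop variable only

print(len(objects_to_delete))            # 1 -- the list still references the object
print(sys.getrefcount(alias) > 1)        # True -- so it will not be freed yet
```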

Contributor

Love it. Maybe add the little note that Ryan mentions right in the code (it helps future users): "Note that this only deletes the reference to the object, not the object itself. If the reference count is greater than 0 then the object will continue to live."


I think the attempt to manually manage memory here is misguided. Python is already doing most of this for you automatically. There is often no need to manually invoke expensive gc collections, though there are some exceptions.
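
For instance, in CPython plain reference counting reclaims an object as soon as its last reference goes away, with no explicit gc.collect() involved (a tiny illustrative example, not from this PR):

```python
import weakref


class Big:
    """Stand-in for a large raster or DataFrame."""


obj = Big()
probe = weakref.ref(obj)   # lets us observe when the object is gone
del obj                    # last reference removed; CPython frees it immediately
print(probe())             # None -- no gc.collect() was needed
```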

Contributor

ya.. There are a few places it might help though (I think). We have some areas where we keep several very large datasets and lists open simultaneously. There are lots of places, mostly in other FIM files and not so much here, where we can drop objects after we need them, or pull what we need out into a smaller subset object so the bigger one can be cleaned up earlier. Most of that is on a file-by-file, line-by-line basis. Having so many large datasets open at the same time is creating pretty big problems. (Granted, this mostly applies to other FIM code files and not so much to the files in this PR, but bringing awareness to it helps, I think.) Code in multiprocessing has also shown itself to be a bit slower on garbage collection, and we have seen a few impacts from it.
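
A hypothetical example of the "pull out what we need so the big object can be freed earlier" pattern (the file and column names are made up, not from this PR):

```python
import pandas as pd


def load_discharges(csv_path):
    full = pd.read_csv(csv_path)                        # large intermediate table
    subset = full[["feature_id", "discharge"]].copy()   # explicit copy, not a view of `full`
    del full                                            # drop the last reference early
    return subset                                       # only the small frame stays alive
```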

Contributor

The timing of when the Python garbage collector fires is asynchronous, and minor delays can be seen on rare occasions.

Labels: None yet
Projects: None yet
3 participants