Remove redundant data members from InputFileCatalog to reduce memory use #47013
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-47013/43114
A new Pull Request was created by @makortel for master. It involves the following packages:
@Dr15Jones, @cmsbuild, @makortel, @smuzaffar can you please review it and eventually sign? Thanks. cms-bot commands are listed here
@cmsbuild, please test
@Dr15Jones please test
It seems to me we don't have good tests for
+1 Size: This PR adds an extra 24KB to repository
Force-pushed from ae9d552 to 6777337
Added a unit test for
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-47013/43118
Pull request #47013 was updated. @Dr15Jones, @cmsbuild, @makortel, @smuzaffar can you please check and sign again.
Ok, force-pushing again with a modified commit id helped
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-47013/43214
Pull request #47013 was updated. @Dr15Jones, @cmsbuild, @makortel, @smuzaffar can you please check and sign again.
@cmsbuild, please test
+1 Size: This PR adds an extra 16KB to repository
Comparison failures are related to #46416
+core
This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @antoniovilela, @rappoccio, @sextonkennedy, @mandrenguyen (and backports should be raised in the release meeting by the corresponding L2)
+1
PR description:
In #46975 (comment) I discovered that InputFileCatalog took ~488 MB of memory per stream in a production DIGI job overlaying premixed pileup, where the job was configured with nearly 500k pileup files. A quick look in InputFileCatalog showed that the input file names are, more or less, stored three times:
- fileNames_, to communicate a copy of the input file names from the constructor to init()
- logicalFileNames_, partly to communicate the input file names from the constructor to init(), and partly to allow a cheap logicalFileNames() getter to them
  - The logicalFileNames() function is not really used, so in order to avoid storing the file names in member data, I decided to remove the member function and the logicalFileNames_ member in the third commit
- FileCatalogItem::lfn_, stored in the fileCatalogItems_ member
  - lfn_ is being used

An alternative to the second commit could be to keep logicalFileNames_ and store FileCatalogItem::lfn_ as a string_view, but I felt that to be a tiny bit more complex.

The fourth commit avoids one copy of the std::vector<std::string> of the file names when the InputFileCatalog is constructed from a temporary vector, which is the case with all Sources that use InputFileCatalog.

The first commit adds a unit test for InputFileCatalog.
Resolves cms-sw/framework-team#1113
PR validation:
Unit tests passed in CMSSW_14_0_18. With the example job from #46975 (comment), MaxMemoryPreload showed a 197 MB reduction in peak allocated memory on 1 thread.
If this PR is a backport please specify the original PR and why you need to backport that PR. If this PR will be backported please specify to which release cycle the backport is meant for:
To be backported to 14_1_X and 14_0_X.