[TF] Update Tensorflow to 2.17.0 build with c++20 standard #9438

Merged
smuzaffar merged 3 commits into IB/CMSSW_14_2_X/tf from tf-2.17.0 on Sep 29, 2024


smuzaffar
Contributor

This PR updates TensorFlow to version 2.17.0 for the special TF_X IBs. It also contains the following changes:

@cmsbuild
Contributor

A new Pull Request was created by @smuzaffar for branch IB/CMSSW_14_2_X/tf.

@aandvalenzuela, @iarspider, @smuzaffar can you please review it and eventually sign? Thanks.
@antoniovilela, @mandrenguyen, @rappoccio, @sextonkennedy you are the release manager for this.
cms-bot commands are listed here

@cmsbuild
Contributor

cmsbuild commented Sep 27, 2024

cms-bot internal usage

@smuzaffar
Contributor Author

test parameters:

  • full_cmssw = true

@smuzaffar
Contributor Author

please test for CMSSW_14_2_TF_X/el8_amd64_gcc12

@smuzaffar
Contributor Author

@fwyzard, our Eigen patch cms-externals/eigen-git-mirror@3cbe8e7 is getting bigger and bigger and harder to apply (mostly due to Eigen code formatting). Do you think these EIGEN_DEVICE_FUNC type additions are really needed? Should we try building without our patch?

@cmsbuild
Contributor

-1

Failed Tests: UnitTests
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-5cc892/41784/summary.html
COMMIT: 5e451b0
CMSSW: CMSSW_14_2_TF_X_2024-09-24-1100/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/9438/41784/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-5cc892/41784/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-5cc892/41784/git-merge-result

Unit Tests

I found 1 errors in the following unit tests:

---> test test_G2GainsValidator had ERRORS

Comparison Summary

Summary:

  • You potentially removed 113 lines from the logs
  • ROOTFileChecks: Some differences in event products or their sizes found
  • Reco comparison results: 24569 differences found in the comparisons
  • DQMHistoTests: Total files compared: 44
  • DQMHistoTests: Total histograms compared: 3331063
  • DQMHistoTests: Total failures: 93667
  • DQMHistoTests: Total nulls: 1
  • DQMHistoTests: Total successes: 3237375
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 119.3020000000001 KiB( 43 files compared)
  • DQMHistoSizes: changed ( 1000.0,... ): 1.715 KiB OfflinePV/Alignment
  • DQMHistoSizes: changed ( 12834.0,... ): 5.430 KiB HLT/Vertexing
  • DQMHistoSizes: changed ( 140.063 ): 0.012 KiB JetMET/SUSYDQM
  • DQMHistoSizes: changed ( 24834.911,... ): 3.590 KiB HLT/Vertexing
  • Checked 193 log files, 163 edm output root files, 44 DQM output files
  • TriggerResults: found differences in 9 / 42 workflows

@smuzaffar
Contributor Author

enable gpu

@smuzaffar
Contributor Author

please test for CMSSW_14_2_TF_X/el8_amd64_gcc12

@smuzaffar
Contributor Author

smuzaffar commented Sep 28, 2024

please test for CMSSW_14_2_TF_X/el8_aarch64_gcc12

@smuzaffar
Contributor Author

please test for CMSSW_14_2_TF_X/slc7_amd64_gcc12

@cmsbuild
Contributor

-1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-5cc892/41803/summary.html
COMMIT: 5e451b0
CMSSW: CMSSW_14_2_TF_X_2024-09-24-1100/el8_aarch64_gcc12
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/9438/41803/install.sh to create a dev area with all the needed externals and cmssw changes.

External Build

I found compilation error when building:

+ find bazel-out/-opt/bin -path '*/external/nsync/libnsync_cpp.pic.a'
+ xargs --no-run-if-empty -i cp '{}' /data/cmsbld/jenkins_a/workspace/ib-run-pr-tests/testBuildDir/tmp/BUILDROOT/d4bd83914671513f28ce8f407c5b7724/opt/cmssw/el8_aarch64_gcc12/external/tensorflow-sources/2.17.0-d4bd83914671513f28ce8f407c5b7724/lib-xla-runtime
find: 'bazel-out/-opt/bin': No such file or directory
+ for lib in libfft.pic.a libfft_wrapper.pic.a libmutex.pic.a libnsync_cpp.pic.a
+ test -e /data/cmsbld/jenkins_a/workspace/ib-run-pr-tests/testBuildDir/tmp/BUILDROOT/d4bd83914671513f28ce8f407c5b7724/opt/cmssw/el8_aarch64_gcc12/external/tensorflow-sources/2.17.0-d4bd83914671513f28ce8f407c5b7724/lib-xla-runtime/libfft.pic.a
error: Bad exit status from /data/cmsbld/jenkins_a/workspace/ib-run-pr-tests/testBuildDir/tmp/rpm-tmp.p1JiI1 (%install)


RPM build errors:
line 42: It's not recommended to have unversioned Obsoletes: Obsoletes: external+tensorflow-sources+2.17.0-d4bd83914671513f28ce8f407c5b7724
Bad exit status from /data/cmsbld/jenkins_a/workspace/ib-run-pr-tests/testBuildDir/tmp/rpm-tmp.p1JiI1 (%install)
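
A note for readers of the trace above: the `find` path `bazel-out/-opt/bin` has an empty component before `-opt`, which would be consistent with a CPU or configuration label expanding to an empty string on this architecture. That is only one possible reading; the spec file itself is not shown in this thread. A minimal sketch of that failure mode and an early guard, where `bazel_outdir`, `check_outdir`, and the `cpu` argument are hypothetical names not taken from the PR:

```shell
#!/bin/bash
# Hypothetical helper: builds the bazel output path from a CPU label.
# The real spec file may derive this path differently.
bazel_outdir() {
  local cpu="$1"
  printf 'bazel-out/%s-opt/bin\n' "$cpu"
}

# Fail early with a clear message instead of letting a later 'test -e' fail.
check_outdir() {
  local dir="$1"
  if [ ! -d "$dir" ]; then
    echo "bazel output directory not found: $dir" >&2
    return 1
  fi
}

# An empty label reproduces the path seen in the log.
dir="$(bazel_outdir "")"
echo "$dir"
check_outdir "$dir" || echo "would abort the %install step here"
```

Checking the directory before the copy loop would surface the problem at the `find` step rather than at the later `test -e ... libfft.pic.a` check.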


@cmsbuild
Contributor

Pull request #9438 was updated.

@smuzaffar
Contributor Author

please test for CMSSW_14_2_TF_X/el8_aarch64_gcc12

@cmsbuild
Contributor

Pull request #9438 was updated.

@smuzaffar
Contributor Author

please test for CMSSW_14_2_TF_X/el8_amd64_gcc12

@cmsbuild
Contributor

-1

Failed Tests: UnitTests GpuUnitTests
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-5cc892/41807/summary.html
COMMIT: 03bb073
CMSSW: CMSSW_14_2_TF_X_2024-09-24-1100/el8_amd64_gcc12
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/9438/41807/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-5cc892/41807/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-5cc892/41807/git-merge-result

Unit Tests

I found 1 errors in the following unit tests:

---> test test_G2GainsValidator had ERRORS

GPU Unit Tests

I found 11 errors in the following unit tests:

---> test simpleCholeskyTest had ERRORS
---> test testTFGraphLoadingCUDA had ERRORS
---> test testTFConstSessionCUDA had ERRORS
and more ...

Comparison Summary

Summary:

  • You potentially removed 112 lines from the logs
  • ROOTFileChecks: Some differences in event products or their sizes found
  • Reco comparison results: 24567 differences found in the comparisons
  • DQMHistoTests: Total files compared: 44
  • DQMHistoTests: Total histograms compared: 3331063
  • DQMHistoTests: Total failures: 93683
  • DQMHistoTests: Total nulls: 1
  • DQMHistoTests: Total successes: 3237359
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 119.3020000000001 KiB( 43 files compared)
  • DQMHistoSizes: changed ( 1000.0,... ): 1.715 KiB OfflinePV/Alignment
  • DQMHistoSizes: changed ( 12834.0,... ): 5.430 KiB HLT/Vertexing
  • DQMHistoSizes: changed ( 140.063 ): 0.012 KiB JetMET/SUSYDQM
  • DQMHistoSizes: changed ( 24834.911,... ): 3.590 KiB HLT/Vertexing
  • Checked 193 log files, 163 edm output root files, 44 DQM output files
  • TriggerResults: found differences in 9 / 42 workflows

@smuzaffar
Contributor Author

please test for CMSSW_14_2_TF_X/el8_aarch64_gcc12

@smuzaffar
Contributor Author

please test for CMSSW_14_2_TF_X/slc7_amd64_gcc12

@cmsbuild
Contributor

-1

Failed Tests: UnitTests
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-5cc892/41809/summary.html
COMMIT: 03bb073
CMSSW: CMSSW_14_2_TF_X_2024-09-24-1100/el8_aarch64_gcc12
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/9438/41809/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-5cc892/41809/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-5cc892/41809/git-merge-result

Unit Tests

I found 2 errors in the following unit tests:

---> test testAOTTools had ERRORS
---> test test_G2GainsValidator had ERRORS

@smuzaffar
Contributor Author

Looks good, though the GPU TF unit tests are failing.

@smuzaffar merged commit 917ef6a into IB/CMSSW_14_2_X/tf on Sep 29, 2024
18 of 24 checks passed
@cmsbuild
Contributor

-1

Failed Tests: UnitTests GpuUnitTests
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-5cc892/41810/summary.html
COMMIT: 03bb073
CMSSW: CMSSW_14_2_TF_X_2024-09-24-1100/slc7_amd64_gcc12
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/9438/41810/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-5cc892/41810/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-5cc892/41810/git-merge-result

Unit Tests

I found 1 errors in the following unit tests:

---> test test_G2GainsValidator had ERRORS

GPU Unit Tests

I found 13 errors in the following unit tests:

---> test simpleCholeskyTest had ERRORS
---> test testCUDAService had ERRORS
---> test testTorchSimpleDnnCUDA had ERRORS
and more ...

Comparison Summary

Summary:

  • You potentially removed 889 lines from the logs
  • ROOTFileChecks: Some differences in event products or their sizes found
  • Reco comparison results: 69153 differences found in the comparisons
  • DQMHistoTests: Total files compared: 44
  • DQMHistoTests: Total histograms compared: 3331063
  • DQMHistoTests: Total failures: 522287
  • DQMHistoTests: Total nulls: 336
  • DQMHistoTests: Total successes: 2808420
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 125.88700000000009 KiB( 43 files compared)
  • DQMHistoSizes: changed ( 1000.0,... ): 1.715 KiB OfflinePV/Alignment
  • DQMHistoSizes: changed ( 12834.0,... ): 5.430 KiB HLT/Vertexing
  • DQMHistoSizes: changed ( 13034.0 ): 5.658 KiB SiStrip/MechanicalView
  • DQMHistoSizes: changed ( 140.063 ): 0.012 KiB JetMET/SUSYDQM
  • DQMHistoSizes: changed ( 141.042 ): 0.008 KiB JetMET/SUSYDQM
  • DQMHistoSizes: changed ( 141.044 ): 0.020 KiB JetMET/SUSYDQM
  • DQMHistoSizes: changed ( 141.046 ): -0.004 KiB JetMET/SUSYDQM
  • DQMHistoSizes: changed ( 24834.911,... ): 3.590 KiB HLT/Vertexing
  • DQMHistoSizes: changed ( 250202.181 ): 0.117 KiB SiStrip/MechanicalView
  • DQMHistoSizes: changed ( 25202.0 ): 0.787 KiB SiStrip/MechanicalView
  • Checked 193 log files, 163 edm output root files, 44 DQM output files
  • TriggerResults: found differences in 17 / 42 workflows

@smuzaffar deleted the tf-2.17.0 branch on October 8, 2024 08:24