Extend CMSSW to a distributed application over MPI #32632

Open: fwyzard wants to merge 6 commits into master

@fwyzard fwyzard commented Jan 12, 2021

PR description:

Let multiple CMSSW processes on the same or different machines coordinate event processing and transfer data products over MPI.

The implementation is based on four CMSSW modules.
Two are responsible for setting up the communication channels and coordinating the event processing:

  • a "remote controller" called MPIController
  • a "remote source" called MPISource

and two are responsible for the transfer of data products:

  • a "sender" called MPISender
  • a "receiver" called MPIReceiver

The MPIController is an EDProducer running in a regular CMSSW process. After setting up the communication with an MPISource, it transmits to it all EDM run, lumi and event transitions, and instructs the MPISource to replicate them in the second process.

The MPISource is a Source controlling the execution of a second CMSSW process. After setting up the communication with an MPIController, it listens for EDM run, lumi and event transitions, and replicates them in its process.
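As a rough illustration of the controller/source coordination described above, the sketch below forwards a sequence of transitions and replays them in order on the other side. It is not the code in this PR: the `Transition` and `Message` types, the `sendTransition` and `replay` helpers, and the in-memory `Channel` standing in for the blocking MPI channel are all hypothetical names chosen for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <string>
#include <vector>

// Hypothetical message kinds; the actual wire format is not described here.
enum class Transition : std::uint8_t { BeginRun, BeginLumi, Event, EndLumi, EndRun };

struct Message {
  Transition kind;
  std::uint32_t run;
  std::uint32_t lumi;
  std::uint64_t event;  // 0 for run and lumi transitions
};

// In-memory stand-in for the MPI channel: ordered, first-in first-out delivery.
using Channel = std::deque<Message>;

// Controller side: forward each transition seen in the first process.
inline void sendTransition(Channel& channel, Message const& message) { channel.push_back(message); }

// Source side: drain the channel and replicate the transitions in the same order.
inline std::vector<std::string> replay(Channel& channel) {
  std::vector<std::string> log;
  bool runOpen = false;
  bool lumiOpen = false;
  while (!channel.empty()) {
    Message const m = channel.front();
    channel.pop_front();
    std::string const id = std::to_string(m.run) + ":" + std::to_string(m.lumi);
    switch (m.kind) {
      case Transition::BeginRun:
        assert(!runOpen);
        runOpen = true;
        log.push_back("begin run " + std::to_string(m.run));
        break;
      case Transition::BeginLumi:
        assert(runOpen && !lumiOpen);  // a single luminosity block at a time
        lumiOpen = true;
        log.push_back("begin lumi " + id);
        break;
      case Transition::Event:
        assert(lumiOpen);  // events only arrive inside an open luminosity block
        log.push_back("event " + id + ":" + std::to_string(m.event));
        break;
      case Transition::EndLumi:
        assert(lumiOpen);
        lumiOpen = false;
        log.push_back("end lumi " + id);
        break;
      case Transition::EndRun:
        assert(runOpen && !lumiOpen);
        runOpen = false;
        log.push_back("end run " + std::to_string(m.run));
        break;
    }
  }
  return log;
}
```

In this sketch the receiving side never sends anything back, which mirrors the absence of acknowledgment or feedback between the modules.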

Both MPIController and MPISource produce an MPIToken, a special data product that encapsulates the information about the MPI communication channel.

The MPISender is an EDProducer that can read one or more collections from the Event, serialise them using their ROOT dictionaries, and send them over the MPI communication channel.

The MPIReceiver is an EDProducer that can receive a set number of collections over the MPI communication channel, deserialise them using their ROOT dictionaries, and put them in the Event with a configurable instance label.

In principle any non-transient collection with a ROOT dictionary can be transmitted. Any transient information is lost during the transfer, and needs to be recreated by the receiving side.
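The loss of transient information can be illustrated with a small stand-alone sketch. It does not use ROOT dictionaries: the `Tracks` type, its `sumPt` cache, and the `serialise` and `deserialise` helpers are hypothetical, and a plain byte copy is swapped in for the ROOT-based serialisation only to show that the receiving side has to rebuild whatever was not written.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical product: `pt` is persistent, `sumPt` is a transient cache.
struct Tracks {
  std::vector<float> pt;
  float sumPt = 0.f;  // transient: never written by serialise()

  void rebuildCache() {
    sumPt = 0.f;
    for (float value : pt)
      sumPt += value;
  }
};

// Write only the persistent part: the element count followed by the values.
inline std::vector<std::byte> serialise(Tracks const& tracks) {
  std::size_t const n = tracks.pt.size();
  std::vector<std::byte> buffer(sizeof(n) + n * sizeof(float));
  std::memcpy(buffer.data(), &n, sizeof(n));
  if (n > 0)
    std::memcpy(buffer.data() + sizeof(n), tracks.pt.data(), n * sizeof(float));
  return buffer;
}

// Rebuild the persistent part; the transient cache keeps its default value.
inline Tracks deserialise(std::vector<std::byte> const& buffer) {
  assert(buffer.size() >= sizeof(std::size_t));
  std::size_t n = 0;
  std::memcpy(&n, buffer.data(), sizeof(n));
  assert(buffer.size() == sizeof(n) + n * sizeof(float));
  Tracks tracks;
  tracks.pt.resize(n);
  if (n > 0)
    std::memcpy(tracks.pt.data(), buffer.data() + sizeof(n), n * sizeof(float));
  return tracks;
}
```

After `deserialise`, the receiving side would call something like `rebuildCache()` before using the transient member.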

Each MPISender and MPIReceiver is configured with an instance value that is used to match one MPISender in one process to one MPIReceiver in another process. Using different instance values allows the use of multiple MPISenders/MPIReceivers in a process.
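One way to picture the matching by instance value is a set of independent mailboxes keyed by that value, similar in spirit to matching MPI messages by tag. The sketch below is an assumption about the idea, not the implementation in this PR: the `Mailboxes` class and its `send` and `receive` methods are hypothetical, and no MPI calls are made.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <map>
#include <optional>
#include <utility>
#include <vector>

using Payload = std::vector<std::byte>;

// Stand-in for the communication channel: one FIFO mailbox per instance value.
class Mailboxes {
public:
  // Sender side: queue a serialised payload under the sender's instance value.
  void send(int instance, Payload payload) {
    assert(instance >= 0);  // MPI tags must be non-negative
    boxes_[instance].push_back(std::move(payload));
  }

  // Receiver side: take the oldest payload queued for this instance, if any.
  std::optional<Payload> receive(int instance) {
    auto it = boxes_.find(instance);
    if (it == boxes_.end() || it->second.empty())
      return std::nullopt;
    Payload payload = std::move(it->second.front());
    it->second.pop_front();
    return payload;
  }

private:
  std::map<int, std::deque<Payload>> boxes_;
};
```

Because each instance has its own queue, a receiver configured with instance 2 is unaffected by messages sent with instance 1, whatever the order in which they were sent.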

Both MPISender and MPIReceiver obtain the MPI communication channel by reading an MPIToken from the event. They also produce a copy of the MPIToken, so other modules can consume it to declare a dependency on the previous modules.

An automated test is available in the test/ directory.

Current limitations

  • all communication is blocking, and there is no acknowledgment or feedback from one module to the other;
  • MPIController is a "one" module that supports only a single luminosity block at a time;
  • MPISender and MPIReceiver support a single compile-time type;
  • there is no check that the type sent by the MPISender matches the type expected by the MPIReceiver.

Expected future developments

  • implement efficient serialisation for standard layout types;
  • implement efficient serialisation for PortableCollection types;
  • check that the collection sent by the MPISender and the one expected by the MPIReceiver match;
  • extend the MPISender and MPIReceiver to send and receive multiple collections;
  • rewrite the MPISender and MPIReceiver to send and receive arbitrary run-time collections;
  • improve the MPIController to be a global module rather than a one module;
  • let an MPISource accept connections and events from multiple MPIController modules in different jobs;
  • let an MPIController connect and send events to multiple MPISource modules in different jobs;
  • support multiple concurrent runs and luminosity blocks, up to a given maximum;
  • transfer the ProcessingHistory from the MPIController to the MPISource? and vice versa?
  • transfer other provenance information from the MPIController to the MPISource? and vice versa?
  • when a run, luminosity block or event is received, check that it belongs to the same ProcessingHistory as the ongoing run?

PR validation:

This PR includes an automated unit test.

cmsbuild commented Jan 12, 2021

A new Pull Request was created by @fwyzard (Andrea Bocci) for CMSSW_11_2_X.

It involves the following packages:

HeterogeneousCore/MPICore
HeterogeneousCore/MPIServices

@makortel, @cmsbuild, @fwyzard can you please review it and eventually sign? Thanks.
@makortel, @rovere this is something you requested to watch as well.
@silviodonato, @dpiparo, @qliphy you are the release manager for this.

cms-bot commands are listed here

@fwyzard changed the title from "Mpi updates" to "Implement a simple CMSSW client/server over MPI" on Jan 12, 2021

fwyzard commented Jan 12, 2021

@makortel @felicepantaleo @rovere FYI.

@cmsbuild

Pull request #32632 was updated. @makortel, @cmsbuild, @fwyzard can you please check and sign again.

@fwyzard fwyzard changed the base branch from CMSSW_11_2_X to master January 13, 2021 17:47
@cmsbuild cmsbuild modified the milestones: CMSSW_11_2_X, CMSSW_11_3_X Jan 13, 2021
@cmsbuild

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-32632/20729

ERROR: Build errors found during clang-tidy run.

HeterogeneousCore/MPICore/plugins/messages.cc:19:3: error: constexpr variable 'types' must be initialized by a constant expression [clang-diagnostic-error]
  DECLARE_MPI_TYPE(EDM_MPI_Empty,    // MPI_Datatype
  ^
HeterogeneousCore/MPICore/plugins/macros.h:102:28: note: expanded from macro 'DECLARE_MPI_TYPE'
--
HeterogeneousCore/MPICore/plugins/messages.cc:25:3: error: constexpr variable 'types' must be initialized by a constant expression [clang-diagnostic-error]
  DECLARE_MPI_TYPE(EDM_MPI_RunAuxiliary,    // MPI_Datatype
  ^
HeterogeneousCore/MPICore/plugins/macros.h:102:28: note: expanded from macro 'DECLARE_MPI_TYPE'
--
HeterogeneousCore/MPICore/plugins/messages.cc:35:3: error: constexpr variable 'types' must be initialized by a constant expression [clang-diagnostic-error]
  DECLARE_MPI_TYPE(EDM_MPI_LuminosityBlockAuxiliary,    // MPI_Datatype
  ^
HeterogeneousCore/MPICore/plugins/macros.h:102:28: note: expanded from macro 'DECLARE_MPI_TYPE'
--
HeterogeneousCore/MPICore/plugins/messages.cc:46:3: error: constexpr variable 'types' must be initialized by a constant expression [clang-diagnostic-error]
  DECLARE_MPI_TYPE(EDM_MPI_EventAuxiliary,    // MPI_Datatype
  ^
HeterogeneousCore/MPICore/plugins/macros.h:102:28: note: expanded from macro 'DECLARE_MPI_TYPE'
--
gmake: *** [config/SCRAM/GMake/Makefile.coderules:128: code-checks] Error 2
gmake: *** [There are compilation/build errors. Please see the detail log above.] Error 2

fwyzard commented Jan 14, 2021

please test

@cmsbuild

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-32632/20747

ERROR: Build errors found during clang-tidy run.

HeterogeneousCore/MPICore/plugins/macros.h:58:45: note: cast from 'void *' is not allowed in a constant expression
--
HeterogeneousCore/MPICore/plugins/macros.h:61:26: error: constexpr variable 'mpi_type<long double>' must be initialized by a constant expression [clang-diagnostic-error]
  constexpr MPI_Datatype mpi_type<long double> = MPI_LONG_DOUBLE;
                         ^
HeterogeneousCore/MPICore/plugins/macros.h:61:50: note: cast from 'void *' is not allowed in a constant expression
--
HeterogeneousCore/MPICore/plugins/macros.h:64:26: error: constexpr variable 'mpi_type<std::byte>' must be initialized by a constant expression [clang-diagnostic-error]
  constexpr MPI_Datatype mpi_type<std::byte> = MPI_BYTE;
                         ^
HeterogeneousCore/MPICore/plugins/macros.h:64:48: note: cast from 'void *' is not allowed in a constant expression
--
HeterogeneousCore/MPICore/plugins/messages.cc:19:3: error: constexpr variable 'types' must be initialized by a constant expression [clang-diagnostic-error]
  DECLARE_MPI_TYPE(EDM_MPI_Empty,    // MPI_Datatype
  ^
HeterogeneousCore/MPICore/plugins/macros.h:102:28: note: expanded from macro 'DECLARE_MPI_TYPE'
--
HeterogeneousCore/MPICore/plugins/messages.cc:25:3: error: constexpr variable 'types' must be initialized by a constant expression [clang-diagnostic-error]
  DECLARE_MPI_TYPE(EDM_MPI_RunAuxiliary,    // MPI_Datatype
  ^
HeterogeneousCore/MPICore/plugins/macros.h:102:28: note: expanded from macro 'DECLARE_MPI_TYPE'
--
gmake: *** [config/SCRAM/GMake/Makefile.coderules:128: code-checks] Error 2
gmake: *** [There are compilation/build errors. Please see the detail log above.] Error 2

fwyzard commented Jan 14, 2021

please test

@cmsbuild

+1

Size: This PR adds an extra 40KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-674e41/43151/summary.html
COMMIT: a6dabd8
CMSSW: CMSSW_15_0_X_2024-11-28-2300/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/32632/43151/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

Note: this is a quick workaround to let the device code use the device collection,
while being able to access the actual number of pf rechits on the host side.
It should be replaced with a better and more general implementation, and the use of
the host collection should be removed.
Let multiple CMSSW processes on the same or different machines coordinate
event processing and transfer data products over MPI.

The implementation is based on four CMSSW modules.
Two are responsible for setting up the communication channels and
coordinating the event processing:
  - the MPIController
  - the MPISource
and two are responsible for the transfer of data products:
  - the MPISender
  - the MPIReceiver

The MPIController is an EDProducer running in a regular CMSSW process.
After setting up the communication with an MPISource, it transmits to it
all EDM run, lumi and event transitions, and instructs the MPISource to
replicate them in the second process.

The MPISource is a Source controlling the execution of a second CMSSW
process. After setting up the communication with an MPIController, it
listens for EDM run, lumi and event transitions, and replicates them in
its process.

Both MPIController and MPISource produce an MPIToken, a special data
product that encapsulates the information about the MPI communication
channel.

The MPISender is an EDProducer that can read a collection of a predefined
type from the Event, serialise it using its ROOT dictionary, and send it
over the MPI communication channel.

The MPIReceiver is an EDProducer that can receive a collection of a
predefined type over the MPI communication channel, deserialise it using
its ROOT dictionary, and put it in the Event.

Both MPISender and MPIReceiver are templated on the type to be
transmitted and de/serialised.

Each MPISender and MPIReceiver is configured with an instance value
that is used to match one MPISender in one process to one MPIReceiver in
another process. Using different instance values allows the use of
multiple MPISenders/MPIReceivers in a process.

Both MPISender and MPIReceiver obtain the MPI communication channel by
reading an MPIToken from the event. They also produce a copy of the
MPIToken, so other modules can consume it to declare a dependency on
the previous modules.

An automated test is available in the test/ directory.
Let MPISender and MPIReceiver consume, send/receive and produce
collections of arbitrary types, as long as they have a ROOT dictionary
and can be persisted.

Note that any transient information is lost during the transfer, and
needs to be recreated by the receiving side.

The documentation and tests are updated accordingly.

Warning: this approach is a work in progress!
TODO:
  - improve framework integration
  - add checks between send/receive types

cmsbuild commented Dec 1, 2024

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-32632/42857

cmsbuild commented Dec 1, 2024

Pull request #32632 was updated. @Dr15Jones, @cmsbuild, @fwyzard, @jfernan2, @makortel, @mandrenguyen, @smuzaffar can you please check and sign again.

fwyzard commented Dec 1, 2024

please test

cmsbuild commented Dec 1, 2024

+1

Size: This PR adds an extra 28KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-674e41/43174/summary.html
COMMIT: 4a91403
CMSSW: CMSSW_15_0_X_2024-12-01-0000/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/32632/43174/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 31 differences found in the comparisons
  • DQMHistoTests: Total files compared: 46
  • DQMHistoTests: Total histograms compared: 3484682
  • DQMHistoTests: Total failures: 1565
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3483097
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 45 files compared)
  • Checked 202 log files, 172 edm output root files, 46 DQM output files
  • TriggerResults: found differences in 1 / 44 workflows

fwyzard commented Dec 9, 2024

type ngt
