Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Patatrack integration - Pixel local reconstruction (9/N) #31721

Merged
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
149 commits
Select commit Hold shift + click to select a range
9e12c44
Implementation of the pixel raw to digi algorithm in CUDA
sushildubey171 Nov 15, 2017
acdaa89
Cleanup the CUDA code, and recover the CPU code
fwyzard Nov 16, 2017
835416c
Better integration in CMSSW, validation, cleanup and fixes
Nov 26, 2017
41db525
Direct access to cabling map for GPU RawToDigi
sushildubey171 Dec 1, 2017
39e1459
Unpack errors, bad ROCs, improve validation, fixes and cleanup
Dec 7, 2017
87e0c99
Set CUDA optimization flags
felicepantaleo Jan 24, 2018
27ad793
Various fixes to GPU implementation o the pixel unpacker
fwyzard Feb 2, 2018
8813282
Digital calibrator, concurrent clusterizer (optimized), CPE and RecHIts
VinInn Feb 15, 2018
1b0c0fa
Move cudaCheck.h under HeterogeneousCore/CUDAUtilities
fwyzard Feb 15, 2018
1dfe4f2
R2D: use GPU::SimpleVector for the error unpacking (cms-patatrack#14)
Feb 20, 2018
78b9a61
Remove debug messages (cms-patatrack#21)
fwyzard Feb 28, 2018
b3a0c93
Add workflows for Riemann fit and GPU (cms-patatrack#20)
makortel Feb 28, 2018
400d308
Move the import next to the GPU modifier
fwyzard Feb 28, 2018
e8cc5ff
Remove -O2/-O3 from CUDA flags (cms-patatrack#29)
fwyzard Mar 1, 2018
fce61c4
GPU2CPU for clusters and RecHIts (cms-patatrack#18)
VinInn Mar 1, 2018
b518a25
Use the `gpu` modifier to read the pixel clusters from the unpacker (…
fwyzard Mar 1, 2018
c1d8e14
Use CUDA Unified Addressing
fwyzard Mar 13, 2018
b1473c7
Introduce an STL-compatible allocator for CUDA host memory
fwyzard Mar 15, 2018
e5c8f7e
Remove #pragma once
fwyzard Apr 4, 2018
cc65b52
Synchronise with CMSSW_10_2_0_pre1
fwyzard Apr 23, 2018
ae971f3
#include ".../cuda_cxx17.h" for std::size()
fwyzard Apr 26, 2018
67762f4
Remove the use of #pragma once (cms-patatrack#47)
fwyzard May 15, 2018
1da9842
Clean up some GPU- and non-GPU-related pixel tracking code (cms-patat…
fwyzard May 23, 2018
ae104ec
Implement a Heterogeneous version of Raw2Cluster and RecHit (cms-pata…
makortel Jun 4, 2018
999ec32
Faster clustering, now does not requires to know number of modules (c…
VinInn Jun 4, 2018
4117157
Prototype for EventSetup data on GPUs (cms-patatrack#77)
fwyzard Dec 29, 2020
7501972
Migrated PixelRecHit to Heterogeneous producer (cms-patatrack#81)
VinInn Jun 29, 2018
59c32aa
Various fixes and cleanup (cms-patatrack#87)
makortel Jun 29, 2018
bfbbaac
Synchronise with CMSSW_10_2_0_pre6
fwyzard Jul 4, 2018
c44ba3e
Update the PixelCPEFast code following the reorganisation in #23571
fwyzard Jul 8, 2018
7be0e29
Various fixes and cleanup (cms-patatrack#101)
fwyzard Jul 24, 2018
4dc6801
Fix synchronisation problems in the clusterizer (cms-patatrack#102)
fwyzard Jul 25, 2018
2dbefb4
Heterogeneous ClusterTPAssociation (cms-patatrack#105)
VinInn Jul 31, 2018
911dc73
Remove all remaining calls to cudaStreamSynchronize() (cms-patatrack#…
makortel Jul 31, 2018
a0c3d19
Make all device->host cudaMemcpyAsync transfers use pinned memory (cm…
makortel Aug 1, 2018
400cebf
Fix memory initialisation problems in the clusterizer (again) (cms-pa…
fwyzard Aug 2, 2018
81b6d79
Clean up some non-GPU-related pixel tracking code
fwyzard Aug 14, 2018
adf1e92
Cleanup after merging with CMSSW 10.2.2 (cms-patatrack#134)
fwyzard Aug 17, 2018
6b9ac38
Add optional flags to disable SOA->legacy conversion and GPU->CPU tra…
makortel Aug 17, 2018
f011f7d
Add new unpackers for HI run configuration.
thomreis Sep 11, 2018
cf35be0
Tune and speed up doublet algo (cms-patatrack#158)
VinInn Sep 12, 2018
5f1d5a5
Add missing HI rawDataRepacker changes.
thomreis Sep 13, 2018
e82d9e8
Tested new configuration that works for HI runs
Sep 17, 2018
8c1bc81
Introduce Cluster Charge Cut, optimize Histogram (bucket sorting) (cm…
VinInn Sep 26, 2018
e843016
Work around atomicAdd synchronisation problem on Volta (cms-patatrack…
fwyzard Sep 30, 2018
db82536
Revert back from oversight about rawDataRepacker and rawDataCollector
Oct 4, 2018
7d03457
Update beampixel_dqm_sourceclient-live_cfg.py
Oct 4, 2018
6ed56a5
Suppress asserts in the GPU code, unless GPU_DEBUG is defined (cms-pa…
fwyzard Oct 6, 2018
a19245d
Synchronise with CMSSW_10_4_0_pre2
fwyzard Nov 14, 2018
0288826
Add infrastructure around cub CachingDeviceAllocator, and use it in S…
makortel Nov 27, 2018
49ab272
Migrate tracker local reconstruction and pixel tracking to Tasks (bac…
fwyzard Nov 28, 2018
45f8115
Address code style and quality issues (cms-patatrack#203)
fwyzard Nov 28, 2018
b10314c
Fix access to uninitialised memory in RawToDigi_kernel (cms-patatrack…
fwyzard Nov 29, 2018
f90a242
Require allocated type to have only a trivial constructor for make_de…
makortel Dec 7, 2018
64c573c
Set the data pointer of error's SimpleVector (cms-patatrack#236)
makortel Jan 7, 2019
f2a4648
Full workflow from raw data to pixel tracks and vertices on GPUs (cms…
VinInn Jan 8, 2019
4de304f
Limit the pixel clusteriser to nearest neighbours (cms-patatrack#241)
fwyzard Jan 9, 2019
e0ad70b
Synchronise with CMSSW_10_4_0
fwyzard Jan 14, 2019
5fca7f1
Fix invalid narrowing conversion from "unsigned int" to "unsigned cha…
fwyzard Jan 17, 2019
1a307c4
Skip CUDA-related tests if no GPU is present (cms-patatrack#252)
fwyzard Jan 17, 2019
255ec71
Fix or disable failing unit tests (cms-patatrack#253)
fwyzard Jan 18, 2019
2392b51
Fix warnings reported by clang (cms-patatrack#255)
makortel Jan 18, 2019
60a9c68
Synchronise with CMSSW_10_5_0_pre2
fwyzard Feb 21, 2019
d3cb263
Next prototype of the framework integration (cms-patatrack#100)
makortel Mar 13, 2019
9ccfd89
Various updates to pixel track/vertex DQM and MTV (cms-patatrack#285)
makortel Mar 15, 2019
9dbaa0b
Make GPU-CPU cluster matching deterministic (cms-patatrack#294)
VinInn Mar 21, 2019
18ccfc7
Apply code checks (cms-patatrack#315)
fwyzard Apr 9, 2019
6658678
Move BeamSpot transfer to GPU to its own producer (cms-patatrack#318)
makortel Apr 23, 2019
b0ede70
Make SiPixelGainCalibrationForHLTGPU available on the CPU (cms-patatr…
VinInn Mar 16, 2019
9163a10
Rework the GPU pixel track clusterizer and vertex finder (cms-patatra…
VinInn Feb 14, 2019
4c81ba3
Improve pixel doublets and CA, and extend debugging functionality (cm…
VinInn Feb 25, 2019
f14c0f4
Migrate the pixel rechits producer and CA to the new heterogeneous fr…
VinInn Mar 29, 2019
2cdc2cf
Clean up by clang-format (cms-patatrack#338)
fwyzard May 14, 2019
c5031d4
Migrate ClusterTPAssociationHeterogeneous to the new framework (cms-p…
fwyzard May 15, 2019
69ed3e2
Synchronise with CMSSW_10_6_0
fwyzard May 15, 2019
b0e14f8
Update Service-based tests to provide the functionality that was remo…
fwyzard May 15, 2019
acc8360
Migrate gpuPixelRecHits::getHits() kernel to use a View instead of mu…
VinInn Jun 20, 2019
af2ffa3
Reorganize CUDAScopedContext (cms-patatrack#355)
makortel Jun 20, 2019
f177e9e
Implement triplets in the pixel ntuplet producer (cms-patatrack#382)
VinInn Jun 20, 2019
0ad2636
Port the whole pixel workflow to new heterogeneous framework (cms-pat…
VinInn Jul 5, 2019
5126c50
Implement full Pixel SoA workflow on CPU (cms-patatrack#385)
VinInn Aug 25, 2019
a157b87
Move event and stream caches, and caching allocators out from CUDASer…
makortel Sep 10, 2019
82a7bd0
Apply clang-format style formatting
fwyzard Sep 11, 2019
536a61d
Synchronise with CMSSW_11_0_0_pre7
fwyzard Sep 12, 2019
950358b
Fix clang warnings (cms-patatrack#387)
makortel Oct 23, 2019
de80674
Implement library-only wrappers for launching CUDA kernels (cms-patat…
fwyzard Oct 24, 2019
b56966f
Replace use of API wrapper stream and event with plain CUDA, part 1 (…
makortel Oct 26, 2019
5aa5b55
Replace CUDA API wrapper memory operations with native CUDA calls (cm…
waredjeb Oct 29, 2019
86a3932
Replace use of CUDA API wrapper unique_ptrs with CUDAUtilities unique…
waredjeb Oct 31, 2019
b13812b
Synchronise with CMSSW_11_0_0_pre11
fwyzard Nov 4, 2019
9b09f0a
Minimal updates following #28127
fwyzard Nov 4, 2019
8393897
Use non-blocking CUDA streams (cms-patatrack#405)
fwyzard Nov 8, 2019
a32cd8f
Replace cuda::device operations with native CUDA calls (cms-patatrack…
waredjeb Nov 26, 2019
ead6c49
Drop obsolete heterogenous framework (cms-patatrack#416)
fwyzard Nov 27, 2019
2276716
Remove last references to CUDA API Wrappers (cms-patatrack#417)
fwyzard Nov 27, 2019
71d4137
Migrate ESProducers to use ESGetToken's
fwyzard Nov 29, 2019
acde18e
Apply code checks and code formatting
fwyzard Nov 29, 2019
2595d76
Move LogWarning into the EDProducer (cms-patatrack#422)
VinInn Dec 2, 2019
b3879eb
Rename exitSansCUDADevices to requireCUDADevices (cms-patatrack#423)
fwyzard Dec 2, 2019
e77a278
Implement changes from the CUDA framework review (cms-patatrack#429)
makortel Jan 17, 2020
db9c642
Synchronise with CMSSW_11_1_0_pre2
fwyzard Jan 27, 2020
850455d
Protect against too many pixels (cms-patatrack#436)
VinInn Jan 31, 2020
333afca
Update lumi ALCARECO configurations for heterogeneous pixel digi and …
makortel Feb 5, 2020
a452bab
Work around: avoid assert in Phase 2 workflows (cms-patatrack#438)
VinInn Feb 11, 2020
f633604
Integrate the comments from the upstream PRs (cms-patatrack#442)
fwyzard Mar 24, 2020
77efd08
Synchronise with CMSSW_11_1_0_pre5
fwyzard Mar 26, 2020
9d3e458
Backport: remove unneeded dependencies in Reco subsystems (#29295)
fwyzard Apr 6, 2020
f7c8e8a
Apply code formatting fixes from upstream integration (cms-patatrack#…
fwyzard Apr 6, 2020
738ef55
Synchronise with CMSSW_11_1_0_pre7
fwyzard Dec 29, 2020
192c1a0
Replace cub prefix scan with home-brewed one (cms-patatrack#447)
VinInn May 15, 2020
53c8a22
Remove dependency on cub (cms-patatrack#449)
VinInn May 18, 2020
3025d44
Rename CUDAHostAllocator to cms::cuda::HostAllocator (cms-patatrack#464)
fwyzard May 20, 2020
3b0dede
Synchronise with CMSSW_11_1_0_pre8
fwyzard May 23, 2020
e9ed9fa
Use siPixelDigis.cpu instead of siPixelDigis (cms-patatrack#467)
fwyzard May 25, 2020
6ad3fec
Update Pixel gain calibration scheme (for Run3) (cms-patatrack#492)
VinInn Jul 4, 2020
e4fc4a8
Add truncation to pixel charge on GPU (cms-patatrack#501)
VinInn Jul 8, 2020
3ff2feb
Synchronise with CMSSW_11_2_0_pre2
fwyzard Jul 12, 2020
52ea6f7
Remove "cuda_cxx17.h" (cms-patatrack#519)
fwyzard Jul 19, 2020
a2151b7
Synchronise with CMSSW_11_2_0_pre3
fwyzard Aug 8, 2020
236150b
Remove use of boost::mpl::vector for dependent records (cms-patatrack…
fwyzard Aug 8, 2020
07f8623
Remove misleading calls to std::move (cms-patatrack#546)
fwyzard Sep 1, 2020
741e56c
Synchronise with CMSSW_11_2_0_pre7
fwyzard Oct 2, 2020
39b0088
Fix BS naming in siPixelRecHitsCUDAPreSplitting (cms-patatrack#551)
AdrianoDee Oct 1, 2020
d243b8f
Update ESProducers following cms-sw#31556 (cms-patatrack#555)
fwyzard Oct 2, 2020
6dc465e
Update the validation sequence for pixel-only tracking workflows (cms…
sroychow Oct 12, 2020
2a32b42
Define CUDA-specific attributes for compatility with GCC (cms-patatra…
fwyzard Oct 20, 2020
86d7fb5
Synchronise with CMSSW_11_2_0_pre8
fwyzard Oct 23, 2020
fb79324
Bugfix: add missing GPU memory free to PixelCPEFast (cms-patatrack#570)
fwyzard Nov 12, 2020
5315cb9
Synchronise with CMSSW_11_2_0_pre9
fwyzard Nov 16, 2020
fad0b22
Remove partial handling of the Pilot Blade from GPU code (cms-patatra…
tvami Nov 24, 2020
b612094
Migrate GPU pixel modules to ESConsumes (cms-patatrack#577)
mmusich Nov 24, 2020
cc51270
Clean up GPU pixel modules
fwyzard Nov 24, 2020
b5d9f71
Apply code formatting
fwyzard Nov 25, 2020
dad9133
Update DQM clients for PixelVertexProducer pixel vertices (cms-patatr…
fwyzard Nov 26, 2020
843d35f
Move SiPixelFedCablingMapGPU to CondFormats and CalibTracker (cms-pat…
tsusa Nov 23, 2020
e61ac42
Rename SiPixelFedCablingMapGPU to SiPixelROCsStatusAndMapping (cms-pa…
fwyzard Nov 24, 2020
d3b0b58
Address pixel local reco PR review comments (cms-patatrack#575)
makortel Nov 27, 2020
674af61
Apply code formatting
fwyzard Nov 27, 2020
b643d1b
Move hit indexes to 32 bits (cms-patatrack#583)
VinInn Nov 27, 2020
2b4d4eb
Synchronise with CMSSW_11_2_0_pre10
fwyzard Nov 27, 2020
0489c58
Simplify cudacompat layer to use a 1-dimensional grid (cms-patatrack#…
fwyzard Dec 1, 2020
ef08951
Make clusterizer kernels independent of the grid size (cms-patatrack#…
VinInn Dec 1, 2020
fedbf4d
Clean up the pixel local reconstruction code (cms-patatrack#593)
fwyzard Dec 18, 2020
d602f89
Synchronise with CMSSW_11_3_0_pre1
fwyzard Dec 24, 2020
120f131
Synchronise with CMSSW_11_3_X_2020-12-24-2300
fwyzard Dec 24, 2020
23549f6
Apply code formatting
fwyzard Dec 25, 2020
e87a6b6
Clean up the pixel local reconstruction code (cms-patatrack#599)
fwyzard Dec 25, 2020
7717f1b
Clean up the pixel local reconstruction code (#601)
fwyzard Dec 29, 2020
2e8e5fd
Clean up the pixel local reconstruction code (cms-patatrack#602)
fwyzard Dec 30, 2020
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions CUDADataFormats/Common/BuildFile.xml
Original file line number Diff line number Diff line change
@@ -1,4 +1,6 @@
<iftool name="cuda">
<use name="rootcore"/>
<use name="DataFormats/Common"/>
<use name="HeterogeneousCore/CUDAUtilities"/>
<export>
<lib name="1"/>
Expand Down
7 changes: 7 additions & 0 deletions CUDADataFormats/Common/src/classes.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
#ifndef CUDADataFormats_Common_src_classes_h
#define CUDADataFormats_Common_src_classes_h

#include "CUDADataFormats/Common/interface/HostProduct.h"
#include "DataFormats/Common/interface/Wrapper.h"

#endif // CUDADataFormats_Common_src_classes_h
4 changes: 4 additions & 0 deletions CUDADataFormats/Common/src/classes_def.xml
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
<lcgdict>
<class name="HostProduct<uint32_t[]>" persistent="false"/>
<class name="edm::Wrapper<HostProduct<uint32_t[]>>" persistent="false"/>
</lcgdict>
9 changes: 9 additions & 0 deletions CUDADataFormats/SiPixelCluster/BuildFile.xml
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
<use name="cuda"/>
<use name="rootcore"/>
<use name="CUDADataFormats/Common"/>
<use name="DataFormats/Common"/>
<use name="HeterogeneousCore/CUDAUtilities"/>

<export>
<lib name="1"/>
</export>
63 changes: 63 additions & 0 deletions CUDADataFormats/SiPixelCluster/interface/SiPixelClustersCUDA.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,63 @@
#ifndef CUDADataFormats_SiPixelCluster_interface_SiPixelClustersCUDA_h
#define CUDADataFormats_SiPixelCluster_interface_SiPixelClustersCUDA_h

#include "HeterogeneousCore/CUDAUtilities/interface/device_unique_ptr.h"
#include "HeterogeneousCore/CUDAUtilities/interface/host_unique_ptr.h"
#include "HeterogeneousCore/CUDAUtilities/interface/cudaCompat.h"

#include <cuda_runtime.h>

class SiPixelClustersCUDA {
public:
SiPixelClustersCUDA() = default;
explicit SiPixelClustersCUDA(size_t maxModules, cudaStream_t stream);
~SiPixelClustersCUDA() = default;

SiPixelClustersCUDA(const SiPixelClustersCUDA &) = delete;
SiPixelClustersCUDA &operator=(const SiPixelClustersCUDA &) = delete;
SiPixelClustersCUDA(SiPixelClustersCUDA &&) = default;
SiPixelClustersCUDA &operator=(SiPixelClustersCUDA &&) = default;

void setNClusters(uint32_t nClusters) { nClusters_h = nClusters; }

uint32_t nClusters() const { return nClusters_h; }

uint32_t *moduleStart() { return moduleStart_d.get(); }
uint32_t *clusInModule() { return clusInModule_d.get(); }
uint32_t *moduleId() { return moduleId_d.get(); }
uint32_t *clusModuleStart() { return clusModuleStart_d.get(); }

uint32_t const *moduleStart() const { return moduleStart_d.get(); }
uint32_t const *clusInModule() const { return clusInModule_d.get(); }
uint32_t const *moduleId() const { return moduleId_d.get(); }
uint32_t const *clusModuleStart() const { return clusModuleStart_d.get(); }

class DeviceConstView {
public:
__device__ __forceinline__ uint32_t moduleStart(int i) const { return __ldg(moduleStart_ + i); }
__device__ __forceinline__ uint32_t clusInModule(int i) const { return __ldg(clusInModule_ + i); }
__device__ __forceinline__ uint32_t moduleId(int i) const { return __ldg(moduleId_ + i); }
__device__ __forceinline__ uint32_t clusModuleStart(int i) const { return __ldg(clusModuleStart_ + i); }

uint32_t const *moduleStart_;
uint32_t const *clusInModule_;
uint32_t const *moduleId_;
uint32_t const *clusModuleStart_;
};

DeviceConstView *view() const { return view_d.get(); }

private:
cms::cuda::device::unique_ptr<uint32_t[]> moduleStart_d; // index of the first pixel of each module
cms::cuda::device::unique_ptr<uint32_t[]> clusInModule_d; // number of clusters found in each module
cms::cuda::device::unique_ptr<uint32_t[]> moduleId_d; // module id of each module

// originally from rechits
cms::cuda::device::unique_ptr<uint32_t[]> clusModuleStart_d; // index of the first cluster of each module

cms::cuda::device::unique_ptr<DeviceConstView> view_d; // "me" pointer

uint32_t nClusters_h = 0;
};

#endif // CUDADataFormats_SiPixelCluster_interface_SiPixelClustersCUDA_h
37 changes: 37 additions & 0 deletions CUDADataFormats/SiPixelCluster/interface/gpuClusteringConstants.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,37 @@
#ifndef CUDADataFormats_SiPixelCluster_interface_gpuClusteringConstants_h
#define CUDADataFormats_SiPixelCluster_interface_gpuClusteringConstants_h

#include <cstdint>
#include <limits>

namespace pixelGPUConstants {
#ifdef GPU_SMALL_EVENTS
// kept for testing and debugging
constexpr uint32_t maxNumberOfHits = 24 * 1024;
#else
// data at pileup 50 has 18300 +/- 3500 hits; 40000 is around 6 sigma away
// tested on MC events with 55-75 pileup events
constexpr uint32_t maxNumberOfHits = 48 * 1024;
#endif
} // namespace pixelGPUConstants

namespace gpuClustering {
#ifdef GPU_SMALL_EVENTS
// kept for testing and debugging
constexpr uint32_t maxHitsInIter() { return 64; }
#else
// optimized for real data PU 50
// tested on MC events with 55-75 pileup events
constexpr uint32_t maxHitsInIter() { return 160; }
#endif
constexpr uint32_t maxHitsInModule() { return 1024; }

constexpr uint16_t maxNumModules = 2000;
constexpr int32_t maxNumClustersPerModules = maxHitsInModule();
constexpr uint32_t maxNumClusters = pixelGPUConstants::maxNumberOfHits;
constexpr uint16_t invalidModuleId = std::numeric_limits<uint16_t>::max() - 1;
static_assert(invalidModuleId > maxNumModules); // invalidModuleId must be > maxNumModules

} // namespace gpuClustering

#endif // CUDADataFormats_SiPixelCluster_interface_gpuClusteringConstants_h
19 changes: 19 additions & 0 deletions CUDADataFormats/SiPixelCluster/src/SiPixelClustersCUDA.cc
Original file line number Diff line number Diff line change
@@ -0,0 +1,19 @@
#include "CUDADataFormats/SiPixelCluster/interface/SiPixelClustersCUDA.h"
#include "HeterogeneousCore/CUDAUtilities/interface/copyAsync.h"
#include "HeterogeneousCore/CUDAUtilities/interface/device_unique_ptr.h"
#include "HeterogeneousCore/CUDAUtilities/interface/host_unique_ptr.h"

SiPixelClustersCUDA::SiPixelClustersCUDA(size_t maxModules, cudaStream_t stream)
: moduleStart_d(cms::cuda::make_device_unique<uint32_t[]>(maxModules + 1, stream)),
clusInModule_d(cms::cuda::make_device_unique<uint32_t[]>(maxModules, stream)),
moduleId_d(cms::cuda::make_device_unique<uint32_t[]>(maxModules, stream)),
clusModuleStart_d(cms::cuda::make_device_unique<uint32_t[]>(maxModules + 1, stream)) {
auto view = cms::cuda::make_host_unique<DeviceConstView>(stream);
view->moduleStart_ = moduleStart_d.get();
view->clusInModule_ = clusInModule_d.get();
view->moduleId_ = moduleId_d.get();
view->clusModuleStart_ = clusModuleStart_d.get();

view_d = cms::cuda::make_device_unique<DeviceConstView>(stream);
cms::cuda::copyAsync(view_d, view, stream);
}
8 changes: 8 additions & 0 deletions CUDADataFormats/SiPixelCluster/src/classes.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
#ifndef CUDADataFormats_SiPixelCluster_src_classes_h
#define CUDADataFormats_SiPixelCluster_src_classes_h

#include "CUDADataFormats/Common/interface/Product.h"
#include "CUDADataFormats/SiPixelCluster/interface/SiPixelClustersCUDA.h"
#include "DataFormats/Common/interface/Wrapper.h"

#endif // CUDADataFormats_SiPixelCluster_src_classes_h
4 changes: 4 additions & 0 deletions CUDADataFormats/SiPixelCluster/src/classes_def.xml
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
<lcgdict>
<class name="cms::cuda::Product<SiPixelClustersCUDA>" persistent="false"/>
<class name="edm::Wrapper<cms::cuda::Product<SiPixelClustersCUDA>>" persistent="false"/>
</lcgdict>
10 changes: 10 additions & 0 deletions CUDADataFormats/SiPixelDigi/BuildFile.xml
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
<use name="cuda"/>
<use name="rootcore"/>
<use name="CUDADataFormats/Common"/>
<use name="DataFormats/Common"/>
<use name="DataFormats/SiPixelRawData"/>
<use name="HeterogeneousCore/CUDAUtilities"/>

<export>
<lib name="1"/>
</export>
42 changes: 42 additions & 0 deletions CUDADataFormats/SiPixelDigi/interface/SiPixelDigiErrorsCUDA.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,42 @@
#ifndef CUDADataFormats_SiPixelDigi_interface_SiPixelDigiErrorsCUDA_h
#define CUDADataFormats_SiPixelDigi_interface_SiPixelDigiErrorsCUDA_h

#include <cuda_runtime.h>

#include "DataFormats/SiPixelRawData/interface/SiPixelErrorCompact.h"
#include "DataFormats/SiPixelRawData/interface/SiPixelFormatterErrors.h"
#include "HeterogeneousCore/CUDAUtilities/interface/SimpleVector.h"
#include "HeterogeneousCore/CUDAUtilities/interface/device_unique_ptr.h"
#include "HeterogeneousCore/CUDAUtilities/interface/host_unique_ptr.h"

class SiPixelDigiErrorsCUDA {
public:
using SiPixelErrorCompactVector = cms::cuda::SimpleVector<SiPixelErrorCompact>;

SiPixelDigiErrorsCUDA() = default;
explicit SiPixelDigiErrorsCUDA(size_t maxFedWords, SiPixelFormatterErrors errors, cudaStream_t stream);
~SiPixelDigiErrorsCUDA() = default;

SiPixelDigiErrorsCUDA(const SiPixelDigiErrorsCUDA&) = delete;
SiPixelDigiErrorsCUDA& operator=(const SiPixelDigiErrorsCUDA&) = delete;
SiPixelDigiErrorsCUDA(SiPixelDigiErrorsCUDA&&) = default;
SiPixelDigiErrorsCUDA& operator=(SiPixelDigiErrorsCUDA&&) = default;

const SiPixelFormatterErrors& formatterErrors() const { return formatterErrors_h; }

SiPixelErrorCompactVector* error() { return error_d.get(); }
SiPixelErrorCompactVector const* error() const { return error_d.get(); }

using HostDataError = std::pair<SiPixelErrorCompactVector, cms::cuda::host::unique_ptr<SiPixelErrorCompact[]>>;
HostDataError dataErrorToHostAsync(cudaStream_t stream) const;

void copyErrorToHostAsync(cudaStream_t stream);

private:
cms::cuda::device::unique_ptr<SiPixelErrorCompact[]> data_d;
cms::cuda::device::unique_ptr<SiPixelErrorCompactVector> error_d;
cms::cuda::host::unique_ptr<SiPixelErrorCompactVector> error_h;
SiPixelFormatterErrors formatterErrors_h;
};

#endif // CUDADataFormats_SiPixelDigi_interface_SiPixelDigiErrorsCUDA_h
85 changes: 85 additions & 0 deletions CUDADataFormats/SiPixelDigi/interface/SiPixelDigisCUDA.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,85 @@
#ifndef CUDADataFormats_SiPixelDigi_interface_SiPixelDigisCUDA_h
#define CUDADataFormats_SiPixelDigi_interface_SiPixelDigisCUDA_h

#include <cuda_runtime.h>

#include "HeterogeneousCore/CUDAUtilities/interface/device_unique_ptr.h"
#include "HeterogeneousCore/CUDAUtilities/interface/host_unique_ptr.h"
#include "HeterogeneousCore/CUDAUtilities/interface/cudaCompat.h"

class SiPixelDigisCUDA {
public:
SiPixelDigisCUDA() = default;
explicit SiPixelDigisCUDA(size_t maxFedWords, cudaStream_t stream);
~SiPixelDigisCUDA() = default;

SiPixelDigisCUDA(const SiPixelDigisCUDA &) = delete;
SiPixelDigisCUDA &operator=(const SiPixelDigisCUDA &) = delete;
SiPixelDigisCUDA(SiPixelDigisCUDA &&) = default;
SiPixelDigisCUDA &operator=(SiPixelDigisCUDA &&) = default;

void setNModulesDigis(uint32_t nModules, uint32_t nDigis) {
nModules_h = nModules;
nDigis_h = nDigis;
}

uint32_t nModules() const { return nModules_h; }
uint32_t nDigis() const { return nDigis_h; }

uint16_t *xx() { return xx_d.get(); }
uint16_t *yy() { return yy_d.get(); }
uint16_t *adc() { return adc_d.get(); }
uint16_t *moduleInd() { return moduleInd_d.get(); }
int32_t *clus() { return clus_d.get(); }
uint32_t *pdigi() { return pdigi_d.get(); }
uint32_t *rawIdArr() { return rawIdArr_d.get(); }

uint16_t const *xx() const { return xx_d.get(); }
uint16_t const *yy() const { return yy_d.get(); }
uint16_t const *adc() const { return adc_d.get(); }
uint16_t const *moduleInd() const { return moduleInd_d.get(); }
int32_t const *clus() const { return clus_d.get(); }
uint32_t const *pdigi() const { return pdigi_d.get(); }
uint32_t const *rawIdArr() const { return rawIdArr_d.get(); }

cms::cuda::host::unique_ptr<uint16_t[]> adcToHostAsync(cudaStream_t stream) const;
cms::cuda::host::unique_ptr<int32_t[]> clusToHostAsync(cudaStream_t stream) const;
cms::cuda::host::unique_ptr<uint32_t[]> pdigiToHostAsync(cudaStream_t stream) const;
cms::cuda::host::unique_ptr<uint32_t[]> rawIdArrToHostAsync(cudaStream_t stream) const;

class DeviceConstView {
public:
__device__ __forceinline__ uint16_t xx(int i) const { return __ldg(xx_ + i); }
__device__ __forceinline__ uint16_t yy(int i) const { return __ldg(yy_ + i); }
__device__ __forceinline__ uint16_t adc(int i) const { return __ldg(adc_ + i); }
__device__ __forceinline__ uint16_t moduleInd(int i) const { return __ldg(moduleInd_ + i); }
__device__ __forceinline__ int32_t clus(int i) const { return __ldg(clus_ + i); }

uint16_t const *xx_;
uint16_t const *yy_;
uint16_t const *adc_;
uint16_t const *moduleInd_;
int32_t const *clus_;
};

const DeviceConstView *view() const { return view_d.get(); }

private:
// These are consumed by downstream device code
cms::cuda::device::unique_ptr<uint16_t[]> xx_d; // local coordinates of each pixel
cms::cuda::device::unique_ptr<uint16_t[]> yy_d; //
cms::cuda::device::unique_ptr<uint16_t[]> adc_d; // ADC of each pixel
cms::cuda::device::unique_ptr<uint16_t[]> moduleInd_d; // module id of each pixel
cms::cuda::device::unique_ptr<int32_t[]> clus_d; // cluster id of each pixel
cms::cuda::device::unique_ptr<DeviceConstView> view_d; // "me" pointer

// These are for CPU output; should we (eventually) place them to a
// separate product?
cms::cuda::device::unique_ptr<uint32_t[]> pdigi_d; // packed digi (row, col, adc) of each pixel
cms::cuda::device::unique_ptr<uint32_t[]> rawIdArr_d; // DetId of each pixel

uint32_t nModules_h = 0;
uint32_t nDigis_h = 0;
};

#endif // CUDADataFormats_SiPixelDigi_interface_SiPixelDigisCUDA_h
40 changes: 40 additions & 0 deletions CUDADataFormats/SiPixelDigi/src/SiPixelDigiErrorsCUDA.cc
Original file line number Diff line number Diff line change
@@ -0,0 +1,40 @@
#include <cassert>

#include "CUDADataFormats/SiPixelDigi/interface/SiPixelDigiErrorsCUDA.h"
#include "HeterogeneousCore/CUDAUtilities/interface/copyAsync.h"
#include "HeterogeneousCore/CUDAUtilities/interface/device_unique_ptr.h"
#include "HeterogeneousCore/CUDAUtilities/interface/host_unique_ptr.h"
#include "HeterogeneousCore/CUDAUtilities/interface/memsetAsync.h"

SiPixelDigiErrorsCUDA::SiPixelDigiErrorsCUDA(size_t maxFedWords, SiPixelFormatterErrors errors, cudaStream_t stream)
: data_d(cms::cuda::make_device_unique<SiPixelErrorCompact[]>(maxFedWords, stream)),
error_d(cms::cuda::make_device_unique<SiPixelErrorCompactVector>(stream)),
error_h(cms::cuda::make_host_unique<SiPixelErrorCompactVector>(stream)),
formatterErrors_h(std::move(errors)) {
cms::cuda::memsetAsync(data_d, 0x00, maxFedWords, stream);

cms::cuda::make_SimpleVector(error_h.get(), maxFedWords, data_d.get());
assert(error_h->empty());
assert(error_h->capacity() == static_cast<int>(maxFedWords));

cms::cuda::copyAsync(error_d, error_h, stream);
}

void SiPixelDigiErrorsCUDA::copyErrorToHostAsync(cudaStream_t stream) {
cms::cuda::copyAsync(error_h, error_d, stream);
}

SiPixelDigiErrorsCUDA::HostDataError SiPixelDigiErrorsCUDA::dataErrorToHostAsync(cudaStream_t stream) const {
// On one hand size() could be sufficient. On the other hand, if
// someone copies the SimpleVector<>, (s)he might expect the data
// buffer to actually have space for capacity() elements.
auto data = cms::cuda::make_host_unique<SiPixelErrorCompact[]>(error_h->capacity(), stream);

// but transfer only the required amount
if (not error_h->empty()) {
cms::cuda::copyAsync(data, data_d, error_h->size(), stream);
}
auto err = *error_h;
err.set_data(data.get());
return HostDataError(err, std::move(data));
}
46 changes: 46 additions & 0 deletions CUDADataFormats/SiPixelDigi/src/SiPixelDigisCUDA.cc
Original file line number Diff line number Diff line change
@@ -0,0 +1,46 @@
#include "CUDADataFormats/SiPixelDigi/interface/SiPixelDigisCUDA.h"
#include "HeterogeneousCore/CUDAUtilities/interface/copyAsync.h"
#include "HeterogeneousCore/CUDAUtilities/interface/device_unique_ptr.h"
#include "HeterogeneousCore/CUDAUtilities/interface/host_unique_ptr.h"

SiPixelDigisCUDA::SiPixelDigisCUDA(size_t maxFedWords, cudaStream_t stream)
: xx_d(cms::cuda::make_device_unique<uint16_t[]>(maxFedWords, stream)),
yy_d(cms::cuda::make_device_unique<uint16_t[]>(maxFedWords, stream)),
adc_d(cms::cuda::make_device_unique<uint16_t[]>(maxFedWords, stream)),
moduleInd_d(cms::cuda::make_device_unique<uint16_t[]>(maxFedWords, stream)),
clus_d(cms::cuda::make_device_unique<int32_t[]>(maxFedWords, stream)),
view_d(cms::cuda::make_device_unique<DeviceConstView>(stream)),
pdigi_d(cms::cuda::make_device_unique<uint32_t[]>(maxFedWords, stream)),
rawIdArr_d(cms::cuda::make_device_unique<uint32_t[]>(maxFedWords, stream)) {
auto view = cms::cuda::make_host_unique<DeviceConstView>(stream);
view->xx_ = xx_d.get();
view->yy_ = yy_d.get();
view->adc_ = adc_d.get();
view->moduleInd_ = moduleInd_d.get();
view->clus_ = clus_d.get();
cms::cuda::copyAsync(view_d, view, stream);
}

cms::cuda::host::unique_ptr<uint16_t[]> SiPixelDigisCUDA::adcToHostAsync(cudaStream_t stream) const {
auto ret = cms::cuda::make_host_unique<uint16_t[]>(nDigis(), stream);
cms::cuda::copyAsync(ret, adc_d, nDigis(), stream);
return ret;
}

cms::cuda::host::unique_ptr<int32_t[]> SiPixelDigisCUDA::clusToHostAsync(cudaStream_t stream) const {
auto ret = cms::cuda::make_host_unique<int32_t[]>(nDigis(), stream);
cms::cuda::copyAsync(ret, clus_d, nDigis(), stream);
return ret;
}

cms::cuda::host::unique_ptr<uint32_t[]> SiPixelDigisCUDA::pdigiToHostAsync(cudaStream_t stream) const {
auto ret = cms::cuda::make_host_unique<uint32_t[]>(nDigis(), stream);
cms::cuda::copyAsync(ret, pdigi_d, nDigis(), stream);
return ret;
}

cms::cuda::host::unique_ptr<uint32_t[]> SiPixelDigisCUDA::rawIdArrToHostAsync(cudaStream_t stream) const {
auto ret = cms::cuda::make_host_unique<uint32_t[]>(nDigis(), stream);
cms::cuda::copyAsync(ret, rawIdArr_d, nDigis(), stream);
return ret;
}
Loading