This repository has been archived by the owner on Nov 18, 2022. It is now read-only.

openmm-hip Testing and Benchmarks #1

Open
tictooc opened this issue Mar 20, 2022 · 5 comments

Comments

@tictooc

tictooc commented Mar 20, 2022

This is not an issue; I just wanted to report that the conda version of this plugin, along with the StreamHPC/openmm fork, works without issue on ROCm 5.0.2 and the latest stable kernel (5.16.15).

test_openmm_hip.sh passes all tests.

Test Results
#1: TestHipAmoebaExtrapolatedPolarization
Done

#2: TestHipAmoebaGeneralizedKirkwoodForce
Done

#3: TestHipAmoebaMultipoleForce
Done

#4: TestHipAmoebaTorsionTorsionForce
Done

#5: TestHipAmoebaVdwForce
Done

#6: TestHipAndersenThermostat
Done

#7: TestHipBrownianIntegrator
Done

#8: TestHipCheckpoints
Done

#9: TestHipCMAPTorsionForce
Done

#10: TestHipCMMotionRemover
Done

#11: TestHipCompiler
Done

#12: TestHipCompoundIntegrator
Done

#13: TestHipCustomAngleForce
Done

#14: TestHipCustomBondForce
Done

#15: TestHipCustomCentroidBondForce
Done

#16: TestHipCustomCompoundBondForce
Done

#17: TestHipCustomCVForce
Done

#18: TestHipCustomExternalForce
Done

#19: TestHipCustomGBForce
Done

#20: TestHipCustomHbondForce
Done

#21: TestHipCustomIntegrator
Done

#22: TestHipCustomManyParticleForce
Done

#23: TestHipCustomNonbondedForce
Done

#24: TestHipCustomTorsionForce
Done

#25: TestHipDispersionPME
Done

#26: TestHipDrudeForce
Done

#27: TestHipDrudeLangevinIntegrator
Done

#28: TestHipDrudeNoseHoover
Done

#29: TestHipDrudeSCFIntegrator
Done

#30: TestHipEwald
Done

#31: TestHipFFTImplFFT3D
Done

#32: TestHipFFTImplHipFFT
realToComplex: 0 xsize: 28 ysize: 25 zsize: 30
realToComplex: 1 xsize: 28 ysize: 25 zsize: 25
realToComplex: 1 xsize: 25 ysize: 28 zsize: 25
realToComplex: 1 xsize: 25 ysize: 25 zsize: 28
realToComplex: 1 xsize: 21 ysize: 25 zsize: 27
realToComplex: 1 xsize: 49 ysize: 98 zsize: 14
realToComplex: 1 xsize: 7 ysize: 21 zsize: 98
realToComplex: 1 xsize: 98 ysize: 21 zsize: 21
realToComplex: 1 xsize: 18 ysize: 98 zsize: 6
realToComplex: 1 xsize: 98 ysize: 98 zsize: 98
Done

#33: TestHipFFTImplVkFFT
Done

#34: TestHipGayBerneForce
Done

#35: TestHipGBSAOBCForce
Done

#36: TestHipHarmonicAngleForce
Done

#37: TestHipHarmonicBondForce
Done

#38: TestHipHippoNonbondedForce
Done

#39: TestHipLangevinIntegrator
Done

#40: TestHipLangevinMiddleIntegrator
Done

#41: TestHipLocalEnergyMinimizer
Done

#42: TestHipMonteCarloAnisotropicBarostat
Done

#43: TestHipMonteCarloBarostat
Done

#44: TestHipMonteCarloFlexibleBarostat
Done

#45: TestHipMultipleForces
Done

#46: TestHipNonbondedForce
Done

#47: TestHipNoseHooverIntegrator
Done

#48: TestHipPeriodicTorsionForce
Done

#49: TestHipRandom
Done

#50: TestHipRBTorsionForce
Done

#51: TestHipRMSDForce
Done

#52: TestHipRpmd
Done

#53: TestHipSettle
Done

#54: TestHipSort
Done

#55: TestHipVariableLangevinIntegrator
Done

#56: TestHipVariableVerletIntegrator
Done

#57: TestHipVerletIntegrator
Done

#58: TestHipVirtualSites
Done

#59: TestHipWcaDispersionForce
Done
----------------
All tests passed
----------------

All of the benchmarks except amber20-factorix (upstream issue #3391) run without issue. The output below was produced with the draft benchmark.py from #3386, with a few local changes to add HIP and system info.

Full benchmark.py output
$ python benchmark_new_hip.py --seconds=30 --ensemble=NVT --precision=single --bond-constraints=hbonds --platform=OpenCL,HIP
timestamp: 2022-03-20T17:05:34.618117
openmm_version: 7.7.0.dev-ce22dbe
cpuinfo: AMD Ryzen Threadripper 3960X 24-Core Processor
system: Linux
kernel: 5.16.15-arch1-1-lean
gpu: Vega 20 [Radeon VII]

test: gbsa
constraints: HBonds
hydrogen_mass: 1.5
ensemble: NVT
timestep_in_fs: 4.0
precision: single
platform: OpenCL
platform_properties: {'DeviceIndex': '0', 'DeviceName': 'gfx906:sramecc+:xnack-', 'OpenCLPlatformIndex': '0', 'OpenCLPlatformName': 'AMD Accelerated Parallel Processing', 'Precision': 'single', 'UseCpuPme': 'false', 'DisablePmeStream': 'false'}
steps: 58331
elapsed_time: 30.055014
ns_per_day: 670.74311128253

test: gbsa
constraints: HBonds
hydrogen_mass: 1.5
ensemble: NVT
timestep_in_fs: 4.0
precision: single
platform: HIP
platform_properties: {'DeviceIndex': '0', 'DeviceName': 'AMD Radeon VII', 'UseBlockingSync': 'true', 'Precision': 'single', 'UseCpuPme': 'false', 'HipCompiler': '/opt/rocm/bin/hipcc', 'HipAllowRuntimeCompiler': 'false', 'TempDirectory': '/tmp', 'HipHostCompiler': '', 'DisablePmeStream': 'false', 'DeterministicForces': 'false'}
steps: 107412
elapsed_time: 30.014081
ns_per_day: 1236.8057246197206

test: rf
constraints: HBonds
hydrogen_mass: 1.5
ensemble: NVT
timestep_in_fs: 4.0
precision: single
platform: OpenCL
platform_properties: {'DeviceIndex': '0', 'DeviceName': 'gfx906:sramecc+:xnack-', 'OpenCLPlatformIndex': '0', 'OpenCLPlatformName': 'AMD Accelerated Parallel Processing', 'Precision': 'single', 'UseCpuPme': 'false', 'DisablePmeStream': 'false'}
steps: 26619
elapsed_time: 29.868414
ns_per_day: 308.00183766034576

test: rf
constraints: HBonds
hydrogen_mass: 1.5
ensemble: NVT
timestep_in_fs: 4.0
precision: single
platform: HIP
platform_properties: {'DeviceIndex': '0', 'DeviceName': 'AMD Radeon VII', 'UseBlockingSync': 'true', 'Precision': 'single', 'UseCpuPme': 'false', 'HipCompiler': '/opt/rocm/bin/hipcc', 'HipAllowRuntimeCompiler': 'false', 'TempDirectory': '/tmp', 'HipHostCompiler': '', 'DisablePmeStream': 'false', 'DeterministicForces': 'false'}
steps: 99106
elapsed_time: 29.946284
ns_per_day: 1143.749040782489

test: pme
cutoff: 0.9
constraints: HBonds
hydrogen_mass: 1.5
ensemble: NVT
timestep_in_fs: 4.0
precision: single
platform: OpenCL
platform_properties: {'DeviceIndex': '0', 'DeviceName': 'gfx906:sramecc+:xnack-', 'OpenCLPlatformIndex': '0', 'OpenCLPlatformName': 'AMD Accelerated Parallel Processing', 'Precision': 'single', 'UseCpuPme': 'false', 'DisablePmeStream': 'false'}
steps: 22134
elapsed_time: 29.426031
ns_per_day: 259.95726029106675

test: pme
cutoff: 0.9
constraints: HBonds
hydrogen_mass: 1.5
ensemble: NVT
timestep_in_fs: 4.0
precision: single
platform: HIP
platform_properties: {'DeviceIndex': '0', 'DeviceName': 'AMD Radeon VII', 'UseBlockingSync': 'true', 'Precision': 'single', 'UseCpuPme': 'false', 'HipCompiler': '/opt/rocm/bin/hipcc', 'HipAllowRuntimeCompiler': 'false', 'TempDirectory': '/tmp', 'HipHostCompiler': '', 'DisablePmeStream': 'false', 'DeterministicForces': 'false'}
steps: 66469
elapsed_time: 29.941122
ns_per_day: 767.2286429346234

test: apoa1rf
constraints: HBonds
hydrogen_mass: 1.5
ensemble: NVT
timestep_in_fs: 4.0
precision: single
platform: OpenCL
platform_properties: {'DeviceIndex': '0', 'DeviceName': 'gfx906:sramecc+:xnack-', 'OpenCLPlatformIndex': '0', 'OpenCLPlatformName': 'AMD Accelerated Parallel Processing', 'Precision': 'single', 'UseCpuPme': 'false', 'DisablePmeStream': 'false'}
steps: 8716
elapsed_time: 30.673099
ns_per_day: 98.20493195030602

test: apoa1rf
constraints: HBonds
hydrogen_mass: 1.5
ensemble: NVT
timestep_in_fs: 4.0
precision: single
platform: HIP
platform_properties: {'DeviceIndex': '0', 'DeviceName': 'AMD Radeon VII', 'UseBlockingSync': 'true', 'Precision': 'single', 'UseCpuPme': 'false', 'HipCompiler': '/opt/rocm/bin/hipcc', 'HipAllowRuntimeCompiler': 'false', 'TempDirectory': '/tmp', 'HipHostCompiler': '', 'DisablePmeStream': 'false', 'DeterministicForces': 'false'}
steps: 31438
elapsed_time: 30.389406
ns_per_day: 357.5250138156698

test: apoa1pme
constraints: HBonds
hydrogen_mass: 1.5
ensemble: NVT
timestep_in_fs: 4.0
precision: single
platform: OpenCL
platform_properties: {'DeviceIndex': '0', 'DeviceName': 'gfx906:sramecc+:xnack-', 'OpenCLPlatformIndex': '0', 'OpenCLPlatformName': 'AMD Accelerated Parallel Processing', 'Precision': 'single', 'UseCpuPme': 'false', 'DisablePmeStream': 'false'}
steps: 7624
elapsed_time: 32.082976
ns_per_day: 82.1262466424561

test: apoa1pme
constraints: HBonds
hydrogen_mass: 1.5
ensemble: NVT
timestep_in_fs: 4.0
precision: single
platform: HIP
platform_properties: {'DeviceIndex': '0', 'DeviceName': 'AMD Radeon VII', 'UseBlockingSync': 'true', 'Precision': 'single', 'UseCpuPme': 'false', 'HipCompiler': '/opt/rocm/bin/hipcc', 'HipAllowRuntimeCompiler': 'false', 'TempDirectory': '/tmp', 'HipHostCompiler': '', 'DisablePmeStream': 'false', 'DeterministicForces': 'false'}
steps: 21071
elapsed_time: 30.273105
ns_per_day: 240.54809045851087

test: apoa1ljpme
constraints: HBonds
hydrogen_mass: 1.5
ensemble: NVT
timestep_in_fs: 4.0
precision: single
platform: OpenCL
platform_properties: {'DeviceIndex': '0', 'DeviceName': 'gfx906:sramecc+:xnack-', 'OpenCLPlatformIndex': '0', 'OpenCLPlatformName': 'AMD Accelerated Parallel Processing', 'Precision': 'single', 'UseCpuPme': 'false', 'DisablePmeStream': 'false'}
steps: 7131
elapsed_time: 31.659479
ns_per_day: 77.84315086170558

test: apoa1ljpme
constraints: HBonds
hydrogen_mass: 1.5
ensemble: NVT
timestep_in_fs: 4.0
precision: single
platform: HIP
platform_properties: {'DeviceIndex': '0', 'DeviceName': 'AMD Radeon VII', 'UseBlockingSync': 'true', 'Precision': 'single', 'UseCpuPme': 'false', 'HipCompiler': '/opt/rocm/bin/hipcc', 'HipAllowRuntimeCompiler': 'false', 'TempDirectory': '/tmp', 'HipHostCompiler': '', 'DisablePmeStream': 'false', 'DeterministicForces': 'false'}
steps: 16211
elapsed_time: 30.193642
ns_per_day: 185.55302470632722

test: amoebagk
epsilon: 1e-05
constraints: None
hydrogen_mass: 1
ensemble: NVT
timestep_in_fs: 2.0
precision: single
platform: OpenCL
platform_properties: {'DeviceIndex': '0', 'DeviceName': 'gfx906:sramecc+:xnack-', 'OpenCLPlatformIndex': '0', 'OpenCLPlatformName': 'AMD Accelerated Parallel Processing', 'Precision': 'single', 'UseCpuPme': 'false', 'DisablePmeStream': 'false'}
steps: 191
elapsed_time: 27.422316
ns_per_day: 1.2035744902071728

test: amoebagk
epsilon: 1e-05
constraints: None
hydrogen_mass: 1
ensemble: NVT
timestep_in_fs: 2.0
precision: single
platform: HIP
platform_properties: {'DeviceIndex': '0', 'DeviceName': 'AMD Radeon VII', 'UseBlockingSync': 'true', 'Precision': 'single', 'UseCpuPme': 'false', 'HipCompiler': '/opt/rocm/bin/hipcc', 'HipAllowRuntimeCompiler': 'false', 'TempDirectory': '/tmp', 'HipHostCompiler': '', 'DisablePmeStream': 'false', 'DeterministicForces': 'false'}
steps: 3344
elapsed_time: 28.809248
ns_per_day: 20.05755929484865

test: amoebapme
epsilon: 1e-05
constraints: None
hydrogen_mass: 1
ensemble: NVT
timestep_in_fs: 2.0
precision: single
platform: OpenCL
platform_properties: {'DeviceIndex': '0', 'DeviceName': 'gfx906:sramecc+:xnack-', 'OpenCLPlatformIndex': '0', 'OpenCLPlatformName': 'AMD Accelerated Parallel Processing', 'Precision': 'single', 'UseCpuPme': 'false', 'DisablePmeStream': 'false'}
steps: 1383
elapsed_time: 28.796577
ns_per_day: 8.298986369109077

test: amoebapme
epsilon: 1e-05
constraints: None
hydrogen_mass: 1
ensemble: NVT
timestep_in_fs: 2.0
precision: single
platform: HIP
platform_properties: {'DeviceIndex': '0', 'DeviceName': 'AMD Radeon VII', 'UseBlockingSync': 'true', 'Precision': 'single', 'UseCpuPme': 'false', 'HipCompiler': '/opt/rocm/bin/hipcc', 'HipAllowRuntimeCompiler': 'false', 'TempDirectory': '/tmp', 'HipHostCompiler': '', 'DisablePmeStream': 'false', 'DeterministicForces': 'false'}
steps: 1579
elapsed_time: 28.977509
ns_per_day: 9.41596463657383

test: amber20-dhfr
cutoff: 0.9
constraints: HBonds
hydrogen_mass: 1.5
ensemble: NVT
timestep_in_fs: 4.0
precision: single
platform: OpenCL
platform_properties: {'DeviceIndex': '0', 'DeviceName': 'gfx906:sramecc+:xnack-', 'OpenCLPlatformIndex': '0', 'OpenCLPlatformName': 'AMD Accelerated Parallel Processing', 'Precision': 'single', 'UseCpuPme': 'false', 'DisablePmeStream': 'false'}
steps: 25188
elapsed_time: 30.44635
ns_per_day: 285.91186792505505

test: amber20-dhfr
cutoff: 0.9
constraints: HBonds
hydrogen_mass: 1.5
ensemble: NVT
timestep_in_fs: 4.0
precision: single
platform: HIP
platform_properties: {'DeviceIndex': '0', 'DeviceName': 'AMD Radeon VII', 'UseBlockingSync': 'true', 'Precision': 'single', 'UseCpuPme': 'false', 'HipCompiler': '/opt/rocm/bin/hipcc', 'HipAllowRuntimeCompiler': 'false', 'TempDirectory': '/tmp', 'HipHostCompiler': '', 'DisablePmeStream': 'false', 'DeterministicForces': 'false'}
steps: 67339
elapsed_time: 30.016727
ns_per_day: 775.3129913198063

test: amber20-cellulose
cutoff: 0.9
constraints: HBonds
hydrogen_mass: 1.5
ensemble: NVT
timestep_in_fs: 4.0
precision: single
platform: OpenCL
platform_properties: {'DeviceIndex': '0', 'DeviceName': 'gfx906:sramecc+:xnack-', 'OpenCLPlatformIndex': '0', 'OpenCLPlatformName': 'AMD Accelerated Parallel Processing', 'Precision': 'single', 'UseCpuPme': 'false', 'DisablePmeStream': 'false'}
steps: 1521
elapsed_time: 31.051195
ns_per_day: 16.928739779580134

test: amber20-cellulose
cutoff: 0.9
constraints: HBonds
hydrogen_mass: 1.5
ensemble: NVT
timestep_in_fs: 4.0
precision: single
platform: HIP
platform_properties: {'DeviceIndex': '0', 'DeviceName': 'AMD Radeon VII', 'UseBlockingSync': 'true', 'Precision': 'single', 'UseCpuPme': 'false', 'HipCompiler': '/opt/rocm/bin/hipcc', 'HipAllowRuntimeCompiler': 'false', 'TempDirectory': '/tmp', 'HipHostCompiler': '', 'DisablePmeStream': 'false', 'DeterministicForces': 'false'}
steps: 5463
elapsed_time: 29.788209
ns_per_day: 63.381212344790505

test: amber20-stmv
cutoff: 0.9
constraints: HBonds
hydrogen_mass: 1.5
ensemble: NVT
timestep_in_fs: 4.0
precision: single
platform: OpenCL
platform_properties: {'DeviceIndex': '0', 'DeviceName': 'gfx906:sramecc+:xnack-', 'OpenCLPlatformIndex': '0', 'OpenCLPlatformName': 'AMD Accelerated Parallel Processing', 'Precision': 'single', 'UseCpuPme': 'false', 'DisablePmeStream': 'false'}
steps: 480
elapsed_time: 29.724693
ns_per_day: 5.580814577294373

test: amber20-stmv
cutoff: 0.9
constraints: HBonds
hydrogen_mass: 1.5
ensemble: NVT
timestep_in_fs: 4.0
precision: single
platform: HIP
platform_properties: {'DeviceIndex': '0', 'DeviceName': 'AMD Radeon VII', 'UseBlockingSync': 'true', 'Precision': 'single', 'UseCpuPme': 'false', 'HipCompiler': '/opt/rocm/bin/hipcc', 'HipAllowRuntimeCompiler': 'false', 'TempDirectory': '/tmp', 'HipHostCompiler': '', 'DisablePmeStream': 'false', 'DeterministicForces': 'false'}
steps: 2196
elapsed_time: 30.422483
ns_per_day: 24.946602813452134
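
The ns_per_day field can be recomputed from the steps, timestep_in_fs and elapsed_time fields printed above. This is a minimal sketch, assuming benchmark.py converts the simulated time to nanoseconds and scales it to 86,400 s of wall-clock time; the formula reproduces the logged values, but it is not copied from benchmark.py, and the function name is hypothetical.

```python
# Hypothetical helper: recompute ns/day from the fields benchmark.py prints.
# Assumption: ns/day = simulated ns, scaled from elapsed seconds to one day.

FS_PER_NS = 1_000_000      # 1 ns = 1e6 fs
SECONDS_PER_DAY = 86_400


def ns_per_day(steps: int, timestep_in_fs: float, elapsed_time: float) -> float:
    """Simulated nanoseconds per day of wall-clock time."""
    simulated_ns = steps * timestep_in_fs / FS_PER_NS
    return simulated_ns * SECONDS_PER_DAY / elapsed_time


if __name__ == "__main__":
    # gbsa on OpenCL (Radeon VII, overclocked), values from the output above
    print(round(ns_per_day(58331, 4.0, 30.055014), 1))
```

Because elapsed_time is printed rounded to microseconds, recomputed values agree with the logged ns_per_day only to a few decimal places.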

OpenCL vs HIP Performance Summary

System

OS: Arch Linux
Kernel: 5.16.15
ROCm Version: 5.0.2
OpenMM Version: OpenMM 7.7 | Git Revision: ce22dbef84ec68aa910bbffed0f5e801e76ed9be
CPU: AMD Ryzen Threadripper 3960X @ 4.2GHz
GPU: AMD Radeon VII @ 2120 MHz core / 1200 MHz memory

| Test | OpenCL (ns/day) | HIP (ns/day) | Performance Improvement |
|---|---|---|---|
| gbsa | 670.7 | 1236.8 | 84% |
| rf | 308.0 | 1143.7 | 271% |
| pme | 260.0 | 767.2 | 195% |
| apoa1rf | 98.2 | 357.5 | 264% |
| apoa1pme | 82.1 | 240.5 | 193% |
| apoa1ljpme | 77.8 | 185.6 | 138% |
| amoebagk | 1.2 | 20.1 | 1567% |
| amoebapme | 8.3 | 9.4 | 13% |
| amber20-dhfr | 285.9 | 775.3 | 171% |
| amber20-cellulose | 16.9 | 63.4 | 274% |
| amber20-stmv | 5.6 | 24.9 | 347% |

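The "Performance Improvement" column is the HIP throughput relative to OpenCL. A short sketch, assuming improvement = (HIP / OpenCL - 1) × 100 computed from the unrounded ns/day values in the full output; this matches the column to within rounding, but the function name is hypothetical and not part of benchmark.py.

```python
# Hypothetical helper: percentage speedup of HIP over OpenCL.
# Assumption: improvement = (HIP / OpenCL - 1) * 100, using unrounded ns/day.

def improvement_pct(opencl_ns_per_day: float, hip_ns_per_day: float) -> float:
    return (hip_ns_per_day / opencl_ns_per_day - 1.0) * 100.0


results = {
    # test: (OpenCL ns/day, HIP ns/day), copied from the full output above
    "gbsa": (670.74311128253, 1236.8057246197206),
    "rf": (308.00183766034576, 1143.749040782489),
    "amoebapme": (8.298986369109077, 9.41596463657383),
}

for test, (ocl, hip) in results.items():
    print(f"{test:10s} {improvement_pct(ocl, hip):6.0f}%")
```

Computing from the unrounded values avoids the drift that appears when the percentage is derived from the one-decimal ns/day figures, which matters most for the small amoebagk and amber20-stmv numbers.
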
@ex-rzr
Collaborator

ex-rzr commented Mar 21, 2022

@tictooc

Thank you very much for testing!

It's overclocked, right? What are its default frequencies? The specs say the GPU core runs at 1400 MHz (1750 MHz boost) and the memory at 1000 MHz.

@tictooc
Author

tictooc commented Mar 21, 2022

Yes, those results were with the GPU heavily overclocked. At stock, the core clock boosts to 1775-1800 MHz and the memory runs at 1000 MHz.

Here is a run at stock clocks for comparison. That should fall somewhere right around the expected results on an MI50, since these are all single precision benchmarks. The only change from stock is to set the perf level to high to minimize the noise from the somewhat inconsistent boost algorithm on Vega 20. Average clocks during the below benchmark run were 1770-1790MHz.

Full benchmark.py output
$ python benchmark_new_hip.py --outfile bench_opencl-HIP_stock.json --seconds=30 --ensemble=NVT --precision=single --bond-constraints=hbonds --platform=OpenCL,HIP --device=2
timestamp: 2022-03-21T11:18:49.514128
openmm_version: 7.7.0.dev-ce22dbe
cpuinfo: AMD Ryzen Threadripper 3960X 24-Core Processor
system: Linux
kernel: 5.16.15-arch1-1-lean
gpu: Vega 20 [Radeon VII]

test: gbsa
constraints: HBonds
hydrogen_mass: 1.5
ensemble: NVT
timestep_in_fs: 4.0
precision: single
platform: OpenCL
platform_properties: {'DeviceIndex': '2', 'DeviceName': 'gfx906:sramecc+:xnack-', 'OpenCLPlatformIndex': '0', 'OpenCLPlatformName': 'AMD Accelerated Parallel Processing', 'Precision': 'single', 'UseCpuPme': 'false', 'DisablePmeStream': 'false'}
steps: 50746
elapsed_time: 28.94034
ns_per_day: 605.9990172886703

test: gbsa
constraints: HBonds
hydrogen_mass: 1.5
ensemble: NVT
timestep_in_fs: 4.0
precision: single
platform: HIP
platform_properties: {'DeviceIndex': '2', 'DeviceName': 'AMD Radeon VII', 'UseBlockingSync': 'true', 'Precision': 'single', 'UseCpuPme': 'false', 'HipCompiler': '/opt/rocm/bin/hipcc', 'HipAllowRuntimeCompiler': 'false', 'TempDirectory': '/tmp', 'HipHostCompiler': '', 'DisablePmeStream': 'false', 'DeterministicForces': 'false'}
steps: 95612
elapsed_time: 29.996725
ns_per_day: 1101.570494779013

test: rf
constraints: HBonds
hydrogen_mass: 1.5
ensemble: NVT
timestep_in_fs: 4.0
precision: single
platform: OpenCL
platform_properties: {'DeviceIndex': '2', 'DeviceName': 'gfx906:sramecc+:xnack-', 'OpenCLPlatformIndex': '0', 'OpenCLPlatformName': 'AMD Accelerated Parallel Processing', 'Precision': 'single', 'UseCpuPme': 'false', 'DisablePmeStream': 'false'}
steps: 23620
elapsed_time: 29.824848
ns_per_day: 273.7003722533639

test: rf
constraints: HBonds
hydrogen_mass: 1.5
ensemble: NVT
timestep_in_fs: 4.0
precision: single
platform: HIP
platform_properties: {'DeviceIndex': '2', 'DeviceName': 'AMD Radeon VII', 'UseBlockingSync': 'true', 'Precision': 'single', 'UseCpuPme': 'false', 'HipCompiler': '/opt/rocm/bin/hipcc', 'HipAllowRuntimeCompiler': 'false', 'TempDirectory': '/tmp', 'HipHostCompiler': '', 'DisablePmeStream': 'false', 'DeterministicForces': 'false'}
steps: 88283
elapsed_time: 30.015825
ns_per_day: 1016.4839647086161

test: pme
cutoff: 0.9
constraints: HBonds
hydrogen_mass: 1.5
ensemble: NVT
timestep_in_fs: 4.0
precision: single
platform: OpenCL
platform_properties: {'DeviceIndex': '2', 'DeviceName': 'gfx906:sramecc+:xnack-', 'OpenCLPlatformIndex': '0', 'OpenCLPlatformName': 'AMD Accelerated Parallel Processing', 'Precision': 'single', 'UseCpuPme': 'false', 'DisablePmeStream': 'false'}
steps: 19720
elapsed_time: 29.78403
ns_per_day: 228.8216873270675

test: pme
cutoff: 0.9
constraints: HBonds
hydrogen_mass: 1.5
ensemble: NVT
timestep_in_fs: 4.0
precision: single
platform: HIP
platform_properties: {'DeviceIndex': '2', 'DeviceName': 'AMD Radeon VII', 'UseBlockingSync': 'true', 'Precision': 'single', 'UseCpuPme': 'false', 'HipCompiler': '/opt/rocm/bin/hipcc', 'HipAllowRuntimeCompiler': 'false', 'TempDirectory': '/tmp', 'HipHostCompiler': '', 'DisablePmeStream': 'false', 'DeterministicForces': 'false'}
steps: 59097
elapsed_time: 30.076975
ns_per_day: 679.0550977949079

test: apoa1rf
constraints: HBonds
hydrogen_mass: 1.5
ensemble: NVT
timestep_in_fs: 4.0
precision: single
platform: OpenCL
platform_properties: {'DeviceIndex': '2', 'DeviceName': 'gfx906:sramecc+:xnack-', 'OpenCLPlatformIndex': '0', 'OpenCLPlatformName': 'AMD Accelerated Parallel Processing', 'Precision': 'single', 'UseCpuPme': 'false', 'DisablePmeStream': 'false'}
steps: 7673
elapsed_time: 30.797974
ns_per_day: 86.1027027297315

test: apoa1rf
constraints: HBonds
hydrogen_mass: 1.5
ensemble: NVT
timestep_in_fs: 4.0
precision: single
platform: HIP
platform_properties: {'DeviceIndex': '2', 'DeviceName': 'AMD Radeon VII', 'UseBlockingSync': 'true', 'Precision': 'single', 'UseCpuPme': 'false', 'HipCompiler': '/opt/rocm/bin/hipcc', 'HipAllowRuntimeCompiler': 'false', 'TempDirectory': '/tmp', 'HipHostCompiler': '', 'DisablePmeStream': 'false', 'DeterministicForces': 'false'}
steps: 27781
elapsed_time: 30.581991
ns_per_day: 313.94664918971426

test: apoa1pme
constraints: HBonds
hydrogen_mass: 1.5
ensemble: NVT
timestep_in_fs: 4.0
precision: single
platform: OpenCL
platform_properties: {'DeviceIndex': '2', 'DeviceName': 'gfx906:sramecc+:xnack-', 'OpenCLPlatformIndex': '0', 'OpenCLPlatformName': 'AMD Accelerated Parallel Processing', 'Precision': 'single', 'UseCpuPme': 'false', 'DisablePmeStream': 'false'}
steps: 6700
elapsed_time: 32.295395
ns_per_day: 71.69814767709141

test: apoa1pme
constraints: HBonds
hydrogen_mass: 1.5
ensemble: NVT
timestep_in_fs: 4.0
precision: single
platform: HIP
platform_properties: {'DeviceIndex': '2', 'DeviceName': 'AMD Radeon VII', 'UseBlockingSync': 'true', 'Precision': 'single', 'UseCpuPme': 'false', 'HipCompiler': '/opt/rocm/bin/hipcc', 'HipAllowRuntimeCompiler': 'false', 'TempDirectory': '/tmp', 'HipHostCompiler': '', 'DisablePmeStream': 'false', 'DeterministicForces': 'false'}
steps: 18394
elapsed_time: 30.219251
ns_per_day: 210.3614811631168

test: apoa1ljpme
constraints: HBonds
hydrogen_mass: 1.5
ensemble: NVT
timestep_in_fs: 4.0
precision: single
platform: OpenCL
platform_properties: {'DeviceIndex': '2', 'DeviceName': 'gfx906:sramecc+:xnack-', 'OpenCLPlatformIndex': '0', 'OpenCLPlatformName': 'AMD Accelerated Parallel Processing', 'Precision': 'single', 'UseCpuPme': 'false', 'DisablePmeStream': 'false'}
steps: 6333
elapsed_time: 32.101047
ns_per_day: 68.18110325186588

test: apoa1ljpme
constraints: HBonds
hydrogen_mass: 1.5
ensemble: NVT
timestep_in_fs: 4.0
precision: single
platform: HIP
platform_properties: {'DeviceIndex': '2', 'DeviceName': 'AMD Radeon VII', 'UseBlockingSync': 'true', 'Precision': 'single', 'UseCpuPme': 'false', 'HipCompiler': '/opt/rocm/bin/hipcc', 'HipAllowRuntimeCompiler': 'false', 'TempDirectory': '/tmp', 'HipHostCompiler': '', 'DisablePmeStream': 'false', 'DeterministicForces': 'false'}
steps: 14249
elapsed_time: 30.316464
ns_per_day: 162.43498582156548

test: amoebagk
epsilon: 1e-05
constraints: None
hydrogen_mass: 1
ensemble: NVT
timestep_in_fs: 2.0
precision: single
platform: OpenCL
platform_properties: {'DeviceIndex': '2', 'DeviceName': 'gfx906:sramecc+:xnack-', 'OpenCLPlatformIndex': '0', 'OpenCLPlatformName': 'AMD Accelerated Parallel Processing', 'Precision': 'single', 'UseCpuPme': 'false', 'DisablePmeStream': 'false'}
steps: 166
elapsed_time: 27.44593
ns_per_day: 1.0451385688151211

test: amoebagk
epsilon: 1e-05
constraints: None
hydrogen_mass: 1
ensemble: NVT
timestep_in_fs: 2.0
precision: single
platform: HIP
platform_properties: {'DeviceIndex': '2', 'DeviceName': 'AMD Radeon VII', 'UseBlockingSync': 'true', 'Precision': 'single', 'UseCpuPme': 'false', 'HipCompiler': '/opt/rocm/bin/hipcc', 'HipAllowRuntimeCompiler': 'false', 'TempDirectory': '/tmp', 'HipHostCompiler': '', 'DisablePmeStream': 'false', 'DeterministicForces': 'false'}
steps: 2955
elapsed_time: 29.091936
ns_per_day: 17.552080411561466

test: amoebapme
epsilon: 1e-05
constraints: None
hydrogen_mass: 1
ensemble: NVT
timestep_in_fs: 2.0
precision: single
platform: OpenCL
platform_properties: {'DeviceIndex': '2', 'DeviceName': 'gfx906:sramecc+:xnack-', 'OpenCLPlatformIndex': '0', 'OpenCLPlatformName': 'AMD Accelerated Parallel Processing', 'Precision': 'single', 'UseCpuPme': 'false', 'DisablePmeStream': 'false'}
steps: 1161
elapsed_time: 27.578498
ns_per_day: 7.2745368511367055

test: amoebapme
epsilon: 1e-05
constraints: None
hydrogen_mass: 1
ensemble: NVT
timestep_in_fs: 2.0
precision: single
platform: HIP
platform_properties: {'DeviceIndex': '2', 'DeviceName': 'AMD Radeon VII', 'UseBlockingSync': 'true', 'Precision': 'single', 'UseCpuPme': 'false', 'HipCompiler': '/opt/rocm/bin/hipcc', 'HipAllowRuntimeCompiler': 'false', 'TempDirectory': '/tmp', 'HipHostCompiler': '', 'DisablePmeStream': 'false', 'DeterministicForces': 'false'}
steps: 1377
elapsed_time: 28.98677
ns_per_day: 8.208765585127281

test: amber20-dhfr
cutoff: 0.9
constraints: HBonds
hydrogen_mass: 1.5
ensemble: NVT
timestep_in_fs: 4.0
precision: single
platform: OpenCL
platform_properties: {'DeviceIndex': '2', 'DeviceName': 'gfx906:sramecc+:xnack-', 'OpenCLPlatformIndex': '0', 'OpenCLPlatformName': 'AMD Accelerated Parallel Processing', 'Precision': 'single', 'UseCpuPme': 'false', 'DisablePmeStream': 'false'}
steps: 21875
elapsed_time: 29.927539
ns_per_day: 252.61014612661594

test: amber20-dhfr
cutoff: 0.9
constraints: HBonds
hydrogen_mass: 1.5
ensemble: NVT
timestep_in_fs: 4.0
precision: single
platform: HIP
platform_properties: {'DeviceIndex': '2', 'DeviceName': 'AMD Radeon VII', 'UseBlockingSync': 'true', 'Precision': 'single', 'UseCpuPme': 'false', 'HipCompiler': '/opt/rocm/bin/hipcc', 'HipAllowRuntimeCompiler': 'false', 'TempDirectory': '/tmp', 'HipHostCompiler': '', 'DisablePmeStream': 'false', 'DeterministicForces': 'false'}
steps: 59507
elapsed_time: 30.070277
ns_per_day: 683.9185152833809

test: amber20-cellulose
cutoff: 0.9
constraints: HBonds
hydrogen_mass: 1.5
ensemble: NVT
timestep_in_fs: 4.0
precision: single
platform: OpenCL
platform_properties: {'DeviceIndex': '2', 'DeviceName': 'gfx906:sramecc+:xnack-', 'OpenCLPlatformIndex': '0', 'OpenCLPlatformName': 'AMD Accelerated Parallel Processing', 'Precision': 'single', 'UseCpuPme': 'false', 'DisablePmeStream': 'false'}
steps: 1288
elapsed_time: 30.104517
ns_per_day: 14.786246196874705

test: amber20-cellulose
cutoff: 0.9
constraints: HBonds
hydrogen_mass: 1.5
ensemble: NVT
timestep_in_fs: 4.0
precision: single
platform: HIP
platform_properties: {'DeviceIndex': '2', 'DeviceName': 'AMD Radeon VII', 'UseBlockingSync': 'true', 'Precision': 'single', 'UseCpuPme': 'false', 'HipCompiler': '/opt/rocm/bin/hipcc', 'HipAllowRuntimeCompiler': 'false', 'TempDirectory': '/tmp', 'HipHostCompiler': '', 'DisablePmeStream': 'false', 'DeterministicForces': 'false'}
steps: 4757
elapsed_time: 29.699142
ns_per_day: 55.35578098518804

test: amber20-stmv
cutoff: 0.9
constraints: HBonds
hydrogen_mass: 1.5
ensemble: NVT
timestep_in_fs: 4.0
precision: single
platform: OpenCL
platform_properties: {'DeviceIndex': '2', 'DeviceName': 'gfx906:sramecc+:xnack-', 'OpenCLPlatformIndex': '0', 'OpenCLPlatformName': 'AMD Accelerated Parallel Processing', 'Precision': 'single', 'UseCpuPme': 'false', 'DisablePmeStream': 'false'}
steps: 418
elapsed_time: 29.779175
ns_per_day: 4.851067902317642

test: amber20-stmv
cutoff: 0.9
constraints: HBonds
hydrogen_mass: 1.5
ensemble: NVT
timestep_in_fs: 4.0
precision: single
platform: HIP
platform_properties: {'DeviceIndex': '2', 'DeviceName': 'AMD Radeon VII', 'UseBlockingSync': 'true', 'Precision': 'single', 'UseCpuPme': 'false', 'HipCompiler': '/opt/rocm/bin/hipcc', 'HipAllowRuntimeCompiler': 'false', 'TempDirectory': '/tmp', 'HipHostCompiler': '', 'DisablePmeStream': 'false', 'DeterministicForces': 'false'}
steps: 1902
elapsed_time: 30.237341
ns_per_day: 21.739054369893168

OpenCL vs HIP Performance Summary

System

OS: Arch Linux
Kernel: 5.16.15
ROCm Version: 5.0.2
OpenMM Version: OpenMM 7.7 | Git Revision: ce22dbef84ec68aa910bbffed0f5e801e76ed9be
CPU: AMD Ryzen Threadripper 3960X @ 4.2GHz
GPU: AMD Radeon VII @ 1750 MHz core / 1000 MHz memory

| Test | OpenCL (ns/day) | HIP (ns/day) | Performance Improvement |
|---|---|---|---|
| gbsa | 606.0 | 1101.6 | 82% |
| rf | 273.7 | 1016.5 | 271% |
| pme | 228.8 | 679.1 | 197% |
| apoa1rf | 86.1 | 313.9 | 265% |
| apoa1pme | 71.7 | 210.4 | 193% |
| apoa1ljpme | 68.2 | 162.4 | 138% |
| amoebagk | 1.0 | 17.6 | 1579% |
| amoebapme | 7.3 | 8.2 | 13% |
| amber20-dhfr | 252.6 | 683.9 | 171% |
| amber20-cellulose | 14.8 | 55.4 | 274% |
| amber20-stmv | 4.9 | 21.7 | 348% |

@tictooc
Author

tictooc commented May 24, 2022

Additional test results with a 6900XT. The improvements are even greater than on the Radeon VII. The 6900XT was tested at default clocks; the only changes were a custom fan speed and a power limit of 293 W. This let the GPU sustain a higher boost clock in the HIP tests, which run right up against that 293 W limit.

One test failed in test_openmm_hip.sh: #32 (TestHipFFTImplHipFFT).

Test Results
tictoc@TickTockMedia $ ./test_openmm_hip.sh

#1: TestHipAmoebaExtrapolatedPolarization
Done

#2: TestHipAmoebaGeneralizedKirkwoodForce
Done

#3: TestHipAmoebaMultipoleForce
Done

#4: TestHipAmoebaTorsionTorsionForce
Done

#5: TestHipAmoebaVdwForce
Done

#6: TestHipAndersenThermostat
Done

#7: TestHipBrownianIntegrator
Done

#8: TestHipCheckpoints
Done

#9: TestHipCMAPTorsionForce
Done

#10: TestHipCMMotionRemover
Done

#11: TestHipCompiler
Done

#12: TestHipCompoundIntegrator
Done

#13: TestHipCustomAngleForce
Done

#14: TestHipCustomBondForce
Done

#15: TestHipCustomCentroidBondForce
Done

#16: TestHipCustomCompoundBondForce
Done

#17: TestHipCustomCVForce
Done

#18: TestHipCustomExternalForce
Done

#19: TestHipCustomGBForce
Done

#20: TestHipCustomHbondForce
Done

#21: TestHipCustomIntegrator
Done

#22: TestHipCustomManyParticleForce
Done

#23: TestHipCustomNonbondedForce
Done

#24: TestHipCustomTorsionForce
Done

#25: TestHipDispersionPME
Done

#26: TestHipDrudeForce
Done

#27: TestHipDrudeLangevinIntegrator
Done

#28: TestHipDrudeNoseHoover
Done

#29: TestHipDrudeSCFIntegrator
Done

#30: TestHipEwald
Done

#31: TestHipFFTImplFFT3D
Done

#32: TestHipFFTImplHipFFT
realToComplex: 0 xsize: 28 ysize: 25 zsize: 30
realToComplex: 1 xsize: 28 ysize: 25 zsize: 25
realToComplex: 1 xsize: 25 ysize: 28 zsize: 25
realToComplex: 1 xsize: 25 ysize: 25 zsize: 28
realToComplex: 1 xsize: 21 ysize: 25 zsize: 27
realToComplex: 1 xsize: 49 ysize: 98 zsize: 14
realToComplex: 1 xsize: 7 ysize: 21 zsize: 98
realToComplex: 1 xsize: 98 ysize: 21 zsize: 21
realToComplex: 1 xsize: 18 ysize: 98 zsize: 6
realToComplex: 1 xsize: 98 ysize: 98 zsize: 98
exception: Error executing hipFFT: 6
realToComplex: 0 xsize: 28 ysize: 25 zsize: 30
realToComplex: 1 xsize: 28 ysize: 25 zsize: 25
realToComplex: 1 xsize: 25 ysize: 28 zsize: 25
realToComplex: 1 xsize: 25 ysize: 25 zsize: 28
realToComplex: 1 xsize: 21 ysize: 25 zsize: 27
realToComplex: 1 xsize: 49 ysize: 98 zsize: 14
realToComplex: 1 xsize: 7 ysize: 21 zsize: 98
realToComplex: 1 xsize: 98 ysize: 21 zsize: 21
realToComplex: 1 xsize: 18 ysize: 98 zsize: 6
realToComplex: 1 xsize: 98 ysize: 98 zsize: 98
exception: Error executing hipFFT: 6
realToComplex: 0 xsize: 28 ysize: 25 zsize: 30
realToComplex: 1 xsize: 28 ysize: 25 zsize: 25
realToComplex: 1 xsize: 25 ysize: 28 zsize: 25
realToComplex: 1 xsize: 25 ysize: 25 zsize: 28
realToComplex: 1 xsize: 21 ysize: 25 zsize: 27
realToComplex: 1 xsize: 49 ysize: 98 zsize: 14
realToComplex: 1 xsize: 7 ysize: 21 zsize: 98
realToComplex: 1 xsize: 98 ysize: 21 zsize: 21
realToComplex: 1 xsize: 18 ysize: 98 zsize: 6
realToComplex: 1 xsize: 98 ysize: 98 zsize: 98
exception: Error executing hipFFT: 6

#33: TestHipFFTImplVkFFT
Done

#34: TestHipGayBerneForce
Done

#35: TestHipGBSAOBCForce
Done

#36: TestHipHarmonicAngleForce
Done

#37: TestHipHarmonicBondForce
Done

#38: TestHipHippoNonbondedForce
Done

#39: TestHipLangevinIntegrator
Done

#40: TestHipLangevinMiddleIntegrator
Done

#41: TestHipLocalEnergyMinimizer
Done

#42: TestHipMonteCarloAnisotropicBarostat
Done

#43: TestHipMonteCarloBarostat
Done

#44: TestHipMonteCarloFlexibleBarostat
Done

#45: TestHipMultipleForces
Done

#46: TestHipNonbondedForce
Done

#47: TestHipNoseHooverIntegrator
Done

#48: TestHipPeriodicTorsionForce
Done

#49: TestHipRandom
Done

#50: TestHipRBTorsionForce
Done

#51: TestHipRMSDForce
Done

#52: TestHipRpmd
Done

#53: TestHipSettle
Done

#54: TestHipSort
Done

#55: TestHipVariableLangevinIntegrator
Done

#56: TestHipVariableVerletIntegrator
Done

#57: TestHipVerletIntegrator
Done

#58: TestHipVirtualSites
Done

#59: TestHipWcaDispersionForce
Done
------------
Failed tests
------------

#32 TestHipFFTImplHipFFT
- Test #32 also fails on my Radeon VII with ROCm 5.1.3, so it does not appear to be specific to the 6900XT, which is not officially supported by the ROCm stack.
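
The failing test reports only a numeric status ("Error executing hipFFT: 6"). A small sketch for turning that number into a name, assuming hipFFT's `hipfftResult` enum uses the same numbering as cuFFT's `cufftResult` (check `hipfft.h` in your ROCm install before relying on it); the table and function are hypothetical helpers, and naming the status does not explain why execution failed on these GPUs.

```python
# Hypothetical helper: decode the status in "Error executing hipFFT: N".
# Assumption: numbering mirrors cuFFT's cufftResult; verify against hipfft.h.

HIPFFT_RESULT_NAMES = {
    0: "HIPFFT_SUCCESS",
    1: "HIPFFT_INVALID_PLAN",
    2: "HIPFFT_ALLOC_FAILED",
    3: "HIPFFT_INVALID_TYPE",
    4: "HIPFFT_INVALID_VALUE",
    5: "HIPFFT_INTERNAL_ERROR",
    6: "HIPFFT_EXEC_FAILED",
    7: "HIPFFT_SETUP_FAILED",
    8: "HIPFFT_INVALID_SIZE",
}


def decode_hipfft_status(message: str) -> str:
    # The status code is the text after the last colon in the log line.
    code = int(message.rsplit(":", 1)[1])
    return HIPFFT_RESULT_NAMES.get(code, f"unknown status {code}")


print(decode_hipfft_status("exception: Error executing hipFFT: 6"))
```
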
Full benchmark.py output
$ python benchmark_new_hip.py --outfile bench_opencl-hip_6900XT_amdStagingKernel.json --seconds=30 --ensemble=NVT --precision=single --bond-constraints=hbonds --platform=HIP,OpenCL
timestamp: 2022-05-24T00:02:23.087467
openmm_version: 7.7.0.dev-ce22dbe
cpuinfo: AMD Ryzen 3 3200G with Radeon Vega Graphics
system: Linux
kernel: 5.16.0-1-amd-staging-drm-next-git-02007-g8bb14fbec5ae
gpu: Navi 21 [Radeon RX 6800/6800 XT / 6900 XT]

test: gbsa
constraints: HBonds
hydrogen_mass: 1.5
ensemble: NVT
timestep_in_fs: 4.0
precision: single
platform: HIP
platform_properties: {'DeviceIndex': '0', 'DeviceName': 'AMD Radeon RX 6900 XT', 'UseBlockingSync': 'true', 'Precision': 'single', 'UseCpuPme': 'false', 'HipCompiler': '/opt/rocm/bin/hipcc', 'HipAllowRuntimeCompiler': 'false', 'TempDirectory': '/tmp', 'HipHostCompiler': '', 'DisablePmeStream': 'false', 'DeterministicForces': 'false'}
steps: 161892
elapsed_time: 29.922807
ns_per_day: 1869.8070404958999

test: gbsa
constraints: HBonds
hydrogen_mass: 1.5
ensemble: NVT
timestep_in_fs: 4.0
precision: single
platform: OpenCL
platform_properties: {'DeviceIndex': '0', 'DeviceName': 'gfx1030', 'OpenCLPlatformIndex': '0', 'OpenCLPlatformName': 'AMD Accelerated Parallel Processing', 'Precision': 'single', 'UseCpuPme': 'false', 'DisablePmeStream': 'false'}
steps: 75024
elapsed_time: 30.117439
ns_per_day: 860.9063473159187

test: rf
constraints: HBonds
hydrogen_mass: 1.5
ensemble: NVT
timestep_in_fs: 4.0
precision: single
platform: HIP
platform_properties: {'DeviceIndex': '0', 'DeviceName': 'AMD Radeon RX 6900 XT', 'UseBlockingSync': 'true', 'Precision': 'single', 'UseCpuPme': 'false', 'HipCompiler': '/opt/rocm/bin/hipcc', 'HipAllowRuntimeCompiler': 'false', 'TempDirectory': '/tmp', 'HipHostCompiler': '', 'DisablePmeStream': 'false', 'DeterministicForces': 'false'}
steps: 131862
elapsed_time: 30.028146
ns_per_day: 1517.626402908791

test: rf
constraints: HBonds
hydrogen_mass: 1.5
ensemble: NVT
timestep_in_fs: 4.0
precision: single
platform: OpenCL
platform_properties: {'DeviceIndex': '0', 'DeviceName': 'gfx1030', 'OpenCLPlatformIndex': '0', 'OpenCLPlatformName': 'AMD Accelerated Parallel Processing', 'Precision': 'single', 'UseCpuPme': 'false', 'DisablePmeStream': 'false'}
steps: 31242
elapsed_time: 30.19554
ns_per_day: 357.5771521224657

test: pme
cutoff: 0.9
constraints: HBonds
hydrogen_mass: 1.5
ensemble: NVT
timestep_in_fs: 4.0
precision: single
platform: HIP
platform_properties: {'DeviceIndex': '0', 'DeviceName': 'AMD Radeon RX 6900 XT', 'UseBlockingSync': 'true', 'Precision': 'single', 'UseCpuPme': 'false', 'HipCompiler': '/opt/rocm/bin/hipcc', 'HipAllowRuntimeCompiler': 'false', 'TempDirectory': '/tmp', 'HipHostCompiler': '', 'DisablePmeStream': 'false', 'DeterministicForces': 'false'}
steps: 92186
elapsed_time: 29.964265
ns_per_day: 1063.249227037606

test: pme
cutoff: 0.9
constraints: HBonds
hydrogen_mass: 1.5
ensemble: NVT
timestep_in_fs: 4.0
precision: single
platform: OpenCL
platform_properties: {'DeviceIndex': '0', 'DeviceName': 'gfx1030', 'OpenCLPlatformIndex': '0', 'OpenCLPlatformName': 'AMD Accelerated Parallel Processing', 'Precision': 'single', 'UseCpuPme': 'false', 'DisablePmeStream': 'false'}
steps: 22985
elapsed_time: 29.415364
ns_per_day: 270.04989637388127

test: apoa1rf
constraints: HBonds
hydrogen_mass: 1.5
ensemble: NVT
timestep_in_fs: 4.0
precision: single
platform: HIP
platform_properties: {'DeviceIndex': '0', 'DeviceName': 'AMD Radeon RX 6900 XT', 'UseBlockingSync': 'true', 'Precision': 'single', 'UseCpuPme': 'false', 'HipCompiler': '/opt/rocm/bin/hipcc', 'HipAllowRuntimeCompiler': 'false', 'TempDirectory': '/tmp', 'HipHostCompiler': '', 'DisablePmeStream': 'false', 'DeterministicForces': 'false'}
steps: 49923
elapsed_time: 30.482591
ns_per_day: 566.0079486025317

test: apoa1rf
constraints: HBonds
hydrogen_mass: 1.5
ensemble: NVT
timestep_in_fs: 4.0
precision: single
platform: OpenCL
platform_properties: {'DeviceIndex': '0', 'DeviceName': 'gfx1030', 'OpenCLPlatformIndex': '0', 'OpenCLPlatformName': 'AMD Accelerated Parallel Processing', 'Precision': 'single', 'UseCpuPme': 'false', 'DisablePmeStream': 'false'}
steps: 11061
elapsed_time: 30.899304
ns_per_day: 123.71416521226494

test: apoa1pme
constraints: HBonds
hydrogen_mass: 1.5
ensemble: NVT
timestep_in_fs: 4.0
precision: single
platform: HIP
platform_properties: {'DeviceIndex': '0', 'DeviceName': 'AMD Radeon RX 6900 XT', 'UseBlockingSync': 'true', 'Precision': 'single', 'UseCpuPme': 'false', 'HipCompiler': '/opt/rocm/bin/hipcc', 'HipAllowRuntimeCompiler': 'false', 'TempDirectory': '/tmp', 'HipHostCompiler': '', 'DisablePmeStream': 'false', 'DeterministicForces': 'false'}
steps: 35015
elapsed_time: 30.424812
ns_per_day: 397.74063353292036

test: apoa1pme
constraints: HBonds
hydrogen_mass: 1.5
ensemble: NVT
timestep_in_fs: 4.0
precision: single
platform: OpenCL
platform_properties: {'DeviceIndex': '0', 'DeviceName': 'gfx1030', 'OpenCLPlatformIndex': '0', 'OpenCLPlatformName': 'AMD Accelerated Parallel Processing', 'Precision': 'single', 'UseCpuPme': 'false', 'DisablePmeStream': 'false'}
steps: 9571
elapsed_time: 30.878581
ns_per_day: 107.1207773440107

test: apoa1ljpme
constraints: HBonds
hydrogen_mass: 1.5
ensemble: NVT
timestep_in_fs: 4.0
precision: single
platform: HIP
platform_properties: {'DeviceIndex': '0', 'DeviceName': 'AMD Radeon RX 6900 XT', 'UseBlockingSync': 'true', 'Precision': 'single', 'UseCpuPme': 'false', 'HipCompiler': '/opt/rocm/bin/hipcc', 'HipAllowRuntimeCompiler': 'false', 'TempDirectory': '/tmp', 'HipHostCompiler': '', 'DisablePmeStream': 'false', 'DeterministicForces': 'false'}
steps: 28515
elapsed_time: 30.431483
ns_per_day: 323.83515453387525

test: apoa1ljpme
constraints: HBonds
hydrogen_mass: 1.5
ensemble: NVT
timestep_in_fs: 4.0
precision: single
platform: OpenCL
platform_properties: {'DeviceIndex': '0', 'DeviceName': 'gfx1030', 'OpenCLPlatformIndex': '0', 'OpenCLPlatformName': 'AMD Accelerated Parallel Processing', 'Precision': 'single', 'UseCpuPme': 'false', 'DisablePmeStream': 'false'}
steps: 8888
elapsed_time: 30.55586
ns_per_day: 100.527126384268

test: amoebagk
epsilon: 1e-05
constraints: None
hydrogen_mass: 1
ensemble: NVT
timestep_in_fs: 2.0
precision: single
platform: HIP
platform_properties: {'DeviceIndex': '0', 'DeviceName': 'AMD Radeon RX 6900 XT', 'UseBlockingSync': 'true', 'Precision': 'single', 'UseCpuPme': 'false', 'HipCompiler': '/opt/rocm/bin/hipcc', 'HipAllowRuntimeCompiler': 'false', 'TempDirectory': '/tmp', 'HipHostCompiler': '', 'DisablePmeStream': 'false', 'DeterministicForces': 'false'}
steps: 6745
elapsed_time: 29.206119
ns_per_day: 39.90725368201093

test: amoebagk
epsilon: 1e-05
constraints: None
hydrogen_mass: 1
ensemble: NVT
timestep_in_fs: 2.0
precision: single
platform: OpenCL
platform_properties: {'DeviceIndex': '0', 'DeviceName': 'gfx1030', 'OpenCLPlatformIndex': '0', 'OpenCLPlatformName': 'AMD Accelerated Parallel Processing', 'Precision': 'single', 'UseCpuPme': 'false', 'DisablePmeStream': 'false'}
steps: 382
elapsed_time: 27.458828
ns_per_day: 2.403948194729942

test: amoebapme
epsilon: 1e-05
constraints: None
hydrogen_mass: 1
ensemble: NVT
timestep_in_fs: 2.0
precision: single
platform: HIP
platform_properties: {'DeviceIndex': '0', 'DeviceName': 'AMD Radeon RX 6900 XT', 'UseBlockingSync': 'true', 'Precision': 'single', 'UseCpuPme': 'false', 'HipCompiler': '/opt/rocm/bin/hipcc', 'HipAllowRuntimeCompiler': 'false', 'TempDirectory': '/tmp', 'HipHostCompiler': '', 'DisablePmeStream': 'false', 'DeterministicForces': 'false'}
steps: 2982
elapsed_time: 29.458663
ns_per_day: 17.491954743499388

test: amoebapme
epsilon: 1e-05
constraints: None
hydrogen_mass: 1
ensemble: NVT
timestep_in_fs: 2.0
precision: single
platform: OpenCL
platform_properties: {'DeviceIndex': '0', 'DeviceName': 'gfx1030', 'OpenCLPlatformIndex': '0', 'OpenCLPlatformName': 'AMD Accelerated Parallel Processing', 'Precision': 'single', 'UseCpuPme': 'false', 'DisablePmeStream': 'false'}
steps: 2000
elapsed_time: 29.5524
ns_per_day: 11.694481666463634

test: amber20-dhfr
cutoff: 0.9
constraints: HBonds
hydrogen_mass: 1.5
ensemble: NVT
timestep_in_fs: 4.0
precision: single
platform: HIP
platform_properties: {'DeviceIndex': '0', 'DeviceName': 'AMD Radeon RX 6900 XT', 'UseBlockingSync': 'true', 'Precision': 'single', 'UseCpuPme': 'false', 'HipCompiler': '/opt/rocm/bin/hipcc', 'HipAllowRuntimeCompiler': 'false', 'TempDirectory': '/tmp', 'HipHostCompiler': '', 'DisablePmeStream': 'false', 'DeterministicForces': 'false'}
steps: 93363
elapsed_time: 29.9992
ns_per_day: 1075.57044187845

test: amber20-dhfr
cutoff: 0.9
constraints: HBonds
hydrogen_mass: 1.5
ensemble: NVT
timestep_in_fs: 4.0
precision: single
platform: OpenCL
platform_properties: {'DeviceIndex': '0', 'DeviceName': 'gfx1030', 'OpenCLPlatformIndex': '0', 'OpenCLPlatformName': 'AMD Accelerated Parallel Processing', 'Precision': 'single', 'UseCpuPme': 'false', 'DisablePmeStream': 'false'}
steps: 24060
elapsed_time: 30.048
ns_per_day: 276.7284345047923

test: amber20-cellulose
cutoff: 0.9
constraints: HBonds
hydrogen_mass: 1.5
ensemble: NVT
timestep_in_fs: 4.0
precision: single
platform: HIP
platform_properties: {'DeviceIndex': '0', 'DeviceName': 'AMD Radeon RX 6900 XT', 'UseBlockingSync': 'true', 'Precision': 'single', 'UseCpuPme': 'false', 'HipCompiler': '/opt/rocm/bin/hipcc', 'HipAllowRuntimeCompiler': 'false', 'TempDirectory': '/tmp', 'HipHostCompiler': '', 'DisablePmeStream': 'false', 'DeterministicForces': 'false'}
steps: 8601
elapsed_time: 30.356126
ns_per_day: 97.92111154104445

test: amber20-cellulose
cutoff: 0.9
constraints: HBonds
hydrogen_mass: 1.5
ensemble: NVT
timestep_in_fs: 4.0
precision: single
platform: OpenCL
platform_properties: {'DeviceIndex': '0', 'DeviceName': 'gfx1030', 'OpenCLPlatformIndex': '0', 'OpenCLPlatformName': 'AMD Accelerated Parallel Processing', 'Precision': 'single', 'UseCpuPme': 'false', 'DisablePmeStream': 'false'}
steps: 2167
elapsed_time: 30.371548
ns_per_day: 24.658446780519714

test: amber20-stmv
cutoff: 0.9
constraints: HBonds
hydrogen_mass: 1.5
ensemble: NVT
timestep_in_fs: 4.0
precision: single
platform: HIP
platform_properties: {'DeviceIndex': '0', 'DeviceName': 'AMD Radeon RX 6900 XT', 'UseBlockingSync': 'true', 'Precision': 'single', 'UseCpuPme': 'false', 'HipCompiler': '/opt/rocm/bin/hipcc', 'HipAllowRuntimeCompiler': 'false', 'TempDirectory': '/tmp', 'HipHostCompiler': '', 'DisablePmeStream': 'false', 'DeterministicForces': 'false'}
steps: 3101
elapsed_time: 31.046426
ns_per_day: 34.51945161095193

test: amber20-stmv
cutoff: 0.9
constraints: HBonds
hydrogen_mass: 1.5
ensemble: NVT
timestep_in_fs: 4.0
precision: single
platform: OpenCL
platform_properties: {'DeviceIndex': '0', 'DeviceName': 'gfx1030', 'OpenCLPlatformIndex': '0', 'OpenCLPlatformName': 'AMD Accelerated Parallel Processing', 'Precision': 'single', 'UseCpuPme': 'false', 'DisablePmeStream': 'false'}
steps: 687
elapsed_time: 29.641908
ns_per_day: 8.009848758723626
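Each ns_per_day value in the raw output can be reproduced from the same record's steps, timestep_in_fs and elapsed_time fields. The simulated time is steps × timestep, and that is scaled from the measured wall-clock seconds up to one day. The sketch below checks this for the first record. The formula is inferred from the logged numbers, which it reproduces, and is not copied from benchmark.py. `parse_record` and `ns_per_day` are hypothetical helper names.

```python
# Sketch (assumption): ns/day = simulated ns / wall-clock seconds * 86400.
# parse_record and ns_per_day are hypothetical helpers, not part of benchmark.py.

SECONDS_PER_DAY = 86400.0
FS_PER_NS = 1.0e6


def parse_record(text: str) -> dict:
    """Turn one 'key: value' block from the benchmark output into a dict of strings."""
    record = {}
    for line in text.strip().splitlines():
        # Split on the first ": " only, so values such as platform_properties stay intact.
        key, _, value = line.partition(": ")
        record[key.strip()] = value.strip()
    return record


def ns_per_day(steps: int, timestep_fs: float, elapsed_s: float) -> float:
    """Nanoseconds of simulated time per day of wall-clock time."""
    simulated_ns = steps * timestep_fs / FS_PER_NS
    return simulated_ns / elapsed_s * SECONDS_PER_DAY


# Fields copied from the gbsa / HIP record in the output above.
gbsa_hip = parse_record("""
test: gbsa
timestep_in_fs: 4.0
platform: HIP
steps: 161892
elapsed_time: 29.922807
ns_per_day: 1869.8070404958999
""")

recomputed = ns_per_day(
    int(gbsa_hip["steps"]),
    float(gbsa_hip["timestep_in_fs"]),
    float(gbsa_hip["elapsed_time"]),
)
print(f"{gbsa_hip['test']} on {gbsa_hip['platform']}: "
      f"recomputed {recomputed:.1f} ns/day, logged {float(gbsa_hip['ns_per_day']):.1f}")
```

The same function applied to the amoebagk/HIP record (6745 steps at 2 fs in 29.206119 s) gives about 39.9 ns/day, which matches the log.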

OpenCL vs HIP Performance Summary

System

OS: Arch Linux
Kernel: 5.16.0-1-amd-staging-drm-next-git-02007-g8bb14fbec5ae
ROCm Version: 5.1.1
OpenMM Version: OpenMM 7.7 | Git Revision: ce22dbef84ec68aa910bbffed0f5e801e76ed9be
CPU: AMD Ryzen 3 3200G
GPU: AMD Radeon RX 6900 XT @ 2575 MHz core (OpenCL), 2505 MHz average core (HIP), 2000 MHz memory

| Test | OpenCL (ns/day) | HIP (ns/day) | HIP improvement over OpenCL |
|---|---|---|---|
| gbsa | 860.9 | 1869.8 | 117% |
| rf | 357.6 | 1517.6 | 324% |
| pme | 270.0 | 1063.2 | 294% |
| apoa1rf | 123.7 | 566.0 | 358% |
| apoa1pme | 107.1 | 397.7 | 271% |
| apoa1ljpme | 100.5 | 323.8 | 222% |
| amoebagk | 2.4 | 39.9 | 1560% |
| amoebapme | 11.7 | 17.5 | 50% |
| amber20-dhfr | 276.7 | 1075.6 | 289% |
| amber20-cellulose | 24.7 | 97.9 | 297% |
| amber20-stmv | 8.0 | 34.5 | 331% |
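The improvement column is the relative gain in throughput, (HIP ns/day / OpenCL ns/day - 1) × 100. The sketch below recomputes it from the unrounded ns_per_day values in the raw benchmark output. The formula is inferred from the table rather than stated by the author, and `RAW_NS_PER_DAY` and `improvement_pct` are hypothetical names.

```python
# Sketch: recompute the HIP-vs-OpenCL improvement from the raw ns/day values above.
# Assumption: improvement (%) = (HIP / OpenCL - 1) * 100, rounded to a whole percent.

RAW_NS_PER_DAY = {
    # test: (OpenCL ns/day, HIP ns/day), copied from the benchmark output
    "gbsa": (860.9063473159187, 1869.8070404958999),
    "rf": (357.5771521224657, 1517.626402908791),
    "pme": (270.04989637388127, 1063.249227037606),
    "apoa1rf": (123.71416521226494, 566.0079486025317),
    "apoa1pme": (107.1207773440107, 397.74063353292036),
    "apoa1ljpme": (100.527126384268, 323.83515453387525),
    "amoebagk": (2.403948194729942, 39.90725368201093),
    "amoebapme": (11.694481666463634, 17.491954743499388),
    "amber20-dhfr": (276.7284345047923, 1075.57044187845),
    "amber20-cellulose": (24.658446780519714, 97.92111154104445),
    "amber20-stmv": (8.009848758723626, 34.51945161095193),
}


def improvement_pct(opencl: float, hip: float) -> float:
    """Percentage by which HIP throughput exceeds OpenCL throughput."""
    return (hip / opencl - 1.0) * 100.0


for test, (opencl, hip) in RAW_NS_PER_DAY.items():
    print(f"{test:<18} {opencl:8.1f} {hip:8.1f} {improvement_pct(opencl, hip):6.0f}%")
```

Because this starts from unrounded values, a few entries can differ by a point or so from percentages computed from the one-decimal ns/day columns.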

@ex-rzr
Collaborator

ex-rzr commented May 25, 2022

Thank you, @tictooc!

Interestingly, the hipFFT test fails on RDNA. We added this test because we had encountered correctness issues with some FFT sizes on older versions of rocFFT, and now it seems to be happening again. I guess we'll need to investigate it further and report it to the rocFFT developers.
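A common way to catch size-dependent FFT correctness bugs is to compare a real-to-complex transform against a reference transform of the same grid and to check that the inverse recovers the input. The sketch below illustrates that idea with NumPy standing in for both the GPU library and the reference. It is not the actual TestHipFFTImplHipFFT code, and `check_real_to_complex` is a hypothetical name. The grid sizes are taken from the test log above. A GPU backend running in single precision would need a looser tolerance than the 1e-10 used here.

```python
# Sketch only: NumPy stands in for hipFFT/rocFFT and for the reference transform.
# It shows the shape of a real-to-complex correctness check, not OpenMM's test.
import numpy as np


def check_real_to_complex(xsize: int, ysize: int, zsize: int, seed: int = 0) -> float:
    """Return the worst relative error of a 3D real-to-complex FFT and its inverse."""
    rng = np.random.default_rng(seed)
    grid = rng.standard_normal((xsize, ysize, zsize))

    # A real-to-complex transform stores only zsize // 2 + 1 points along the last
    # axis; the remaining points follow from Hermitian symmetry.
    r2c = np.fft.rfftn(grid)
    reference = np.fft.fftn(grid.astype(np.complex128))[:, :, : zsize // 2 + 1]
    forward_error = np.max(np.abs(r2c - reference)) / np.max(np.abs(reference))

    # The inverse transform (with the original shape, needed for odd zsize)
    # should recover the real input grid.
    round_trip = np.fft.irfftn(r2c, s=grid.shape)
    inverse_error = np.max(np.abs(round_trip - grid)) / np.max(np.abs(grid))

    return float(max(forward_error, inverse_error))


# Grid sizes that appear in the TestHipFFTImplHipFFT log.
SIZES = [(28, 25, 25), (25, 28, 25), (25, 25, 28), (21, 25, 27),
         (49, 98, 14), (7, 21, 98), (98, 21, 21), (18, 98, 6), (98, 98, 98)]

errors = {size: check_real_to_complex(*size) for size in SIZES}
for size, err in errors.items():
    status = "ok" if err < 1e-10 else "MISMATCH"
    print(f"{size}: relative error {err:.2e} {status}")
```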

@tictooc
Author

tictooc commented May 25, 2022

The same test fails identically on Vega 20 (at least on the Radeon VII) running ROCm 5.1.3. I'll roll back to ROCm 5.0.2 and see if I can find the regression.

--Edit--
On the older version of ROCm, the FFT test went through a few different progressions but ultimately passed.
