Scale up submission
FaHui Lin edited this page Oct 31, 2022 · 11 revisions
When Harvester does not submit enough workers (in a cycle) to fill a PQ, do the following.
Acronyms:
- PQ = PanDA Queue
First, check the queue configuration of the PQ as recognized by Harvester, using the harvester-admin command:
$ harvester-admin qconf dump <your_PQ>
All parameters of the PQ are displayed. For example:
# /opt/harvester/local/bin/harvester-admin qconf dump CERN-PROD_UCORE_2
CERN-PROD_UCORE_2
-----------------
allowJobMixture = False
configID = 57706
ddmEndpointIn = None
getJobCriteria = None
initEventsMultipler = 2
jobType = ANY
mapType = NoJob
maxNewWorkersPerCycle = 10
maxSubmissionAttempts = 3
maxWorkers = 250
nNewWorkers = 0
nQueueLimitJob = None
nQueueLimitJobMax = None
nQueueLimitJobMin = None
nQueueLimitJobRatio = None
nQueueLimitWorker = 20
noHeartbeat = running,transferring,finished,failed
pandaQueueName = CERN-PROD_UCORE_2
prefetchEvents = False
prodSourceLabel = managed
queueName = CERN-PROD_UCORE_2
queueStatus = online
resourceType = ANY
runMode = self
...
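When comparing many parameters (or many PQs), it can help to turn the dump into a dictionary. The sketch below is a hypothetical helper, not part of Harvester: it assumes, based only on the example output above, that each parameter is printed on its own line as `name = value`, and it keeps every value as a string.

```python
def parse_qconf_dump(text):
    """Parse text printed by `harvester-admin qconf dump <PQ>` into a dict.

    Assumption (from the example output): each parameter is on its own line
    as `name = value`. The queue-name header, the dashed underline, '...'
    and blank lines contain no ' = ' and are skipped. Values stay strings.
    """
    params = {}
    for line in text.splitlines():
        if " = " not in line:
            continue  # header, underline, '...' or blank line
        # Split only on the first ' = ' so values containing ' = ' survive
        key, value = line.split(" = ", 1)
        params[key.strip()] = value.strip()
    return params


# Excerpt of the example dump shown above
sample = """CERN-PROD_UCORE_2
-----------------
mapType = NoJob
maxNewWorkersPerCycle = 10
maxWorkers = 250
nQueueLimitWorker = 20
runMode = self
"""

conf = parse_qconf_dump(sample)
# Print a few of the worker-related parameters from the dump
for name in ("maxNewWorkersPerCycle", "maxWorkers", "nQueueLimitWorker"):
    print(name, "=", conf[name])
```

Assuming the command writes its output to stdout, one could save it with `harvester-admin qconf dump <your_PQ> > qconf.txt` and pass the file contents to the function.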
Confirm which workflow mode the PQ is in by checking the mapType and runMode parameters in the queue config.
Common workflow modes:
- Pull mode:
  - queue config parameters: mapType = NoJob and runMode = self
- Pull-UPS (Pull & Unified Pilot Streaming) mode:
  - queue config parameters: mapType = NoJob and runMode = slave
  - Note that in this mode, worker submission for the PQ is triggered by the PanDA server: if the PanDA server thinks the PQ does not need to run more jobs, it will not tell Harvester to submit! In this case, check the job status in PanDA and the CRIC setup (maybe there are not enough activated PanDA jobs in the PQ).
- Push mode:
  - queue config parameters: mapType = OneToOne (or any other string than NoJob)
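The rules in the list above can be encoded as a small lookup. This is a hypothetical sketch, not a Harvester function: it only restates the three common modes listed on this page, and it returns "unknown" for a NoJob combination the list does not cover rather than guessing.

```python
def workflow_mode(map_type, run_mode):
    """Classify a PQ's workflow mode from its mapType and runMode values,
    following only the common modes listed on this page."""
    if map_type != "NoJob":
        # Push mode: mapType = OneToOne or any other string than NoJob
        return "Push"
    if run_mode == "self":
        return "Pull"
    if run_mode == "slave":
        # Pull-UPS: worker submission is triggered by the PanDA server
        return "Pull-UPS"
    # mapType = NoJob with a runMode not covered by the list above
    return "unknown"


print(workflow_mode("NoJob", "self"))  # Pull
```

For the example dump above (mapType = NoJob, runMode = self), this gives Pull mode.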