Scale up submission
When one finds that Harvester does not submit enough workers (in a cycle) to fill a PQ, go through the following steps.
Acronyms:
- PQ = PanDA Queue
First of all, check the queue configuration of the PQ as recognized by Harvester with the harvester-admin command:
```
$ harvester-admin qconf dump <your_PQ>
```
All parameters of the PQ will be displayed, e.g.:
```
# /opt/harvester/local/bin/harvester-admin qconf dump CERN-PROD_UCORE_2
CERN-PROD_UCORE_2
-----------------
allowJobMixture = False
configID = 57706
ddmEndpointIn = None
getJobCriteria = None
initEventsMultipler = 2
jobType = ANY
mapType = NoJob
maxNewWorkersPerCycle = 10
maxSubmissionAttempts = 3
maxWorkers = 250
nNewWorkers = 0
nQueueLimitJob = None
nQueueLimitJobMax = None
nQueueLimitJobMin = None
nQueueLimitJobRatio = None
nQueueLimitWorker = 20
noHeartbeat = running,transferring,finished,failed
pandaQueueName = CERN-PROD_UCORE_2
prefetchEvents = False
prodSourceLabel = managed
queueName = CERN-PROD_UCORE_2
queueStatus = online
resourceType = ANY
runMode = self
...
```
Confirm which workflow mode the PQ is in by checking the `mapType` and `runMode` parameters in the queue config (a quick way to do this is sketched after the list below).
Common workflow modes:
- Pull mode:
  - queue config parameters: `mapType = NoJob`, `runMode = self`
- Pull-UPS (Pull & Unified Pilot Streaming) mode:
  - queue config parameters: `mapType = NoJob`, `runMode = slave`
  - Note that in this mode, submission of workers for the PQ is triggered by the PanDA server. I.e. if the PanDA server thinks that the PQ does not need to run more jobs, it will not tell Harvester to submit! In this case, check the job status in PanDA and the CRIC setup (maybe there are not enough activated PanDA jobs in the PQ).
- Push mode:
  - queue config parameters: `mapType = OneToOne`, `runMode = self`
  - `mapType` can also be `OneToMany`, `ManyToOne`, or `ManyToMany`
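For convenience, here is a minimal sketch of that check, assuming `harvester-admin` is on `PATH` (the example output above shows it under `/opt/harvester/local/bin/`) and reusing the example queue name; it simply parses the `qconf dump` output and maps `mapType`/`runMode` to one of the modes listed. This is an illustration, not an official Harvester tool.

```python
#!/usr/bin/env python
"""Sketch: identify the workflow mode of a PQ from 'harvester-admin qconf dump' output."""
import re
import subprocess
import sys

# Queue name is an example; pass your own PQ as the first argument.
queue = sys.argv[1] if len(sys.argv) > 1 else 'CERN-PROD_UCORE_2'

# Run the dump command shown above and capture its text output.
dump = subprocess.run(['harvester-admin', 'qconf', 'dump', queue],
                      capture_output=True, text=True, check=True).stdout

# Pick out the two parameters that identify the workflow mode.
params = dict(re.findall(r'^(mapType|runMode)\s*=\s*(\S+)', dump, re.M))
map_type, run_mode = params.get('mapType'), params.get('runMode')

if map_type == 'NoJob' and run_mode == 'self':
    mode = 'Pull'
elif map_type == 'NoJob' and run_mode == 'slave':
    mode = 'Pull-UPS'
elif run_mode == 'self':
    mode = 'Push'
else:
    mode = 'unknown'
print('mapType=%s runMode=%s -> %s mode' % (map_type, run_mode, mode))
```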
Make sure one knows which workflow mode the PQ is in.
If the workflow is not the one required, modify the queue configuration of the PQ accordingly.
See more details about the workflows supported in Harvester.
The following queue config parameters control the number of workers (queuing & total) of the PQ on Harvester:
- `maxWorkers`: Maximum unfinished workers allowed in the PQ. If the number of all unfinished workers (submitted + idle + running) exceeds `maxWorkers`, Harvester will not submit more workers.
- `maxNewWorkersPerCycle`: Maximum workers to submit to the PQ in every submission cycle. (The submitter cycle frequency and other settings are configured in the `[submitter]` section of `panda_harvester.cfg`.)
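As a rough illustration of how these two caps combine, here is a small sketch of the rules stated above (this is not Harvester's actual implementation, and all numbers are made up):

```python
def new_workers_this_cycle(n_running, n_queuing,
                           max_workers=250, max_new_workers_per_cycle=10):
    """Sketch: cap new submissions by maxWorkers and maxNewWorkersPerCycle."""
    n_unfinished = n_running + n_queuing           # submitted + idle + running
    headroom = max(0, max_workers - n_unfinished)  # stay below maxWorkers
    return min(headroom, max_new_workers_per_cycle)

# 240 unfinished workers leave headroom for 10 more before hitting maxWorkers=250;
# with a larger headroom, maxNewWorkersPerCycle=10 would still cap the cycle at 10.
print(new_workers_this_cycle(n_running=200, n_queuing=40))  # -> 10
```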
These parameters apply to both pull and push modes. The number of queuing (submitted) workers is limited by one of the following groups of parameters.
Either static:
- `nQueueLimitWorker`: Maximum queuing workers allowed in the PQ. If the number of queuing (submitted) workers exceeds `nQueueLimitWorker`, Harvester will not submit more workers.

Or dynamic:
- `nQueueLimitWorkerMin`: Minimum queuing workers to keep in the PQ. If the number of queuing (submitted) workers is less than `nQueueLimitWorkerMin`, Harvester will submit more workers until reaching `nQueueLimitWorkerMin`.
- `nQueueLimitWorkerMax`: Maximum queuing workers allowed in the PQ. If the number of queuing (submitted) workers exceeds `nQueueLimitWorkerMax`, Harvester will not submit more workers.
- `nQueueLimitWorkerRatio`: The target ratio, as a percentage, of queuing to running workers. When the number of queuing (submitted) workers is between `nQueueLimitWorkerMin` and `nQueueLimitWorkerMax`, if number_of_queuing_workers / number_of_running_workers exceeds `nQueueLimitWorkerRatio` percent, Harvester will not submit more workers.
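The dynamic group can be pictured as a cap that follows the number of running workers at roughly `nQueueLimitWorkerRatio` percent, clamped between the Min and Max. A minimal sketch of that rule, with made-up parameter values (again, not Harvester's actual code):

```python
def dynamic_queue_limit_worker(n_running,
                               n_queue_limit_worker_min=20,
                               n_queue_limit_worker_max=200,
                               n_queue_limit_worker_ratio=50):
    """Sketch: effective cap on queuing workers under the dynamic parameters."""
    target = n_running * n_queue_limit_worker_ratio / 100.0  # ratio percent of running
    return int(min(max(target, n_queue_limit_worker_min), n_queue_limit_worker_max))

# Few running workers -> the Min keeps a floor; many running workers -> the Max caps it.
for n_running in (0, 100, 1000):
    print(n_running, dynamic_queue_limit_worker(n_running))  # 0 -> 20, 100 -> 50, 1000 -> 200
```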
For PQs in Push mode, one of the following groups of parameters (which only work for Push mode) controls the number of queuing jobs (and of jobs prefetched to Harvester). Note that in Push mode, caps on queuing jobs also limit the number of queuing workers due to worker-job mapping (see the sketch after the list below).
Either static:
- `nQueueLimitJob`: Maximum queuing jobs in Harvester allowed for the PQ. If the number of queuing jobs exceeds `nQueueLimitJob`, Harvester will not submit more workers.

Or dynamic:
- `nQueueLimitJobMin`: Minimum queuing jobs to keep in the PQ. If the number of queuing jobs is less than `nQueueLimitJobMin`, Harvester will submit more workers until the number of queuing jobs reaches `nQueueLimitJobMin`.
- `nQueueLimitJobMax`: Maximum queuing jobs allowed in the PQ. If the number of queuing jobs exceeds `nQueueLimitJobMax`, Harvester will not submit more workers.
- `nQueueLimitJobRatio`: The target ratio, as a percentage, of queuing to running jobs. When the number of queuing jobs is between `nQueueLimitJobMin` and `nQueueLimitJobMax`, if number_of_queuing_jobs / number_of_running_jobs exceeds `nQueueLimitJobRatio` percent, Harvester will not submit more workers.
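To see why the job caps also bound queuing workers in Push mode, consider `mapType = OneToOne`, where each queuing worker carries exactly one job. A short sketch with made-up numbers:

```python
# In Push mode with mapType = OneToOne, one queuing job maps to one queuing worker,
# so the effective cap on queuing workers is the tighter of the two caps
# (all values below are made up for illustration).
n_queue_limit_worker = 20   # cap on queuing workers
n_queue_limit_job = 50      # cap on queuing jobs (Push mode only)
effective_cap = min(n_queue_limit_worker, n_queue_limit_job)
print(effective_cap)  # -> 20: here the worker cap is the binding limit
```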
One should modify those parameters in the queue config to meet the required scale.
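For an instance with a local (static) queue config, the change can be as simple as editing the JSON queue config file. A minimal sketch follows; the file path and the flat per-queue parameter layout are assumptions, so check where your own `panda_harvester.cfg` points for the queue config:

```python
import json

# Assumed path to the local queue config; adjust to your installation
# (this path is an assumption, not taken from the text above).
path = '/opt/harvester/etc/panda/panda_queueconfig.json'

with open(path) as f:
    qconf = json.load(f)

# Raise the caps for one PQ; in this sketch each queue entry is a flat
# dict of the parameters shown by "harvester-admin qconf dump".
qconf.setdefault('CERN-PROD_UCORE_2', {}).update({
    'maxWorkers': 500,
    'maxNewWorkersPerCycle': 20,
    'nQueueLimitWorker': 50,
})

with open(path, 'w') as f:
    json.dump(qconf, f, indent=2, sort_keys=True)
```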
N.B. For Harvester PQs with dynamic queue config (i.e. parameters fetched from CRIC), one can configure these parameters under Associate Parameters on the CRIC PQ page.
After modifying parameters in the queue config, restart the Harvester service so that the new setup takes effect.
N.B. Harvester refreshes the queue config every 10 minutes, so a change in the queue config will eventually take effect without restarting the service. However, restarting the service guarantees that all agents in Harvester restart and run with the new parameter values at once, which is usually better for manual changes. If the change only tweaks the number of workers/jobs (say, `nQueueLimitWorker`) and one can wait, then there is no need to restart the service.