
Explanation of the ioserver code in gcm_setup

sshakoor1 edited this page Sep 27, 2023 · 2 revisions

Explicit code

@ MODEL_NPES = $NX * $NY

# Calculate OSERVER nodes based on recommended algorithm
if ( $DO_IOS == TRUE ) then

   # In the calculations below, the weird bc-awk command is to round up the floating point calcs

   # First we calculate the number of model nodes
   set NUM_MODEL_NODES=`echo "scale=6;($MODEL_NPES / $NCPUS_PER_NODE)" | bc | awk 'function ceil(x, y){y=int(x); return(x>y?y+1:y)} {print ceil($1)}'`

   # Next the number of frontend PEs is 10% of the model PEs
   set NUM_FRONTEND_PES=`echo "scale=6;($MODEL_NPES * 0.1)" | bc | awk 'function ceil(x, y){y=int(x); return(x>y?y+1:y)} {print ceil($1)}'`

   # Now we roughly figure out the number of collections in the HISTORY.rc (this is not perfect, but is close to right)
   set NUM_HIST_COLLECTIONS=`cat $TMPHIST | sed -n '/^COLLECTIONS:/,/^ *::$/{p;/^ *::$/q}' | grep -v '^ *#' | wc -l`

   # And the total number of oserver PEs is frontend PEs plus number of history collections
   @ NUM_OSERVER_PES=$NUM_FRONTEND_PES + $NUM_HIST_COLLECTIONS

   # Now calculate the number of oserver nodes
   set NUM_OSERVER_NODES=`echo "scale=6;($NUM_OSERVER_PES / $NCPUS_PER_NODE)" | bc | awk 'function ceil(x, y){y=int(x); return(x>y?y+1:y)} {print ceil($1)}'`

   # And then the number of backend PEs is the number of history collections divided by the number of oserver nodes
   set NUM_BACKEND_PES=`echo "scale=6;($NUM_HIST_COLLECTIONS / $NUM_OSERVER_NODES)" | bc | awk 'function ceil(x, y){y=int(x); return(x>y?y+1:y)} {print ceil($1)}'`

   # multigroup requires at least two backend pes
   if ($NUM_BACKEND_PES < 2) set NUM_BACKEND_PES = 2

   # Calculate the total number of nodes to request from batch
   @ NODES=$NUM_MODEL_NODES + $NUM_OSERVER_NODES

else
   # Calculate the number of model nodes
   set NODES=`echo "scale=6;($MODEL_NPES / $NCPUS_PER_NODE)" | bc | awk 'function ceil(x, y){y=int(x); return(x>y?y+1:y)} {print ceil($1)}'`

   set NUM_OSERVER_NODES = 0
   set NUM_BACKEND_PES   = 0
endif

What the code is doing if DO_IOS=TRUE

Calculate the number of model nodes

The first step of the code:

@ MODEL_NPES = $NX * $NY
set NUM_MODEL_NODES=`echo "scale=6;($MODEL_NPES / $NCPUS_PER_NODE)" | bc | awk 'function ceil(x, y){y=int(x); return(x>y?y+1:y)} {print ceil($1)}'`

computes the number of model processes (MODEL_NPES) as NX times NY, then divides that by the number of CPUs per node (NCPUS_PER_NODE) and rounds up to get the number of model nodes (NUM_MODEL_NODES). NCPUS_PER_NODE is set earlier, when the user selects a node type.

C720 Example

Assume:

NX = 24
NY = NX * 6 = 144
NCPUS_PER_NODE = 40

then:

MODEL_NPES = NX * NY = 24 * 144 = 3456
NUM_MODEL_NODES = ceil(MODEL_NPES / NCPUS_PER_NODE) = ceil(3456 / 40) = ceil(86.4) = 87
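As a cross-check of the arithmetic, here is a minimal Python sketch of the same round-up division. The helper name `ceil_div` is hypothetical and not part of gcm_setup, and it uses integer arithmetic rather than the bc/awk pipeline.

```python
def ceil_div(numerator: int, denominator: int) -> int:
    """Round-up division for positive integers (hypothetical helper)."""
    return -(-numerator // denominator)


nx = 24
ny = nx * 6            # 144
ncpus_per_node = 40

model_npes = nx * ny                                     # 3456
num_model_nodes = ceil_div(model_npes, ncpus_per_node)   # ceil(86.4)
print(model_npes, num_model_nodes)  # 3456 87
```

An exact multiple is not rounded up further: `ceil_div(80, 40)` gives 2.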

Calculate the number of frontend PEs

The next step of the code:

set NUM_FRONTEND_PES=`echo "scale=6;($MODEL_NPES * 0.1)" | bc | awk 'function ceil(x, y){y=int(x); return(x>y?y+1:y)} {print ceil($1)}'`

is to calculate the number of frontend PEs (NUM_FRONTEND_PES) by multiplying the number of model processes (MODEL_NPES) by 0.1 (10%). Again, the result is rounded up.

C720 Example

We have:

MODEL_NPES = 3456

then:

NUM_FRONTEND_PES = ceil(MODEL_NPES * 0.1) = ceil(3456 * 0.1) = ceil(345.6) = 346
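The awk `ceil` helper used throughout can be read as "truncate, then add 1 if anything was cut off". A minimal Python sketch of that logic follows; the name `awk_ceil` is hypothetical, and `Decimal` is used here only to mimic bc's decimal arithmetic.

```python
from decimal import Decimal


def awk_ceil(x: Decimal) -> int:
    """Mirror of the awk helper: y = int(x); return x > y ? y + 1 : y."""
    y = int(x)  # truncates toward zero, like awk's int()
    return y + 1 if x > y else y


model_npes = 3456
num_frontend_pes = awk_ceil(model_npes * Decimal("0.1"))  # ceil(345.6)
print(num_frontend_pes)  # 346
```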

Calculate the number of history collections

This bit of code:

set NUM_HIST_COLLECTIONS=`cat $TMPHIST | sed -n '/^COLLECTIONS:/,/^ *::$/{p;/^ *::$/q}' | grep -v '^ *#' | wc -l`

reads the HISTORY template selected by the user (TMPHIST) and estimates the number of collections: it takes the lines from the one beginning with "COLLECTIONS:" through the closing "::" line, drops those whose first non-blank character is "#", and counts what remains.
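To make the filtering concrete, here is a rough Python sketch of the sed/grep/wc pipeline. The function name `count_hist_collections` and the sample text are hypothetical. Read literally, the filter also lets the closing "::" line through, so this sketch returns one more than the number of active entries, which fits the count being an estimate.

```python
import re


def count_hist_collections(text: str) -> int:
    """Count non-comment lines from 'COLLECTIONS:' through the closing '::' line."""
    count = 0
    in_block = False
    for line in text.splitlines():
        if not in_block and line.startswith("COLLECTIONS:"):
            in_block = True                  # sed range start: /^COLLECTIONS:/
        if in_block:
            if not re.match(r" *#", line):   # grep -v '^ *#'
                count += 1
            if re.fullmatch(r" *::", line):  # sed range end, then quit
                break
    return count


# Hypothetical three-entry template with one entry commented out
sample = """\
VERSION: 1
COLLECTIONS: 'coll_a'
#             'coll_b'
             'coll_c'
             ::
"""
print(count_hist_collections(sample))  # 3: two active entries plus the '::' line
```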

For example:

COLLECTIONS: 'geosgcm_prog'
#             'prog.eta'
             'geosgcm_surf'
             'geosgcm_ocn'
             'geosgcm_moist'
             'geosgcm_turb'
             'geosgcm_gwd'
             'geosgcm_tend'
             'geosgcm_budi'
             'geosgcm_buda'
             'geosgcm_landice'
             'geosgcm_meltwtr'
             'geosgcm_snowlayer'
             'geosgcm_tracer'
>>>HIST_GOCART<<<             'tavg2d_aer_x'
>>>HIST_GOCART<<<             'tavg3d_aer_p'
#             'geosgcm_iau'
#             'geosgcm_conv'
#             'goswim_catch'
#             'goswim_land'
#             'goswim_landice'
#             'geosgcm_lidar'
#             'geosgcm_parasol'
#             'geosgcm_modis'
#             'geosgcm_radar'
#             'geosgcm_isccp'
#             'geosgcm_misr'
             ::

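The listing above has 15 uncommented entries, two of which carry the >>>HIST_GOCART<<< template placeholder. The count is only an estimate, but a good one.

C720 Example

We have:

NUM_HIST_COLLECTIONS = 13

This is the value used in the rest of the example. It corresponds to the 13 entries without the >>>HIST_GOCART<<< prefix, that is, it assumes the two GOCART collections are not active.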

Calculate the total number of oserver PEs

This bit of code:

@ NUM_OSERVER_PES=$NUM_FRONTEND_PES + $NUM_HIST_COLLECTIONS

adds the number of frontend PEs (NUM_FRONTEND_PES) to the number of history collections (NUM_HIST_COLLECTIONS) to get the total number of oserver PEs (NUM_OSERVER_PES).

C720 Example

We have:

NUM_FRONTEND_PES = 346
NUM_HIST_COLLECTIONS = 13

then:

NUM_OSERVER_PES = NUM_FRONTEND_PES + NUM_HIST_COLLECTIONS = 346 + 13 = 359

Calculate the number of oserver nodes

This bit of code:

set NUM_OSERVER_NODES=`echo "scale=6;($NUM_OSERVER_PES / $NCPUS_PER_NODE)" | bc | awk 'function ceil(x, y){y=int(x); return(x>y?y+1:y)} {print ceil($1)}'`

divides the number of oserver PEs (NUM_OSERVER_PES) by the number of CPUs per node (NCPUS_PER_NODE) and rounds up to get the number of oserver nodes.

C720 Example

We have:

NUM_OSERVER_PES = 359
NCPUS_PER_NODE = 40

then:

NUM_OSERVER_NODES = ceil(NUM_OSERVER_PES / NCPUS_PER_NODE) = ceil(359 / 40) = ceil(8.975) = 9

Calculate the number of backend PEs

This bit of code:

set NUM_BACKEND_PES=`echo "scale=6;($NUM_HIST_COLLECTIONS / $NUM_OSERVER_NODES)" | bc | awk 'function ceil(x, y){y=int(x); return(x>y?y+1:y)} {print ceil($1)}'`
if ($NUM_BACKEND_PES < 2) set NUM_BACKEND_PES = 2

divides the number of history collections (NUM_HIST_COLLECTIONS) by the number of oserver nodes (NUM_OSERVER_NODES) and rounds up to get the number of backend PEs. The result is then raised to 2 if it is smaller, because multigroup requires at least two backend PEs.

C720 Example

We have:

NUM_HIST_COLLECTIONS = 13
NUM_OSERVER_NODES = 9

then:

NUM_BACKEND_PES = ceil(NUM_HIST_COLLECTIONS / NUM_OSERVER_NODES) = ceil(13 / 9) = ceil(1.444) = 2

Since we got 2 from the calculation, we don't need to force it to be at least 2.
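The round-up followed by the minimum of 2 can be sketched in Python as below. The function name `backend_pes` is hypothetical, and the extra inputs are made-up values chosen to show when the floor does and does not apply.

```python
def backend_pes(num_hist_collections: int, num_oserver_nodes: int) -> int:
    """Round up collections per oserver node, then enforce a minimum of 2."""
    per_node = -(-num_hist_collections // num_oserver_nodes)  # ceiling division
    return max(per_node, 2)


print(backend_pes(13, 9))  # 2: ceil(1.444) is already 2
print(backend_pes(5, 9))   # 2: ceil(0.556) is 1, raised to the minimum
print(backend_pes(40, 9))  # 5: ceil(4.444), minimum not needed
```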

Calculate the total number of nodes

This bit of code:

@ NODES=$NUM_MODEL_NODES + $NUM_OSERVER_NODES

adds the number of model nodes (NUM_MODEL_NODES) to the number of oserver nodes (NUM_OSERVER_NODES) to get the total number of nodes to request from batch.

C720 Example

We have:

NUM_MODEL_NODES = 87
NUM_OSERVER_NODES = 9

then:

NODES = NUM_MODEL_NODES + NUM_OSERVER_NODES = 87 + 9 = 96
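Putting the DO_IOS=TRUE steps together, the following Python sketch reproduces the C720 numbers end to end. It is only a cross-check under stated assumptions: the function name `oserver_layout` is hypothetical, the collection count is passed in rather than read from a HISTORY template, and the 10% step is written as an integer round-up of MODEL_NPES / 10.

```python
def ceil_div(a: int, b: int) -> int:
    """Round-up division for positive integers."""
    return -(-a // b)


def oserver_layout(nx: int, ny: int, ncpus_per_node: int,
                   num_hist_collections: int) -> dict:
    """Mirror the DO_IOS == TRUE branch, one step per line."""
    model_npes = nx * ny
    num_model_nodes = ceil_div(model_npes, ncpus_per_node)
    num_frontend_pes = ceil_div(model_npes, 10)          # 10% of model PEs
    num_oserver_pes = num_frontend_pes + num_hist_collections
    num_oserver_nodes = ceil_div(num_oserver_pes, ncpus_per_node)
    num_backend_pes = max(ceil_div(num_hist_collections, num_oserver_nodes), 2)
    return {
        "MODEL_NPES": model_npes,
        "NUM_MODEL_NODES": num_model_nodes,
        "NUM_FRONTEND_PES": num_frontend_pes,
        "NUM_OSERVER_PES": num_oserver_pes,
        "NUM_OSERVER_NODES": num_oserver_nodes,
        "NUM_BACKEND_PES": num_backend_pes,
        "NODES": num_model_nodes + num_oserver_nodes,
    }


c720 = oserver_layout(nx=24, ny=144, ncpus_per_node=40, num_hist_collections=13)
print(c720["NODES"])  # 96
```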

What the code is doing if DO_IOS=FALSE

If DO_IOS is FALSE, the code computes only the node count for the model, stores it directly in NODES, and sets the number of oserver nodes (NUM_OSERVER_NODES) and the number of backend PEs (NUM_BACKEND_PES) to 0.

Calculate the number of model nodes

This bit of code:

set NODES=`echo "scale=6;($MODEL_NPES / $NCPUS_PER_NODE)" | bc | awk 'function ceil(x, y){y=int(x); return(x>y?y+1:y)} {print ceil($1)}'`

divides the number of model processes (MODEL_NPES) by the number of CPUs per node (NCPUS_PER_NODE) and rounds up. With no oserver nodes to add, this model node count is the total stored in NODES.

C720 Example

We have:

MODEL_NPES = 3456
NCPUS_PER_NODE = 40

then:

NODES = ceil(MODEL_NPES / NCPUS_PER_NODE) = ceil(3456 / 40) = ceil(86.4) = 87

And then we set:

NUM_OSERVER_NODES = 0
NUM_BACKEND_PES = 0
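For comparison, the DO_IOS=FALSE branch reduces to a single round-up division. A minimal Python sketch, with the hypothetical function name `nodes_without_oserver`:

```python
def nodes_without_oserver(model_npes: int, ncpus_per_node: int) -> dict:
    """Mirror the else branch: model nodes only, oserver values zeroed."""
    nodes = -(-model_npes // ncpus_per_node)  # ceiling division
    return {"NODES": nodes, "NUM_OSERVER_NODES": 0, "NUM_BACKEND_PES": 0}


result = nodes_without_oserver(model_npes=3456, ncpus_per_node=40)
print(result["NODES"])  # 87
```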