Explanation of the ioserver code in gcm_setup
@ MODEL_NPES = $NX * $NY
# Calculate OSERVER nodes based on recommended algorithm
if ( $DO_IOS == TRUE ) then
   # In the calculations below, the weird bc-awk command is to round up the floating point calcs
   # First we calculate the number of model nodes
   set NUM_MODEL_NODES=`echo "scale=6;($MODEL_NPES / $NCPUS_PER_NODE)" | bc | awk 'function ceil(x, y){y=int(x); return(x>y?y+1:y)} {print ceil($1)}'`
   # Next the number of frontend PEs is 10% of the model PEs
   set NUM_FRONTEND_PES=`echo "scale=6;($MODEL_NPES * 0.1)" | bc | awk 'function ceil(x, y){y=int(x); return(x>y?y+1:y)} {print ceil($1)}'`
   # Now we roughly figure out the number of collections in the HISTORY.rc (this is not perfect, but is close to right)
   set NUM_HIST_COLLECTIONS=`cat $TMPHIST | sed -n '/^COLLECTIONS:/,/^ *::$/{p;/^ *::$/q}' | grep -v '^ *#' | wc -l`
   # And the total number of oserver PEs is frontend PEs plus number of history collections
   @ NUM_OSERVER_PES=$NUM_FRONTEND_PES + $NUM_HIST_COLLECTIONS
   # Now calculate the number of oserver nodes
   set NUM_OSERVER_NODES=`echo "scale=6;($NUM_OSERVER_PES / $NCPUS_PER_NODE)" | bc | awk 'function ceil(x, y){y=int(x); return(x>y?y+1:y)} {print ceil($1)}'`
   # And then the number of backend PEs is the number of history collections divided by the number of oserver nodes
   set NUM_BACKEND_PES=`echo "scale=6;($NUM_HIST_COLLECTIONS / $NUM_OSERVER_NODES)" | bc | awk 'function ceil(x, y){y=int(x); return(x>y?y+1:y)} {print ceil($1)}'`
   # multigroup requires at least two backend pes
   if ($NUM_BACKEND_PES < 2) set NUM_BACKEND_PES = 2
   # Calculate the total number of nodes to request from batch
   @ NODES=$NUM_MODEL_NODES + $NUM_OSERVER_NODES
else
   # Calculate the number of model nodes
   set NODES=`echo "scale=6;($MODEL_NPES / $NCPUS_PER_NODE)" | bc | awk 'function ceil(x, y){y=int(x); return(x>y?y+1:y)} {print ceil($1)}'`
   set NUM_OSERVER_NODES = 0
   set NUM_BACKEND_PES = 0
endif
The first step of the code:
@ MODEL_NPES = $NX * $NY
set NUM_MODEL_NODES=`echo "scale=6;($MODEL_NPES / $NCPUS_PER_NODE)" | bc | awk 'function ceil(x, y){y=int(x); return(x>y?y+1:y)} {print ceil($1)}'`
calculates the number of processes for the model (MODEL_NPES) by multiplying NX and NY. Then, to get the number of model nodes (NUM_MODEL_NODES), it divides the number of processes by the number of CPUs per node (NCPUS_PER_NODE) and rounds up. NCPUS_PER_NODE was set when the user selected a node type.
Assume:
NX = 24
NY = NX * 6 = 144
NCPUS_PER_NODE = 40
then:
MODEL_NPES = NX * NY = 24 * 144 = 3456
NUM_MODEL_NODES = ceil(MODEL_NPES / NCPUS_PER_NODE) = ceil(3456 / 40) = ceil(86.4) = 87
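As a cross-check of the arithmetic above, here is a minimal Python sketch of the same round-up division. It is not part of gcm_setup: the helper name ceil_div is made up for illustration, and it uses integer arithmetic in place of the bc-awk pipeline.

```python
def ceil_div(numerator: int, denominator: int) -> int:
    """Divide two positive integers and round up (hypothetical helper)."""
    # Floor-dividing the negated numerator and negating again rounds up
    # without going through floating point.
    return -(-numerator // denominator)


# Values assumed in the walkthrough above
NX = 24
NY = NX * 6
NCPUS_PER_NODE = 40

MODEL_NPES = NX * NY
NUM_MODEL_NODES = ceil_div(MODEL_NPES, NCPUS_PER_NODE)

print(MODEL_NPES)       # 3456
print(NUM_MODEL_NODES)  # 87
```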
The next step of the code:
set NUM_FRONTEND_PES=`echo "scale=6;($MODEL_NPES * 0.1)" | bc | awk 'function ceil(x, y){y=int(x); return(x>y?y+1:y)} {print ceil($1)}'`
is to calculate the number of frontend PEs (NUM_FRONTEND_PES) by multiplying the number of model processes (MODEL_NPES) by 0.1 (10%). Again, the result is rounded up.
We have:
MODEL_NPES = 3456
then:
NUM_FRONTEND_PES = ceil(MODEL_NPES * 0.1) = ceil(3456 * 0.1) = ceil(345.6) = 346
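The 10% rule can be sketched in Python like this. It is only an illustration: the function name frontend_pes is hypothetical, and the sketch divides by 10 with integer arithmetic so that exact multiples of 10 are not pushed up by floating-point noise (bc works in decimal, so the script itself does not have that problem).

```python
def frontend_pes(model_npes: int) -> int:
    """Return 10% of the model PEs, rounded up (hypothetical helper)."""
    # ceil(model_npes * 0.1) written as an integer round-up division by 10
    return -(-model_npes // 10)


print(frontend_pes(3456))  # 346
print(frontend_pes(3460))  # 346 (already a whole number, so nothing to round up)
```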
This bit of code:
set NUM_HIST_COLLECTIONS=`cat $TMPHIST | sed -n '/^COLLECTIONS:/,/^ *::$/{p;/^ *::$/q}' | grep -v '^ *#' | wc -l`
looks at the HISTORY template selected by the user (TMPHIST) and counts the lines from the line beginning with "COLLECTIONS:" through the closing "::" line, excluding lines that begin with "#".
For example:
COLLECTIONS: 'geosgcm_prog'
# 'prog.eta'
'geosgcm_surf'
'geosgcm_ocn'
'geosgcm_moist'
'geosgcm_turb'
'geosgcm_gwd'
'geosgcm_tend'
'geosgcm_budi'
'geosgcm_buda'
'geosgcm_landice'
'geosgcm_meltwtr'
'geosgcm_snowlayer'
'geosgcm_tracer'
>>>HIST_GOCART<<< 'tavg2d_aer_x'
>>>HIST_GOCART<<< 'tavg3d_aer_p'
# 'geosgcm_iau'
# 'geosgcm_conv'
# 'goswim_catch'
# 'goswim_land'
# 'goswim_landice'
# 'geosgcm_lidar'
# 'geosgcm_parasol'
# 'geosgcm_modis'
# 'geosgcm_radar'
# 'geosgcm_isccp'
# 'geosgcm_misr'
::
The example above has 15 active (uncommented) collections. The count is not perfect, but it is a good estimate.
We have:
NUM_HIST_COLLECTIONS = 15
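To see why the count is only an estimate, here is a rough Python sketch of what the sed/grep/wc pipeline does, as far as it can be read from the command itself. The function name and the small sample text are made up for illustration and are not taken from gcm_setup or a real HISTORY.rc. The sketch counts raw lines, so the closing "::" line is included and the result comes out one higher than the number of active collections in the sample.

```python
import re


def count_history_lines(text: str) -> int:
    """Mimic sed (range print) | grep -v (drop comments) | wc -l. Hypothetical helper."""
    count = 0
    in_block = False
    for line in text.splitlines():
        if not in_block:
            if not line.startswith("COLLECTIONS:"):
                continue          # sed -n prints nothing before the range starts
            in_block = True
        if not re.match(r" *#", line):
            count += 1            # grep -v '^ *#' keeps the line, wc -l counts it
        if re.fullmatch(r" *::", line):
            break                 # sed quits after printing the closing line
    return count


# Made-up sample with two active collections and one commented-out collection
sample = """\
VERSION: 1
COLLECTIONS: 'coll_a'
#            'coll_b'
             'coll_c'
             ::
'after_block'
"""

print(count_history_lines(sample))  # 3: two collections plus the closing "::" line
```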
This bit of code:
@ NUM_OSERVER_PES=$NUM_FRONTEND_PES + $NUM_HIST_COLLECTIONS
adds the number of frontend PEs (NUM_FRONTEND_PES) to the number of history collections (NUM_HIST_COLLECTIONS) to get the total number of oserver PEs.
We have:
NUM_FRONTEND_PES = 346
NUM_HIST_COLLECTIONS = 15
then:
NUM_OSERVER_PES = NUM_FRONTEND_PES + NUM_HIST_COLLECTIONS = 346 + 15 = 361
This bit of code:
set NUM_OSERVER_NODES=`echo "scale=6;($NUM_OSERVER_PES / $NCPUS_PER_NODE)" | bc | awk 'function ceil(x, y){y=int(x); return(x>y?y+1:y)} {print ceil($1)}'`
divides the number of oserver PEs (NUM_OSERVER_PES) by the number of CPUs per node (NCPUS_PER_NODE) and rounds up to get the number of oserver nodes.
We have:
NUM_OSERVER_PES = 361
NCPUS_PER_NODE = 40
then:
NUM_OSERVER_NODES = ceil(NUM_OSERVER_PES / NCPUS_PER_NODE) = ceil(361 / 40) = ceil(9.025) = 10
This bit of code:
set NUM_BACKEND_PES=`echo "scale=6;($NUM_HIST_COLLECTIONS / $NUM_OSERVER_NODES)" | bc | awk 'function ceil(x, y){y=int(x); return(x>y?y+1:y)} {print ceil($1)}'`
if ($NUM_BACKEND_PES < 2) set NUM_BACKEND_PES = 2
divides the number of history collections (NUM_HIST_COLLECTIONS) by the number of oserver nodes (NUM_OSERVER_NODES) and rounds up to get the number of backend PEs. The result is then forced to be at least 2, because multigroup requires at least two backend PEs.
We have:
NUM_HIST_COLLECTIONS = 15
NUM_OSERVER_NODES = 10
then:
NUM_BACKEND_PES = ceil(NUM_HIST_COLLECTIONS / NUM_OSERVER_NODES) = ceil(15 / 10) = ceil(1.5) = 2
Since we got 2 from the calculation, we don't need to force it to be at least 2.
This bit of code:
@ NODES=$NUM_MODEL_NODES + $NUM_OSERVER_NODES
adds the number of model nodes (NUM_MODEL_NODES) to the number of oserver nodes (NUM_OSERVER_NODES) to get the total number of nodes to request from batch.
We have:
NUM_MODEL_NODES = 87
NUM_OSERVER_NODES = 10
then:
NODES = NUM_MODEL_NODES + NUM_OSERVER_NODES = 87 + 10 = 97
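The oserver steps can be put together in a short Python sketch. It is an illustration only, not code from gcm_setup, and the function name oserver_layout is hypothetical. Running it with 13 and with 15 collections shows how sensitive the result is near a node boundary: two extra collections are enough to add a node.

```python
def ceil_div(numerator: int, denominator: int) -> int:
    """Divide two positive integers and round up (hypothetical helper)."""
    return -(-numerator // denominator)


def oserver_layout(frontend_pes: int, hist_collections: int, ncpus_per_node: int):
    """Return (oserver PEs, oserver nodes, backend PEs) for the DO_IOS branch."""
    oserver_pes = frontend_pes + hist_collections
    oserver_nodes = ceil_div(oserver_pes, ncpus_per_node)
    # multigroup requires at least two backend PEs
    backend_pes = max(2, ceil_div(hist_collections, oserver_nodes))
    return oserver_pes, oserver_nodes, backend_pes


NUM_MODEL_NODES = 87  # from the model-node step of the walkthrough

for collections in (13, 15):
    pes, nodes, backend = oserver_layout(346, collections, 40)
    print(collections, pes, nodes, backend, NUM_MODEL_NODES + nodes)
# 13 359 9 2 96
# 15 361 10 2 97
```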
If DO_IOS is FALSE, then the code just calculates the number of model nodes, stores it directly in NODES, and sets the number of oserver nodes (NUM_OSERVER_NODES) and the number of backend PEs (NUM_BACKEND_PES) to 0.
This bit of code:
set NODES=`echo "scale=6;($MODEL_NPES / $NCPUS_PER_NODE)" | bc | awk 'function ceil(x, y){y=int(x); return(x>y?y+1:y)} {print ceil($1)}'`
divides the number of model processes (MODEL_NPES) by the number of CPUs per node (NCPUS_PER_NODE) and rounds up to get the number of model nodes, which is also the total number of nodes (NODES) to request from batch.
We have:
MODEL_NPES = 3456
NCPUS_PER_NODE = 40
then:
NODES = ceil(MODEL_NPES / NCPUS_PER_NODE) = ceil(3456 / 40) = ceil(86.4) = 87
And then we set:
NUM_OSERVER_NODES = 0
NUM_BACKEND_PES = 0
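For completeness, the branch without the oserver can be sketched in Python as well. Again, this is a hypothetical illustration (the function name is made up), not code from gcm_setup.

```python
def layout_without_ioserver(model_npes: int, ncpus_per_node: int) -> dict:
    """Mirror the DO_IOS == FALSE branch: model nodes only, no oserver resources."""
    nodes = -(-model_npes // ncpus_per_node)  # round-up integer division
    return {"NODES": nodes, "NUM_OSERVER_NODES": 0, "NUM_BACKEND_PES": 0}


print(layout_without_ioserver(3456, 40))
# {'NODES': 87, 'NUM_OSERVER_NODES': 0, 'NUM_BACKEND_PES': 0}
```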