I am trying to benchmark the UFS MR Weather model application by fully subscribing all the available cores on our machines, which is 128 per node (AMD EPYC 7742, AMD EPYC 7713).
In the past, we have benchmarked the WRF weather application with 16 MPI ranks x 8 threads per rank, 32 MPI ranks x 4 threads per rank, and 128 MPI ranks, and we would like to benchmark the MR Weather model in a similar way.
I have managed to create the setup by following the instructions at -
The model's default MPI rank recommendations for the C96, C192, and C384 resolutions are 108, 180, and 252, respectively.
user1@node001] ./create_newcase --case DORIAN_C96_GFSv15p2 --compset GFSv15p2 --res C96 --workflow ufs-mrweather --machine cluster1
user1@node001] cd DORIAN_C96_GFSv15p2
Comp NTASKS NTHRDS ROOTPE
ATM : 108/ 1; 0
user1@node001] ./xmlchange NTASKS_ATM=128
For your changes to take effect, run:
user1@node001] ./case.setup --reset
Building case in directory /home/user1/Milan_UFS/1.1.0_intel/src/my_ufs_sandbox/cime/scripts/DORIAN_C96_GFSv15p2
sharedlib_only is False
model_only is False
Setting Environment OMP_STACKSIZE=256M
Setting Environment OMP_NUM_THREADS=1
Generating component namelists as part of build
Creating component namelists
2021-07-23 09:45:19 atm
Checking /home/user1/Milan_UFS/1.1.0_intel/datasetmrweather/inputs/ufs_inputdata/icfiles/201908/20190829 directory to find raw input files ...
Found 'atm.input.ic.grb2' in input directory
CHGRES namelist option 'input_type' is set to grib2
CHGRES will use /home/user1/Milan_UFS/1.1.0_intel/datasetmrweather/inputs/ufs_inputdata/icfiles/201908/20190829/atm.input.ic.grb2
removing file /home/user1/Milan_UFS/1.1.0_intel/src/my_ufs_sandbox/cime/scripts/DORIAN_C96_GFSv15p2/Buildconf/ufsatm.input_data_list
input_type grib2 nstf_name ['0', '1', '0', '0', '0']
ERROR: Total number of PE need to be consistent with the model namelist options:
Total number of PE (ntask_atm) = 128
Decomposition in x and y direction (layout) = 4x4
Number of tile (ntiles) = 6
Number of I/O group (write_groups) = 1
Number of tasks in each I/O group (write_tasks_per_group) = 12
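If I am reading the error correctly, the totals are only consistent when NTASKS_ATM = layout_x * layout_y * ntiles + write_groups * write_tasks_per_group (the defaults give 4*4*6 + 1*12 = 108). Assuming that is indeed the constraint, a quick script (my own sketch, not part of the model) can enumerate compute/write splits that land exactly on 128 with ntiles=6 and one write group:

```shell
#!/bin/sh
# Sketch, ASSUMING the constraint implied by the error above:
#   NTASKS_ATM = layout_x * layout_y * ntiles + write_groups * write_tasks_per_group
# (the defaults satisfy it: 4*4*6 + 1*12 = 108).
# List layouts whose total is exactly 128 with ntiles=6, write_groups=1.
target=128
ntiles=6
for x in 1 2 3 4 5 6; do
  for y in 1 2 3 4 5 6; do
    compute=$((x * y * ntiles))
    write=$((target - compute))
    if [ "$write" -gt 0 ]; then
      echo "layout=${x},${y} compute_pes=${compute} write_tasks_per_group=${write}"
    fi
  done
done
```

For example, a 4x4 layout gives 96 compute PEs and would need 32 write tasks to reach 128.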
I understand that general launch configurations such as 128 MPI ranks, 32 MPI ranks x 4 threads/rank, and 16 MPI ranks x 8 threads/rank may not be directly applicable here, as the number of ranks seems to be governed by the "PE layout formula".
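For instance, if the decomposition is exposed through user_nl_ufsatm (variable names assumed here; please correct me if they differ), would something like the following, together with NTASKS_ATM=128, be a valid way to reach 128 (4x4x6 = 96 compute tasks + 32 write tasks)?

```
! user_nl_ufsatm -- assumed variable names, based on the PE layout formula
layout = 4,4
write_groups = 1
write_tasks_per_group = 32
```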
Could you please advise how to customize this application (or its input) so that I can subscribe all available cores?
Alternatively, is there any other dataset that I can use with the UFS model for the aforementioned launch configurations?