Benchmarking the MR Weather model with user-defined MPI ranks/OpenMP threads

I am trying to benchmark the MR Weather model application by fully subscribing all 128 available cores on our machines (AMD EPYC 7742, AMD EPYC 7713).
In the past, we have benchmarked the WRF weather application with 16 MPI ranks x 8 threads per process, 32 MPI ranks x 4 threads per process, and 128 MPI ranks, and we aim to benchmark the MR Weather model in the same way.
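All three of these configurations fully subscribe the node, since ranks x threads per rank comes to 128 cores in each case. A quick arithmetic check (plain Python, just for illustration):

```python
# Core accounting for the hybrid MPI/OpenMP configurations above.
# Each tuple is (MPI ranks, OpenMP threads per rank).
configs = [(16, 8), (32, 4), (128, 1)]

for ranks, threads in configs:
    cores = ranks * threads
    print(f"{ranks} ranks x {threads} threads = {cores} cores")  # always 128
```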

I have managed to create the setup by following the instructions at -   and


The model's default recommended MPI rank counts for the C96, C192, and C384 resolutions are 108, 180, and 252, respectively.
Example:
user1@node001] ./create_newcase --case DORIAN_C96_GFSv15p2 --compset GFSv15p2 --res C96 --workflow ufs-mrweather --machine cluster1
user1@node001] cd DORIAN_C96_GFSv15p2
user1@node001] ./pelayout
ATM :    108/     1;      0
user1@node001] ./xmlchange NTASKS_ATM=128
For your changes to take effect, run:
./case.setup --reset
user1@node001]  ./case.setup --reset
user1@node001]  ./
Building case in directory /home/user1/Milan_UFS/1.1.0_intel/src/my_ufs_sandbox/cime/scripts/DORIAN_C96_GFSv15p2
sharedlib_only is False
model_only is False
Setting Environment OMP_STACKSIZE=256M
Setting Environment OMP_NUM_THREADS=1
Generating component namelists as part of build
Creating component namelists
  2021-07-23 09:45:19 atm
   Calling /home/user1/Milan_UFS/1.1.0_intel/src/my_ufs_sandbox/src/model/FV3/cime/cime_config/buildnml
Checking /home/user1/Milan_UFS/1.1.0_intel/datasetmrweather/inputs/ufs_inputdata/icfiles/201908/20190829 directory to find raw input files ...
Found 'atm.input.ic.grb2' in input directory
CHGRES namelist option 'input_type' is set to grib2
CHGRES will use /home/user1/Milan_UFS/1.1.0_intel/datasetmrweather/inputs/ufs_inputdata/icfiles/201908/20190829/atm.input.ic.grb2
removing file /home/user1/Milan_UFS/1.1.0_intel/src/my_ufs_sandbox/cime/scripts/DORIAN_C96_GFSv15p2/Buildconf/ufsatm.input_data_list
input_type grib2 nstf_name ['0', '1', '0', '0', '0']
ERROR: Total number of PE need to be consistent with the model namelist options:
        Total number of PE (ntask_atm) = 128
        Decomposition in x and y direction (layout) = 4x4
        Number of tile (ntiles) = 6
        Number of I/O group (write_groups) = 1
        Number of tasks in each I/O group (write_tasks_per_group) = 12

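The error message above amounts to a simple accounting identity: the total task count must equal layout_x * layout_y * ntiles (the compute ranks over the six cubed-sphere tiles) plus write_groups * write_tasks_per_group (the I/O ranks). A quick check (plain Python sketch, not part of the model) using the values printed in the error shows why the default namelist is consistent with 108 tasks but rejects NTASKS_ATM=128:

```python
# Consistency check for the FV3 PE layout, using the values
# reported in the error message above (illustrative only).
layout_x, layout_y = 4, 4          # decomposition in x and y per tile
ntiles = 6                         # cubed-sphere tiles
write_groups = 1                   # I/O groups
write_tasks_per_group = 12         # tasks per I/O group

required = layout_x * layout_y * ntiles + write_groups * write_tasks_per_group
print(required)          # 4*4*6 + 1*12 = 108, the default rank count
print(required == 128)   # False -> hence the consistency error
```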

I understand that general recommendations like 128 MPI ranks, 32 MPI ranks x 4 threads/rank, and 16 MPI ranks x 8 threads/rank may not be directly applicable here, as the rank count appears to be governed by the PE layout formula.
Could you please advise on how to customize this application (or its input) so that I can subscribe all available cores?
Alternatively, is there another dataset I can use with the UFS model for the aforementioned launch configurations?
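For reference, a small script can enumerate (layout, I/O-task) combinations whose totals come to exactly 128. The variable names here are just the namelist quantities from the error message; this is a sketch of the arithmetic only, and whether a given layout is actually valid for C96 still depends on the model's own grid and decomposition constraints:

```python
# Enumerate PE layouts whose total equals exactly 128 tasks (sketch).
# total = lx * ly * ntiles + I/O tasks, with ntiles = 6 cubed-sphere tiles.
ntiles = 6
target = 128

for lx in range(1, 13):
    for ly in range(1, 13):
        compute = lx * ly * ntiles
        io = target - compute
        if 0 < io <= 16:  # leave a modest number of tasks for I/O
            print(f"layout = {lx}x{ly} ({compute} compute tasks) "
                  f"+ {io} I/O tasks = {target}")
```

For example, a 4x5 layout gives 120 compute tasks, leaving 8 tasks for one write group, for a total of 128.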


The issue is resolved.
I was able to modify the run for 128 and 32 MPI processes per node.
However, the 32-process runs are unable to spawn the threads specified via OMP_NUM_THREADS.
I will raise a separate thread for that; the current thread can be closed.