Hi,
I am trying to run the MR Weather model example with 32 MPI ranks and 4 OpenMP threads per MPI rank.
The run fails with the following error:
[0] Rayleigh friction E-folding time (days):
[0] 1 0.379150775374218 10.8096140887264
[0] 2 0.963871677296582 16.5818681310322
[0] 3 1.76542623475949 29.0560344329854
[0] 4 2.67225797307616 53.4602794273955
[0] 5 3.70625064534251 110.544757762893
[0] 6 4.88725381108638 293.610300159162
[0] 7 6.23670999273840 1568.14167700556
[20]
[20] FATAL from PE 20: NaN in input field of mpp_reproducing_sum(_2d), this indicates numerical instability
[20]
[20] Abort(1) on node 20 (rank 20 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 20
[15]
[15] FATAL from PE 15: NaN in input field of mpp_reproducing_sum(_2d), this indicates numerical instability
[15]
[15] Abort(1) on node 15 (rank 15 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 15
[18]
[18] FATAL from PE 18: Overflow in mpp_reproducing_sum(_2d) conversion of 4.73530E+57
[18]
[18] Abort(1) on node 18 (rank 18 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 18
[22]
[22] FATAL from PE 22: NaN in input field of mpp_reproducing_sum(_2d), this indicates numerical instability
[22]
[22] Abort(1) on node 22 (rank 22 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 22
I used the following in user_nl_ufsatm:
layout = 2,2
write_groups = 8
write_tasks_per_group = 1
Do I need to use different layout/write_groups/write_tasks_per_group values to make hybrid (MPI+OpenMP) runs work?
Also, if the values in user_nl_ufsatm could be triggering this issue, are there any recommendations for selecting them?
Please advise.
So far I have successfully tested a run with 128 MPI ranks x 1 OpenMP thread using the following user_nl_ufsatm:
layout = 2,4
write_groups = 80
write_tasks_per_group = 1
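For reference, here is a quick check of the task arithmetic for both configurations. It assumes the usual relation for a global cubed-sphere grid: 6 tiles x layout_x x layout_y compute tasks, plus write_groups x write_tasks_per_group write tasks. That relation and the helper name total_mpi_tasks are my assumptions, not something taken from the model documentation.

```python
# Sketch only: assumes a global cubed-sphere grid with 6 tiles, where
# total MPI tasks = 6 * layout_x * layout_y + write_groups * write_tasks_per_group.
# The function name and the relation itself are assumptions for illustration.
NTILES = 6


def total_mpi_tasks(layout, write_groups, write_tasks_per_group, ntiles=NTILES):
    """Return the MPI task count implied by the namelist settings."""
    layout_x, layout_y = layout
    compute_tasks = ntiles * layout_x * layout_y          # forecast (compute) tasks
    write_tasks = write_groups * write_tasks_per_group    # write-component tasks
    return compute_tasks + write_tasks


if __name__ == "__main__":
    # Failing hybrid run: layout = 2,2, write_groups = 8
    print(total_mpi_tasks((2, 2), 8, 1))    # 32
    # Working MPI-only run: layout = 2,4, write_groups = 80
    print(total_mpi_tasks((2, 4), 80, 1))   # 128
```

If that relation holds, both settings match the number of MPI ranks requested (32 and 128), so the rank count by itself does not look mismatched in the failing case.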