Error in Write-Out Frequency on Hera

I am attempting a 1 km horizontal resolution run on Hera, but I am running into an error that I did not see when running on Jet. The simulation runs fine and creates the expected netCDF files (dynf000.nc, phyf000.nc, etc.) every hour, but the data inside the files only changes every 13 hours. For example, phyf000.nc through phyf012.nc all have exactly the same values, while phyf013.nc updates to the correct values, and phyf013.nc through phyf026.nc then all have the same values. The time stamp in each file does change. This error does not happen in my 3 km simulations.

For reference, here is what I have set up for the write component:

write_groups:            20
write_tasks_per_group:   4
nfhout:     1
nfhmax_hf:  60
nfhout_hf:  1
nsout:      -1

fdiag = 1

Are there any other settings for the write component that I might be missing? I have been increasing write_groups and write_tasks_per_group, but do I need to increase them even further?

Thank you!
-David

Thank you, Linlin! My directory is here:

/scratch2/BMC/wrfruc/David.M.Wright/expt_dir/TestH7_GL_1km_Dec2017_JetCopy_MYNNUpdate

Jeff, I have fhcyc = 0.0.

The run in this directory seems to have crashed partway through:

 in fcst run phase 2, na=        3354
srun: error: h23c30: task 468: Killed
srun: launch/slurm: _step_signal: Terminating StepId=21906288.0
slurmstepd: error: *** STEP 21906288.0 ON h1c03 CANCELLED AT 2021-08-25T19:44:59 ***
forrtl: error (78): process killed (SIGTERM)
Image              PC                Routine            Line        Source
ufs_model          0000000004059C8F  Unknown               Unknown  Unknown
libpthread-2.17.s  00002B34AEE0E630  Unknown               Unknown  Unknown
ufs_model          000000000270DC90  Unknown               Unknown  Unknown
forrtl: error (78): process killed (SIGTERM)

Sorry, Linlin. Try this log file:

/scratch2/BMC/wrfruc/David.M.Wright/expt_dir/TestH3_GL_1km_Dec2017_MYNNUpdate/log/run_fcst_2017122412.log.0


Although the simulation in this log file did not complete because it hit the wallclock limit, it shows the same problem described in the original post. Some settings (write_groups, write_tasks_per_group) differ slightly from what I originally posted, but the results are the same.

I checked the directory /scratch2/BMC/wrfruc/David.M.Wright/expt_dir/TestH3_GL_1km_Dec2017_MYNNUpdate/2017122412 and don't see the output you described.

Is it the correct directory?

Thanks!

My apologies! I forgot that I had moved the data off the machine to save space. I have just recreated a run here:


/scratch2/BMC/wrfruc/David.M.Wright/expt_dir/TestH7_GL_1km_Dec2017_JetCopy_MYNNUpdate/2017122412

This one ended a few hours early because of my poor wallclock estimate, but the outcome is the same: the output in the .nc files only changes every 13 hours.

Thank you for looking at this!
-David

A similar issue was reported elsewhere and has been fixed in recent updates:

https://github.com/ufs-community/ufs-weather-model/issues/339

https://github.com/ufs-community/ufs-weather-model/issues/674

Another issue in the regional workflow is that the conda-loading step needs to be updated; see, e.g.,

https://github.com/NOAA-EMC/regional_workflow/pull/542

Thanks,

Linlin

David - This is the PR that fixed the output file interval issue in ufs-weather-model:

https://github.com/ufs-community/ufs-weather-model/pull/691

It also includes some other changes, so you may want to update to the latest develop branch if possible. Otherwise, the issue Linlin listed above (#339) describes a workaround: make sure that the namelist input.nml and model.configure are consistent.

Laurie

Thank you, Linlin and Laurie, for the responses! I (embarrassingly) figured out that the issue was my dt_atmos setting and not the write component. I had accidentally set dt_atmos to 13 s instead of 12 s. Because 3600 s is not a multiple of 13 s, a model timestep only lands exactly on the hour every 13 hours (13 × 3600 s = 46,800 s = 3,600 timesteps), which explains why the output was not updating. Changing it to 12 s solved the problem with both the newer and older pulls of ufs-weather-model.
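The timestep arithmetic above can be checked before submitting a run. This is a minimal Python sketch, assuming a fixed integer dt_atmos in seconds starting at t = 0 and hourly output (nfhout = 1). The function names are hypothetical and are not part of ufs-weather-model or the regional workflow.

```python
import math

SECONDS_PER_HOUR = 3600


def aligned_output_hours(dt_atmos_s: int, max_hour: int) -> list[int]:
    """Forecast hours in 0..max_hour that fall exactly on a model timestep.

    Hypothetical helper; assumes a fixed integer timestep starting at t = 0.
    """
    return [h for h in range(max_hour + 1)
            if (h * SECONDS_PER_HOUR) % dt_atmos_s == 0]


def hours_between_aligned_steps(dt_atmos_s: int) -> int:
    """Hours between successive times that are both a whole hour and a whole timestep.

    The first such time after t = 0 is lcm(3600, dt_atmos_s) seconds.
    """
    return math.lcm(SECONDS_PER_HOUR, dt_atmos_s) // SECONDS_PER_HOUR


if __name__ == "__main__":
    print(aligned_output_hours(13, 26))      # [0, 13, 26]
    print(hours_between_aligned_steps(13))   # 13
    print(hours_between_aligned_steps(12))   # 1
```

A simple pre-flight check in the same spirit is `3600 % dt_atmos == 0`: any dt_atmos that divides 3600 (such as 12 s) puts a timestep on every hour. `math.lcm` requires Python 3.9 or later.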

Thank you again for your time!!

-David