FATAL ERROR: OPENING INPUT SOURCE FILE: NetCDF: HDF error

Dear all,

I am currently trying to run the workflow on the Linux platform.

However, when I try to run "./run_make_sfc_climo.sh", I get "FATAL ERROR: OPENING INPUT SOURCE FILE: NetCDF: HDF error".

The NetCDF module I am using is cray-netcdf 4.6.3; I also tried 4.7.4, but the result is the same.

I have attached the Slurm output file. Any help is much appreciated.

 

Thank you,

Haochen

 

Please ignore the GCC error in the Slurm file. The HPC I use sets gcc/8.3.0 as the default, so this error appears when I load 9.3.0, but it does not prevent the program from running.

I am not sure why this program is able to open other files, such as:

- OPEN MODEL GRID MOSAIC FILE: /global/cscratch1/sd/htan2013/UFS_TestCase/TEST_CONUS_25km_GFSv15p2/fix_lam/C403_mosaic.halo4.nc

 - OPEN FIRST MODEL GRID OROGRAPHY FILE: /global/cscratch1/sd/htan2013/UFS_TestCase/TEST_CONUS_25km_GFSv15p2/fix_lam/C403_oro_data.tile7.halo4.nc

 

But it cannot open:

- OPEN FILE: /global/homes/h/htan2013/UFS_Static/sfc_climo/vegetation_type.igbp.0.05.nc

 

 

 

SLURM.out:

Module Deprecation Warning: upgrade to 3.21.3 (new default on 2021-09-21)
gcc/8.3.0(3):ERROR:150: Module 'gcc/8.3.0' conflicts with the currently loaded module(s) 'gcc/9.3.0'
gcc/8.3.0(3):ERROR:102: Tcl command execution failed: conflict gcc

+ source /global/cscratch1/sd/htan2013/UFS_TestCase/TEST_CONUS_25km_GFSv15p2/var_defns.sh
++ RUN_ENVIR=community
++ MACHINE=LINUX
++ ACCOUNT=htan2013
++ WORKFLOW_MANAGER=none
++ SCHED=none
++ PARTITION_DEFAULT=
++ CLUSTERS_DEFAULT=
++ QUEUE_DEFAULT=
++ PARTITION_HPSS=
++ CLUSTERS_HPSS=
++ QUEUE_HPSS=
++ PARTITION_FCST=
++ CLUSTERS_FCST=
++ QUEUE_FCST=
++ RUN_CMD_UTILS='mpirun -np 1'
++ RUN_CMD_POST='mpirun -np 1'
++ USE_CRON_TO_RELAUNCH=FALSE
++ CRON_RELAUNCH_INTVL_MNTS=03
++ EXPT_BASEDIR=/global/cscratch1/sd/htan2013/UFS_TestCase
++ EXPT_SUBDIR=TEST_CONUS_25km_GFSv15p2
++ COMINgfs=/base/path/of/directory/containing/gfs/input/files
++ STMP=/base/path/of/directory/containing/model/input/and/raw/output/files
++ NET=rrfs
++ envir=para
++ RUN=experiment_name
++ PTMP=/base/path/of/directory/containing/postprocessed/output/files
++ DOT_OR_USCORE=_
++ EXPT_CONFIG_FN=config.sh
++ RGNL_GRID_NML_FN=regional_grid.nml
++ DATA_TABLE_FN=data_table
++ DIAG_TABLE_FN=diag_table
++ FIELD_TABLE_FN=field_table
++ FV3_NML_BASE_SUITE_FN=input.nml.FV3
++ FV3_NML_YAML_CONFIG_FN=FV3.input.yml
++ FV3_NML_BASE_ENS_FN=input.nml.base_ens
++ MODEL_CONFIG_FN=model_configure
++ NEMS_CONFIG_FN=nems.configure
++ FV3_EXEC_FN=NEMS.exe
++ WFLOW_XML_FN=FV3LAM_wflow.xml
++ GLOBAL_VAR_DEFNS_FN=var_defns.sh
++ EXTRN_MDL_ICS_VAR_DEFNS_FN=extrn_mdl_ics_var_defns.sh
++ EXTRN_MDL_LBCS_VAR_DEFNS_FN=extrn_mdl_lbcs_var_defns.sh
++ WFLOW_LAUNCH_SCRIPT_FN=launch_FV3LAM_wflow.sh
++ WFLOW_LAUNCH_LOG_FN=log.launch_FV3LAM_wflow
++ DATE_FIRST_CYCL=20190615
++ DATE_LAST_CYCL=20190615
++ CYCL_HRS=00
++ FCST_LEN_HRS=48
++ EXTRN_MDL_NAME_ICS=FV3GFS
++ EXTRN_MDL_NAME_LBCS=FV3GFS
++ LBC_SPEC_INTVL_HRS=6
++ FV3GFS_FILE_FMT_ICS=grib2
++ FV3GFS_FILE_FMT_LBCS=grib2
++ NOMADS=FALSE
++ NOMADS_file_type=nemsio
++ USE_USER_STAGED_EXTRN_FILES=TRUE
++ EXTRN_MDL_SOURCE_BASEDIR_ICS=/global/homes/h/htan2013/UFS_ICBC
++ EXTRN_MDL_FILES_ICS=gfs.pgrb2.0p25.f000
++ EXTRN_MDL_SOURCE_BASEDIR_LBCS=/global/homes/h/htan2013/UFS_ICBC
++ EXTRN_MDL_FILES_LBCS=("gfs.pgrb2.0p25.f006" "gfs.pgrb2.0p25.f012" "gfs.pgrb2.0p25.f018" "gfs.pgrb2.0p25.f024" "gfs.pgrb2.0p25.f030" "gfs.pgrb2.0p25.f036" "gfs.pgrb2.0p25.f042" "gfs.pgrb2.0p25.f048")
++ CCPP_PHYS_SUITE=FV3_GFS_v15p2
++ GRID_GEN_METHOD=ESGgrid
++ GFDLgrid_LON_T6_CTR=
++ GFDLgrid_LAT_T6_CTR=
++ GFDLgrid_RES=
++ GFDLgrid_STRETCH_FAC=
++ GFDLgrid_REFINE_RATIO=
++ GFDLgrid_ISTART_OF_RGNL_DOM_ON_T6G=
++ GFDLgrid_IEND_OF_RGNL_DOM_ON_T6G=
++ GFDLgrid_JSTART_OF_RGNL_DOM_ON_T6G=
++ GFDLgrid_JEND_OF_RGNL_DOM_ON_T6G=
++ GFDLgrid_USE_GFDLgrid_RES_IN_FILENAMES=
++ ESGgrid_LON_CTR=-97.5
++ ESGgrid_LAT_CTR=38.55
++ ESGgrid_DELX=25000.0
++ ESGgrid_DELY=25000.0
++ ESGgrid_NX=202
++ ESGgrid_NY=117
++ ESGgrid_WIDE_HALO_WIDTH=6
++ DT_ATMOS=300
++ LAYOUT_X=5
++ LAYOUT_Y=2
++ BLOCKSIZE=40
++ QUILTING=TRUE
++ PRINT_ESMF=FALSE
++ WRTCMP_write_groups=1
++ WRTCMP_write_tasks_per_group=2
++ WRTCMP_output_grid=lambert_conformal
++ WRTCMP_cen_lon=-97.5
++ WRTCMP_cen_lat=38.52
++ WRTCMP_lon_lwr_left=-121.34461033
++ WRTCMP_lat_lwr_left=22.95987123
++ WRTCMP_lon_upr_rght=
++ WRTCMP_lat_upr_rght=
++ WRTCMP_dlon=
++ WRTCMP_dlat=
++ WRTCMP_stdlat1=38.52
++ WRTCMP_stdlat2=38.52
++ WRTCMP_nx=201
++ WRTCMP_ny=115
++ WRTCMP_dx=25000.0
++ WRTCMP_dy=25000.0
++ PREDEF_GRID_NAME=RRFS_CONUS_25km
++ PREEXISTING_DIR_METHOD=rename
++ VERBOSE=TRUE
++ RUN_TASK_MAKE_GRID=TRUE
++ GRID_DIR=/global/cscratch1/sd/htan2013/UFS_TestCase/TEST_CONUS_25km_GFSv15p2/grid
++ RUN_TASK_MAKE_OROG=TRUE
++ OROG_DIR=/global/cscratch1/sd/htan2013/UFS_TestCase/TEST_CONUS_25km_GFSv15p2/orog
++ RUN_TASK_MAKE_SFC_CLIMO=TRUE
++ SFC_CLIMO_DIR=/global/cscratch1/sd/htan2013/UFS_TestCase/TEST_CONUS_25km_GFSv15p2/sfc_climo
++ SFC_CLIMO_FIELDS=("facsf" "maximum_snow_albedo" "slope_type" "snowfree_albedo" "soil_type" "substrate_temperature" "vegetation_greenness" "vegetation_type")
++ FIXgsm=/global/homes/h/htan2013/UFS_Static/fix_am
++ TOPO_DIR=/global/homes/h/htan2013/UFS_Static/fix_orog
++ SFC_CLIMO_INPUT_DIR=/global/homes/h/htan2013/UFS_Static/sfc_climo
++ FNGLAC=global_glacier.2x2.grb
++ FNMXIC=global_maxice.2x2.grb
++ FNTSFC=RTGSST.1982.2012.monthly.clim.grb
++ FNSNOC=global_snoclim.1.875.grb
++ FNZORC=igbp
++ FNAISC=CFSR.SEAICE.1982.2012.monthly.clim.grb
++ FNSMCC=global_soilmgldas.t126.384.190.grb
++ FNMSKH=seaice_newland.grb
++ FIXgsm_FILES_TO_COPY_TO_FIXam=("global_glacier.2x2.grb" "global_maxice.2x2.grb" "RTGSST.1982.2012.monthly.clim.grb" "global_snoclim.1.875.grb" "CFSR.SEAICE.1982.2012.monthly.clim.grb" "global_soilmgldas.t126.384.190.grb" "seaice_newland.grb" "global_climaeropac_global.txt" "fix_co2_proj/global_co2historicaldata_2010.txt" "fix_co2_proj/global_co2historicaldata_2011.txt" "fix_co2_proj/global_co2historicaldata_2012.txt" "fix_co2_proj/global_co2historicaldata_2013.txt" "fix_co2_proj/global_co2historicaldata_2014.txt" "fix_co2_proj/global_co2historicaldata_2015.txt" "fix_co2_proj/global_co2historicaldata_2016.txt" "fix_co2_proj/global_co2historicaldata_2017.txt" "fix_co2_proj/global_co2historicaldata_2018.txt" "global_co2historicaldata_glob.txt" "co2monthlycyc.txt" "global_h2o_pltc.f77" "global_hyblev.l65.txt" "global_zorclim.1x1.grb" "global_sfc_emissivity_idx.txt" "global_solarconstant_noaa_an.txt" "ozprdlos_2015_new_sbuvO3_tclm15_nuchem.f77")
++ FV3_NML_VARNAME_TO_FIXam_FILES_MAPPING=("FNGLAC | global_glacier.2x2.grb" "FNMXIC | global_maxice.2x2.grb" "FNTSFC | RTGSST.1982.2012.monthly.clim.grb" "FNSNOC | global_snoclim.1.875.grb" "FNAISC | CFSR.SEAICE.1982.2012.monthly.clim.grb" "FNSMCC | global_soilmgldas.t126.384.190.grb" "FNMSKH | seaice_newland.grb")
++ FV3_NML_VARNAME_TO_SFC_CLIMO_FIELD_MAPPING=("FNALBC  | snowfree_albedo" "FNALBC2 | facsf" "FNTG3C  | substrate_temperature" "FNVEGC  | vegetation_greenness" "FNVETC  | vegetation_type" "FNSOTC  | soil_type" "FNVMNC  | vegetation_greenness" "FNVMXC  | vegetation_greenness" "FNSLPC  | slope_type" "FNABSC  | maximum_snow_albedo")
++ CYCLEDIR_LINKS_TO_FIXam_FILES_MAPPING=("aerosol.dat                | global_climaeropac_global.txt" "co2historicaldata_2010.txt | fix_co2_proj/global_co2historicaldata_2010.txt" "co2historicaldata_2011.txt | fix_co2_proj/global_co2historicaldata_2011.txt" "co2historicaldata_2012.txt | fix_co2_proj/global_co2historicaldata_2012.txt" "co2historicaldata_2013.txt | fix_co2_proj/global_co2historicaldata_2013.txt" "co2historicaldata_2014.txt | fix_co2_proj/global_co2historicaldata_2014.txt" "co2historicaldata_2015.txt | fix_co2_proj/global_co2historicaldata_2015.txt" "co2historicaldata_2016.txt | fix_co2_proj/global_co2historicaldata_2016.txt" "co2historicaldata_2017.txt | fix_co2_proj/global_co2historicaldata_2017.txt" "co2historicaldata_2018.txt | fix_co2_proj/global_co2historicaldata_2018.txt" "co2historicaldata_glob.txt | global_co2historicaldata_glob.txt" "co2monthlycyc.txt          | co2monthlycyc.txt" "global_h2oprdlos.f77       | global_h2o_pltc.f77" "global_zorclim.1x1.grb     | global_zorclim.1x1.grb" "sfc_emissivity_idx.txt     | global_sfc_emissivity_idx.txt" "solarconstant_noaa_an.txt  | global_solarconstant_noaa_an.txt" "global_o3prdlos.f77        | ozprdlos_2015_new_sbuvO3_tclm15_nuchem.f77")
++ MAKE_GRID_TN=make_grid
++ MAKE_OROG_TN=make_orog
++ MAKE_SFC_CLIMO_TN=make_sfc_climo
++ GET_EXTRN_ICS_TN=get_extrn_ics
++ GET_EXTRN_LBCS_TN=get_extrn_lbcs
++ MAKE_ICS_TN=make_ics
++ MAKE_LBCS_TN=make_lbcs
++ RUN_FCST_TN=run_fcst
++ RUN_POST_TN=run_post
++ NNODES_MAKE_GRID=1
++ NNODES_MAKE_OROG=1
++ NNODES_MAKE_SFC_CLIMO=2
++ NNODES_GET_EXTRN_ICS=1
++ NNODES_GET_EXTRN_LBCS=1
++ NNODES_MAKE_ICS=4
++ NNODES_MAKE_LBCS=4
++ NNODES_RUN_FCST=1
++ NNODES_RUN_POST=2
++ PPN_MAKE_GRID=24
++ PPN_MAKE_OROG=24
++ PPN_MAKE_SFC_CLIMO=24
++ PPN_GET_EXTRN_ICS=1
++ PPN_GET_EXTRN_LBCS=1
++ PPN_MAKE_ICS=12
++ PPN_MAKE_LBCS=12
++ PPN_RUN_FCST=24
++ PPN_RUN_POST=24
++ WTIME_MAKE_GRID=00:20:00
++ WTIME_MAKE_OROG=00:20:00
++ WTIME_MAKE_SFC_CLIMO=00:20:00
++ WTIME_GET_EXTRN_ICS=00:45:00
++ WTIME_GET_EXTRN_LBCS=00:45:00
++ WTIME_MAKE_ICS=00:30:00
++ WTIME_MAKE_LBCS=00:30:00
++ WTIME_RUN_FCST=06:00:00
++ WTIME_RUN_POST=00:15:00
++ MAXTRIES_MAKE_GRID=1
++ MAXTRIES_MAKE_OROG=1
++ MAXTRIES_MAKE_SFC_CLIMO=1
++ MAXTRIES_GET_EXTRN_ICS=1
++ MAXTRIES_GET_EXTRN_LBCS=1
++ MAXTRIES_MAKE_ICS=1
++ MAXTRIES_MAKE_LBCS=1
++ MAXTRIES_RUN_FCST=1
++ MAXTRIES_RUN_POST=1
++ USE_CUSTOM_POST_CONFIG_FILE=FALSE
++ CUSTOM_POST_CONFIG_FP=
++ DO_ENSEMBLE=FALSE
++ NUM_ENS_MEMBERS=1
++ DO_SHUM=FALSE
++ DO_SPPT=FALSE
++ DO_SKEB=FALSE
++ SHUM_MAG=-999.0
++ SHUM_LSCALE=150000
++ SHUM_TSCALE=21600
++ SHUM_INT=3600
++ SPPT_MAG=-999.0
++ SPPT_LSCALE=150000
++ SPPT_TSCALE=21600
++ SPPT_INT=3600
++ SKEB_MAG=-999.0
++ SKEB_LSCALE=150000
++ SKEB_TSCALE=21600
++ SKEB_INT=3600
++ SKEB_VDOF=10
++ USE_ZMTNBLCK=false
++ HALO_BLEND=10
++ USE_FVCOM=FALSE
++ FVCOM_DIR=/user/defined/dir/to/fvcom/data
++ FVCOM_FILE=fvcom.nc
++ COMPILER=intel
++ WFLOW_LAUNCH_SCRIPT_FP=/global/u2/h/htan2013/ufs-srweather-app/regional_workflow/ush/launch_FV3LAM_wflow.sh
++ WFLOW_LAUNCH_LOG_FP=/global/cscratch1/sd/htan2013/UFS_TestCase/TEST_CONUS_25km_GFSv15p2/log.launch_FV3LAM_wflow
++ CRONTAB_LINE=
++ SR_WX_APP_TOP_DIR=/global/u2/h/htan2013/ufs-srweather-app
++ HOMErrfs=/global/u2/h/htan2013/ufs-srweather-app/regional_workflow
++ USHDIR=/global/u2/h/htan2013/ufs-srweather-app/regional_workflow/ush
++ SCRIPTSDIR=/global/u2/h/htan2013/ufs-srweather-app/regional_workflow/scripts
++ JOBSDIR=/global/u2/h/htan2013/ufs-srweather-app/regional_workflow/jobs
++ SORCDIR=/global/u2/h/htan2013/ufs-srweather-app/regional_workflow/sorc
++ SRC_DIR=/global/u2/h/htan2013/ufs-srweather-app/src
++ PARMDIR=/global/u2/h/htan2013/ufs-srweather-app/regional_workflow/parm
++ MODULES_DIR=/global/u2/h/htan2013/ufs-srweather-app/regional_workflow/modulefiles
++ EXECDIR=/global/u2/h/htan2013/ufs-srweather-app/bin
++ FIXrrfs=/global/u2/h/htan2013/ufs-srweather-app/regional_workflow/fix
++ FIXam=/global/cscratch1/sd/htan2013/UFS_TestCase/TEST_CONUS_25km_GFSv15p2/fix_am
++ FIXLAM=/global/cscratch1/sd/htan2013/UFS_TestCase/TEST_CONUS_25km_GFSv15p2/fix_lam
++ FIXgsm=/global/homes/h/htan2013/UFS_Static/fix_am
++ COMROOT=
++ COMOUT_BASEDIR=
++ TEMPLATE_DIR=/global/u2/h/htan2013/ufs-srweather-app/regional_workflow/ush/templates
++ UFS_WTHR_MDL_DIR=/global/u2/h/htan2013/ufs-srweather-app/src/ufs_weather_model
++ UFS_UTILS_DIR=/global/u2/h/htan2013/ufs-srweather-app/src/UFS_UTILS
++ SFC_CLIMO_INPUT_DIR=/global/homes/h/htan2013/UFS_Static/sfc_climo
++ TOPO_DIR=/global/homes/h/htan2013/UFS_Static/fix_orog
++ EMC_POST_DIR=/global/u2/h/htan2013/ufs-srweather-app/src/EMC_post
++ EXPTDIR=/global/cscratch1/sd/htan2013/UFS_TestCase/TEST_CONUS_25km_GFSv15p2
++ LOGDIR=/global/cscratch1/sd/htan2013/UFS_TestCase/TEST_CONUS_25km_GFSv15p2/log
++ CYCLE_BASEDIR=/global/cscratch1/sd/htan2013/UFS_TestCase/TEST_CONUS_25km_GFSv15p2
++ GRID_DIR=/global/cscratch1/sd/htan2013/UFS_TestCase/TEST_CONUS_25km_GFSv15p2/grid
++ OROG_DIR=/global/cscratch1/sd/htan2013/UFS_TestCase/TEST_CONUS_25km_GFSv15p2/orog
++ SFC_CLIMO_DIR=/global/cscratch1/sd/htan2013/UFS_TestCase/TEST_CONUS_25km_GFSv15p2/sfc_climo
++ NDIGITS_ENSMEM_NAMES=0
++ ENSMEM_NAMES=("")
++ FV3_NML_ENSMEM_FPS=("")
++ GLOBAL_VAR_DEFNS_FP=/global/cscratch1/sd/htan2013/UFS_TestCase/TEST_CONUS_25km_GFSv15p2/var_defns.sh
++ DATA_TABLE_TMPL_FN=data_table
++ DIAG_TABLE_TMPL_FN=diag_table.FV3_GFS_v15p2
++ FIELD_TABLE_TMPL_FN=field_table.FV3_GFS_v15p2
++ MODEL_CONFIG_TMPL_FN=model_configure
++ NEMS_CONFIG_TMPL_FN=nems.configure
++ DATA_TABLE_TMPL_FP=/global/u2/h/htan2013/ufs-srweather-app/regional_workflow/ush/templates/data_table
++ DIAG_TABLE_TMPL_FP=/global/u2/h/htan2013/ufs-srweather-app/regional_workflow/ush/templates/diag_table.FV3_GFS_v15p2
++ FIELD_TABLE_TMPL_FP=/global/u2/h/htan2013/ufs-srweather-app/regional_workflow/ush/templates/field_table.FV3_GFS_v15p2
++ FV3_NML_BASE_SUITE_FP=/global/u2/h/htan2013/ufs-srweather-app/regional_workflow/ush/templates/input.nml.FV3
++ FV3_NML_YAML_CONFIG_FP=/global/u2/h/htan2013/ufs-srweather-app/regional_workflow/ush/templates/FV3.input.yml
++ FV3_NML_BASE_ENS_FP=/global/cscratch1/sd/htan2013/UFS_TestCase/TEST_CONUS_25km_GFSv15p2/input.nml.base_ens
++ MODEL_CONFIG_TMPL_FP=/global/u2/h/htan2013/ufs-srweather-app/regional_workflow/ush/templates/model_configure
++ NEMS_CONFIG_TMPL_FP=/global/u2/h/htan2013/ufs-srweather-app/regional_workflow/ush/templates/nems.configure
++ CCPP_PHYS_SUITE_FN=suite_FV3_GFS_v15p2.xml
++ CCPP_PHYS_SUITE_IN_CCPP_FP=/global/u2/h/htan2013/ufs-srweather-app/src/ufs_weather_model/FV3/ccpp/suites/suite_FV3_GFS_v15p2.xml
++ CCPP_PHYS_SUITE_FP=/global/cscratch1/sd/htan2013/UFS_TestCase/TEST_CONUS_25km_GFSv15p2/suite_FV3_GFS_v15p2.xml
++ DATA_TABLE_FP=/global/cscratch1/sd/htan2013/UFS_TestCase/TEST_CONUS_25km_GFSv15p2/data_table
++ FIELD_TABLE_FP=/global/cscratch1/sd/htan2013/UFS_TestCase/TEST_CONUS_25km_GFSv15p2/field_table
++ FV3_NML_FN=input.nml
++ FV3_NML_FP=/global/cscratch1/sd/htan2013/UFS_TestCase/TEST_CONUS_25km_GFSv15p2/input.nml
++ NEMS_CONFIG_FP=/global/cscratch1/sd/htan2013/UFS_TestCase/TEST_CONUS_25km_GFSv15p2/nems.configure
++ FV3_EXEC_FP=/global/u2/h/htan2013/ufs-srweather-app/bin/NEMS.exe
++ LOAD_MODULES_RUN_TASK_FP=/global/u2/h/htan2013/ufs-srweather-app/regional_workflow/ush/load_modules_run_task.sh
++ GTYPE=regional
++ TILE_RGNL=7
++ NH0=0
++ NH3=3
++ NH4=4
++ LON_CTR=-97.5
++ LAT_CTR=38.55
++ NX=202
++ NY=117
++ NHW=6
++ STRETCH_FAC=0.999
++ RES_IN_FIXLAM_FILENAMES=
++ CRES=C403
++ DEL_ANGLE_X_SG=0.1124152007
++ DEL_ANGLE_Y_SG=0.1124152007
++ NEG_NX_OF_DOM_WITH_WIDE_HALO=-214
++ NEG_NY_OF_DOM_WITH_WIDE_HALO=-129
++ OZONE_PARAM=ozphys_2015
++ EXTRN_MDL_SYSBASEDIR_ICS=
++ EXTRN_MDL_SYSBASEDIR_LBCS=
++ EXTRN_MDL_LBCS_OFFSET_HRS=0
++ LBC_SPEC_FCST_HRS=(6 12 18 24 30 36 42 48)
++ NUM_CYCLES=1
++ ALL_CDATES=("2019061500")
++ USE_FVCOM=FALSE
++ FVCOM_DIR=/user/defined/dir/to/fvcom/data
++ FVCOM_FILE=fvcom.nc
++ NCORES_PER_NODE=
++ PE_MEMBER01=12
++ RUN_CMD_FCST='mpirun -np 12'
+ export CDATE=2019061500
+ CDATE=2019061500
+ export CYCLE_DIR=/global/cscratch1/sd/htan2013/UFS_TestCase/TEST_CONUS_25km_GFSv15p2/2019061500
+ CYCLE_DIR=/global/cscratch1/sd/htan2013/UFS_TestCase/TEST_CONUS_25km_GFSv15p2/2019061500
+ /global/u2/h/htan2013/ufs-srweather-app/regional_workflow/jobs/JREGIONAL_MAKE_SFC_CLIMO
Module Deprecation Warning: upgrade to 3.21.3 (new default on 2021-09-21)
gcc/8.3.0(3):ERROR:150: Module 'gcc/8.3.0' conflicts with the currently loaded module(s) 'gcc/9.3.0'
gcc/8.3.0(3):ERROR:102: Tcl command execution failed: conflict gcc

Warning: Python module not loaded, you already have Python loaded via conda init

========================================================================
Entering script:  "JREGIONAL_MAKE_SFC_CLIMO"
In directory:     "/global/u2/h/htan2013/ufs-srweather-app/regional_workflow/jobs"

This is the J-job script for the task that generates surface fields from
climatology.
========================================================================
Module Deprecation Warning: upgrade to 3.21.3 (new default on 2021-09-21)
gcc/8.3.0(3):ERROR:150: Module 'gcc/8.3.0' conflicts with the currently loaded module(s) 'gcc/9.3.0'
gcc/8.3.0(3):ERROR:102: Tcl command execution failed: conflict gcc

Warning: Python module not loaded, you already have Python loaded via conda init

========================================================================
Entering script:  "exregional_make_sfc_climo.sh"
In directory:     "/global/u2/h/htan2013/ufs-srweather-app/regional_workflow/scripts"

This is the ex-script for the task that generates surface fields from
climatology.
========================================================================

The arguments to the script in file

  "/global/u2/h/htan2013/ufs-srweather-app/regional_workflow/scripts/exregional_make_sfc_climo.sh"

have been set as follows:

  declare -- workdir="/global/cscratch1/sd/htan2013/UFS_TestCase/TEST_CONUS_25km_GFSv15p2/sfc_climo/tmp"
 - INITIALIZE ESMF
 - CALL VMGetGlobal
 - CALL VMGet
 - NPETS IS             1
 - LOCAL PET            0
 - READ SETUP NAMELIST, LOCALPET:            0
 - OPEN MODEL GRID MOSAIC FILE: /global/cscratch1/sd/htan2013/UFS_TestCase/TEST_CONUS_25km_GFSv15p2/fix_lam/C403_mosaic.halo4.nc
 - READ NUMBER OF TILES
 - READ TILE NAMES
 - NUMBER OF TILES, MODEL GRID IS            1
 - OPEN FIRST MODEL GRID OROGRAPHY FILE: /global/cscratch1/sd/htan2013/UFS_TestCase/TEST_CONUS_25km_GFSv15p2/fix_lam/C403_oro_data.tile7.halo4.nc
 - READ GRID DIMENSIONS
 - I/J DIMENSIONS OF THE MODEL GRID TILES          210         125
 - CALL GridCreateMosaic FOR MODEL GRID
 - CALL FieldCreate FOR DATA INTERPOLATED TO MODEL GRID.
 - CALL FieldCreate FOR VEGETATION TYPE INTERPOLATED TO MODEL GRID.
 - CALL FieldCreate FOR MODEL GRID LATITUDE.
 - CALL FieldCreate FOR MODEL GRID LONGITUDE.
 - CALL FieldCreate FOR MODEL GRID LANDMASK.
 - CALL FieldGet FOR MODEL GRID LANDMASK.
 - READ MODEL OROGRAPHY FILE
 - OPEN FILE: /global/cscratch1/sd/htan2013/UFS_TestCase/TEST_CONUS_25km_GFSv15p2/fix_lam/C403_oro_data.tile7.halo4.nc
 - READ I-DIMENSION
 - READ J-DIMENSION
 - I/J DIMENSIONS:          210         125
 - READ LAND MASK (SLMSK)
 - READ LATITUDE
 - READ LONGITUDE
 - CALL FieldScatter FOR MODEL GRID MASK. TILE IS:            1
 - CALL FieldScatter FOR MODEL LATITUDE. TILE IS:            1
 - CALL FieldScatter FOR MODEL LONGITUDE. TILE IS:            1
 - CALL GridAddItem FOR MODEL GRID.
 - CALL GridGetItem FOR MODEL GRID.
 - OPEN FILE: /global/homes/h/htan2013/UFS_Static/sfc_climo/vegetation_type.igbp.0.05.nc
 
 FATAL ERROR: OPENING INPUT SOURCE FILE: NetCDF: HDF error
 STOP.
Rank 0 [Mon Oct 11 17:21:21 2021] [c1-1c0s5n0] application called MPI_Abort(MPI_COMM_WORLD, 999) - process 0

Program received signal SIGABRT: Process abort signal.

Backtrace for this error:
#0  0x2aaaad18649f in ???
#1  0x2aaaad186420 in ???
#2  0x2aaaad187a00 in ???
#3  0x2aaaac710267 in ???
#4  0x2aaaac61d5e5 in ???
#5  0x2aaaac664a84 in ???
#6  0x2008fc85 in __utils_MOD_netcdf_err
    at /global/homes/h/htan2013/ufs-srweather-app/src/UFS_UTILS/sorc/sfc_climo_gen.fd/utils.f90:26
#7  0x2008d3aa in __source_grid_MOD_define_source_grid
    at /global/homes/h/htan2013/ufs-srweather-app/src/UFS_UTILS/sorc/sfc_climo_gen.fd/source_grid.F90:117
#8  0x20083a56 in driver
    at /global/homes/h/htan2013/ufs-srweather-app/src/UFS_UTILS/sorc/sfc_climo_gen.fd/driver.F90:79
#9  0x20080e46 in main
    at /global/homes/h/htan2013/ufs-srweather-app/src/UFS_UTILS/sorc/sfc_climo_gen.fd/driver.F90:23
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 213124 on node nid02516 exited on signal 6 (Aborted).
--------------------------------------------------------------------------

ERROR:
  From script:  "exregional_make_sfc_climo.sh"
  Full path to script:  "/global/u2/h/htan2013/ufs-srweather-app/regional_workflow/scripts/exregional_make_sfc_climo.sh"
Call to executable (exec_fp) to generate surface climatology files returned
with nonzero exit code:
  exec_fp = "/global/u2/h/htan2013/ufs-srweather-app/bin/sfc_climo_gen"
Exiting with nonzero status.

ERROR:
  From script:  "JREGIONAL_MAKE_SFC_CLIMO"
  Full path to script:  "/global/u2/h/htan2013/ufs-srweather-app/regional_workflow/jobs/JREGIONAL_MAKE_SFC_CLIMO"
Call to ex-script corresponding to J-job "JREGIONAL_MAKE_SFC_CLIMO" failed.
Exiting with nonzero status.
 

The problem seems related to NetCDF / HDF5 file locking. Try setting the environment variable:

export HDF5_USE_FILE_LOCKING=FALSE
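
For example (just a sketch, assuming you submit the wrapper through an sbatch job script), the export can go near the top of that script, before run_make_sfc_climo.sh is called:

#!/bin/sh
# (your existing SBATCH directives go here)

# Disable HDF5 file locking, which some file systems do not support
export HDF5_USE_FILE_LOCKING=FALSE

./run_make_sfc_climo.sh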

Thanks,

Linlin

This error is occurring from within ESMF, which is not used for the grid or orography steps, so that's probably why NetCDF was not an issue with those tasks.  Hopefully Linlin's tip works.  Otherwise, I would check to make sure your version of ESMF was built and is being loaded with the same NetCDF/HDF5 that you used for the grid/orog tasks.
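
One quick way to check that (a sketch, assuming dynamically linked executables and an environment-module setup) is to compare the NetCDF/HDF5 libraries that sfc_climo_gen actually resolves at run time against the modules loaded in the failing job:

# Shared NetCDF/HDF5 libraries the executable will load
ldd /global/u2/h/htan2013/ufs-srweather-app/bin/sfc_climo_gen | grep -iE 'netcdf|hdf5'

# Modules loaded in the job environment (module list writes to stderr)
module list 2>&1 | grep -iE 'netcdf|hdf5|esmf'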

Hi Jbeck,

 

I also have a small question regarding MPI.

In config.sh, RUN_CMD_UTILS="mpirun -np 4" controls the MPI-enabled pre-processing utilities, i.e., the number of processors they use.

However, the sbatch job script can also control the MPI settings. Which one should I use?

If I set RUN_CMD_UTILS="mpirun -np 4", do I also need to set the number of nodes and tasks in the sbatch (Slurm) job script?

 

Thanks,

Haochen

Hi Haochen, sorry for the delay as I have been out of the office.

How you use the Linux capability without a workflow manager has some flexibility to it. The scripts provided in ufs-srweather-app/regional_workflow/ush/wrappers are designed to be run "out-of-the-box" on a Linux machine/cluster with no batch submission system. However, they can be adapted to any batch submission system, provided you create and/or modify the proper wrapper scripts. The "sq_job.sh" script is provided as an example of such a script, but honestly, for a machine you are already familiar with submitting jobs on, it is probably simplest to modify the existing stand-alone scripts (for the sfc_climo step you mentioned above, that would be run_make_sfc_climo.sh) to include the proper SBATCH directives at the beginning.
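
As a rough sketch of that last option (assuming Slurm; the account/queue/partition names below are placeholders), the top of a modified run_make_sfc_climo.sh might look like this, with the original contents of the wrapper left unchanged below the directives:

#!/bin/sh
#SBATCH --account=account_name
#SBATCH --qos=queue
#SBATCH --partition=partition
#SBATCH --nodes=1
#SBATCH --tasks-per-node=24
#SBATCH -t 00:20:00
#SBATCH --job-name=make_sfc_climo
#SBATCH -o logfile_path.log

# ... the original contents of run_make_sfc_climo.sh continue unchanged below ...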

When not using a workflow manager, the run command used for each step's executable is set by one of three "RUN_CMD" variables: RUN_CMD_UTILS (for all pre-processing), RUN_CMD_FCST (for the weather model executable), and RUN_CMD_POST (for the post executable). By default, RUN_CMD_UTILS and RUN_CMD_POST are set to `mpirun -np 1`. These variables are needed because each task has a calling tree: wrapper --> "j-job" (scripts in regional_workflow/jobs) --> "ex-script" (scripts in regional_workflow/scripts). Ideally you should be able to set everything you need in the config.sh file when generating the workflow, so these variables exist to avoid having to modify any of the lower script layers for your specific machine. The "RUN_CMD" variables are therefore fully user-specifiable in order to keep as much flexibility as possible; this is especially important because some batch systems require a full mpirun (or similar) command that specifies the number of processors within the run command, while others simply use a generic run command.
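
In config.sh terms these are just shell variables; for example (a sketch using the values that appear in your var_defns.sh above), they end up looking like:

# In config.sh (picked up when the experiment is generated):
RUN_CMD_UTILS="mpirun -np 1"    # run command for all pre-processing executables
RUN_CMD_FCST="mpirun -np 12"    # run command for the weather model executable
RUN_CMD_POST="mpirun -np 1"     # run command for the post executable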

To give concrete examples based on the batch submission systems I am familiar with: the NCAR Cheyenne machine uses PBS, which requires that each submitted job look similar to this:

#! /bin/sh
#PBS -A account_name
#PBS -q queue
#PBS -l select=1:mpiprocs=24:ncpus=24
#PBS -l walltime=00:20:00
#PBS -N job_name
#PBS -j oe -o logfile_path.log

mpirun -np 24 executable_name.exe

Note that in this case, we had to include processor/node information both in the PBS directives and in the run command. So for a machine like this and this example task, RUN_CMD_UTILS should be `mpirun -np 24`.

On the other hand, it sounds like your machine uses Slurm, which often (though not always) has a generic run command, getting processor information from the antecedent SBATCH directives:

#! /bin/sh
#SBATCH --account=account_name
#SBATCH --qos=queue
#SBATCH --partition=partition
#SBATCH --nodes=2-2
#SBATCH --tasks-per-node=24
#SBATCH -t 00:20:00
#SBATCH --job-name=job_name
#SBATCH -o logfile_path.log

srun executable_name.exe

If your machine is indeed like the second example, then you should set all of the RUN_CMD_* variables to "srun" (or whatever your machine's Slurm run command is) and include the appropriate mpi/node/processor settings in the SBATCH directives. This will mean that ultimately, within the ex-script for that particular job, the executable will be called with whatever command you provided with RUN_CMD_UTILS, and the MPI settings will be taken from the SBATCH directives.
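
In other words, for the Slurm case the config.sh side could be as simple as (again, a sketch):

# Node/task counts come from the SBATCH directives of the submitting job script
RUN_CMD_UTILS="srun"
RUN_CMD_FCST="srun"
RUN_CMD_POST="srun"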

Now, all that said, this will get more complicated when you arrive at the forecast model step, because unfortunately the model's MPI options are passed in via a namelist. If your namelist settings (created when you generated the workflow) are not consistent with the MPI settings you submit with, the model may perform poorly or even crash. That's a bridge we can cross when you get to it; please follow up with a response if you need assistance when you reach that step.
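
For reference when you get there, here is how the numbers in your var_defns.sh above fit together (a sketch of the usual accounting for a regional run with quilting enabled, not a definitive formula):

# Forecast MPI tasks, using the values from var_defns.sh above:
#   compute tasks:  LAYOUT_X * LAYOUT_Y                                = 5 * 2 = 10
#   write tasks:    WRTCMP_write_groups * WRTCMP_write_tasks_per_group = 1 * 2 =  2
#   total:          PE_MEMBER01                                                = 12
#
# The job you submit must therefore provide 12 MPI tasks, matching
# RUN_CMD_FCST='mpirun -np 12' (or the SBATCH task count if RUN_CMD_FCST="srun").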

Let me know how this all goes, or if you have more questions!

 

-Mike

 


Hi Mike,

Thank you so much for your reply!

I have used both PBS on Cheyenne and Slurm on Cori at NERSC.

I have two small questions:

1) Are options like NNODES_*="1" for the workflow manager only, or can they also be used without a workflow manager?

2) If I want to use 2 nodes with 24 processors on each node for run_make_sfc_climo.sh, is the following how I should set config.sh and the sbatch file?

config.sh

RUN_CMD_UTILS="srun -n 24"  

NNODES_MAKE_SFC_CLIMO="2"

PPN_MAKE_SFC_CLIMO="12"

 

Sbatch file:

#!/bin/sh
#SBATCH --account=account_name
#SBATCH --qos=queue
#SBATCH --partition=partition
#SBATCH --nodes=2
#SBATCH --tasks-per-node=24
#SBATCH -t 00:20:00
#SBATCH --job-name=job_name
#SBATCH -o logfile_path.log

srun -n 48 ./run_make_grid.sh   

Should this step use srun? In sq_job.sh, there is no srun, just ./run_make_grid.sh.

The reason I am asking is that if I use srun here, I feel like I am invoking srun twice: here and in config.sh.

 

Many thanks in advance,

Haochen

Haochen,

 

1. That is correct: the NNODES_* settings are currently only used for pre-defined platforms with a workflow manager, and they will not affect running the scripts stand-alone (as you are trying to do).

2. In this case, you will want to use RUN_CMD_UTILS="srun -n 48" if you want 48 MPI tasks. When calling the script, you should not invoke srun: you are correct that this would probably cause problems with calling srun twice. So your job script should look like this:

Sbatch file:

#!/bin/sh
#SBATCH --account=account_name
#SBATCH --qos=queue
#SBATCH --partition=partition
#SBATCH --nodes=2
#SBATCH --tasks-per-node=24
#SBATCH -t 00:20:00
#SBATCH --job-name=job_name
#SBATCH -o logfile_path.log

./run_make_grid.sh   

It is possible that your machine will have different settings that may complicate this, but from the platforms I am familiar with, that should work.
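
And the matching config.sh entry for this 2-node, 24-tasks-per-node example would be (a sketch):

# config.sh: 2 nodes x 24 tasks per node = 48 MPI tasks
RUN_CMD_UTILS="srun -n 48"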

 

The NNODES_MAKE_SFC_CLIMO and PPN_MAKE_SFC_CLIMO settings will have no effect when running the scripts stand-alone without a workflow manager.
