WRF-CMAQ v5.3.2 Compiler Test
I followed the build directions available at
https://github.com/kmfoley/CMAQ/blob/v532_20200702/DOCS/Users_Guide/Tutorials/CMAQ_UG_tutorial_WRF-CMAQ_build_gcc.md
I made some modifications to the tutorial. They are located at
https://github.com/lizadams/CMAQ/blob/patch-3/DOCS/Users_Guide/Tutorials/CMAQ_UG_tutorial_WRF-CMAQ_build_gcc.md
or
https://github.com/lizadams/CMAQ/blob/master/DOCS/Users_Guide/Tutorials/CMAQ_UG_tutorial_WRF-CMAQ_build_gcc.md
I will combine these edits and submit a pull request to Kristen when they are complete.
I was able to get the debug version to run without a floating-point error by changing the debug flags in configure.wrf to the following:
FCOPTIM = -O0
FCDEBUG = -g $(FCNOOPT)
Modifying FCOPTIM in this way means that I don't need to edit Makefile.twoway in the CMAQ code.
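The configure.wrf change above can also be scripted rather than made by hand. The following is a minimal sketch, not part of the tutorial: the helper name `set_make_var` and the sample text are hypothetical, and the sketch assumes that each variable is assigned on a single `NAME = value` line.

```python
import re

def set_make_var(text, name, value):
    """Replace the right-hand side of every `NAME = ...` line in makefile-style text."""
    pattern = re.compile(rf"^([ \t]*{re.escape(name)}[ \t]*=).*$", re.MULTILINE)
    # A function replacement keeps characters such as `$(` in the value literal.
    new_text, count = pattern.subn(lambda m: f"{m.group(1)} {value}", text)
    if count == 0:
        raise KeyError(f"{name} is not assigned in the given text")
    return new_text

# Hypothetical excerpt standing in for a real configure.wrf.
sample = (
    "FCOPTIM = -O2 -ftree-vectorize -funroll-loops\n"
    "FCDEBUG = # -g $(FCNOOPT)\n"
)

patched = set_make_var(sample, "FCOPTIM", "-O0")
patched = set_make_var(patched, "FCDEBUG", "-g $(FCNOOPT)")
print(patched)
```

To apply it to a real build, read configure.wrf into a string, call the helper for each variable, and write the result back after keeping a copy of the original file.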
When I used the following debug flags, the floating-point error stopped the run as before (almost before it got started):
- -ggdb -fbacktrace -fcheck=bounds,do,mem,pointer -ffpe-trap=invalid,zero,overflow
The optimized version of WRF-CMAQv5.3.2 does not match the debug version. I ran WRFv4.1.1-CMAQv5.3.2 with the following option:
setenv RUN_CMAQ_DRIVER F # [F]
This confirmed that the WRF output does not change when WRF is built with the optimized -O2 flags.
Optimized flag in configure.wrf:
FCOPTIM = -O2 -ftree-vectorize -funroll-loops
The plots are available here:
The debug version matches Fahim's results for the no-feedback case.
Click on the following link and then click submit. You should be able to scroll through the variables.
https://dataviewer-dept-cempd.cloudapps.unc.edu/index.cfm?back_address=/WRFv4.1.1-CMAQv5.3.2/fahim_debug_16pe_nf/plots/compiler_sens/base_nf/layer1_only
The 16-PE debug version matches Fahim's results for the shortwave feedback case. (The result depends on the number of PEs.)
https://dataviewer-dept-cempd.cloudapps.unc.edu/index.cfm?back_address=/WRFv4.1.1-CMAQv5.3.2/fahim_debug_16pe_sf/plots/compiler_sens/base_sf/layer1_only
A side-by-side comparison of the no-feedback and shortwave feedback cases is available here: https://dataviewer-dept-cempd.cloudapps.unc.edu/compare.cfm?back_address2=/WRFv4.1.1-CMAQv5.3.2/fahim_debug_16pe_sf/plots/compiler_sens/base_sf/layer1_only&back_address1=/WRFv4.1.1-CMAQv5.3.2/fahim_debug_16pe_nf/plots/compiler_sens/base_nf/layer1_only
The shortwave feedback percent difference plot shows that the debug version gives different output on 16 and 32 PEs. We see this with the no-feedback runs as well, but with a smaller magnitude.
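For readers who want to reproduce this kind of comparison outside the data viewer, here is a minimal sketch of a cell-by-cell percent difference. The formula used by the plotting scripts is not stated on this page, so the definition `100 * (test - reference) / reference` is an assumption, as are the function name and the sample values, which are made up for illustration only.

```python
def percent_difference(test, reference, floor=1e-12):
    """Return 100 * (test - reference) / reference for each pair of values.

    Cells where |reference| is below `floor` give None instead of dividing
    by a value that is effectively zero.
    """
    if len(test) != len(reference):
        raise ValueError("both runs must cover the same cells")
    result = []
    for t, r in zip(test, reference):
        result.append(None if abs(r) < floor else 100.0 * (t - r) / r)
    return result

# Illustrative values only, not taken from any WRF-CMAQ run.
run_32pe = [110.0, 90.0, 5.0]
run_16pe = [100.0, 100.0, 0.0]
print(percent_difference(run_32pe, run_16pe))
```

Masking near-zero reference cells keeps a few tiny concentrations from dominating the colour scale of a percent difference plot.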
David Wong suggested that I use -O2 or a less aggressive compiler optimization to see if that helps reduce the difference between the debug and optimized versions.
I tried the following compile options:
| | -O2 | -O1 | -Og | -O0 -g (debug) |
|---|---|---|---|---|
| real | 2009.05 | 2230.82 | 2753.16 | 8090.41 |
| user | 20,597.59 | 35,559.21 | 43,522.31 | 64,661.29 |
| m3diff at 2016183:230000 | A:B 1.12602E+02@( 90,30, 1) -1.05625E+01@( 90,32, 1) 3.87858E+01 1.53473E+01 | A:B 4.97734E+01@( 99,65, 1) -4.92266E+01@( 99,58, 1) 4.80420E-01 5.49204E+00 | A:B 0.000 | B |
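The table above is easier to compare once the timings are expressed relative to -O2 and the m3diff cells are split into numbers. The sketch below does both. It assumes that the four numbers in an m3diff cell are the maximum, minimum, mean, and standard deviation of A minus B, and that each parenthesised triple is a (column, row, layer) index; the function names are hypothetical.

```python
import re

# Wall-clock ("real") times copied from the table above.
REAL_TIME = {"-O2": 2009.05, "-O1": 2230.82, "-Og": 2753.16, "-O0 -g": 8090.41}

def slowdown(times, baseline="-O2"):
    """Return each build's wall-clock time divided by the baseline build's time."""
    base = times[baseline]
    return {flag: t / base for flag, t in times.items()}

FLOAT = r"[-+]?\d+\.\d+E[-+]\d+"
CELL = r"@\(\s*(\d+),\s*(\d+),\s*(\d+)\)"

def parse_m3diff(summary):
    """Split one m3diff summary string into named fields (assumed field order)."""
    values = [float(v) for v in re.findall(FLOAT, summary)]
    cells = [tuple(int(i) for i in c) for c in re.findall(CELL, summary)]
    if len(values) != 4 or len(cells) != 2:
        raise ValueError(f"unexpected m3diff summary: {summary!r}")
    return {
        "max": values[0], "max_cell": cells[0],
        "min": values[1], "min_cell": cells[1],
        "mean": values[2], "sigma": values[3],
    }

for flag, ratio in slowdown(REAL_TIME).items():
    print(f"{flag:8s} {ratio:5.2f}x")

o1 = parse_m3diff(
    "A:B 4.97734E+01@( 99,65, 1) -4.92266E+01@( 99,58, 1) 4.80420E-01 5.49204E+00"
)
print(o1["max"], o1["max_cell"], o1["mean"])
```

The `A:B 0.000` entry in the -Og column does not follow the four-number layout, so `parse_m3diff` raises a `ValueError` for it instead of guessing.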
Fahim and David will take a look at the domain decomposition issues. Looking at the CO variable
Note: the debug version of WRF-CMAQ with shortwave feedback does not finish on 8 PEs within the time limit of the debug queue, and it cannot run in the larger queues because 8 PEs is too small a configuration for them.
Table of Plots Available
| | debug | opt | Opt&Debug vs Debug |
|---|---|---|---|
| WRF-CMAQ NF | 120 | 8 | 200 |
| WRF-CMAQ SF | 200 | 200 | 200 |
| CMAQ | 0 | 25 (2x4) | 150 |