- REVIEW
- CODING
- General
- Fortran
- C/C++
- Scripts
This document started as the set of standards for the EPA Models-3 project, and was continued by the author while at the North Carolina Supercomputing Center, then at Baron Advanced Meteorological Systems, LLC. [BAMS], and now at UNC's Institute for the Environment. It is the software development standard under which the Models-3 I/O API, the MAQSIP air quality model, the NCSC/BAMS operational-forecast tools and models, and various other software systems were and are maintained. It is offered here for use with such other projects as may find it useful.(There are still some "dangling" BAMS references, e.g., to the Chief Architect, Chief Operations Officer, test vs production repositories, etc. These give some level of recognition to the importance of the items that reference them.)
Review processes are envisioned as a mechanism providing control and feedback into the analysis, design, implementation, testing, and use of software and data. It is expected that the scope and extent of the review processes will be highly dependent upon the size and complexity of the software being reviewed.The resources required are mostly the time of the personnel involved, and should usually amount to about 10 - 20 % of the man-hours required for the initial software development.
It should be recognized that the review processes should not only result in higher quality in the software developed, but also in the increased communication and software development expertise of the personnel involved.
At the present time, review shall require design and/or code walk through that involves the Chief Architect and the Chief Operations Officer or his designate.
Design review objectives include the following:
- faithful representation -- i. e., that design accurately addresses the requirements
- good representation:
- feasible
- modular and flexible
- technically sound
- structurally sound (good, robust design choices)
- design standards are met
- adequate test design specified.
Code review objectives include the following
- code accurately follows the design
- structurally sound (well-thought-out) implementation
- coding standards met (implies readability)
- testing performed adequately
An especially important aspect of design and code is that of interfaces.
Interfaces include:
- Shared state, e.g., a situation where an operation is split across several different subroutines (etc.) which must cooperate in order to perform that operation.
- Preconditions (what must be done before invocation) and postconditions (what is guaranteed to be true after invocation)
- The set of input and output data structures, particularly the input and output files,
- The set of environment variables required (including logical file names),
- The names of the variables read from or written to those files
- The sequence of prompts required by the program (and the meaning of the responses to those prompts).
- return status
- For
PROGRAM
s, interfaces include whatever else is required in order to write a script for the program.- For
MODULE
s, interfaces include allPUBLIC
entities.Note that it is good practice forPUBLIC
entities only to includeTYPE
s,PARAMETER
s,SUBROUTINE
s andFUNCTION
s, andPROTECTED
variables.Module variables generally should be
PRIVATE
. In a few exceptional cases, variables may bePUBLIC, PROTECTED
, in order to give read-only access to their values to code thatUSE
s the module.- For
SUBROUTINE
s andFUNCTION
s, interfaces include the arguments and theirINTENT
s, the preconditions, and the postconditions.Good design, with modularity and flexibility, is largely the art of finding that separation of software into components which yields the best and simplest set of interfaces, where each task is internally cohesive, and produces internally consistent objects that are then inputs to other processes.
Generally, interfaces should be kept as simple--and as standardized--as possible. For example:
- Minimize shared state.
Separating initialization from computation is a violation of the shared-state/cohesive-components/simple-interfaces principle.- Subsidiarity: Declare variables (etc.) at the most local scope consistent with their use.
For example, declaring aMODULE
-variable that is allocated and deallocated within a particular subroutine, and is used only in that subroutine and its callees is a violation: that variable should have been local to that subroutine.- Having multiple specialized names for multiple variables with the same functionality, for the purpose of distinguishing what program produced that variable is a violation of interface simplicity and standardization. If variables are functionally interchangeable, they ought to have the same name.
- Requiring management-oriented metadata as part of an interface is a violation of interface simplicity and component cohesion.
General Requirements
- Where possible, internal names should match external-interface names. For example, if an input file has a variable named FOO then the source code should also refer to that variable by the name FOO unless either multiple input variables named FOO are present or the name of the corresponding input variable is in fact a program-option.
- Program code must pass Chief Architect review before being accepted for the production repository.
- Production executables will be built directly from the production repository.
- Scripts must pass Chief Operations Officer review before being accepted for production use.
- For new programs, interfaces shall be documented.
- For new major programs, interfaces and program function shall be specified, and must be accepted by the Chief Architect before initiating work.
- For modified programs, interfaces and program function shall not be changed without permission of the Chief Architect.
- The purpose of the main code repository is to maintain the main code path, not, e.g., 8 different versions of the code
noahr_mod.F90.begin_week_ncar
, ...
Makefile
structure in the repository is as follows:
Makefile.template
is the repository copy. It is not used directly.Makefile
is the local (non-repository) customized copy, actually used for building codes.- Updating
Makefile.template
is a required part of a repository commit.- Updating
Makefile
is a required part of a repository checkout or update that creates/changesMakefile.template
.
- Makefile.template updates must accompany items submitted, where program-changes affect Makefiles.
General
- Code shall be written with view to readability and maintainability. Code is actually more a communications medium than merely a mechanical algorithm-description.
In particular, avoid "clutter".
- Code shall be compliant with the relevant language standards, with exceptions that may be allowed by the Chief Architect.
- Programs must build successfully under all three of: Intel, Sun, and GNU compilers before being accepted. Generally, any variance from programming language standards must be approved by the Chief Architect before acceptance.
- Models-3 standard: All modeling and tools programs must use
M3EXIT()
for termination, with exit-status following the normal UNIX conventions: see /usr/include/stdlib.h. By this convention,Fortran
- 0 means normal successful completion;
- 1 means I/O related error;
- 2 means general program-related error; and
- N > 2 allows you to classify errors for your program in detail.
STOP
is forbidden.
- Generally, program-logs should contain enough information both to diagnose any error(s) which may have occured and to reconstruct a script of the program. In particular, memory-allocation, file-opening, environment-variable evaluation, and MPI-operation status shall be checked, and failures shall be logged [I/O API routines do this automatically, for the most part.]
- Programs shall include a "splash screen" in their output/logs, that documents the following:
- Program name and function
- preconditions, and requirements (e.g., preconditions, environment variables, input and output files)
- Copyright notice
- Program version (Subversion
$
Id$
- Use mixed-case output to a log: the human-factors specialists say that WHILE ALL CAPS GETS THE READER'S ATTENTION, IT IS HARDER TO READ THAN TEXT THAT IS GIVEN IN MIXED CASE.
- Programs, modules, subroutines and functions should have header comments with the following:
Header-comment revision history will be kept up-to-date.
- The purpose of the module, subroutine, or function
- the function and logic of the algorithms,
- Literature references, if appropriate
- the purposes of major variables
- Preconditions, especially required environment variables,
- Revision history
- M3IO programs shall use
M3EXIT()
for program termination.
- Declare variables (etc.) at the most local scope consistent with their use. Also, "auto" local array variables are preferred over explicitly allocated/deallocated local arrays, whenever possible.
- Specify
INTENT
for subroutine and function arguments (either explicitly, for Fortran, or in the descriptive comments for C and other languages).
- Avoid multiple (> 2) levels of definitions, recursion, script-invocation, and other nested structures: the cognitive psychologyists say this greatly decreases understandability of the code.
For this purpose, "perfect" loop-nests may be regarded as a single control structure; indentation shall reflect this:DO R = 1, NROWS DO C = 1, NCOLS ... END DO END DO- Do not make comments that merely echo the content of program source code.
For example, the following is improper:SUBROUTINE INITCHAN( OUTSTEP, SDATE, STIME, OVLSTEP ) !! begin initchan ... RETURN END SUBROUTINE INITCHAN !! end initchan- File-names:
- File and directory names of the form core or core.* are strictly forbidden.
- lib is only used for directories containing binary-library
.a
and/or.so
files.
- File extensions are:
.c
C source.C
C++ source.csh
C-shell/TCSH source.ksh
Korn-shell source.sh
Bourne-shell/BASH source.f
fixed-source-form Fortran.f90
free-source-form Fortran.F
fixed-source-form Fortran requiring pre-processing.F90
free-source-form Fortran requiring pre-processing.pl
PERL source
- For Fortran MODULEs, the basename of the source file is the lower-case of the
MODULE
-name (e.g.,MODULE MODFOO
shall be in source file modfoo.f or modfoo.f90).
- Only code that compiles correctly may go into the repository.
- For each repository directory, the code version contained shall be unique. For multiple versions of an executable, you should use svn branches.
- For any svn add or svn delete, make corresponding changes to the master
Makefile.template
before svn commit.
- No one but the Chief Architect may change the source code in ioapi and m3tools.
- Indentation style shall be consistent across entire source files.
- All I/O and memory-allocation operations (and other operations that may potentially succeed or fail, e.g., environment-variable calls, MPI operations) shall check the success/failure status of the indicated operation, and shall log failures, preferably with the error-status value. (For M3IO READ3(), etc., the log-operation is done automatically; still, however, success/failure status shall be checked, and the program code shall act accordingly.)
Such log messages shall indicate the situation in adequate detail for post-mortem diagnosis—including such error-status and record-identifier information as is available. In principle, the text of the error-message shall be sufficient to determine the exact location and reason for program termination.
Fortran
- Use fully-qualified
END
statements for program units:END FUNCTION
END SUBROUTINE
END MODULE
END PROGRAM
- The
IMPLICIT NONE
statement must be present.
- Specify
INTENT
for subroutine and function arguments.
It is metadata perjury to specify this incorrectly, e.g., by the lazy use ofINTENT(INOUT)
for everything.
Use ofINTENT(INOUT)
is a red flag indicating probable bad modularity, as it tends to indicate splitting a single task across multiple code-units.
- Arithmetic
IF
, AlternateRETURN
,PAUSE
, and Assigned and computedGOTO
statements are forbidden.
- Counters of Fortran
DO
loops will beINTEGER
variables.
- Loops and block constructions (
IF-THEN...ELSE IF...ELSE... END IF
,DO
loops, etc.) with either deep nesting or long blocks of code in them should have documentation (by in-line exclamation point ("!...") comments) for block beginning, alternatives (ELSE
blocks), and ending, so that their beginnings, alternatives, and endings can be matched up easily.
- Loop beginning and loop ending for loops constructed with
IF
s andGO TO
s must be documented by comments.
- Labels shall appear only on
CONTINUE
statements (otherwise, the flow-of-control may not be clear).
- Do not
GO TO
the interior of aDO
-loop or anIF-THEN-ELSE
block from outside the block: the effects are "unspecified", according to the Fortran standard, and if a compiler-vendor chooses to implement them by "Start WW-III", they're within their rights and we don't want to be stuck with it.
- Avoid use of tabs in Fortran source code: they mean different things to different systems, and make use of indentation-tools difficult.
REAL
orCOMPLEX
variables shall not be tested for equality. This is not numerically robust nor reliable.
- Free source form is preferred for new projects and codes.
Where possible, program code should fit within columns 1-80 (and cols 1-120 in all cases).
- Comment text should be in lower case, and program code should be in UPPER CASE. For new development, "!" comments are preferred.
- Comments should be separated from program-code sufficiently that neither one interferes with the reading of the other (either by having block comments interspersed with blocks of code and set off by blank lines, or by having comments in the right margin, with program code on the left.
- Whitespace and blank lines should be used to set off constructs and to emphasize commonality, to enhance readability. E.g.,
X ( IROW ) = EE + SIN ( B * C ) / D YAB( IROW ) = EE + B**2 - C**2is preferable to:X(IROW)=EE+SIN(B*C)/D YAB(IROW)=EE+B**2-C**2
- Indentation shall be in multiples of four columns, with initiation and termination at the same level (e.g., a
DO
and its terminatingEND DO
start at the same column) ("Algol/Wirth/Whitesmith" style indentation).
- Continuation-line code shall be indented at least 6 spaces beyond the current indentation level.
- Parallel directives shall use the same indentation as other source code.
- The use of zeros where needed on either side of a decimal point to enhance readability of floating-point constants is preferred, e.g.,
1.0
and0.007
instead of1.
and.007
.
- Do not use repeated divisions in Fortran expressions: e.g., use
FOO/(BAR*QUX)
instead ofFOO/BAR/QUX
:
(a) Fortran precedence rules are the opposite of normal algebra rules, and this does lead to confusion and to obscure bugs; and
(b) divisions are typically 30-100x more expensive than multiplications, so is inefficient anyway.
Those meteorology professors who encourage this practice should be shot! (For the most part, fractions in "first normal form" are most readable and most efficient: numerator, with parenthesis as necessary, and with constants leading more complex sub-expressions, a divide-symbol, then denominator, with parentheses as necessary, and with constants leading more complex sub-expressions.)
- Code with numerical round-off properties in mind. In particular, avoid things like the following stupidity out of MCIP (with a divide by
GSFC
at the end of scratch-variableF3
, and the a multiply by it inRSTOMI(C,R)
, accomplishing only round-off error and wasting CPU time on an (expensive) divide-operation):F3 = 0.5 * ( GSFC - GA & + SQRT( GA * GA & + GA * GSFC * (4.0 * Q1C / QSS - 2.0) & + GSFC * GSFC ) ) / GSFC RSTOMI(C,R) = GSFC * F3
- Where possible, avoid use of explicit
KIND
s: I/O API data is "default"INTEGER
,REAL
,DOUBLE PRECISION
, and you may cause obscure, hard-to-diagnose bugs. Moreover, explicit usages such asREAL(KIND=4) FOO
are incorrect and work only by vendor-choice accident: the standard does NOT equate such a declaration toIEEE REAL*4
or other 4-byte real.
- New codes should
USE M3UTILIO
instead ofINCLUDE
ing I/O APIINCLUDE
files (PARMS3.EXT, FDESC3.EX, IODECL3.EXT,
etc.). This not only is a substantial convenience, it also allows the compiler to check argument-list compatibility for most I/O API calls.
- Generally, modeling codes should not use
CLOSE3()
: in a modular model such asMAQSIP
orCMAQ
, file-state is a global model property, and changing it in one module can violate the environment of a different module. The one acceptable use is for intialization/restart situations where one may want be able to do the following kind of sequence:IF ( OPEN3( 'INIT_STUFF', FSREAD3, PNAME ) ) THEN ... END IF IF ( .NOT.CLOSE3( 'INIT_STUFF' ) ) THEN ...we have an error to process... ELSE IF ( OPEN3( 'OUTPUT_STUFF', FSUNKN3, PNAME ) ) THEN ... END IFwhere for a "cold-start" initialization,INIT_STUFF
is the cold-start initialization file andOUTPUT_STUFF
is the new output file, but for a restart,INIT_STUFF
andOUTPUT_STUFF
are the same file, and we are usingCLOSE3
to avoid having the same physical file open in different (read-only vs read-write) modes under two different names, which might lead to data corruption. (You may recall we rather had to jump through hoops -- generating obscure warning messages -- to avoid this problems without usingCLOSE3
in MAQSIP).
- Note that there are standard tools like findent ( https://sourceforge.net/projects/findent/) on Linux that can help you with the code-format issues above...
C/C++
- Where possible, program code should fit within columns 1-80 (and cols 1-120 in all cases).
- Avoid use of tabs in C/C++ source code: they mean different things to different systems, and make use of indentation-tools difficult.
- Whitespace should be use to set off constructs and to emphasize commonality, to enhance readability.
- Comments should be separated from program-code sufficiently that neither interferes with the reading of the other.
- Indentation shall be in multiples of four columns, using "Algol/Wirth/Whitesmiths" brace style, e.g.,
for ( i=0 ; i<N ; i++ ) { something() ; }
- Use fully-braced bodies for if-block and for-loop constructs:
(a) this helps to avoid a class of obscure bugs; and
(b) it enhances maintainability, if the body goes from single-statement to multi-statement later.
- Note that there are standard tools like astyle on Linux that can help you with the code-format issues above...
Scripts
- Scripts that execute M3IO program/model executables (e.g., Fortran programs) shall not split program log into multiple pieces. Output from a program will be included within the log for the script that executes that program.
- The use of environment variable
LOGFILE
to separate program-log from script-log is forbidden.
- Scripts that execute programs/models (e.g., Fortran programs) shall log program environment and (if present) contents of the command-input ("UI") file.
- Scripts shall return program exit status.
- Indentation shall have initiation and termination at the same level (e.g., an
if
and its terminatingendif
orfor
and its terminatingend
shall start at the same column) ("Algol/Wirth/Whitesmith" style indentation). Multiples of four columns is preferred.
- To ease analysis: Whenever a script uses redirection to create a new log file, the script will display, in the current log, the fully qualified path to the new log prior to executing the command with the redirection.
- To promote portability: Where practical, scripts will not use hard coded directory names and file names. Scripts will use environment variables where available and parameters to build file paths.
- To promote re-use: Where appropriate scripts will be designed to perform a set of related generic functions. Scripts will use an argument list, environment variables, or a configuration file to obtain parameters required by the functions.
- To ease analysis: Where practical, scripts will be designed to reduce the depth of calling sequences by bundling related functions within a single script.
In particular, it is bad practice to have three or more levels of script-calling, for which definitions in one level are used two or more calling-levels further down.
- To ease analysis: When appropriate, scripts will echo parameter information and configuration information. Scripts will echo appropriate level of detail to describe operations being performed.
- To ease analysis: Where appropriate a script will identify its file name and the current date/time at the beginning of its execution. A script will also identify its file name and current date/time at the end of normal execution.
- To support monitoring: When a script detects an error the script will echo its file name and appropriate details. The script will prepend the keyword
###
to each line of the error message.
- Each production script will include near the top a comment to display the contents of the Subversion keyword Id.
- Each script will include near the top a comment to display a its author, revision history, and copyright statement.
- The switches
-cw
will be used in a command perl -cw to verify the syntax of a PERL script before the script is put into production.
- In PERL scripts, the switch
-w
will be used on the PERL invocation line (e.g.,#!/usr/bin/perl -w ...
) to enable warnings during execution of the script.
- In a csh script not invoked directly from a cron table, the switch
-f
will be used on the csh invocation line (i.e.,#!/bin/csh -f ...
) to disable re-execution of the login profile.
$Id: STANDARDS.html 75 2018-01-12 15:48:16Z coats $