- REVIEW
- CODING
- General
- Fortran
- C/C++
- Scripts
This document started as the set of standards for the EPA Models-3 project, and was continued by the author while at the North Carolina Supercomputing Center, then at Baron Advanced Meteorological Systems, LLC. [BAMS], and now at UNC's Institute for the Environment. It is the software development standard under which the Models-3 I/O API, the MAQSIP air quality model, the NCSC/BAMS operational-forecast tools and models, and various other software systems were and are maintained. It is offered here for use with such other projects as may find it useful.(There are still some "dangling" BAMS references, e.g., to the Chief Architect, Chief Operations Officer, test vs production repositories, etc. These give some level of recognition to the importance of the items that reference them.)
Review processes are envisioned as a mechanism providing control and feedback into the analysis, design, implementation, testing, and use of software and data. It is expected that the scope and extent of the review processes will be highly dependent upon the size and complexity of the software being reviewed.The resources required are mostly the time of the personnel involved, and should usually amount to about 10 - 20 % of the man-hours required for the initial software development.
It should be recognized that the review processes should not only result in higher quality in the software developed, but also in the increased communication and software development expertise of the personnel involved.
At the present time, review shall require design and/or code walk through that involves the Chief Architect and the Chief Operations Officer or his designate.
Design review objectives include the following:
- faithful representation -- i. e., that design accurately addresses the requirements
- good representation:
- feasible
- modular and flexible
- technically sound
- structurally sound (good, robust design choices)
- design standards are met
- adequate test design specified.
Code review objectives include the following
- code accurately follows the design
- structurally sound (well-thought-out) implementation
- coding standards met (implies readability)
- testing performed adequately
An especially important aspect of design and code is that of interfaces.
Interfaces include:
- Shared state, e.g., a situation where an operation is split across several different subroutines (etc.) which must cooperate in order to perform that operation.
- Preconditions (what must be done before invocation) and postconditions (what is guaranteed to be true after invocation)
- The set of input and output data structures, particularly the input and output files,
- The set of environment variables required (including logical file names),
- The names of the variables read from or written to those files
- The sequence of prompts required by the program (and the meaning of the responses to those prompts).
- return status
- For
PROGRAMs, interfaces include whatever else is required in order to write a script for the program.- For
MODULEs, interfaces include allPUBLICentities.Note that it is good practice forPUBLICentities only to includeTYPEs,PARAMETERs,SUBROUTINEs andFUNCTIONs, andPROTECTEDvariables.Module variables generally should be
PRIVATE. In a few exceptional cases, variables may bePUBLIC, PROTECTED, in order to give read-only access to their values to code thatUSEs the module.- For
SUBROUTINEs andFUNCTIONs, interfaces include the arguments and theirINTENTs, the preconditions, and the postconditions.Good design, with modularity and flexibility, is largely the art of finding that separation of software into components which yields the best and simplest set of interfaces, where each task is internally cohesive, and produces internally consistent objects that are then inputs to other processes.
Generally, interfaces should be kept as simple--and as standardized--as possible. For example:
- Minimize shared state.
Separating initialization from computation is a violation of the shared-state/cohesive-components/simple-interfaces principle.- Subsidiarity: Declare variables (etc.) at the most local scope consistent with their use.
For example, declaring aMODULE-variable that is allocated and deallocated within a particular subroutine, and is used only in that subroutine and its callees is a violation: that variable should have been local to that subroutine.- Having multiple specialized names for multiple variables with the same functionality, for the purpose of distinguishing what program produced that variable is a violation of interface simplicity and standardization. If variables are functionally interchangeable, they ought to have the same name.
- Requiring management-oriented metadata as part of an interface is a violation of interface simplicity and component cohesion.
General Requirements
- Where possible, internal names should match external-interface names. For example, if an input file has a variable named FOO then the source code should also refer to that variable by the name FOO unless either multiple input variables named FOO are present or the name of the corresponding input variable is in fact a program-option.
- Program code must pass Chief Architect review before being accepted for the production repository.
- Production executables will be built directly from the production repository.
- Scripts must pass Chief Operations Officer review before being accepted for production use.
- For new programs, interfaces shall be documented.
- For new major programs, interfaces and program function shall be specified, and must be accepted by the Chief Architect before initiating work.
- For modified programs, interfaces and program function shall not be changed without permission of the Chief Architect.
- The purpose of the main code repository is to maintain the main code path, not, e.g., 8 different versions of the code
noahr_mod.F90.begin_week_ncar, ...
Makefilestructure in the repository is as follows:
Makefile.templateis the repository copy. It is not used directly.Makefileis the local (non-repository) customized copy, actually used for building codes.- Updating
Makefile.templateis a required part of a repository commit.- Updating
Makefileis a required part of a repository checkout or update that creates/changesMakefile.template.
- Makefile.template updates must accompany items submitted, where program-changes affect Makefiles.
General
- Code shall be written with view to readability and maintainability. Code is actually more a communications medium than merely a mechanical algorithm-description.
In particular, avoid "clutter".
- Code shall be compliant with the relevant language standards, with exceptions that may be allowed by the Chief Architect.
- Programs must build successfully under all three of: Intel, Sun, and GNU compilers before being accepted. Generally, any variance from programming language standards must be approved by the Chief Architect before acceptance.
- Models-3 standard: All modeling and tools programs must use
M3EXIT()for termination, with exit-status following the normal UNIX conventions: see /usr/include/stdlib.h. By this convention,Fortran
- 0 means normal successful completion;
- 1 means I/O related error;
- 2 means general program-related error; and
- N > 2 allows you to classify errors for your program in detail.
STOPis forbidden.
- Generally, program-logs should contain enough information both to diagnose any error(s) which may have occured and to reconstruct a script of the program. In particular, memory-allocation, file-opening, environment-variable evaluation, and MPI-operation status shall be checked, and failures shall be logged [I/O API routines do this automatically, for the most part.]
- Programs shall include a "splash screen" in their output/logs, that documents the following:
- Program name and function
- preconditions, and requirements (e.g., preconditions, environment variables, input and output files)
- Copyright notice
- Program version (Subversion
$Id$
- Use mixed-case output to a log: the human-factors specialists say that WHILE ALL CAPS GETS THE READER'S ATTENTION, IT IS HARDER TO READ THAN TEXT THAT IS GIVEN IN MIXED CASE.
- Programs, modules, subroutines and functions should have header comments with the following:
Header-comment revision history will be kept up-to-date.
- The purpose of the module, subroutine, or function
- the function and logic of the algorithms,
- Literature references, if appropriate
- the purposes of major variables
- Preconditions, especially required environment variables,
- Revision history
- M3IO programs shall use
M3EXIT()for program termination.
- Declare variables (etc.) at the most local scope consistent with their use. Also, "auto" local array variables are preferred over explicitly allocated/deallocated local arrays, whenever possible.
- Specify
INTENTfor subroutine and function arguments (either explicitly, for Fortran, or in the descriptive comments for C and other languages).
- Avoid multiple (> 2) levels of definitions, recursion, script-invocation, and other nested structures: the cognitive psychologyists say this greatly decreases understandability of the code.
For this purpose, "perfect" loop-nests may be regarded as a single control structure; indentation shall reflect this:DO R = 1, NROWS DO C = 1, NCOLS ... END DO END DO- Do not make comments that merely echo the content of program source code.
For example, the following is improper:SUBROUTINE INITCHAN( OUTSTEP, SDATE, STIME, OVLSTEP ) !! begin initchan ... RETURN END SUBROUTINE INITCHAN !! end initchan- File-names:
- File and directory names of the form core or core.* are strictly forbidden.
- lib is only used for directories containing binary-library
.aand/or.sofiles.
- File extensions are:
.cC source.CC++ source.cshC-shell/TCSH source.kshKorn-shell source.shBourne-shell/BASH source.ffixed-source-form Fortran.f90free-source-form Fortran.Ffixed-source-form Fortran requiring pre-processing.F90free-source-form Fortran requiring pre-processing.plPERL source
- For Fortran MODULEs, the basename of the source file is the lower-case of the
MODULE-name (e.g.,MODULE MODFOOshall be in source file modfoo.f or modfoo.f90).
- Only code that compiles correctly may go into the repository.
- For each repository directory, the code version contained shall be unique. For multiple versions of an executable, you should use svn branches.
- For any svn add or svn delete, make corresponding changes to the master
Makefile.templatebefore svn commit.
- No one but the Chief Architect may change the source code in ioapi and m3tools.
- Indentation style shall be consistent across entire source files.
- All I/O and memory-allocation operations (and other operations that may potentially succeed or fail, e.g., environment-variable calls, MPI operations) shall check the success/failure status of the indicated operation, and shall log failures, preferably with the error-status value. (For M3IO READ3(), etc., the log-operation is done automatically; still, however, success/failure status shall be checked, and the program code shall act accordingly.)
Such log messages shall indicate the situation in adequate detail for post-mortem diagnosis—including such error-status and record-identifier information as is available. In principle, the text of the error-message shall be sufficient to determine the exact location and reason for program termination.
Fortran
- Use fully-qualified
ENDstatements for program units:END FUNCTION
END SUBROUTINE
END MODULE
END PROGRAM
- The
IMPLICIT NONEstatement must be present.
- Specify
INTENTfor subroutine and function arguments.
It is metadata perjury to specify this incorrectly, e.g., by the lazy use ofINTENT(INOUT)for everything.
Use ofINTENT(INOUT)is a red flag indicating probable bad modularity, as it tends to indicate splitting a single task across multiple code-units.
- Arithmetic
IF, AlternateRETURN,PAUSE, and Assigned and computedGOTOstatements are forbidden.
- Counters of Fortran
DOloops will beINTEGERvariables.
- Loops and block constructions (
IF-THEN...ELSE IF...ELSE... END IF,DOloops, etc.) with either deep nesting or long blocks of code in them should have documentation (by in-line exclamation point ("!...") comments) for block beginning, alternatives (ELSEblocks), and ending, so that their beginnings, alternatives, and endings can be matched up easily.
- Loop beginning and loop ending for loops constructed with
IFs andGO TOs must be documented by comments.
- Labels shall appear only on
CONTINUEstatements (otherwise, the flow-of-control may not be clear).
- Do not
GO TOthe interior of aDO-loop or anIF-THEN-ELSEblock from outside the block: the effects are "unspecified", according to the Fortran standard, and if a compiler-vendor chooses to implement them by "Start WW-III", they're within their rights and we don't want to be stuck with it.
- Avoid use of tabs in Fortran source code: they mean different things to different systems, and make use of indentation-tools difficult.
REALorCOMPLEXvariables shall not be tested for equality. This is not numerically robust nor reliable.
- Free source form is preferred for new projects and codes.
Where possible, program code should fit within columns 1-80 (and cols 1-120 in all cases).
- Comment text should be in lower case, and program code should be in UPPER CASE. For new development, "!" comments are preferred.
- Comments should be separated from program-code sufficiently that neither one interferes with the reading of the other (either by having block comments interspersed with blocks of code and set off by blank lines, or by having comments in the right margin, with program code on the left.
- Whitespace and blank lines should be used to set off constructs and to emphasize commonality, to enhance readability. E.g.,
X ( IROW ) = EE + SIN ( B * C ) / D YAB( IROW ) = EE + B**2 - C**2is preferable to:X(IROW)=EE+SIN(B*C)/D YAB(IROW)=EE+B**2-C**2
- Indentation shall be in multiples of four columns, with initiation and termination at the same level (e.g., a
DOand its terminatingEND DOstart at the same column) ("Algol/Wirth/Whitesmith" style indentation).
- Continuation-line code shall be indented at least 6 spaces beyond the current indentation level.
- Parallel directives shall use the same indentation as other source code.
- The use of zeros where needed on either side of a decimal point to enhance readability of floating-point constants is preferred, e.g.,
1.0and0.007instead of1.and.007.
- Do not use repeated divisions in Fortran expressions: e.g., use
FOO/(BAR*QUX)instead ofFOO/BAR/QUX:
(a) Fortran precedence rules are the opposite of normal algebra rules, and this does lead to confusion and to obscure bugs; and
(b) divisions are typically 30-100x more expensive than multiplications, so is inefficient anyway.
Those meteorology professors who encourage this practice should be shot! (For the most part, fractions in "first normal form" are most readable and most efficient: numerator, with parenthesis as necessary, and with constants leading more complex sub-expressions, a divide-symbol, then denominator, with parentheses as necessary, and with constants leading more complex sub-expressions.)
- Code with numerical round-off properties in mind. In particular, avoid things like the following stupidity out of MCIP (with a divide by
GSFCat the end of scratch-variableF3, and the a multiply by it inRSTOMI(C,R), accomplishing only round-off error and wasting CPU time on an (expensive) divide-operation):F3 = 0.5 * ( GSFC - GA & + SQRT( GA * GA & + GA * GSFC * (4.0 * Q1C / QSS - 2.0) & + GSFC * GSFC ) ) / GSFC RSTOMI(C,R) = GSFC * F3
- Where possible, avoid use of explicit
KINDs: I/O API data is "default"INTEGER,REAL,DOUBLE PRECISION, and you may cause obscure, hard-to-diagnose bugs. Moreover, explicit usages such asREAL(KIND=4) FOOare incorrect and work only by vendor-choice accident: the standard does NOT equate such a declaration toIEEE REAL*4or other 4-byte real.
- New codes should
USE M3UTILIOinstead ofINCLUDEing I/O APIINCLUDEfiles (PARMS3.EXT, FDESC3.EX, IODECL3.EXT,etc.). This not only is a substantial convenience, it also allows the compiler to check argument-list compatibility for most I/O API calls.
- Generally, modeling codes should not use
CLOSE3(): in a modular model such asMAQSIPorCMAQ, file-state is a global model property, and changing it in one module can violate the environment of a different module. The one acceptable use is for intialization/restart situations where one may want be able to do the following kind of sequence:IF ( OPEN3( 'INIT_STUFF', FSREAD3, PNAME ) ) THEN ... END IF IF ( .NOT.CLOSE3( 'INIT_STUFF' ) ) THEN ...we have an error to process... ELSE IF ( OPEN3( 'OUTPUT_STUFF', FSUNKN3, PNAME ) ) THEN ... END IFwhere for a "cold-start" initialization,INIT_STUFFis the cold-start initialization file andOUTPUT_STUFFis the new output file, but for a restart,INIT_STUFFandOUTPUT_STUFFare the same file, and we are usingCLOSE3to avoid having the same physical file open in different (read-only vs read-write) modes under two different names, which might lead to data corruption. (You may recall we rather had to jump through hoops -- generating obscure warning messages -- to avoid this problems without usingCLOSE3in MAQSIP).
- Note that there are standard tools like findent ( https://sourceforge.net/projects/findent/) on Linux that can help you with the code-format issues above...
C/C++
- Where possible, program code should fit within columns 1-80 (and cols 1-120 in all cases).
- Avoid use of tabs in C/C++ source code: they mean different things to different systems, and make use of indentation-tools difficult.
- Whitespace should be use to set off constructs and to emphasize commonality, to enhance readability.
- Comments should be separated from program-code sufficiently that neither interferes with the reading of the other.
- Indentation shall be in multiples of four columns, using "Algol/Wirth/Whitesmiths" brace style, e.g.,
for ( i=0 ; i<N ; i++ ) { something() ; }
- Use fully-braced bodies for if-block and for-loop constructs:
(a) this helps to avoid a class of obscure bugs; and
(b) it enhances maintainability, if the body goes from single-statement to multi-statement later.
- Note that there are standard tools like astyle on Linux that can help you with the code-format issues above...
Scripts
- Scripts that execute M3IO program/model executables (e.g., Fortran programs) shall not split program log into multiple pieces. Output from a program will be included within the log for the script that executes that program.
- The use of environment variable
LOGFILEto separate program-log from script-log is forbidden.
- Scripts that execute programs/models (e.g., Fortran programs) shall log program environment and (if present) contents of the command-input ("UI") file.
- Scripts shall return program exit status.
- Indentation shall have initiation and termination at the same level (e.g., an
ifand its terminatingendiforforand its terminatingendshall start at the same column) ("Algol/Wirth/Whitesmith" style indentation). Multiples of four columns is preferred.
- To ease analysis: Whenever a script uses redirection to create a new log file, the script will display, in the current log, the fully qualified path to the new log prior to executing the command with the redirection.
- To promote portability: Where practical, scripts will not use hard coded directory names and file names. Scripts will use environment variables where available and parameters to build file paths.
- To promote re-use: Where appropriate scripts will be designed to perform a set of related generic functions. Scripts will use an argument list, environment variables, or a configuration file to obtain parameters required by the functions.
- To ease analysis: Where practical, scripts will be designed to reduce the depth of calling sequences by bundling related functions within a single script.
In particular, it is bad practice to have three or more levels of script-calling, for which definitions in one level are used two or more calling-levels further down.
- To ease analysis: When appropriate, scripts will echo parameter information and configuration information. Scripts will echo appropriate level of detail to describe operations being performed.
- To ease analysis: Where appropriate a script will identify its file name and the current date/time at the beginning of its execution. A script will also identify its file name and current date/time at the end of normal execution.
- To support monitoring: When a script detects an error the script will echo its file name and appropriate details. The script will prepend the keyword
###to each line of the error message.
- Each production script will include near the top a comment to display the contents of the Subversion keyword Id.
- Each script will include near the top a comment to display a its author, revision history, and copyright statement.
- The switches
-cwwill be used in a command perl -cw to verify the syntax of a PERL script before the script is put into production.
- In PERL scripts, the switch
-wwill be used on the PERL invocation line (e.g.,#!/usr/bin/perl -w ...) to enable warnings during execution of the script.
- In a csh script not invoked directly from a cron table, the switch
-fwill be used on the csh invocation line (i.e.,#!/bin/csh -f ...) to disable re-execution of the login profile.
$Id: STANDARDS.html 75 2018-01-12 15:48:16Z coats $