The I/O API provides both real disk-based files (which may be "volatile" or not, and are implemented on top of netCDF, PnetCDF, or a native-binary implementation) and "virtual files" that provide safe, structured exchange of data (of "gridded," "boundary," or "custom" types only) between cooperating programs or between different modules in the same program. You may even safely interchange between real files and virtual files in different executions of the same program merely by changing the values of the logical names for the files at program launch; this allows you, for example, to examine the data being shared between modules whenever you want to, at high temporal resolution. There are two types of virtual files: memory-resident BUFFERED virtual files, which can be used to share data between modules of a single program; and PVM-mailbox based COUPLING-MODE virtual files, which can be used to share data and coordinate scheduling between different programs, even if they are executing on different machines half a continent apart across the Internet.
I/O API Version 3.2 introduces support for distributed parallel I/O using PnetCDF from Argonne National Laboratory, for use in CMAQ-5.1 and later. The original concept and the prototype code are due to Dr. David Wong, US EPA. This code has been extensively extended and revised to meet the needs of proper I/O API integration and software-engineering standards.

There are a number of restrictions:

- Only GRIDDED files are supported.
- All distributed-I/O files must be on the same grid (matching the CMAQ cross-point grid).
- There is one fixed data-distribution and processor-map, which is the same as the one specified by CMAQ.

Additionally, the following environment variables are needed for setting up how the data is distributed over the processors:

GRIDDESC - Path name for the GRIDDESC file
GRID_NAME - GRIDDESC-name for the data grid
NPCOL_NPROW - Blank-delimited list with the column- and row-dimensions for the processor-grid. Note that this list needs to be enclosed by quotes (either single or double).

To declare that a particular file is to be used with PnetCDF parallel I/O, you need an MPI: prefix on the path name in the usual setenv statement, as in the following example running on a 6×8 processor-grid (48 processors in all):

    setenv GRIDDESC /nas01/depts/ie/cempd/WRFCMAQ/CMAQv5.0.1.GRIDDESC.txt
    setenv GRID_NAME US36_CRO
    setenv NPCOL_NPROW "6 8"
    ...
    setenv CHEMCONC3D MPI:/tmp/mydir/cmaq.conc.US36_CRO.2015233.ncf

I/O API builds using PnetCDF are not link-compatible with ordinary builds, and should be kept carefully separate from them.
You can build the I/O API to use PnetCDF/MPI distributed I/O using the following binary types (or use them as templates to build your own custom binary type):
- Makeinclude.Linux2_x86_64gfortmpi
- Makeinclude.Linux2_x86_64ifortmpi
- Makeinclude.Linux2_x86_64pgmpi
- Makeinclude.Linux2_x86_64sunmpi
When performing the link-step to create model executables, you will need to put the PnetCDF libraries in the library-build directory, and add the PnetCDF libraries to the link-step command line (assuming netCDF-4 style libraries below):

    ... -lpnetcdf -lnetcdff -lnetcdf ...
Multiple files which have the same structure (type, dimensions, list of variables, time step), and which cover an extended time period, may be opened under a single logical name by:

    setenv FILE_1 <path name>
    ...
    setenv FILE_N <path name>
    setenv ANAME LIST:FILE_1,...,FILE_N

subject to the requirement that the value for ANAME has length at most 256 for I/O API 3.0 or earlier, and 65535 for 3.1 or later. In case of overlapping time step sequences, the rule is "first file wins": if the data is available from the first file, FILE_1, use it; else if it is available from the second file, use that; and so on.

Because of this rule, if you have a sequence of overlapping files covering an extended time period, you probably want to put the list LIST:FILE_1,...,FILE_N in reverse chronological order. For example, if the files data.M-N.ncf have data from 00Z on day M through 00Z on day N from consecutive model runs, then you would probably want to list them in reverse chronological order, at least if you want to get data for 2015124:000000 from file data.2015124-2015125.ncf:

    setenv F123 /my/dir/data.2015123-2015124.ncf
    setenv F124 /my/dir/data.2015124-2015125.ncf
    setenv F125 /my/dir/data.2015125-2015126.ncf
    setenv F126 /my/dir/data.2015126-2015127.ncf
    setenv ANAME LIST:F126,F125,F124,F123
    ...
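Once ANAME is set, programs open and read it exactly as if it named a single file. Here is a minimal Fortran sketch of such a reader; the variable name "O3" and the step count NSTEPS are illustrative assumptions, not fixed I/O API names:

    PROGRAM RDLIST
        USE M3UTILIO                !! I/O API declarations and FDESC3 fields
        IMPLICIT NONE
        INTEGER, PARAMETER :: NSTEPS = 96   !! however many steps you need
        INTEGER            :: JDATE, JTIME, N, LOGDEV
        REAL, ALLOCATABLE  :: CONC( :, :, : )

        LOGDEV = INIT3()            !! start up the I/O API

        IF ( .NOT. OPEN3( 'ANAME', FSREAD3, 'RDLIST' ) ) THEN
            CALL M3EXIT( 'RDLIST', 0, 0, 'Could not open "ANAME"', XSTAT2 )
        ELSE IF ( .NOT. DESC3( 'ANAME' ) ) THEN     !! get grid/time metadata
            CALL M3EXIT( 'RDLIST', 0, 0, 'Could not get description', XSTAT2 )
        END IF

        ALLOCATE( CONC( NCOLS3D, NROWS3D, NLAYS3D ) )
        JDATE = SDATE3D
        JTIME = STIME3D

        DO N = 1, NSTEPS            !! reads cross file boundaries transparently
            IF ( .NOT. READ3( 'ANAME', 'O3', ALLAYS3, JDATE, JTIME, CONC ) ) THEN
                CALL M3EXIT( 'RDLIST', JDATE, JTIME, 'Read failure', XSTAT2 )
            END IF
            !!  ...process CONC for this time step...
            CALL NEXTIME( JDATE, JTIME, TSTEP3D )
        END DO

        CALL M3EXIT( 'RDLIST', 0, 0, 'Successful completion', 0 )
    END PROGRAM RDLIST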
Real (netCDF or native-binary disk-based) I/O API files may optionally be declared "volatile" by the addition of a trailing " -v" to the value of the file's logical name, in order to tell the I/O API to perform disk-synch operations before every input and after every output operation on that file:

    ...
    setenv QUX "/tmp/mydir/volatiledata.mymodel -v"
These file-based lower layers attempt the I/O optimization of not writing a file's header (needed in order to interpret the file's contents) out to disk until either a "synch" operation is performed or the file is closed. This has the effect of making non-volatile output files unreadable until the program that writes them does a SYNC3() call for the individual files, or calls SHUT3() or M3EXIT() (or of leaving the files unreadable if the program crashes unexpectedly). The extra "synch" operations do cause some (usually small) performance penalty, but they allow other programs to read I/O API files while they are still being written, and prevent data loss upon program crashes.
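As an alternative to the " -v" flag, a writer can flush a file explicitly with SYNC3(). A minimal sketch follows; the logical name "CHEMCONC3D" and variable "O3" are illustrative assumptions:

    SUBROUTINE WRSTEP( JDATE, JTIME, NCELLS, CONC )
        !!  Write one time step, then force header and data to disk so
        !!  that concurrently-running readers can see the new record.
        USE M3UTILIO
        IMPLICIT NONE
        INTEGER, INTENT(IN) :: JDATE, JTIME     !! YYYYDDD, HHMMSS
        INTEGER, INTENT(IN) :: NCELLS
        REAL,    INTENT(IN) :: CONC( NCELLS )

        IF ( .NOT. WRITE3( 'CHEMCONC3D', 'O3', JDATE, JTIME, CONC ) ) THEN
            CALL M3EXIT( 'WRSTEP', JDATE, JTIME, 'Write failure', XSTAT2 )
        END IF
        IF ( .NOT. SYNC3( 'CHEMCONC3D' ) ) THEN
            CALL M3EXIT( 'WRSTEP', JDATE, JTIME, 'Synch failure', XSTAT2 )
        END IF
    END SUBROUTINE WRSTEP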
I/O API Snoop Mode capability may be activated by adding -DIOAPI_SNOOP=1 to DEFINEFLAGS in the ioapi/Makefile.

Snoop Mode is designed to enable "pipelining" of data through multiple modeling/product-generation programs, e.g., for forecast-modeling systems. This allows the generation of early-hour products well before the entire forecast is complete; moreover, it enables the operating system to make better use of its internal I/O buffers, further increasing system modeling efficiency.
Snoop Mode is controlled by the environment variables SNOOPSECS3 and SNOOPTRY3. When it is active (positive-integer values for these environment variables) and the read operations READ3(), XTRACT3(), INTERP3(), or DDTVAR3() encounter end-of-file, they re-try for up to SNOOPTRY3 attempts, with a delay of SNOOPSECS3 seconds between attempts. If SNOOPTRY3 < 0 or SNOOPSECS3 ≤ 0, Snoop Mode is turned off. If SNOOPTRY3 = 0, the number of re-tries is (almost) unlimited.
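For example, a product-generation program can start reading a forecast file while the forecast model is still writing it. A sketch (declarations and setup as in the list-file example earlier; the logical name "FORECAST", variable "O3", and the retry values are illustrative assumptions):

    !!  Run-script settings (illustrative): re-try up to 120 times,
    !!  30 seconds apart, before giving up on a missing time step:
    !!      setenv SNOOPTRY3  120
    !!      setenv SNOOPSECS3 30

    DO N = 1, NSTEPS
        !!  With Snoop Mode active, this READ3() sleeps and re-tries at
        !!  end-of-file until the forecast model writes step JDATE:JTIME:
        IF ( .NOT. READ3( 'FORECAST', 'O3', ALLAYS3, JDATE, JTIME, CONC ) ) THEN
            CALL M3EXIT( 'PRODGEN', JDATE, JTIME, 'Data never arrived', XSTAT2 )
        END IF
        !!  ...generate and deliver this hour's product...
        CALL NEXTIME( JDATE, JTIME, TSTEP3D )
    END DO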
For memory-resident BUFFERED files, one restriction at present is that the basic data type of all variables in the virtual file must be either integer or real. The other restriction is that only two time steps of data are kept in the buffered file: the "even step" and the "odd step" (which in normal usage are the last two time steps written). Otherwise, you write code to open, read, write, or interpolate data just as you would with a "real" file. This provides for structured, name-based, identity-tagged exchange of data between different modules in the same program: since data are stored and accessed by file name, variable name, date, and time, the system will detect at run time any attempt to request data not yet initialized (unlike the situation where data is exchanged via Fortran COMMONs; we've detected some obscure use-before-calculate bugs by replacing COMMONs with BUFFERED virtual files).

Restrictions
- For I/O API-3.2 XTRACT3(), READ3(), and WRITE3(), variables may be of types M3REAL, M3INT, M3DBLE, or M3INT8. Prior to the release of April 13, 2020, there are troubles for non-M3REAL All-Variables (VNAME=ALLVARS3) calls.
- For I/O API-3.1 XTRACT3(), READ3(), and WRITE3(), variables may be of types M3REAL, M3INT, or M3DBLE; there are troubles for non-M3REAL All-Variables (VNAME=ALLVARS3) calls.
- For INTERP3() and INTERPX(), the variables must be of type M3REAL or M3DBLE. All-Variables (VNAME=ALLVARS3) calls are not supported.

To set up a buffered virtual file, setenv the value of the file's logical name to the value BUFFERED (instead of to the path name of a real physical file), as given below:

    ...
    #  myprogram uses "qux" for internal data sharing:
    #
    setenv qux BUFFERED
    ...
    /user/mydir/myprogram
    ...

Restrictions:

- For all-variable READ3() and WRITE3() calls, all of the variables in the file must be of type M3REAL.
- Prior to the I/O API V 2.2-beta-May-3-2002 release, all variables in buffered virtual files must be of type M3REAL.
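As a minimal sketch, two modules of one program might share a time-stepped REAL variable through the buffered file above. The variable name "AERO" and subroutine names are illustrative; the writer is assumed to have filled in the FDESC3 file description and called OPEN3( 'qux', FSNEW3, ... ) first:

    SUBROUTINE PUTSTEP( JDATE, JTIME, NCELLS, AERO )
        !!  Producer module: publish this time step to the buffered file.
        USE M3UTILIO
        IMPLICIT NONE
        INTEGER, INTENT(IN) :: JDATE, JTIME, NCELLS
        REAL,    INTENT(IN) :: AERO( NCELLS )
        IF ( .NOT. WRITE3( 'qux', 'AERO', JDATE, JTIME, AERO ) ) THEN
            CALL M3EXIT( 'PUTSTEP', JDATE, JTIME, 'WRITE3 failed', XSTAT2 )
        END IF
    END SUBROUTINE PUTSTEP

    SUBROUTINE GETSTEP( JDATE, JTIME, NCELLS, AERO )
        !!  Consumer module: INTERP3() interpolates between the two
        !!  buffered time steps; a request for data not yet written
        !!  is caught at run time instead of silently using garbage.
        USE M3UTILIO
        IMPLICIT NONE
        INTEGER, INTENT(IN)  :: JDATE, JTIME, NCELLS
        REAL,    INTENT(OUT) :: AERO( NCELLS )
        IF ( .NOT. INTERP3( 'qux', 'AERO', 'GETSTEP',
     &                      JDATE, JTIME, NCELLS, AERO ) ) THEN
            CALL M3EXIT( 'GETSTEP', JDATE, JTIME, 'INTERP3 failed', XSTAT2 )
        END IF
    END SUBROUTINE GETSTEP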
As part of the MCNC Practical Parallel Project, MCNC developed an extended Model Coupling Mode for the I/O API. This mode, implemented using PVM 3.4 mailboxes, allows the user to specify in the run-script whether "file" means a physical file on disk or a PVM mailbox-based communications channel (a virtual file), on the basis of the value of the file's logical name:

    setenv FOO "virtual BAR"
    setenv IOAPI_KEEP_NSTEPS 3

declares that FOO is the logical name of a virtual file whose physical name (in terms of PVM mailbox names) is BAR. The additional environment variable IOAPI_KEEP_NSTEPS determines the number of time steps to keep in PVM mailbox buffers: if it is 3 (as here), and there are already 3 time steps of variable QUX in the mailboxes for virtual file FOO, then writing a fourth time step of QUX to FOO causes the earliest time step of QUX to be erased, leaving only time steps 2, 3, and 4. This is necessary so that the coupled modeling system does not require an infinite amount of memory for its sustained operation. If not set, IOAPI_KEEP_NSTEPS defaults to 2 (the minimum needed to support INTERP3()'s double-buffering).

The (UNIX) environments in which the modeler launches the multiple models, each of which reads or writes from a virtual file, must all agree on its physical name (usually achieved by sourcing some script that contains the relevant setenv commands).
For models exchanging data via virtual files of the I/O API's coupling mode, the I/O API schedules the various processes on the basis of data availability:

- OPEN3() calls for read-access to virtual files that haven't yet been opened for write access by some other process put the caller to sleep until the file is opened; and
- READ3(), INTERP3(), or DDTVAR3() calls for virtual-file data which has not yet been written put the reading process to sleep until the data arrives, at which point the reader is awakened and given the data it requested.

There are two requirements on the modeler:

- The modeler must start up a PVM session that will "contain" all the virtual files, and enroll in it all those machines which will be running the various modeling programs, before starting up the various models in a coupled modeling system on those respective machines.
- The modeler must structure reads and writes so as to avoid deadlocks (two or more models, each asleep while waiting for input from the other), and must provide enough feedbacks to prevent one process from "racing ahead" of the others. In a one-way coupled system, this may mean the introduction of artificial synchronization files which exist solely to provide these feedbacks (see the sketch below).
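A hypothetical feedback sketch: the downstream model writes a tiny "synchronization" variable after consuming each time step, and the upstream model blocks on reading it back before racing further ahead. The names "SYNC_FILE" and "DUMMY" are illustrative only, not fixed I/O API names:

    SUBROUTINE WAITSYNC( CALLER, JDATE, JTIME )
        !!  Upstream side of the feedback loop: block until the
        !!  downstream model has written its "finished step JDATE:JTIME"
        !!  flag to the coupling-mode virtual file SYNC_FILE.  The
        !!  READ3() call sleeps until that flag-value arrives.
        USE M3UTILIO
        IMPLICIT NONE
        CHARACTER(LEN=*), INTENT(IN) :: CALLER
        INTEGER,          INTENT(IN) :: JDATE, JTIME
        REAL :: FLAG( 1 )
        IF ( .NOT. READ3( 'SYNC_FILE', 'DUMMY', ALLAYS3,
     &                    JDATE, JTIME, FLAG ) ) THEN
            CALL M3EXIT( CALLER, JDATE, JTIME, 'Synch failure', XSTAT2 )
        END IF
    END SUBROUTINE WAITSYNC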
Using coupling mode to construct complex modeling systems has several advantages from the model-engineering point of view:
- Since data is tagged by variable-name, simulation date, and time, the system is not subject to data scrambling because of implicit programming assumptions about the data ordering, in the way that stream-like communications channels are.
- The same programs work unchanged both in standalone mode (reading input from files and writing output to files) and in coupled-model mode (reading and writing selected inputs or outputs to/from PVM mailboxes).
- Readers and writers do not need to know about each other in detail. In particular, any reader only needs to know that some writer will put the variables it needs into the mailbox. Writers don't care whether readers even exist or not. It is easy to change system configuration by just adding additional processes or by deleting processes and replacing them by appropriate disk-based files containing the data that would have been produced. In MCNC's Real-Time Ozone Forecast System, for example, the set of programs that runs to compute each day's ozone forecast varies from day to day, on the basis of such things as whether particular data ingest feeds have succeeded or failed over the past two days.
- One writer can supply multiple readers without special programming (and without needing to know who they are). For example, in a coupled system with the MM5/MCPL meteorology model, the SMOKE emissions model, and the MAQSIP air quality model, MM5 produces 5 time-stepped output "virtual files", some variables of two of which are read by SMOKE and all of which are read by MAQSIP; and SMOKE produces one output "virtual file" read by MAQSIP. SMOKE is itself a system of five programs coupled together by virtual files and fed by a number of additional disk files produced off-line. MAQSIP produces a "synchronization file" read by MM5/MCPL and used to keep MM5/MCPL from running ahead and exhausting all memory available for mailbox-buffer space.
NOTE: These are primarily for use at NCEP, where office politics forbid the presence of netCDF on their computers.

The following environment-variable assignment tells the I/O API that the indicated file is in the I/O API version of native binary representation, rather than netCDF or PVM-mailbox virtual:

    setenv <logical name> BIN:<path name>

Note that this assignment is on a file-by-file basis, so that a program may use several different I/O API files with different modes for different purposes. As a special case, this allows ordinary "I/O API M3TOOLS" programs such as M3CPLE to serve as translators back and forth between I/O API native binary and I/O API netCDF.
This section describes the structure of the files for a new underlying ("BINFIL3") binary mode for the EDSS/Models-3 I/O API, to supplement the existing (and default) netCDF file mode, the in-memory BUFFERED mode, and the PVM-based virtual mode.

Since this mode uses native machine binary representation as its underlying data representation layer, it should offer somewhat greater performance than the machine-independent lower layers (netCDF, PVM) do, for applications where I/O performance is critical. On the other hand, it is very desirable to keep the header metadata in a portable format, so that user-level programs can still read the data on binary-incompatible platforms and perform the appropriate data conversions themselves. For this reason, header metadata is stored in portable formats, as described below. The sequence of data structures in these files is modeled somewhat after the structure of netCDF files, although the implementation mechanisms used to store some of the metadata in a machine-independent fashion are to some extent borrowed from ideas found in other formats, e.g., GRIB.
- Initially, the supported platforms are ones with UNIXoid Fortrans (as listed below), but not Win32 nor Cray. Of these latter, Cray is the more difficult (made more difficult by the fact that I no longer have access to one of their systems...)
- OSF/Alpha from DEC^H^H^HCompaq^H^H^H^H^H^H HP
- HP/UX
- IBM AIX
- Sun
- SGI
- Linux
- x86 with gcc/g77, gcc/g95, gcc/lf95, pgcc/pgf90, gcc/pgf90, or icc/ifc;
- x86_64 with gcc/g77, gcc/g95, or pathcc/pathf90;
- Alpha with gcc/g77 or cc/fort;
- ia64 with gcc/g77, gcc/g95, or icc/ifort;
- [PPC970 with either gcc/g77, gcc and Absoft f90, or IBM xlc/xlf should not be difficult but hasn't been done yet]
- [Mac OS-X with either gcc/g77 or xlc/xlf should not be difficult but hasn't been done yet either, AFAIK]
- Initially, the supported data types are those needed for current air quality modeling (and excluding the grid-nest and stream-hydrology data types):
CUSTOM3
GRDDED3
BNDARY3
IDDATA3
PROFIL3
SMATRX3
- Initially, the following (as far as I know, unused) two I/O routines are not supported:
READ4D
WRITE4D
- Implementation is in C, interfacing to Fortran in the same manner as the rest of the I/O API C code.
- Uses C stdio, and particularly uses fseeko() for seeks (instead of fseek()), in order to interoperate with large-file systems (implies Linux glibc version > 2.0).
- Implementation is in file iobin3.c.
- INIT3() calls INITBIN3()
- FLUSH3() calls and other required disk synchronizations use the new routine SYNCFID, which unifies calls to FLUSHBIN3() and NF_SYNC()
- For BINFIL3 files:
    - CRTFIL3() calls CRTBIN3()
    - OPNFIL3() calls OPNBIN3()
    - RDTFLAG() calls RDBFLAG()
    - WRTFLAG() calls WRBFLAG()
    - RDVARS() calls RDBVARS()
    - WRVARS() calls WRBVARS()
    - XTRACT3() calls XTRBIN3()
    - CLOSE3() calls CLOSEBIN3()
- OPNLOG3() (called from OPEN3()) now logs the implementation layer used
- SHUT3() does a sequence of CLOSEBIN3() calls
The following representations of the primitive data types of significance to the I/O API are used to store metadata in a portable fashion (so that the metadata can be interpreted on platforms other than the originating platform) in I/O API BINFIL3 files. In principle, this lets the application programmer use the BINFIL3 layer of the I/O API to read the data on any platform, determine the transformations necessary to interpret it on his platform, and then perform those transformations on the data and use it.
INT4 - represented by a 4-byte string, in little-Endian order:

    BYTE_0(X) contains (unsigned char)(X & 255), i.e., the least significant byte of X
    BYTE_1(X) contains (unsigned char)((X/256) & 255)
    BYTE_2(X) contains (unsigned char)((X/65536) & 255)
    BYTE_3(X) contains (unsigned char)((X/16777216) & 255)

(For example, X = 2015124 = 0x1EBF94 is stored as the byte sequence 0x94, 0xBF, 0x1E, 0x00.)
REAL - represented by a character string formatted with a format equivalent to the Fortran FORMAT 1PE15.9, followed by a trailing ASCII NULL
DOUBLE - represented by a character string formatted as 1PD27.19, followed by a trailing ASCII NULL
NAME - equivalent to a Fortran CHARACTER*16 type (fixed-length 16-byte string, padded on the right by blanks; not NUL-terminated as a C string would be)
LINE - equivalent to a Fortran CHARACTER*80 type (fixed-length 80-byte string, padded on the right by blanks)
STRING - equivalent to the Mac Fortran internal representation of a Fortran CHARACTER*(*) variable (with blank-padding on the right), i.e., as a C "struct hack":

    struct{
        INT4 length;
        char contents[ length ];
    } ;
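As an illustration of the sort of user-level transformation this enables, here is a sketch (not part of the I/O API itself) that recovers a nonnegative INT4 header value from its four little-Endian bytes, regardless of the reading platform's own byte order; it assumes ICHAR() returns values 0..255 and that the result fits in a default INTEGER:

    INTEGER FUNCTION DECINT4( BYTES )
        !!  Recover X from BYTE_0..BYTE_3 as defined above,
        !!  by Horner's rule on the byte values.
        IMPLICIT NONE
        CHARACTER(LEN=1), INTENT(IN) :: BYTES( 4 )
        DECINT4 = ICHAR( BYTES(1) )                     &
                + 256 * ( ICHAR( BYTES(2) )             &
                + 256 * ( ICHAR( BYTES(3) )             &
                + 256 *   ICHAR( BYTES(4) ) ) )
    END FUNCTION DECINT4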
The structure of a BINFIL3 file is as follows:
Header Section

INT4 IOAPI_VRSN: I/O API Version

Machine/Compiler Architecture Metadata

INT4 BYTE_ORDER: Byte order, i.e., the C subscripts at which BYTE_0, BYTE_1, BYTE_2, BYTE_3 would occur if we think of an integer as a C union:
    union{ int idata; char cdata[4]; } ;
INT4 INTSIZE: size of Fortran "INTEGER"
INT4 REALSIZE: size of Fortran "REAL"
INT4 DBLESIZE: size of Fortran "DOUBLE PRECISION"
Per-File Metadata
NAME GRIDNAME: grid name
NAME UPDATE_NAME: name of the last program writing to file
LINE EXECUTION: value of environment variable EXECUTION_ID
LINE FILE_DESC[ MXDESC3=60 ]: array containing file description (set by programmer during OPEN3())
LINE UPDATE_DESC[ MXDESC3=60 ]: array containing run description, from file with logical name SCENFILE
Dimension/Type Metadata
INT4 FTYPE: File data type: CUSTOM3, GRDDED3, BNDARY3, IDDATA3, PROFIL3, or SMATRX3
INT4 GDTYP: map projection type
    LATGRD3=1 (Lat-Lon),
    LAMGRD3=2 (Lambert conformal conic),
    MERGRD3=3 (general tangent Mercator),
    STEGRD3=4 (general tangent stereographic),
    UTMGRD3=5 (UTM, a special case of Mercator),
    POLGRD3=6 (polar secant stereographic),
    EQMGRD3=7 (equatorial secant Mercator), or
    TRMGRD3=8 (transverse secant Mercator)
INT4 VGTYP: vertical coordinate type
    VGSGPH3=1 (hydrostatic sigma-P),
    VGSGPN3=2 (nonhydrostatic sigma-P),
    VGSIGZ3=3 (sigma-Z),
    VGPRES3=4 (pressure (mb)),
    VGZVAL3=5 (Z (m above sea level)), or
    VGHVAL3=6 (H (m above ground))
INT4 NCOLS: number of grid columns
INT4 NROWS: number of grid rows
INT4 NLAYS: number of layers
INT4 NTHIK: for BNDARY3 files, perimeter thickness (cells), or for SMATRX3 files, number of matrix-columns (unused for other file types)

Temporal Metadata
INT4 SDATE: starting date, coded YYYYDDD according to Models-3 conventions
INT4 STIME: starting time, coded HHMMSS according to Models-3 conventions
INT4 TSTEP: time step, coded HHMMSS according to Models-3 conventions
INT4 NRECS: current number of time step records in the file (1-based Fortran-style counting)
Spatial Metadata
DOUBLE P_ALPHA: first map projection descriptive parameter
DOUBLE P_BETA: second map projection descriptive parameter
DOUBLE P_GAMMA: third map projection descriptive parameter
DOUBLE X_CENTER: Longitude of the Cartesian map projection coordinate-origin (location where X=Y=0)
DOUBLE Y_CENTER: Latitude of the Cartesian map projection coordinate origin (map units)
DOUBLE X_ORIGIN: Cartesian X-coordinate of the lower left corner of the (1,1) grid cell (map units)
DOUBLE Y_ORIGIN: Cartesian Y-coordinate of the lower left corner of the (1,1) grid cell (map units)
DOUBLE X_CELLSIZE: X-coordinate cell dimension (map units)
DOUBLE Y_CELLSIZE: Y-coordinate cell dimension (map units)
REAL VGTOP: model-top, for sigma vertical-coordinate types
REAL VGLEVELS[0:NLAYS+1]: array of vertical coordinate level values; level 1 of the grid goes from vertical coordinate VGLEVELS[0] to VGLEVELS[1], etc.
Per-Variable Metadata
NAME VNAME[ NVARS ]: array of variable names
NAME UNITS[ NVARS ]: array of units or 'none'
LINE VDESC[ NVARS ]: array of variable descriptions
INT4 VTYPE[ NVARS ]: array of variable types:
    M3BYTE = 1
    M3INT  = 4
    M3REAL = 5
    M3DBLE = 6

Additional attributes

Not implemented at this time. Eventually: TBD, as necessary for the WRF extensions placed in I/O API Version 2.2. At this point, we anticipate that the implementation will be in terms of a sequence of <name, type, value> triplets.
Data Section
sequence of time step records
Time Step Header
INT4 FLAGS[2,NVARS]: array of data-availability flags (with Fortran-style left-major, 1-based subscripting):
    FLAGS[1,V] is the date for the data record of variable V, encoded YYYYDDD;
    FLAGS[2,V] is the time for the data record of variable V, encoded HHMMSS.
FLAGS[1,V] and FLAGS[2,V] are in consecutive memory/disk locations.
(NOTE: This amount of data is not functionally necessary; however, it is included for the historical reasons involving the convenience of visualization-system programmers.)
Time Step Contents:
array of data records, subscripted by variable 1, ..., NVARS:
<type> array of data for this variable and time step. Data is in native machine binary format.