The I/O API provides both real disk-based files (which may be "volatile" or not, and are implemented on top of netCDF, PnetCDF, or a native-binary implementation) and "virtual files" that provide safe, structured exchange of data (of "gridded," "boundary," or "custom" types only) between cooperating programs or between different modules in the same program. You may even safely interchange between real files and virtual files in different executions of the same program merely by changing the values of the logical names for the files at program launch; this allows you, for example, to examine the data being shared between modules whenever you want to, at high temporal resolution. There are two types of virtual files: memory-resident BUFFERED virtual files, which can be used to share data between modules of a single program; and PVM-mailbox based COUPLING-MODE virtual files, which can be used to share data and coordinate scheduling between different programs, even if they are executing on different machines half a continent apart across the Internet.
I/O API Version 3.2 introduces support for distributed parallel I/O using PnetCDF from Argonne National Laboratory, for use in CMAQ-5.1 and later. The original concept and the prototype code are due to Dr. David Wong, US EPA. This code has been extensively extended and revised to meet the needs of proper I/O API integration and software-engineering standards.

There are a number of restrictions:

- Only GRIDDED files are supported.
- All distributed-I/O files must be on the same grid (matching the CMAQ cross-point grid).
- There is one fixed data-distribution and processor-map, which is the same as the one specified by CMAQ.

Additionally, the following environment variables are needed for setting up how the data is distributed over the processors:

GRIDDESC - Path name for the GRIDDESC file
GRID_NAME - GRIDDESC-name for the data grid
NPCOL_NPROW - Blank-delimited list with the column- and row-dimensions for the processor-grid. Note that this list needs to be enclosed by quotes (either single or double).

To declare that a particular file is to be used with PnetCDF parallel I/O, you need an MPI: prefix on the path name in the usual setenv statement, as in the following example running on a 6×8 processor-grid (48 processors in all):

    setenv GRIDDESC /nas01/depts/ie/cempd/WRFCMAQ/CMAQv5.0.1.GRIDDESC.txt
    setenv GRID_NAME US36_CRO
    setenv NPCOL_NPROW "6 8"
    ...
    setenv CHEMCONC3D MPI:/tmp/mydir/cmaq.conc.US36_CRO.2015233.ncf

I/O API builds using PnetCDF are not link-compatible with ordinary builds, and should be kept carefully separate from them.
You can build the I/O API to use PnetCDF/MPI distributed I/O using the following binary types (or use them as templates to build your own custom binary type):
- Makeinclude.Linux2_x86_64gfortmpi
- Makeinclude.Linux2_x86_64ifortmpi
- Makeinclude.Linux2_x86_64pgmpi
- Makeinclude.Linux2_x86_64sunmpi
When performing the link-step to create model executables, you will need to put the PnetCDF libraries in the library-build directory, and add the PnetCDF libraries to the link-step command line (assuming netCDF-4 style libraries below):

    ... -lpnetcdf -lnetcdff -lnetcdf ...
Multiple files which have the same structure (type, dimensions, list of variables, time step), and which cover an extended time period, may be opened under a single logical name by:

    setenv FILE_1 <path name>
    ...
    setenv FILE_N <path name>
    setenv ANAME LIST:FILE_1,...,FILE_N

subject to the requirement that the value for ANAME has length at most 256 for I/O API 3.0 or earlier, and 65535 for 3.1 or later. In case of overlapping time step sequences, the rule is "first file wins": if the data is available from the first file, FILE_1, use it; else if it is available from the second file, use that; and so on.

Because of this rule, if you have a sequence of overlapping files covering an extended time period, you probably want to put the list LIST:FILE_1,...,FILE_N in reverse chronological order. For example, if the files data.M-N.ncf have data from 00Z on day M through 00Z on day N from consecutive model runs, then you would probably want to list them in reverse chronological order, at least if you want to get data for 2015124:000000 from file data.2015124-2015125.ncf:

    setenv F123 /my/dir/data.2015123-2015124.ncf
    setenv F124 /my/dir/data.2015124-2015125.ncf
    setenv F125 /my/dir/data.2015125-2015126.ncf
    setenv F126 /my/dir/data.2015126-2015127.ncf
    setenv ANAME LIST:F126,F125,F124,F123
    ...
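Once ANAME is set, programs open and read it exactly as if it named a single file. Here is a minimal Fortran sketch of such a reader; the variable name "O3" and the step count NSTEPS are illustrative assumptions, not fixed I/O API names:

    PROGRAM RDLIST
        USE M3UTILIO                !! I/O API declarations and FDESC3 fields
        IMPLICIT NONE
        INTEGER, PARAMETER :: NSTEPS = 96   !! however many steps you need
        INTEGER            :: JDATE, JTIME, N, LOGDEV
        REAL, ALLOCATABLE  :: CONC( :, :, : )

        LOGDEV = INIT3()            !! start up the I/O API

        IF ( .NOT. OPEN3( 'ANAME', FSREAD3, 'RDLIST' ) ) THEN
            CALL M3EXIT( 'RDLIST', 0, 0, 'Could not open "ANAME"', XSTAT2 )
        ELSE IF ( .NOT. DESC3( 'ANAME' ) ) THEN     !! get grid/time metadata
            CALL M3EXIT( 'RDLIST', 0, 0, 'Could not get description', XSTAT2 )
        END IF

        ALLOCATE( CONC( NCOLS3D, NROWS3D, NLAYS3D ) )
        JDATE = SDATE3D
        JTIME = STIME3D

        DO N = 1, NSTEPS            !! reads cross file boundaries transparently
            IF ( .NOT. READ3( 'ANAME', 'O3', ALLAYS3, JDATE, JTIME, CONC ) ) THEN
                CALL M3EXIT( 'RDLIST', JDATE, JTIME, 'Read failure', XSTAT2 )
            END IF
            !!  ...process CONC for this time step...
            CALL NEXTIME( JDATE, JTIME, TSTEP3D )
        END DO

        CALL M3EXIT( 'RDLIST', 0, 0, 'Successful completion', 0 )
    END PROGRAM RDLIST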
Real (netCDF or native-binary disk-based) I/O API files may optionally be declared "volatile" by the addition of a trailing " -v" to the value of the file's logical name, in order to tell the I/O API to perform disk-synch operations before every input and after every output operation on that file:

    ...
    setenv QUX "/tmp/mydir/volatiledata.mymodel -v"
These file-based lower layers attempt the I/O optimization of not writing a file's header (needed in order to interpret the file's contents) out to disk until either a "synch" operation is performed or the file is closed. This has the effect of making non-volatile output files unreadable until the program that writes them does a SYNC3() call for the individual files, or calls SHUT3() or M3EXIT() (or of leaving the files unreadable if the program crashes unexpectedly). The extra "synch" operations do cause some (usually small) performance penalty, but they allow other programs to read I/O API files while they are still being written, and prevent data loss upon program crashes.
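As an alternative to the " -v" flag, a writer can flush a file explicitly with SYNC3(). A minimal sketch follows; the logical name "CHEMCONC3D" and variable "O3" are illustrative assumptions:

    SUBROUTINE WRSTEP( JDATE, JTIME, NCELLS, CONC )
        !!  Write one time step, then force header and data to disk so
        !!  that concurrently-running readers can see the new record.
        USE M3UTILIO
        IMPLICIT NONE
        INTEGER, INTENT(IN) :: JDATE, JTIME     !! YYYYDDD, HHMMSS
        INTEGER, INTENT(IN) :: NCELLS
        REAL,    INTENT(IN) :: CONC( NCELLS )

        IF ( .NOT. WRITE3( 'CHEMCONC3D', 'O3', JDATE, JTIME, CONC ) ) THEN
            CALL M3EXIT( 'WRSTEP', JDATE, JTIME, 'Write failure', XSTAT2 )
        END IF
        IF ( .NOT. SYNC3( 'CHEMCONC3D' ) ) THEN
            CALL M3EXIT( 'WRSTEP', JDATE, JTIME, 'Synch failure', XSTAT2 )
        END IF
    END SUBROUTINE WRSTEP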
I/O API Snoop Mode capability may be activated by adding -DIOAPI_SNOOP=1 to DEFINEFLAGS in the ioapi/Makefile.

Snoop Mode is designed to enable "pipelining" of data through multiple modeling/product-generation programs, e.g., for forecast-modeling systems. This allows the generation of early-hour products well before the entire forecast is complete; moreover, it enables the operating system to make better use of its internal I/O buffers, further increasing system modeling efficiency.
Snoop Mode is controlled by the environment variables SNOOPSECS3 and SNOOPTRY3. When it is active (positive-integer values for these environment variables) and the read operations READ3(), XTRACT3(), INTERP3(), or DDTVAR3() encounter end-of-file, they re-try for up to SNOOPTRY3 attempts, with a delay of SNOOPSECS3 seconds between attempts. If SNOOPTRY3 < 0 or SNOOPSECS3 ≤ 0, Snoop Mode is turned off. If SNOOPTRY3 = 0, the number of re-tries is (almost) unlimited.
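For example, a product-generation program can start reading a forecast file while the forecast model is still writing it. A sketch (declarations and setup as in the list-file example earlier; the logical name "FORECAST", variable "O3", and the retry values are illustrative assumptions):

    !!  Run-script settings (illustrative): re-try up to 120 times,
    !!  30 seconds apart, before giving up on a missing time step:
    !!      setenv SNOOPTRY3  120
    !!      setenv SNOOPSECS3 30

    DO N = 1, NSTEPS
        !!  With Snoop Mode active, this READ3() sleeps and re-tries at
        !!  end-of-file until the forecast model writes step JDATE:JTIME:
        IF ( .NOT. READ3( 'FORECAST', 'O3', ALLAYS3, JDATE, JTIME, CONC ) ) THEN
            CALL M3EXIT( 'PRODGEN', JDATE, JTIME, 'Data never arrived', XSTAT2 )
        END IF
        !!  ...generate and deliver this hour's product...
        CALL NEXTIME( JDATE, JTIME, TSTEP3D )
    END DO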
For memory-resident BUFFERED files, one restriction at present is that the basic data type of all variables in the virtual file must be either integer or real. The other restriction is that only two time steps of data are kept in the buffered file: the "even step" and the "odd step" (which in normal usage are the last two time steps written). Otherwise, you write code to open, read, write, or interpolate data just as you would with a "real" file. This provides for structured, name-based, identity-tagged exchange of data between different modules in the same program: since data are stored and accessed by file name, variable name, date, and time, the system will detect at run time any attempt to request data not yet initialized (unlike the situation where data is exchanged via Fortran COMMONs; we've detected some obscure use-before-calculate bugs by replacing COMMONs with BUFFERED virtual files).

Restrictions
- For I/O API-3.2 XTRACT3(), READ3(), and WRITE3(), variables may be of types M3REAL, M3INT, M3DBLE, or M3INT8. Prior to the release of April 13, 2020, there are troubles for non-M3REAL All-Variables (VNAME=ALLVARS3) calls.
- For I/O API-3.1 XTRACT3(), READ3(), and WRITE3(), variables may be of types M3REAL, M3INT, or M3DBLE; there are troubles for non-M3REAL All-Variables (VNAME=ALLVARS3) calls.
- For INTERP3() and INTERPX(), the variables must be of type M3REAL or M3DBLE. All-Variables (VNAME=ALLVARS3) calls are not supported.

To set up a buffered virtual file, setenv the value of the file's logical name to the value BUFFERED (instead of to the path name of a real physical file), as given below:

    ...
    #  myprogram uses "qux" for internal data sharing:
    #
    setenv qux BUFFERED
    ...
    /user/mydir/myprogram
    ...

Restrictions:

- For all-variable READ3() and WRITE3() calls, all of the variables in the file must be of type M3REAL.
- Prior to the I/O API V 2.2-beta-May-3-2002 release, all variables in buffered virtual files must be of type M3REAL.
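As a minimal sketch, two modules of one program might share a time-stepped REAL variable through the buffered file above. The variable name "AERO" and subroutine names are illustrative; the writer is assumed to have filled in the FDESC3 file description and called OPEN3( 'qux', FSNEW3, ... ) first:

    SUBROUTINE PUTSTEP( JDATE, JTIME, NCELLS, AERO )
        !!  Producer module: publish this time step to the buffered file.
        USE M3UTILIO
        IMPLICIT NONE
        INTEGER, INTENT(IN) :: JDATE, JTIME, NCELLS
        REAL,    INTENT(IN) :: AERO( NCELLS )
        IF ( .NOT. WRITE3( 'qux', 'AERO', JDATE, JTIME, AERO ) ) THEN
            CALL M3EXIT( 'PUTSTEP', JDATE, JTIME, 'WRITE3 failed', XSTAT2 )
        END IF
    END SUBROUTINE PUTSTEP

    SUBROUTINE GETSTEP( JDATE, JTIME, NCELLS, AERO )
        !!  Consumer module: INTERP3() interpolates between the two
        !!  buffered time steps; a request for data not yet written
        !!  is caught at run time instead of silently using garbage.
        USE M3UTILIO
        IMPLICIT NONE
        INTEGER, INTENT(IN)  :: JDATE, JTIME, NCELLS
        REAL,    INTENT(OUT) :: AERO( NCELLS )
        IF ( .NOT. INTERP3( 'qux', 'AERO', 'GETSTEP',
     &                      JDATE, JTIME, NCELLS, AERO ) ) THEN
            CALL M3EXIT( 'GETSTEP', JDATE, JTIME, 'INTERP3 failed', XSTAT2 )
        END IF
    END SUBROUTINE GETSTEP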
As part of the MCNC Practical Parallel Project, MCNC developed an extended Model Coupling Mode for the I/O API. This mode, implemented using PVM 3.4 mailboxes, allows the user to specify in the run-script whether "file" means a physical file on disk or a PVM mailbox-based communications channel (a virtual file), on the basis of the value of the file's logical name:

    setenv FOO "virtual BAR"
    setenv IOAPI_KEEP_NSTEPS 3

declares that FOO is the logical name of a virtual file whose physical name (in terms of PVM mailbox names) is BAR. The additional environment variable IOAPI_KEEP_NSTEPS determines the number of time steps to keep in PVM mailbox buffers: if it is 3 (as here), and there are already 3 time steps of variable QUX in the mailboxes for virtual file FOO, then writing a fourth time step of QUX to FOO causes the earliest time step of QUX to be erased, leaving only time steps 2, 3, and 4. This is necessary so that the coupled modeling system does not require an infinite amount of memory for its sustained operation. If not set, IOAPI_KEEP_NSTEPS defaults to 2 (the minimum needed to support INTERP3()'s double-buffering).

The (UNIX) environments in which the modeler launches the multiple models, each of which reads or writes from a virtual file, must all agree on its physical name (usually achieved by sourcing some script that contains the relevant setenv commands).
For models exchanging data via virtual files of the I/O API's coupling mode, the I/O API schedules the various processes on the basis of data availability:

- OPEN3() calls for read-access to virtual files that haven't yet been opened for write access by some other process put the caller to sleep until the file is opened; and
- READ3(), INTERP3(), or DDTVAR3() calls for virtual-file data which has not yet been written put the reading process to sleep until the data arrives, at which point the reader is awakened and given the data it requested.

There are two requirements on the modeler:

- The modeler must start up a PVM session that will "contain" all the virtual files, and enroll in it all those machines which will be running the various modeling programs, before starting up the various models in a coupled modeling system on those respective machines.
- The modeler must structure reads and writes so as to avoid deadlocks (two or more models, each asleep while waiting for input from the other), and must provide enough feedbacks to prevent one process from "racing ahead" of the others. In a one-way coupled system, this may mean the introduction of artificial synchronization files which exist solely to provide these feedbacks (see the sketch below).
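A hypothetical feedback sketch: the downstream model writes a tiny "synchronization" variable after consuming each time step, and the upstream model blocks on reading it back before racing further ahead. The names "SYNC_FILE" and "DUMMY" are illustrative only, not fixed I/O API names:

    SUBROUTINE WAITSYNC( CALLER, JDATE, JTIME )
        !!  Upstream side of the feedback loop: block until the
        !!  downstream model has written its "finished step JDATE:JTIME"
        !!  flag to the coupling-mode virtual file SYNC_FILE.  The
        !!  READ3() call sleeps until that flag-value arrives.
        USE M3UTILIO
        IMPLICIT NONE
        CHARACTER(LEN=*), INTENT(IN) :: CALLER
        INTEGER,          INTENT(IN) :: JDATE, JTIME
        REAL :: FLAG( 1 )
        IF ( .NOT. READ3( 'SYNC_FILE', 'DUMMY', ALLAYS3,
     &                    JDATE, JTIME, FLAG ) ) THEN
            CALL M3EXIT( CALLER, JDATE, JTIME, 'Synch failure', XSTAT2 )
        END IF
    END SUBROUTINE WAITSYNC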
Using coupling mode to construct complex modeling systems has several advantages from the model-engineering point of view:
- Since data is tagged by variable-name, simulation date, and time, the system is not subject to data scrambling because of implicit programming assumptions about the data ordering, in the way that stream-like communications channels are.
- The same programs work unchanged both in standalone mode (reading input from files and writing output to files) and in coupled-model mode (reading and writing selected inputs or outputs to/from PVM mailboxes).
- Readers and writers do not need to know about each other in detail. In particular, any reader only needs to know that some writer will put the variables it needs into the mailbox. Writers don't care whether readers even exist or not. It is easy to change system configuration by just adding additional processes or by deleting processes and replacing them by appropriate disk-based files containing the data that would have been produced. In MCNC's Real-Time Ozone Forecast System, for example, the set of programs that runs to compute each day's ozone forecast varies from day to day, on the basis of such things as whether particular data ingest feeds have succeeded or failed over the past two days.
- One writer can supply multiple readers without special programming (and without needing to know who they are). For example, in a coupled system with the MM5/MCPL meteorology model, the SMOKE emissions model, and the MAQSIP air quality model, MM5 produces 5 time-stepped output "virtual files", some variables of two of which are read by SMOKE and all of which are read by MAQSIP; and SMOKE produces one output "virtual file" read by MAQSIP. SMOKE is itself a system of five programs coupled together by virtual files and fed by a number of additional disk files produced off-line. MAQSIP produces a "synchronization file" read by MM5/MCPL and used to keep MM5/MCPL from running ahead and exhausting all memory available for mailbox-buffer space.
NOTE: These are primarily for use at NCEP, where office politics forbid the presence of netCDF on their computers.

The following environment-variable assignment tells the I/O API that the indicated file is in the I/O API version of native binary representation, rather than netCDF or PVM-mailbox virtual:

    setenv <logical name> BIN:<path name>

Note that this assignment is on a file-by-file basis, so that a program may use several different I/O API files with different modes for different purposes. As a special case, this allows ordinary "I/O API M3TOOLS" programs such as M3CPLE to serve as translators back and forth between I/O API native binary and I/O API netCDF.
This section describes the structure of the files for a new underlying ("BINFIL3") binary mode for the EDSS/Models-3 I/O API, to supplement the existing (and default) netCDF file mode, the in-memory BUFFERED mode, and the PVM-based virtual mode.

Since this mode uses native machine binary representation as its underlying data representation layer, it should offer somewhat greater performance than the machine-independent lower layers (netCDF, PVM) do, for applications where I/O performance is critical. On the other hand, it is very desirable to keep the header metadata in a portable format, so that user-level programs can still read the data on binary-incompatible platforms and perform the appropriate data conversions themselves. For this reason, header metadata is stored in portable formats, as described below. The sequence of data structures in these files is modeled somewhat after the structure of netCDF files, although the implementation mechanisms used to store some of the metadata in a machine-independent fashion are to some extent borrowed from ideas found in other formats, e.g., GRIB.
- Initially, the supported platforms are ones with UNIXoid Fortrans (as listed below), but not Win32 nor Cray. Of these latter, Cray is the more difficult (made more difficult by the fact that I no longer have access to one of their systems...)
- OSF/Alpha from DEC^H^H^HCompaq^H^H^H^H^H^H HP
- HP/UX
- IBM AIX
- Sun
- SGI
- Linux
- x86 with gcc/g77, gcc/g95, gcc/lf95, pgcc/pgf90, gcc/pgf90, or icc/ifc;
- x86_64 with gcc/g77, gcc/g95, or pathcc/pathf90;
- Alpha with gcc/g77 or cc/fort;
- ia64 with gcc/g77, gcc/g95, or icc/ifort;
- [PPC970 with either gcc/g77, gcc and Absoft f90, or IBM xlc/xlf should not be difficult but hasn't been done yet]
- [Mac OS-X with either gcc/g77 or xlc/xlf should not be difficult but hasn't been done yet either, AFAIK]
- Initially, the supported data types are those needed for current air quality modeling (and excluding the grid-nest and stream-hydrology data types):
CUSTOM3
GRDDED3
BNDARY3
IDDATA3
PROFIL3
SMATRX3
- Initially, the following (as far as I know, unused) two I/O routines are not supported:
READ4D
WRITE4D
- Implementation is in C, interfacing to Fortran in the same manner as the rest of the I/O API C code.
- Uses C stdio, and particularly uses fseeko() for seeks (instead of fseek()), in order to interoperate with large-file systems (implies Linux glibc version > 2.0).
- Implementation is in file iobin3.c.
- INIT3() calls INITBIN3()
- FLUSH3() calls and other required disk synchronizations use the new routine SYNCFID, which unifies calls to FLUSHBIN3() and NF_SYNC()
- For BINFIL3 files:
    - CRTFIL3() calls CRTBIN3()
    - OPNFIL3() calls OPNBIN3()
    - RDTFLAG() calls RDBFLAG()
    - WRTFLAG() calls WRBFLAG()
    - RDVARS() calls RDBVARS()
    - WRVARS() calls WRBVARS()
    - XTRACT3() calls XTRBIN3()
    - CLOSE3() calls CLOSEBIN3()
- OPNLOG3() (called from OPEN3()) now logs the implementation layer used
- SHUT3() does a sequence of CLOSEBIN3() calls
The following representations of the primitive data types of significance to the I/O API are used to store metadata in a portable fashion (so that the metadata can be interpreted on platforms other than the originating platform) in I/O API BINFIL3 files. In principle, this lets the application programmer use the BINFIL3 layer of the I/O API to read the data on any platform, determine the transformations necessary to interpret it on his platform, and then perform those transformations on the data and use it.
INT4 - represented by a 4-byte string, in little-Endian order:

    BYTE_0(X) contains (unsigned char)(X & 255), i.e., the least significant byte of X
    BYTE_1(X) contains (unsigned char)((X/256) & 255)
    BYTE_2(X) contains (unsigned char)((X/65536) & 255)
    BYTE_3(X) contains (unsigned char)((X/16777216) & 255)

(For example, X = 2015124 = 0x1EBF94 is stored as the byte sequence 0x94, 0xBF, 0x1E, 0x00.)
REAL - represented by a character string formatted with a format equivalent to the Fortran FORMAT 1PE15.9, followed by a trailing ASCII NULL
DOUBLE - represented by a character string formatted as 1PD27.19, followed by a trailing ASCII NULL
NAME - equivalent to a Fortran CHARACTER*16 type (fixed-length 16-byte string, padded on the right by blanks; not NUL-terminated as a C string would be)
LINE - equivalent to a Fortran CHARACTER*80 type (fixed-length 80-byte string, padded on the right by blanks)
STRING - equivalent to the Mac Fortran internal representation of a Fortran CHARACTER*(*) variable (with blank-padding on the right), i.e., as a C "struct hack":

    struct{
        INT4 length;
        char contents[ length ];
    } ;
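As an illustration of the sort of user-level transformation this enables, here is a sketch (not part of the I/O API itself) that recovers a nonnegative INT4 header value from its four little-Endian bytes, regardless of the reading platform's own byte order; it assumes ICHAR() returns values 0..255 and that the result fits in a default INTEGER:

    INTEGER FUNCTION DECINT4( BYTES )
        !!  Recover X from BYTE_0..BYTE_3 as defined above,
        !!  by Horner's rule on the byte values.
        IMPLICIT NONE
        CHARACTER(LEN=1), INTENT(IN) :: BYTES( 4 )
        DECINT4 = ICHAR( BYTES(1) )                     &
                + 256 * ( ICHAR( BYTES(2) )             &
                + 256 * ( ICHAR( BYTES(3) )             &
                + 256 *   ICHAR( BYTES(4) ) ) )
    END FUNCTION DECINT4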
The structure of a BINFIL3 file is as follows:
Header Section

INT4 IOAPI_VRSN: I/O API Version

Machine/Compiler Architecture Metadata

INT4 BYTE_ORDER: Byte order, i.e., the C subscripts at which BYTE_0, BYTE_1, BYTE_2, BYTE_3 would occur if we think of an integer as a C union:
    union{ int idata; char cdata[4]; } ;
INT4 INTSIZE: size of Fortran "INTEGER"
INT4 REALSIZE: size of Fortran "REAL"
INT4 DBLESIZE: size of Fortran "DOUBLE PRECISION"
Per-File Metadata
NAME GRIDNAME: grid name
NAME UPDATE_NAME: name of the last program writing to file
LINE EXECUTION: value of environment variable EXECUTION_ID
LINE FILE_DESC[ MXDESC3=60 ]: array containing file description (set by programmer during OPEN3())
LINE UPDATE_DESC[ MXDESC3=60 ]: array containing run description, from file with logical name SCENFILE
Dimension/Type Metadata
INT4 FTYPE: File data type: CUSTOM3, GRDDED3, BNDARY3, IDDATA3, PROFIL3, or SMATRX3
INT4 GDTYP: map projection type
    LATGRD3=1 (Lat-Lon),
    LAMGRD3=2 (Lambert conformal conic),
    MERGRD3=3 (general tangent Mercator),
    STEGRD3=4 (general tangent stereographic),
    UTMGRD3=5 (UTM, a special case of Mercator),
    POLGRD3=6 (polar secant stereographic),
    EQMGRD3=7 (equatorial secant Mercator), or
    TRMGRD3=8 (transverse secant Mercator)
INT4 VGTYP: vertical coordinate type
    VGSGPH3=1 (hydrostatic sigma-P),
    VGSGPN3=2 (nonhydrostatic sigma-P),
    VGSIGZ3=3 (sigma-Z),
    VGPRES3=4 (pressure (mb)),
    VGZVAL3=5 (Z (m above sea level)), or
    VGHVAL3=6 (H (m above ground))
INT4 NCOLS: number of grid columns
INT4 NROWS: number of grid rows
INT4 NLAYS: number of layers
INT4 NTHIK: for BNDARY3 files, perimeter thickness (cells), or for SMATRX3 files, number of matrix-columns (unused for other file types)

Temporal Metadata
INT4 SDATE: starting date, coded YYYYDDD according to Models-3 conventions
INT4 STIME: starting time, coded HHMMSS according to Models-3 conventions
INT4 TSTEP: time step, coded HHMMSS according to Models-3 conventions
INT4 NRECS: current number of time step records in the file (1-based Fortran-style counting)
Spatial Metadata
DOUBLE P_ALPHA: first map projection descriptive parameter
DOUBLE P_BETA: second map projection descriptive parameter
DOUBLE P_GAMMA: third map projection descriptive parameter
DOUBLE X_CENTER: Longitude of the Cartesian map projection coordinate-origin (location where X=Y=0)
DOUBLE Y_CENTER: Latitude of the Cartesian map projection coordinate origin (map units)
DOUBLE X_ORIGIN: Cartesian X-coordinate of the lower left corner of the (1,1) grid cell (map units)
DOUBLE Y_ORIGIN: Cartesian Y-coordinate of the lower left corner of the (1,1) grid cell (map units)
DOUBLE X_CELLSIZE: X-coordinate cell dimension (map units)
DOUBLE Y_CELLSIZE: Y-coordinate cell dimension (map units)
REAL VGTOP: model-top, for sigma vertical-coordinate types
REAL VGLEVELS[0:NLAYS+1]: array of vertical coordinate level values; level 1 of the grid goes from vertical coordinate VGLEVELS[0] to VGLEVELS[1], etc.
Per-Variable Metadata
NAME VNAME[ NVARS ]: array of variable names
NAME UNITS[ NVARS ]: array of units or 'none'
LINE VDESC[ NVARS ]: array of variable descriptions
INT4 VTYPE[ NVARS ]: array of variable types:
    M3BYTE = 1
    M3INT  = 4
    M3REAL = 5
    M3DBLE = 6

Additional attributes

Not implemented at this time. Eventually: TBD, as necessary for the WRF extensions placed in I/O API Version 2.2. At this point, we anticipate that the implementation will be in terms of a sequence of <name, type, value> triplets.
Data Section
sequence of time step records
Time Step Header
INT4 FLAGS[2,NVARS]: array of data-availability flags (with Fortran-style left-major, 1-based subscripting):
    FLAGS[1,V] is the date for the data record of variable V, encoded YYYYDDD;
    FLAGS[2,V] is the time for the data record of variable V, encoded HHMMSS.
FLAGS[1,V] and FLAGS[2,V] are in consecutive memory/disk locations.
(NOTE: This amount of data is not functionally necessary; however, it is included for the historical reasons involving the convenience of visualization-system programmers.)
Time Step Contents:
array of data records, subscripted by variable 1, ..., NVARS:
<type> array of data for this variable and time step. Data is in native machine binary format.