Proposal: A Geo-Element File Type
in the Models-3/EDSS I/O API

Carlie J. Coats, Jr., Ph.D.
MCNC Environmental Programs
`carlie@jyarborough.com`

!! This document is under construction !!

Abstract
Geospatial Cell Complexes
Public Data Structures and Routines
Applicability to Geospatial and Finite Element Problems
Issues and Questions to be Resolved
Implementation Status

Abstract

The intent is to extend the Models-3/EDSS I/O API by adding a new data type (and therefore a new file type), the geospatial-element cell complex (GECC) data type, that supports both geospatial and finite-element data in a way upwardly compatible with the existing API and its library. This extension significantly generalizes the geospatial metadata structure espoused by Butler et al., along lines discussed by John Ambrosiano and the author since 1996. The data structures involved are based upon foundational twentieth-century work on the classification of geometric structure by the topologists and geometers Whitney, Whitehead, Kervaire, Milnor, and Adams, and more recently Quillen and Sullivan, who proved that the data structures proposed are both necessary and sufficient to deal with problems of three-dimensional geometric structure. The obvious generalization is adequate through six dimensions; Sullivan and Quillen give an explicit classification of its indeterminacy for problems of seven or more dimensions.

GECC files have three sections:

A header, like current I/O API file headers;
A geometry specification, which specifies the geometric/topological structure and its georegistration;
A set of variables, which may be optionally time-stepped, and which live on a specified dimensional part of the geometry (i.e., each variable "lives" either on the vertices, the edges, the faces, or the 3-cells of the geometry).

There are new data structures introduced to allow for access, storage, and retrieval of these geometric specifications, as well as new API-methods GEOPEN3() (if necessary, depending upon decisions that need to be made (below)) to open or create GECC files, GEDESC3() to retrieve a file's geometric specification, and various utility methods to perform other geometry-related tasks. Existing I/O API routines READ3(), WRITE3(), INTERP3(), and DDTVAR3() would be extended for GECC-file data access, storage, and retrieval. The resulting API will be callable from at least Fortran 77 and 90, C, and C++. GECC-files offer direct support for polygon-based data such as is found in emissions (e.g., area sources) and in GIS-related coverages, without biasing the representation by artificial triangulation or reticulation, and without the extra storage overhead which would also ensue.

One aspect that deserves attention is dealing with the existing built-in I/O API layer structure: unless this is over-ridden, the geometry will automatically have a layer structure given by the built-in layer structure. One could argue that it would be simpler to ignore the 3-dimensional-cell description aspect and use purely the layer structure to extend the geometry to the vertical dimension. This would be sufficient for the vast majority of GIS-related geospatial data applications; however, it would fail to support data with irregular or only partially-layered vertical structures (like ocean models or irregular three-dimensional geological structures for ground water modeling) which would be significant at a later date. The recommendation is to do the fully three-dimensional cell-complex geometry descriptions and to support the I/O API layer structure, with the interpretation that when the number of layers is nontrivial and the number of higher-dimensional cells is zero (no 3-cells, etc.), then the generic I/O API layer structure is responsible for the vertical structure (giving "prismic geometry" for which all horizontal cross-sections are identical as a result). This would be useful, for example, for emissions point-source plume rise for which there is a vertical structure, but only at the vertices (there being no edges, faces, nor 3-cells). The one ambiguous interpretation problem is dealing with the case of a fully three-dimensional cell complex that also has a nontrivial vertical layer structure. Options are to:

prohibit it within the I/O API code itself;
forbid it as a matter of coding standards; or
allow it, on the chance that someone will find a useful reason for it as a case of data-structure abuse.
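
The interpretation rule described above can be summarized as a small decision function. This is only a sketch: the enum, the function name `gecc_vertical`, and its arguments are hypothetical and not part of the I/O API; `nlays` stands for the file's layer count and `ncells` for the number of 3-cells.

```c
#include <assert.h>

/* Hypothetical classification of how a GECC file's vertical structure
   would be interpreted under the recommendation in the text. */
typedef enum {
    GECC_VERT_NONE,      /* one layer, no 3-cells: no vertical structure       */
    GECC_VERT_LAYERED,   /* several layers, no 3-cells: prismatic geometry     */
    GECC_VERT_CELLS,     /* 3-cells, one layer: fully 3-D cell complex         */
    GECC_VERT_AMBIGUOUS  /* 3-cells and several layers: the open design issue  */
} gecc_vertical_kind;

gecc_vertical_kind gecc_vertical(int nlays, int ncells)
{
    assert(nlays >= 1 && ncells >= 0);
    if (ncells == 0)
        return (nlays > 1) ? GECC_VERT_LAYERED : GECC_VERT_NONE;
    return (nlays > 1) ? GECC_VERT_AMBIGUOUS : GECC_VERT_CELLS;
}
```

Under the first option listed above, an implementation would presumably reject the `GECC_VERT_AMBIGUOUS` case at file-creation time; under the third it would simply pass it through.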


Geospatial Cell Complexes

A Geospatial Element Cell Complex is a geometric data structure with the following components and relationships:

A set of nodes (or vertices), that have specified positions. Topologists will frequently call this set the 0-skeleton.
A set of edges (which are line segments). Topologists will frequently call this set the 1-skeleton.
A set of faces (which are polygons). Topologists will frequently call this set the 2-skeleton.
A set of cells (which are 3-dimensional polyhedral solids). Topologists will frequently call this set the 3-skeleton.
An edge::node boundary relation that specifies the beginning and ending nodes for each edge.
A face::edge boundary relation that specifies the (signed) set of edges which constitute the boundary of each face.
A cell::face boundary relation that specifies the (signed) set of faces which constitute the boundary of each cell.

The dimension of a GECC is given by the highest-dimensional nonempty skeleton: 0-dimensional if only the node-set is nonempty, 1-dimensional if the edge-set (and therefore the node-set) is nonempty but the face-set is empty, etc. In environmental modeling, we need to describe variables in terms of how the modeling algorithms interpret them: some variables are naturally thought of as being a property of the nodes; others of the edges, faces, or cells. In the current air quality models, for example, we think of concentrations as being cell-means (and so associated with the 3-skeleton), whereas we think of wind fields as being associated with corners (and so associated with the 0-skeleton). In emissions, area and biogenic sources will be associated with county-polygons, i.e., the 2-skeleton, etc.
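
Following the examples just given (only nodes present gives dimension 0, edges present gives dimension 1, and so on), the dimension can be read directly off the element counts. The function name `gecc_dimension` is hypothetical; the four arguments correspond to the node, edge, face, and cell counts.

```c
#include <assert.h>

/* Dimension of a cell complex from its element counts:
   the highest k for which the set of k-dimensional elements is nonempty.
   Returns -1 for a completely empty complex. */
int gecc_dimension(int nvert, int nedge, int nface, int ncell)
{
    assert(nvert >= 0 && nedge >= 0 && nface >= 0 && ncell >= 0);
    if (ncell > 0) return 3;
    if (nface > 0) return 2;
    if (nedge > 0) return 1;
    if (nvert > 0) return 0;
    return -1;
}
```

For instance, a point-source inventory with 250 stacks and no edges, faces, or cells would be 0-dimensional, while a county-polygon coverage would be 2-dimensional.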

All edges, faces, and cells are oriented, as is required to do line, surface, and volume integrals (and to make the relationships among them correct -- Green's, Stokes', Gauss' Theorems, etc.). This orientedness shows up in the boundary relations:

An edge has a starting node and an ending node. The boundary relation can be represented as a mapping of the set of edges into ordered pairs <starting-node,ending-node> of nodes.
A face has a "top" and a "bottom" as its specification of orientation, and it has some finite set of edges. The face and an edge in its boundary have matching orientations if the edge is oriented in the counterclockwise direction relative to the interior of the face when viewed from the top side of the face, and reverse orientations otherwise. The boundary relation can be represented as a (sparse) incidence matrix, with rows subscripted by faces, columns subscripted by edges, with positive values for boundary edges with orientations matching the face, negative values for reversed orientation, and zeros for edges not on the boundaries of the respective faces.
A cell has a positive orientation if it has a right-handed coordinate system with respect to the underlying map projection Cartesian coordinate system, and negative orientation otherwise. The boundary relation can be represented as a (sparse) incidence matrix, with rows subscripted by cells, columns subscripted by faces, with positive values for boundary faces with matching orientations, negative values for reverse orientation, etc., much as for faces and edges.
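
One consequence of the signed boundary relations is that the boundary of a face must itself have no boundary: traversing the face's edges with their signs, every node is entered exactly as often as it is left. The sketch below checks this for a single face. The function name and argument layout are assumptions made for illustration (1-based subscripts, with a negative edge subscript meaning reversed orientation); it is not a proposed API routine.

```c
#include <assert.h>
#include <stdlib.h>

/* Returns 1 if the signed edge list of one face forms a closed chain,
   0 otherwise (including on malformed subscripts or allocation failure).
   edgebdy[e][0] and edgebdy[e][1] are the 1-based start and end nodes of
   edge e+1; face_edges[k] is +/- a 1-based edge subscript. */
int face_boundary_is_closed(int nvert, int nedge, const int edgebdy[][2],
                            const int face_edges[], int n)
{
    int closed = 1;
    int *net;

    assert(edgebdy != NULL && face_edges != NULL);
    if (nvert <= 0) return 0;
    net = calloc((size_t)nvert, sizeof *net);   /* net flow into each node */
    if (net == NULL) return 0;

    for (int k = 0; k < n; k++) {
        int e    = abs(face_edges[k]);
        int sign = (face_edges[k] > 0) ? 1 : -1;
        if (e < 1 || e > nedge) { closed = 0; break; }
        int start = edgebdy[e - 1][0];
        int end   = edgebdy[e - 1][1];
        if (start < 1 || start > nvert || end < 1 || end > nvert) {
            closed = 0;
            break;
        }
        net[start - 1] -= sign;   /* the oriented edge leaves its start node */
        net[end   - 1] += sign;   /* and enters its end node                 */
    }
    for (int v = 0; closed && v < nvert; v++)
        if (net[v] != 0) closed = 0;

    free(net);
    return closed;
}
```

For a unit square with edges 1→2, 2→3, 3→4, and 1→4, the counterclockwise face boundary is the signed list `{1, 2, 3, -4}`; listing the last edge as `+4` instead leaves nodes 1 and 4 unbalanced and the check fails.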

Many of the applications anticipated in atmosphere-related environmental modeling will profit from a natural I/O API extension -- layered GECC's, for which the set of cells (and possibly the sets of faces or edges) is empty, and one is trying to model the Cartesian product structure of some lower-dimensional cell complex with a layered atmospheric structure. The most obvious application of this is in modeling point source plume rise, in which there is a natural 0-dimensional cell complex of point sources, and a variable (plume fraction) that "lives" on the atmospheric layer structure above the point sources. Arguably, one might improve plume-in-grid modeling by having either a 1-dimensional cell complex that is the union of the various plume-centerlines, a vertically layered atmospheric structure, and a Gaussian horizontal structure, or a two-dimensional horizontal GECC structure with a vertically layered atmospheric structure. Both cases require at least time-varying geometry (with time-independent topology), as discussed in the section on issues, below.

There are additional geometric-utility routines that might be desirable, including routines for at least the following tasks:

Find the centroids, lengths, areas, and volumes of edges/faces/cells.
Find the intersection-complex of a pair of GECCs or of a GECC and a regular grid.
Find the common-boundary sub-complex for a pair of neighboring cells/faces/edges.
Construct the Poincaré dual complex to a particular GECC. (This is a homeomorphic complex, with one node for each cell, one edge for each face, one face for each edge, and one cell for each node. Note that in the usual 2-D gridded situation, the "dot-point grid" is the Poincaré dual of the "cross-point grid".)
Others to be determined (below)...
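
As an illustration of the first kind of utility, the signed area of a planar face can be computed directly from the edge::node relation and the face's signed edge list, using the standard shoelace (Green's theorem) sum over oriented boundary edges. The routine name `face_signed_area` and its argument layout are hypothetical; it assumes 1-based subscripts, planar node coordinates `x`, `y` in the map-projection Cartesian system, and valid input.

```c
#include <assert.h>
#include <stdlib.h>

/* Signed area of one planar face: positive when the signed edges run
   counterclockwise.  Because the result is a sum over edges, the edges
   need not be listed in traversal order. */
double face_signed_area(const double x[], const double y[],
                        const int edgebdy[][2],
                        const int face_edges[], int n)
{
    double twice_area = 0.0;

    assert(x != NULL && y != NULL && edgebdy != NULL && face_edges != NULL);
    for (int k = 0; k < n; k++) {
        int e = abs(face_edges[k]) - 1;      /* 0-based edge index  */
        int a = edgebdy[e][0] - 1;           /* 0-based start node  */
        int b = edgebdy[e][1] - 1;           /* 0-based end node    */
        if (face_edges[k] < 0) {             /* reversed orientation */
            int t = a; a = b; b = t;
        }
        twice_area += x[a] * y[b] - x[b] * y[a];
    }
    return 0.5 * twice_area;
}
```

For the unit square with nodes (0,0), (1,0), (1,1), (0,1), edges 1→2, 2→3, 3→4, 1→4, and signed boundary `{1, 2, 3, -4}`, the routine returns 1.0; reversing every sign gives -1.0.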


Public Data Structures and Routines

NOTE: details of these will vary, dependent upon certain choices/design decisions that need to be made (below).

New parameter tokens for include files PARMS3.EXT and parms3.h:
GEODAT3 as a file type, to indicate a GECC-file.
GESKEL0 to indicate in GDESC3.EXT (etc., below) that the corresponding variable in GECC-file lives on the 0-skeleton of the GECC.
GESKEL1 to indicate that the corresponding variable in GECC-file lives on the 1-skeleton of the GECC.
GESKEL2 to indicate that the corresponding variable in GECC-file lives on the 2-skeleton of the GECC.
GESKEL3 to indicate that the corresponding variable in GECC-file lives on the 3-skeleton of the GECC.
New COMMON GDESC3 either in include file FDESC3.EXT or in new include file GDESC3.EXT (tbd, below), and new typedef IOAPI_GDesc3 and its memory-layout-compatible definition as a struct in iodecl3.h. Fields in this COMMON or struct include
NVERT3D the number of elements in the 0-skeleton (the set of nodes/vertices).
NEDGE3D the number of elements in the set of edges
NFACE3D the number of elements in the set of faces
NCELL3D the number of elements in the set of cells
N2BDY3D the number of nonzero entries in the (sparse) faces::edges boundary relation.
N3BDY3D the number of nonzero entries in the (sparse) cells::faces boundary relation.
VSKEL3D(MXVARS3) Token value GESKEL[0-3] to indicate which skeleton each variable lives on.
VSTEP3D(MXVARS3) Token value 0 or TSTEP3D to indicate that a variable is time independent or time stepped. See (below).

One reason this should be in a new COMMON is to ensure that bad attempts at configuration, with incompatible INCLUDE-files and/or libraries, will either fail to compile or fail to link, instead of having obscure and difficult-to-debug run-time failures.
Sparse matrix data structures for the boundary relations, defined as follows (where NEDGES, NFACES, NCELLS, N2BDY, and N3BDY are the numbers of edges, faces, and cells, the size of the sparse face::edge boundary relation, and the size of the cell::face boundary relation, respectively):
EDGEBDY(2,NEDGES) is an INTEGER array containing the vertex subscripts for the starting and ending nodes of each edge. Note that in this case there are always exactly two boundary vertices for each edge.
FACEDEX(NFACES) is an INTEGER array containing the number of boundary-edges for each face.
FACEBDY(N2BDY) is an INTEGER array containing plus or minus the edge subscripts (according to whether the edge occurs with positive orientation or negative orientation within the boundary) for the boundary-edges of the faces, in serialized consecutive order: FACEBDY(1:FACEDEX(1)) contains the boundary edges for the first face, FACEBDY(FACEDEX(1)+1:FACEDEX(1)+FACEDEX(2)) the boundary edges for the second face, etc.
CELLDEX(NCELLS) has the same role for the cell::face boundary relation that FACEDEX does for the face::edge boundary relation.
CELLBDY(N3BDY) has the same role for the cell::face boundary relation that FACEBDY does for the face::edge boundary relation.

GEOPEN3() and geopen3c() are the Fortran and C routines used instead of OPEN3() to open/create GECC-files. They have extra arguments for the boundary relations (and possibly for the node-position variables, depending upon decisions described below). COMMON GDESC3 or the extra IOAPI_GDesc3-pointer argument must be correctly filled in if the mode of opening is "new", "unknown", or "truncate", since these modes may require file creation or consistency checking.

NOTE: For GEODAT3 files which already exist, one may also use OPEN3() or open3c() to open them in modes FSREAD3, FSRDWR3.

Fortran and C Usages:

              LOGICAL FUNCTION  GEOPEN3( FNAME,
     &                                   EDGEBDY, 
     &                                   FACEDEX, FACEBDY, 
     &                                   CELLDEX, CELLBDY,
     &                                   FSTATUS, PGNAME )
              CHARACTER*(*)  FNAME
              INTEGER        EDGEBDY( 2, NEDGE3D )
              INTEGER        FACEDEX( NFACE3D )
              INTEGER        FACEBDY( N2BDY3D )
              INTEGER        CELLDEX( NCELL3D )
              INTEGER        CELLBDY( N3BDY3D )
              INTEGER        FSTATUS    !  FSREAD3, FSRDWR3, FSUNKN3, etc.
              CHARACTER*(*)  PGNAME
              ...
              int geopen3c( const char * FNAME, 
                            const IOAPI_Bdesc3  * bdesc ,
                            const IOAPI_Cdesc3  * cdesc ,
                            const IOAPI_GEdesc3 * gdesc ,
                            const int    EDGEBDY[][2],
                            const int    FACEDEX[],
                            const int    FACEBDY[],
                            const int    CELLDEX[],
                            const int    CELLBDY[],
                            int          STATUS,
                            const char * PNAME ) ;

GEDESC3() and gedesc3c() return as arguments sparse-matrix arrays containing the boundary relations (and possibly the node-positions, depending upon decisions described below) for the geometry from the file header. It is the responsibility of the caller to allocate these arrays before the call.

Fortran and C Usages:

              LOGICAL FUNCTION GEDESC3( FNAME,
     &                                  EDGEBDY,
     &                                  FACEDEX, FACEBDY,
     &                                  CELLDEX, CELLBDY )
              CHARACTER*(*)  FNAME
              INTEGER        EDGEBDY( 2, NEDGE3D )
              INTEGER        FACEDEX( NFACE3D )
              INTEGER        FACEBDY( N2BDY3D )
              INTEGER        CELLDEX( NCELL3D )
              INTEGER        CELLBDY( N3BDY3D )
              ...
              int gedesc3c( const char * FNAME, 
                            int          EDGEBDY[][2],
                            int          FACEDEX[],
                            int          FACEBDY[],
                            int          CELLDEX[],
                            int          CELLBDY[] ) ;
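
Since the caller must allocate the arrays before calling gedesc3c(), a C program would first obtain the counts (NEDGE3D, NFACE3D, N2BDY3D, NCELL3D, N3BDY3D) from the file description and then size its buffers accordingly. The following is a sketch of that allocation step only; the `gecc_buffers` struct and the `gecc_alloc`/`gecc_free` helpers are hypothetical and not part of the I/O API, and the call to gedesc3c() itself is omitted. Note that the Fortran array EDGEBDY(2,NEDGES) corresponds to a C array of NEDGES rows of two ints, matching the `EDGEBDY[][2]` parameter in the prototype above.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {              /* hypothetical holder for the five arrays */
    int (*edgebdy)[2];
    int  *facedex;
    int  *facebdy;
    int  *celldex;
    int  *cellbdy;
} gecc_buffers;

/* Allocate n*per zero-initialized ints; at least one element so that an
   empty skeleton still yields a valid, non-NULL pointer. */
static void *alloc_ints(int n, int per)
{
    size_t count = (n > 0) ? (size_t)n : 1;
    return calloc(count * (size_t)per, sizeof(int));
}

void gecc_free(gecc_buffers *b)
{
    assert(b != NULL);
    free(b->edgebdy); free(b->facedex); free(b->facebdy);
    free(b->celldex); free(b->cellbdy);
    b->edgebdy = NULL;
    b->facedex = b->facebdy = b->celldex = b->cellbdy = NULL;
}

/* Returns 0 on success, -1 on allocation failure (nothing left allocated). */
int gecc_alloc(gecc_buffers *b, int nedge, int nface, int n2bdy,
               int ncell, int n3bdy)
{
    assert(b != NULL);
    b->edgebdy = alloc_ints(nedge, 2);
    b->facedex = alloc_ints(nface, 1);
    b->facebdy = alloc_ints(n2bdy, 1);
    b->celldex = alloc_ints(ncell, 1);
    b->cellbdy = alloc_ints(n3bdy, 1);
    if (!b->edgebdy || !b->facedex || !b->facebdy ||
        !b->celldex || !b->cellbdy) {
        gecc_free(b);
        return -1;
    }
    return 0;
}
```

After a successful `gecc_alloc`, the five members could be passed as the array arguments of gedesc3c(), and released with `gecc_free` once the geometry is no longer needed.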

When called on a GECC-file, DESC3() fills in the COMMONs in both FDESC3.EXT and GDESC3.EXT.
There is an issue here about the C routine desc3c(), because this would need both FDESC-pointer and GDESC3-pointer arguments, changing the routine's signature... this probably means that we need an extra C routine for the GECC-file description-task
;-(
Other C bindings are unchanged.
:-) !!
Additional new geometric-utility routines and tools (tbd, below...)
Extensions of existing analysis and visualization tools:
- PAVE
- Models-3 Vis Tool
- m3stat
- m3diff
- m3xtract
- m3tshift
- m3cple
- mtxcple


Applicability to Geospatial and Finite Element Problems

(Optionally Time-stepped) Geographic Coverage Representation (something which traditional GIS systems do not do well, since they have a "flat database" view of coverages without any temporal structure).
Finite Element Modeling: Supports multiple finite element schemes with various different kinds of element decompositions/triangulations/griddings, and with shape functions that "live" on the various skeletons. Because of the generality, there may be more set-up overhead than with data structures closely tailored for the specifics of particular finite element schemes, but this should still offer excellent performance for the time-dependent part of processing.
Emissions Modeling:
- Area source emissions very naturally fit as an application of 2-dimensional GECC files, for which the primary polygons are the relevant counties.
- Biogenic emissions also is an application of 2-dimensional GECC files, where the polygons are either counties or land-cover tracts (depending upon the nature of the underlying land-cover inventory).
- Mobile source emissions are naturally an application of mixed 1,2-dimensional GECC files, with emissions living on either links or counties.
- Point source emissions live naturally on 0-dimensional GECC files.
- Point source plume rise is naturally an application of layered 0-dimensional GECC files.
Plume in Grid Modeling: The plumes are layered two-dimensional (effectively three-dimensional) GECC's with time dependent geometry (the cell complex having a down-plume structure, a cross-plume structure, and also a layer structure inherited from the atmospheric grid model).
Stream-Network Flow Modeling: with a one-dimensional tree-connected cell complex with one link for each significant stream-reach.
Hydrology/Runoff Modeling: with a two-dimensional cell complex composed of hill-slope patches that have specific drainage characteristics.
"Mesa-layered" models (for which there are voids in the grid or layer structure). Examples include ocean circulation models and the ETA met model.
Groundwater models, especially ones with irregular two- or three-dimensional geological structures.


Issues and Questions to be Resolved

Put the GDESC3 COMMON into the existing FDESC3.EXT include-file, or not?
My inclination is to say that it should be put into the same INCLUDE file, simplifying the API for the modelers that use it, particularly since it is used for exactly the same purpose, and in the same OPEN3() and DESC3() calls as that file.
How much flexibility do we give to the variables? We can potentially allow individual variables to be time-stepped or time-independent... do we allow time-stepped variables to have different time steps?
(Almost certainly this would cause extra netCDF fill-overhead and probably break the internal paradigm of some of our visualization tools; just the choice of "this is time independent" versus "this uses the file's time step" on a variable-by-variable basis would be relatively clean to implement and use, would not incur extra netCDF overhead, and not break vis tools.)
Given the time-stepping flexibility above, do we make the node-position variables user-level variables, or do we insist that they be part of the geometry data structure, and so specified during the GEOPEN3() call?
I favor the choice of allowing individual variables to be either time-stepped with the file-specified time step, or time-independent: this cleanly handles the fact that 2-dimensional GECCs require only two node-position variables, whereas 3-dimensional GECCs need three. It also cleanly handles the question: "What are the units for the node position variables?" At the same time, it allows both easy implementation and additional flexibility which may prove useful to support additional time-independent attributes of GECCs.
Do we need also to support some version of time varying geometry? If so, there are two options, a simpler and weaker notion that is much easier to deal with from both the I/O API implementation and the user/modeler points of view, and a more-complex alternative notion:
Time-independent topology where only the (node) positions change, but the sets of nodes, faces, edges, and cells, and the boundary relations among them are time-independent. For this case, INTERP3() makes sense as an I/O API call. This choice can be implemented by simply allowing the node-position variables to be either time-stepped or time independent, as the user selects at file-creation time.
Time-dependent topology in which the entire geometry can change from time step to time step (and for which the modeler reading data must first make a geometry inquiry, then allocate appropriate buffers, and finally read data into them). The write-operation also requires a separate call, for which both the geometry and the data are arguments. Note that INTERP3() does not make sense as an I/O API call in this case, because there is no topology on which to define it.
I recommend that we not implement this option at first; if we later decide that it is necessary, then we should implement it as an additional and different file type (Time Stepped Cell Complex, or TSCC files?), using similar data structures and employing the lessons learned with the time-independent topology GECC files.
How do we handle text based metadata, such as names and descriptions for nodes, edges, faces, or cells?
What additional geometry-related routines are needed?
Examples of these might include routines to compute:
Geometry-construction is apt to be a rather tedious and detailed task, particularly for the irregular geometry cases. What support tools do there need to be in order to:
- Construct the geometry for the various cases.
- Import data from "foreign" databases and inventories.
- Compute GECC-to-grid, grid-to-GECC, and GECC-to-GECC sparse transform matrices to use with general I/O API transform program MTXCPLE, etc.
- Do what other tasks?


Implementation Status

The following INCLUDE files are affected:

Public Fortran and C INCLUDE files PARMS3.EXT and parms3.h: new parameters
- GEODAT3: new file type token parameter "geospatial-element cell complex"
- GESKEL0: new description-token parameter "variable lives on the 0-skeleton in a GEODAT3-file"
- GESKEL1: new description-token parameter "variable lives on the 1-skeleton in a GEODAT3-file"
- GESKEL2: new description-token parameter "variable lives on the 2-skeleton in a GEODAT3-file"
- GESKEL3: new description-token parameter "variable lives on the 3-skeleton in a GEODAT3-file"
STATUS:: Coded
Public Fortran and C INCLUDE files FDESC3.EXT and fdesc3.h: new COMMON GDESC3 with data structures to use describing/defining GEODAT3-files:
- NVERT3D: number of vertices in the geometry of the cell complex.
- NEDGE3D: number of edges in the geometry of the cell complex.
- NFACE3D: number of faces in the geometry of the cell complex.
- NCELL3D: number of 3-cells in the geometry of the cell complex.
- N2BDY3D: number of elements in the face::edge boundary relation of the cell complex.
- N3BDY3D: number of elements in the cell::face boundary relation of the cell complex.
- VSKEL3D( MXVARS3 ): skeleton on which each variable "lives": values are token parameters GESKEL[0-3]
- VSTEP3D( MXVARS3 ): per-variable time step. Should be either zero or TSTEP3D.
- typedef struct{...} IOAPI_GEdesc3 to provide a data structure definition to use for GEODAT3 geometry for use by C programs.
STATUS:: Coded
Public Fortran and C INCLUDE files IODECL3.EXT and iodecl3.h: new public I/O API routines declared:
- GEDESC3()
- GEOPEN3()
STATUS:: Coded
Private Fortran INCLUDE file STATE3.EXT: new COMMON GSTATE3 with data structures to use accessing/managing GEODAT3-files:
- NVERT( MXFILE3 ): number of vertices in the geometry of the cell complex.
- NEDGE( MXFILE3 ): number of edges in the geometry of the cell complex.
- NFACE( MXFILE3 ): number of faces in the geometry of the cell complex.
- NCELL( MXFILE3 ): number of 3-cells in the geometry of the cell complex.
- N2BDY( MXFILE3 ): number of elements in the face::edge boundary relation of the cell complex.
- N3BDY( MXFILE3 ): number of elements in the cell::face boundary relation of the cell complex.
- VSKEL( MXVARS3, MXFILE3 ): skeleton on which each variable "lives": values are token parameters GESKEL[0-3]
- VSTEP( MXVARS3, MXFILE3 ): per-variable time step. Should be either zero or TSTEP3D.
- GINDX( 5, MXFILE3 ): netCDF ID's for the private variables used to implement the boundary relations.
STATUS:: Coded

The following Fortran routines are affected:

Public Fortran routine DDTVAR3: support for GEODAT3 file operation.
STATUS:: Coded
Public Fortran routine DESC3: support for GEODAT3 geometry descriptions.
STATUS:: Coded
Public Fortran routine INIT3: changes in internal state-variable initializations.
STATUS:: Coded
Public Fortran routine INTERP3: support for GEODAT3 file operation.
STATUS:: Coded
Public Fortran routine READ3: support for GEODAT3 file operation.
STATUS:: Coded
Public Fortran routine WRITE3: support for GEODAT3 file operation.
STATUS:: Coded
Public Fortran routine CHECK3: support for GEODAT3 file operation.
STATUS:: Coded
Existing private Fortran routines
- CRTFIL3
- INTERP3V
- OPNFIL3
- OPNKF
- RDBUF3
- UPDTVIR3
- WRBUF3
STATUS:: Coded
New private Fortran routines
- WRGEODAT
- RDGEODAT
STATUS:: Coded

The following C bindings are affected:

New C routine geopen3c(): C wrapper around public Fortran GEOPEN3(), used to open files of type GEODAT3.
STATUS:: Coded
New C routine gedesc3c(): C wrapper around public Fortran GEDESC3() used to return GECC geometry specification (boundary relations, etc.) for files of type GEODAT3.
STATUS:: Coded
?? desc3c():
STATUS:: under way... is it proper to require a change to the call interface to desc3c(), thus breaking existing C code that depends upon it? Otherwise, we need to invent a new routine that does this task specifically for GEODAT3 files.
STATUS:: needs design decision...
Others?: The entire remainder of the C bindings should be unaffected, by virtue of properly modular design and implementation.

Extensions of existing analysis and visualization tools:

PAVE
Models-3 Vis Tool
m3stat
m3diff
m3xtract
m3tshift
m3cple
mtxcple

STATUS:: not yet started.

GECC-related geometric utility routines:
STATUS:: !! TBD -- open issue !!

GECC-related geometric tool/support programs:
STATUS:: !! TBD -- open issue !!


Send comments to

Carlie J. Coats, Jr.
carlie@jyarborough.com
