I/O API Troubleshooting

About Fortran error-status codes
gfortran version 10 and lots of spurious warning messages
CMAQ, SMOKE, and INCLUDE files.
Errors after retrofitting MODULE M3UTILIO
Instruction... not supported or Program Exception—illegal instruction issues
Seg-faults
Missing-symbol issues
Compiler/system-library compatibility issues
netCDF Version 4 issues
gfortran build issues
Warning messages with v16 or later Intel compilers
"Internal compiler error" Problems
Link errors with pgf90 and ifort on Linux
relocation error link issues for x86_64
gcc/g77 on x86_64 Problems
IRIX 7.4 Problems
netCDF Error Troubleshooting
netCDF Error Numbers list
Other Problems

Back to the I/O API User Manual

NOTE

If you run into troubles with I/O API related programs, it is useful to know the versions of all the software components. The CVS-related program ident can report to you versioning keywords in the various components of (binary) object, library, or executable files. For example, I can run the following sequence of commands on my desktop machine to find out the versioning information of various binary components:
% cd $HOME/apps/$BIN
% ident init3.o
    —reports the INIT3 version:  init3.F 87 2015-01-07 17:37:58Z coats
% ident libioapi.a
    —reports the INIT3 and M3UTILIO versions
% ident m3stat
    —reports the INIT3, M3UTILIO, and netCDF versions
    
Each I/O API source file will have its version embedded in the file's header-comment, e.g.
    !! Version "$Id: ERRORS.html 251 2023-03-28 20:44:27Z coats $"
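
On systems where ident is not installed, a rough substitute can be sketched with grep. This is only an approximation of what ident reports; the function name show_ids is made up for illustration, and it assumes a grep that supports the -a and -o options (GNU and BSD grep do).

```shell
# Print the "$Id: ... $" keywords embedded in each file given.
# show_ids is a hypothetical helper name, not part of the I/O API.
show_ids() {
    for f in "$@"; do
        printf '%s:\n' "$f"
        # -a: treat binary files as text; -o: print only the matching part
        grep -a -o '\$Id: [^$]*\$' "$f" | sed 's/^/     /'
    done
}
```

Usage would be along the lines of `show_ids init3.o libioapi.a m3stat`.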
    

About Fortran error-status codes

I/O error status numbers are compiler specific, so one needs to know what the underlying compiler is, and search for its error-code list (Google for the compiler and "Fortran runtime error codes").
Note that for CMAQ, mpif90 is actually a script "wrapping" around conventional Fortran+C compilers; this script "knows" which libraries and which include-files to use, so you are highly dependent upon the underlying compiler and need to search for its error-codes.

`gfortran` version 10 and lots of spurious warning messages

This version of gfortran takes a particularly idiosyncratic interpretation of the (latest) Fortran-2018 Standard.
As of July 12, 2020, the relevant ioapi/Makeinclude.${BIN} files have been modified to add Fortran compile-flag
-std=legacy
so that this interpretation does not cause a compile-error.
However, using this compiler version will cause the generation of a huge number of spurious warning-messages, as the compiler is still trying to enforce its version of the Fortran-2018 (not Fortran-90, not Fortran-95, not Fortran-2008) Standard.
Thanks to Mrs. Indumathi S Iyer, (SO/D), BARC, for pointing out this compiler-problem and help with testing the fix.—CJC

Back to "Troubleshooting" Contents

Errors after retrofitting `MODULE M3UTILIO`

Since MODULE M3UTILIO itself INCLUDEs the standard I/O include-files and also has INTERFACE-blocks for (almost all of) the public I/O API functions, when you retrofit USE M3UTILIO into an old code, you must remove these INCLUDE-statements and declarations and EXTERNAL statements for the public I/O API functions. If you miss some of these, you may see compile errors like the following:
...
/home/coats/ioapi-3.2/ioapi/PARMS3.EXT(66): error #6401: The attributes of this name conflict with those made accessible by a USE statement.   [NAMLEN3]
...
/home/coats/ioapi-3.2/m3tools/m3tproc.f90(102): error #6401: The attributes of this name conflict with those made accessible by a USE statement.   [GETNUM]
...
    
or
...
Error: Symbol 'getnum' at (1) conflicts with symbol from module 'm3utilio', use-associated at (2)
...
    
or...
To fix these errors, remove the corresponding INCLUDE-statements, function-declarations, and EXTERNAL statements.

Back to "Troubleshooting" Contents

"Instruction... not supported" and "Program Exception—illegal instruction" issues

Thanks to Christopher G. Nolte, Ph.D., US EPA Office of Research and Development for his M3USER mailing-list comments on this one.
Problem: at run-time, messages like
Please verify that both the operating system and the processor support Intel® X87, CMOV, MMX, FXSAVE, SSE, SSE2, SSE3, SSSE3, SSE4_1, SSE4_2, MOVBE, POPCNT, F16C, AVX, FMA, BMI, LZCNT and AVX2 instructions.
or
Program Exception - illegal instruction

This is probably the result of compiling either the library or the model (or both) for a different processor-model than you are running it on.
Starting with the Pentium II processor (1997), successive generations of Intel processors have introduced more and more powerful vector-style instructions (MMX, SSE, SSE2, SSE3, SSE4.1, SSE4.2, AVX, AVX2, AVX-512, ...) that can substantially speed up array-style calculations (including, particularly, the I/O API INTERP3()). Note that each processor generation does support all the previous generations of instructions (but not, of course, vice versa).
Well designed modeling codes will get approximately a 20-25% performance boost for using SSE4.2 instructions, a further 70-80% boost for AVX, and a further 25-30% for AVX2. Because of its sloppy coding, WRF will get less than half that much speedup, and CMAQ even less than that (due to the fact that these codes are so bottlenecked by main-memory operations that improving the arithmetic doesn't help much). Note that Intel and AMD have also improved the memory systems of the various processor generations, giving a further 5-10% performance boost per processor generation for that reason (independent of which instruction set you're using). In fact, the degree of speedup is a good measure of how well array based calculations are coded: good CFD applications will typically get an AVX speedup factor of about 1.8, whereas the (more poorly-coded) WRF gets only about 1.3 (which can be improved substantially by re-coding the advection and diffusion routines to be less memory-system-hostile).
See https://en.wikipedia.org/wiki/Streaming_SIMD_Extensions and https://en.wikipedia.org/wiki/Advanced_Vector_Extensions for more information about the SSE and AVX families of new instructions.
On a Linux system, you can see what instructions are supported by running the following at the command line
cat /proc/cpuinfo
and then looking at the flags sections for the instructions listed below.
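
Scanning the flags line by eye is error-prone, so the check can be scripted. The following is a minimal sketch, assuming the usual Linux /proc/cpuinfo layout with a line beginning "flags" followed by lower-case names such as sse4_2, avx, and avx2; the function name has_cpu_flag is hypothetical.

```shell
# has_cpu_flag FLAG [CPUINFO_FILE]
# Succeeds (exit status 0) if FLAG appears on the first "flags" line.
# CPUINFO_FILE defaults to /proc/cpuinfo; the helper name is illustrative.
has_cpu_flag() {
    awk -v want="$1" '
        /^flags[ \t]*:/ {
            for (i = 2; i <= NF; i++) if ($i == want) found = 1
            exit            # all cores normally list the same flags
        }
        END { exit (found ? 0 : 1) }
    ' "${2:-/proc/cpuinfo}"
}
```

For example, `has_cpu_flag avx2 && echo "AVX2 available"` on the machine where the model will actually run.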
Use of these instructions is typically governed by command-line directives given to the compiler; different compilers use different flags to govern this, and have different defaults. GNU and Intel compilers typically default to SSE3; PGI compilers typically default to the instruction set for the processor on which the compiler itself is being run. See your compiler's documentation on how to control this. Some examples are:

Intel ifort/icc:
-x... directives:
-xHost: Use all the instructions for this machine
-xSSE4.2: Nehalem or later
-xAVX: SandyBridge or later
-xAVX2: Haswell or later
-xCORE-AVX512: Skylake-X or later

GNU gfortran/gcc
-march=... -mtune=... directives: the first of these governs instruction set use; the second controls how the optimizer uses it
-march=native -mtune=native: this machine's architecture
-march=corei7 -mtune=corei7: Nehalem or later (SSE4.2)
-march=corei7-avx -mtune=corei7-avx: SandyBridge or later (AVX)
-march=core-avx2 -mtune=core-avx2: Haswell or later (AVX2)

Portland Group pgf90/pgcc
Default is this machine's architecture (dangerous if you have multiple different-generation machines!)
-tp=nehalem: Nehalem or later (SSE4.2)
-tp=sandybridge: SandyBridge or later (AVX)
-tp=haswell: Haswell or later (AVX2)

Recent Intel processors and their instruction sets

Nehalem (2008)
MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2
Sandy Bridge (2011)
MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX
Ivy Bridge (2012)
MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX
Haswell (2013)
MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2, FMA3
Broadwell (2015)
MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2, FMA3
(XEON server-processor) Skylake (2015)
MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2, FMA3
(XEON server-processor) Skylake-X (2017)
MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2, AVX-512, FMA3
Kaby Lake (2017)
MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2, FMA3
Coffee Lake (2018)
MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2, FMA3

Back to "Troubleshooting" Contents

"Segmentation-Fault" Issues with `m3tools` programs

Generally, you may need to run these programs with
    limit stacksize unlimited
    
since they allocate scratch-variables "off the stack" (as is the usual/recommended practice in Fortran-90). You may possibly also need
    limit memoryuse unlimited
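
The limit commands above are csh/tcsh syntax. For sh-family shells (sh, bash, ksh) the corresponding built-in is ulimit. The sketch below raises the stack limit only for one command, falling back to the hard limit where "unlimited" is not permitted; the function name run_with_big_stack is made up for illustration.

```shell
# Run a command with the largest stack limit the system will allow.
# The parentheses make the function body a subshell, so the caller's
# own limits are left unchanged.
run_with_big_stack() (
    ulimit -s unlimited 2>/dev/null \
        || ulimit -s "$(ulimit -H -s)" 2>/dev/null \
        || true
    "$@"
)
```

A typical call would be `run_with_big_stack m3stat`.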
    

Back to "Troubleshooting" Contents

"Missing Symbol" issues

The system-utility nm is very useful for this category of problems; it can be used to list all of the symbols in a library .a, object .o or executable. When they contain machine-code for a routine, it will show up with its linker-name (as opposed to source-code name), with a U for each use and a T for the routine's definition. For example, to find out about OPEN3 in libioapi.a:
    nm $io/../$BIN/libioapi.a | grep -i open3
                 U open3_
                 U open3_
                 U open3_
                 U open3_
                 U open3_
open3.o:
0000000000000000 d open3.firstime_
0000000000000000 T open3_
open3c.o:
                 U open3_
0000000000000000 T open3c
    
says that OPEN3 is defined in open3.o and used 6 other times (including in open3c.o).
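
Counting the U and T lines by hand gets tedious for large libraries. A small awk filter can do the tally; this is a sketch only, the name nm_summary is hypothetical, and it assumes the default nm output format shown above (symbol name in the last column, type letter in the column before it).

```shell
# Summarize nm output (read from standard input) for one linker symbol:
# how many undefined references (U) and how many definitions (T).
nm_summary() {
    awk -v sym="$1" '
        NF >= 2 && $NF == sym && $(NF-1) == "U" { uses++ }
        NF >= 2 && $NF == sym && $(NF-1) == "T" { defs++ }
        END { printf "%s: %d use(s), %d definition(s)\n", sym, uses, defs }
    '
}
```

It would be used as `nm libioapi.a | nm_summary open3_`. Zero definitions for a symbol your program needs is the signature of a missing-symbol link error.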
Generally, missing symbols with kmp, omp or openmp as parts of their name indicate that you may have compiled the I/O API with OpenMP parallelism enabled, but are not linking your program accordingly. Look in the relevant ioapi/Makeinclude.$BIN for the make-variables OMPFLAGS and OMPLIBS to see what you need to add to your program's Makefile.
Many other missing symbols (especially with nf_ or nc in them) are related to netCDF-library issues (and to the libraries which netCDF assumes); see the section on netCDF Version 4 issues.

Back to "Troubleshooting" Contents

Compiler/system-library compatibility issues

In general, you are best off if you can build the whole modeling system (libnetcdf.a, libpvm3.a, libioapi.a, and your model(s) CMAQ, SMOKE, etc.) with a common compiler set and common set of compile-flags. When this is not done, there are a number of compatibility issues with mixed compiler sets, and with the GNU 3.x-4.x compiler set these get worse. Some of these problems show up at link time; others at run-time. In particular, the following are known to have problems:

Linux-distribution-supplied libnetcdf.a rarely works with CMAS-supported compiler sets. It is best to build your own netCDF library, with the same compilers and compiler-flags as your libioapi.a and models.

NetCDF Versions 4.x have lots of changes; see this note about it in the build instructions.

Compiler-version to compiler-version library troubles. These are known to happen in particular between versions for the Sun and Intel compiler sets. It is likely an issue with mixed GNU 3.x and 4.x systems as well.

Link errors with pgf90 and ifort on Linux (below)
The following are not relevant for I/O API-3.0 or later, since Fortran-77 support has been dropped:

Builds with mixed g77, g95, and/or gfortran: these seem to link correctly, but Fortran I/O gets messed up because they use different unit-number systems behind the scenes and give you troubles. Thanks to Erick Jones, BSI, for this one.

Builds with mixed f77 f90 on various systems including Sun and SGI (troubles similar to the above...)

Back to "Troubleshooting" Contents

Warning messages with recent Intel compilers

Starting with their Version 16 compilers, Intel has introduced a new compiler directive -qopenmp to enable OpenMP, and has deprecated the previous -openmp. This previous-version flag now results in a "deprecated flag" warning from the compiler. Changing the Makeinclude.*ifort* files to use -qopenmp eliminates this warning for Version 16 and later compilers, at the cost of making those Makeinclude files (and any Makefiles that use them) incompatible with Intel Version 15 and earlier.
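
If one build script has to serve both older and newer Intel compilers, the flag can be chosen from the compiler's major version number. The sketch below takes the version string as an argument (how you obtain it from your compiler is left to you); the function name intel_omp_flag is hypothetical.

```shell
# Print the OpenMP flag appropriate for an Intel compiler version,
# e.g. "15.0.3" or "16".  Only the major version number is examined.
intel_omp_flag() {
    major=${1%%.*}                      # strip everything after the first "."
    if [ "$major" -ge 16 ]; then
        printf '%s\n' '-qopenmp'
    else
        printf '%s\n' '-openmp'
    fi
}
```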

Back to "Troubleshooting" Contents

"Internal compiler error" Problems

At least some versions of the Intel compilers icc and ifort cannot handle the internal complexity of some routines (usually iobin3.c) when compiling with full optimization: one will see error messages like the following when running make for the I/O API (where I've used backslashes to fold the compile-line to make it readable):
cd /nas01/depts/ie/cempd/apps/CMAQ/v5.1/Linux2_x86_64ifortopenmpi;     \
  icc -c -DIOAPI_PNCF=1 -DAUTO_ARRAYS=1 -DF90=1 -DFLDMN=1 -DFSTR_L=int \
  -DIOAPI_NO_STDOUT=1 -DAVOID_FLUSH=1 -DBIT32=1  -O3 -traceback -xHost \
  -DVERSION='3.2-nocpl' /nas01/depts/ie/cempd/apps/CMAQ/v5.1/ioapi/iobin3.c
/nas01/depts/ie/cempd/apps/CMAQ/v5.1/ioapi/iobin3.c(1111) (col. 29): internal error: 0_1529

compilation aborted for /nas01/depts/ie/cempd/apps/CMAQ/v5.1/ioapi/iobin3.c (code 4)
make: *** [iobin3.o] Error 4
    
A measure that generally works is to re-do the last compile-command manually, but with a lower optimization, and then re-do the make. It is useful to cut-and-paste the last command into a sub-shell (enclosing the command by parentheses), with the "-O3" eliminated or reduced to "-O", as in the following example:
( cd /nas01/depts/ie/cempd/apps/CMAQ/v5.1/Linux2_x86_64ifortopenmpi;   \
  icc -c -DIOAPI_PNCF=1 -DAUTO_ARRAYS=1 -DF90=1 -DFLDMN=1 -DFSTR_L=int \
  -DIOAPI_NO_STDOUT=1 -DAVOID_FLUSH=1 -DBIT32=1 -traceback -xHost      \
  -DVERSION='3.2-nocpl' /nas01/depts/ie/cempd/apps/CMAQ/v5.1/ioapi/iobin3.c )
! make
    
This trick is also useful when trying to do highly-optimized builds of other models that contain large, complex routines (WRF, CMAQ, ...)
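
Editing a long compile line by hand invites typos, so the optimization flag can be rewritten mechanically. This is a minimal sketch, assuming a sed that accepts -E (GNU and BSD sed do); the function name lower_opt is made up, and it handles only the -O2/-O3 spellings.

```shell
# Rewrite a compile command so that -O2 or -O3 becomes a lower setting.
#   $1: the full compile command, as one string
#   $2: replacement flag (defaults to -O, as suggested in the text)
lower_opt() {
    printf '%s\n' "$1" | sed -E "s/(^| )-O[23]( |\$)/\1${2:--O}\2/g"
}
```

The output can then be pasted into a sub-shell as in the example above, followed by re-running make.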

Back to "Troubleshooting" Contents

Link errors with `pgf90` and `ifort` on `Linux`

General Principle: various Fortran compilers "mangle" subroutine names (etc.) for the linker in various ways.
Note: 64-bit mode under Linux adds further issues.
Note added 2/24/2009: Aparna Vemuri of EPRI reports troubles with recent gcc compiler systems and Portland Group pgf90: the gcc Fortran name mangling system has changed, requiring a change in compile flags. For mixed pgf90/gcc builds, one can either remove the -Msecond_underscore flag from FOPTFLAGS in the Makeinclude.Linux2_x86_64pg_gcc* or else change the line CC = pgcc to CC = gcc in the Makeinclude.Linux2_x86_64pg_pgcc* files. These modifications have been made to the 2/24/2009 release of the Makeinclude.Linux2_x86_64pg_gcc*, with the older flags commented out, for use by those who need them.
In particular, Gnu Fortrans (g77 and g95) have different name mangling behavior than is the default with Portland Group pgf90. Vendor supplied NetCDF libraries libnetcdf.a always use the Gnu Fortran conventions, and as such are incompatible with the default compilation flags for SMOKE or CMAQ. For the Linux/Portland Group/SMOKE or CMAQ combination, you have two choices:

Use the vendor supplied libnetcdf.a and default I/O API build, but fix the SMOKE or CMAQ compile flags, using ioapi/Makeinclude.Linux2_x86pg_gcc* as your guide; or
Build libnetcdf.a from scratch for yourself, using compile flags compatible with your SMOKE or CMAQ build; build the I/O API using ioapi/Makeinclude.Linux2_x86pg_pgcc*; and use these libraries.

This Portland Group inconsistency is exactly why the I/O API is supplied with multiple /Makeinclude.Linux2_x86pg* files in the first place... Note that the I/O API supplies a script nm_test.csh and a make target
make nametest
to help you identify such problems.
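
For a quick manual check of which convention a library uses, one can look at how a routine whose name contains an underscore (such as nf_open) was mangled: the g77-style "second underscore" convention yields nf_open__, while a single trailing underscore yields nf_open_. The sketch below is not the nm_test.csh script; it is an independent illustration with a hypothetical function name, reading nm output from standard input.

```shell
# Report the trailing-underscore convention used for a Fortran-callable
# routine, judged from where it is defined (type T) in nm output.
mangle_style() {
    awk -v base="$1" '
        NF >= 2 && $(NF-1) == "T" && $NF == (base "__") { style = "double-underscore" }
        NF >= 2 && $(NF-1) == "T" && $NF == (base "_") && style == "" { style = "single-underscore" }
        END { print (style == "" ? "not-found" : style) }
    '
}
```

For example, `nm libnetcdf.a | mangle_style nf_open`; the result should match the convention your Fortran compile flags produce.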

Back to "Troubleshooting" Contents

`gcc/g77` on `x86_64` problems

Added 4/4/2005
Internal compiler errors have shown up with gcc/g77 on some Linux distributions for x86_64, particularly with Fedora Core 3 and Red Hat Enterprise Linux Version 3 for x86_64: the symptom is a sequence of messages such as the following:
error: unable to find a register to spill in
class `AREG'
/work/IOAPI/ioapi/currec.f:93: error: this is the insn:
(insn:HI 145 171 170 8 (parallel [
            (set (reg:SI 3 bx [95])
                (div:SI (reg/v:SI 43 r14 [orig:67 secs ] [67])
                    (reg/v:SI 2 cx [orig:68 step ] [68])))
            (set (reg:SI 1 dx [96])
                (mod:SI (reg/v:SI 43 r14 [orig:67 secs ] [67])
                    (reg/v:SI 2 cx [orig:68 step ] [68])))
            (clobber (reg:CC 17 flags))
        ]) 264 {*divmodsi4_cltd} (insn_list:REG_DEP_ANTI 92
(insn_list:REG_DEP_OUTPUT 91 (insn_list 140 (insn_list 84
(insn_list:REG_DEP_ANTI 139 (nil))))))
    (expr_list:REG_DEAD (reg/v:SI 43 r14 [orig:67 secs ] [67])
        (expr_list:REG_UNUSED (reg:CC 17 flags)
            (expr_list:REG_UNUSED (reg:SI 1 dx [96])
                (nil)))))
...confused by earlier errors, bailing out
    
A workaround is to weaken architecture/optimization flags for binary type Linux2_x86_64 as described above to get around this compiler bug -- eliminating the -fschedule-insns and -march=opteron optimization flags from "Makeinclude.Linux2_x86_64" will tend to get rid of the problem. Note that this same compiler bug will bite you when trying to build lots of other stuff (TCL/TK, plplot, NCAR graphics), on FC3/gcc/g77 systems, and the same fix seems to work for many other problems as well.

Back to "Troubleshooting" Contents

`relocation error` link issues for `x86_64` Linux

If sizes of individual arrays or of COMMON blocks exceed 2GB on the x86_64 platforms, Intel ifort and icc will give you failures, with messages about relocation errors at link-time. The problem is that the default "memory model" doesn't support huge arrays and huge code-sets properly. The "medium" memory model supports huge arrays, and the "large" memory model supports both huge arrays and huge code-sets. To get around this, you will need to add
-mcmodel=medium -shared-intel
to your compile and link flags (for the medium model), and then recompile everything including libioapi.a and libnetcdf.a using these flags. Note that this generates a new binary type that should not be mixed with the default-model binaries. There is a new binary type BIN=Linux2_x86_64ifort_medium for this binary type, and there is a sample Makeinclude file for it, to demonstrate these flags:
Makeinclude.Linux2_x86_64ifort_medium

Other compilers and other non-Linux x86_64 platforms will have similar problems, but the solutions are compiler specific.
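
To judge in advance whether an array will cross the threshold, multiply out its dimensions. The sketch below assumes that "2GB" means 2^31 bytes and that the shell has 64-bit integer arithmetic; both function names and the dimension arguments are illustrative.

```shell
# array_bytes NCOLS NROWS NLAYS NVARS [BYTES_PER_ELEMENT]
# Size in bytes of a 4-D array; element size defaults to 4 (REAL*4).
array_bytes() {
    printf '%s\n' "$(( $1 * $2 * $3 * $4 * ${5:-4} ))"
}

# Succeeds if the given byte count exceeds 2^31 (assumed threshold).
exceeds_2gb() {
    [ "$1" -gt 2147483648 ]
}
```

For example, a 1000 x 1000 x 100 x 10 REAL array comes to 4000000000 bytes, which would call for the medium memory model.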

Back to "Troubleshooting" Contents

IRIX 7.4 Problems

Added 12/18/2003
SGI F90 compiler-flag problems: It seems that SGI version 7.4 and later Fortran compilers demand a different set of TARG flags than do 7.3.x and before. For example, for an Origin 3800, where hinv reports
24 400 MHZ IP35 Processors CPU: MIPS R12000 Processor Chip Revision: 3.5 ...
one would use the following sets of ARCHFLAGS compiler flags in Makeinclude.${BIN} with the different Fortran-90 compiler versions:

-TARG:platform=ip35,processor=r12000 for 7.3.x and before
-TARG:platform=ip35 -TARG:processor=r12000 for 7.4 and later

There are a number of problems with both the I/O API and netCDF with the newer (version 7.4) SGI compilers:
Added 12/18/2003
SGI claims to have fixed this in the latest patch for F90 version 7.4.1 (bug # 895393); I haven't had time to test it yet, though. -- CJC

NetCDF and IRIX 7.4 compilers:
Experience indicates that the IRIX 7.4 compilers will not correctly build the netCDF library used by the I/O API. Although the make seems to succeed on that platform, make test fails almost immediately; attempts to use the libnetcdf.a that was built will also lead to program crashes.
At present, the only workaround we have is to use a libnetcdf.a built using IRIX 7.3 or earlier compilers.

I/O API and IRIX 7.4 f90:
The IRIX 7.4 f90 compiler refuses to recognize industry-standard practice for linking BLOCK DATA subprograms from libraries. For the upcoming I/O API Version 3, we have put into place a workaround-hack that puts a conditionally-compiled non-Fortran-conforming SGI-only
CALL INITBLK3
at the start of subroutine INIT3.
The IRIX 7.4 f90 compiler also thoroughly mangles the buffering of log-output in ways that we have not yet managed to decipher completely, much less repair. The outcome is that log output will show up in scrambled order. (Note that industry-standard mapping of WRITE(*,...) onto unbuffered UNIX standard output still happens with version 7.3 and must be preserved, but fails with version 7.4.)

Back to "Troubleshooting" Contents

NetCDF Error Troubleshooting

Multiply defined symbol nf_get_var_int64_ (etc.) errors on program builds:
Some configurations of netCDF-4 support INTEGER*8 (64-bit integer) variables, and some don't. I/O API-3.2 and later attempt to support these when they are available, and have to provide "hacks" when they're not. To detect netCDF-4 INTEGER*8 support:
nm libnetcdff.a | grep nf_get_var_int64_
If this turns up a result, then you need to add the definition -DIOAPI_NCF4=1 to the make-variable ARCHFLAGS in your Makeinclude.${BIN}. Otherwise, you will get "multiply defined symbol" errors when you attempt to compile programs.
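
The test and its consequence can be combined in one small filter. This is a sketch with a hypothetical function name; it reads nm output from standard input and prints the definition to add only when the INTEGER*8 routine is present.

```shell
# Print -DIOAPI_NCF4=1 if the nm output on standard input mentions
# nf_get_var_int64_; print nothing otherwise.
ncf4_flag() {
    if grep -q 'nf_get_var_int64_'; then
        printf '%s\n' '-DIOAPI_NCF4=1'
    fi
}
```

Usage would be `nm libnetcdff.a | ncf4_flag`, with any output appended to ARCHFLAGS by hand.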

NetCDF Error Troubleshooting Generalities:
All the netCDF "magic numbers" are defined in the I/O API NETCDF.EXT file (which is the I/O API name for the file netCDF calls src/fortran/netcdf.inc) and also (for I/O API-3.2) in the modncfio.F90: look for parameters nf_noerr, etc. Errors defined in netCDF 2.x have positive values in the range 1...32 (except for NCSYSERR which is -1); errors newly defined for netCDF 3.x are in the range -60...-1. General methodology: find the error-number and then try to figure out what's wrong from the name of the corresponding PARAMETER.
Note that UCAR re-defined some of these errors between versions 3.3.1 and 3.4 of netCDF (while leaving the various library versions link-compatible), so you may have to look at the src/fortran/netcdf.inc for the version of the netCDF libnetcdf.a you are linking with, if this is different from the version used to build your libioapi.a
Martin Otte, US EPA, reports that there are similar errors encountered with netCDF Version 4, due to more stringent interpretation of flags for opening or creating files. This is fixed in the Oct. 28 I/O API distribution.

I get "netCDF error -1"
This is NCSYSERR, meaning the system wouldn't give you permission for what you wanted to do. Most probably it means you need to check permissions on either the file you're trying to create or access, or on the directories in its directory path.

I get "netCDF error 2"
"Not a netcdf id", which can happen both if the file honestly isn't a netCDF file, and also if it is a netCDF file, but wasn't shut correctly. (Unless you've declared a file "volatile" by setenv <file> <path> -v, netCDF doesn't update the file header until you call SHUT3() or M3EXIT().)

I get "netCDF error 4"
"Invalid Argument", but almost certainly this means you're using netCDF library 2.x with an I/O API library built for netCDF version 3.x (NCAR accidentally changed one of the "magic numbers" used in opening files when they upgraded netCDF from 2.x to 3.x).

I get "netCDF error -31"
This is a variant of the system permission problem. A directory spec with an extra nonexistent component, e.g., /foo/bar/qux/zorp when you really mean /foo/bar/zorp and the /foo/bar/qux doesn't exist, seems to cause Error -31. Can also happen by trying to open too many netCDF files simultaneously (although the I/O API has additional traps around this).
Or on a Cray vector machine, this may mean you're running up against your memory limit. (On Crays, netCDF v3.x dynamically allocates a fairly large buffer to optimize I/O for each file; this allocation may well push you over your (interactive or queue) memory limit. For netCDF v3.4, there are tricks you can play with environment variables to manipulate these buffer sizes.) This error also has turned up with some of the more obscure file-permission problems.
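
For the nonexistent-directory case, a short check can point at the offending path component before the model is run. This is a minimal sketch; the function name first_missing_dir is hypothetical, and it looks only at directory existence, not at permissions.

```shell
# Print the outermost directory in a file's path that does not exist.
# Prints an empty line when every directory in the path exists.
first_missing_dir() {
    dir=$(dirname "$1")
    missing=""
    while [ ! -d "$dir" ]; do
        missing=$dir
        dir=$(dirname "$dir")       # climb one level; "/" or "." always exists
    done
    printf '%s\n' "$missing"
}
```

With the example above, `first_missing_dir /foo/bar/qux/zorp` would print /foo/bar/qux if /foo/bar exists but /foo/bar/qux does not.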

I get "netCDF error -40"
Probably means you tried to read data past the last date-and-time on the file. (The I/O API runs netCDF in "verbose mode", so that netCDF will always print all error messages, including this one.) Also can happen when the calling program is running in parallel, but a non-MP-enabled version of the I/O API library was linked in.

List of netCDF errors, with attempted annotations

ncnoerr = nf_noerr = 0: no error has been detected at this time.
ncenfile = nc_syserr = -31: see above
ncebadid = nf_ebadid = -33: not a netcdf ID (might indicate a bug in I/O API internals, or attempt to use a coupling-mode virtual file in a program linked to an I/O API library without coupling-mode enabled)
nceexist = nf_eexist = -35: attempting to create a new file when the file already exists (from OPEN3() with status argument FSNEW3)
nceinval = nf_einval = -36: invalid argument (see above about "incompatible netCDF and I/O API versions")
nceperm = nf_eperm = -37: attempted write to a read only file
nf_enotindefine = -38: operation not allowed in data mode (would indicate a bug in I/O API internals)
nceindef = nf_eindefine = -39: operation not allowed in define mode (would indicate a bug in I/O API internals)
ncecoord = nf_einvalcoords = -40: coordinates out of range -- probably, attempt to read past the last date-and-time on the file. Can also be caused by running a program in parallel with the non-MP-enabled version of the I/O API library. (Otherwise, would indicate a bug in I/O API internals)
ncemaxds = nf_emaxdims = -41: maxncdims exceeded (would indicate a bug in I/O API internals)
ncename = nf_enameinuse = -42: string match to name in use: indicates that you're trying to have two different variables with the same name when creating a file
ncenoatt = nf_enotatt = -43: attribute not found: would indicate that a file is not a correct I/O API file, because it is missing some of the required FDESC3 header-components
ncemaxat = nf_emaxatts = -44: maxncattrs exceeded (would indicate a bug in I/O API internals)
ncebadty = nf_ebadtype = -45: not a netcdf data type: you are trying to create a file for which some value of VGTYP3D(<variable>) in FDESC3 is not one of M3INT, M3REAL, or M3DBLE
ncebadd = nf_ebaddim = -46: invalid dimension ID (would indicate a bug in I/O API internals)
nceunlim = nf_eunlimpos = -47: ncunlimited in the wrong index: Could be caused by incorrectly-set (or un-set) grid dimensions NCOLS3D, NROWS3D, NLAYS3D, or NTHIK3D (else would indicate a bug in I/O API internals).
ncemaxvs = nf_emaxvars = -48: maxncvars exceeded (would indicate a bug in I/O API internals--probably means somebody changed INCLUDE-file PARMS3.EXT inappropriately for the target machine.)
ncenotvr = nf_enotvar = -49: variable not found (attempt to read or write a variable not actually in the file; would indicate a bug in I/O API internals)
nceglob = nf_eglobal = -50: action prohibited on ncglobal varid (would indicate a bug in I/O API internals)
ncenotnc = nf_enotnc = -51: not a netcdf file: File not recognized as a netCDF file (possibly empty; possibly not closed properly, e.g., no SHUT3() or M3EXIT(); possibly generated by a program that uses HDF-enabled netCDF but being read by a program with (the recommended) HDF-disabled netCDF).
ncests = nf_ests = -52: In Fortran, string too short (shouldn't happen with I/O API)
ncentool = nf_emaxname = -53: variable-name or attribute-name too long (would indicate a bug in I/O API internals)
nf_eunlimit = -54: something went wrong with the time dimension in a file; might indicate a bug in I/O API internals
nf_enorecvars = -55: attempting to time-step a time-independent file; would indicate a bug in I/O API internals
nf_echar = -56: Attempt to convert between text and numbers (would indicate a bug in I/O API internals)
nf_eedge = -57: subscript out-of-bounds error (would indicate a bug in I/O API internals)
nf_estride = -58: illegal stride (won't happen with I/O API)
nf_ebadname = -59: variable name contains illegal characters
nf_erange = -60: math result not representable (could not convert from native machine floating-point format to XDR/IEEE floating-point format; should be Cray PVP-only)
NF_ENOMEM = -61: internal netCDF memory allocation failure
NF_EVARSIZE = -62: Illegal variable-size: one or more variable sizes violate format constraints (possibly negative or zero)
NF_EDIMSIZE = -63: Invalid dimension-size (possibly negative or zero)
NF_ETRUNC = -64: File likely truncated or possibly corrupted
NCFOOBAR = 32 (netCDF-3): Something is messed up, and netCDF doesn't have an error number for it, or doesn't understand how/why the messup happened
other errors: should be OS errors, as defined in the system's /usr/include/sys/errno.h
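
When scanning many log files, a small lookup helper saves paging through the list. The sketch below covers only a handful of the codes, copied from the list above; the function name ncerr_name is hypothetical, and it is no substitute for the netcdf.inc that matches your library.

```shell
# Translate a few common netCDF error numbers (from the list above)
# into their parameter names and short meanings.
ncerr_name() {
    case "$1" in
        0)   echo "nf_noerr: no error" ;;
        -33) echo "nf_ebadid: not a netcdf ID" ;;
        -35) echo "nf_eexist: file already exists" ;;
        -36) echo "nf_einval: invalid argument" ;;
        -37) echo "nf_eperm: attempted write to a read only file" ;;
        -40) echo "nf_einvalcoords: coordinates out of range" ;;
        -42) echo "nf_enameinuse: name already in use" ;;
        -43) echo "nf_enotatt: attribute not found" ;;
        -49) echo "nf_enotvar: variable not found" ;;
        -51) echo "nf_enotnc: not a netcdf file" ;;
        *)   echo "not in this table: see the full list or errno.h" ;;
    esac
}
```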

Back to "Troubleshooting" Contents

Other Problems

What's this about notCDF?
This is only relevant for users at NCEP, where local politics forbids any copy of libnetcdf.a on their systems. libnotcdf.a is a library that satisfies linker references to libnetcdf.a with "stub" routines that merely report that the user is trying to use NCEP-forbidden netCDF file mode instead of NCEP-required native-binary file mode.

Why does something with log output, netCDF crashes, netCDF failures, etc. happen with the SGI Version 7.4 compilers?
See above.

Why does the I/O API "hang" inside env*() calls on my Linux box, using the Portland compilers?
Analysis due to Robert Elleman, Dept of Atmospheric Sciences, University of Washington: When programs are compiled with the Portland compilers, without the -mp flag (as is the default for mcip) but the I/O API is compiled with this flag (as is the I/O API default), the program will hang (i.e., appear to freeze, consuming all available computational resources but making no evident progress).
Solution: either use the -mp compile flag for all compiles -- both program and library, or use it for neither.
General principle: Make sure the program compile-flags and the I/O API compile-flags (and the netCDF compile-flags!) are consistent!

"Why do I have trouble with my LOGFILE on my SGI?"
There is a problem with SGI f90 Version 7.4 and initialization of COMMON blocks. The Fortran language standard specifies that COMMON blocks must be initialized by BLOCK DATA subprograms, but (since the actual operations of compiling and linking are not covered by the language standard, which considers them "implementation details") does not specify just how to ensure that the BLOCK DATA subprogram is linked in with the rest of the executable. Usual and customary industry practice is that the use of a statement
EXTERNAL FOOBLOCK
in either the main program or in other subroutines that are called should ensure that BLOCK DATA FOOBLOCK is linked into the final executable. This does not happen with SGI f90 Version 7.4, even in very simple test cases. Note that BLOCK DATA INITBLK3 is needed to initialize I/O API internal data structures, including the unit number for LOGFILE and the number of I/O API files currently open; fortuitously, the latter seems to be initialized to zero (which is correct); the former is not initialized correctly, leading to failures to open and use a LOGFILE when you try to specify one.
Note that this error does not seem to happen with SGI f90 Version 7.3 or earlier. I have submitted this problem to SGI in an error report. Their reply is to suggest the use of non-standard CALL DATA INITBLK3, which would need to be done by every internal I/O API routine that references the STATE3 internal data structures.
--CJC

"Why do I get messages about unresolved symbols with names like __mp_getlock, __mp_unlock, or something else with _mp or _kmp in it?"
This probably means that you are using a version of the libioapi.a that is enabled for OpenMP parallel usage, but have not activated the system parallel libraries in your model's build procedure. For Intel compilers this means that you need to add -openmp (for compiler-version 15 or earlier) or -qopenmp (for compiler-version 16 or later); for GNU compilers, -fopenmp, and for PGI compilers, -mp. See the variable OMPLIBS defined in your machine/compiler's Makeinclude.${BIN}.
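
To confirm that the unresolved symbols really are OpenMP-related, the undefined entries in nm output can be filtered by name. The pattern below is a coarse heuristic of my own (it can give false positives for unrelated names ending in "omp_"), and the function name is hypothetical.

```shell
# List undefined (U) symbols in nm output whose names look like they
# belong to an OpenMP runtime (_mp_, kmp, GOMP, omp_).
openmp_undefined() {
    awk 'NF >= 2 && $(NF-1) == "U" && $NF ~ /_mp_|kmp|GOMP|omp_/ { print $NF }' | sort -u
}
```

For example, `nm libioapi.a | openmp_undefined`; any output suggests the library was built with OpenMP enabled and that OMPLIBS must be added to your link line.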

"Why are my program log and my Fortran-style files missing or screwed up? And where did these fort.<nn> files come from?"
On some systems (notably Sun and SGI), there are incompatibilities in run-time libraries between f77 and f90. The upshot is that on these systems, you can link together Fortran-77 and C using f77, or Fortran-90 and C using f90, but you can't link together Fortran-77 and Fortran-90. The default I/O API distribution for I/O API-3.0 or later is built using f90 and runs into this problem when your model code is built using f77. The solution is to rebuild the model code using f90.

Problems with RedHat 7.0 Linux (thanks to Zion Wang for chasing this down):
RH7 uses quite-nonstandard gcc v2.96 and glibc versions; there are patches available at URL http://www.redhat.com/support/errata/rh7-errata-bugfixes.html
RH7's gcc v2.96 does not work with the standard edition of the Portland Group F90 compiler; there is a version which does work; see URL http://www.pgroup.com/faq.htm: (UPDATE on: RED HAT 7.0 and 3.2 RELEASE COMPILERS!)

"My program does a segmentation fault on the OPEN3 call when I attempt to create a new file!"
Probably the file description was not completely filled in. This has been observed, for example, when one of the variable names VNAME3D(I) in FDESC3.EXT was not set correctly. (What actually happens is that the linker initializes the FDESC3.EXT data structures to zero, and the netCDF internals then do not correctly handle strings that contain only ASCII zeros.)

"My program wrote the data out but I can't read/ncdump/PAVE it now!"
Probably the file wasn't shut correctly. (Unless you've declared a file "volatile" by setenv <file> <path> -v, netCDF doesn't update the file header until you call SHUT3() or M3EXIT().)

"The log says OPEN3() could not open the file, and specifies the logical name rather than the physical file name."
This usually means one of two things:

The program is opening the file with mode FSNEW3, which means that the file must not exist (and will be created anew by OPEN3()), but the file actually does exist.
Delete the file and re-run.
The script that ran your program failed to correctly execute the setenv that defines the logical name for the file. To start debugging the script, run the env command in it just before the program runs, and then check the value of the problem file's logical name.
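Both causes above can be checked before the program runs. The following is a minimal sketch, assuming a POSIX sh wrapper script (not csh) and a system that provides printenv; the function name check_logical and its messages and return codes are hypothetical.

```shell
# Hypothetical pre-flight check for one I/O API logical file name.
#   $1 = logical name (an environment variable name)
#   $2 = "new" if the program opens the file with FSNEW3 (optional)
# Returns 0 if OK, 1 if the name is not set, 2 if an FSNEW3 file already exists.
check_logical() {
    name=$1
    mode=${2:-any}
    # printenv only sees exported variables, which is what the program will see
    if ! path=$(printenv "$name") || [ -z "$path" ]; then
        echo "UNSET: logical name $name is not defined in the environment"
        return 1
    fi
    # Drop a trailing " -v" (the "volatile" marker) to get the physical path
    path=${path% -v}
    if [ "$mode" = new ] && [ -e "$path" ]; then
        echo "EXISTS: $path already exists, so opening $name as FSNEW3 will fail"
        return 2
    fi
    echo "OK: $name -> $path"
}
```

Calling check_logical FOO new just before the program starts reports whether FOO is defined and whether a file opened with FSNEW3 is already present.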

"Why does the linker say ncabor_ or open3_ (etc.) is an undefined symbol?"
There are four probable causes we've been observing:

NetCDF-4 Issues:
There are now two separate libraries (with the Fortran and the C parts of netCDF); you now need libraries-flags -lnetcdff -lnetcdf (in that specific order).
Netcdf-Fortran-4.4 and later seem to have dropped the older CALL NC*() interfaces in favor of the more recent IERR=NF_*() ones. I/O API Version 3.2 has been tediously recoded to replace the 790-odd older-style calls by the newer ones, so you need to either use that, or use an older netCDF version.

Link command-line order:
Probably, the command line that links your program has -lnetcdf before -lioapi instead of after. Most UNIX linkers resolve undefined symbols only from libraries that appear later on the command line, and do not go back to earlier ones. For example, if you have
!!! INCORRECT !!! f90 -o foo foo.o ... -lnetcdf -lioapi
the linker won't know where to go to find netCDF functions that are called in the I/O API; instead, if you use
!! CORRECT: f90 -o foo foo.o ... -lioapi -lnetcdf
then the linker will scan "-lnetcdf" to find functions called in "-lioapi".
Another possibility is that you are doing multilingual programming, and using maybe "cc" or "g++" or something else to do the link step. If so, you need to explicitly list the libraries that f90 would include. The list of these is vendor dependent but frequently looks something like
... -lf90 -lU90 -lm
One way to find out is to use the Fortran compiler in verbose mode (e.g., f90 -v ... on most UNIX systems) to do the linking: it may not find the needed C++ libraries, but it will tell you what libraries it needed for the Fortran part of the I/O API, and you can then modify your original link command to use them.
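The ordering rules described above (-lioapi before -lnetcdf, and with netCDF-4, -lnetcdff between them) can be checked mechanically before you run the link. This is a sketch in POSIX sh; the function name check_link_order is hypothetical, and it only inspects the order of these three flags, not whether -lnetcdff is actually required by your netCDF version.

```shell
# Hypothetical check: succeed only if -lioapi, -lnetcdff (if present), and
# -lnetcdf appear in that relative order among the link-command arguments.
check_link_order() {
    pos=0 ioapi=0 netcdff=0 netcdf=0
    for arg in "$@"; do
        pos=$((pos + 1))
        case "$arg" in
            -lioapi)   ioapi=$pos ;;
            -lnetcdff) netcdff=$pos ;;
            -lnetcdf)  netcdf=$pos ;;
        esac
    done
    if [ "$ioapi" -eq 0 ] || [ "$netcdf" -eq 0 ]; then
        echo "MISSING: need both -lioapi and -lnetcdf"
        return 2
    fi
    if [ "$ioapi" -gt "$netcdf" ]; then
        echo "BAD: -lioapi must come before -lnetcdf"
        return 1
    fi
    if [ "$netcdff" -gt 0 ]; then
        if [ "$netcdff" -lt "$ioapi" ] || [ "$netcdff" -gt "$netcdf" ]; then
            echo "BAD: expected -lioapi -lnetcdff -lnetcdf"
            return 1
        fi
    fi
    echo "OK"
}
```

For instance, check_link_order f90 -o foo foo.o -lnetcdf -lioapi reports the incorrect ordering shown earlier.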

Compiler name inconsistencies
Compilers "mangle" the names of Fortran COMMON blocks, functions, and subroutines in various ways (usually by turning them into lower case and then adding one, or for gcc/g77 sometimes two, underscores as a prefix or suffix). This will be a problem when you use the Intel or Portland Group compilers on Linux systems that come with a system-installed libnetcdf.a (which will have been built with gcc/g77).
The precise mangling behavior depends upon the compiler, your system defaults file for the compiler, and the compile/link command lines themselves. (It can also happen that netCDF was built without the Fortran or C++ support that your model was expecting.) A useful UNIX utility for diagnosing these problems is nm, which reports what linker-visible symbols are present in binary executable, object (.o), or library (.so and .a) files. So if you see a linker error message like
symbol foo_ not found (referenced in bar.o)
then do the following sorts of things:
nm foo.o | grep -i foo
nm libnetcdf.a | grep -i foo
nm libioapi.a | grep -i foo
etc., and maybe man -k foo
to try to find which program-component has the differently-mangled symbol that the linker needs. Then go back and review the compiler flags used in the build-process for that component.
I/O API Version 2.2 and later have a script nm_test.csh to help you with this: run
nm_test.csh <obj-file> <lib-file> <symbol>
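If nm_test.csh is not at hand, the same kind of search can be sketched in a few lines of POSIX sh. The function name find_symbol and the NM override variable are hypothetical; the sketch simply runs nm on each file and labels every line that mentions the symbol, so that differently mangled spellings (foo, foo_, foo__) show up side by side.

```shell
# Hypothetical helper: show every spelling of a symbol in each binary file.
#   $1 = symbol base name (matched case-insensitively)
#   remaining arguments = object (.o) or library (.a, .so) files
# Set NM to use a different nm program (assumption: it accepts a file name).
find_symbol() {
    sym=$1
    shift
    for f in "$@"; do
        # In nm output, "U" marks a symbol the file needs but does not define;
        # "T" marks a function the file defines.
        ${NM:-nm} "$f" 2>/dev/null | grep -i -- "$sym" |
            while IFS= read -r line; do
                printf '%s: %s\n' "$f" "$line"
            done
    done
}
```

A call such as find_symbol ncabor bar.o libnetcdf.a libioapi.a then lets you compare the undefined name in the object file with the names the libraries actually define.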

Bad compiler installation/configuration
Sometimes you'll find that the missing symbol was in a system routine that the compiler should have known about but somehow (maybe bad compiler-installation) didn't. That one happened to me earlier this week (as I write this May 3, 2002) on an HP system.

"PAVE reports bad values -- -9.xxE37 or something!"
This is a PAVE bug, not an I/O API bug: the original person who wrote the file-reader for PAVE couldn't be bothered to use the I/O API, but instead used raw netCDF reads without proper data-structure and error checking. NetCDF fills in "holes" in its files with a particular fill-value that you are seeing, and this is an indication that the data for that variable and time step was never written to the file. This happens, for example, at the starting time for an MM5/MCPL run, for some of the variables which aren't calculated until after the run is in progress.
This is fixed in PAVE Version 2 and later.

"I get an error message that looks something like"
     >>> WARNING in subroutine CHKFIL3 <<<
     Inconsistent file attribute NVARS for file FOO
     Value from file:          6
     Value from caller:        9
                
This means that

File FOO already exists
You are trying to open it as "unknown" (FSTATUS=FSUNKN3) in the call to OPEN3
The file description from within file FOO's header does not match the file description you have supplied in the FDESC3 COMMONs.

For the I/O API, you can't change a file's definition once it has already been created. What you probably want to do is to delete the existing file (or move it somewhere else), and re-run your program--this time creating a new file according to the description you supply.
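One way to automate the "move it somewhere else" step in a run script is sketched below in POSIX sh. The function name set_aside and the timestamped .bak naming scheme are hypothetical choices, not an I/O API convention.

```shell
# Hypothetical helper: move an existing output file aside so that the next
# run creates it afresh from the description the program supplies.
#   $1 = physical path of the output file
# Prints the backup name if a file was moved; prints nothing otherwise.
set_aside() {
    path=$1
    if [ ! -e "$path" ]; then
        return 0    # nothing to move: the program will create the file
    fi
    backup="$path.$(date +%Y%m%d%H%M%S).bak"
    mv -- "$path" "$backup" && echo "$backup"
}
```

Calling set_aside on the physical path behind logical name FOO before the program starts keeps the old file for inspection while letting OPEN3 create a new one.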
"I get a compiler warning message that looks something like"
    PGF90-W-0006-Input file empty
    (<somewhere>/ioapi/ddtvar3v.F)
    PGF90/any Linux/x86 5.2-4: compilation completed with warnings
                
There are three worker routines that are empty after preprocessing for the non-coupling-mode compiles. Some compilers treat the attempt to compile an empty file as a problem situation... It isn't.

Send comments to

Carlie J. Coats, Jr.
carlie@jyarborough.com