Spatially Allocating Attributes

7.1. Spatially Allocating Attributes

7.1.1 Modes of the Allocator Program

The allocator program supports ALLOCATE mode for operating on shapefiles, point files, polygon files, I/O API files and regular grid shapefiles. The ALLOCATE mode allows the user to specify a grid, polygon, or point file as input to allocate to an output grid or polygon file. The specified attributes of the input file will be written to the output file as a spatially weighted sum or average. For example, a user may want to aggregate county data to state data. After running the allocator, the state boundaries would be saved as a shapefile with attributes (e.g., population, housing) summed from the county level for each state, as indicated by this formula:

Using the average function, on the other hand, the attributes of a density-type property (such as population density) from the input file are averaged across the output polygons or grid cells. For example, a user may wish to average population density from census tract polygons to a grid. The average population density in each grid cell would be calculated from the tracts that intersect it using the following formula:

Other uses of the ALLOCATE mode are to convert data from one grid to another (e.g., map the data onto a different map projection or grid cell size), and to create the CMAQ OCEANfile from a Shapefile that contains land and surf zone information. Support for the CMAQ OCEANfile was added in version 3.3, which was released on October 26, 2006. In conjunction with this update, a new more general feature was added to compute the fraction of an output cell that is composed of various categories specified in an attribute of an input Shapfile. For example, if you have a Shapefile with a land use category specified for every polygon in the Shapefile, you can use this new feature to create an I/O API file that shows the fraction of each grid cell that was covered by each land use category. More information on this feature is given in Section 7.1.6.

7.1.2 Allocate Mode

When MIMS_PROCESSING is set to ALLOCATE (a mode that replaces both the AVERAGE and AGGREGATE modes from earlier versions of the Spatial Allocator), the Spatial Allocator responds to the following environment variables (required variables appear in bold text):

INPUT_FILE_NAME – Name of the input file to be allocated
INPUT_FILE_TYPE – Shapefile (point, polygon, or gridded data), RegularGrid (a grid generated in memory from the GRIDDESC file), PointFile, or IoapiFile
INPUT_FILE_MAP_PRJN – The map projection of the input file (or a grid name if the input file is gridded)
INPUT_FILE_ELLIPSOID – The ellipsoid of the input file
INPUT_FILE_DELIM – The delimiter that is used when INPUT_FILE_TYPE is PointFile
INPUT_FILE_XCOL – The name of the column containing x coordinates in the input file when INPUT_FILE_TYPE is PointFile
INPUT_FILE_YCOL – The name of the column containing y coordinates in the input file when INPUT_FILE_TYPE is PointFile
INPUT_GRID_NAME – Name of input grid when INPUT_FILE_TYPE is RegularGrid
ALLOCATE_ATTRS – Comma-separated list of attributes from the input file to allocate and propagate to the output file; can also be set to the keyword ALL
ALLOC_MODE_FILE – The name of a file that lists the attributes in the input file to carry through to the output file and the types of the attributes. As an alternative to specifying a file name, ALL_AGGREGATE, ALL_AVERAGE, ALL_AREAPERCENT, ALL_DISCRETEOVERLAP, and ALL_DISCRETECENTROID are recognized keywords that will apply the requested algorithm to all the attributes specified by ALLOCATE_ATTRS.
ALLOC_ATTR_TYPE - Specify SURF_ZONE if you are trying to create a CMAQ OCEANfile.
OUTPUT_FILE_NAME – The directory and name of the output file
OUTPUT_FILE_TYPE – The type of the output file; valid values are RegularGrid, Shapefile (polygons only), or IoapiFile
OUTPUT_FILE_ELLIPSOID – Ellipsoid of the polygons in the output file
OUTPUT_FILE_MAP_PRJN – The map projection of the output file (if output is not gridded)
OUTPUT_POLY_FILE – The name of a shapefile that specifies the geometry of the output polygons when OUTPUT_FILE_TYPE is Shapefile
OUTPUT_POLY_TYPE – The type of the file specifying the geometry of the output polygons. Currently, only Shapefile is supported.
OUTPUT_POLY_ATTRS – A list of attributes to be carried over from the OUTPUT_POLY_FILE to the output file.
OUTPUT_POLY_ELLIPSOID – The ellipsoid of the shapes in OUTPUT_POLY_FILE
OUTPUT_POLY_MAP_PRJN (formerly POLY_DATA_MAP_PRJN) – The map projection of the shapes in OUTPUT_POLY_FILE
GRIDDESC – The name of the GRIDDESC file that describes all grids
OUTPUT_GRID_NAME – The name of the output grid (when OUTPUT_FILE_TYPE is RegularGrid)
MAX_LINE_SEG - Specifies the maximum length of a line segment to use when re ading in a line or polygon Shapefile or creating the polygons for a grid. Any li ne segments longer than the specified length will be split to be no longer than the length specified by this variable. This could be useful when converting data on one grid to another, as the spatial mapping can be done more precisely when the grid is described by more points than just the four corners. Note that apply ing this feature will make the program run more slowly.
DEBUG_OUTPUT – Y or N (specifies whether to write the informational messages to standard output; setting this to N will make the program output only critical information)

7.1.3 Allocate Mode Examples

Example allocate scripts are provided for Unix/Linux in C-shell format (.csh extension appended). These scripts can be executed directly from the scripts directory. If desired, you may edit the aggregate script and set the SA_HOME to your new installation folder. The scripts place their output shapefiles in the output directory. The output files can be viewed with a GIS.

There are several different allocation examples provided in the sample scripts that cover various combinations of data and file types. The sample scripts for the allocate mode are:

alloc_census_tracts_to_grid
alloc_census_tracts_to_county
alloc_census_tracts_to_ioapi
alloc_gridpop_to_biggrid
alloc_emis_to_grid
alloc_ports_to_county

The alloc_census_tracts_to_county.csh script is presented below as an example of a fairly typical allocate mode script. The attributes input file (a shapefile in this case) is processed according to the entries in an allocate mode file, atts_pophous.txt, which has entries for each of the three attributes (POP2000, HOUSEHOLDS, and HDENS) specified by ALLOCATE_ATTRS that will be aggregated or averaged. The input file is intersected with OUTPUT_POLY_FILE, which is also a shapefile. The OUTPUT_POLY_FILE attributes specified in OUTPUT_POLY_ATTRS (e.g., FIPS_CODE and COUNTY) will carry over to the output file specified by OUTPUT_FILE_NAME (county_pophous).


# Set executable
setenv SA_HOME ..
setenv EXE "$SA_HOME/bin/allocator.exe"

# Set Input Directory
setenv DATADIR $SA_HOME/data
setenv OUTPUT $SA_HOME/output

# Select method of spatial analysis

setenv MIMS_PROCESSING ALLOCATE

setenv TIME time

# Set name and path of shapefile having data allocated to it
setenv OUTPUT_POLY_FILE $DATADIR/cnty_tn
setenv OUTPUT_POLY_MAP_PRJN "+proj=latlong"
setenv OUTPUT_POLY_ELLIPSOID "+a=6370997.0,+b=6370997.0"
setenv OUTPUT_POLY_ATTRS COUNTY,FIPS_CODE
setenv OUTPUT_POLY_TYPE ShapeFile

# Set Shapefile from which to allocate data
setenv INPUT_FILE_NAME $DATADIR/tn_pophous
setenv INPUT_FILE_TYPE ShapeFile
setenv INPUT_FILE_MAP_PRJN "+proj=lcc,+lat_1=33,+lat_2=45,+lat_0=40,+lon_0=-97"
setenv INPUT_FILE_ELLIPSOID "+a=6370997.0,+b=6370997.0"
setenv ALLOCATE_ATTRS POP2000,HOUSEHOLDS,HDENS
setenv ALLOC_MODE_FILE $DATADIR/atts_pophous.txt

# Set name and path of resulting shapefile
setenv OUTPUT_FILE_NAME $OUTPUT/county_pophous
setenv OUTPUT_FILE_TYPE ShapeFile
setenv OUTPUT_FILE_MAP_PRJN "+proj=latlong"
setenv OUTPUT_FILE_ELLIPSOID "+a=6370997.0,+b=6370997.0"


echo "Allocating census population tracts to counties"
$TIME $EXE

The above example is a fairly typical allocate mode script. The attributes input file (a shapefile in this case) is processed according to the entries in an allocate mode file, atts_pophous.txt, which has entries for each of the three attributes (POP2000, HOUSEHOLDS, and HDENS) specified by ALLOCATE_ATTRS that will be aggregated or averaged. The input file is intersected with OUTPUT_POLY_FILE, which is also a shapefile and whose attributes specified in OUTPUT_POLY_ATTRS (FIPS_CODE and COUNTY) will carry over to county_pophous, the output file specified by OUTPUT_FILE_NAME.

7.1.4 Allocating Discrete Values

Previous versions of the Spatial Allocator supported the allocation of only continuous data types, i.e., those that could have mathematical operations such as aggregation or averaging performed on them. Discrete attributes such as FIPS codes or county names could not previously be allocated due to this limitation. Starting with Spatial Allocator version 3.0, and subsequent versions, you can now allocate discrete attributes based on maximum area of overlap (the value of the attribute is obtained from the input shape that has the largest overlap with the output shape) or based on the centroid (the value of the attributes is obtained from the input shape that contains the centroid of the output shape). Allocation of discrete attributes is mainly for polygon data, but point and line shapes can be used with the maximum overlap (points are simply counted as an area of 1.0 each, so the first point encountered that overlaps the output shape will be used, while line length is substituted for area for line shapes). The centroid method does not make sense when applied to point or line data, so this condition will generate an error in the program.

The following C-shell script depicts how to allocate discrete variables from a county shapefile onto a grid. The ALLOC_MODE_FILE environment variable specifies the name of an ASCII mode file just as it did in the previous example. This time, the attributes of interest are discrete, so the processing modes used are DISCRETE_CENTROID and DISCRETE_OVERLAP. As an aside, when OUTPUT_FILE_TYPE is set to RegularGrid, the OUTPUT_POLY related environment variables are omitted because the grid is generated in memory from the grid description file and the column and row attributes are added automatically.


# Set executable
setenv SA_HOME ..
setenv EXE "$SA_HOME/bin/allocator.exe"

# Set Input Directory
setenv DATADIR $SA_HOME/data
setenv OUTPUT $SA_HOME/output

if(! -f $DATADIR/discrete_modes.txt) then
echo "Generating mode file"
echo "ATTRIBUTE=FIPS_CODE:DISCRETECENTROID" > $DATADIR/discrete_modes.txt
echo "ATTRIBUTE=COUNTY:DISCRETEOVERLAP" >> $DATADIR/discrete_modes.txt
endif

# Select method of spatial analysis

setenv MIMS_PROCESSING ALLOCATE

setenv TIME time

#set "data" shapefile parameters
setenv GRIDDESC $DATADIR/GRIDDESC.txt

#set parameters for file being allocated
setenv INPUT_FILE_NAME $DATADIR/cnty_tn
setenv INPUT_FILE_TYPE ShapeFile
setenv INPUT_FILE_MAP_PRJN "+proj=latlong"
setenv INPUT_FILE_ELLIPSOID "+a=6370997.0,+b=6370997.0"
setenv ALLOCATE_ATTRS FIPS_CODE,COUNTY
setenv ALLOC_MODE_FILE $DATADIR/discrete_modes.txt

# Set name and path of resulting shapefile
setenv OUTPUT_FILE_NAME $OUTPUT/gridded_county
setenv OUTPUT_FILE_TYPE RegularGrid
setenv OUTPUT_GRID_NAME M08_NASH
setenv OUTPUT_FILE_MAP_PRJN M08_NASH
setenv OUTPUT_FILE_ELLIPSOID "+a=6370997.0,+b=6370997.0"

#echo "Allocating counties to a grid"
$TIME $EXE

The mode file used for this example, excluding comments or whitespace appears as follows:

ATTRIBUTE=FIPS_CODE:DISCRETE_CENTROID

ATTRIBUTE=COUNTY:DISCRETE_OVERLAP

7.1.5 Discretization Interval

Using the MAX_LINE_SEG environment variable, users can specify a maximum line segment length for lines, polygons, and generated grid cells in the units of the output file. Allocations of attributes will appear no different. This feature comes into play when converting shapes between map projections where the conversion algorithms cause distortions in long line segments (those used in a grid, for example). To set a discretization interval, add

setenv MAX_LINE_SEG <length>

to your script, where <length> is an integer value in the same units as the shape being processed. Thus, an 8-km (8000-m) grid processed with a discretization interval of 1000 will break up any line segment of 1000 meters or more. Please note that there is a performance penalty incurred when using the discretization interval; the smaller the interval value, the longer the program will take to run because the line-splitting algorithm will be calculating where to add all the extra points.

7.1.6 Computing Area Percentages for the CMAQ OCEANfile and Other Uses

Support for the CMAQ OCEANfile and for area percentages in general was added in version 3.3, which was released on October 26, 2006. The purpose of this update, is to compute the fraction of an output cell that is composed of various categories specified in an attribute of an input Shapfile. For example, if you have a Shapefile with a land use category specified for every polygon in the Shapefile, you can use this new feature to create an I/O API file that shows the fraction of each grid cell that was covered by each land use category. Note that the only output format currently supported for this new mode is IoapiFile, but the feature could some day be extended to also output Shapefiles. There are two new scripts available with the distribution that illustrate the use of this new mode:


alloc_surf_zone_to_oceanfile.csh: allocates the surf zone Shapefile to create an OCEANfile

alloc_types_to_areapercent.csh: allocates the surf zone Shapefile using the standard area

allocation mode (i.e., variables will be Type_2 and Type_3 instead of SURF_ZONE and OCEAN).

To trigger an area percentage run:

Set the value of the ALLOC_MODE_FILE to ALL_AREAPERCENT. Note that we have not performed any tests that mixes the AREAPERCENT mode with other modes in the same mode file, so it may not work properly to do that, but using AREAPERCENT for all specified attributes does work.
Set the ALLOC_ATTRS variable to specify the name of the variable that specifies the types of polygons (e.g., this is set to TYPE in the case of the OCEANfile).
If you specifically wish to generate an OCEANfile, set the value of ALLOC_ATTR_TYPE variable to SURF_ZONE. This tells the program to output OCEAN and SURF_ZONE, and is not needed for a more generic area percentage run.

A surf zone input data file for North Carolina and South Carolina is available in the with the Jan. 2009 release in the $DATADIR directory:
$DATADIR/surfzone_NC_SC
For the data file that is input to the surf zone calculation for most of North America, check the site http://www.epa.gov/ttn/chief/emch/. The Spatial section has input files for spatial surrogates, and the Biogenic has inputs for biogenic processing and it is likely that the OCEANfile input will be posted in this section.

To Section 7.2: Overlaying Spatial Data