Emissions Modeling Framework v3.7 User's Guide

Last updated: November 13, 2019

1 Overview of the EMF

1.1 Introduction

The Emissions Modeling Framework (EMF) is a software system designed to solve many long-standing difficulties of emissions modeling identified at EPA. The overall process of emissions modeling involves gathering measured or estimated emissions data into emissions inventories; applying growth and controls information to create future year and controlled emissions inventories; and converting emissions inventories into hourly, gridded, chemically speciated emissions estimates suitable for input into air quality models such as the Community Multiscale Air Quality (CMAQ) model.

This User’s Guide focuses on the data management and analysis capabilities of the EMF. The EMF also contains a Control Strategy Tool (CoST) for developing future year and controlled emissions inventories and is capable of driving SMOKE to develop CMAQ inputs.

Many types of data are involved in the emissions modeling process, including emissions inventories, growth and controls information, allocation factors, cross-reference files, and reference data.

Quality assurance (QA) is an important component of emissions modeling. Emissions inventories and other modeling data must be analyzed and reviewed for any discrepancies or outlying data points. Data files need to be organized and tracked so changes can be monitored and updates made when new data is available. Running emissions modeling software such as the Sparse Matrix Operator Kernel Emissions (SMOKE) Modeling System requires many configuration options and input files that need to be maintained so that modeling output can be reproduced in the future. At all stages, coordinating tasks and sharing data between different groups of people can be difficult and specialized knowledge may be required to use various tools.

In your emissions modeling work, you may have found yourself asking how to track changes to your data, how to reproduce previous modeling results, or how to share data and coordinate tasks with other groups.

The EMF helps with these issues by using a client-server system where emissions modeling information is centrally stored and can be accessed by multiple users. The EMF integrates quality control processes into its data management to help with development of high quality emissions results. The EMF also organizes emissions modeling data and tracks emissions modeling efforts to aid in reproducibility of emissions modeling results. Additionally, the EMF strives to allow non-experts to use emissions modeling capabilities such as future year projections, spatial allocation, chemical speciation, and temporal allocation.

1.2 EMF Components

A typical installation of the EMF system is illustrated in Fig. 1.1. In this case, a group of users shares a single EMF server with multiple local machines running the client application. The EMF server consists of a database, file storage, and the server application, which handles requests from the clients and communicates with the database. The client application runs on each user’s computer and provides a graphical interface for interacting with the emissions modeling data stored on the server (see Sec. 2). Each user has his or her own username and password for accessing the EMF server. Some users will have administrative privileges, which allow them to perform additional system tasks such as managing users or dataset types.

Figure 1.1: Typical EMF client-server setup

For a simpler setup, all of the EMF components can be run on a single machine: database, server application, and client application. With this “all-in-one” setup, the emissions data would generally not be shared between multiple users.

1.3 Basic Workflow

Fig. 1.2 illustrates the basic workflow of data in the EMF system.

Figure 1.2: Data workflow in EMF system

Emissions modeling data files are imported into the EMF system where they are represented as datasets (see Sec. 3). The EMF supports many different types of data files including emissions inventories, allocation factors, cross-reference files, and reference data. Each dataset matches a dataset type which defines the format of the data to be loaded from the file (Sec. 3.2). In addition to the raw data values, the EMF stores various metadata about each dataset including the time period covered, geographic region, the history of the data, and data usage in model runs or QA analysis.

Once your data is stored as a dataset, you can review and edit the dataset’s properties (Sec. 3.5) or the data itself (Sec. 3.6) using the EMF client. You can also run QA steps on a dataset or set of datasets to extract summary information, compare datasets, or convert the data to a different format (see Sec. 4).

You can export your dataset to a file and download it to your local computer (Sec. 3.8). You can also export reports that you create with QA steps for further analysis in a spreadsheet program or to create charts (Sec. 4.5).

2 Desktop Client

2.1 Requirements

The EMF client is a graphical desktop application written in Java. While it is primarily developed and used in Windows, it will run under Mac OS X and Linux (although due to font differences the window layout may not be optimal). The EMF client can be run on Windows 7, Windows 8, or Windows 10.

2.1.1 Checking Your Java Installation

The EMF requires Java 8 or greater. The following instructions will help you check if you have Java installed on your Windows machine and what version is installed. If you need more details, please visit How to find Java version in Windows [java.com].

The latest version(s) of Java on your system will be listed as Java 8 with an associated Update number (e.g., Java 8 Update 161). Older versions may be listed as Java(TM), Java Runtime Environment, Java SE, J2SE or Java 2.
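
You can also check from a Command Prompt, provided Java is on your system PATH. The following command prints the installed version:

    java -version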

Windows 10

  1. Click the Start button.
  2. Scroll through the applications and programs listed until you see the Java folder.
  3. Click on the Java folder, then About Java to see the Java version.

Windows 8

  1. Right-click in the bottom-left corner of the screen and choose Control Panel from the pop-up menu.
  2. When the Control Panel appears, select Programs.
  3. Click Programs and Features.
  4. The installed Java version(s) are listed.

Fig. 2.1 shows the About Java window on Windows 10 with Java installed. The installed version of Java is Version 8 Update 161; this version does not need to be updated to run the EMF client.

Figure 2.1: About Java

2.1.2 Installing Java

If you need to install Java, please follow the instructions for downloading and installing Java for a Windows computer [java.com]. Note that you will need administrator privileges to install Java on Windows. During the installation, make a note of the directory where Java is installed on your computer. You will need this information to configure the EMF client.

2.1.3 Updating Java

If Java is installed on your computer but is not version 8 or greater, you will need to update your Java installation. Start by opening the Java Control Panel from the Windows Control Panel. Fig. 2.2 shows the Java Control Panel.

Figure 2.2: Java Control Panel

Clicking the About button will display the Java version dialog seen in Fig. 2.3. In Fig. 2.3, the installed version of Java is Version 7 Update 45. This version of Java needs to be updated to run the EMF client.

Figure 2.3: Java Version Dialog

To update Java, click the tab labeled Update in the Java Control Panel (see Fig. 2.4). Click the button labeled Update Now in the bottom right corner of the Java Control Panel to update your installation of Java.

Figure 2.4: Java Control Panel: Update Tab

2.2 Installing the EMF Client

How you install the EMF client depends on which EMF server you will be connecting to. To download and install an all-in-one package that includes all the EMF components, please visit https://www.cmascenter.org/cost/. Other users should contact their EMF server administrators for instructions on downloading and installing the EMF client.

To launch the EMF client, double-click the file named EMFClient.bat. You may see a security warning similar to Fig. 2.5. Uncheck the box labeled “Always ask before opening this file” to avoid the warning in the future.
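
EMFClient.bat is a Windows batch script that launches the client using Java. If the client will not start because Java cannot be found, you may need to edit the script so that it points to the Java directory you noted during installation (Sec. 2.1.2). The exact contents vary by installation; a minimal sketch of the relevant lines (the Java path shown is hypothetical) might look like:

    set JAVA_HOME=C:\Program Files\Java\jre1.8.0_161
    set PATH=%JAVA_HOME%\bin;%PATH%

Substitute the directory where Java is actually installed on your machine.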

Figure 2.5: EMF Client Security Warning

2.3 Register as a New User and Log In

When you start the EMF client application, you will initially see a login window like Fig. 2.6.

Figure 2.6: Login to the Emissions Modeling Framework Window

If you are an existing EMF user, enter your EMF username and password in the login window and click the Log In button. If you forget your password, an EMF Administrator can reset it for you. Note: The Reset Password button is used to update your password when it expires; it can’t be used if you’ve lost your password. See Sec. 2.5 for more information on password expiration.

If you have never used the EMF before, click the Register New User button to bring up the Register New User window as shown in Fig. 2.7.

Figure 2.7: Register New User Window

In the Register New User window, enter the requested account information, such as your name, contact information, and the username and password you would like to use.

Click OK to create your account. If there are any problems with the information you entered, an error message will be displayed at the top of the window as shown in Fig. 2.8.

Figure 2.8: Error Registering New User

Once you have corrected any errors, your account will be created and the EMF main window will be displayed (Fig. 2.9).

Figure 2.9: EMF Main Window

2.4 Update Your Profile

If you need to update any of your profile information or change your password, click the Manage menu and select My Profile to bring up the Edit User window shown in Fig. 2.10.

Figure 2.10: Edit User Profile

To change your password, enter your new password in the Password field and be sure to enter the same password in the Confirm Password field. Your password must be at least 8 characters long and must contain at least one digit.

Once you have entered any updated information, click the Save button to save your changes and close the Edit User window. You can close the window without saving changes by clicking the Close button. If you have unsaved changes, you will be asked to confirm that you want to discard your changes (Fig. 2.11).

Figure 2.11: Discard Changes Confirmation

2.5 Password Expiration

Passwords in the EMF expire every 90 days. If you try to log in and your password has expired, you will see the message “Password has expired. Reset Password.” as shown in Fig. 2.12.

Figure 2.12: Password Expired

Click the Reset Password button to set a new password as shown in Fig. 2.13. After entering your new password and confirming it, click the Save button to save your new password and you will be logged in to the EMF. Make sure to use your new password next time you log in.

Figure 2.13: Reset Expired Password

2.6 Interface Concepts

As you become familiar with the EMF client application, you’ll encounter various concepts that are reused throughout the interface. In this section, we’ll briefly introduce these concepts. You’ll see specific examples in the following chapters of this guide.

2.6.1 Viewing vs. Editing

First, we’ll discuss the difference between viewing an item and editing an item. Viewing something in the EMF means that you are just looking at it and can’t change its information. Conversely, editing an item means that you have the ability to change it. Oftentimes, the interface for viewing vs. editing will look similar, but when you’re just viewing an item, various fields won’t be editable. For example, Fig. 2.14 shows the Dataset Properties View window while Fig. 2.15 shows the Dataset Properties Editor window for the same dataset.

Figure 2.14: Viewing a dataset
Figure 2.15: Editing a dataset

In the edit window, you can make various changes to the dataset such as editing the dataset name, selecting the temporal resolution, or changing the geographic region. Clicking the Save button will save your changes. In the viewing window, those same fields are not editable and there is no Save button. Notice in the lower left-hand corner of Fig. 2.14 the button labeled Edit Properties. Clicking this button will bring up the editing window shown in Fig. 2.15.

Similarly, Fig. 2.16 shows the QA tab of the Dataset Properties View as compared to Fig. 2.17 showing the same QA tab but in the Dataset Properties Editor.

Figure 2.16: Viewing QA tab
Figure 2.17: Editing QA tab

In the View window, the only option is to view each QA step whereas the Editor allows you to interact with the QA steps by adding, editing, copying, deleting, or running the steps. If you are having trouble finding an option you’re looking for, check to see if you’re viewing an item vs. editing it.

2.6.2 Access Restrictions

Only one user can edit a given item at a time. Thus, if you are editing a dataset, you have a “lock” on it and no one else will be able to edit it at the same time. Other users will be able to view the dataset as you’re editing it. If you try to edit a locked dataset, the EMF will display a message like Fig. 2.18. For some items in the EMF, you may only be able to edit the item if you created it or if your account has administrative privileges.

Figure 2.18: Dataset Locked Message

2.6.3 Unsaved Changes

Generally you will need to click the Save button to save changes that you make. If you have unsaved changes and click the Close button, you will be asked if you want to discard your changes as shown in Fig. 2.11. This helps to prevent losing your work if you accidentally close a window.

2.6.4 Refresh

The EMF client application loads data from the EMF server. As you and other users work, your information is saved to the server. In order to see the latest information from other users, the client application needs to refresh its information by contacting the server. The latest data will be loaded from the server when you open a new window. If you are working in an already open window, you may need to click on the Refresh button to load the newest data. Fig. 2.19 highlights the Refresh button in the Dataset Manager window. Clicking Refresh will contact the server and load the latest list of datasets.

Figure 2.19: Refresh button in the Dataset Manager window

Various windows in the EMF client application have Refresh buttons, usually in either the top right corner as in Fig. 2.19 or in the row of buttons on the bottom right like in Fig. 2.17.

You will also need to use the Refresh button if you have made changes and return to a previously opened window. For example, suppose you select a dataset in the Dataset Manager and edit the dataset’s name as described in Sec. 3.5. When you save your changes, the previously opened Dataset Manager window won’t automatically display the updated name. If you close and re-open the Dataset Manager, the dataset’s name will be refreshed; otherwise, you can click the Refresh button to update the display.

2.6.5 Status Window

Many actions in the EMF are run on the server. For example, when you run a QA step, the client application on your computer sends a message to the server to start running the step. Depending on the type of QA step, this processing can take a while and so the client will allow you to do other work while it periodically checks with the server to find out the status of your request. These status checks are displayed in the Status Window shown in Fig. 2.20.

Figure 2.20: Status Window

The status window will show you messages about tasks when they are started and completed. Also, error messages will be displayed if a task could not be completed. You can click the Refresh button in the Status Window to refresh the status. The Trash icon clears the Status Window.

2.6.6 The Sort-Filter-Select Table

Most lists of data within the EMF are displayed using the Sort-Filter-Select Table, a generic table that allows sorting, filtering, and selection (as the name suggests). Fig. 2.21 shows the sort-filter-select table used in the Dataset Manager. (To follow along with the figures, select the main Manage menu and then select Datasets. In the window that appears, find the Show Datasets of Type pull-down menu near the top of the window and select All.)

Figure 2.21: Sort-Filter-Select Table

Row numbers are shown in the first column, while the first row displays column headers. The column labeled Select allows you to select individual rows by checking the box in the column. Selections are used for different activities depending on where the table is displayed. For example, in the Dataset Manager window you can select various datasets and then click the View button to view the dataset properties of each selected dataset. In other contexts, you may have options to change the status of all the selected items or copy the selected items. There are toolbar buttons to allow you to quickly select all items in a table (Sec. 2.6.12) and to clear all selections (Sec. 2.6.13).

The horizontal scroll bar at the bottom indicates that there are more columns in the table than fit in the window. Scroll to the right in order to see all the columns as in Fig. 2.22.

Figure 2.22: Sort-Filter-Select Table with Scrolled Columns

Notice the info line displayed at the bottom of the table. In Fig. 2.22 the line reads 35 rows : 12 columns: 0 Selected [Filter: None, Sort: None]. This line gives information about the total number of rows and columns in the table, the number of selected items, and any filtering or sorting applied.

Columns can be resized by clicking on the border between two column headers and dragging it right or left. Your mouse cursor will change to a horizontal double-headed arrow when resizing columns.

You can rearrange the order of the columns in the table by clicking a column header and dragging the column to a new position. Fig. 2.23 shows the sort-filter-select table with columns rearranged and resized.

Figure 2.23: Sort-Filter-Select Table with Rearranged and Resized Columns

To sort the table using data from a given column, click on the column header such as Last Modified Date. Fig. 2.24 shows the table sorted by Last Modified Date in descending order (latest dates first). The table info line now includes Sort: Last Modified Date(-).

Figure 2.24: Sort-Filter-Select Table with Column Sort

If you click the Last Modified Date header again, the table will re-sort by Last Modified Date in ascending order (earliest dates first). The table info line also changes to Sort: Last Modified Date(+) as seen in Fig. 2.25.

Figure 2.25: Sort-Filter-Select Table with Reversed Column Sort

The toolbar at the top of the table (as shown in Fig. 2.26) has buttons for the following actions (from left to right):

Figure 2.26: Toolbar for Sort-Filter-Select Table
  1. Sort options
  2. Filter rows
  3. Show or hide columns
  4. Format data in columns
  5. Reset table’s sorting, filtering, and column layout
  6. Select all rows
  7. Clear all selections

If you hover your mouse over any of the buttons, a tooltip will pop up to remind you of each button’s function.

2.6.7 Sort Options

The Sort toolbar button brings up the Sort Columns dialog as shown in Fig. 2.27. This dialog allows you to sort the table by multiple columns and also allows case sensitive sorting. (Quick sorting by clicking a column header uses case insensitive sorting.)

Figure 2.27: Sort Columns Dialog

In the Sort Columns Dialog, select the first column you would use to sort the data from the Sort By pull-down menu. You can also specify if the sort order should be ascending or descending and if the sort comparison should be case sensitive.

To add additional columns to sort by, click the Add button and then select the column in the new Then Sort By pull-down menu. When you have finished setting up your sort selections, click the OK button to close the dialog and re-sort the table. The info line beneath the table will show all the columns used for sorting like Sort: Creator(+), Last Modified Date(-).

To remove your custom sorting, click the Clear button in the Sort Columns dialog and then click the OK button. You can also use the Reset toolbar button to reset all custom settings as described in Sec. 2.6.11.

2.6.8 Filter Rows

The Filter Rows toolbar button brings up the Filter Rows dialog as shown in Fig. 2.28. This dialog allows you to create filters to “whittle down” the rows of data shown in the table. You can filter the table’s rows based on any column with several different value matching options.

Figure 2.28: Filter Rows Dialog

To add a filter criterion, click the Add Criteria button and a new row will appear in the dialog window. Clicking the cell directly under the Column Name header displays a pull-down menu to pick which column you would like to use to filter the rows. The Operation column allows you to select how the filter should be applied; for example, you can filter for data that starts with the given value or does not contain the value. Finally, click the cell under the Value header and type in the value to use. Note that the filter values are case-sensitive. A filter value of “nonroad” would not match the dataset type “ORL Nonroad Inventory”.

If you want to specify additional criteria, click Add Criteria again and follow the same process. To remove a filter criterion, click on the row you want to remove and then click the Delete Criteria button.

If the radio button labeled Match using: is set to ALL criteria, then only rows that match all the specified criteria will be shown in the filtered table. If Match using: is set to ANY criteria, then rows will be shown if they meet any of the criteria listed.
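
For example, suppose you define two criteria: Dataset Type contains “Inventory” and Creator contains “rhc” (a username borrowed from the filter example below). With ALL criteria selected, only inventory datasets created by that user are shown; with ANY criteria selected, every inventory dataset is shown, along with every dataset created by that user regardless of its type.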

Once you are done specifying your filter options, click the OK button to close the dialog and return to the filtered table. The info line beneath the table will include your filter criteria like Filter: Creator contains rhc, Temporal Resolution starts with Ann.

To remove your custom filtering, you can delete the filter criteria from the Filter Rows dialog or uncheck the Apply Filter? checkbox to turn off the filtering without deleting your filter rules. You can also use the Reset toolbar button to reset all custom settings as described in Sec. 2.6.11. Note that clicking the Reset button will delete your filter rules.

2.6.9 Show or Hide Columns

The Show/Hide Columns toolbar button brings up the Show/Hide Columns dialog as shown in Fig. 2.29. This dialog allows you to customize which columns are displayed in the table.

Figure 2.29: Show/Hide Columns Dialog

To hide a column, uncheck the box next to the column name under the Show? column. Click the OK button to return to the table. The columns you unchecked will no longer be seen in the table. The info line beneath the table will also be updated with the current number of displayed columns.

To make a hidden column appear again, open the Show/Hide Columns dialog and check the Show? box next to the hidden column’s name. Click OK to close the Show/Hide Columns dialog.

To select multiple columns to show or hide, click on the first column name of interest. Then hold down the Shift key and click a second column name to select it and the intervening columns. Once rows are selected, clicking the Show or Hide buttons in the middle of the dialog will check or uncheck all the Show? boxes for the selected rows. To select multiple rows that aren’t next to each other, you can hold down the Control key while clicking each row. The Invert button will invert the selected rows. After checking/unchecking the Show? checkboxes, click OK to return to the table with the columns shown/hidden as desired.

The Show/Hide Columns dialog also supports filtering to find columns to show or hide. This infrequently used option is most useful for locating columns when there are many columns in the table. Fig. 2.30 shows an example where a filter has been set up to match column names that contain the value “Date”. Clicking the Select button above the filtering options selects matching rows, which can then be hidden by clicking the Hide button.

Figure 2.30: Show/Hide Columns with Column Name Filter

2.6.10 Format Data in Columns

The Format Columns toolbar button displays the Format Columns dialog shown in Fig. 2.31. This dialog allows you to customize the formatting of columns. In practice, this dialog is not used very often, but it can be helpful for formatting numeric data by changing the number of decimal places or the number of significant digits shown.

Figure 2.31: Format Columns Dialog

To change the format of a column, first check the checkbox next to the column name in the Format? column. If you only select columns that contain numeric data, the Numeric Format Options section of the dialog will appear; otherwise, it will not be visible. The Format Columns dialog supports filtering by column name similar to the Show/Hide Columns dialog (Sec. 2.6.9).

From the Format Columns dialog, you can change the font, the style of the font (e.g. bold, italic), the horizontal alignment for the column (e.g. left, center, right), the text color, and the column width. For numeric columns, you can specify the number of significant digits and decimal places.

2.6.11 Reset Table

The Reset toolbar button will remove all customizations from the table: sorting, filtering, hidden columns, and formatting. It will also reset the column order and set column widths back to the default.

2.6.12 Select All Rows

The Select All toolbar button selects all the rows in the table. After clicking the Select All button, you will see that the checkboxes in the Select column are now all checked. You can select or deselect an individual item by clicking its checkbox in the Select column.

2.6.13 Clear All Selections

The Clear All Selections toolbar button unselects all the rows in the table.

3 Datasets

3.1 Introduction

Emissions inventories, reference data, and other types of data files are imported into the EMF and stored as datasets. A dataset encompasses both the data itself as well as various dataset properties such as the time period covered by the dataset and geographic extent of the dataset. Changes to a dataset are tracked as dataset revisions. Multiple versions of the data for a dataset can be stored in the EMF.

3.2 Dataset Types

Each dataset has a dataset type. The dataset type describes the format of the dataset’s data. For example, the dataset type for an ORL Point Inventory (PTINV) defines the various data fields of the inventory file such as FIPS code, SCC code, pollutant name, and annual emissions value. A different dataset type like Spatial Surrogates (A/MGPRO) defines the fields in the corresponding file: surrogate code, FIPS code, grid cell, and surrogate fraction.

The EMF also supports flexible dataset types without a fixed format: Comma Separated Value and Line-based. These types allow new kinds of data to be loaded into the EMF without requiring updates to the EMF software.

When importing data into the EMF, you can choose between internal dataset types where the data itself is stored in the EMF database and external dataset types where the data remains in a file on disk and the EMF only tracks the metadata. For internal datasets, the EMF provides data editing, revision and version tracking, and data analysis using SQL queries. External datasets can be used to track files that don’t need these features or data that can’t be loaded into the EMF like binary NetCDF files.

You can view the dataset types defined in the EMF by selecting Dataset Types from the main Manage menu. EMF administrators can add, edit, and remove dataset types; non-administrative users can view the dataset types. Fig. 3.1 shows the Dataset Type Manager.

Figure 3.1: Dataset Type Manager

To view the details of a particular dataset type, check the box next to the type you want to view (for example, “Flat File 2010 Nonpoint”) and then click the View button in the bottom left-hand corner.

Fig. 3.2 shows the View Dataset Type window for the Flat File 2010 Nonpoint dataset type. Each dataset type has a name and a description along with metadata about who created the dataset type and when, and also the last modified date for the dataset type.

Figure 3.2: View Dataset Type: Flat File 2010 Nonpoint

The dataset type defines the format of the data file as seen in the File Format section of Fig. 3.2. For the Flat File 2010 Nonpoint dataset type, the columns from the raw data file are mapped into columns in the database when the data is imported. Each data column has a defined type (string, integer, or floating point) and can be mandatory or optional.

Keyword-value pairs can be used to give the EMF more information about a dataset type. Tbl. 3.1 lists some of the keywords available. Sec. 3.5.3 provides more information about using and adding keywords.

Table 3.1: Dataset Type Keywords
Keyword Description Example
EXPORT_COLUMN_LABEL Indicates if column labels should be included when exporting the data to a file FALSE
EXPORT_HEADER_COMMENTS Indicates if header comments should be included when exporting the data to a file FALSE
EXPORT_INLINE_COMMENTS Indicates if inline comments should be included when exporting the data to a file FALSE
EXPORT_PREFIX Filename prefix to include when exporting the data to a file ptinv_
EXPORT_SUFFIX Filename suffix to use when exporting the data to a file .csv
INDICES Tells the system to create indices in the database on the given columns region_cd|country_cd|scc
REQUIRED_HEADER Indicates a line that must occur in the header of a data file #FORMAT=FF10_ACTIVITY
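
As an illustration of how the export keywords combine (the dataset name here is hypothetical): if a dataset named nonroad_2020 belongs to a type with EXPORT_PREFIX set to ptinv_ and EXPORT_SUFFIX set to .csv, exporting it would produce a file named along the lines of:

    ptinv_nonroad_2020.csv

Similarly, setting EXPORT_HEADER_COMMENTS to FALSE would omit the header comment lines from the exported file.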

Each dataset type can have QA step templates assigned. These are QA steps that apply to any dataset of the given type. More information about using QA step templates is given in Sec. 4.

3.2.1 Common Dataset Types

Dataset types can be added, edited, or deleted by EMF administrators. In this section, we list dataset types that are commonly used. Your EMF installation may not include all of these types or may have additional types defined.

3.2.1.1 Common Inventory Dataset Types

Table 3.2: Inventory Dataset Types
Dataset Type Name Description Link to File Format
Flat File 2010 Activity Onroad mobile activity data (VMT, VPOP, speed) in Flat File 2010 (FF10) format SMOKE documentation
Flat File 2010 Activity Nonpoint Nonpoint activity data in FF10 format Same format as Flat File 2010 Activity
Flat File 2010 Activity Point Point activity data in FF10 format Not available
Flat File 2010 Nonpoint Nonpoint or nonroad emissions inventory in FF10 format SMOKE documentation
Flat File 2010 Nonpoint Daily Nonpoint or nonroad day-specific emissions inventory in FF10 format SMOKE documentation
Flat File 2010 Point Point emissions inventory in FF10 format SMOKE documentation
Flat File 2010 Point Daily Point day-specific emissions inventory in FF10 format SMOKE documentation
ORL Day-Specific Fires Data Inventory (PTDAY) Day-specific fires inventory SMOKE documentation
ORL Fire Inventory (PTINV) Wildfire and prescribed fire inventory SMOKE documentation
ORL Nonpoint Inventory (ARINV) Nonpoint emissions inventory in ORL format SMOKE documentation
ORL Nonroad Inventory (ARINV) Nonroad emissions inventory in ORL format SMOKE documentation
ORL Onroad Inventory (MBINV) Onroad mobile emissions inventory in ORL format SMOKE documentation
ORL Point Inventory (PTINV) Point emissions inventory in ORL format SMOKE documentation

3.2.1.2 Common Reference Data Dataset Types

Table 3.3: Reference Data Dataset Types
Dataset Type Name Description Link to File Format
Country, state, and county names and data (COSTCY) List of region names and codes with default time zones and daylight-saving time flags SMOKE documentation
Grid Descriptions (Line-based) List of projections and grids I/O API documentation
Holiday Identifications (Line-based) Holidays date list SMOKE documentation
Inventory Table Data (INVTABLE) Pollutant reference data SMOKE documentation
MACT description (MACTDESC) List of MACT codes and descriptions SMOKE documentation
NAICS description file (NAICSDESC) List of NAICS codes and descriptions SMOKE documentation
ORIS Description (ORISDESC) List of ORIS codes and descriptions SMOKE documentation
Point-Source Stack Replacements (PSTK) Replacement stack parameters SMOKE documentation
SCC Descriptions (Line-based) List of SCC codes and descriptions SMOKE documentation
SIC Descriptions (Line-based) List of SIC codes and descriptions SMOKE documentation
Surrogate Descriptions (SRGDESC) List of surrogate codes and descriptions SMOKE documentation

3.2.1.3 Common Emissions Modeling Cross-Reference and Factors Dataset Types

Table 3.4: Emissions Modeling Dataset Types
Dataset Type Name Description Link to File Format
Area-to-point Conversions (Line-based) Point locations to assign to stationary area and nonroad mobile sources SMOKE documentation
Chemical Speciation Combo Profiles (GSPRO_COMBO) Multiple speciation profile combination data SMOKE documentation
Chemical Speciation Cross-Reference (GSREF) Cross-reference data to match inventory sources to speciation profiles SMOKE documentation
Chemical Speciation Profiles (GSPRO) Factors to allocate inventory pollutant emissions to model species SMOKE documentation
Gridding Cross Reference (A/MGREF) Cross-reference data to match inventory sources to spatial surrogates SMOKE documentation
Pollutant to Pollutant Conversion (GSCNV) Conversion factors when inventory pollutant doesn’t match speciation profile pollutant SMOKE documentation
Spatial Surrogates (A/MGPRO) Factors to allocate emissions to grid cells SMOKE documentation
Spatial Surrogates (External Multifile) External dataset type to point to multiple surrogates files on disk Individual files have same format as Spatial Surrogates (A/MGPRO)
Temporal Cross Reference (A/M/PTREF) Cross-reference data to match inventory sources to temporal profiles SMOKE documentation
Temporal Profile (A/M/PTPRO) Factors to allocate inventory emissions to hourly estimates SMOKE documentation

3.2.1.4 Common Growth and Controls Dataset Types

Table 3.5: Growth and Controls Dataset Types
Dataset Type Name Description Link to File Format
Allowable Packet Allowable emissions cap or replacement values SMOKE documentation
Allowable Packet Extended Allowable emissions cap or replacement values; supports monthly values Download CSV
Control Packet Control efficiency, rule effectiveness, and rule penetration rate values SMOKE documentation
Control Packet Extended Control percent reduction values; supports monthly values Download CSV
Control Strategy Detailed Result Extended Output from CoST Download CSV
Control Strategy Least Cost Control Measure Worksheet Output from CoST Not available
Control Strategy Least Cost Curve Summary Output from CoST Not available
Facility Closure Extended Facility closure dates Download CSV
Projection Packet Factors to grow emissions values into the past or future SMOKE documentation
Projection Packet Extended Projection factors; supports monthly values Download CSV
Strategy County Summary Output from CoST Not available
Strategy Impact Summary Output from CoST Not available
Strategy Measure Summary Output from CoST Not available
Strategy Messages (CSV) Output from CoST Not available

3.3 The Dataset Manager

The main interface for finding and interacting with datasets is the Dataset Manager. To open the Dataset Manager, select the Manage menu at the top of the EMF main window, and then select the Datasets menu item. It may take a little while for the window to appear. As shown in Fig. 3.3, the Dataset Manager initially does not show any datasets. This is to avoid loading a potentially large list of datasets from the server.

Figure 3.3: Empty Dataset Manager Window

From the Dataset Manager you can:

To quickly find datasets of interest, you can use the Show Datasets of Type pull-down menu at the top of the Dataset Manager window. Select “ORL Point Inventory (PTINV)” and the datasets matching that Dataset Type are loaded into the Dataset Manager as shown in Fig. 3.4.

Figure 3.4: Dataset Manager Window with Datasets

The matching datasets are shown in a table that lists some of their properties, including the dataset’s name, last modified date, dataset type, status indicating how the dataset was created, and the username of the dataset’s creator. Tbl. 3.6 describes each column in the Dataset Manager window. In the Dataset Manager window, use the horizontal scroll bar to scroll the table to the right to see all the columns.

Table 3.6: Dataset Manager Columns
Column Description
Name A unique name or label for the dataset. You choose this name when importing data and it can be edited by users with appropriate privileges.
Last Modified Date The most recent date and time when the data (not the metadata) of the dataset was modified. When the dataset is initially imported, the Last Modified Date is set to the file’s timestamp.
Type The Dataset Type of this dataset. The Dataset Type incorporates information about the structure of the data and information regarding how the data can be sorted and summarized.
Status Shows whether the dataset was imported from disk or created in some other way such as an output from a control strategy.
Creator The username of the person who originally created the dataset.
Intended Use Specifies whether the dataset is intended to be public (accessible to any user), private (accessible only to the creator), or to be used by a specific group of users.
Project The name of a study or set of work for which this dataset was created. The project field can help you organize related files.
Region The name of a geographic region to which the dataset applies.
Start Date The start date and time for the data contained in the dataset.
End Date The end date and time for the data contained in the dataset.
Temporal Resolution The temporal resolution of the data contained in the dataset (e.g. annual, daily, or hourly).

Using the Dataset Manager, you can select datasets of interest by checking the checkboxes in the Select column and then perform various actions related to those datasets. Tbl. 3.7 lists the buttons along the bottom of the Dataset Manager window and describes the actions for each button.

Table 3.7: Dataset Manager Actions
Command Description
View Displays a read-only Dataset Properties View for each of the selected datasets. You can view a dataset even when someone else is editing that dataset’s properties or data.
Edit Properties Opens a writeable Dataset Properties Editor for each of the selected datasets. Only one user can edit a dataset at any given time.
Edit Data Opens a Dataset Versions Editor for each of the selected datasets.
Remove Marks each of the selected datasets for deletion. Datasets are not actually deleted until you click Purge.
Import Opens the Import Datasets window where you can import data files into the EMF as new datasets.
Export Opens the Export window to write the data for one version of the selected dataset to a file.
Purge Permanently removes any datasets that are marked for deletion from the EMF.
Close Closes the Dataset Manager window.

3.4 Finding Datasets

There are several ways to find datasets using the Dataset Manager. First, you can show all datasets with a particular dataset type by choosing the dataset type from the Show Datasets of Type menu. If there are more than a couple hundred datasets matching the type you select, the system will warn you and suggest you enter something in the Name Contains field to limit the list.

3.4.1 Dataset Name Matching

The Name Contains field allows you to enter a search term to match dataset names. For example, if you type 2020 in the textbox and then hit Enter, the Dataset Manager will show all the datasets with “2020” in their names. You can also use wildcards in your search term. The term pt*2020 will show all datasets whose name contains “pt” followed at some point by “2020” as shown in Fig. 3.5. The Name Contains search is not case sensitive.

Figure 3.5: Using the Name Contains Keyword

3.4.2 Advanced Search

If you want to search for datasets using attributes other than the dataset’s name or using multiple criteria, click the Advanced button. The Advanced Dataset Search dialog as shown in Fig. 3.6 will be displayed.

Figure 3.6: Using the Advanced Search on the Dataset Manager

You can use the Advanced Dataset Search to search for datasets based on the contents of the dataset’s description, the dataset’s creator, project, and more. Tbl. 3.8 lists the options for the advanced search.

Table 3.8: Advanced Dataset Search Options
Search option Description
Name contains Performs a case-insensitive search of the dataset name; supports wildcards
Description contains Performs a case-insensitive search of the dataset description; supports wildcards
Creator Matches datasets created by the specified user
Dataset type Matches datasets of the specified type
Keyword Matches datasets that have the specified keyword
Keyword value Matches datasets where the specified keyword has the specified value; must exactly match the dataset’s keyword value (case-insensitive)
QA name contains Performs a case-insensitive search of the names of the QA steps associated with datasets
Search QA arguments Searches the arguments to QA steps associated with datasets
Project Matches datasets assigned to the specified project
Used by Case Inputs Finds datasets by case (not described in this User’s Guide)
Data Value Filter Matches datasets using SQL like “FIPS='37001' and SCC like '102005%'”; must be used with the dataset type criterion

After setting your search criteria, click OK to perform the search and update the Dataset Manager window. The Advanced Dataset Search dialog will remain visible until you click Close. This allows you to refine your search or perform additional searches if needed. If you specify multiple search criteria, a dataset must satisfy all of the specified criteria to be shown in the Dataset Manager.
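
For example, to find point inventories that contain North Carolina sources, you might set Dataset type to “ORL Point Inventory (PTINV)” and enter a Data Value Filter such as:

    substring(FIPS,1,2) = '37'

This follows the same SQL-like syntax as the row filter examples in Tbl. 3.11; as noted in Tbl. 3.8, the Data Value Filter only works in combination with a dataset type criterion.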

3.4.3 Dataset Filtering

Another option for finding datasets is to use the filtering options of the Dataset Manager. (See Sec. 2.6.8 for a complete description of the Filter Rows dialog.) Filtering helps narrow down the list of datasets already shown in the Dataset Manager. Click the Filter Rows button in the toolbar to bring up the Filter Rows dialog. In the dialog, you can create a filter to show only datasets whose dataset type contains the word “Inventory” (see Fig. 3.7).

Figure 3.7: Create Filter by Dataset Type

Once you’ve entered the filter criteria, click OK to return to the Dataset Manager. The list of datasets has now been reduced to only those matching the filter as shown in Fig. 3.8.

Figure 3.8: Datasets Filtered by Dataset Type

Using filtering allows you to search for datasets using any column shown in the Dataset Manager. Remember that filtering only applies to the datasets already shown in the table; it doesn’t search the database for additional datasets like the Advanced Dataset Search feature does.

3.5 Viewing and Editing Dataset Properties

To view or edit the properties of a dataset, select the dataset in the Dataset Manager and then click either the View or Edit Properties button at the bottom of the window. The Dataset Properties View or Editor window will be displayed with the Summary tab selected as shown in Fig. 3.9. If multiple datasets are selected, separate Dataset Properties windows will be displayed for each selected dataset.

Figure 3.9: Dataset Properties Editor - Summary Tab

The interface for viewing dataset properties is very similar to the editing interface except that the values are all read-only. In this section, we will show the editing versions of the interface so that all available options are shown. In general, if you don’t need to edit a dataset, it’s better to just view the properties since viewing the dataset doesn’t lock it for editing by another user.

The Dataset Properties window divides its data into several tabs. Tbl. 3.9 gives a brief description of each tab.

Table 3.9: Dataset Properties Tabs
Tab Description
Summary Shows high-level properties of the dataset
Data Provides access to the actual data stored for the dataset
Keywords Shows additional types of metadata not found on the Summary tab
Notes Shows comments that users have made about the dataset and questions they may have
Revisions Shows the revisions that have been made to the dataset
History Shows how the dataset has been used in the past
Sources Shows where the data came from and where it is stored in the database, if applicable
QA Shows QA steps that have been run using the dataset

Several buttons at the bottom of the editor window appear on all tabs, including Save, Close, and Export (see Sec. 3.8).

3.5.1 Summary

The Summary tab of the Dataset Properties Editor (Fig. 3.9) displays high level summary information about the Dataset. Many of these properties are shown in the list of datasets displayed by the Dataset Manager and as a result are described in Tbl. 3.6. The additional properties available in the Summary tab are described in Tbl. 3.10.

Table 3.10: Summary Tab Dataset Properties (not included in Dataset Manager)
Column Description
Description Descriptive information about the dataset. The contents of this field are initially populated from the full-line comments found in the header and other sections of the file used to create the dataset when it is imported. Users are free to add to the contents of this field, which is written to the top of the resulting file when the data is exported from the EMF.
Sector The emissions sector to which this data applies.
Country The country to which the data applies.
Last Accessed Date The date/time the data was last exported.
Creation Date The date/time the dataset was created.
Default Version Indicates which version of the dataset is considered to be the default. The default version of a dataset is important in that it indicates to other users and to some quality assurance queries the appropriate version of the dataset to be used.

Values of text fields (boxes with a white background) are changed by typing into the fields. Other properties are set by selecting items from pull-down menus.

3.5.2 Data

The Data tab of the Dataset Properties Editor (Fig. 3.10) provides access to the actual data stored for the dataset. If the dataset has multiple versions, they will be listed in the Versions table.

Figure 3.10: Dataset Properties Editor - Data Tab

To view the data associated with a particular version, select the version and click the View button. For more information about viewing the raw data, see Sec. 3.6. The Copy button allows you to copy any version of the data marked as final to a new dataset.

3.5.3 Keywords

The Keywords tab of the Dataset Properties Editor (Fig. 3.11) shows additional types of metadata about the dataset stored as keyword-value pairs.

Figure 3.11: Dataset Properties Editor - Keywords Tab

The Keywords Specific to Dataset Type section shows keywords associated with the dataset’s type. These keywords are described in Sec. 3.2.

Additional dataset-specific keywords can be added by clicking the Add button. A new entry will be added to the Keywords Specific to Dataset section of the window. Type the keyword and its value in the Keyword and Value cells.

3.5.4 Notes

The Notes tab of the Dataset Properties Editor (Fig. 3.12) shows comments that users have made about the dataset and questions they may have. Each note is associated with a particular version of a dataset.

Figure 3.12: Dataset Properties Editor - Notes Tab

To create a new note about a dataset, click the Add button and the Create New Note dialog will open (Fig. 3.13). Notes can reference other notes so that questions can be answered. Click the Set button to display other notes for this dataset and select any referenced notes.

Figure 3.13: Create New Note

The Add Existing button in the Notes tab opens a dialog to add existing notes to the dataset (Fig. 3.14). This feature is useful if you need to add the same note to a set of datasets. Add a new note to the first dataset; then, for subsequent datasets, use the “Note name contains:” field to search for the newly added note. In the list of matched notes, select the note to add and click the OK button.

Figure 3.14: Add Existing Notes to Dataset

3.5.5 Revisions

The Revisions tab of the Dataset Properties Editor (Fig. 3.15) shows revisions that have been made to the data contained in the dataset. See Sec. 3.7 for more information about editing the raw data.

Figure 3.15: Dataset Properties Editor - Revisions Tab

3.5.6 History

The History tab of the Dataset Properties Editor (Fig. 3.16) shows the export history of the dataset. When the dataset is exported, a history record is automatically created containing the name of the user who exported the data, the version that was exported, the location on the server where the file was exported, and statistics about how many lines were exported and the export time.

Figure 3.16: Dataset Properties Editor - History Tab

3.5.7 Sources

The Sources tab of the Dataset Properties Editor (Fig. 3.17) shows where the data associated with the dataset came from and where it is stored in the database, if applicable. For datasets where the data is stored in the EMF database, the Table column shows the name of the table in the EMF database and Source lists the original file the data was imported from.

Figure 3.17: Dataset Properties Editor - Sources Tab

Fig. 3.18 shows the Sources tab for a dataset that references external files. In this case, there is no Table column since the data is not stored in the EMF database. The Source column lists the current location of the external file. If the location of the external file changes, you can click the Update button to browse for the file in its new location.

Figure 3.18: Sources for External Dataset

3.5.8 QA

The QA tab of the Dataset Properties Editor (Fig. 3.19) shows the QA steps that have been run using the dataset. See Sec. 4 for more information about setting up and running QA steps.

Figure 3.19: Dataset Properties Editor - QA Tab

3.6 Viewing Raw Data

The EMF allows you to view and edit the raw data stored for each dataset. To work with the data, select a dataset from the Dataset Manager and click the Edit Data button to open the Dataset Versions Editor (Fig. 3.20). This window shows the same list of versions as the Dataset Properties Data tab (Sec. 3.5.2).

Figure 3.20: Dataset Versions Editor

To view the data, select a version and click the View Data button. The raw data is displayed in the Data Viewer as shown in Fig. 3.21.

Figure 3.21: Data Viewer

Since the data stored in the EMF may have millions of rows, the client application only transfers a small amount of data (300 rows) from the server to your local machine at a time. The area in the top right corner of the Data Viewer displays information about the currently loaded rows along with controls for paging through the data. The single left and right arrows move through the data one chunk at a time while the double arrows jump to the beginning and end of the data. If you hover your mouse over an arrow, a tooltip will pop up to remind you of its function. The slider allows you to quickly jump to different parts of the data.

You can control how the data are sorted by entering a comma-separated list of columns in the Sort Order field and then clicking the Apply button. A descending sort can be specified by following the column name with desc.
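
For example, assuming the dataset has columns named fips and ann_emis (actual column names depend on the dataset type), the following Sort Order displays rows ordered by FIPS code and, within each code, from largest to smallest annual emissions:

    fips, ann_emis desc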

The Row Filter field allows you to enter criteria and filter the rows that are displayed. The syntax is similar to a SQL WHERE clause. Tbl. 3.11 shows some example filters and the syntax for each.

Table 3.11: Examples of Row Filter Syntax
Filter Purpose Row Filter Syntax
Filter on a particular set of SCCs scc like '101%' or scc like '102%'
Filter on a particular set of pollutants poll in ('PM10', 'PM2_5')
Filter sources only in NC (State FIPS = 37), SC (45), and VA (51); note that the FIPS column format is state + county FIPS code (e.g., 37001) substring(FIPS,1,2) in ('37', '45', '51')
Filter sources only in CA (06) and include only NOx and VOC pollutants fips like '06%' and (poll = 'NOX' or poll = 'VOC')

Fig. 3.22 shows the data sorted by the column “ratio” in descending order and filtered to only show rows where the FIPS code is “13013”.

Figure 3.22: Data Viewer with Custom Sort and Row Filter

The Row Filter syntax used in the Data Viewer can also be used when exporting datasets to create filtered export files (Sec. 3.8.1). If you would like to create a new dataset based on a filtered existing dataset, you can export your filtered dataset and then import the resulting file as a new dataset. Sec. 3.8 describes exporting datasets and Sec. 3.9 explains how to import datasets.

3.7 Editing Raw Data

The EMF does not allow data to be edited after a version has been marked as final. If a dataset doesn’t have a non-final version, first you will need to create a new version. Open the Dataset Versions Editor as shown in Fig. 3.20. Click the New Version button to bring up the Create a New Version dialog window like Fig. 3.23.

Figure 3.23: Create New Dataset Version

Enter a name for the new version and select the base version. The base version is the starting point for the new version and can only be a version that is marked as final. Click OK to create the new version. The Dataset Versions Editor will show your newly created version (Fig. 3.24).

Figure 3.24: Dataset Versions Editor with Non-Final Version

You can now select the non-final version and click the Edit Data button to display the Data Editor as shown in Fig. 3.25.

Figure 3.25: Data Editor

The Data Editor uses the same paging mechanisms, sort, and filter options as the Data Viewer described in Sec. 3.6. You can double-click a data cell to edit the value. The toolbar shown in Fig. 3.26 provides options for adding and deleting rows.

Figure 3.26: Data Editor Toolbar

The functions of each toolbar button are described below, listed left to right:

  1. Insert Above: Inserts a new row above the currently selected row.
  2. Insert Below: Inserts a new row below the currently selected row.
  3. Delete: Deletes the selected rows. When you click this button, you will be prompted to confirm the deletion.
  4. Copy Selected Rows: Copies the selected rows.
  5. Insert Copied Rows Below: Pastes the copied rows below the currently selected row.
  6. Select All: Selects all rows.
  7. Clear All: Clears all selections.
  8. Find and Replace Column Values: Opens the Find and Replace Column Values dialog shown in Fig. 3.27.
Figure 3.27: Find and Replace Column Values Dialog
Figure 3.27: Find and Replace Column Values Dialog

In the Data Editor window, you can undo your changes by clicking the Discard button. Otherwise, click the Save button to save your changes. If you have made changes, you will need to enter Revision Information before the EMF will allow you to close the window. Revisions for a dataset are shown in the Dataset Properties Revisions tab (see Sec. 3.5.5).

3.8 Exporting Datasets

When you export a dataset, the EMF will generate a file containing the data in the format defined by the dataset’s type. To export a dataset, you can either select the dataset in the Dataset Manager window and click the Export button or you can click the Export button in the Dataset Properties window. Either way will open the Export dialog as shown in Fig. 3.28. If you have multiple datasets selected in the Dataset Manager when you click the Export button, the Export dialog will list each dataset in the Datasets field.

Figure 3.28: Export Dialog
Figure 3.28: Export Dialog

Typically, you will check the Download files to local machine? checkbox. With this option, the EMF exports the dataset to a file on the EMF server and then automatically downloads it to your local machine. When downloading files to your local machine, the Folder input field is not active. The downloaded files are placed in a temporary directory on your local computer; the EMF property local.temp.dir controls the location of this directory, and EMF properties can be edited in the EMFPrefs.txt file. Note that the Overwrite files if they exist? checkbox isn’t functional at this point.
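
If you need downloads to go somewhere else, you can change that property. A minimal sketch of the relevant line in EMFPrefs.txt, assuming a hypothetical Windows folder, would be:

local.temp.dir=C:\EMF\temp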

You can enter a prefix to be added to the names of the exported files in the File Name Prefix field. Exported files will be named based on the dataset name and may have prefixes or suffixes attached based on keywords associated with the dataset or dataset type.

If you are exporting a single dataset and that dataset has multiple versions, the Version pull-down menu will allow you to select which version you would like to export. If you are exporting multiple datasets, the default version of each dataset will be exported.

The Row Filter, Filter Dataset, and Filter Dataset Join Condition fields allow for filtering the dataset during export to reduce the total number of rows exported. See Sec. 3.8.1 for more information about these settings.

Before clicking the Export button, enter a Purpose for your export. This will be logged as part of the history for the dataset. If you do not enter any text in the Purpose field, the fact that you exported the dataset will still be logged as part of the dataset’s history. At this time, history records are only created when the Download files to local machine? checkbox is not checked.

After clicking the Export button, check the Status window to see if any problems arise during the export. If the export succeeds, you will see a status message like

Completed export of nonroad_caps_2005v2_jul_orl_nc.txt to <server directory>/nonroad_caps_2005v2_jul_orl_nc.txt in 2.137 seconds. The file will start downloading momentarily, see the Download Manager for the download status.

You can bring up the Downloads window as shown in Fig. 3.29 by opening the Window menu at the top of the EMF main window and selecting Downloads.

Figure 3.29: Downloads Window
Figure 3.29: Downloads Window

As your file is downloading, the progress bar on the right side of the window will update to show you the progress of the download. Once it reaches 100%, your download is complete. Right click on the filename in the Downloads window and select Open Containing Folder to open the folder where the file was downloaded.

3.8.1 Export Filtering Options

The export filtering options allow you to select and export portions of a dataset based on your matching criteria.

The Row Filter field shown in the Export Dialog in Fig. 3.28 uses the same syntax as the Data Viewer window (Sec. 3.6) and allows you to export only a subset of the data. Example filters are shown in Tbl. 3.11.

Filter Dataset and Filter Dataset Join Condition, also shown in Fig. 3.28, allow for advanced filtering of the dataset using an additional dataset. For example, if you are exporting a nonroad inventory, you can choose to only export rows that match a different inventory by FIPS code or SCC. When you click the Add button, the Select Datasets dialog appears as in Fig. 3.30.

Figure 3.30: Select Filter Datasets
Figure 3.30: Select Filter Datasets

Select the dataset type for the dataset you want to use as a filter from the pull-down menu. You can use the Dataset name contains field to further narrow down the list of matching datasets. Click on the dataset name to select it and then click OK to return to the Export dialog.

The selected dataset is now shown in the Filter Dataset box. If the filter dataset has multiple versions, click the Set Version button to select which version to use for filtering. You can remove the filter dataset by clicking the Remove button.

Next, you will enter the criteria to use for filtering in the Filter Dataset Join Condition textbox. The syntax is similar to a SQL JOIN condition where the left hand side corresponds to the dataset being exported and the right hand side corresponds to the filter dataset. You will need to know the column names you want to use for each dataset.

Table 3.12: Examples of Filter Dataset Join Conditions
Type of Filter Filter Dataset Join Condition
Export records where the FIPS, SCC, and plant IDs are the same in both datasets;
both datasets have the same column names
fips=fips
scc=scc
plantid=plantid
Export records where the SCC, state codes, and pollutants are the same in both datasets;
the column names differ between the datasets
scc=scc_code
substring(fips,1,2)=state_cd
poll=poll_code
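
As an illustration of how these settings work together, suppose you want to export only the NOx records of an inventory that also appear in a filter dataset for the same county and SCC. Assuming both datasets use the column names fips and scc, a sketch of the settings would be:

Row Filter: poll = 'NOX'
Filter Dataset Join Condition:
fips=fips
scc=scc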

Once your filter conditions are set up, click the Export button to begin the export. Only records that match all of the filter conditions will be exported. Status messages in the Status window will contain additional information about your filter. If no records match your filter condition, the export will fail and you will see a status message like:

Export failure. ERROR: nonroad_caps_2005v2_jul_orl_nc.txt will not be exported because no records satisfied the filter

If the export succeeds, the status message will include a count of the number of records in the database and the number of records exported:

No. of records in database: 150845; Exported: 26011

3.9 Importing Datasets

Importing a dataset is the process where the EMF reads a data file or set of data files from disk, stores the data in the database (for internal dataset types), and creates metadata about the dataset. To import a dataset, start by clicking the Import button in the bottom right corner of the Dataset Manager window (Fig. 3.4). The Import Datasets dialog will be displayed as shown in Fig. 3.31. You can also bring up the Import Datasets dialog by opening the main EMF File menu and selecting Import.

Figure 3.31: Import Datasets Dialog
Figure 3.31: Import Datasets Dialog

An advantage to opening the Import Datasets dialog from the Dataset Manager as opposed to using the File menu is that if you have a dataset type selected in the Dataset Manager Show Datasets of Type pull-down menu, then that dataset type will automatically be selected for you in the Import Datasets dialog.

In the Import Datasets dialog, first use the Dataset Type pull-down menu to select the dataset type corresponding to the file you want to import. For example, if your data file is an annual point-source emissions inventory in Flat File 2010 (FF10) format, you would select the dataset type “Flat File 2010 Point”. Sec. 3.2.1 lists commonly used dataset types. Keep in mind that your EMF installation may have different dataset types available.

Most dataset types specify that datasets of that type use data from a single file. For example, for the Flat File 2010 Point dataset type, you will need to select exactly one file to import per dataset. Other dataset types require, or optionally allow, multiple files for a single dataset. Some dataset types can use a large number of files, like the Day-Specific Point Inventory (External Multifile) dataset type, which allows up to 366 files for a single dataset. The Import Datasets dialog therefore allows you to select multiple files during the import process and has tools for easily matching multiple files.

Next, select the folder where the data files to import are located on the EMF server. You can either type or paste (using Ctrl-V) the folder name into the field labeled Folder, or you can click the Browse button to open the remote file browser as shown in Fig. 3.32. Important! To import data files, the files must be accessible by the machine that the EMF server is running on. If the data files are on your local machine, you will need to transfer them to the EMF server before you can import them.

Figure 3.32: Remote File Browser
Figure 3.32: Remote File Browser

To use the remote file browser, you can navigate from your starting folder to the file by either typing or pasting a directory name into the Folder field or by using the Subfolders list on the left side of the window. In the Subfolders list, double-click on a folder’s name to go into that folder. If you need to go up a level, double-click the .. entry.

Once you reach the folder that contains your data files, select the files to import by clicking the checkbox next to each file’s name in the Files section of the browser. The Files section uses the Sort-Filter-Select Table described in Sec. 2.6.6 to list the files. If you have a large number of files in the directory, you can use the sorting and filtering options of the Sort-Filter-Select Table to help find the files you need.

You can also use the Pattern field in the remote file browser to only show files matching the entered pattern. By default the pattern is just the wildcard character * to match all files. Entering a pattern like arinv*2002*txt will match filenames that start with “arinv”, have “2002” somewhere in the filename, and then end with “txt”.

Once you’ve selected the files to import, click OK to save your selections and return to the Import Datasets dialog. The files you selected will be listed in the Filenames textbox in the Import Datasets dialog as shown in Fig. 3.33. If you selected a single file, the Dataset Names field will contain the filename of the selected file as the default dataset name.

Figure 3.33: Import Dataset from Single File
Figure 3.33: Import Dataset from Single File

Update the Dataset Names field with your desired name for the dataset. If the dataset type has EXPORT_PREFIX or EXPORT_SUFFIX keywords assigned, these values will be automatically stripped from the dataset name. For example, the ORL Nonpoint Inventory (ARINV) dataset type defines EXPORT_PREFIX as “arinv_” and EXPORT_SUFFIX as “_orl.txt”. Suppose you select an ORL nonpoint inventory file named “arinv_nonpt_pf4_cap_nopfc_2017ct_ref_orl.txt” to import. By default, the Dataset Names field in the Import Datasets dialog will be populated with “arinv_nonpt_pf4_cap_nopfc_2017ct_ref_orl.txt” (the filename). On import, the EMF will automatically convert the dataset name to “nonpt_pf4_cap_nopfc_2017ct_ref”, removing the EXPORT_PREFIX and EXPORT_SUFFIX.

Click the Import button to start the dataset import. If there are any problems with your import settings, you’ll see a red error message displayed at the top of the Import Datasets window. Tbl. 3.13 shows some example error messages and suggested solutions.

Table 3.13: Dataset Import Error Messages
Example Error Message Solution
A Dataset Type should be selected Select a dataset type from the Dataset Type pull-down menu.
A Filename should be specified Select a file to import.
A Dataset Name should be specified Enter a dataset name in the Dataset Names textbox.
The ORL Nonpoint Inventory (ARINV) importer can use at most 1 files You selected too many files to import for the dataset type. Select the correct number of files for the dataset type. If you want to import multiple files of the same dataset type, see Sec. 3.9.1.
The NIF3.0 Nonpoint Inventory importer requires at least 2 files You didn’t select enough files to import for the dataset type. Select the correct number of files for the dataset type.
Dataset name nonpt_pf4_cap_nopfc_2017ct_ref has been used. Each dataset in the EMF needs a unique dataset name. Update the dataset name to be unique. Remember that the EMF will automatically remove the EXPORT_PREFIX and EXPORT_SUFFIX if defined for the dataset type.

If your import settings are good, you will see the message “Started import. Please monitor the Status window to track your import request.” displayed at the top of the Import Datasets window as shown in Fig. 3.34.

Figure 3.34: Import Datasets: Started Import
Figure 3.34: Import Datasets: Started Import

In the Status window, you will see a status message like:

Started import of nonpt_pf4_cap_nopfc_2017ct_nc_sc_va_18jan2012_v0 [ORL Nonpoint Inventory (ARINV)] from arinv_nonpt_pf4_cap_nopfc_2017ct_nc_sc_va_18jan2012_v0.txt

Depending on the size of your file, the import can take a while to complete. Once the import is complete, you will see a status message like:

Completed import of nonpt_pf4_cap_nopfc_2017ct_nc_sc_va_18jan2012_v0 [ORL Nonpoint Inventory (ARINV)] in 57.6 seconds from arinv_nonpt_pf4_cap_nopfc_2017ct_nc_sc_va_18jan2012_v0.txt

To see your newly imported dataset, open the Dataset Manager window and find your dataset by dataset type or using the Advanced search. You may need to click the Refresh button in the upper right corner of the Dataset Manager window to get the latest dataset information from the EMF server.

3.9.1 Importing Multiple Datasets

You can use the Import Datasets window to import multiple datasets of the same type at once. In the remote file browser (shown in Fig. 3.32), select all the files you would like to import and click OK. In the Import Datasets window, check the Create Multiple Datasets checkbox as shown in Fig. 3.35; the Dataset Names textbox will disappear.

Figure 3.35: Import Multiple Datasets
Figure 3.35: Import Multiple Datasets

For each dataset, the EMF will automatically name the dataset using the corresponding filename. If the keywords EXPORT_PREFIX or EXPORT_SUFFIX are defined for the dataset type, the keyword values will be stripped from the filenames when generating the dataset names. If these keywords are not defined for the dataset type, then the dataset name will be identical to the filename.
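
For example, using the ORL Nonpoint Inventory (ARINV) keyword values described in Sec. 3.9 (EXPORT_PREFIX “arinv_” and EXPORT_SUFFIX “_orl.txt”), importing two hypothetical files named arinv_nonpt_2017a_orl.txt and arinv_nonpt_2017b_orl.txt would create datasets named “nonpt_2017a” and “nonpt_2017b”.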

Click the Import button to start importing the datasets. The Status window will display Started and Completed status messages for each dataset as it is imported.

3.10 Suggestions for Dataset Organization

Figure 3.36: Edit User Profile - Hide Dataset Types
Figure 3.36: Edit User Profile - Hide Dataset Types

4 Dataset Quality Assurance

4.1 Introduction

The EMF allows you to perform various types of analyses on a dataset or set of datasets. For example, you can summarize the data by different aspects such as geographic region like county or state, SCC code, pollutant, or plant ID. You can also compare or sum multiple datasets. Within the EMF, running an analysis like this is called a QA step.

A dataset can have many QA steps associated with it. To view a dataset’s QA steps, first select the dataset in the Dataset Manager and click the Edit Properties button. Switch to the QA tab to see the list of QA steps as in Fig. 4.1.

Figure 4.1: QA Steps for a Dataset
Figure 4.1: QA Steps for a Dataset

At the bottom of the window you will see a row of buttons for interacting with the QA steps starting with Add from Template, Add Custom, Edit, etc. If you do not see these buttons, make sure that you are editing the dataset’s properties and not just viewing them.

4.2 Add QA Step From Template

Each dataset type can have predefined QA steps called QA Step Templates. QA step templates can be added to a dataset type and configured by EMF Administrators using the Dataset Type Manager (see Sec. 3.2). QA step templates are easy to run for a dataset because they’ve already been configured.

To see a list of available QA step templates for your dataset, open your dataset’s QA tab in the Dataset Properties Editor (Fig. 4.1). Click the Add from Template button to open the Add QA Steps dialog. Fig. 4.2 shows the available QA step templates for an ORL Nonroad Inventory.

Figure 4.2: Add QA Steps From Template
Figure 4.2: Add QA Steps From Template

The ORL Nonroad Inventory has various QA step templates for generating different summaries of the inventory.

Summaries “with Descriptions” include more information than those without. For example, the results of the “Summarize by SCC and Pollutant with Descriptions” QA step will include the descriptions of the SCCs and pollutants. Because these summaries with descriptions need to retrieve data from additional tables, they are a bit slower to generate compared to summaries without descriptions.

Select a summary of interest (for example, Summarize by County and Pollutant) by clicking the QA step name. If your dataset has more than one version, you can choose which version to summarize using the Version pull-down menu at the top of the window. Click OK to add the QA step to the dataset.

The newly added QA step is now shown in the list of QA steps for the dataset (Fig. 4.3).

Figure 4.3: QA Steps with New Step Added
Figure 4.3: QA Steps with New Step Added

To see the details of the QA step, select the step and click the Edit button. This brings up the Edit QA Step window like Fig. 4.4.

Figure 4.4: Edit New QA Step from Template
Figure 4.4: Edit New QA Step from Template

The QA step name is shown at the top of the window. This name was automatically set by the QA step template. You can edit this name if needed to distinguish this step from other QA steps.

The Version pull-down menu shows which version of the data this QA step will run on.

The pull-down menu to the right of the Version setting indicates what type of program will be used for this QA step. In this case, the program type is “SQL” indicating that the results of this QA step will be generated using a SQL query. Most of the summary QA steps are generated using SQL queries. The EMF allows other types of programs to be run as QA steps including Python scripts and various built-in analyses like converting average-day emissions to an annual inventory.

The Arguments textbox shows the arguments used by the QA step program. In this case, the QA step is a SQL query and the Arguments field shows the query that will be run. The special SQL syntax used for QA steps is discussed in Sec. 4.10.
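
For instance, for the “Summarize by County and Pollutant” step shown here, the Arguments field contains a SQL query along the lines of the following (the $TABLE[1] token, which stands in for the dataset’s database table, is explained in Sec. 4.10):

select FIPS, POLL, sum(ann_emis) as ann_emis from $TABLE[1] e group by FIPS, POLL order by FIPS, POLL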

Other items of interest in the Edit QA Step window include the description and comment textboxes where you can enter a description of your QA step and any comments you have about running the step.

The QA Status field shows the overall status of the QA step. Right now the step is listed as “Not Started” because it hasn’t been run yet. Once the step has been run, the status will automatically change to “In Progress”. After you’ve reviewed the results, you can mark the step as “Complete” for future reference.

The Edit QA Step window also includes options for exporting the results of a QA step to a file. This is described in Sec. 4.5.

At this point, the next step is to actually run the QA step as described in Sec. 4.4.

4.3 Adding Custom QA Steps

In addition to using QA steps from templates, you can define your own custom QA steps. From the QA tab of the Dataset Properties Editor (Fig. 4.1), click the Add Custom button to bring up the Add Custom QA Step dialog as shown in Fig. 4.5.

Figure 4.5: Add Custom QA Step Dialog
Figure 4.5: Add Custom QA Step Dialog

In this dialog, you can configure your custom QA step by entering its name, the program to use, and the program’s arguments.

Creating a custom QA step from scratch is an advanced feature. Oftentimes, you can start by copying an existing step and tweaking it through the Edit QA Step interface.

Sec. 4.7 shows how to create a custom QA step that uses the built-in QA program “Average day to Annual Inventory” to calculate annual emissions from average-day emissions. Sec. 4.8 demonstrates using the Compare Datasets QA program to compare two inventories. Sec. 4.9 gives an example of creating a custom QA step based on a SQL query from an existing QA step.

4.4 Running QA Steps

To run a QA step, open the QA tab of the Dataset Properties Editor and select the QA step you want to run as shown in Fig. 4.6.

Figure 4.6: Select a QA Step to Run
Figure 4.6: Select a QA Step to Run

Click the Run button at the bottom of the window to run the QA step. You can also run a QA step from the Edit QA Step window. The Status window will display messages when the QA step begins running and when it completes:

Started running QA step ‘Summarize by County and Pollutant’ for Version ‘Initial Version’ of Dataset ‘nonroad_caps_2005v2_jul_orl_nc.txt’

Completed running QA step ‘Summarize by County and Pollutant’ for Version ‘Initial Version’ of Dataset ‘nonroad_caps_2005v2_jul_orl_nc.txt’

In the QA tab, click the Refresh button to update the table of QA steps as shown in Fig. 4.7.

Figure 4.7: Refreshed QA Steps
Figure 4.7: Refreshed QA Steps

The overall QA step status (the QA Status column) has changed from “Not Started” to “In Progress” and the Run Status is now “Success”. The list of QA steps also shows the time the QA step was run in the When column.

To view the results of the QA step, select the step in the QA tab and click the View Results button. A dialog like Fig. 4.8 will pop up asking how many records of the results you would like to preview.

Figure 4.8: View QA Results: Select Number of Records
Figure 4.8: View QA Results: Select Number of Records

Enter the number of records to view or click the View All button to see all records. The View QA Step Results window will display the results of the QA step as shown in Fig. 4.9.

Figure 4.9: View QA Results
Figure 4.9: View QA Results

4.5 Exporting QA Step Results

In addition to viewing the results of a QA step in the EMF client application, you can export the results as a comma-separated values (CSV) file. CSV files can be directly opened by Microsoft Excel or other spreadsheet programs to make charts or for further analysis.

To export the results of a QA step, select the QA step of interest in the QA tab of the Dataset Properties Editor. Then click the Edit button to bring up the Edit QA Step window as shown in Fig. 4.10.

Figure 4.10: Export QA Step Results
Figure 4.10: Export QA Step Results

Typically, you will want to check the Download result file to local machine? checkbox so the exported file will automatically be downloaded to your local machine. You can type in a name for the exported file in the Export Name field. Then click the Export button. If you did not enter an Export Name, the application will confirm that you want to use an auto-generated name with the dialog shown in Fig. 4.11.

Figure 4.11: Export Name Not Specified
Figure 4.11: Export Name Not Specified

Next, you’ll see the Export QA Step Results customization window (Fig. 4.12).

Figure 4.12: Export QA Step Results Customization Window
Figure 4.12: Export QA Step Results Customization Window

The Row Filter textbox allows you to limit which rows of the QA step results to include in the exported file. Tbl. 3.11 provides some examples of the syntax used by the row filter. Available Columns lists the column names from the results that could be used in a row filter. In Fig. 4.12, the columns fips, poll, and ann_emis are available. To export only the results for counties in North Carolina (state FIPS code = 37), the row filter would be fips like '37%'.

Click the Finish button to start the export. At the top of the Edit QA Step window, you’ll see the message “Started Export. Please monitor the Status window to track your export request.” as in Fig. 4.13.

Figure 4.13: Export QA Step Results Started
Figure 4.13: Export QA Step Results Started

Once your export is complete, you will see a message in the Status window like

Completed exporting QA step ‘Summarize by SCC and Pollutant’ for Version ‘Initial Version’ of Dataset ‘nonpt_pf4_cap_nopfc_2017ct_nc_sc_va’ to <server directory>/avg_day_scc_poll_summary.csv. The file will start downloading momentarily, see the Download Manager for the download status.

You can bring up the Downloads window as shown in Fig. 4.14 by opening the Window menu at the top of the EMF main window and selecting Downloads.

Figure 4.14: Downloads Window: QA Step Results
Figure 4.14: Downloads Window: QA Step Results

As your file is downloading, the progress bar on the right side of the window will update to show you the progress of the download. Once it reaches 100%, your download is complete. Right click on the filename in the Downloads window and select Open Containing Folder to open the folder where the file was downloaded.

If you have Microsoft Excel or another spreadsheet program installed, you can double-click the downloaded CSV file to open it.

4.6 Exporting KMZ Files

QA step results that include latitude and longitude information can be mapped with geographic information systems (GIS), mapping tools, and Google Earth. Many summaries that have “with Descriptions” in their names include latitude and longitude values. For plant-level summaries, the latitude and longitude in the output are the average of all the values for the specific combination of FIPS and plant ID. For county- and state-level summaries, the latitude and longitude are the centroid values specified in the “fips” table of the EMF reference schema.

To export a KMZ file that can be loaded into Google Earth, you will first need to view the results of the QA step. You can view a QA step’s results by either selecting the QA step in the QA tab of the Dataset Properties Editor (see Fig. 4.1) and then clicking the View Results button, or you can click View Results from the Edit QA Step window. Fig. 4.15 shows the View QA Step Results window for a summary by county and pollutant with descriptions. The summary includes latitude and longitude values for each county.

Figure 4.15: View QA Step Results with Latitude and Longitude Values
Figure 4.15: View QA Step Results with Latitude and Longitude Values

From the File menu in the top left corner of the View QA Step Results window, select Google Earth. Make sure to look at the File menu for the View QA Step Results window, not the main EMF application. The Create Google Earth file window will be displayed as shown in Fig. 4.16.

Figure 4.16: Create Google Earth File
Figure 4.16: Create Google Earth File

In the Create Google Earth file window, the Label Column pull-down menu allows you to select which column will be used to label the points in the KMZ file. This label will appear when you mouse over a point in Google Earth. For a plant summary, this would typically be “plant_name”; county or state summaries would use “county” or “state_name” respectively.

If your summary has data for multiple pollutants, you will often want to specify a filter so that data for only one pollutant is included in the KMZ file. To do this, specify a Filter Column (e.g. “poll”) and then type in a Filter Value (e.g. "EVP__VOC").

The Data Column pull-down menu specifies the column to use for the value displayed when you mouse over a point in Google Earth such as annual emissions (“ann_emis”). The mouse over information will have the form: <value from Label Column> : <value from Data Column>.
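
For example, in a plant-level summary with Label Column “plant_name” and Data Column “ann_emis”, mousing over a point might show something like “Example Power Plant : 1234.5” (hypothetical values).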

The Maximum Data Cutoff and Minimum Data Cutoff fields allow you to exclude data points above or below certain thresholds.

If you want to control the size of the points, you can adjust the value of the Icon Scale setting between 0 and 1. The default setting is 0.3; values smaller than 0.3 result in smaller circles and values larger than 0.3 will result in larger circles.

Tooltips are available for all of the settings in the Create Google Earth file window by mousing over each field.

Once you have specified your settings, click the Generate button to create the KMZ file. The location of the generated file is shown in the Output File field. If your computer has Google Earth installed, you can click the Open button to open the file in Google Earth.

If you find that you need to repeatedly create similar KMZ files, you can save your settings to a file by clicking the Save button. The next time you need to generate a Google Earth file, click the Load button next to the Properties File field to load your saved settings.

4.7 Average Day to Annual Inventory QA Program

In addition to analyzing individual datasets, the EMF can run QA steps that use multiple datasets. In this section, we’ll show how to create a custom QA step that calculates an annual inventory from 12 month-specific average-day emissions inventories.

To get started, we’ll need to select a dataset to associate the QA step with. As a best practice, add the QA step to the January-specific dataset in the set of 12 month-specific files. This isn’t required by the EMF but it can make finding multi-file QA steps easier later on. If you have more than 12 month-specific files to use (e.g. 12 non-California inventories and 12 California inventories), add the QA step to the “main” January inventory file (e.g. the non-California dataset).

After determining which dataset to add the QA step to, create a new custom QA step as described in Sec. 4.3. Fig. 4.17 shows the Add Custom QA Step dialog. We’ve entered a name for the step and used the Program pull-down menu to select “Average day to Annual Inventory”.

Figure 4.17: Add Custom QA Step Using Average Day to Annual Inventory QA Program
Figure 4.17: Add Custom QA Step Using Average Day to Annual Inventory QA Program

“Average day to Annual Inventory” is a QA program built into the EMF that takes a set of average-day emissions inventories as input and outputs an annual inventory by calculating monthly total emissions and summing all months. Click the OK button in the Add Custom QA Step dialog to save the new QA step. We’ll enter the QA program arguments in a minute. Back in the QA tab of the Dataset Properties Editor, select the newly created QA step and click Edit to open the Edit QA Step window shown in Fig. 4.18.

Figure 4.18: Edit Custom QA Step
Figure 4.18: Edit Custom QA Step

We need to define the arguments that will be sent to the QA program that this QA step will run. The QA program is “Average day to Annual Inventory” so the arguments will be a list of month-specific inventories. Click the Set button to the right of the Arguments box to open the Set Inventories dialog as shown in Fig. 4.19.

Figure 4.19: Set Inventories for Average Day to Annual Inventory QA Program
Figure 4.19: Set Inventories for Average Day to Annual Inventory QA Program

The Set Inventories dialog is specific to the “Average day to Annual Inventory” QA program. Other QA programs have different dialogs for setting up their arguments. The January inventory that we added the QA step to is already listed. We need to add the other 11 month-specific inventory files. Click the Add button to open the Select Datasets dialog shown in Fig. 4.20.

Figure 4.20: Select Datasets for QA Program
Figure 4.20: Select Datasets for QA Program

In the Select Datasets dialog, the dataset type is automatically set to ORL Nonroad Inventory (ARINV) matching our January inventory. The other ORL nonroad inventory datasets are shown in a list. We can use the Dataset name contains: field to enter a search term to narrow the list. We’re using 2005 inventories so we’ll enter 2005 as our search term to match only those datasets whose name contains “2005”. Then we’ll select all the inventories in the list as shown in Fig. 4.21.

Select inventories by clicking on the dataset name. To select a range of datasets, click on the first dataset in the range, then hold down the Shift key while clicking on the last one; all of the datasets in between will also be selected. If you hold down the Ctrl key while clicking on datasets, you can select multiple items from the list that aren’t next to each other.

Figure 4.21: Select Filtered Datasets for QA Program
Figure 4.21: Select Filtered Datasets for QA Program

Click the OK button in the Select Datasets dialog to save the selected inventories and return to the Set Inventories dialog. As shown in Fig. 4.22, the list of emission inventories now contains all 12 month-specific datasets.

Figure 4.22: Inventories for Average Day to Annual Inventory QA Program
Figure 4.22: Inventories for Average Day to Annual Inventory QA Program

Click the OK button in the Set Inventories dialog to return to the Edit QA Step window shown in Fig. 4.23. The Arguments textbox now lists the 12 month-specific inventories and the flag (-inventories) needed for the “Average day to Annual Inventory” QA program.

Figure 4.23: Custom QA Step with Arguments Set
Figure 4.23: Custom QA Step with Arguments Set

Click the Save button at the bottom of the Edit QA Step window to save the QA step. This QA step can now be run as described in Sec. 4.4.

4.8 Compare Datasets QA Program

The Compare Datasets QA program allows you to aggregate and compare datasets using a variety of grouping options. You can compare datasets with the same dataset type or different types. In this section, we’ll set up a QA step to compare the average day emissions from two ORL nonroad inventories by SCC and pollutant.

First, we’ll select a dataset to associate the QA step with. In this example, we’ll be comparing January and February emissions using the January dataset as the base inventory. The EMF doesn’t dictate which dataset should have the QA step associated with it so we’ll choose the base dataset as a convention. From the Dataset Manager, select the January inventory (shown in Fig. 4.24) and click the Edit Properties button.

Figure 4.24: Select Dataset to Add QA Step
Figure 4.24: Select Dataset to Add QA Step

Open the QA tab (shown in Fig. 4.25) and click Add Custom to add a new QA step.

Figure 4.25: Dataset Editor QA Tab for Selected Dataset
Figure 4.25: Dataset Editor QA Tab for Selected Dataset

In the Add Custom QA Step dialog shown in Fig. 4.26, enter a name for the new QA step like “Compare to February”. Use the Program pull-down menu to select the QA program “Compare Datasets”.

Figure 4.26: Select QA Program for New QA Step
Figure 4.26: Select QA Program for New QA Step

You can enter a description of the QA step as shown in Fig. 4.27. Then click OK to save the QA step. We’ll be setting up the arguments to the Compare Datasets QA program in just a minute.

Figure 4.27: Add Description to New QA Step
Figure 4.27: Add Description to New QA Step

Back in the QA tab of the Dataset Properties Editor, select the newly created QA step and click the Edit button (see Fig. 4.28).

Figure 4.28: Select New QA Step from QA Tab
Figure 4.28: Select New QA Step from QA Tab

In the Edit QA Step window (shown in Fig. 4.29), click the Set button to the right of the Arguments textbox.

Figure 4.29: Edit New QA Step
Figure 4.29: Edit New QA Step

A custom dialog is displayed (Fig. 4.30) to help you set up the arguments needed by the Compare Datasets QA program.

Figure 4.30: Set Up Compare Datasets QA Step
Figure 4.30: Set Up Compare Datasets QA Step

To get started, we’ll set the base datasets. Click the Add button underneath the Base Datasets area to bring up the Select Datasets dialog shown in Fig. 4.31.

Figure 4.31: Select Base Datasets
Figure 4.31: Select Base Datasets

Select one or more datasets to use as the base datasets in the comparison. For this example, we’ll select the January inventory by clicking on the dataset name. Then click OK to close the dialog and return to the setup dialog. The setup dialog now shows the selected base dataset as in Fig. 4.32.

Figure 4.32: Base Dataset Set for Compare Datasets
Figure 4.32: Base Dataset Set for Compare Datasets

Next, we’ll add the dataset we want to compare against by clicking the Add button underneath the Compare Datasets area. The Select Datasets dialog is displayed like in Fig. 4.33. We’ll select the February inventory and click the OK button.

Figure 4.33: Select Compare Datasets
Figure 4.33: Select Compare Datasets

Returning to the setup dialog, the comparison dataset is now set as shown in Fig. 4.34.

Figure 4.34: Compare Dataset Set for Compare Datasets
Figure 4.34: Compare Dataset Set for Compare Datasets

The list of base and comparison datasets includes which version of the data will be used in the QA step. For example, the base dataset 2007JanORLTotMARAMAv3.txt [0 (Initial Version)] indicates that version 0 (named “Initial Version”) will be used. When you select the base and comparison datasets, the EMF automatically uses each dataset’s Default Version. If any of the datasets have a different version that you would like to use for the QA step, select the dataset name and then click the Set Version button underneath the selected dataset. The Set Version dialog shown in Fig. 4.35 lets you pick which version of the dataset you would like to use.

Figure 4.35: Set Dataset Version for Compare Datasets QA Program
Figure 4.35: Set Dataset Version for Compare Datasets QA Program

Next, we need to tell the Compare Datasets QA program how to compare the two datasets. We’re going to sum the average-day emissions in each dataset by SCC and pollutant and then compare the results from January to February. In the ORL Nonroad Inventory dataset type, the SCCs are stored in a field called scc, the pollutant codes are stored in a column named poll, and the average-day emissions are stored in a field called avd_emis. In the Group By Expressions textbox, type scc, press Enter, and then type poll. In the Aggregate Expressions textbox, type avd_emis. Fig. 4.36 shows the setup dialog with the arguments entered.

Figure 4.36: Arguments Set for Compare Datasets
Figure 4.36: Arguments Set for Compare Datasets

In this example, we’re comparing two datasets of the same type (ORL Nonroad Inventory). This means that the data field names will be consistent between the base and comparison datasets. When you compare datasets with different types, the field names might not match. The Matching Expressions textbox allows you to define how the fields from the base dataset should be matched to the comparison dataset. For this case, we don’t need to enter anything in the Matching Expressions textbox or any of the remaining fields in the setup dialog. The Compare Datasets arguments are described in more detail in Sec. 4.8.1.
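
Conceptually, the settings entered above correspond to a comparison query along the lines of the sketch below. This is only an illustration with hypothetical table names; the EMF generates the actual SQL internally, applying the join type described in Sec. 4.8.1.4 and also computing the absolute and percent difference columns listed in Tbl. 4.1.

-- Simplified sketch of the comparison logic (not the EMF's actual generated SQL)
select coalesce(b.scc, c.scc) as scc,
       coalesce(b.poll, c.poll) as poll,
       b.avd_emis as avd_emis_b,
       c.avd_emis as avd_emis_c,
       c.avd_emis - b.avd_emis as avd_emis_diff
from (select scc, poll, sum(avd_emis) as avd_emis
      from january_inventory group by scc, poll) b
full outer join
     (select scc, poll, sum(avd_emis) as avd_emis
      from february_inventory group by scc, poll) c
on b.scc = c.scc and b.poll = c.poll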

In the setup dialog, click OK to save the arguments and return to the Edit QA Step window. The Arguments textbox now lists the arguments that we set up in the previous step (see Fig. 4.37).

Figure 4.37: QA Step with Arguments Set
Figure 4.37: QA Step with Arguments Set

The QA step is now ready to run. Click the Run button to start running the QA step. A message is displayed at the top of the window as shown in Fig. 4.38.

Figure 4.38: Started Running QA Step
Figure 4.38: Started Running QA Step

In the Status window, you’ll see a message about starting to run the QA step followed by a completion message once the QA step has finished running. Fig. 4.39 shows the two status messages.

Figure 4.39: QA Step Running in Status Window
Figure 4.39: QA Step Running in Status Window

Once the status message

Completed running QA step ‘Compare to February’ for Version ‘Initial Version’ of Dataset ‘2007JanORLTotMARAMAv3.txt’

is displayed, the QA step has finished running. In the Edit QA Step window, click the Refresh button to display the latest information about the QA step. The fields Run Status and Run Date will be populated with the latest run information as shown in Fig. 4.40.

Figure 4.40: QA Step with Run Status
Figure 4.40: QA Step with Run Status

Now, we can view the QA step results or export the results. First, we’ll view the results inside the EMF client. Click the View Results button to open the View QA Step Results window as shown in Fig. 4.41.

Figure 4.41: View Compare Datasets QA Step Results
Figure 4.41: View Compare Datasets QA Step Results

Tbl. 4.1 describes each column in the QA step results.

Table 4.1: QA Step Results Columns
Column Name Description
poll Pollutant code
scc SCC code
avd_emis_b Summed average-day emissions from base dataset (January) for this pollutant and SCC
avd_emis_c Summed average-day emissions from comparison dataset (February) for this pollutant and SCC
avd_emis_diff avd_emis_c - avd_emis_b
avd_emis_absdiff Absolute value of avd_emis_diff
avd_emis_pctdiff 100 * (avd_emis_diff / avd_emis_b)
avd_emis_abspctdiff Absolute value of avd_emis_pctdiff
count_b Number of records from base dataset included in this row’s results
count_c Number of records from comparison dataset included in this row’s results
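
As a worked example of these columns: if a given SCC and pollutant had avd_emis_b = 10 and avd_emis_c = 12, then avd_emis_diff = 2, avd_emis_absdiff = 2, avd_emis_pctdiff = 20, and avd_emis_abspctdiff = 20.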

To export the QA step results, return to the Edit QA Step window as shown in Fig. 4.42. Select the checkbox labeled Download result file to local machine?. In this example, we have entered an optional Export Name for the output file. If you don’t enter an Export Name, the output file will use an auto-generated name. Click the Export button.

Figure 4.42: Ready to Export QA Step Results
Figure 4.42: Ready to Export QA Step Results

The Export QA Step Results dialog will be displayed as shown in Fig. 4.43. For more information about the Row Filter option, see Sec. 4.5. To export all the result records, click the Finish button.

Figure 4.43: Export QA Step Results Options
Figure 4.43: Export QA Step Results Options

Back in the Edit QA Step window, a message is displayed at the top of the window indicating that the export has started. See Fig. 4.44.

Figure 4.44: Export Started for QA Step Results
Figure 4.44: Export Started for QA Step Results

Check the Status window to see the status of the export as shown in Fig. 4.45.

Figure 4.45: Export Messages in Status Window
Figure 4.45: Export Messages in Status Window

Once the export is complete, the file will start downloading to your computer. Open the Downloads window to check the download status. Once the progress bar reaches 100%, the download is complete. Right click on the results file and select Open Containing Folder as shown in Fig. 4.46.

Figure 4.46: QA Step Results in Downloads Window
Figure 4.46: QA Step Results in Downloads Window

Fig. 4.47 shows the downloaded file in Windows Explorer. By default, files are downloaded to a temporary directory on your computer. Some disk cleanup programs can automatically delete files in temporary directories; you should move any downloads you want to keep to a more permanent location on your computer.

Figure 4.47: Downloaded QA Step Results in Windows Explorer
Figure 4.47: Downloaded QA Step Results in Windows Explorer

The downloaded file is a CSV (comma-separated values) file which can be opened in Microsoft Excel or other spreadsheet programs. Double-click the filename to open the file. Fig. 4.48 shows the QA step results in Microsoft Excel.

Figure 4.48: Downloaded QA Step Results in Microsoft Excel
Figure 4.48: Downloaded QA Step Results in Microsoft Excel

4.8.1 Details of Compare Datasets Arguments

4.8.1.1 Group By Expressions

The Group By Expressions are a list of columns/expressions that are used to group the dataset records for aggregation. The expressions must contain valid columns from either the base or comparison datasets. If a column exists only in the base or compare dataset, then a Matching Expression must be specified in order for a proper mapping to happen during the comparison analysis. A group by expression can be aliased by adding the AS <alias> clause to the expression; this alias is used as the column name in the QA step results. A group by expression can also contain SQL functions such as substring or string concatenation using ||.

Sample Group By Expressions

scc AS scc_code
substring(fips, 1, 2) as fipsst

or

fipsst||fipscounty as fips
substring(scc, 1, 5) as scc_lv5

4.8.1.2 Aggregate Expressions

The Aggregate Expressions are a list of columns/expressions that will be aggregated (summed) using the specified group by expressions. The expressions must contain valid columns from either the base or comparison datasets. If a column exists only in the base or compare dataset, then a Matching Expression must be specified in order for a proper mapping to happen during the comparison analysis.

Sample Aggregate Expressions

ann_emis
avd_emis

4.8.1.3 Matching Expressions

The Matching Expressions are a list of expressions used to match base dataset columns/expressions to comparison dataset columns/expressions. A matching expression consists of three parts: the base dataset expression, the equals sign, and the comparison dataset expression (i.e. base_expression=comparison_expression).

Sample Matching Expressions

substring(fips, 1, 2)=substring(region_cd, 1, 2)
scc=scc_code
ann_emis=emis_ann
avd_emis=emis_avd
fips=fipsst||fipscounty

4.8.1.4 Join Type

The Join Type specifies which type of SQL join should be used when performing the comparison.

Join Type Description
INNER JOIN Only include rows that exist in both the base and compare datasets based on the group by expressions
LEFT OUTER JOIN Include all rows from the base dataset, only include rows from the compare dataset that meet the group by expressions
RIGHT OUTER JOIN Include all rows from the compare dataset, only include rows from the base dataset that meet the group by expressions
FULL OUTER JOIN Include all rows from both the base and compare datasets

The default join type is FULL OUTER JOIN.
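
In practice, this means that with the default FULL OUTER JOIN, a group (for example, an SCC and pollutant combination) that appears in only one of the two datasets still shows up in the results, with empty values in the columns from the other dataset; with an INNER JOIN, that row would be dropped entirely.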

4.8.1.5 Where Filter

The Where Filter is a SQL WHERE clause that is used to filter both the base and comparison datasets. The expressions in the WHERE clause must contain valid columns from either the base or comparison datasets. If a column exists only in the base or compare dataset, then a Matching Expression must be specified in order for a proper mapping to happen during the comparison analysis.

Sample Where Filters

substring(fips, 1, 2) = '37' and SCC_code in ('10100202', '10100203')

or

fips like '37%' and SCC_code like '101002%'

4.8.1.6 Base Field Suffix

The Base Field Suffix is appended to the base aggregate expression name that is returned in the output. For example, an Aggregate Expression ann_emis with a Base Field Suffix 2005 will be returned as ann_emis_2005 in the QA step results.

4.8.1.7 Compare Field Suffix

The Compare Field Suffix is appended to the comparison aggregate expression name that is returned in the output. For example, an Aggregate Expression ann_emis with a Compare Field Suffix 2008 will be returned as ann_emis_2008 in the QA step results.

4.8.2 More Examples

Fig. 4.49 shows the setup dialog for the following example of the Compare Datasets QA program. We are setting up a plant-level comparison of a set of two inventories (EGU and non-EGU) versus another set of two inventories (EGU and non-EGU). All four inventories are the same dataset type. The annual emissions will be grouped by FIPS code, plant ID, and pollutant. No mapping is required because the dataset types are identical; the columns fips, plantid, poll, and ann_emis exist in both sets of datasets. This comparison is limited to the state of North Carolina via the Where Filter:

substring(fips, 1, 2)='37'

The QA step results will have columns named ann_emis_base, ann_emis_compare, count_base, and count_compare using the Base Field Suffix and Compare Field Suffix.

Figure 4.49: Compare Datasets Example 1
Figure 4.49: Compare Datasets Example 1

Fig. 4.50 shows the setup dialog for a second example of the Compare Datasets QA program. This example takes a set of ORL nonpoint datasets and compares it to a single FF10 nonpoint inventory. We are grouping by state (first two digits of the FIPS code) and pollutant. A mapping expression is needed between the ORL column fips and the FF10 column region_cd:

substring(fips, 1, 2)=substring(region_cd, 1, 2)

Another mapping expression is needed between the columns ann_emis and ann_value:

ann_emis=ann_value

No mapping is needed for pollutant because both dataset types use the same column name poll. This comparison is limited to three states and to sources that have annual emissions greater than 1000 tons. These constraints are specified via the Where Filter:

substring(fips, 1, 2) in ('37','45','51') and ann_emis > 1000

In the QA step results, the base dataset column will be named ann_emis_2002 and the compare dataset column will be named ann_emis_2008.

Figure 4.50: Compare Datasets Example 2
Figure 4.50: Compare Datasets Example 2

4.9 Creating a Custom SQL QA Step

Suppose you have an ORL nonroad inventory that contains average-day emissions instead of annual emissions. The QA step templates that can generate inventory summaries report summed annual emissions. If you want to get a report of the average-day emissions, you can create a custom SQL QA step.

First, let’s look at the structure of a SQL QA step created from a QA step template. Fig. 4.51 shows a QA step that generates a summary of the annual emissions by county and pollutant.

Figure 4.51: QA Step Reference
Figure 4.51: QA Step Reference

This QA step uses a custom SQL query shown in the Arguments textbox:

select FIPS, POLL, sum(ann_emis) as ann_emis from $TABLE[1] e group by FIPS, POLL order by FIPS, POLL

For the ORL nonroad inventory dataset type, the annual emission values are stored in a database column named ann_emis while the average-day emissions are in a column named avd_emis. For any dataset you can see the names of the underlying data columns by viewing the raw data as described in Sec. 3.6.

To create an average-day emissions report, we’ll need to switch ann_emis in the above SQL query to avd_emis. In addition, the annual emissions report sums the emissions across the counties and pollutants. For average-day emissions, it may make more sense to compute the average emissions by county and pollutant. In the SQL query we can change sum(ann_emis) to avg(avd_emis), which uses the SQL function that computes averages.

Our final revised SQL query is

select FIPS, POLL, avg(avd_emis) as avd_emis from $TABLE[1] e group by FIPS, POLL order by FIPS, POLL

Once we know what SQL query to run, we’ll create a custom QA step. Sec. 4.3 describes how to add a custom QA step to a dataset. Fig. 4.52 shows the new custom QA step with a name assigned and the Program pull-down menu set to SQL so that the custom QA step will run a SQL query. Our custom SQL query is pasted into the Arguments textbox.

Figure 4.52: Custom SQL QA Step Setup
Figure 4.52: Custom SQL QA Step Setup

Click the OK button to save the QA step. The newly added QA step is now shown in the list of QA steps for the dataset (Fig. 4.53).

Figure 4.53: Custom SQL QA Step Ready
Figure 4.53: Custom SQL QA Step Ready

At this point, you can run the QA step as described in Sec. 4.4 and view and export the QA step results (Sec. 4.5) just like any other QA step.

What if our custom SQL had a typo? Suppose we accidentally entered the average-day emissions column name as avg_emis instead of avd_emis. When the QA step is run, it will fail to complete successfully. The Status window will display a message like

Failed to run QA step Avg. Day by County and Pollutant for Version ‘Initial Version’ of Dataset <dataset name>. Check the query -ERROR: column “avg_emis” does not exist

Other types of SQL errors will be displayed in the Status window as well. If the SQL query uses an invalid function name like average(avd_emis) instead of avg(avd_emis), the Status window message is

Failed to run QA step Avg. Day by County and Pollutant for Version ‘Initial Version’ of Dataset <dataset name>. Check the query -ERROR: function average(double precision) does not exist

4.10 Special SQL Syntax for QA Steps

Each of the QA steps that create summaries uses a customized SQL syntax that is very similar to standard SQL, except that it includes some EMF-specific concepts that allow queries to be defined generically and then applied to specific datasets as needed. For example, the EMF syntax for the “Summarize by SCC and Pollutant” query is:

select SCC, POLL, sum(ann_emis) as ann_emis from $TABLE[1] e group by SCC, POLL order by SCC, POLL

The only difference between this and standard SQL is the use of the $TABLE[1] syntax. When this query is run, the $TABLE[1] portion of the query is replaced with the table name that contains the dataset’s data in the EMF database. Most datasets have their own tables in the EMF schema, so you do not normally need to worry about selecting only the records for the specific dataset of interest. The customized syntax also has extensions to refer to another dataset and to refer to specific versions of other datasets using tokens other than $TABLE. For the purposes of this discussion, it is sufficient to note that these other extensions exist.
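
As a concrete illustration, if the dataset’s data happened to be stored in a database table named emissions.ds_nonroad_2005 (a hypothetical name), the query actually sent to the database would read:

select SCC, POLL, sum(ann_emis) as ann_emis from emissions.ds_nonroad_2005 e group by SCC, POLL order by SCC, POLL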

Some of the summaries are constructed using more complex queries that join information from other tables, such as the SCC and pollutant descriptions, and that account for any missing descriptions. For example, the syntax for the “Summarize by SCC and Pollutant with Descriptions” query is:

select e.SCC, 
       coalesce(s.scc_description,'AN UNSPECIFIED DESCRIPTION')::character varying(248) as scc_description, 
       e.POLL, 
       coalesce(p.descrptn,'AN UNSPECIFIED DESCRIPTION')::character varying(11) as pollutant_code_desc, 
       coalesce(p.name,'AN UNSPECIFIED SMOKE NAME')::character varying(11) as smoke_name,
       p.factor, 
       p.voctog, 
       p.species, 
       coalesce(sum(ann_emis), 0) as ann_emis, 
       coalesce(sum(avd_emis), 0) as avd_emis 
from $TABLE[1] e 
left outer join reference.invtable p on e.POLL=p.cas 
left outer join reference.scc s on e.SCC=s.scc 
group by e.SCC,e.POLL,p.descrptn,s.scc_description,p.name,p.factor,p.voctog,p.species 
order by e.SCC, p.name

This query is quite a bit more complex, but is still supported by the EMF QA step processing system.

5 Case Management

In the EMF, cases are used to organize data and settings needed for model runs. For example, a case might run MOVES2014 to generate emission factors for a set of reference counties, or a case may run SMOKE to create inputs for CMAQ. Cases are a flexible concept that can accommodate many different types of processing. A case is organized into:

  1. Jobs: the individual pieces of processing (e.g. scripts) that are run as part of the case.
  2. Inputs: the datasets that will be used as inputs to the case’s jobs.
  3. Parameters: settings and other information needed to run the jobs.

When a job is run, it can produce messages that are stored as the history for the job. A job may also produce data files that are automatically imported into the EMF; these datasets are referred to as outputs for the job.

To work with cases in the EMF, select the Manage menu and then Cases. This opens the Case Manager window, which will initially be empty as shown in Fig. 5.1.

Figure 5.1: Case Manager (no category selected)
Figure 5.1: Case Manager (no category selected)

To show all cases currently in the EMF, use the Show Cases of Category pull-down to select All. The Case Manager window will then list all the cases as shown in Fig. 5.2.

Figure 5.2: Case Manager showing all cases
Figure 5.2: Case Manager showing all cases

The Case Manager window shows a summary of each case. Tbl. 5.1 lists each column in the window. Many of the values are optional and may or may not be used depending on the specific model and type of case.

Table 5.1: Case Manager Columns
Column Description
Name The unique name for the case.
Last Modified Date The most recent date and time when the case was modified.
Last Modified By The user who last modified the case.
Abbrev. The unique abbreviation assigned to the case.
Run Status The overall run status of the case. Values are Not Started, Running, Failed, and Complete.
Base Year The base year of the case.
Future Year The future year of the case.
Start Date The starting date and time of the case.
End Date The ending date and time of the case.
Regions A list of modeling regions assigned to the case.
Model to Run The model that the case will run.
Downstream The model that the case is creating output for.
Speciation The speciation mechanism used by the case.
Category The category assigned to the case.
Project The project assigned to the case.
Is Final Indicates if the case has been marked as final.

In the Case Manager window, the Name Contains textbox can be used to quickly find cases by name. The search term is not case sensitive and the wildcard character * (asterisk) can be used in the search.
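
The wildcard makes it easy to narrow long case lists; for instance, a search term like 2011*v2 should match case names that contain “2011” followed later by “v2”.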

To work with a case, select the case by checking the checkbox in the Select column, then click the desired action button in the bottom of the window. Tbl. 5.2 describes each button.

Table 5.2: Case Manager Actions
Command Description
View Opens the Case Viewer window to view the details of the case in read-only mode.
Edit Opens the Case Editor window to edit the details of the case.
New Opens the Create a Case window to start creating a new case.
Remove Removes the selected case; a prompt is displayed confirming the deletion.
Copy Copies the selected case to a new case named “Copy of case name”.
Sensitivity Opens the sensitivity tool, used to make emissions adjustments to existing SMOKE cases.
Compare Generates a report listing the details of two or more cases and whether the settings match.
Compare Reports Opens the Compare Case window which can be used to compare the outputs from different cases.
Import Opens the Import Cases window where case information that was previously exported from the EMF can be imported from text files.
Close Closes the Case Manager window.
Refresh Refreshes the list of cases and information about each case. (This button is in the top right corner of the Case Manager window.)

5.1 Viewing and Editing Case Details

To view or edit the details of a case, select the case in the Case Manager window, then click the View or Edit button. Fig. 5.3 shows the Case Viewer window, while Fig. 5.4 shows the Case Editor window for the same case. Data in the Case Viewer window is not editable, and the Case Viewer window does not have a Save button.

Figure 5.3: Case Viewer - Summary Tab
Figure 5.4: Case Editor - Summary Tab

The Case Viewer and Case Editor windows split the case details into six tabs. Tbl. 5.3 gives a brief description of each tab.

Table 5.3: Case Viewer and Editor Tabs
Tab Description
Summary Shows an overview of the case and high-level settings
Jobs Work with the individual jobs that make up the case
Inputs Select datasets that will be used as inputs to the case’s jobs
Parameters Configure settings and other information needed to run the jobs
Outputs View and export the output datasets created by the case’s jobs
History View log and status messages generated by individual jobs

There are several buttons that appear at the bottom of the Case Viewer and Case Editor windows. The actions for each button are described in Tbl. 5.4.

Table 5.4: Case Viewer and Editor Actions
Command Description
Describe Shows the case description in a larger window. If opened from the Case Editor window, the description can be edited (see Fig. 5.5).
Refresh Reload the case details from the server.
Load (Case Editor only) Manually load data created by CMAQ jobs into the EMF.
Export Exports the case settings to text files. See Sec. 5.1.1.
Save (Case Editor only) Save the current case.
View Parent If the case was copied from another case, opens the Case Viewer showing the original case.
View Related View other cases that either produce inputs used by the current case, or use outputs created by the current case.
Close Closes the Case Viewer or Case Editor window
Figure 5.5: Edit Case Description

5.1.1 Exporting a Case

The Export button at the bottom of the Case Viewer or Case Editor window can be used to export the current case. Clicking the Export button will open the Export Case dialog shown in Fig. 5.6.

Figure 5.6: Export Case

The case can be exported to text files either on the EMF server or directly to a local folder. After selecting the export location, click OK to export the case. The export process will create three text files, each named with the case’s name and abbreviation. Tbl. 5.5 describes the contents of the three files.

Table 5.5: Case Export Files
File Name Description
case_name_abbrev_Summary_Parameters.csv Settings from the Summary tab, and a list of parameters for the case
case_name_abbrev_Jobs.csv List of jobs for the case with settings for each job
case_name_abbrev_Inputs.csv List of inputs for the case including the dataset name associated with each input

The exported case data can be loaded back into the EMF using the Import button in the Case Manager window.

5.1.2 Summary Tab

Fig. 5.7 shows the Summary tab in the Case Editor window.

Figure 5.7: Case Editor - Summary Tab

The Summary tab shows a high-level overview of the case including the case’s name, abbreviation, and assigned category. Many of the fields on the Summary tab are listed in the Case Manager window as described in Tbl. 5.1.

The Is Final checkbox indicates that the case should be considered final and should not have any changes made to it. The Is Template checkbox indicates that the case is meant as a template for additional cases and should not be run directly. The EMF does not enforce any restrictions on cases marked as final or templates.

The Description textbox allows a detailed description of the case to be entered. The Describe button at the bottom of the Case Editor window will open the case description in a larger window for easier editing.

The Sectors box lists the sectors that have been associated with the case. Click the Add or Remove buttons to add or remove sectors from the list.

A case can optionally be assigned to a project using the Project pull-down menu.

If the case was copied from a different case, the parent case name will be listed by the Copied From label. This value is not editable. Clicking the View Parent button will open the case it was copied from.

The overall status of the case can be set using the Run Status pull-down menu. Available statuses are Not Started, Running, Failed, and Complete.

The Last Modified By field shows who last modified the case and when. This field is not editable.

The lower section of the Summary tab has various fields to set technical details about the case such as which model will be run, the downstream model (i.e. which model will be using the output from the case), and the speciation mechanism in use. These values will be available to the scripts that are run for each case job; see Sec. 5.2 for more information.

For the case shown in Fig. 5.7, the Start Date & Time is January 1, 2011 00:00 GMT and the End Date & Time is December 31, 2011 23:59 GMT. The EMF client has automatically converted these values from GMT to the local time zone of the client, which is Eastern Standard Time (GMT-5). Thus the values shown in the screenshot are correct, but potentially confusing.

5.1.3 Jobs Tab

Fig. 5.8 shows the Jobs tab in the Case Editor window.

Figure 5.8: Case Editor - Jobs Tab

At the top of the Jobs tab is the Output Job Scripts Folder. When a job is run, the EMF creates a shell script in this folder. See Sec. 5.2 for more information about the script that the EMF writes and executes. Click the Browse button to set the scripts folder location on the EMF server, or type the folder location directly into the text field.

As shown in Fig. 5.8, the Output Job Scripts Folder can use variables to refer to case settings or parameters. In this case, the folder location is set to $PROJECT_ROOT/$CASE/scripts. PROJECT_ROOT is a case parameter defined in the Parameters tab with the value /data/em_v6.2/2011platform. The CASE variable refers to the case’s abbreviation: test_2011eh_cb05_v6_11g. Thus, the scripts for the jobs in the case will be written to the folder /data/em_v6.2/2011platform/test_2011eh_cb05_v6_11g/scripts.

To view the details of a particular job, select the job, then click the Edit button to bring up the Edit Case Job window (Fig. 5.9).

Figure 5.9: Edit Case Job

Tbl. 5.6 describes each field in the Edit Case Job window.

Table 5.6: Case Job Fields
Name Description
Name The name of the job. When setting up a job, the combination of the job’s name, region, and sector must be unique.
Purpose A short description of the job’s purpose or functionality.
Executable The script or program the job will run.
Setup
Version Can be used to mark the version of a particular job.
Arguments A string of arguments to pass to the executable when the job is run.
Job Order The position of this job in the list of jobs.
Job Group Can be used to label related jobs.
Queue Options Any commands needed when submitting the job to run (e.g. queueing system options or a wrapper script to call).
Parent case ID If this job was copied from a different case, shows the parent case’s ID.
Local Can be used to indicate to other users if the job runs locally vs. remotely.
Depends on TBA
Region Indicates the region associated with the job.
Sector Indicates the sector associated with the job.
Host If set to anything other than localhost, the job is executed via SSH on the remote host.
Run Status Shows the run status of the job.
Run Results
Queue ID Shows the queueing system ID, if the job is run on a system that provides this information.
Date Started The date and time the job was last started.
Date Completed The date and time the job completed.
Job Notes User editable notes about the job run.
Last Message The most recent message received while running the job.

After making any edits to the job, click the Save button to save the changes. The Close button closes the Edit Case Job window.

To create a new job, click the Add button to open the Add a Job window as shown in Fig. 5.10.

Figure 5.10: Add a New Case Job

The Add a Job window has the same fields as the Edit Case Job window except that the Run Results section is not shown. See Tbl. 5.6 for more information about each input field. Once the job information is complete, click the Save button to save the new job. Click Cancel to close the Add a Job window without saving the new job.

An existing job can be copied to a different case or the same case using the Copy button. Fig. 5.11 shows the window that opens when copying a job.

Figure 5.11: Copy a Case Job

If multiple jobs need to be edited with the same changes, the Modify button can be used. This action opens the window shown in Fig. 5.12.

Figure 5.12: Modify One or More Case Jobs

In the Modify Jobs window, check the checkbox next to each property to be modified. Enter the new value for the property. After clicking OK, the new value will be set for all selected jobs.

In the Jobs tab of the Case Editor window, the Validate button can be used to check the inputs for a selected job. The validation process will check each input for the job and report if any inputs use a non-final version of their dataset, or if any datasets have later versions available. If no later versions are found, the validation message “No new versions exist for selected inputs.” is displayed.

5.1.4 Inputs Tab

When the Inputs tab is initially viewed, the list of inputs will be empty as seen in Fig. 5.13.

Figure 5.13: Case Editor - Inputs Tab (Initial View)

To view the inputs, use the Sector pull-down menu to select a sector associated with the case. In Fig. 5.14, the selected sector is All, so that all inputs for the case are displayed.

Figure 5.14: Case Editor - Inputs Tab

To view the details of an existing input, select the input, then click the Edit button to open the Edit Case Input window as shown in Fig. 5.15.

Figure 5.15: Edit Case Input

To create a new input, click the Add button to bring up the Add Input to Case window (Fig. 5.16).

Figure 5.16: Add Case Input

The Copy button can be used to copy an existing input to a different case. Fig. 5.17 shows the Copy Case Input window that opens when the Copy button is clicked.

Figure 5.17: Copy Case Input

To view the dataset associated with a particular input, click the View Dataset button to open the Dataset Properties View window for the selected input.

5.1.5 Parameters Tab

Like the Inputs tab, the Parameters tab will be empty when initially viewed, as shown in Fig. 5.18.

Figure 5.18: Case Editor - Parameters Tab (Initial View)

To view the parameters, use the Sector pull-down menu to select a sector. Fig. 5.19 shows the Parameters tab with the sector set to All, so that all parameters for the case are shown.

Figure 5.19: Case Editor - Parameters Tab

To view or edit the details of an existing parameter, select the parameter, then click the Edit button. This opens the parameter editing window as shown in Fig. 5.20.

Figure 5.20: Edit Case Parameter

To create a new parameter, click the Add button and the Add Parameter to Case window will be displayed (Fig. 5.21).

Figure 5.21: Add Case Parameter

5.1.6 Outputs Tab

When initially viewed, the Outputs tab will be empty, as seen in Fig. 5.22.

Figure 5.22: Case Editor - Outputs Tab (Initial View)

Use the Job pull-down menu to select a particular job and see the outputs for that job, or select “All (All sectors, All regions)” to view all the available outputs. Fig. 5.23 shows the Outputs tab with All selected.

Figure 5.23: Case Editor - Outputs Tab

Tbl. 5.7 lists the columns in the table of case outputs. Most outputs are automatically registered when a case job is run, and the job script is responsible for setting the output name, dataset information, message, etc.

Table 5.7: Case Outputs Columns
Column Description
Output Name The name of the case output.
Job The case job that created the output.
Sector The sector associated with the job that created the output.
Dataset Name The name of the dataset for the output.
Dataset Type The dataset type associated with the output dataset.
Import Status The status of the output dataset import.
Creator The user who created the output.
Creation Date The date and time when the output was created.
Exec Name If set, indicates the executable that created the output.
Message If set, a message about the output.

5.1.7 History Tab

Like the Outputs tab, the History tab is empty when initially viewed (Fig. 5.24).

Figure 5.24: Case Editor - History Tab (Initial View)

The history of a single job can be viewed by selecting that job from the Job pull-down menu, or the history of all jobs can be viewed by selecting “All (All sectors, All regions)”, as seen in Fig. 5.25.

Figure 5.25: Case Editor - History Tab

Messages in the History tab are automatically generated by the scripts that run for each case job. Each message will be associated with a particular job and the History tab will show when the message was received. Additionally, each message will have a type: i (info), e (error), or w (warning). The case job may report a specific executable and executable path associated with the message.

5.2 Script Integration

When a job is run, the EMF creates a shell script that will call the job’s executable. This script is created in the Output Job Scripts Folder specified in the Jobs tab of the Case Editor.

If the case includes an EMF_JOBHEADER input, the contents of this dataset are put at the beginning of the shell script. Next, all the environment variables associated with the job are exported in the script. Finally, the script calls the job’s executable with any arguments and queue options specified in the job.

In addition to the environment variables associated with a job’s inputs and parameters, Tbl. 5.8 and Tbl. 5.9 list the case and job settings that are automatically added to the script written by the EMF.

Table 5.8: Environment Variables for Case Settings
Case Setting Env. Var. Example
abbreviation $CASE test_2011eh_cb05_v6_11g
base year $BASE_YEAR 2011
future year $FUTURE_YEAR 2011
model name and version $MODEL_LABEL SMOKE3.6
downstream model $EMF_AQM CMAQ v5.0.1
speciation $EMF_SPC cmaq_cb05_soa
start date & time $EPI_STDATE_TIME 2011-01-01 00:00:00.0
end date & time $EPI_ENDATE_TIME 2011-12-31 23:59:00.0
parent case $PARENT_CASE 2011eh_cb05_v6_11g_onroad_no_ca
Table 5.9: Environment Variables for Job Settings
Job Setting Env. Var. Example
sector $SECTOR onroad
job group $JOB_GROUP
region $REGION OTC 12 km
region abbreviation $REGION_ABBREV M_12_OTC
region gridname $REGION_IOAPI_GRIDNAME M_12_OTC
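
As an illustration, the beginning of a script generated for the onroad job from Tbl. 5.9 might look like the following sketch. This is hypothetical and uses sh-style syntax for illustration only; the actual contents depend on the EMF_JOBHEADER dataset and on the job’s inputs, parameters, and queue options.

# (contents of the EMF_JOBHEADER dataset, if one exists, appear here)

# case settings exported as environment variables
export CASE=test_2011eh_cb05_v6_11g
export BASE_YEAR=2011
export EMF_AQM="CMAQ v5.0.1"
export EMF_SPC=cmaq_cb05_soa

# job settings exported as environment variables
export SECTOR=onroad
export REGION_ABBREV=M_12_OTC
export REGION_IOAPI_GRIDNAME=M_12_OTC

# the job's executable is then called with its arguments,
# wrapped in any queue options specified for the job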

6 Temporal Allocation

6.1 Introduction

The temporal allocation module in the Emissions Modeling Framework allows you to estimate inventory emissions for different time periods and resolutions. The module supports input inventories with annual totals, monthly totals, monthly average-day emissions, or daily totals. Using temporal allocation factors, the module can estimate monthly totals, monthly average-day values, daily totals, episodic totals, or episodic average-day values.

6.2 Creating a Temporal Allocation Run

Under the main Manage menu, select Temporal Allocation to open the Temporal Allocation Manager. The Temporal Allocation Manager window will list existing temporal allocations as shown in Fig. 6.1.

Figure 6.1: Temporal Allocation Manager window

From the Temporal Allocation Manager, click the New button. The Edit Temporal Allocation window will open with the Summary tab selected (Fig. 6.2).

Figure 6.2: Summary tab for new temporal allocation

In the Edit Temporal Allocation window, the four tabs labeled Summary, Inventories, Time Period, and Profiles are used to enter the temporal allocation inputs. This information can be entered in any order; this guide goes through the tabs in order.

6.2.1 Summary Tab

On the Summary tab, enter a unique name for the temporal allocation. You can optionally enter a description and select a project. The EMF will automatically set the last modified date and creator. Fig. 6.3 shows the Summary tab with details of the new temporal allocation entered.

Figure 6.3: New temporal allocation with summary information entered

You can click the Save button from any tab in the Edit Temporal Allocation window to save the information you have entered. If you don’t enter a unique name, an error message will be displayed at the top of the window as shown in Fig. 6.4.

Figure 6.4: Temporal allocation with duplicate name

If you enter or update information and then try to close the edit window without saving, you will be asked if you would like to discard your changes. The prompt is shown in Fig. 6.5.

Figure 6.5: Discard changes prompt

When your temporal allocation is successfully saved, a confirmation message is displayed at the top of the window.

Figure 6.6: Successfully saved temporal allocation

6.2.2 Inventories Tab

The Inventories tab of the Edit Temporal Allocation window lists the inventories that will be processed by the temporal allocation. For a new temporal allocation, the list is initially empty as shown in Fig. 6.7.

Figure 6.7: Inventories tab for new temporal allocation

Click the Add button to select inventory datasets. A Select Datasets window will appear with the list of supported dataset types (Fig. 6.8).

Figure 6.8: Select Datasets window

The temporal allocation module supports the following inventory dataset types:

  - Flat File 2010 Point
  - Flat File 2010 Nonpoint
  - ORL point
  - ORL nonpoint
  - ORL nonroad
  - ORL onroad

Use the Choose a dataset type pull-down menu to select the dataset type you are interested in. A list of matching datasets will be displayed in the window as shown in Fig. 6.9.

Figure 6.9: Datasets matching selected dataset type

You can use the Dataset name contains field to filter the list of datasets as shown in Fig. 6.10.

Figure 6.10: Filtered datasets matching selected dataset type

Click on the dataset names to select the datasets you want to add and then click the OK button. Fig. 6.11 shows the Select Datasets window with one dataset selected.

Figure 6.11: Dataset selected to add

Your selected datasets will be displayed in the Inventories tab of the Edit Temporal Allocation window (Fig. 6.12).

Figure 6.12: Inventories added to temporal allocation

The module will automatically use the default version of each dataset. To change the dataset version, check the box next to the inventory and then click the Set Version button. A Set Version dialog will be displayed for each selected inventory as shown in Fig. 6.13.

Figure 6.13: Set version for selected inventory

To remove an inventory dataset, check the box next to the dataset and then click the Remove button. The View Properties button will open the Dataset Properties View (Sec. 3.5) for each selected dataset, and the View Data button opens the Data Viewer (Fig. 3.21).

The Inventories tab also allows you to specify an inventory filter to apply to the input inventories. This is a general filter mechanism to reduce the total number of sources to be processed in the temporal allocation run. Fig. 6.14 shows an inventory filter that will match sources in Wake County, North Carolina and only consider CO emissions from the inventory.

Figure 6.14: Inventory filtering

6.2.2.1 Annual vs. Monthly Input

The temporal allocation module can process annual and monthly data from ORL and FF10 datasets. To determine if a given ORL inventory contains annual totals or monthly average-day values, the temporal allocation module first looks at the time period stored for the inventory dataset. (These dates are set using the Dataset Properties Editor [see Sec. 3.5] and are shown in the Time Period Start and Time Period End fields on the Summary tab.) If the dataset’s start and end dates are within the same month, then the inventory is treated as monthly data.

If the dataset time period settings are inconclusive, the module falls back to checking the dataset’s name. If the dataset name contains a month name or abbreviation such as “_january” or “_jan”, the dataset is treated as monthly data.

For FF10 inventories, the temporal allocation module will check if the inventory dataset contains any values in the monthly data columns (i.e. jan_value, feb_value, etc.). If any data is found, then the dataset is treated as monthly data.
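
The FF10 check can be pictured as a simple query against the inventory table. The sketch below is illustrative only: it borrows the $TABLE[1] placeholder convention from the QA queries in Sec. 4, shows just three of the twelve monthly columns, and is not the EMF’s actual internal query.

select count(*) > 0 as is_monthly
from $TABLE[1]
where jan_value is not null
   or feb_value is not null
   or mar_value is not null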

6.2.3 Time Period Tab

The Time Period tab of the Edit Temporal Allocation window is used to set the desired output resolution and time period. Fig. 6.15 shows the Time Period tab for the new temporal allocation.

Figure 6.15: Time period tab for new temporal allocation

The temporal allocation module supports monthly, daily, and episodic output resolutions, including totals and average-day values (for example, the Episodic weekend average resolution selected in Fig. 6.16).

To set the time period for the temporal allocation, enter the start and end dates in the fields labeled Time Period Start and Time Period End. The dates should be formatted as MM/DD/YYYY. For example, to set the time period as May 1, 2008 through October 31, 2008, enter “05/01/2008” in the Time Period Start text field and “10/31/2008” in the Time Period End text field. For monthly output, only the year and month of the time period dates are used.

In Fig. 6.16, the output resolution has been set to Episodic weekend average and the time period is June 1, 2011 through August 31, 2011.

Figure 6.16: Time period tab with information entered

6.2.4 Profiles Tab

The Profiles tab of the Edit Temporal Allocation window is used to select the temporal cross-reference dataset and various profile datasets. The cross-reference dataset is used to assign temporal allocation profiles to each source in the inventory. A profile dataset contains factors to estimate emissions for different temporal resolutions. For example, a year-to-month profile will have 12 factors, one for each month of the year.

When editing a new temporal allocation, no datasets are selected initially as shown in Fig. 6.17.

Figure 6.17: Profiles tab for new temporal allocation

The Cross-Reference Dataset pull-down menu is automatically populated with datasets of type “Temporal Cross Reference (CSV)”. The format of this dataset is described in Sec. 6.4.

For annual input, year-to-month profiles are needed. The Year-To-Month Profile Dataset pull-down menu lists datasets of type “Temporal Profile Monthly (CSV)”.

For daily or episodic output, the inventory data will need estimates of daily data. The temporal allocation module supports using week-to-day profiles or month-to-day profiles. The Week-To-Day Profile Dataset pull-down menu lists available datasets of type “Temporal Profile Weekly (CSV)”. The Month-to-Day Profile Dataset pull-down shows datasets of type “Temporal Profile Daily (CSV)”.

The formats of the various profile datasets are described in Sec. 6.4.

Fig. 6.18 shows the Profiles tab with cross-reference, year-to-month profile, and week-to-day profile datasets selected.

Figure 6.18: Profiles tab with datasets selected

For each dataset, the default version will be selected automatically. The Version pull-down menu lists available versions for each dataset if you want to use a non-default version.

The View Properties button will open the Dataset Properties View (Sec. 3.5) for the associated dataset. The View Data button opens the Data Viewer (Fig. 3.21).

6.2.5 Output Tab

The Output tab will display the result datasets created when you run a temporal allocation. For a new temporal allocation, this window is empty as shown in Fig. 6.19.

Figure 6.19: Output tab for new temporal allocation

6.3 Running a Temporal Allocation

All temporal allocation runs are started from the Edit Temporal Allocation window. To run a temporal allocation, first open the Temporal Allocation Manager window from the main Manage menu. Check the box next to the temporal allocation you want to run and then click the Edit button.

Figure 6.20: Select temporal allocation to run in Temporal Allocation Manager

The Edit Temporal Allocation window will open for the temporal allocation you selected. Click the Run button at the bottom of the window to start running the temporal allocation.

Figure 6.21: Run button in the Edit Temporal Allocation window

6.3.1 Error Messages

If any problems are detected, an error message is displayed at the top of the Edit Temporal Allocation window (see Fig. 6.22 for an example). The following requirements must be met before a temporal allocation can be run:

  - At least one input inventory must be selected.
  - The output resolution and time period must be specified.
  - A temporal cross-reference dataset must be selected.
  - For annual input inventories, a year-to-month profile dataset must be selected.
  - For daily or episodic output, a week-to-day or month-to-day profile dataset must be selected.

Figure 6.22: Temporal allocation run error

6.3.2 Run Steps and Status Messages

After starting the run, you’ll see a message at the top of the Edit Temporal Allocation window as shown in Fig. 6.23.

Figure 6.23: Temporal allocation run started

The EMF Status window (Sec. 2.6.5) will display updates as the temporal allocation is run. There are several steps in running a temporal allocation. First, any existing outputs for the temporal allocation are removed, indexes are created for the inventory datasets to speed up processing in the database, and the cross-reference dataset is cleaned to make sure the data is entered in a standard format.

Next, monthly totals and monthly average-day values are calculated from the input inventory data. The monthly values are stored in the monthly result output dataset which uses the “Temporal Allocation Monthly Result” dataset type. For annual input data, the year-to-month profiles are used to estimate monthly values. For monthly data from FF10 inventories, a monthly average-day value is calculated by dividing the monthly total value by the number of days in the month. For monthly data from ORL inventories, the monthly total is calculated by multiplying the monthly average-day value by the number of days in the month.
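
Conceptually, the annual-to-monthly step multiplies each source’s annual total by its matched monthly factor, along the lines of the following sketch. The xref and profile table names are illustrative stand-ins for the cross-reference and year-to-month profile datasets, the $TABLE[1] placeholder follows the convention from Sec. 4, and the twelve monthly factors are assumed to sum to 1; the EMF’s internal queries differ.

select e.fips, e.scc, e.poll,
       e.ann_emis * p.january        as total_emis,   -- tons/month
       e.ann_emis * p.january / 31.0 as avg_day_emis  -- tons/day; January has 31 days
from $TABLE[1] e
join xref x on x.scc = e.scc and x.profile_type = 'MONTHLY'
join profile p on p.profile_id = x.profile_id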

For daily and episodic output (i.e. the temporal allocation’s output resolution is not “Monthly average” or “Monthly total”), the next step is to calculate daily emissions. If a month-to-day profile is used, the monthly total value is multiplied by the appropriate factor from the month-to-day profile to calculate the emissions for each day.

Instead of month-to-day profiles, week-to-day profiles can be used. Week-to-day profiles contain 7 factors, one for each day of the week. To apply a weekly profile, the monthly average-day value is multiplied by 7 to get a weekly total value. Then, the weekly total is multiplied by the appropriate factor from the week-to-day profile to calculate the emissions for each day of the week. The calculated daily emissions are stored in the daily result dataset which uses the dataset type “Temporal Allocation Daily Result”.
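
As a worked example with made-up numbers: a source with a monthly average-day value of 2 tons/day has a weekly total of 2 × 7 = 14 tons. If its week-to-day profile assigns Monday a factor of 0.2, the estimated Monday emissions are 14 × 0.2 = 2.8 tons.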

If the temporal allocation resolution is episodic totals or average-day, an episodic result dataset is created using the dataset type “Temporal Allocation Episodic Result”. This dataset will contain episodic totals and average-day values for the sources in the inventory. These values are calculated by summing the appropriate daily values; the average-day values are then computed by dividing each episodic total by the number of days in the episode.
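
The episodic step is essentially an aggregation over the daily result, similar in spirit to the sketch below. The daily_result table name is an illustrative stand-in, and the episode dates are taken from the example time period in Fig. 6.16.

select fips, scc, poll,
       sum(total_emis)            as total_emis,       -- tons
       count(*)                   as days_in_episode,
       sum(total_emis) / count(*) as avg_day_emis      -- tons/day
from daily_result
where day between '2011-06-01' and '2011-08-31'
group by fips, scc, poll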

Once the temporal allocation has finished running, a status message “Finished Temporal Allocation run.” will be displayed. Fig. 6.24 shows the Status window after the temporal allocation has finished running.

Figure 6.24: Status messages for completed temporal allocation run

The Summary tab of the Edit Temporal Allocation window includes an overview of the run listing the status (Running, Finished, or Failed) and the start and completion date for the most recent run.

Figure 6.25: Summary tab after temporal allocation is run

6.3.3 Run Outputs

The Output tab of the Edit Temporal Allocation window will show the three result datasets from the run - monthly, daily, and episodic results.

Figure 6.26: Output tab after temporal allocation is run

From the Output tab, you can select any of the result datasets and click the View Properties button to open the Dataset Properties View window (Sec. 3.5) for the selected dataset.

Figure 6.27: Dataset Properties View for episodic result dataset

You can also access the result datasets from the Dataset Manager.

The View Data button will open the Data Viewer window (Fig. 3.21) for the selected dataset. Clicking the Summarize button will open the QA tab of the Dataset Properties Editor window (Sec. 3.5.8).

You can use QA steps to analyze the result datasets; see Sec. 4 for information on creating and running QA steps. The formats of the three types of result datasets are described in Sec. 6.5.

6.4 Input Dataset Formats

6.4.1 Temporal Cross Reference (CSV)

Column Name Type Description
1 SCC VARCHAR(20) Source Category Code (optional; enter zero for entry that is not SCC-specific)
2 FIPS VARCHAR(12) Country/state/county code (optional)
3 PLANTID VARCHAR(20) Plant ID/facility ID (optional - applies to point sources only; leave blank for entry that is not facility-specific)
4 POINTID VARCHAR(20) Point ID/unit ID (optional - applies to point sources only)
5 STACKID VARCHAR(20) Stack ID/release point ID (optional - applies to point sources only)
6 PROCESSID VARCHAR(20) Segment/process ID (optional - applies to point sources only)
7 POLL VARCHAR(20) Pollutant name (optional; enter zero for entry that is not pollutant-specific)
8 PROFILE_TYPE VARCHAR(10) Code indicating which type of profile this entry is for. Values used by the EMF are ‘MONTHLY’, ‘WEEKLY’, or ‘DAILY’. The format also supports hourly indicators ‘MONDAY’, ‘TUESDAY’, … ‘SUNDAY’, ‘WEEKEND’, ‘WEEKDAY’, ‘ALLDAY’, and ‘HOURLY’.
9 PROFILE_ID VARCHAR(15) Temporal profile ID
10 COMMENT TEXT Comments (optional; must be double quoted)
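
For illustration, a cross-reference record assigning monthly profile 15 to all CO sources with SCC 2102002000 in county 37183 might look like the following (all values are hypothetical):

2102002000,37183,,,,,CO,MONTHLY,15,"example monthly assignment"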

6.4.2 Temporal Profile Monthly (CSV)

Column Name Type Description
1 PROFILE_ID VARCHAR(15) Monthly temporal profile ID
2 JANUARY REAL Temporal factor for January
3 FEBRUARY REAL Temporal factor for February
4 MARCH REAL Temporal factor for March
5-10 APRIL … SEPTEMBER REAL Temporal factors for April through September
11 OCTOBER REAL Temporal factor for October
12 NOVEMBER REAL Temporal factor for November
13 DECEMBER REAL Temporal factor for December
14 COMMENT TEXT Comments (optional; must be double quoted)
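
An example record with hypothetical values is shown below; the twelve factors here sum to 1 so that the monthly values add back up to the annual total.

15,0.12,0.10,0.09,0.08,0.06,0.05,0.05,0.06,0.08,0.09,0.10,0.12,"hypothetical winter-heavy profile"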

6.4.3 Temporal Profile Weekly (CSV)

Column Name Type Description
1 PROFILE_ID VARCHAR(15) Weekly temporal profile ID
2 MONDAY REAL Temporal factor for Monday
3 TUESDAY REAL Temporal factor for Tuesday
4 WEDNESDAY REAL Temporal factor for Wednesday
5 THURSDAY REAL Temporal factor for Thursday
6 FRIDAY REAL Temporal factor for Friday
7 SATURDAY REAL Temporal factor for Saturday
8 SUNDAY REAL Temporal factor for Sunday
9 COMMENT TEXT Comments (optional; must be double quoted)
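
An example record with hypothetical values, weighting weekdays more heavily than weekend days (the seven factors sum to 1):

15,0.15,0.15,0.15,0.15,0.15,0.125,0.125,"hypothetical weekday-heavy profile"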

6.4.4 Temporal Profile Daily (CSV)

Column Name Type Description
1 PROFILE_ID VARCHAR(15) Daily temporal profile ID
2 MONTH INTEGER Calendar month
3 DAY1 REAL Temporal factor for day 1 of month
4 DAY2 REAL Temporal factor for day 2 of month
5 DAY3 REAL Temporal factor for day 3 of month
6-30 DAY4 … DAY28 REAL Temporal factors for days 4 through 28 of month
31 DAY29 REAL Temporal factor for day 29 of month
32 DAY30 REAL Temporal factor for day 30 of month
33 DAY31 REAL Temporal factor for day 31 of month
34 COMMENT TEXT Comments (optional; must be double quoted)
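
As a hypothetical example, a flat profile for June (MONTH set to 6) would set DAY1 through DAY30 each to 1/30 ≈ 0.0333, with DAY31 left as 0 since June has only 30 days.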

6.5 Output Dataset Formats

6.5.1 Column Naming

The temporal allocation output datasets may contain sources from ORL or FF10 inventories. These two sets of inventory formats don’t use consistent names for the source characteristic columns. The temporal allocation formats use the ORL column names. Tbl. 6.1 shows how the column names map between FF10 and ORL inventories.

Table 6.1: Column Name Mapping
FF10 Column Name ORL Column Name Description
REGION_CD FIPS State/county code, or state code
FACILITY_ID PLANTID Plant ID for point sources
UNIT_ID POINTID Point ID for point sources
REL_POINT_ID STACKID Stack ID for point sources
PROCESS_ID SEGMENT Segment for point sources

6.5.2 Temporal Allocation Monthly Result

Column Description
SCC The source SCC from the inventory
FIPS The source FIPS code from the inventory
PLANTID For point sources, the plant ID/facility ID from the inventory
POINTID For point sources, the point ID/unit ID from the inventory
STACKID For point sources, the stack ID/release point ID from the inventory
PROCESSID For point sources, the segment/process ID from the inventory
POLL The source pollutant from the inventory
PROFILE_ID The matched monthly temporal profile ID for the source; for monthly input data, this column will be blank
FRACTION The temporal fraction applied to the source’s annual emissions for the current month; for monthly input data, the fraction will be 1
MONTH The calendar month for the current record
TOTAL_EMIS (tons/month) The total emissions for the source and pollutant in the current month
DAYS_IN_MONTH The number of days in the current month
AVG_DAY_EMIS (tons/day) The average-day emissions for the source and pollutant in the current month
INV_RECORD_ID The record number from the input inventory for this source
INV_DATASET_ID The numeric ID of the input inventory dataset
Figure 6.28: Example monthly result data

6.5.3 Temporal Allocation Daily Result

Column Description
SCC The source SCC from the inventory
FIPS The source FIPS code from the inventory
PLANTID For point sources, the plant ID/facility ID from the inventory
POINTID For point sources, the point ID/unit ID from the inventory
STACKID For point sources, the stack ID/release point ID from the inventory
PROCESSID For point sources, the segment/process ID from the inventory
POLL The source pollutant from the inventory
PROFILE_TYPE The type of temporal profile used for the source; currently only the WEEKLY type is supported
PROFILE_ID The matched temporal profile ID for the source
FRACTION The temporal fraction applied to the source’s monthly emissions for the current day
DAY The date for the current record
TOTAL_EMIS (tons/day) The total emissions for the source and pollutant for the current day
INV_RECORD_ID The record number from the input inventory for this source
INV_DATASET_ID The numeric ID of the input inventory dataset
Figure 6.29: Example daily result data

6.5.4 Temporal Allocation Episodic Result

Column Description
SCC The source SCC from the inventory
FIPS The source FIPS code from the inventory
PLANTID For point sources, the plant ID/facility ID from the inventory
POINTID For point sources, the point ID/unit ID from the inventory
STACKID For point sources, the stack ID/release point ID from the inventory
PROCESSID For point sources, the segment/process ID from the inventory
POLL The source pollutant from the inventory
TOTAL_EMIS (tons) The total emissions for the source and pollutant in the episode
DAYS_IN_EPISODE The number of days in the episode
AVG_DAY_EMIS (tons/day) The average-day emissions for the source and pollutant in the episode
INV_RECORD_ID The record number from the input inventory for this source
INV_DATASET_ID The numeric ID of the input inventory dataset
Figure 6.30: Example episodic result data

6.5.5 Temporal Allocation Messages

Column Description
SCC The source SCC from the inventory
FIPS The source FIPS code from the inventory
PLANTID For point sources, the plant ID/facility ID from the inventory
POINTID For point sources, the point ID/unit ID from the inventory
STACKID For point sources, the stack ID/release point ID from the inventory
PROCESSID For point sources, the segment/process ID from the inventory
POLL The source pollutant from the inventory
PROFILE_ID The matched temporal profile ID for the source
MESSAGE Message describing the issue with the source

7 Inventory Projection

7.1 Introduction

The inventory projection process involves taking a base year inventory and projecting it to a future year inventory based on expected future activity levels and emissions controls. Within the EMF, inventory projection is accomplished using the “Project Future Year Inventory” (PFYI) strategy in the Control Strategy Tool (CoST) module. The Project Future Year Inventory control strategy matches a set of user-defined Control Programs to selected emissions inventories to estimate the emissions reductions in the target future year specified by the user. The output of the PFYI strategy can be used to generate a future year emissions inventory.

Control programs are used to describe the expected changes to the base year inventory in the future. The data includes facility/plant closure information, control measures and their associated emissions impacts, growth or reduction factors to account for changes in activity levels, and other adjustments to emissions such as caps or replacements.

The CoST module is primarily used to estimate emissions reductions and costs incurred by applying different sets of control measures to emissions sources in a given year. CoST allows users to choose from several different algorithms (Control Strategies) for matching control measures to emission sources. Control strategies include “Maximum Emissions Reduction” (what is the maximum emissions reduction possible regardless of cost?) and “Least Cost” (what combination of control measures achieves a targeted emissions reduction at the least cost?).

Inventory projection has some underlying similarities to the “what if” control scenario processing available in CoST. For example, projecting an inventory requires a similar inventory source matching process and applying various factors to base emissions. However, there are some important differences between the two types of processing:

“What if” control strategies Inventory projection
Estimates emissions reductions and costs for the same year as the input inventory Estimates emissions changes for the selected future year
More concerned with cost estimates incurred by applying different control measures Minimal support for cost estimates; primary focus is emissions changes
Matches sources with control measures from the Control Measure Database (CMDB) Matches sources to data contained in user-created Control Programs

This section will detail the “Project Future Year Inventory” control strategy available in CoST. More information on general use of CoST is available in the CoST User’s Guide.

Fig. 7.1 shows the various datasets and processing steps used for inventory projection within the EMF.

Figure 7.1: Data workflow for inventory projection

One or more base year inventories are imported into the EMF as inventory datasets. Files containing the control program data such as plant closures, growth or reduction factors (projection data), controls, and caps and replacements (allowable data) are also imported as datasets.

For each growth or control dataset, the user creates a Control Program. A Control Program specifies the type of program (i.e. plant closures, control measures to apply, growth or reduction factors) and the start and end date of the program. The dataset associated with the program identifies the inventory sources affected by the program and the factors to apply (e.g. the control efficiency of the associated control measure or the expected emissions reduction in the future year).

To create a Project Future Year Inventory control strategy, the user selects the input base year inventories and control programs to consider. The primary output of the control strategy is a Strategy Detailed Result dataset for each input inventory. The Strategy Detailed Result dataset consists of pairings of emission sources and control programs, each of which contains information about the emission adjustment that would be achieved if the control program were to be applied to the source.

The Strategy Detailed Result dataset can optionally be combined with the input inventory to create a future year inventory dataset. This future year inventory dataset can be exported to an inventory data file. The future year inventory dataset can also be used as input for additional control strategies to generate controlled future year emissions.

7.2 Control Programs

7.2.1 Introduction

The Project Future Year Inventory strategy uses various types of Control Programs to specify the expected changes to emissions between the base year and the future year. Each Control Program has a start date indicating when the control program takes effect, an optional end date, and an associated dataset which contains the program-specific factors to apply and source-matching information. There are four major types of control programs: Plant Closure, Projection, Control, and Allowable.

7.2.1.1 Plant Closure

A Plant Closure Control Program identifies specific plants to close. Each record in the plant closure dataset consists of:

  - source-matching information, such as the country/state/county code, the plant/facility ID, and optionally more specific point-source characteristics
  - the effective date of the closure

Using the source matching options, you can specify particular stacks to close or close whole plants.

7.2.1.2 Projection

A Projection Control Program is used to apply growth or reduction factors to inventory emissions. Each record in the projection dataset consists of:

  - source-matching information identifying the affected sources (e.g. region code, SCC, point-source IDs, pollutant)
  - the projection (growth or reduction) factor to apply

7.2.1.3 Control

A Control-type Control Program is used to apply replacement or add-on control measures to inventory emissions. Each record in the control dataset consists of:

  - source-matching information identifying the affected sources
  - the control information to apply, such as the control efficiency and whether the measure is a replacement or an add-on

7.2.1.4 Allowable

An Allowable Control Program is used to apply caps on inventory emissions or replacements to inventory emissions. Allowable Control Programs are applied after the other types of programs so that the impacts of the other programs can be accounted for when checking for emissions over the specified cap. Each record in the allowable dataset consists of:

  - source-matching information identifying the affected sources
  - the cap or replacement emissions value to apply

7.2.2 Control Program Datasets

Each Control Program is associated with a dataset. Tbl. 7.1 lists the EMF dataset types corresponding to each Control Program type. The Control Program datasets were designed to be compatible with the SMOKE GCNTL (growth and controls) input file which uses the term “packet” to refer to the different types of control program data; the same term is used in the EMF.

Table 7.1: Control Program Types and Datasets
Control Program Type Dataset Types
Allowable Allowable Packet, Allowable Packet Extended
Control Control Packet, Control Packet Extended
Plant Closure Plant Closure Packet (CSV), Facility Closure Extended
Projection Projection Packet, Projection Packet Extended

The dataset formats named with “Extended” add additional options beyond the SMOKE-based formats. These extended formats use the same source information fields as Flat File 2010 inventories and also support monthly factors in addition to annual values. Tbl. 7.2 shows how the column names map between the extended and non-extended dataset formats.

Table 7.2: Extended Format Mapping
Extended Format Column Name Non-Extended Format Column Name Description
REGION_CD FIPS State/county code, or state code
FACILITY_ID PLANTID Plant ID for point sources
UNIT_ID POINTID Point ID for point sources
REL_POINT_ID STACKID Stack ID for point sources
PROCESS_ID SEGMENT Segment for point sources
MACT REG_CD Maximum Achievable Control Technology (MACT) code

The file formats for each control program dataset are listed in Sec. 7.7.

7.2.2.1 Source Matching Hierarchy

When building Control Program dataset records, you can use various combinations of source matching information depending on the level of specificity needed. For example, you could create a projection factor that applies to all sources with a particular SCC in the inventory regardless of geographic location. In this case, the SCC code would be specified but the region code would be left blank. If you need a different factor for particular regions, you can add additional records that specify both the SCC and region code with the more specific factor.

When matching the Control Program dataset records to inventory sources, more specific matches will be used over less specific ones. In the case of ties, a defined hierarchy is used to rank the matches. This hierarchy is listed in Sec. 7.8.

7.2.3 Control Program Manager

The main interface for creating and editing Control Programs is the Control Program Manager. To open the Control Program Manager, select Control Programs from the main Manage menu at the top of the EMF window. A list of existing control programs is displayed as shown in Fig. 7.2.

Figure 7.2: Control Program Manager

Tbl. 7.3 describes each column in the Control Program Manager window.

Table 7.3: Control Program Manager Columns
Column Description
Name A unique name or label for the control program.
Type The type of this control program. Options are Allowable, Control, Plant Closure, or Projection.
Start The start date of the control program. Used when selecting control programs to apply in a strategy’s target year.
Last Modified The most recent date and time when the control program was modified.
End The end date of the control program. Used when selecting control programs to apply in a strategy’s target year. If not specified, N/A will be displayed.
Dataset The name of the dataset associated with the control program.
Version The version of the associated dataset that the control program will use.

Using the Control Program Manager, you can select the control programs you want to work with by clicking the checkboxes in the Select column and then perform various actions related to those control programs. Tbl. 7.4 lists the buttons along the bottom of the Control Program Manager window and describes the action for each button.

Table 7.4: Control Program Manager Actions
Command Description
View Not currently active.
Edit Opens an Edit Control Program window for each of the selected control programs.
New Opens a New Control Program window to create a new control program.
Remove Deletes the selected control programs. Only the control program’s creator or an EMF administrator can delete a control program.
Copy Creates a copy of each selected control program with a unique name.
Close Closes the Control Program Manager window.

7.2.4 Creating a New Control Program

From the Control Program Manager, click the New button at the bottom of the window. The window to create a new control program is displayed as shown in Fig. 7.3.

Figure 7.3: New Control Program window

On the Summary tab, you can enter the details of the control program. Tbl. 7.5 describes each field.

Table 7.5: Control Program Summary Tab
Field Description
Name Enter a unique name or label for this control program; required.
Description Enter a description of the control program; optional.
Start Date The start date for the control program formatted as MM/DD/YYYY; required. When running a Project Future Year Inventory strategy, only control programs whose start date falls within the strategy’s Target Year will be considered.
End Date The end date for the control program formatted as MM/DD/YYYY; optional. If specified, the end date will be compared to the control strategy’s Target Year when deciding which control programs to consider.
Last Modified Date Last modification date and time of the control program; automatically set by the EMF.
Creator The EMF user who created the control program; automatically set by the EMF.
Type of Control Program Select from the list of four control program types: Allowable, Control, Plant Closure, or Projection; required.
Dataset Type Select the dataset type corresponding to the dataset you want to use for this control program.
Dataset Click the Select button to open the dataset selection window as shown in Fig. 7.4. Only datasets matching the selected dataset type are displayed. Select the dataset you want to use for this Control Program and click the OK button. You can use the Dataset name contains search box to narrow down the list of datasets if needed.
Version After you’ve selected the dataset, the Version pull-down lists the available versions of the dataset with the default version selected. You can select a different version of the dataset if appropriate.
Figure 7.4: Control Program dataset selection

Fig. 7.5 shows the New Control Program window with the data fields filled out. Once you’ve finished entering the details of the new control program, click the Save button to save the control program.

Figure 7.5: New Control Program window with data entered

Once a dataset has been selected for a control program, the View Data and View buttons to the right of the dataset name will open the Data Viewer (Fig. 3.21) or Dataset Properties View (Sec. 3.5) for the selected dataset.

7.2.4.1 Control Measures and Technologies

The Measures and Technologies tabs in the Edit Control Program window are only used when working with Control-type Control Programs.

When a Control-type control program is used in a Project Future Year Inventory control strategy, CoST will try to match each applied control packet record to a control measure in the Control Measure Database in order to estimate associated costs. You can specify a list of probable control measures or control technologies when you define the control program to limit the potential matches.

In the Edit Control Program window, the Measures tab (Fig. 7.6) lets you specify the control measures to include.

Figure 7.6: Control measures associated with a control program

Click the Add button to open the Select Control Measures window. As shown in Fig. 7.7, the Select Control Measures window lists all the defined control measures including the control measure’s name, abbreviation, and major pollutant.

Figure 7.7: Select Control Measures for Control Program

You can use the filtering and sorting options to find the control measures of interest. Select the control measures you want to add then click the OK button to add the control measures to the Control Program and return to the Edit Control Program window.

To remove control measures, select the appropriate control measures, then click the Remove button.

The Technologies tab in the Edit Control Program window (Fig. 7.8) allows you to specify particular control technologies associated with the control program.

Figure 7.8: Control technologies associated with a control program

Click the Add button to open the Select Control Technologies window. As shown in Fig. 7.9, the Select Control Technologies window lists all the defined control technologies by name and description.

Figure 7.9: Select Control Technologies for Control Program

You can use the filtering and sorting options to find the control technologies of interest. Select the control technologies you want to add then click the OK button to add the control technologies to the Control Program and return to the Edit Control Program window.

To remove control technologies, select the appropriate control technologies, then click the Remove button.

7.3 Creating a Project Future Year Inventory Control Strategy

To create a Project Future Year Inventory Control Strategy, first open the Control Strategy Manager by selecting Control Strategies from the main Manage menu. Fig. 7.10 shows the Control Strategy Manager window.

Figure 7.10: Control Strategy Manager

Click the New button to start creating the control strategy. You will first be prompted to enter a unique name for the control strategy as shown in Fig. 7.11.

Figure 7.11: New control strategy name

Almost all of the strategy parameters for the Project Future Year Inventory strategy have the same meaning and act in the same way as they do for the Maximum Emissions Reduction strategy, such as cost year, inventory filter, and county dataset. This section focuses on parameters or inputs that differ for the Project Future Year Inventory strategy type.

7.3.1 Summary Information

The Summary tab displays high-level parameters about the control strategy (Fig. 7.12).

Figure 7.12: Project Future Year Inventory Summary tab

Parameters of interest for the Project Future Year Inventory strategy:

  - Type of Analysis: set to Project Future Year Inventory.
  - Target Year: the future year that the strategy will project emissions to. Only control programs whose start and end dates are consistent with the target year will be considered (see Tbl. 7.5).

7.3.2 Inventories

The Project Future Year Inventory strategy can use inventories in the following dataset types: Flat File 2010 Point, Flat File 2010 Nonpoint, ORL point, ORL nonpoint, ORL nonroad, or ORL onroad. Multiple inventories can be processed in a single strategy. Note that multiple versions of each inventory may be available, and the appropriate version must be selected before running the control strategy.

7.3.3 Control Programs

The Programs tab in the Edit Control Strategy window is used to select which control programs should be considered in the strategy. Fig. 7.13 shows the Programs tab for an existing control strategy.

Figure 7.13: Project Future Year Inventory Programs tab

Click the Add button to bring up the Select Control Programs window as shown in Fig. 7.14.

Figure 7.14: Select Control Programs for PFYI strategy

In the Select Control Programs window, you can select which control programs to use in your PFYI control strategy. The table displays the name, control program type, and description for all defined control programs. You can use the filter and sorting options to help find the control programs you are interested in. Select the checkbox next to each control program to add and then click the OK button to return to the Programs tab.

To remove control programs from the strategy, select the programs to remove and then click the Remove button. The Edit button will open an Edit Control Program window for each of the selected control programs.

More than one of the same type of control program can be added to a strategy. For example, you could add three Plant Closure Control Programs: Cement Plant Closures, Power Plant Closures, and Boiler Closures. All three of these control programs would be evaluated and a record of the evaluation would be stored in the Strategy Detailed Result. If there happen to be multiple Projection, Control, or Allowable Type Control Programs added to a strategy, packets of the same type are merged into one packet during the matching analysis so that no duplicate source-control-packet pairings are created. Duplicate records will be identified during the run process and the user will be prompted to remove duplicates before the core algorithm performs the projection process.

7.3.4 Constraints

Fig. 7.15 shows the Constraints tab for a Project Future Year Inventory strategy. The only constraint used by PFYI strategies is a strategy-specific constraint named Minimum Percent Reduction Difference for Predicting Controls (%). This constraint determines whether a predicted control measure has a similar percent reduction to the percent reduction specified in the Control Program Control Packet.

Figure 7.15: Project Future Year Inventory Constraints tab
Figure 7.15: Project Future Year Inventory Constraints tab

7.4 Running the Control Strategy

To run the Project Future Year Inventory control strategy, click the Run button at the bottom of the Edit Control Strategy window. The EMF will begin running the strategy. Check the Status window (Sec. 2.6.5) to monitor the status of the run.

7.4.1 Control Program Application Order

The Project Future Year Inventory strategy processes Control Programs in the following order:

  1. Plant Closure control programs
  2. Projection control programs
  3. Control type control programs
  4. Allowable control programs

The Control analysis is dependent on the Projection analysis; likewise, the Allowable analysis is dependent on the Projection and Control analyses. The adjusted source emission values need to be carried along from each analysis step to make sure each portion of the analysis applies the correct adjustment factor. For example, a source could be projected, and also controlled, in addition to having a cap placed on the source. Or, a source could have a projection or control requirement, or perhaps just a cap or replacement requirement.
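
For example (with illustrative numbers only): a source with base-year emissions of 100 tons that matches a Projection Packet factor of 0.9 and a Control Packet with a 75% percent reduction would first be projected to 100 x 0.9 = 90 tons, then controlled to 90 x (1 - 75/100) = 22.5 tons; any Allowable Packet cap or replacement would then be evaluated against the 22.5-ton adjusted value rather than against the original 100 tons.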

7.5 Outputs from the Control Strategy

7.5.1 Strategy Detailed Result

The main output for each control strategy is a table called the Strategy Detailed Result. This dataset consists of pairings of emission sources and control programs, each of which contains information about the emission adjustment that would be achieved if the control program were to be applied to the source, along with the cost of application. The Strategy Detailed Result table can be used with the original input inventory to produce, in an automated manner, a controlled emissions inventory that reflects implementation of the strategy; this inventory includes information about the control programs that have been applied to the controlled sources. The controlled inventory can then be directly input to the SMOKE modeling system to prepare air quality model-ready emissions data. In addition, comments are placed at the top of the inventory file to indicate the strategy that produced it and the settings of the high-level parameters that were used to run the strategy.

The columns in the Strategy Detailed Result dataset are described in Sec. 7.9, Tbl. 7.14.

7.5.2 Strategy Messages

In addition to the Strategy Detailed Result dataset, CoST automatically generates a Strategy Messages dataset. The Strategy Messages output provides useful information that is gathered while the strategy is running. This output can store ERROR and WARNING types of messages. If an ERROR is encountered during the prerun validation process, the strategy run will be canceled, and the user can peruse this dataset to see what problems the strategy has (e.g., duplicate packet records).

The columns in the Strategy Messages dataset are described in Sec. 7.9, Tbl. 7.15.

7.6 Creating Future Year Inventories

After the Project Future Year Inventory control strategy has been run, you can create a future year emissions inventory. From the Outputs tab, select the Strategy Detailed Result for the base year inventory and select the Controlled Inventory radio button as shown in Fig. 7.16.

Figure 7.16: Creating a future year inventory
Figure 7.16: Creating a future year inventory

Click the Create button to begin creating the future year inventory. Monitor the Status window for messages and to see when the process is complete.

The future year inventory will automatically be added as a dataset matching the dataset type of the base year inventory. The new dataset’s description will contain comments indicating the strategy used to produce it and the high-level settings for that strategy.

For ORL Inventories:

For the sources that were controlled, CoST fills in the CEFF (control efficiency), REFF (rule effectiveness), and RPEN (rule penetration) columns based on the Control Packets applied to the sources. The CEFF column is populated differently for a replacement Control Packet record than for an add-on Control Packet record. For a replacement control, the CEFF column is populated with the percent reduction of the replacement control. For an add-on control, the CEFF column is populated with the overall combined percent reduction of the add-on control plus the preexisting control, using the following formula:

(1 – {[1 – (existing percent reduction / 100)] x [1 – (add-on percent reduction / 100)]}) x 100

For both types of Control Packet records (add-on or replacement), the REFF and RPEN are defaulted to 100 since the CEFF accounts for any variation in the REFF and RPEN by using the percent reduction instead of solely the CEFF.

Note that only Control Packets (not Plant Closure, Projection, or Allowable packets) will be used to help populate the columns discussed above.

For Flat File 2010 Inventories:

For the sources that were controlled, CoST fills in the annual (ANN_PCT_RED) and monthly percent reduction (JAN_PCT_RED) columns based on the values for the Control Packet that was applied to the sources. The CEFF column is populated differently for a replacement control than for an add-on control. For a replacement control, the CEFF column is populated with the percent reduction of the replacement control. For an add-on control, the CEFF column is populated with the overall combined percent reduction of the add-on control plus the preexisting control, using the following formula:

(1 – {[1 – (existing percent reduction / 100)] x [1 – (add-on percent reduction / 100)]}) x 100

For both types of measures, the REFF and RPEN values are defaulted to 100, because the CEFF accounts for any variation in the REFF or RPEN by using the percent reduction instead of the CEFF.
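
To make the arithmetic concrete, the following SQL sketch evaluates the combined-reduction formula above for a hypothetical source that has a 50% existing control and receives an 80% add-on control (the column names are illustrative and are not part of any EMF schema):

SELECT (1 - (1 - existing_pct_red / 100.0)
          * (1 - addon_pct_red / 100.0)) * 100 AS combined_pct_red
FROM (VALUES (50.0, 80.0)) AS ctl(existing_pct_red, addon_pct_red);
-- returns 90.0, the combined percent reduction written to the controlled inventory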

CoST also populates several additional columns toward the end of the ORL and Flat File 2010 inventory rows that specify information about measures that it has applied. These columns are:

7.7 Control Program Dataset Formats

7.7.1 Plant Closure Packet

The format of the Plant Closure Packet described in Tbl. 7.6 is based on the CSV format. The first row of this dataset file must contain the column header definition as defined in Line 1 of Tbl. 7.6. All the columns specified here must be included in the dataset import file.

Table 7.6: Plant Closure Packet Data Format
Line Position Description
1 A..H Column header definition - must contain the following columns: fips,plantid,pointid,stackid,segment,plant,effective_date,reference
2+ A Country/State/County code, required
B Plant Id for point sources, optional; blank, zero, or -9 if not specified; leave blank for nonpoint inventories
C Point Id for point sources, optional; blank, zero, or -9 if not specified; leave blank for nonpoint inventories
D Stack Id for point sources, optional; blank, zero, or -9 if not specified; leave blank for nonpoint inventories
E Segment for point sources, optional; blank, zero, or -9 if not specified; leave blank for nonpoint inventories
F Plant name or description, for point sources, optional; leave blank for nonpoint inventories
G Effective Date, the effective date for the plant closure to take place. If this effective date falls after the closure effective cutoff date, the plant will not be closed. A blank value is assumed to mean that the sources matched from this record will be closed regardless. The strategy target year is the year used in the closure effective cutoff date check. See Sec. 7.7.8 for more information.
H Reference, contains reference information for closing the plant
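
For illustration, a minimal Plant Closure Packet import file might look like the following (all data values are hypothetical):

fips,plantid,pointid,stackid,segment,plant,effective_date,reference
37063,0001,001,1,1,Example Plant,07/01/2016,Example closure reference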

7.7.2 Facility Closure Extended

The Facility Closure Extended format (Tbl. 7.7) is similar to the Plant Closure Packet but uses column names consistent with the Flat File 2010 inventories. The format also contains additional columns that may be used in the future to further enhance the inventory source matching capabilities: COUNTRY_CD, TRIBAL_CODE, SCC, and POLL.

Table 7.7: Facility Closure Extended Data Format
Column Description
Country_cd Country code, optional; currently not used in matching process
Region_cd State/county code, or state code with blank for county, or zero (or blank or -9) for all state/county or state codes
Facility_id Facility ID for point sources, optional; blank, zero, or -9 if not specified; leave blank for nonpoint inventories
Unit_id Unit ID for point sources, optional; blank, zero, or -9 if not specified; leave blank for nonpoint inventories
Rel_point_id Release Point ID for point sources, optional; blank, zero, or -9 if not specified; leave blank for nonpoint inventories
Process_id Process ID for point sources, optional; blank, zero, or -9 if not specified; leave blank for nonpoint inventories
Facility_name Facility name or description, for point sources, optional; leave blank for nonpoint inventories
Tribal_code Tribal code, optional; currently not used in matching process
SCC 8- or 10-digit SCC, optional; blank, zero, or -9 if not an SCC-specific closure; currently not used in matching process
Poll Pollutant name, optional; blank, zero, or -9 if not a pollutant-specific closure; currently not used in matching process
Effective_date Effective Date, the effective date for the plant closure to take place. If this effective date falls after the closure effective cutoff date, the plant will not be closed. A blank value is assumed to mean that the sources matched from this record will be closed regardless. The strategy target year is the year used in the closure effective cutoff date check. See Sec. 7.7.8 for more information.
Comment Information about this record and how it was produced and entered by the user.

7.7.3 Projection Packet

The format of the Projection Packet (Tbl. 7.8) is based on the SMOKE file format as defined in the SMOKE User’s Manual. One modification was made to enhance this packet’s use in CoST: the unused SMOKE column at position K is now used to store the NAICS code.

Table 7.8: Projection Packet Data Format
Line Position Description
1 A /PROJECTION <4-digit from year> <4-digit to year>/
2+ A # Header entry. Header is defined by the # as the first character on the line
3+ A Country/State/County code, or Country/state code with blank for county, or zero (or blank or -9) for all Country/State/County or Country/state codes
B 8- or 10-digit SCC, optional; blank, zero, or -9 if not an SCC-specific projection
C Projection factor [enter number on fractional basis; e.g., enter 1.2 to increase emissions by 20%]
D Pollutant; blank, zero, or -9 if not a pollutant-specific projection
E Standard Industrial Category (SIC), optional; blank, zero, or -9 if not an SIC-specific projection
F Maximum Achievable Control Technology (MACT) code, optional, blank, zero, or -9 if not a MACT-specific projection
G Plant Id for point sources, optional; blank, zero, or -9 if not specified; leave blank for nonpoint inventories
H Point Id for point sources, optional; blank, zero, or -9 if not specified; leave blank for nonpoint inventories
I Stack Id for point sources, optional; blank, zero, or -9 if not specified; leave blank for nonpoint inventories
J Segment for point sources, optional; blank, zero, or -9 if not specified; leave blank for nonpoint inventories
K North American Industry Classification (NAICS) Code, optional; blank, zero, or -9 if not a NAICS-specific projection
L Characteristic 5 (blank for ORL inventory input format), optional
4 A /END/

7.7.4 Projection Packet Extended

The format of the Projection Packet Extended (Tbl. 7.9) dataset is not based on the SMOKE format; instead, it is based on the EMF Flexible File Format, which is a CSV-based format. This new format uses column names that are aligned with the Flat File 2010 dataset types in the EMF system. The format also supports monthly projection factors in addition to annual projection factors. For example, instead of using the FIPS code, the new format uses the REGION_CD column, and instead of PLANTID the new format uses FACILITY_ID. The appropriate mapping between the old and new formats is described in Tbl. 7.2. The new format also contains additional columns that will be used in the future to help further enhance the inventory source matching capabilities; these include COUNTRY_CD, TRIBAL_CODE, CENSUS_TRACT_CD, SHAPE_ID, and EMIS_TYPE.

Table 7.9: Projection Packet Extended Data Format
Column Description
Country_cd Country code, optional; currently not used in matching process
Region_cd State/county code, or state code with blank for county, or zero (or blank or -9) for all state/county or state codes
Facility_id Facility ID (aka Plant ID in ORL format) for point sources, optional; blank, zero, or -9 if not specified; leave blank for nonpoint inventories
Unit_id Unit ID (aka Point ID for ORL format) for point sources, optional; blank, zero, or -9 if not specified; leave blank for nonpoint inventories
Rel_point_id Release Point ID (aka Stack ID in ORL format) for point sources, optional; blank, zero, or -9 if not specified; leave blank for nonpoint inventories
Process_id Process ID (aka Segment on ORL format) for point sources, optional; blank, zero, or -9 if not specified; leave blank for nonpoint inventories
Tribal_code Tribal code, optional; currently not used in matching process
Census_tract_cd Census tract ID, optional; currently not used in matching process
Shape_id Shape ID, optional; currently not used in matching process
Emis_type Emission type, optional; currently not used in matching process
SCC 8- or 10-digit SCC, optional; blank, zero, or -9 if not an SCC-specific control
Poll Pollutant; blank, zero, or -9 if not a pollutant-specific projection
Reg_code Regulatory code (aka Maximum Achievable Control Technology code), optional; blank, zero, or -9 if not a regulatory code-specific control
SIC Standard Industrial Category (SIC), optional; blank, zero, or -9 if not an SIC-specific control
NAICS North American Industry Classification (NAICS) code, optional; blank, zero, or -9 if not a NAICS-specific control
Ann_proj_factor The annual projection factor used to adjust the annual emission of the inventory. The number is stored as a fraction rather than a percentage; e.g., enter 1.2 to increase emissions by 20% (double precision).
The annual projection factor is also used as a default for monthly-specific projection factors when they are not specified. If you do not want a monthly-specific projection factor to be applied, make sure the annual projection factor is also left unspecified, since the annual value would otherwise be used as the default.
Jan_proj_factor The projection factor used to adjust the monthly January emission of the inventory (the jan_value column of the FF10 inventory). The number is stored as a fraction rather than a percentage; e.g., enter 1.2 to increase emissions by 20% (double precision).
If no January projection factor is specified, the annual projection factor value will be used as a default.
The monthly-specific projection factor fields are not used on the older ORL inventory formats; only the annual projection factor field will be used on these older formats.
Feb_proj_factor Analogous to the January projection factor, above.
Dec_proj_factor The projection factor used to adjust the monthly December emission of the inventory (the dec_value column of the FF10 inventory). The number is stored as a fraction rather than a percentage; e.g., enter 1.2 to increase emissions by 20% (double precision).
If no December projection factor is specified, the annual projection factor value will be used as a default.
The monthly-specific projection factor fields are not used on the older ORL inventory formats; only the annual projection factor field will be used on these older formats.
Comment Information about this record and how it was produced and entered by the user.

7.7.5 Control Packet

The format of the Control Packet (Tbl. 7.10) is based on the SMOKE file format as defined in the SMOKE User’s Manual. Several modifications were made to enhance the packet’s use in CoST:

  1. The unused SMOKE column at position D is now used to store the primary control measure abbreviation; if one is specified, this measure is used on any source that was matched with those control packet entries.
  2. The unused SMOKE column at position P is used to store the compliance date the control can be applied to sources.
  3. The unused SMOKE column at position Q is used to store the NAICS code.
Table 7.10: Control Packet Data Format
Line Position Description
1 A /CONTROL/
2+ A # Header entry. Header is indicated by use of “#” as the first character on the line.
3+ A Country/state/county code, or country/state code with blank for county, or zero (or blank or -9) for all country/state/county or country/state codes
B 8- or 10-digit SCC, optional; blank, zero, or -9 if not an SCC-specific control
C Pollutant; blank, zero, or -9 if not a pollutant-specific control
D Primary control measure abbreviation; blank, zero, or -9 applies to all measures in the Control Measure Database
E Control efficiency; value should be a percent (e.g., enter 90 for a 90% control efficiency)
F Rule effectiveness; value should be a percent (e.g., enter 50 for a 50% rule effectiveness)
G Rule penetration rate; value should be a percent (e.g., enter 80 for an 80% rule penetration)
H Standard Industrial Category (SIC); optional, blank, zero, or -9 if not an SIC-specific control
I Maximum Achievable Control Technology (MACT) code; optional, blank, zero, or -9 if not a MACT-specific control
J Application control flag:
Y = control is applied to inventory
N = control will not be used
K Replacement flag:
A = control is applied in addition to any controls already on source
R = control replaces any controls already on the source
L Plant ID for point sources, optional; blank, zero, or -9 if not specified; leave blank for nonpoint inventories
M Point ID for point sources, optional; blank, zero, or -9 if not specified; leave blank for nonpoint inventories
N Stack ID for point sources, optional; blank, zero, or -9 if not specified; leave blank for nonpoint inventories
O Segment for point sources, optional; blank, zero, or -9 if not specified; leave blank for nonpoint inventories
P Compliance Date. The compliance date on which a control can be applied to sources; prior to this date, the control will not be applied. A blank value is assumed to mean that the control is within the compliance date and the sources matched from this record will be controlled regardless. The strategy target year is the year that is used in the control compliance cutoff date check. See Sec. 7.7.8 for more information.
Q North American Industry Classification (NAICS) Code, optional; blank, zero, or -9 if not a NAICS-specific control
4 A /END/

7.7.6 Control Packet Extended

The format of the Control Packet Extended (Tbl. 7.11) dataset is not based on the SMOKE format; instead, it is based on the EMF Flexible File Format, which is a CSV-based format. This new format uses column names that are aligned with the Flat File 2010 dataset types in the EMF system. The format also contains additional columns that will be used in the future to help further enhance the inventory source matching capabilities: COUNTRY_CD, TRIBAL_CODE, CENSUS_TRACT_CD, SHAPE_ID, and EMIS_TYPE.

Table 7.11: Control Extended Packet Data Format
Column Description
Country_cd Country code, optional; currently not used in matching process
Region_cd State/county code, or state code with blank for county, or zero (or blank or -9) for all state/county or state codes
Facility_id Facility ID (aka Plant ID in ORL format) for point sources, optional; blank, zero, or -9 if not specified; leave blank for nonpoint inventories
Unit_id Unit ID (aka Point ID for ORL format) for point sources, optional; blank, zero, or -9 if not specified; leave blank for nonpoint inventories
Rel_point_id Release Point ID (aka Stack ID in ORL format) for point sources, optional; blank, zero, or -9 if not specified; leave blank for nonpoint inventories
Process_id Process ID (aka Segment on ORL format) for point sources, optional; blank, zero, or -9 if not specified; leave blank for nonpoint inventories
Tribal_code Tribal code, optional; currently not used in matching process
Census_tract_id Census tract ID, optional; currently not used in matching process
Shape_id Shape ID, optional; currently not used in matching process
Emis_type Emission type, optional; currently not used in matching process
SCC 8- or 10-digit SCC, optional; blank, zero, or -9 if not an SCC-specific control
Poll Pollutant; blank, zero, or -9 if not a pollutant-specific control
Reg_code Regulatory code (aka Maximum Achievable Control Technology code), optional; blank, zero, or -9 if not a regulatory code-specific control
SIC Standard Industrial Category (SIC), optional; blank, zero, or -9 if not an SIC-specific control
NAICS North American Industry Classification (NAICS) code, optional; blank, zero, or -9 if not a NAICS-specific control
Compliance_Date Compliance Date. The compliance date on which a control can be applied to sources; prior to this date, the control will not be applied. A blank value is assumed to mean that the control is within the compliance date and the sources matched from this record will be controlled regardless. The strategy target year is the year used in the control compliance cutoff date check. See Sec. 7.7.8 for more information.
Application_control Application control flag:
Y = control is applied to inventory
N = control will not be used
Replacement Replacement flag:
A = control is applied in addition to any controls already on source
R = control replaces any controls already on the source
Pri_cm_abbrev Primary control measure abbreviation (from the Control Measure Database) that defines the control packet record
Ann_pctred The percent reduction of the control (value should be a percent; e.g., enter 90 for a 90% reduction) to apply to the annual emission factor; the percent reduction can be considered a combination of the control efficiency, rule effectiveness, and rule penetration (CE * RE/100 * RP/100; a worked example follows this table).
The annual percent reduction field is used to reduce the annual emission of the inventory (the ann_value column of the FF10 inventory formats contains the annual emission value).
The annual percent reduction is also used as a default for monthly-specific percent reductions when they are not specified. If you do not want a monthly-specific percent reduction to be applied, make sure the annual percent reduction is also left unspecified, since the annual value would otherwise be used as the default.
Jan_pctred The percent reduction of the control to apply to the monthly January emission factor (the jan_value column of the FF10 inventory).
If no January percent reduction is specified, the annual percent reduction value will be used as a default.
The monthly-specific percent reduction fields are not used on the older ORL inventory formats; only the annual percent reduction field will be used on these older formats.
Feb_pctred Analogous to the January percent reduction, above.
Dec_pctred The percent reduction of the control to apply to the monthly December emission factor (the dec_value column of the FF10 inventory).
If no December percent reduction is specified, the annual percent reduction value will be used as a default.
The monthly-specific percent reduction fields are not used on the older ORL inventory formats; only the annual percent reduction field will be used on these older formats.
Comment Information about this record and how it was produced and entered by the user.
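
As the worked example promised above: a control with a 90% control efficiency, 50% rule effectiveness, and 80% rule penetration corresponds to an annual percent reduction of 90 * 50/100 * 80/100 = 36, i.e., a 36% reduction.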

7.7.7 Allowable Packet

The format of the Allowable Packet (Tbl. 7.12) is based on the SMOKE file format as defined in the SMOKE User’s Manual. Two modifications were made to enhance this packet’s use in CoST:

  1. The unused SMOKE column at position L is now used to store the compliance date that the cap or replacement emission value can be applied to a source.
  2. The unused SMOKE column at position M is used to store the NAICS code.
Table 7.12: Allowable Data Format
Line Position Description
1 A /ALLOWABLE/
2+ A # Header entry. Header is indicated by use of “#” as the first character on the line.
3+ A Country/state/county code, or country/state code with blank for county, or zero (or blank or -9) for all country/state/county or country/state codes
B 8- or 10-digit SCC, optional; blank, zero, or -9 if not an SCC-specific cap or replacement
C Pollutant; blank, zero, or -9 if not a pollutant-specific control; in most cases, the cap or replacement value will be a pollutant-specific value, and that pollutant’s name needs to be placed in this column
D Control factor (no longer used by SMOKE or CoST; enter -9 as placeholder)
E Allowable emissions cap value (tons/day) (required if no “replace” emissions are given)
F Allowable emissions replacement value (tons/day) (required if no “cap” emissions are given)
G Standard Industrial Category (SIC), optional; blank, zero, or -9 if not an SIC-specific cap or replacement
H Plant ID for point sources, optional; blank, zero, or -9 if not specified; leave blank for nonpoint inventories
I Point ID for point sources, optional; blank, zero, or -9 if not specified; leave blank for nonpoint inventories
J Stack ID for point sources, optional; blank, zero, or -9 if not specified; leave blank for nonpoint inventories
K Segment for point sources, optional; blank, zero, or -9 if not specified; leave blank for nonpoint inventories
L Compliance Date. The compliance date on which a cap or replacement entry can be applied to sources; prior to this date, the cap or replacement will not be applied. A blank value is assumed to mean that the cap or replacement is within the compliance date and is available for analysis. See Sec. 7.7.8 for more information.
M North American Industry Classification (NAICS) Code, optional; blank, zero, or -9 if not a NAICS-specific cap or replacement
4 A /END/

7.7.8 Effective and Compliance Date Handling

For control programs that use an effective date (plant closures) or compliance date (controls), CoST uses the control strategy target year to build a cutoff date to use when determining which programs are in effect. To specify the month and day of the cutoff date (used in combination with the target year), there are two EMF system-level properties. These properties are stored in the emf.properties table and are named COST_PROJECT_FUTURE_YEAR_EFFECTIVE_DATE_CUTOFF_MONTHDAY (for effective dates) and COST_PROJECT_FUTURE_YEAR_COMPLIANCE_DATE_CUTOFF_MONTHDAY (for compliance dates). To set a cutoff month/day of October 1, the property value would be “10/01”.

For a strategy with a target year of 2020 and an effective cutoff month/day of 10/01, the closure effective cutoff date is 10/01/2020.

Closure Record Effective Date | Outcome
07/01/2013 | Effective date is before the cutoff date, so all sources matching this record will be closed
blank | All sources matching this record will be closed
11/15/2020 | Effective date is after the cutoff date, so matching sources will not be closed
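
The check itself is simple date logic. The following SQL sketch reproduces the three outcomes in the table above for the 10/01/2020 cutoff (the packet layout shown here is illustrative only, and the emf.properties column names in the comment are assumptions):

-- Setting the effective-date cutoff month/day (column names assumed):
-- UPDATE emf.properties SET value = '10/01'
--   WHERE name = 'COST_PROJECT_FUTURE_YEAR_EFFECTIVE_DATE_CUTOFF_MONTHDAY';

SELECT eff_date,
       (eff_date IS NULL OR eff_date < DATE '2020-10-01') AS closure_applies
FROM (VALUES (DATE '2013-07-01'), (NULL::date), (DATE '2020-11-15'))
     AS closures(eff_date);
-- returns true, true, false, matching the three cases above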

7.8 Control Program Source Matching Hierarchy

Tbl. 7.13 lists the source matching combinations, the inventory types the matching criteria can be used for, and the Control Program Packet Types that can use these criteria.

Table 7.13: Control Packet Source Matching Hierarchy
Ranking | Matching Combination | Inventory Types | Control Program Types
1 | Country/State/County code, plant ID, point ID, stack ID, segment, 8-digit SCC code, pollutant | point | allowable, control, projection, plant closure
2 | Country/State/County code, plant ID, point ID, stack ID, segment, pollutant | point | allowable, control, projection, plant closure
3 | Country/State/County code, plant ID, point ID, stack ID, pollutant | point | allowable, control, projection, plant closure
4 | Country/State/County code, plant ID, point ID, pollutant | point | allowable, control, projection, plant closure
5 | Country/State/County code, plant ID, 8-digit SCC code, pollutant | point | allowable, control, projection, plant closure
6 | Country/State/County code, plant ID, MACT code, pollutant | point | control, projection
7 | Country/State/County code, plant ID, pollutant | point | allowable, control, projection, plant closure
8 | Country/State/County code, plant ID, point ID, stack ID, segment, 8-digit SCC code | point | allowable, control, projection, plant closure
9 | Country/State/County code, plant ID, point ID, stack ID, segment | point | allowable, control, projection, plant closure
10 | Country/State/County code, plant ID, point ID, stack ID | point | allowable, control, projection, plant closure
11 | Country/State/County code, plant ID, point ID | point | allowable, control, projection, plant closure
12 | Country/State/County code, plant ID, 8-digit SCC code | point | allowable, control, projection, plant closure
13 | Country/State/County code, plant ID, MACT code | point | control, projection
14 | Country/State/County code, plant ID | point | allowable, control, projection, plant closure
15 | Country/State/County code, MACT code, 8-digit SCC code, pollutant | point, nonpoint | control, projection
16 | Country/State/County code, MACT code, pollutant | point, nonpoint | control, projection
17 | Country/State code, MACT code, 8-digit SCC code, pollutant | point, nonpoint | control, projection
18 | Country/State code, MACT code, pollutant | point, nonpoint | control, projection
19 | MACT code, 8-digit SCC code, pollutant | point, nonpoint | control, projection
20 | MACT code, pollutant | point, nonpoint | control, projection
21 | Country/State/County code, 8-digit SCC code, MACT code | point, nonpoint | control, projection
22 | Country/State/County code, MACT code | point, nonpoint | control, projection
23 | Country/State code, 8-digit SCC code, MACT code | point, nonpoint | control, projection
24 | Country/State code, MACT code | point, nonpoint | control, projection
25 | MACT code, 8-digit SCC code | point, nonpoint | control, projection
26 | MACT code | point, nonpoint | control, projection
27 | Country/State/County code, NAICS code, 8-digit SCC code, pollutant | point, nonpoint | control, projection
28 | Country/State/County code, NAICS code, pollutant | point, nonpoint | control, projection
29 | Country/State code, NAICS code, 8-digit SCC code, pollutant | point, nonpoint | control, projection
30 | Country/State code, NAICS code, pollutant | point, nonpoint | control, projection
31 | NAICS code, 8-digit SCC code, pollutant | point, nonpoint | control, projection
32 | NAICS code, pollutant | point, nonpoint | control, projection
33 | Country/State/County code, NAICS code, 8-digit SCC code | point, nonpoint | control, projection
34 | Country/State/County code, NAICS code | point, nonpoint | control, projection
35 | Country/State code, NAICS code, 8-digit SCC code | point, nonpoint | control, projection
36 | Country/State code, NAICS code | point, nonpoint | control, projection
37 | NAICS code, 8-digit SCC code | point, nonpoint | control, projection
38 | NAICS code | point, nonpoint | control, projection
39 | Country/State/County code, 8-digit SCC code, 4-digit SIC code, pollutant | point, nonpoint | allowable, control, projection
40 | Country/State/County code, 4-digit SIC code, pollutant | point, nonpoint | allowable, control, projection
41 | Country/State code, 8-digit SCC code, 4-digit SIC code, pollutant | point, nonpoint | allowable, control, projection
42 | Country/State code, 4-digit SIC code, pollutant | point, nonpoint | allowable, control, projection
43 | 4-digit SIC code, SCC code, pollutant | point, nonpoint | allowable, control, projection
44 | 4-digit SIC code, pollutant | point, nonpoint | allowable, control, projection
45 | Country/State/County code, 4-digit SIC code, SCC code | point, nonpoint | allowable, control, projection
46 | Country/State/County code, 4-digit SIC code | point, nonpoint | allowable, control, projection
47 | Country/State code, 4-digit SIC code, SCC code | point, nonpoint | allowable, control, projection
48 | Country/State code, 4-digit SIC code | point, nonpoint | allowable, control, projection
49 | 4-digit SIC code, SCC code | point, nonpoint | allowable, control, projection
50 | 4-digit SIC code | point, nonpoint | allowable, control, projection
51 | Country/State/County code, 8-digit SCC code, pollutant | point, nonpoint, onroad, nonroad | allowable, control, projection
52 | Country/State code, 8-digit SCC code, pollutant | point, nonpoint, onroad, nonroad | allowable, control, projection
53 | 8-digit SCC code, pollutant | point, nonpoint, onroad, nonroad | allowable, control, projection
54 | Country/State/County code, 8-digit SCC code | point, nonpoint, onroad, nonroad | allowable, control, projection
55 | Country/State code, 8-digit SCC code | point, nonpoint, onroad, nonroad | allowable, control, projection
56 | 8-digit SCC code | point, nonpoint, onroad, nonroad | allowable, control, projection
57 | Country/State/County code, pollutant | point, nonpoint, onroad, nonroad | allowable, control, projection
58 | Country/State/County code | point, nonpoint, onroad, nonroad | allowable, control, projection, plant closure
59 | Country/State code, pollutant | point, nonpoint, onroad, nonroad | allowable, control, projection
60 | Country/State code | point, nonpoint, onroad, nonroad | allowable, control, projection, plant closure
61 | Pollutant | point, nonpoint, onroad, nonroad | allowable, control, projection

7.9 Control Strategy Output Dataset Formats

7.9.1 Strategy Detailed Result

Table 7.14: Columns in the Strategy Detailed Result
Column Description
SECTOR The source sector specified for the input inventory dataset.
CM_ABBREV For Plant Closure Packets, this column will be set to “PLTCLOSURE”.

For Projection Packets, this column will be set to “PROJECTION”.

For Control Packets, this column will be set to the abbreviation of the control measure that was applied to the source, if it was explicitly specified in the packet, or it could be the predicted measure abbreviation as found in the CMDB. If no measure can be found, then it will be set to “UNKNOWNMSR”.

For Allowable Packets, this column will be set to the predicted abbreviation of the control measure that was applied to the source. If no measure can be found, then it will be set to “UNKNOWNMSR”.
POLL The pollutant for the source, found in the inventory
SCC The SCC code for the source, found in the inventory
REGION_CD The state and county FIPS code for the source, found in the inventory
FACILITY_ID For point sources, the facility ID for the source from the inventory.
UNIT_ID For point sources, the unit ID for the source from the inventory.
REL_POINT_ID For point sources, the release point ID for the source from the inventory.
PROCESS_ID For point sources, the process ID for the source from the inventory.
ANNUAL_COST ($) The total annual cost (including both capital and operating and maintenance) required to keep the measure on the source for a year. Note that costs are adjusted to the strategy-defined “Cost Year” dollars.
CTL_ANN_COST_PER_TON ($/ton) This field is not used for the strategy type and is left blank/null.
EFF_ANN_COST_PER_TON ($/ton) The annual cost (both capital and operating and maintenance) to reduce one ton of the pollutant. Note that costs are adjusted to the strategy-defined “Cost Year” dollars.
ANNUAL_OPER_MAINT_COST ($) The annual cost to operate and maintain the measure once it has been installed on the source. Note that costs are adjusted to the strategy-defined “Cost Year” dollars.
ANNUAL_VARIABLE_OPER_MAINT_COST ($) The annual variable cost to operate and maintain the measure once it has been installed on the source. Note that costs are adjusted to the strategy-defined “Cost Year” dollars.
ANNUAL_FIXED_OPER_MAINT_COST ($) The annual fixed cost to operate and maintain the measure once it has been installed on the source. Note that costs are adjusted to the strategy-defined “Cost Year” dollars.
ANNUALIZED_CAPITAL_COST ($) The annualized cost of installing the measure on the source assuming a particular discount rate and equipment life. Note that costs are adjusted to the strategy-defined “Cost Year” dollars.
TOTAL_CAPITAL_COST ($) The total cost to install a measure on a source. Note that costs are adjusted to the strategy-defined “Cost Year” dollars.
CONTROL_EFF (%) The control efficiency as specified by the Control Packet or Allowable Packet. This field is null for Plant Closure and Projection Packets.
RULE_PEN (%) The rule penetration that is specified in the old Control Packet format.

For the new Control Extended Packet format, this is set to 100.

This field is null for Plant Closure and Projection Packets.
RULE_EFF (%) The rule effectiveness that is specified in the old Control Packet format.

For the new Control Extended Packet format, this is set to 100.

This field is null for Plant Closure and Projection Packets.
PERCENT_REDUCTION (%) The percent by which the emissions from the source are reduced after the Control Packet has been applied. This field is null for Plant Closure and Projection Packets.
ADJ_FACTOR The adjustment factor stores the Projection Packet factor that is applied to the source. This number is stored as a fraction rather than as a percentage.

This field is null for Plant Closure and Control Packets.
INV_CTRL_EFF (%) The control efficiency for the existing measure on the source, found in the inventory
INV_RULE_PEN (%) The rule penetration for the existing measure on the source, found in the inventory
INV_RULE_EFF (%) The rule effectiveness for the existing measure on the source, found in the inventory
FINAL_EMISSIONS (tons) The final emissions amount that results from the source’s being adjusted by the various Control Program Packets. This is computed by subtracting the emis_reduction field from the inv_emissions field.
CTL_EMIS_REDUCTION (tons) This field is not used for the strategy type and is left blank/null.
EFF_EMIS_REDUCTION (tons) This field is used to store the amount by which the emission was reduced for the particular Control Program Packet (Plant Closure, Projection, Control, or Allowable) that is being processed.
INV_EMISSIONS (tons) This field is used to store the beginning/input emission for the particular Control Program Packet (Plant Closure, Projection, Control, or Allowable) that is being processed.
APPLY_ORDER This field stores the Control Program Action Code that is being used on the source. These codes indicate whether the Control Program is applying a Plant Closure, Projection, Control, or Allowable Packet.
INPUT_EMIS (tons) This field is not used for the strategy type and is left blank/null.
OUTPUT_EMIS (tons) This field is not used for the strategy type and is left blank/null.
FIPSST The two-digit FIPS state code.
FIPSCTY The three-digit FIPS county code.
SIC The SIC code for the source from the inventory.
NAICS The NAICS code for the source from the inventory.
SOURCE_ID The record number from the input inventory for this source.
INPUT_DS_ID The numeric ID of the input inventory dataset (for bookkeeping purposes).
CS_ID The numeric ID of the control strategy
CM_ID This field is not used for the strategy type and is left blank/null.
EQUATION TYPE The control measure equation that was used during the cost calculations. If a minus sign is in front of the equation type, this indicates that the equation type was missing inputs and the strategy instead used the default approach to estimate costs.

Note that this field will be used only when Control Packets are applied, not when any of the other packet types are applied.
ORIGINAL_DATASET_ID This field is not used for the strategy type and is left blank/null.
SECTOR This field is not used for the strategy type and is left blank/null.
CONTROL_PROGRAM The control program that was applied to produce this record
XLOC The longitude for the source, found in the inventory for point sources; for nonpoint inventories, the county centroid is used. This is useful for mapping purposes.
YLOC The latitude for the source, found in the inventory for point sources; for nonpoint inventories, the county centroid is used. This is useful for mapping purposes.
FACILITY The facility name from the inventory (or county name for nonpoint sources)
REPLACEMENT_ADDON Indicates whether the Control Packet was applying a replacement or an add-on control.
A = Add-On Control
R = Replacement Control

Note that this field will be used only when Control Packets are applied, not when any of the other packet types are applied.
EXISTING_MEASURE_ABBREVIATION This field is not used for the strategy type and is left blank/null.
EXISTING_PRIMARY_DEVICE_TYPE_CODE This field is not used for the strategy type and is left blank/null.
STRATEGY_NAME This field is not used for the strategy type and is left blank/null.
CONTROL_TECHNOLOGY This field is not used for the strategy type and is left blank/null.
SOURCE_GROUP This field is not used for the strategy type and is left blank/null.
COUNTY_NAME This field is not used for the strategy type and is left blank/null.
STATE_NAME This field is not used for the strategy type and is left blank/null.
SCC_L1 This field is not used for the strategy type and is left blank/null.
SCC_L2 This field is not used for the strategy type and is left blank/null.
SCC_L3 This field is not used for the strategy type and is left blank/null.
SCC_L4 This field is not used for the strategy type and is left blank/null.
JAN_FINAL_EMISSIONS The monthly January final emission that results from the source’s being adjusted by the various Control Program Packets. This is computed by subtracting the monthly January emission reduction from the monthly January input emission.

This monthly-related field is populated only when projecting Flat File 2010 inventories.
FEB_FINAL_EMISSIONS Same as defined for the jan_final_emissions field but for February.
DEC_FINAL_EMISSIONS Same as defined for the jan_final_emissions field but for December.
JAN_PCT_RED The percent by which the source’s January monthly emission is reduced after the Control Packet has been applied.

This field is null for Plant Closure and Projection Packets.

This monthly-related field is only populated when projecting Flat File 2010 inventories.
FEB_PCT_RED Same as defined for the jan_pct_red field but for February
DEC_PCT_RED Same as defined for the jan_pct_red field but for December
COMMENT Information about this record and how it was produced; this can be either created automatically by the system or entered by the user.

7.9.2 Strategy Messages

Table 7.15: Columns in the Strategy Messages Dataset
Column Description
region_cd The state and county FIPS code for the source, found in the inventory
scc The SCC code for the source, found in the inventory
facility_id For point sources, the plant/facility ID for the source, found in the inventory
unit_id For point sources, the point/unit ID for the source, found in the inventory
rel_point_id For point sources, the stack/release point ID for the source, found in the inventory
process_id For point sources, the segment/process ID for the source, found in the inventory
poll The pollutant for the source, found in the inventory
status The status type. The possible values are Warning, Error, and Informational.
control_program The control program for the strategy run; this is populated only when using the PFYI strategy type.
message The text describing the strategy problem.
message_type Contains a high-level message-type category. Currently this is populated only when using the PFYI strategy type.
The possible values are listed below:
Inventory Level (or blank) - message has to do specifically with a problem with the inventory
Packet Level - message has to do specifically with a problem with the packet record being applied to the inventory
inventory Identifies the inventory with the problem.
packet_region_cd The state and county FIPS/region code for the source, found in the control program packet
packet_scc The SCC code for the source, found in the control program packet
packet_facility_id For point sources, the plant/facility ID for the source, found in the control program packet
packet_unit_id For point sources, the point/unit ID for the source, found in the control program packet
packet_rel_point_id For point sources, the stack/release point ID for the source, found in the control program packet
packet_process_id For point sources, the segment/process ID for the source, found in the control program packet
packet_poll The pollutant for the source, found in the control program packet
packet_sic The SIC code for the source, found in the control program packet
packet_mact The MACT/regulatory code for the source, found in the control program packet
packet_naics The NAICS code for the source, found in the control program packet
packet_compliance_effective_date The compliance or effective date, found in the control program packet. The compliance date is used in the Control Packet; the effective date is used in the Plant Closure Packet
packet_replacement Indicates whether the packet identifies a replacement versus an add-on control, found in the control program packet
packet_annual_monthly Indicates whether the packet is monthly based or annual based

8 Module Types and Modules

8.1 Introduction

The “module type” and “module” features have been developed as a component of the EMF and reuse many of its features (dataset types, datasets, client-server architecture, PostgreSQL database, etc.), while allowing users flexibility to utilize datasets in new ways through PostgreSQL commands.

8.2 Features

Both “module types” and “modules” are easy to use and flexible enough to address a wide variety of scenarios. They systematically track changes in algorithms, inputs, and assumptions; moreover, these changes are easy to document.

A module type defines an algorithm which can operate on input datasets and parameters and produces output datasets and parameters. Module types are equivalent to functions in most programming languages.

A simple module type implements the algorithm in PL/pgSQL, the SQL procedural language for the PostgreSQL database system. A composite module type implements the algorithm using a network of interconnected submodules based on other (simple or composite) module types.

A module is a construct that binds a module type’s inputs and outputs to concrete datasets and parameter values. Running a module executes the algorithm on the concrete datasets and parameter values bound to inputs and produces the datasets and parameters bound to outputs. Modules are equivalent to complete executable programs.

The module types and the modules are generic components and can be used to implement any model.

The module type and module features consist of:

A module’s outputs can be another module’s inputs. Consequently, modules can be organized into networks that model complex dataflows.

The relationship between Module Types and Modules is very similar to the relationship between Dataset Types and Datasets:

Figure 8.1: Datasets and Modules
Figure 8.1: Datasets and Modules

8.3 User Interface

8.3.1 Module Type Manager

The Module Type Manager window lists the existing module types and allows the user to view, edit, create, or remove module types. The user can create simple or composite module types.

Removing module types used by modules and other module types requires user confirmation:

Figure 8.2: Remove Module Type Confirmation
Figure 8.2: Remove Module Type Confirmation

Only users with administrative privileges can remove entire module types via the Module Type Manager window.

8.3.2 Module Type Version Manager

The Module Type Version Manager window lists all module type versions for the selected module type and allows the user to view, edit, copy, and remove module type versions. Only users with administrative privileges can remove module type versions that have been finalized.

8.3.3 Module Type Version Editor

The Module Type Version Properties window lists module type metadata (name, description, creator, tags, etc.), module type version metadata (version, name, description, etc.), datasets, parameters, and revision notes for the selected module type version. It also lists the algorithm for simple module types and the submodules and the connections for the composite module types. The user can select a parameter’s type from a limited (but configurable) list of SQL types (integer, varchar, etc.).

The user can indicate that a dataset or parameter is optional. For composite module types, if the target of a connection is optional then a source does not have to be selected. The UI prevents the user from connecting an optional source to a non-optional (required) target.

The algorithm for a simple module type must handle optional datasets and parameters. The following placeholders (macros) can be used to test if a dataset/parameter is optional and if a dataset/value was provided: ${placeholder-name.is_optional}, ${placeholder-name.is_set}, #{parameter-name.is_optional}, and #{parameter-name.is_set}. See Algorithm Syntax (Sec. 8.5).
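
As a minimal sketch, an algorithm fragment for a simple module type with a hypothetical optional parameter named increase_factor could guard its use as follows (factor is assumed to be a declared PL/pgSQL variable; the Module Runner replaces the #{...} macros with literal values before execution):

IF #{increase_factor.is_set} THEN
    factor := #{increase_factor};
ELSE
    factor := 1.0;  -- neutral default when the optional parameter is not bound
END IF;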

The user can change, save, validate, and finalize the module type version. The user is automatically prompted to add new revision notes every time new changes are saved. The validation step verifies (among other things) that all dataset placeholders in the algorithm are defined.

Updating a module type version used by modules and other composite module type versions requires user confirmation:

Figure 8.3: Update Module Type Version Confirmation
Figure 8.3: Update Module Type Version Confirmation

For a composite module type, finalizing a module type version requires finalizing all module type versions used by submodules, recursively. The user is shown the list of all required changes and the finalization proceeds only after the user agrees to all the changes.

Figure 8.4: Finalize Composite Module Type Version
Figure 8.4: Finalize Composite Module Type Version

When working with a composite module type, the Diagram tab displays a diagram illustrating the composite module type’s submodules, inputs, outputs, and connections. Each submodule is color-coded so that the submodule and its specific inputs and outputs can be identified. Overall inputs to the composite module type are shown with a white background. In the diagram, datasets are identified by boxes with blue borders, and dataset connections are shown with blue lines. Parameters use boxes with red borders, and parameter connections use red lines.

8.3.4 Module Manager

The Module Manager window lists the existing modules and allows the user to view, edit, create, copy, remove, compare, and run modules.

Figure 8.5: Module Manager
Figure 8.5: Module Manager

Users who do not have administrative privileges can only remove modules that they created, and only modules that have not been finalized. When removing a module, the user can choose to remove all datasets that were output by that module. Datasets that are used as inputs to other modules, or are in use by other parts of the EMF (e.g., control strategies, control programs), will not be deleted. Eligible output datasets will be fully deleted, the equivalent of Remove and Purge in the Dataset Manager.

The module comparison feature produces a side-by-side report listing all module attributes and the comparison results: MATCH, DIFFERENT, FIRST ONLY, SECOND ONLY.

Figure 8.6: Module Comparison
Figure 8.6: Module Comparison

8.3.5 Module Editor

The View/Edit Module window lists metadata (description, creator, tags, project, etc.), dataset bindings, parameter bindings, and execution history for the selected module. The user can bind concrete datasets to dataset placeholders and concrete values to input parameters. If a dataset/parameter is optional then a dataset/value binding is not required.

The View/Edit Module window also lists the algorithm for simple module types and the submodules, connections, internal datasets, and internal parameters for composite module types. The internal datasets and parameters are usually lost after a run, but the user can choose to keep some or all internal datasets and parameters (mostly for debugging). The user can change, save, validate, run, and finalize the selected module.

In the datasets tab the user can select and open a concrete dataset used or produced by the run (if any) and inspect the data. The user can also obtain the list of modules related to a concrete dataset. A module is related to a dataset if it produced the dataset as output or uses the dataset as input.

In the parameters tab the user can inspect the value of the output parameters as produced by the last run (only if the last run was successful).

A module can be finalized if the following conditions are met:

  1. The Module Type is final.
  2. The last module run was successful.
  3. The last module run (that is, the last history record) is up-to-date with respect to the module type (that is, the module type is older than the start of the last module run).
  4. The last module run is up-to-date with respect to the input and output datasets (that is, the input datasets are older than the start of the last run and the output datasets are older than the end of the last run).

Finalizing a module finalizes the input and output datasets also.

The View/Edit Module window has a status indicator that informs the user that the module is UpToDate or OutOfDate.

Figure 8.7: Viewing an Out-Of-Date Module
Figure 8.7: Viewing an Out-Of-Date Module

The Status button brings up a dialog box explaining why the module is Out-Of-Date.

Figure 8.8: Out-Of-Date Status
Figure 8.8: Out-Of-Date Status

A module is UpToDate when:

8.3.6 Module History

The Module History window lists all execution records for the selected module. The user can select and view each record in the Module History Details window.

Figure 8.9: Module History
Figure 8.9: Module History

8.3.7 Module History Details

The Module History Details window lists metadata, concrete datasets used or produced by the run (including the internal datasets the user chose to keep), the parameter values used or produced by the run (including the internal parameters the user chose to keep), the actual setup/user/teardown scripts executed by the database server for the module and each submodule, and detailed logs including error messages, if any. The user can select and open a concrete dataset used or produced by the run and inspect the data. The user can also obtain the list of modules related to a concrete dataset.

The setup script used by the Module Runner creates a temporary database user with very limited permissions. It also creates a temporary default schema for this user.

The actual user scripts executed by the database server for each simple module or submodule contain the algorithm (with all placeholders replaced) surrounded by wrapper/interfacing code generated by the Module Runner. The user script is executed under the restricted temporary database user account in order to protect the database from malicious or buggy code in the algorithm.

The teardown script drops the temporary schema and the temporary database user.

8.3.8 Dataset Manager

The Dataset Manager lists all datasets in the EMF, including those used by modules, with options to view, edit, import, export, and remove datasets. When removing a dataset via the Dataset Manager, the system checks if that dataset is in use by a module as 1) an input to a module, 2) an output of a module where the module replaces the dataset, or 3) the most recent output created as a new dataset from a module. If any of the usage conditions are met, the dataset will not be deleted; the Status window will include a message detailing which modules use which datasets.

8.4 Module Runner

The Simple Module Runner is a server component that validates the simple module, creates the output datasets, creates views for all datasets, replaces all placeholders in the module’s algorithm with the corresponding dataset views, executes the resulting scripts on the database server (using a temporary restricted database user account), retrieves the values of all output parameters, and logs the execution details including all errors, if any. The Module Runner automatically adds new custom keywords to the output datasets listing the module name, the module database id, and the placeholder.

The Composite Module Runner is a server component that validates the composite module and executes its submodules in order of their dependencies.

The order in which the submodules are executed is repeatable: when multiple submodules independent of each other are ready for execution, they are processed in the order of their internal id.

The Composite Module Runner keeps track of temporary internal datasets and parameters and deletes them as soon as they are no longer needed, unless the user explicitly chose to keep them.

While running a module, the Module Runner enforces strict dataset replacement rules to prevent unauthorized dataset replacement.

8.5 Algorithm Syntax

The algorithm for a simple module type must be written in PL/pgSQL, the SQL procedural language for the PostgreSQL database system (https://www.postgresql.org/docs/9.5/static/plpgsql-overview.html).

The EMF Module Tool extends this language to accept placeholders for the module’s datasets. The placeholder syntax is: ${placeholder-name}. For example, if a module type has defined a dataset called input_options_dataset then the algorithm can refer to it using the ${input_options_dataset} syntax.

The module tool also uses placeholders for the module’s parameters. The parameter placeholder syntax is: #{parameter-name}. For example, if a module type has defined a parameter called increase_factor then the algorithm can refer to it using the #{increase_factor} syntax.

For example, the following algorithm reads records from input_emission_factors_dataset, applies a multiplicative factor to the Emission_Factor column, and inserts the resulting records into a new dataset called output_emission_factors_dataset:

INSERT INTO ${output_emission_factors_dataset}
    (Fuel_Type, Pollutant, Year, Emission_Factor, Comments)
SELECT
    ief.Fuel_Type,
    ief.Pollutant,
    ief.Year,
    ief.Emission_Factor * #{increase_factor},
    ief.Comments
FROM ${input_emission_factors_dataset} ief;

More detailed information is available for each dataset placeholder:

Table 8.1: Dataset Placeholders

${placeholder-name.table_name}
    The name of the PostgreSQL table that holds the data for the dataset.
    Example: emissions.ds_inputoptions_dataset_1_1165351574

${placeholder-name.dataset_name}
    The dataset name.
    Example: Input Options Dataset

${placeholder-name.dataset_id}
    The dataset id.
    Example: 156

${placeholder-name.version}
    The version of the dataset as selected by the user.
    Example: 2

${placeholder-name.view}
    The name of the temporary view created for this dataset table by the Module Runner.
    Example: input_options_dataset_iv

${placeholder-name}
    Same as ${placeholder-name.view}.
    Example: input_options_dataset_iv

${placeholder-name.mode}
    The dataset mode: IN, INOUT, or OUT, where IN is an input dataset, INOUT is both an input and updated as output, and OUT is an output dataset.
    Example: IN

${placeholder-name.output_method}
    The dataset output method (defined only when the mode is OUT): NEW or REPLACE.
    Example: NEW

${placeholder-name.is_optional}
    TRUE if the dataset is optional, FALSE if the dataset is required.
    Example: TRUE

${placeholder-name.is_set}
    TRUE if a dataset was provided for the placeholder, FALSE otherwise.
    Example: TRUE
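
For example, an algorithm could use these attributes to record provenance information in its output records. The following fragment is a minimal sketch that reuses the dataset and parameter names from the earlier example; the Comments text is illustrative only:

INSERT INTO ${output_emission_factors_dataset}
    (Fuel_Type, Pollutant, Year, Emission_Factor, Comments)
SELECT
    ief.Fuel_Type,
    ief.Pollutant,
    ief.Year,
    ief.Emission_Factor * #{increase_factor},
    'From ${input_emission_factors_dataset.dataset_name} version ${input_emission_factors_dataset.version}'
FROM ${input_emission_factors_dataset} ief;

After the placeholders are replaced, the Comments value becomes an ordinary SQL string literal containing the input dataset’s name and version (this assumes the dataset name itself contains no single quotes).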

The following “general information” placeholders, related to the current user, module, or run, are also defined:

Table 8.2: General Information Placeholders

${user.full_name}
    The current user’s full name.
    Example: John Doe

${user.id}
    The current user’s id.
    Example: 6

${user.account_name}
    The current user’s account name.
    Example: jdoe

${module.name}
    The current module’s name.
    Example: Refinery On-Site Emissions

${module.id}
    The current module’s id.
    Example: 187

${module.final}
    If the module is final, the placeholder is replaced with the word Final; otherwise it is replaced with the empty string.
    Example: Final

${module.project_name}
    If the module has a project, the placeholder is replaced with the name of the project; otherwise it is replaced with the empty string.

${run.id}
    The unique run id.
    Example: 14

${run.date}
    The run start date.
    Example: 11/28/2016

${run.time}
    The run start time.
    Example: 14:25:56.825
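
These placeholders can appear anywhere in the algorithm text. For instance, an algorithm might stamp its output records with the module and run that produced them; this is an illustrative sketch only, reusing the output dataset from the earlier example:

-- Record which module run produced these values (illustrative only).
UPDATE ${output_emission_factors_dataset}
SET Comments = '${module.name} run #${run.id} (${run.date} ${run.time})';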

The following parameter placeholders are defined:

Table 8.3: Parameter Placeholders

#{parameter-name}
    The name of the parameter with a timestamp appended to it.
    Example: increase_factor_094517291

#{parameter-name.sql_type}
    The parameter’s SQL type.
    Example: double precision

#{parameter-name.mode}
    The parameter mode: IN, INOUT, or OUT, where IN is an input parameter, INOUT is both an input and updated as output parameter (e.g. an index value), and OUT is an output parameter.
    Example: IN

#{parameter-name.input_value}
    The parameter’s input value (defined only when the mode is IN or INOUT).
    Example: 1.15

#{parameter-name.is_optional}
    TRUE if the parameter is optional, FALSE if the parameter is required.
    Example: TRUE

#{parameter-name.is_set}
    TRUE if a value was provided for the parameter, FALSE otherwise.
    Example: TRUE
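
Because #{parameter-name} is replaced with the name of a variable, an algorithm can assign a value to an OUT parameter directly in PL/pgSQL. The following sketch assumes a module type with an OUT parameter named record_count (a hypothetical name) and reuses the output dataset from the earlier example:

-- Report how many records the module produced
-- (record_count is a hypothetical OUT parameter).
SELECT COUNT(*) INTO #{record_count}
FROM ${output_emission_factors_dataset};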

8.6 New Output Dataset Name Pattern Syntax

The “general information” placeholders listed above (see Tbl. 8.2) can also be used to build output dataset name patterns in the Module Editor. For example, a module could specify the following name pattern for a new output dataset:

Refinery On-Site Emissions #${run.id} ${user.full_name} ${run.date} ${run.time}

When running the module, all placeholders in the name pattern will be replaced with the corresponding current value. For example:

Refinery On-Site Emissions #43 John Doe 12/05/2016 09:45:17.291

9 Troubleshooting

9.1 Client won’t start

Problem:

On startup, an error message like the one in Fig. 9.1 is displayed:

"The EMF client was not able to contact the server due to this error:

(504)Server doesn’t respond at all."

or

"(504)Server denies connection."

Figure 9.1: Error Starting the EMF Client

Solution:

The EMF client application was not able to connect to the EMF server. This could be due to a problem on your computer, the EMF server, or somewhere in between.

If you are connecting to a remote EMF server, first check your computer’s network connection by loading a page like google.com in your web browser. You must have a working network connection to use the EMF client.

Next, check the server location in the EMF client startup script C:\EMF_State\EMFClient.bat. Look for the line

set TOMCAT_SERVER=http://<server location>:8080

You can directly connect to the EMF server by loading

http://<server location>:8080/emf/services

in your web browser. You should see a response similar to Fig. 9.2.

Figure 9.2: EMF Server Response

If you can’t connect to the EMF server or don’t get a response, then the EMF server may not be running. Contact the EMF server administrator for further help.

9.2 Can’t load Dataset Manager

Problem:

When I click the Datasets item from the main Manage menu, nothing happens and I can’t click on anything else.

Solution:

Clicking Datasets from the main Manage menu displays the Dataset Manager. In order to display this window, the EMF client needs to request a complete list of dataset types from the EMF server. If you are connecting to an EMF server over the Internet, fetching lists of data can take a while and the EMF client needs to wait for the data to be received. Try waiting to see if the Dataset Manager window appears.

9.3 Can’t load all datasets

Problem:

In the Dataset Manager, when I select Show Datasets of Type “All”, nothing happens and I can’t click on anything else.

Solution:

When displaying datasets of the selected type, the EMF client needs to fetch the details of the datasets from the EMF server. If you are connecting to an EMF server over the Internet or if there are many datasets imported into the EMF, loading this data can take a long time. Try waiting to see if the list of datasets is displayed. Rather than displaying all datasets, you may want to pick a single dataset type or use the Advanced search to limit the list of datasets that need to be loaded from the EMF server.

10 Server Administration

10.1 Components

The EMF server consists of a database, file storage, and the server application which handles requests from the clients and communicates with the database.

The database server is PostgreSQL version 9.2 or later. For shapefile export, you will need the PostGIS extension installed.
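
Whether PostGIS is available can be checked with a standard PostgreSQL query from psql (this is generic PostgreSQL, not an EMF-specific command):

-- List the PostGIS extension and its installed version, if any.
SELECT name, default_version, installed_version
FROM pg_available_extensions
WHERE name = 'postgis';

-- Install it into the current database if needed (requires superuser privileges).
CREATE EXTENSION IF NOT EXISTS postgis;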

The server application is a Java executable that runs in the Apache Tomcat servlet container. You will need Apache Tomcat 8.0 or later.

The server components can run on Windows, Linux, or Mac OS X.

10.2 Network Access

The EMF client application communicates with the server on port 8080. For the client application, the EMFClient.bat launch script specifies the server location and port via the setting

set TOMCAT_SERVER=http://<server address>:8080

In order to import data into the EMF, the files must be locally accessible to the server. Depending on your setup, you may want to mount a network drive on the server or allow SFTP connections for users to upload files.

10.3 EMF Administrator

Inside the EMF client, users with administrative privileges have access to additional management options.

10.3.1 User Management

EMF administrators can reset users’ passwords and create new users.

10.3.2 Dataset Type Management

Administrators can create and edit dataset types, including adding QA step templates to dataset types.