Paging and I/O usage

Paging controls the amount of data available to the transformation portion of the map execution at any time. If a map rule requires a piece of data that is not in memory, the paging subsystem will fetch the appropriate data from disk. This can impact existing data in memory; if there are no unused pages, an existing page is swapped out of memory.

When a map needs a great deal of information from various areas of the input, increasing the number and size of pages available to the system can reduce I/O volume. Increasing the information window allows the map to process more information without loading pages from disk.

Paging portions of data files allows a large transformation to be processed in a much smaller amount of memory. Choosing paging settings appropriate to the transformation can improve performance considerably.

The following paging-related activities affect performance:

Using the WorkSpace paging setting

In many maps, paging settings are explicitly specified in the maps themselves during design. If the designer of a map does not explicitly specify page settings, the map defaults to a page size of 64 KB and a page count of 8. Optimal page size and count settings depend ultimately on the amount of workspace needed. Because the workspace requirements are not available at the beginning of execution, the estimated page size and count will not always result in an optimal execution time. In this situation, or if the paging settings within a map are suboptimal, altering paging behavior at execution time might improve performance.

The primary method of altering runtime paging behavior during Command Server execution is the WorkSpace PageSize (-P) execution command. With the -P command, the page size and count can be specified for a transformation. The page count determines the total number of pages available to the system for both data and work file usage. Half of the requested pages are allocated to data file usage and half to work file usage. The page size determines the amount of storage available on each page. Together, the two settings directly control the amount of information available at any given time, known as the information window.
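The relationship between page size, page count, and the information window can be sketched as simple arithmetic. The following Python snippet is an illustration only, not part of the product: the class and property names are hypothetical, the values 128 KB and 16 pages are arbitrary examples, and the integer division for an odd page count is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PagingWindow:
    """Illustrative model of a page size / page count pair (names are hypothetical)."""
    page_size_kb: int
    page_count: int

    @property
    def pages_per_usage(self) -> int:
        # Half of the requested pages go to data file usage, half to work file usage.
        # Rounding down for an odd count is an assumption of this sketch.
        return self.page_count // 2

    @property
    def window_kb_per_usage(self) -> int:
        # Information window available to each usage (data or work).
        return self.pages_per_usage * self.page_size_kb

    @property
    def total_window_kb(self) -> int:
        # Total storage covered by all requested pages.
        return self.page_count * self.page_size_kb


example = PagingWindow(page_size_kb=128, page_count=16)
print(example.pages_per_usage)      # 8
print(example.window_kb_per_usage)  # 1024
print(example.total_window_kb)      # 2048
```

Doubling either value doubles the window, but the two are not interchangeable in practice: page size changes how much is read per I/O request, while page count changes how many separate regions can be resident at once.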

To determine the values for the page settings, you must understand how the transformation engine allocates memory.

When the WorkArea card setting on a map is set to Memory, the transformation engine will allocate memory as required for both data and workspace, with the page size controlled by the PageSize setting in the map.

When the WorkArea card setting on a map is set to File, the transformation engine will allocate memory to track the information written to file for both data and work space, with the page size controlled by the PageSize and PageCount settings.

Because memory allocation depends on the WorkArea card setting, the transformation engine is more likely to exceed available memory if the WorkArea is set to File and the PageSize and PageCount map settings are set to high values.

For most data transformations, a PageSize map setting of 1024 kilobytes is too large. Test PageSize map settings of 64, 128, 256, or 384 kilobytes using sample data that mirrors the size and complexity of the data that will be used in your production environment. PageSize map settings that are lower or higher than these recommended settings will affect performance negatively.
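One way to organize such a comparison is a small timing harness. The sketch below is a generic Python illustration, not a product utility: `run_map` is a hypothetical callable that you would supply to execute the map once with a given page size, and `fake_run_map` is only a stand-in workload so that the example runs on its own.

```python
import time


def time_page_sizes(run_map, page_sizes_kb=(64, 128, 256, 384), repeats=3):
    """Return {page_size_kb: best elapsed seconds} for each candidate page size."""
    results = {}
    for size_kb in page_sizes_kb:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            run_map(size_kb)  # hypothetical: executes the map once with this PageSize
            best = min(best, time.perf_counter() - start)
        results[size_kb] = best
    return results


def fake_run_map(page_size_kb):
    # Stand-in workload only; replace with a call that runs the real map
    # against representative sample data.
    time.sleep(0.0 if page_size_kb == 128 else 0.02)


timings = time_page_sizes(fake_run_map)
fastest = min(timings, key=timings.get)
print(fastest, timings)
```

Taking the best of several repeats reduces the influence of unrelated system activity on the comparison. The timings from the stand-in workload are meaningless; only measurements against production-like data indicate which setting suits a given map.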

Setting information windows

When specifying paging information, it is critical to understand the information requirements of the map. Certain usage patterns require a larger window into the data and the workspace than other usage patterns. If a paging setting does not provide a large enough window, the paging subsystem will generate additional I/O requests to meet the demands of the map.

Conversely, requesting too much memory through paging settings can have detrimental effects as well. If the amount of requested memory exceeds available physical memory, the underlying operating system will begin paging memory to disk. Selecting appropriate paging settings is a balance between the information needs of the map and the amount of available memory.

Setting an appropriate information window especially applies to series-consuming functions such as EXTRACT(), LOOKUP(), and CHOOSE(). Depending on context, series-consuming functions can require access to the entire data file to complete map execution. This contrasts with the much smaller requirements of simple sequential maps.
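The difference between these access patterns can be illustrated with a small simulation that counts page loads. This is a sketch under stated assumptions, not a model of the actual paging subsystem: it assumes least-recently-used replacement (the policy is not specified here), treats `page_count` as the pages available for data only, and represents a series-consuming function as repeated full scans of the input.

```python
from collections import OrderedDict


def count_page_loads(accesses, page_count):
    """Count page loads from disk for a sequence of page numbers.

    Assumes least-recently-used eviction, which is an assumption of this sketch.
    """
    resident = OrderedDict()  # page number -> True, ordered by recency of use
    loads = 0
    for page in accesses:
        if page in resident:
            resident.move_to_end(page)  # mark as most recently used
            continue
        loads += 1  # page is not in memory, so it must be fetched
        if len(resident) >= page_count:
            resident.popitem(last=False)  # swap out the least recently used page
        resident[page] = True
    return loads


file_pages = 16
sequential = list(range(file_pages))   # simple sequential map: one pass
rescans = list(range(file_pages)) * 4  # lookup-like pattern: four full scans

print(count_page_loads(sequential, 4))  # 16
print(count_page_loads(rescans, 4))     # 64
print(count_page_loads(rescans, 16))    # 16
```

Under these assumptions, the sequential pass loads each page once regardless of window size, while the rescanning pattern reloads every page on every scan until the window is large enough to hold the whole input.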

Knowing the size of the input

One significant factor in determining appropriate paging settings is the amount of data processed by a map. A larger amount of input might require more paging memory to attain optimal execution time. More paging memory results in a larger information window. The degree to which larger input might require a larger information window varies with the design of the map.