Search functions reference

This documentation contains quantitative comparisons of the data search functions. It presents results of comparing the LOOKUP() against the SEARCHUP() or SEARCHDOWN() functions.

The primary difference between the functionality of LOOKUP() and SEARCHUP() or SEARCHDOWN() is the ability to use a full logical expression in LOOKUP(). For example, LOOKUP() can be used to find the first item whose value is less than 3. SEARCHUP() or SEARCHDOWN() can only find an exact match.

Functionality aside, LOOKUP() and SEARCHUP() or SEARCHDOWN() differ in their performance characteristics. A LOOKUP() search operates by scanning the appropriate set of items in sequence, attempting to match the logical condition. On average, a LOOKUP() search takes time proportional to the number of items in the set.

In contrast, SEARCHUP() and SEARCHDOWN() take advantage of sorted input to provide faster response. Searches with SEARCHUP() or SEARCHDOWN() use a binary search index on the items in the data series. This results in execution time proportional to the logarithm of the item count. In other words, doubling the number of items in the data series causes only an incremental increase in the execution time required for that search.

To achieve this response-time benefit, SEARCHUP() and SEARCHDOWN() require sorted input. Additionally, SEARCHUP() and SEARCHDOWN() can only be used to find an exact match. If these conditions are suitable for the map, then using SEARCHUP() or SEARCHDOWN() can provide significant execution-time benefits. These benefits are especially noticeable for larger data series.

The following table presents the example results.

This table compares map execution times of the SEARCHUP() and LOOKUP() functions, based on the size of the input data.

The row for each map function presents the execution times for four different input-data sizes.

Map Functions 16 KB 128 KB 1 MB 2 MB
SEARCHUP() 0.23 0.46 3.04 5.83
LOOKUP() 0.28 4.84 288.40 1220.27

This table captures execution times in seconds for two maps that both attempt to match each data element from one input file with another. The two maps differ in the function that each uses to match data. The first map uses SEARCHUP() to locate matches quickly. The second map uses LOOKUP().

For small series of data, the difference in execution time is not as noticeable. However, as the input series size grows, the execution time for the SEARCHUP map grows much more slowly than the execution time for the LOOKUP map.

A search is executed for every data element in the input file. As a result, the overall execution time grows with the data size and the search response time. In the case of the LOOKUP map, this yields a map execution time that grows exponentially with input size. A doubling of the input size requires an execution time four times as long for this map.