Binary data processing with the HZAZIP utility

The HZAZIP utility processes binary data in a way that preserves record boundaries, whereas other platforms typically treat binary data as an unstructured byte stream.

When compressing a record of binary data, the HZAZIP utility performs the following processing:
  • Prepares the data so that record boundaries can be preserved, depending on the input record format:
    • For fixed-length records, no additional data preparation is done.
    • For variable-length records, the record descriptor word (RDW) is retained as part of the data.
    • For undefined-format records, each block is prefixed by an RDW in which the first two bytes contain the length of the block, including the RDW, and the third and fourth bytes contain zeros.
  • Compresses the data and writes it to the archive.
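
The data preparation step above can be sketched in Python as follows. This is an illustration only, not the utility's code: the function name prepare_record and the one-letter format codes are hypothetical, the length field is assumed to be big-endian (the text does not state the byte order), and variable-length input is assumed to arrive with its RDW already in place.

```python
import struct

RDW_LEN = 4  # an RDW occupies four bytes


def prepare_record(data: bytes, recfm: str) -> bytes:
    """Return the bytes that would be passed to the compressor.

    recfm is a hypothetical code: "F" (fixed), "V" (variable), "U" (undefined).
    """
    if recfm == "F":
        # Fixed-length records need no additional preparation.
        return data
    if recfm == "V":
        # Assumption: the caller passes the record with its RDW included,
        # so the RDW is simply retained as part of the data.
        return data
    if recfm == "U":
        total = len(data) + RDW_LEN  # length includes the RDW itself
        if total > 0xFFFF:
            # Only reflects what fits in two bytes, not any platform limit.
            raise ValueError("block too long for a two-byte length field")
        # Bytes 1-2: length (big-endian assumed); bytes 3-4: zeros.
        return struct.pack(">HH", total, 0) + data
    raise ValueError(f"unknown record format: {recfm!r}")
```

For example, under these assumptions a three-byte undefined-format block b"ABC" is prepared as the seven bytes 00 07 00 00 41 42 43.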

Each compressed member is marked as a binary file and the internal attribute value of the central file header is set to 0.

The following input data set attributes are also stored in the zip header extended field:
  • Data set organization
  • Record format
  • Block size
  • Logical record length

When decompressing binary data, the HZAZIP utility performs the following processing:
  • Establishes the length of the record, depending on the record format of the original input data set:
    • For fixed-length records, the original record length is used.
    • For other record formats, 4 bytes from the archive are decompressed and examined to determine whether they form a valid RDW. If they do, the length indicated by the RDW is used. If they do not, the data is treated as a byte stream in which record boundaries do not need to be preserved.
  • Decompresses the data and writes it as a record of the determined length. When the data is assessed to be a byte stream, maximum-length records are written.

During decompression of binary data, the embedded RDWs are checked for validity. An RDW is valid only if it indicates a length greater than 4 and ends with two bytes of zeros. If an RDW fails this test, the HZAZIP utility switches to byte stream mode, in which it treats the data as a stream of bytes without an inherent record structure. If the failing RDW is the first four bytes of the file, the decompressed output is broadly compatible with what most other platforms produce, and the utility issues an informational message. If the failing RDW is not at the start of the file, the utility issues a warning message and sets the final condition code to a value greater than zero, but continues processing so that the output data is available for any necessary data recovery activity.
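
The validity test and the switch to byte stream mode can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the utility's implementation: the names is_valid_rdw and split_records, the status strings, and the default max_len of 32760 are all hypothetical; the length field is assumed to be big-endian; the RDW is assumed to be dropped from each output record; and the whole decompressed stream is assumed to be in memory.

```python
import struct
from typing import List, Tuple


def is_valid_rdw(rdw: bytes) -> bool:
    """Valid if the length is greater than 4 and the last two bytes are zero."""
    if len(rdw) != 4:
        return False
    length, tail = struct.unpack(">HH", rdw)  # big-endian is an assumption
    return length > 4 and tail == 0


def split_records(data: bytes, max_len: int = 32760) -> Tuple[List[bytes], str]:
    """Split decompressed data into records.

    Returns (records, status), where status is a hypothetical label:
    "records" (all RDWs valid), "info" (first RDW invalid), or
    "warning" (a later RDW invalid).
    """
    records: List[bytes] = []
    pos = 0
    while pos < len(data):
        rdw = data[pos:pos + 4]
        if not is_valid_rdw(rdw):
            # Byte stream mode: write the rest as maximum-length records.
            status = "info" if pos == 0 else "warning"
            rest = data[pos:]
            records.extend(rest[i:i + max_len] for i in range(0, len(rest), max_len))
            return records, status
        (length,) = struct.unpack(">H", rdw[:2])
        # The RDW length includes the RDW, so the payload is length - 4 bytes.
        # A record that runs past the end of the data is not handled specially.
        records.append(data[pos + 4:pos + length])
        pos += length
    return records, "records"
```

In this sketch, the "warning" status corresponds to the case in which the utility would set a nonzero condition code while still producing output.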