HCL Commerce Version 9.1.9.0

HCL Cache maintenance

HCL Cache implements a number of required maintenance processes.

To support features such as invalidation by dependency ID, the HCL Cache maintains metadata information for each cache entry. This metadata cannot be expired or evicted by Redis, as this would lead to inconsistencies such as missed invalidations.

HCL Cache implements a number of background processes to maintain the metadata information:

Expired maintenance
Removes metadata for objects that have expired.
Low memory maintenance
Triggers when Redis memory is close to full and removes soonest to expire cache entries to free up memory.

For more details, see Memory Management in Redis.

The maintenance jobs can add overhead to the Redis servers. It is important that performance test environments accurately simulate production environments, exercising the maintenance processes in a similar manner. For example, if the production environment typically fills up the Redis memory, the performance environment should do the same. Short tests (e.g. one hour in duration) might not be long enough to simulate expired and inactivity maintenance processing conditions.

See HCL Cache Maintenance for the latest available configurations. Compared to the latest version, the key differences in Version 9.1.9 are as follows:

  • In Version 9.1.9, Low Memory Maintenance uses the fastest available configuration (cleanupRate) for Expired Maintenance. In Version 9.1.10 and later, Low Memory Maintenance has its own configuration.

  • Inactivity Maintenance is not available in Version 9.1.9.

Self-Adjusting maintenance processes

All maintenance processes implement a similar technique to self-adjust the speed of maintenance. Executing maintenance too quickly can impact performance, while if it is done too slowly, new data can be added at a rate that is faster than it is removed, leading to out of memory (OOM) situations. For example, Expired Maintenance adjusts the speed of maintenance considering the time since expiry of the oldest expired cache entries. If the time since expiry increases, it means expired maintenance is not running at a fast enough rate, and the speed is increased.

The maintenance processes also have configurations to determine how many cache entries are removed at once. This is required because Redis is single-threaded, and a large maintenance operation can block Redis:

numCacheIdPerLUACall
This is the maximum number of cache entries that will be inspected and processed by a LUA script. Increasing the number speeds up maintenance but can also block the Redis thread for a longer period.
numLUACallsInPipeline
The number of LUA scripts that are sent together as a batch. The Redis thread is only locked during each individual script execution.
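
The relationship between the two settings can be sketched as follows. This is a minimal illustration, not HCL Cache code: the function name plan_batches and the sample IDs are hypothetical, and it only shows how a list of cache IDs would be split into per-script chunks and then grouped into pipelines.

```python
from typing import List, Sequence


def plan_batches(
    cache_ids: Sequence[str],
    num_cache_id_per_lua_call: int,
    num_lua_calls_in_pipeline: int,
) -> List[List[List[str]]]:
    """Group cache IDs into script-sized chunks, then group chunks into pipelines.

    Hypothetical helper for illustration only. Each inner list is the work
    handled by one script call; each outer group is one pipelined batch.
    """
    if num_cache_id_per_lua_call < 1 or num_lua_calls_in_pipeline < 1:
        raise ValueError("both settings must be at least 1")

    n = num_cache_id_per_lua_call
    p = num_lua_calls_in_pipeline
    # One chunk per script call: at most n cache IDs each.
    scripts = [list(cache_ids[i:i + n]) for i in range(0, len(cache_ids), n)]
    # One group per pipeline: at most p script calls each.
    return [scripts[i:i + p] for i in range(0, len(scripts), p)]


ids = [f"id-{i}" for i in range(23)]
pipelines = plan_batches(ids, num_cache_id_per_lua_call=2, num_lua_calls_in_pipeline=5)
for number, pipeline in enumerate(pipelines, start=1):
    print(number, [len(script) for script in pipeline])
```

In this model, raising num_cache_id_per_lua_call makes each blocking script call larger, while raising num_lua_calls_in_pipeline adds more calls per batch without lengthening any single call.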

LUA is a scripting language supported by Redis for server-side operations. LUA scripts are atomic and blocking.

Due to the self-adjusting nature of the maintenance processes, tuning should not typically be required, but performance testing is critical to confirm they run at optimal speeds.

Expired maintenance (onlineExpiredEntriesMaintenance)

While Redis automatically removes expired cached values from memory, the expired maintenance process is responsible for removing expired cache entries from the metadata (dependency IDs). This process runs from all the pods and the speed is determined by the age of the oldest expired entry pending maintenance.

Expired maintenance cleanup rates
The speed of maintenance adjusts depending on the age of the oldest expired entry. For example, if the maintenance process finds cache entries that have been expired for 12 minutes, it uses the maintenance configuration for objects that have been expired for 10 to 13 minutes, which cleans at a rate of 20/second.
newerThan:  300 secs (  5 mins) inLUA:  1 pipeline:  1 delayMs: 60000 -- speed:   0.02/sec,      1/min
newerThan:  600 secs ( 10 mins) inLUA:  2 pipeline:  5 delayMs:  1000 -- speed:     10/sec,    600/min
newerThan:  780 secs ( 13 mins) inLUA:  2 pipeline:  5 delayMs:   500 -- speed:     20/sec,  1,200/min
newerThan:  900 secs ( 15 mins) inLUA:  3 pipeline:  5 delayMs:   300 -- speed:     50/sec,  3,000/min
newerThan: 1200 secs ( 20 mins) inLUA:  3 pipeline: 10 delayMs:   250 -- speed:    120/sec,  7,200/min
newerThan: ~ ALL ~              inLUA:  5 pipeline: 10 delayMs:   200 -- speed:    250/sec, 15,000/min
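
The speed column appears to follow from the other three values: entries per script call, multiplied by script calls per pipeline, divided by the delay in seconds. The sketch below reproduces the listed rates under that assumption. The class and function names are hypothetical, and the handling of an age that falls exactly on a boundary is also an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class CleanupTier:
    newer_than_secs: Optional[int]  # None models the "~ ALL ~" catch-all row
    in_lua: int                     # cache entries per script call
    pipeline: int                   # script calls per pipeline
    delay_ms: int                   # delay between pipelines

    def entries_per_second(self) -> float:
        # Inferred from the listed values: inLUA * pipeline / delay in seconds.
        return self.in_lua * self.pipeline * 1000 / self.delay_ms


# Default rows copied from the listing above.
DEFAULT_TIERS: List[CleanupTier] = [
    CleanupTier(300, 1, 1, 60000),
    CleanupTier(600, 2, 5, 1000),
    CleanupTier(780, 2, 5, 500),
    CleanupTier(900, 3, 5, 300),
    CleanupTier(1200, 3, 10, 250),
    CleanupTier(None, 5, 10, 200),
]


def select_tier(oldest_expired_age_secs: float,
                tiers: List[CleanupTier] = DEFAULT_TIERS) -> CleanupTier:
    """Return the first row whose newerThan limit exceeds the given age."""
    for tier in tiers:
        if tier.newer_than_secs is None or oldest_expired_age_secs < tier.newer_than_secs:
            return tier
    raise ValueError("the tier list needs a catch-all row")


tier = select_tier(12 * 60)  # oldest entry expired 12 minutes ago
print(tier.newer_than_secs, tier.entries_per_second())
```

With an age of 12 minutes (720 seconds), the sketch selects the 780-second row, which matches the 20/second rate in the example above.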

For details on updating the configuration, see Updating the default maintenance values.

Expired maintenance details from the HCL Cache - Remote dashboard:

Low memory maintenance (onlineLowMemoryMaintenance)

HCL Cache with Redis does not perform well when memory is full. Processes, including maintenance processes, can fail with the memory error "command not allowed when used memory > 'maxmemory'". To prevent this situation, HCL Cache monitors the percentage of memory used and triggers Low Memory Maintenance processing to reduce the size of each cache. The processing removes both cached values and their associated cache entry metadata. The keys selected for removal are those soonest to expire. The Low Memory Maintenance job is scheduled from all the pods, but it can only be active from a single container at any one time.

Low memory maintenance in Redis Enterprise
Due to differences in architecture, Redis Enterprise does not make used memory statistics available to the application. The Low Memory Maintenance process relies on these statistics as the trigger to determine when and how much maintenance is required. As a result, with Redis Enterprise, the softMaxSize configuration must be set manually for each cache to define a maximum size in number of entries.
Low memory maintenance default configurations
The default configurations are as follows. For details on updating the configuration see Updating the default maintenance values.
Configuration | Default | Use
intervalSecs | 120 | Interval, in seconds, at which the Low Memory Maintenance job runs on each pod to check for memory conditions.
maxMemoryPercentage | 93 | If the percentage of memory used is at or above this value, the maintenance process must execute.
maintenancePercentageBuffer | 5 | The percentage of the cache that is removed. For example, if maxMemoryPercentage is 93% and maintenancePercentageBuffer is 5%, the target memory used after maintenance is 88%.
putOperationPausePercentage | 5 | This percentage is added to maxMemoryPercentage. For example, if maxMemoryPercentage is 93% and putOperationPausePercentage is 5%, caches stop inserting to the remote cache when used memory reaches 98%, to allow maintenance to catch up.
softMaxSize | -1 | Used to set a maximum size in entries. It can be used in combination with maxMemoryPercentage.
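
The arithmetic behind the three percentage settings can be worked through as follows. This is a small sketch that uses the defaults listed above. The class, property, and function names are hypothetical and are not part of the HCL Cache API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LowMemoryConfig:
    # Defaults taken from the configuration list above.
    max_memory_percentage: float = 93
    maintenance_percentage_buffer: float = 5
    put_operation_pause_percentage: float = 5

    @property
    def target_percentage(self) -> float:
        # Memory level that maintenance aims for once it is triggered.
        return self.max_memory_percentage - self.maintenance_percentage_buffer

    @property
    def put_pause_percentage(self) -> float:
        # Memory level at which inserts to the remote cache are paused.
        return self.max_memory_percentage + self.put_operation_pause_percentage


def evaluate(used_percentage: float, cfg: LowMemoryConfig = LowMemoryConfig()) -> dict:
    """Report which actions the thresholds imply for a given memory usage."""
    return {
        "run_maintenance": used_percentage >= cfg.max_memory_percentage,
        "pause_puts": used_percentage >= cfg.put_pause_percentage,
        "target_percentage": cfg.target_percentage,
    }


for used in (90, 93, 98):
    print(used, evaluate(used))
```

With the defaults, maintenance starts at 93% used memory and aims for 88%, and inserts pause at 98%.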
Low memory maintenance cleanup rates
Low Memory Maintenance cleanup rates rely on the Expired Maintenance configuration. Low Memory Maintenance uses the fastest configuration available for Expired Maintenance, which is currently 250 keys/sec or 15,000 keys/min.

The default configuration of 250 keys/second might not be fast enough for certain production environments and might need to be updated. See Updating the Default Maintenance Values for details.
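
To judge whether 250 keys/second is fast enough, it can help to estimate how long a cleanup would take at that rate. The sketch below is a simplified model only: the function name, the entry counts, and the insert rates are hypothetical, and it ignores effects such as the pause on put operations.

```python
import math


def seconds_to_free(entries_to_remove: int,
                    removal_per_sec: float = 250.0,
                    insert_per_sec: float = 0.0) -> float:
    """Estimate the time needed to shrink a cache by a number of entries.

    Simplified model: the net rate is the removal rate minus the rate at which
    new entries are inserted. If the net rate is not positive, maintenance
    never catches up and the result is infinity.
    """
    net_rate = removal_per_sec - insert_per_sec
    if net_rate <= 0:
        return math.inf
    return entries_to_remove / net_rate


# Hypothetical workload: one million entries to remove, no new inserts.
print(f"{seconds_to_free(1_000_000) / 60:.1f} minutes")
# Same cleanup while 150 new entries per second are still being added.
print(f"{seconds_to_free(1_000_000, insert_per_sec=150) / 60:.1f} minutes")
```

If the insert rate in production approaches or exceeds the removal rate, the estimate grows without bound, which is the situation where a faster configuration would be needed.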

Updating the default maintenance values

Because the maintenance scripts are self-adjusting, tuning might not be required. If it is, configurations can be changed by updating the Cache YAML configuration files, either at the cache level or for all caches by using defaultCacheConfig:
cacheConfigs:
   defaultCacheConfig:
     remoteCache:
       onlineExpiredEntriesMaintenance:
         ...
       onlineLowMemoryMaintenance:
         ...

The following links include YAML snippets for the default (starting) configurations:

For list configurations such as cleanupRate, customizations must re-define the whole list instead of individual elements.
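
The consequence of this rule can be illustrated with a small model of how an override might be applied. This is an assumption-based sketch, not HCL Cache's actual merge logic: apply_override and otherSetting are hypothetical, and the list elements are reduced to a single delayMs value for brevity.

```python
def apply_override(defaults: dict, override: dict) -> dict:
    """Merge an override into the defaults (hypothetical model).

    Nested mappings are merged key by key, but a list in the override
    replaces the default list as a whole.
    """
    merged = dict(defaults)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = apply_override(merged[key], value)
        else:
            # Lists and scalar values are replaced, never merged element by element.
            merged[key] = value
    return merged


defaults = {
    "cleanupRate": [{"delayMs": 60000}, {"delayMs": 1000}, {"delayMs": 500}],
    "otherSetting": "default",  # hypothetical non-list setting
}
# An override that lists only one element drops the other default elements.
override = {"cleanupRate": [{"delayMs": 100}]}

merged = apply_override(defaults, override)
print(len(merged["cleanupRate"]), merged["otherSetting"])
```

Under this model, a customization that changes one cleanupRate element must still list every other element it wants to keep.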