Extensible metrics for monitoring and alerts

You can use the HCL Commerce Version 9.1 Metrics Monitoring framework with built-in performance dashboards, or build your own. The monitoring data is collected using Micrometer and provided in the industry-standard Prometheus format, so you can use it with many different tools. HCL provides a set of Grafana dashboards to get you started.

You can also use this Metrics Monitoring framework to visualize the cache requests sent and received by NiFi. Using the new http://NIFIHOST:30690/monitor/metrics API, the monitoring data is collected in the industry-standard Prometheus format and can be visualized in Grafana or any other compatible tool.

There are three parts to the monitoring framework. First, a fully customizable presentation layer enables you to use your preferred tools to report on and analyze your systems' performance. Second, the flexibility of this layer comes from its use of a vendor-neutral, industry-standard data representation: the open-source Prometheus toolkit. Finally, Prometheus gets its data from the fully customizable Micrometer library, which collects the data inside your containers and exposes it for scraping.

Note: For more information on using Grafana and the sample dashboards, see HCL Commerce Monitoring - Prometheus and Grafana Integration.

Reporting and dashboarding

The top of the framework is the reporting layer. Because your data is represented in the Prometheus format, you can use many different tools to display and analyze it. One popular dashboarding tool is Grafana (https://grafana.com/). Grafana is often used with Prometheus to provide graphical analysis of monitoring data.

You can download the HCL Commerce Grafana Dashboards from the HCL License and Delivery portal. For more information on the available HCL Commerce Grafana Dashboard package, see HCL Commerce eAssemblies.

The Prometheus toolkit

HCL Commerce metrics use the Prometheus text-based exposition format. Although this format is native to the Prometheus monitoring framework (https://prometheus.io/), the popularity of the format has led to widespread adoption and support. For example, Prometheus joined the Cloud Native Computing Foundation in 2016 as the second hosted project, after Kubernetes.
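
For illustration, a counter exposed in this text format consists of a help line, a type line, and labelled sample values. The metric name, label, and value below are hypothetical:

# HELP backend_calls_total Number of successful backend requests
# TYPE backend_calls_total counter
backend_calls_total{result="ok"} 42.0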

Micrometer application monitoring

Monitoring and performance data is collected using the JVM-based Micrometer instrumentation library. The key concept in Micrometer is the meter. A rich set of predefined meter primitives exists, defining timers, counters, gauges, and other data collection types. You can use the default meters to aggregate performance and monitoring data from your containers, or define your own.

Metrics for the performance of each container are exposed at its /monitor/metrics endpoint. They are collected by a process known as “scraping”: Prometheus scrapes the metrics endpoint on all containers at a configurable interval. The metrics are stored in a database where other services can access them. In Kubernetes environments, scrapers also add contextual metadata to the metrics obtained from the endpoints, such as the service, namespace, and pod that identify the origin of the data.

Configuring meters

Metrics are enabled by default when using the HCL Commerce Helm charts. They can also be enabled explicitly by setting the environment variable:
EXPOSE_METRICS=true
Metrics are exposed on each pod on the following paths and ports:
Deployment              Path                Metrics port (HTTP)
demoqaauthcrs-app       /monitor/metrics    8280
demoqaauthsearch-*      /monitor/metrics    3280
demoqaliveingest-app    /monitor/metrics    30880
demoqalivequery-app     /monitor/metrics    30280
demoqaauthts-app        /monitor/metrics    5280
demoqaauthxc-app        /monitor/metrics    9280
demoqaingest-app        /monitor/metrics    30880

In addition to enabling metrics, the Helm chart exposes the metrics port through the services, and offers the option to define a ServiceMonitor (metrics.servicemonitor.enabled, metrics.servicemonitor.namespace) for use with the Prometheus Operator.

Implementing custom meters

In addition to the default set of meters, you can add your own. When meters are enabled, the Metrics class makes the global registry available. Meters added to the global registry are automatically published to the metrics endpoint.

New meters can be added to the registry by using the Micrometer APIs. See the Micrometer Javadoc for API details: https://javadoc.io/doc/io.micrometer/micrometer-core/1.3.5/index.html.

Samples

The following examples show how metrics can be used from custom code.

Counters
A counter holds a positive count that can only be increased; for example, “number of requests.” Prometheus includes functions such as rate() and increase() that can be used to protect against counter resets.
See the following examples for defining a Counter type:
Example: Adding a new Counter with known value labels
private static Counter BACKEND_COUNTER = 
        Metrics.isEnabled() 
            ? Counter.builder( "backend.calls.total" )
               .tags( "result", "ok" )
               .description("Number of successful backend requests")
               .register( Metrics.getRegistry() )
            : null;
…

if ( BACKEND_COUNTER != null ) {
   BACKEND_COUNTER.increment();
}
Example: Adding a new Counter with labels unknown in advance
if ( Metrics.isEnabled() ) {

   Metrics.getRegistry().counter( 
      "backend.calls.total", 
      "result",
      myGetResult()
     ).increment();  
}
Timers
Timers are used to track the duration and frequency of events. Besides calculating average durations, the API lets you configure a set of Service Level Objectives (SLOs), which are translated into histogram buckets. SLOs can also be used to calculate quantiles. For more information, see Histograms and Summaries on the Prometheus website.

The Metrics class defines SLOs for common usages. For example, Metrics.DEFAULT_SLO_REST_DURATIONS_NAME defines buckets that are appropriate for typical REST execution times. If your timer doesn’t match these durations, you can specify new values as a long array. For more information, see .sla() in the Timer.Builder class definition on the Micrometer website.

Example: Adding a new Timer with known value labels

private static Timer BACKEND_TIMER = 
        Metrics.isEnabled() 
            ? Timer.builder( "backend.calls.duration" )
               .tags( "result", "ok" )
               .sla( Metrics.getSLOsByName(Metrics.DEFAULT_SLO_REST_DURATIONS_NAME) )  
               .description("Duration of successful backend requests")
               .register( Metrics.getRegistry() )
            : null;

…

if (BACKEND_TIMER != null ) {
  startTime = System.nanoTime();
}

doWork();

if (BACKEND_TIMER != null ) {
  final long deltaTime = System.nanoTime() - startTime;
  BACKEND_TIMER.record( deltaTime, TimeUnit.NANOSECONDS );
}
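
If the predefined SLO durations do not fit your timer, you can supply your own buckets when building it. The following sketch is illustrative only: the meter name and the durations are assumptions, and it uses the Duration-based .sla(..) overload of the Micrometer Timer.Builder API (java.time.Duration).

private static Timer SLOW_BACKEND_TIMER =
        Metrics.isEnabled()
            ? Timer.builder( "backend.slowcalls.duration" )    // hypothetical meter name
               .tags( "result", "ok" )
               // illustrative buckets for calls expected to take seconds rather than milliseconds
               .sla( Duration.ofSeconds( 1 ), Duration.ofSeconds( 5 ), Duration.ofSeconds( 10 ) )
               .description("Duration of successful slow backend requests")
               .register( Metrics.getRegistry() )
            : null;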

When you use a Timer with label values that are not known in advance, the Micrometer API does not allow SLOs (.sla(..)) to be specified directly. To achieve this, define a meter filter that merges the configuration. The Metrics.applySLO(final String metricName, final long[] slos) and Metrics.applySLO(final String metricName, final String name) utility methods can be used for this purpose.

Example: Adding a new Timer with label values not known in advance
private static String TIMER_NAME = "backend.calls.duration";

static {
   Metrics.applySLO( TIMER_NAME, Metrics.DEFAULT_SLO_REST_DURATIONS_NAME );
}

…

if ( Metrics.isEnabled() ) {
  startTime = System.nanoTime();
}

doWork();

if ( Metrics.isEnabled() ) {
   final long deltaTime = System.nanoTime() - startTime;
   Metrics.getRegistry().timer(
        TIMER_NAME,
        "result",
        getResult() )
            .record( deltaTime, TimeUnit.NANOSECONDS );
}
Gauges
A gauge holds a value that can increase and decrease over time. The meter is mapped to a function that obtains the value. Examples include the number of active sessions and current cache sizes.
Example: Defining a gauge
class MyService {

 private Gauge myActiveClientsGauge;
 private double nActiveClients;

 final void setup() {
   if (Metrics.isEnabled()) {
     myActiveClientsGauge = Gauge.builder( "myservice.activeclients", this,
                                MyService::getActiveClients )
      .tags("endpoint", getEndpointName())
      .register(Metrics.getRegistry());
   }
 }

 private double getActiveClients() {
   return nActiveClients;
 }
}
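
If the value that you want to expose is simply the size of a collection, the Micrometer registry also offers convenience methods that track the object directly. A minimal sketch, assuming a hypothetical pendingRequests collection field and the same Metrics.isEnabled() guard used above (Tags is io.micrometer.core.instrument.Tags):

if ( Metrics.isEnabled() ) {
   // Publishes the current size of the (hypothetical) pendingRequests collection
   // under the meter name "myservice.pendingrequests".
   Metrics.getRegistry().gaugeCollectionSize(
        "myservice.pendingrequests",
        Tags.of( "endpoint", getEndpointName() ),
        pendingRequests );
}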