Operator Metrics

How to monitor and consume metrics emitted by CAP Operator

Overview

The CAP Operator includes built-in Prometheus metrics that enable users to effectively monitor and analyze the operator’s performance. These metrics can provide insights into resource usage, potential issues, and overall operator health. The metrics are accessible at the /metrics endpoint on port 9090 of both the Controller and the Subscription Server.

Controller Metrics

The Controller emits several types of metrics, including:

  • Standard Go Metrics: These metrics are provided by the Prometheus Go client and include runtime statistics.
  • Workqueue Metrics: These metrics are relevant to the resources being reconciled and are based on the MetricsProvider.
  • Specific metrics: mentioned below.

Specific Metrics

  1. Reconcile Errorscap_op_reconcile_errors, e.g.:
cap_op_reconcile_errors{kind="CAPApplication",name="my-app",namespace="app"} 11
  • Type: Counter
  • Description: This metric tracks the total number of resources that failed to reconcile for each resource kind, such as CAPApplication.
  1. Tenant Operationscap_op_tenant_operations, e.g.
cap_op_tenant_operations{app="<hash>",operation="provisioning"} 83
  • Type: Counter
  • Description: This metric provides insights into overall tenant operations being performed.

Detailed Operational Metrics

To gain deeper insights, you can enable more granular metrics by setting the environment variable DETAILED_OPERATIONAL_METRICS to "true".

  1. Failed Tenant Operationscap_op_tenant_operation_failures, e.g.:
cap_op_tenant_operation_failures{app="<hash>",operation="upgrade",tenant_id="<guid>",namespace="app",name="my-app-tenant-op-xxyyz"} 2
  • Type: Counter
  • Description: This metric reveals the number of failed tenant operations, categorized by application, tenant ID, and specific operation details.
  1. Last Tenant Operation Durationcap_op_last_tenant_operation_duration_seconds, e.g.:
cap_op_last_tenant_operation_duration_seconds{app="<hash>",tenant_id="<guid>"} 17
  • Type: Gauge
  • Description: This metric measures the duration (in seconds) of the last tenant operation for a specified application and tenant.

Subscription Server Metrics

The Subscription Server emits the following metrics:

Specific Metrics

  1. Subscription Requests Totalcap_op_subscription_requests_total, e.g.:
cap_op_subscription_requests_total{code="202",method="POST"} 2024
  • Type: Counter
  • Description: This metric tracks the total number of subscription requests that were processed, categorized by HTTP method and response code.
  1. Inflight Subscription Requestscap_op_subscription_requests_inflight, e.g.:
cap_op_subscription_requests_inflight{} 4
  • Type: Gauge
  • Description: This metric indicates the number of subscription requests currently being processed by the server.

Conclusion

The CAP Operator provides a rich set of metrics to facilitate monitoring and operational insights. By effectively leveraging these metrics, you can monitor and ensure the reliability and performance of your applications. For further details, consider exploring the Prometheus documentation and integrating these metrics into your monitoring systems.

Last modified December 20, 2024: [Docu] website: Documentation enhanced (9b7b561)