Metrics Endpoints

The /metrics endpoint provides metrics for the application that are collected via the MetricsCollector class. It uses the prom-client library and performance hooks from Node.js to gather and expose the metrics data in a format that can be scraped by Prometheus.

Metrics Exposed

The MetricsCollector exposes the following metrics:

  • pepr_errors: A counter that increments when an error event occurs in the application.
  • pepr_alerts: A counter that increments when an alert event is triggered in the application.
  • pepr_mutate: A summary that provides the observed durations of mutation events in the application.
  • pepr_validate: A summary that provides the observed durations of validation events in the application.
  • pepr_cache_miss: A gauge that provides the number of cache misses per window.
  • pepr_resync_failure_count: A gauge that provides the number of unsuccessful attempts at receiving an event within the last seen event limit before re-establishing a new connection.

Environment Variables

| PEPR_MAX_CACHE_MISS_WINDOWS | Maximum number windows to emit pepr_cache_miss metrics for | default: Undefined |

API Details

Method: GET

URL: /metrics

Response Type: text/plain

Status Codes:

  • 200 OK: On success, returns the current metrics from the application.

Response Body: The response body is a plain text representation of the metrics data, according to the Prometheus exposition formats. It includes the metrics mentioned above.

Examples

Request

GET /metrics

Response

  `# HELP pepr_errors Mutation/Validate errors encountered
  # TYPE pepr_errors counter
  pepr_errors 5

  # HELP pepr_alerts Mutation/Validate bad api token received
  # TYPE pepr_alerts counter
  pepr_alerts 10

  # HELP pepr_mutate Mutation operation summary
  # TYPE pepr_mutate summary
  pepr_mutate{quantile="0.01"} 100.60707900021225
  pepr_mutate{quantile="0.05"} 100.60707900021225
  pepr_mutate{quantile="0.5"} 100.60707900021225
  pepr_mutate{quantile="0.9"} 100.60707900021225
  pepr_mutate{quantile="0.95"} 100.60707900021225
  pepr_mutate{quantile="0.99"} 100.60707900021225
  pepr_mutate{quantile="0.999"} 100.60707900021225
  pepr_mutate_sum 100.60707900021225
  pepr_mutate_count 1

  # HELP pepr_validate Validation operation summary
  # TYPE pepr_validate summary
  pepr_validate{quantile="0.01"} 201.19413900002837
  pepr_validate{quantile="0.05"} 201.19413900002837
  pepr_validate{quantile="0.5"} 201.2137690000236
  pepr_validate{quantile="0.9"} 201.23339900001884
  pepr_validate{quantile="0.95"} 201.23339900001884
  pepr_validate{quantile="0.99"} 201.23339900001884
  pepr_validate{quantile="0.999"} 201.23339900001884
  pepr_validate_sum 402.4275380000472
  pepr_validate_count 2

  # HELP pepr_cache_miss Number of cache misses per window
  # TYPE pepr_cache_miss gauge
  pepr_cache_miss{window="2024-07-25T11:54:33.897Z"} 18
  pepr_cache_miss{window="2024-07-25T12:24:34.592Z"} 0
  pepr_cache_miss{window="2024-07-25T13:14:33.450Z"} 22
  pepr_cache_miss{window="2024-07-25T13:44:34.234Z"} 19
  pepr_cache_miss{window="2024-07-25T14:14:34.961Z"} 0

  # HELP pepr_resync_failure_count Number of retries per count
  # TYPE pepr_resync_failure_count gauge
  pepr_resync_failure_count{count="0"} 5
  pepr_resync_failure_count{count="1"} 4

Prometheus Operator

If using the Prometheus Operator, the following ServiceMonitor example manifests can be used to scrape the /metrics endpoint for the admission and watcher controllers.

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: admission
spec:
  selector:
    matchLabels:
      pepr.dev/controller: admission
  namespaceSelector:
    matchNames:
    - pepr-system
  endpoints:
  - targetPort: 3000
    scheme: https
    tlsConfig:
      insecureSkipVerify: true
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: watcher
spec:
  selector:
    matchLabels:
      pepr.dev/controller: watcher
  namespaceSelector:
    matchNames:
    - pepr-system
  endpoints:
  - targetPort: 3000
    scheme: https
    tlsConfig:
      insecureSkipVerify: true