Telemetry signals in operators

Telemetry data is useful for your own observability and monitoring systems. There is no "call-home" functionality. Data is not exported out of the container unless explicitly configured.

Since SDP 25.7.0, all Stackable operators emit telemetry data in the following ways:

  • Console logs in plain and JSON format

  • File logs in JSON format (with optional rolling files)

  • OpenTelemetry logs exported via OTLP

  • OpenTelemetry traces exported via OTLP

Each of these signals can be toggled and customized using Helm values. All fields are grouped under the top-level telemetry section.

The following sections describe the available fields as well as their default and supported values. If the Helm values explained below are not overridden, the following operator defaults apply:

  • Levels for all signals are set to INFO

  • Console logs are enabled

  • File logs are disabled

  • File logs rotation period is set to never

  • File logs max files is unset

  • OpenTelemetry logs and traces are disabled

  • OTLP endpoints are set to http://localhost:4317
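As an example, these defaults can be overridden in a Helm values file. A minimal sketch (the collector endpoint is a placeholder, not a real service):

```yaml
telemetry:
  consoleLog:
    format: json                        # switch console output from plain to JSON
  otelTraceExporter:
    enabled: true
    endpoint: http://my-collector:4317  # placeholder OTLP endpoint, scheme required
```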

Console logs

This signal prints log messages at the selected level and in the selected format to STDOUT. These logs are useful for quick debugging. For a more complete debugging experience, we recommend the OpenTelemetry signals.

telemetry:
  consoleLog:
    enabled: true (1)
    level: null   (2)
    format: null  (3)
1 Boolean: true, false
2 String: error, warn, info, debug, trace, off (or more complex filters)
3 Enum: plain, json
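The level field also accepts more complex filter directives. Assuming tracing-style filter syntax (an assumption based on the "or more complex filters" note above), a per-module override might look like:

```yaml
telemetry:
  consoleLog:
    enabled: true
    # info globally, but debug output for the kube client modules
    level: "info,kube=debug"
```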

File logs

This signal writes log messages at the selected level in JSON to (rolling) log file(s). These logs can be picked up by a log aggregation system, like Vector.

telemetry:
  fileLog:
    enabled: false         (1)
    level: null            (2)
    rotationPeriod: hourly (3)
    maxFiles: 6            (4)
1 Boolean: true, false
2 String: error, warn, info, debug, trace, off (or more complex filters)
3 Enum: never, daily, hourly, minutely
4 Unsigned Integer
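Rotation and retention interact: with rotationPeriod: hourly and maxFiles: 6 as above, roughly the last six hours of logs are kept and older files are removed. A sketch keeping about one week of daily files:

```yaml
telemetry:
  fileLog:
    enabled: true
    rotationPeriod: daily
    maxFiles: 7   # keep approximately one week of log files
```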

OpenTelemetry logs and traces

These two signals export OpenTelemetry logs and traces to OTLP-enabled collectors. The exported data can be visualized using tools like Grafana, Loki, and Jaeger.

telemetry:
  otelLogExporter:
    enabled: false (1)
    level: null    (2)
    endpoint: null (3)
  otelTraceExporter:
    enabled: false (1)
    level: null    (2)
    endpoint: null (3)
1 Boolean: true, false
2 String: error, warn, info, debug, trace, off (or more complex filters)
3 String, e.g. https://my-collector:4317 (note: it must include the scheme)

OpenTelemetry with Vector

OpenTelemetry signals can be configured to behave like product logging. Using the Vector Helm chart requires configuring containerPorts and service.ports as well as customConfig.sources.otel:

containerPorts:
  - {name: vector,    containerPort: 6000, protocol: TCP}
  - {name: otel-grpc, containerPort: 4317, protocol: TCP}
  - {name: otel-http, containerPort: 4318, protocol: TCP}
service:
  ports:
  - {name: vector,    port: 6000, targetPort: 6000, protocol: TCP}
  - {name: otel-grpc, port: 4317, targetPort: 4317, protocol: TCP}
  - {name: otel-http, port: 4318, targetPort: 4318, protocol: TCP}
customConfig:
  sources:
    otel:
      type: opentelemetry
      grpc:
        address: 0.0.0.0:4317
      http:
        address: 0.0.0.0:4318

The endpoint in the operator’s Helm values must point to the ports defined above.

telemetry:
  otelLogExporter:
    enabled: true
    endpoint: http://vector-aggregator.<namespace>.svc.cluster.local:4317
  otelTraceExporter:
    enabled: true
    endpoint: http://vector-aggregator.<namespace>.svc.cluster.local:4317

Normalizing operator logs to the Stackable log schema

Operators do not run Vector sidecars. When using OpenSearch, a VRL transform that maps the OTLP fields onto the Stackable log schema is needed to avoid re-indexing:

customConfig:
  transforms:
    normalize_otel_logs:
      type: remap
      inputs:
        - otel.logs (1)
      source: |
        service_name = get(.resources, ["service.name"]) ?? null
        .pod         = service_name (2)
        .container   = service_name (2)
        .namespace   = get(.resources, ["k8s.namespace.name"]) ?? null
        .logger      = string(.scope.name) ?? null (3)
        .level       = string(.severity_text) ?? null
        .built_info  = object(.attributes) ?? {} (4)
        .cluster     = null
        .role        = null
        .roleGroup   = null
        .file        = null
        .source_type = "opentelemetry"
        del(.severity_text)
        del(.severity_number)
        del(.resources)
        del(.attributes)
        del(.scope)
        del(.trace_id)
        del(.span_id)
        del(.flags)
        del(.observed_timestamp)
        del(.dropped_attributes_count)
  sinks:
    opensearch:
      inputs:
        - vector
        - normalize_otel_logs (5)
1 The opentelemetry source fans out into otel.logs, otel.metrics, and otel.traces. Unconsumed sub-streams produce a startup warning.
2 Operators emit service.name (e.g. stackable-airflow-operator) as their primary identity. It is mapped to both .pod and .container but can be adapted to individual needs.
3 The OTLP scope.name contains the Rust module path (e.g. kube_runtime::controller) and maps to the .logger field.
4 Log-record attributes (e.g. built_info.*) are collected into a single object to keep the top-level schema flat.
5 Reference transformer output as sinks input.

Namespace information is currently not emitted by the operator.
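The mapping performed by the VRL transform can also be illustrated outside Vector. Below is a minimal Python sketch of the same normalization, assuming the OTLP event shape shown above; it is illustrative only and not part of the operator codebase:

```python
def normalize_otel_log(event: dict) -> dict:
    """Map an OTLP log record onto the Stackable log schema.

    Illustrative equivalent of the VRL transform; field names follow
    the event shape produced by Vector's opentelemetry source.
    """
    resources = event.get("resources") or {}
    service_name = resources.get("service.name")
    normalized = {
        "pod": service_name,
        "container": service_name,
        "namespace": resources.get("k8s.namespace.name"),
        "logger": (event.get("scope") or {}).get("name"),
        "level": event.get("severity_text"),
        "built_info": event.get("attributes") or {},
        "cluster": None,
        "role": None,
        "roleGroup": None,
        "file": None,
        "source_type": "opentelemetry",
    }
    # Fields the VRL transform del()s; everything else (timestamp,
    # message, ...) is carried over unchanged.
    dropped = {
        "severity_text", "severity_number", "resources", "attributes",
        "scope", "trace_id", "span_id", "flags", "observed_timestamp",
        "dropped_attributes_count",
    }
    for key, value in event.items():
        if key not in dropped and key not in normalized:
            normalized[key] = value
    return normalized
```

Running it on a sample record shows service.name being fanned out to both .pod and .container while the OTLP-internal fields are dropped.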