signalfx/splunk-otel-collector v0.155.0 on GitHub

This Splunk OpenTelemetry Collector release includes changes from the opentelemetry-collector v0.155.0
and the opentelemetry-collector-contrib v0.155.0 releases where appropriate.

🛑 Breaking changes 🛑

(Splunk) signalfx-forwarder: Remove the signalfx-forwarder and trace-forwarder monitors after their deprecation period. (#7698)
The signalfx-forwarder and trace-forwarder monitors have been removed. Please use the signalfxreceiver instead.
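A minimal sketch of a metrics pipeline built on the signalfx receiver is shown here for orientation. The listen address, the environment variable names, and the pipeline layout are assumptions for illustration, not part of this release; adapt them to your deployment.

```yaml
receivers:
  signalfx:
    # Assumed listen address; point your SignalFx clients at this endpoint.
    endpoint: 0.0.0.0:9943
exporters:
  signalfx:
    # Assumed environment variables holding your credentials.
    access_token: "${env:SPLUNK_ACCESS_TOKEN}"
    realm: "${env:SPLUNK_REALM}"
service:
  pipelines:
    metrics:
      receivers: [signalfx]
      exporters: [signalfx]
```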
(Core) processor/memory_limiter: Rename deprecated memory limiter metrics to include the memory_limiter prefix (e.g. otelcol_processor_memory_limiter_*) to clarify they are specific to this processor. (#11203)

(Contrib) exporter/signalfx: Stop calculating per-core cpu.* metrics disabled by default. (#49247)
The default transformations still create aggregate CPU metrics. However, per-core cpu.* metrics, which are disabled by default, are no longer produced by the default transformations.
This change has no impact unless any cpu.* metrics are explicitly enabled with the cpu attribute in the signalfx exporter, with configuration like this:

exporters:
  signalfx:
    include_metrics:
      - metric_name: cpu.idle
        dimensions:
          cpu: ["*"]

In that case, the same metrics can be restored by applying the transform processor as follows:

receivers:
  hostmetrics:
    scrapers:
      cpu:
        metrics:
          system.cpu.time:
            enabled: true
            attributes: [cpu, state]
processors:
  transform/cpu_idle_per_core:
    error_mode: ignore
    metric_statements:
      - context: metric
        statements:
          - copy_metric(name="cpu.idle") where metric.name == "system.cpu.time"
      - context: datapoint
        statements:
          - set(datapoint.value_double, 0.0) where metric.name == "cpu.idle" and datapoint.attributes["state"] != "idle"
      - context: metric
        statements:
          - aggregate_on_attributes("sum", ["cpu"]) where metric.name == "cpu.idle"
          - scale_metric(100.0) where metric.name == "cpu.idle"
      - context: datapoint
        statements:
          - set(datapoint.value_int, Int(datapoint.value_double)) where metric.name == "cpu.idle"
service:
  pipelines:
    metrics:
      receivers: [hostmetrics]
      processors: [transform/cpu_idle_per_core]
      exporters: [signalfx]

(Contrib) exporter/signalfx: Stop calculating cpu.utilization_per_core disabled by default. (#49243)
The exporter still creates the aggregate cpu.utilization metric by default. However, cpu.utilization_per_core, which is disabled by default, is no longer produced by the default transformations.
This change has no impact unless the cpu.utilization_per_core metric is explicitly enabled in the signalfx exporter, with configuration like this:

exporters:
  signalfx:
    include_metrics:
      - metric_name: cpu.utilization_per_core

In that case, the same metric can be restored by applying the transform processor as follows:

receivers:
  hostmetrics:
    scrapers:
      cpu:
        metrics:
          system.cpu.utilization:
            enabled: true
            attributes: [cpu, state]
processors:
  transform/cpu_utilization_per_core:
    error_mode: ignore
    metric_statements:
      - context: metric
        statements:
          - set(metric.name, "cpu.utilization_per_core") where metric.name == "system.cpu.utilization"
      - context: datapoint
        statements:
          - set(datapoint.value_double, 0.0) where metric.name == "cpu.utilization_per_core" and datapoint.attributes["state"] == "idle"
      - context: metric
        statements:
          - aggregate_on_attributes("sum", ["cpu"]) where metric.name == "cpu.utilization_per_core"
service:
  pipelines:
    metrics:
      receivers: [hostmetrics]
      processors: [transform/cpu_utilization_per_core]
      exporters: [signalfx]

(Contrib) processor/k8s_attributes: Remove deprecated gate k8sattr.labelsAnnotationsSingular.allow (#48977)
(Contrib) processor/tail_sampling: Remove stable gate processor.tailsamplingprocessor.disableinvertdecisions (#48976)
(Contrib) receiver/oracledb: Set db.namespace to database name and add oracle.db.service attribute on query sample and top query events. (#48996)
On db.server.query_sample and db.server.top_query events, db.namespace
now reports the database name instead of the Oracle service
name. The service name moves to the new oracle.db.service attribute.

🚩 Deprecations 🚩

(Contrib) receiver/splunk_enterprise: Rename receiver type from splunkenterprise to splunk_enterprise (#45339)
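As a rough illustration of the rename, only the component type key changes in an existing configuration. The fragment below is a sketch: the receiver's own settings are omitted, and the pipeline shown is an assumed example.

```yaml
receivers:
  # Before (deprecated type name):
  #   splunkenterprise:
  # After:
  splunk_enterprise:
    # Keep your existing receiver settings here unchanged.
service:
  pipelines:
    metrics:
      # Pipeline references must use the new type name as well.
      receivers: [splunk_enterprise]
      exporters: [signalfx]
```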

💡 Enhancements 💡

(Splunk) config: Add AWS Secrets Manager (secretsmanager:) and Google Secret Manager (googlesecretmanager:) confmap providers (#7682)
Collector configurations can now reference secrets directly using the new URI schemes:
- secretsmanager:NAME_OR_ARN[#json_key][:-default] for AWS Secrets Manager
- googlesecretmanager:projects/PROJECT_ID/secrets/SECRET_NAME/versions/VERSION for Google Secret Manager
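As a sketch of how these schemes might appear in a configuration, the fragment below references one secret from each provider using the usual ${scheme:value} expansion syntax. The secret names, JSON key, project ID, version, and HEC endpoint are all hypothetical placeholders.

```yaml
exporters:
  signalfx:
    # AWS Secrets Manager: hypothetical secret "splunk-access-token"
    # storing a JSON document with a "token" key.
    access_token: "${secretsmanager:splunk-access-token#token}"
    realm: us0
  splunk_hec:
    # Google Secret Manager: hypothetical project, secret name, and version.
    token: "${googlesecretmanager:projects/my-project/secrets/hec-token/versions/1}"
    endpoint: https://hec.example.com:8088/services/collector
```

The values are quoted so that the # character in the AWS reference is never at risk of being read as a YAML comment.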
(Splunk) packaging: Update Splunk OpenTelemetry Java agent to v2.29.0 (#7687)
(Splunk) packaging: Update JMX metrics gatherer to v1.58.0 (#7688)
(Splunk) packaging: Update Splunk OpenTelemetry Node.js agent to v4.9.0 (#7681)
(Contrib) connector/routing: Add connector.routing.defaultErrorModeIgnore feature gate to change default error_mode from propagate to ignore (#48418)
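If you rely on the current propagate behavior, one option is to set error_mode explicitly so that the result does not depend on the feature gate. The sketch below assumes a simple resource-based route; the pipeline names and the routing condition are hypothetical.

```yaml
connectors:
  routing:
    # Set explicitly so the default chosen by the feature gate does not apply.
    error_mode: propagate
    default_pipelines: [traces/default]
    table:
      - context: resource
        condition: attributes["env"] == "prod"
        pipelines: [traces/prod]
```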
(Contrib) exporter/file: Add feature gate for native file-level compression in file exporter (#44077)
A new alpha feature gate exporter.file.nativeCompression enables native file-level zstd compression.
When enabled, the exporter produces standard .zst files that can be decompressed with zstd -d,
unlike the legacy per-message compression format which requires custom tooling.
(Contrib) exporter/google_cloud_storage: Add universe_domain config option to support Sovereign Google Cloud regions. Setting this field passes option.WithUniverseDomain to the underlying Google API client. (#48924)
(Contrib) exporter/google_cloud_storage: Add resource_attrs_to_gcs to partition objects by a resource attribute value. (#49136)
When resource_attrs_to_gcs.prefix is set, the value of the given resource attribute
(read from the first resource of each batch) is inserted as a partition path segment
between bucket.partition.prefix and the time-based bucket.partition.format, mirroring
the awss3exporter resource_attrs_to_s3 behavior.
(Contrib) exporter/kafka: Add producer.max_broker_write_bytes config (#47492)
The maximum size of a single write to a broker was previously fixed at the underlying
franz-go default of 100 MiB and could not be configured. As a result, setting
producer.max_message_bytes above 100 MiB passed configuration validation but caused the
collector to fail on startup with an unrecoverable error ("max broker write bytes ... is
erroneously less than max record batch bytes ...").
The new producer.max_broker_write_bytes setting (default 104857600, i.e. 100 MiB) exposes
this limit. To send messages larger than 100 MiB, raise it so it is greater than or equal to
max_message_bytes. Configuration is now validated up front: the collector reports a clear
error if max_broker_write_bytes is below the 100 MiB minimum or smaller than
max_message_bytes, rather than failing at runtime.
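For example, a configuration that allows messages of up to 200 MiB could look like the following sketch. The broker address is a hypothetical placeholder, and 209715200 is simply 200 × 1024 × 1024.

```yaml
exporters:
  kafka:
    brokers: ["kafka-1.example.com:9092"]  # hypothetical broker address
    producer:
      # 200 MiB per message.
      max_message_bytes: 209715200
      # Must be >= max_message_bytes and >= the 100 MiB minimum.
      max_broker_write_bytes: 209715200
```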
(Contrib) exporter/load_balancing: Reduce CPU usage and memory allocations when routing traces by traceID (the default routing key) (#48983)
Routing decisions are unchanged. Spans are now regrouped per backend, so the exact
ResourceSpans/ScopeSpans grouping of exported traces may differ from the input. If a downstream
consumer is sensitive to this, a groupbyattrsprocessor on the receiving end can recompact the
ResourceSpans.
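A minimal sketch of such a receiving-side pipeline follows. It assumes the groupbyattrs processor is used with no keys configured, in which case it only compacts spans that share the same resource and scope; the otlp receiver endpoint and the debug exporter are placeholders for your actual components.

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
processors:
  # No keys configured: the processor only compacts identical
  # Resource and Scope groupings.
  groupbyattrs:
exporters:
  debug:
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [groupbyattrs]
      exporters: [debug]
```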
(Contrib) exporter/splunk_hec: Support exporting profiles (#48598)
(Contrib) processor/resource_detection: Add GCP Cloud Run Worker Pool detector to the resource detection processor (#48931)
(Contrib) processor/transform: Improve merge_histogram_buckets with method="limit_buckets" to compact buckets closer to the configured limit. (#49020)
(Contrib) receiver/awscloudwatch: Add a new configuration option, initial_lookback, to the AWS CloudWatch Logs receiver for specifying how far back from the collector's startup time to begin collecting logs. (#47754)
(Contrib) receiver/azure_event_hub: Add the ability to use encoding extensions to the Azure Event Hub receiver. (#48753)
(Contrib) receiver/file_log: Improve polling performance when watching many files by indexing fingerprint matching. (#27404)
(Contrib) receiver/googlecloudpubsub: Add universe_domain config option to support Sovereign Google Cloud regions. Setting this field passes option.WithUniverseDomain to the underlying Google API client. (#48924)
(Contrib) receiver/host_metrics: Enable the Android platform in the process scraper. (#47296)
(Contrib) receiver/host_metrics: Add AIX-specific process scraper implementation. (#47095)
Implements AIX versions of the platform-specific process scraper hooks
(CPU time/utilization recording, process name, executable, and command
extraction), replacing the previous empty stubs that the "others"
fallback provided.
(Contrib) receiver/oracledb: Enhance SQL obfuscation to anonymize comments while preserving query structure (#48508)
- Query structure, formatting, and whitespace are now preserved during obfuscation for improved readability.
(Contrib) receiver/oracledb: Add additional attributes to the Oracle query execution plan. (#48965)
Extend the execution plan with five additional attributes:
- OBJECT_NAME: Name of the object referenced by the plan step
- OBJECT_TYPE: Type of the referenced object (e.g., table, index)
- FILTER_PREDICATES: Predicates applied as a filter during the step
- PARTITION_START: Starting partition for partitioned access
- PARTITION_STOP: Ending partition for partitioned access
(Contrib) receiver/oracledb: Add oracledb.plan.first_load attribute to the db.server.top_query event (#48998)
(Contrib) receiver/oracledb: Add SQL comment extraction support. Users can now configure allowed_comment_keys to extract key-value pairs from leading SQL block comments and include them as the db.query.comment_tags telemetry attribute. (#48338)
(Contrib) receiver/prometheus: Add scrape_on_shutdown, discovery_reload_on_startup, and initial_scrape_offset configuration options to allow tuning startup and shutdown scrape behavior in serverless environments. (#48979)
- scrape_on_shutdown (default: false): Enables a final scrape before the receiver closes.
- discovery_reload_on_startup (default: false): Enables discovering targets immediately on startup.
- initial_scrape_offset (default: 0s): Adds a fixed delay before the initial scrape of targets.
(Contrib) receiver/sqlserver: Add sqlserver.query.plan.creation_time attribute to the db.server.top_query event. (#49018)
(Contrib) receiver/sqlserver: Add service.name and service.namespace opt-in resource attributes and allow overriding any resource attribute via override_value. (#46176)
When service.name is enabled, the receiver sets it to unknown_service:microsoft.sql_server per OTel specification.
When service.namespace is enabled, it defaults to an empty string until set via configuration.
Each resource attribute now accepts an override_value under resource_attributes, letting users pin
values such as service.name, service.namespace, or service.instance.id to uniquely identify
database instances across environments.
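Based on the description above, pinning these attributes might look like the following sketch. The override values are hypothetical, and the rest of the receiver configuration (connection settings and so on) is omitted.

```yaml
receivers:
  sqlserver:
    resource_attributes:
      service.name:
        enabled: true
        override_value: orders-db      # hypothetical service name
      service.namespace:
        enabled: true
        override_value: production     # hypothetical namespace
```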

🧰 Bug fixes 🧰

(Splunk) installer: Fix Windows installation script to not overwrite custom properties passed to the MSI (#7700)
(Splunk) installer: Fix the Windows MSI so the collector service starts when SPLUNK_CONFIG is provided as an install property. (#7701)
Since v0.154.0, tooling that passes SPLUNK_CONFIG as an MSI property (e.g. the Puppet and Ansible
modules) produced a service with no config, so it failed to start. The MSI now routes a supplied
SPLUNK_CONFIG into COLLECTOR_SVC_ARGS as a --config "<path>" argument.
(Contrib) exporter/load_balancing: Fix Kubernetes resolver initialization to allow exporter creation outside k8s cluster by deferring client creation to start time (#42293)
(Contrib) exporter/load_balancing: Fix a wait-group leak on the trace routing path that could cause Shutdown to hang when backend resolution fails partway through a batch (#48983)
(Contrib) extension/google_cloud_logentry_encoding: Accept short ALPN protocol tokens (e.g. h2, h3) in httpRequest.protocol that do not contain a /; previously any protocol string without a slash was rejected with an error, causing log entries from Google Cloud Load Balancers that switched to reporting h2 for HTTP/2 to be silently dropped. (#45214)
(Contrib) processor/k8s_attributes: Prevent unbounded memory growth by cleaning up stale pod identifiers, including container.id entries left behind after container restarts (#48398)
(Contrib) processor/metrics_transform: Add required-field validation for combine action: return an error if new_name or aggregation_type is missing, preventing silent data loss and empty metric names. (#48871)
(Contrib) processor/redaction: Fix a panic in database attribute sanitization when traces are processed concurrently. (#49048)
(Contrib) processor/transform: Fix transform processor config unmarshaling to return an error for empty statement list items instead of panicking. (#49245)
(Contrib) receiver/azure_monitor: Fix discovery and collection of custom metric namespace definitions (e.g. azure.vm.linux.guestmetrics published by Azure Monitor Agent / MetricsExtension) (#40989)
The MetricDefinitions API only returns custom namespace metrics when the metricnamespace query parameter is explicitly set. Previously, metrics configured under receiver::metrics for a custom namespace were silently dropped because the API call used no filter and only returned the resource's default namespace. The receiver now makes an additional namespace-filtered call for each custom namespace in the metrics config that was not returned by the default call.
(Contrib) receiver/http_check: Stop emitting two httpcheck.tls.cert_remaining data points per scrape (#47740)
(Contrib) receiver/kafka_metrics: Use kadm.Client.Lag and do not record negative values (#48701)
(Contrib) receiver/oracledb: Clamp negative DURATION_SEC and SESSION_DURATION_SEC values in the query sample to zero. (#48901)
Negative durations no longer appear in query sample metrics: DURATION_SEC and SESSION_DURATION_SEC are now clamped to a minimum of zero.
(Contrib) receiver/purefa: Fix the receiver failing to start due to an invalid internal Prometheus scrape configuration. (#48847)
The receiver used a *discovery.StaticConfig pointer when building the Prometheus scrape
configs. Only the value type discovery.StaticConfig is registered for YAML marshaling, so
the prometheus receiver failed on startup with "cannot marshal unregistered Config type:
*discovery.StaticConfig". The config now uses the value type.
(Contrib) receiver/statsd: Clean up stale unix socket file on startup to prevent "address already in use" errors after unclean shutdown. (#44866)
(Contrib) receiver/statsd: Skip empty tag entries instead of aborting the tag parse loop, so valid tags after an empty entry are no longer dropped. (#48483)
Previously, tags containing an empty entry (e.g. from consecutive commas like |#,,key:value) caused the
parser to exit the loop on the first empty entry, silently dropping all subsequent valid tags. Empty entries
are now skipped, matching the permissive behavior of the Datadog agent.

Docker Image Manifests:

Linux (amd64, arm64, ppc64le) and Windows (2019 amd64, 2022 amd64):

quay.io/signalfx/splunk-otel-collector:0.155.0
digest: sha256:df3c302ca23928d7fb5031e52a174b253b44b14c325bbad9fe4dcab36b7e8efa