grafana/beyla v1.9.0 on GitHub

What's Changed

Beyla 1.9.0 is released with major internal changes, in preparation for the upcoming Beyla 2.0 release.

Breaking changes 🔨

Removed `override_instance_id` configuration option

This option was intended solely for debugging purposes.

More info: #1125

Fix instance and job in Prometheus exporter

The target_instance Prometheus attribute has been renamed to instance. The job attribute has also been added to the Prometheus exporter.

All metrics are now consistent, whether they are exported via OTEL or Prometheus.

More info: #1130

Set OTEL service name and namespace from application environment variables

If the application has set the OTEL_SERVICE_NAME or OTEL_SERVICE_NAMESPACE variables in its environment,
Beyla will use them to set the reported service name and namespace.

If the variables are not set, Beyla falls back to the previously existing mechanism to set the service name and namespace.
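
The precedence described above can be sketched in Go as follows. This is only an illustration, not Beyla's actual code: the resolveService helper, the map-based view of the application environment, and the fallback values are hypothetical. It also assumes that each variable falls back independently and that an empty value counts as unset, neither of which is confirmed by these notes.

```go
package main

import "fmt"

// resolveService is a hypothetical helper: it prefers the OTEL_* variables
// found in the instrumented application's environment and otherwise keeps
// the values produced by the pre-existing mechanism.
func resolveService(appEnv map[string]string, fallbackName, fallbackNamespace string) (string, string) {
	name, namespace := fallbackName, fallbackNamespace
	// Assumption: an empty value is treated the same as an unset variable.
	if v := appEnv["OTEL_SERVICE_NAME"]; v != "" {
		name = v
	}
	if v := appEnv["OTEL_SERVICE_NAMESPACE"]; v != "" {
		namespace = v
	}
	return name, namespace
}

func main() {
	// Only the service name is set by the application in this example.
	appEnv := map[string]string{"OTEL_SERVICE_NAME": "checkout"}
	name, namespace := resolveService(appEnv, "checkout-bin", "default")
	fmt.Println(name, namespace)
}
```

In this example the name comes from the application environment, while the namespace keeps the fallback value.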

Bug fixes 🐞

Fix cgroup ID parsing in newest Docker versions

More info: #1287

Fix OS capability checking

There were a few bugs in the OS capability checking, which are fixed by this PR:

If SYS_ADMIN is present, it effectively means all capabilities.
If the kernel is older than 5.8, SYS_ADMIN is a must, because the other capabilities weren't split off yet.
If we have NET_ADMIN we also have NET_RAW, so we can relax that check.
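
The three rules above can be expressed as a short Go sketch. It is not the code from the PR: the capSet type, the missingCaps function, and the list of required capabilities passed in main are hypothetical and only serve to show how the rules combine.

```go
package main

import "fmt"

// capSet is a hypothetical set of capability names held by the process.
type capSet map[string]bool

// missingCaps returns the required capabilities that are not covered,
// applying the three rules listed in the release notes.
func missingCaps(have capSet, required []string, kernelMajor, kernelMinor int) []string {
	// Rule 1: SYS_ADMIN effectively means all capabilities.
	if have["SYS_ADMIN"] {
		return nil
	}
	// Rule 2: before kernel 5.8 the finer-grained capabilities did not
	// exist yet, so SYS_ADMIN is the only thing that can satisfy the check.
	if kernelMajor < 5 || (kernelMajor == 5 && kernelMinor < 8) {
		return []string{"SYS_ADMIN"}
	}
	var missing []string
	for _, c := range required {
		if have[c] {
			continue
		}
		// Rule 3: NET_ADMIN is accepted in place of NET_RAW.
		if c == "NET_RAW" && have["NET_ADMIN"] {
			continue
		}
		missing = append(missing, c)
	}
	return missing
}

func main() {
	// Illustrative required set; the real list depends on the enabled features.
	required := []string{"BPF", "PERFMON", "NET_RAW"}
	have := capSet{"BPF": true, "PERFMON": true, "NET_ADMIN": true}
	fmt.Println(missingCaps(have, required, 6, 1))
	fmt.Println(missingCaps(have, required, 5, 4))
}
```

The first call reports nothing missing because NET_ADMIN covers NET_RAW. The second call reports SYS_ADMIN because the kernel predates 5.8.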

More info: #1131

What's new

Introduce option for high volume request tracking

Beyla tracks the full request completion time. This typically means we check whether the application keeps responding
with more data after the first HTTP response. One example is a large file download, where the majority of the time
is spent serializing the data on the wire. When the client uses keep-alive, we don't necessarily see the connection
close event, but a newly pushed request tells us that we should terminate an earlier request.

This approach doesn't work well when there's a high volume of requests, e.g. beyond our current map sizing. The delayed
requests will likely be evicted from the map before we have a chance to complete them.

The BEYLA_BPF_HIGH_REQUEST_VOLUME configuration option forces Beyla to complete the request as soon as the response
is finished. It produces less accurate accounting for large file downloads, but it avoids losing data under a high
volume of requests.

More info: #1192

Use `scratch` as the base to build the Beyla docker images

This produces smaller images and removes the risk of vulnerabilities coming from the base image.

More info: #1367

Kubernetes: no need for a privileged init container anymore

The way Beyla internally mounts and shares some eBPF data structures has changed. This removes the necessity of
giving Beyla elevated privileges, or creating a privileged init container to mount the BPF file system.

More info: #1251

Experimental: Kubernetes API cache service

⚠️ This is an experimental service intended only as a developer preview. Expect breaking changes. Make sure that the
deployed image of the cache service (grafana/beyla-k8s-cache:1.9.x) matches the version of the Beyla image.

To decorate the traces and metrics with Kubernetes metadata, each Beyla instance establishes a connection to the
Kubernetes API. On big clusters (500+ nodes, 500+ Beyla instances), this could greatly overload the
Kubernetes API, because listening for cluster-global resources is expensive.

Experimentally, you can configure Beyla to move the Kube API subscription logic to an external service (with fewer
instances), and connect Beyla to the Kubernetes API cache service instead of the Kubernetes API directly.

The easiest way to enable this service is via our latest Helm chart, in values.yml:

k8sCache:
  replicas: <typically 1 cache replica for 50 Beyla instances>
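
As a rough sizing aid, the "1 cache replica for 50 Beyla instances" rule of thumb can be turned into a small calculation. The Go sketch below is an illustration only: the cacheReplicas function is hypothetical, and rounding up to the next whole replica is an assumption rather than an official recommendation.

```go
package main

import "fmt"

// instancesPerReplica follows the rule of thumb from the release notes:
// typically 1 cache replica for 50 Beyla instances.
const instancesPerReplica = 50

// cacheReplicas is a hypothetical helper that rounds up, so that a partially
// filled group of Beyla instances still gets its own replica.
func cacheReplicas(beylaInstances int) int {
	if beylaInstances <= 0 {
		return 0
	}
	return (beylaInstances + instancesPerReplica - 1) / instancesPerReplica
}

func main() {
	for _, n := range []int{30, 120, 500} {
		fmt.Printf("%d Beyla instances -> %d cache replicas\n", n, cacheReplicas(n))
	}
}
```

For the 500-instance cluster mentioned above, this yields 10 replicas. Treat the result as a starting point and adjust it to the observed load.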

Other changes/additions

Add 'watch services' permission to unprivileged example by @marevers in #1126
Deduplicate instance ids and restore target_instance in Prometheus by @mariomac in #1129
Update OTEL collector library to v0.108.1 by @mariomac in #1133
Helm chart: allow unprivileged deployment of Beyla by @marevers in #1128
Update OTEL collector library to v0.108.1 (1.8 backport) by @mariomac in #1134
Automatic update of offsets.json by @github-actions in #1136
Docs: Fix link to 'Beyla and Kubernetes walkthrough' by @marevers in #1141
Update rust test dependencies versions by @rafaelroquetto in #1142
Automatic update of offsets.json by @github-actions in #1149
Refactor to have only one Go tracer by @marctc in #1132
Update rails test Dockerfile by @rafaelroquetto in #1148
Add target for ARM integration tests by @rafaelroquetto in #1139
Avoid that a Pod update removes the container metadata by @mariomac in #1156
Add Linux Traffic Control probes for App O11y by @grcevski in #1160
Increase buffer size to 192 to capture longer URLs by @marevers in #1150
Process metrics dashboard by @mariomac in #1109
Automatic update of offsets.json by @github-actions in #1163
Propagate context through TCP packets by @grcevski in #1161
Allow filtering by client/server in application traces by @mariomac in #1166
Fixing Docker Generator build action by @mariomac in #1164
feat(helm): additional labels for ServiceMonitor by @nlamirault in #1167
Revert OTel expiration code by @grcevski in #1143
Fix bounds check in kafka parsing by @grcevski in #1171
Enforce clang-format for C source files by @rafaelroquetto in #1177
Fix clang-format-check workflow file by @rafaelroquetto in #1179
Support for RHEL 4.18 kernels by @rafaelroquetto in #1175
Add two ports to service, daemonset and servicemonitor conditionally by @marevers in #1168
Split eBPF load and attach for Go programs by @grcevski in #1169
Add some default settings for beyla application metrics by @xujiaxj in #1184
Use git-lfs to track .o files by @rafaelroquetto in #1183
Use clang-tidy on ebpf code by @rafaelroquetto in #1180
Automatic update of offsets.json by @github-actions in #1191
Add clang-tidy make target by @rafaelroquetto in #1189
Add quickstart build instructions to the README file by @rafaelroquetto in #1188
Move bin files back to git lfs by @rafaelroquetto in #1193
Introduce option for high volume request tracking by @rafaelroquetto in #1196
Add workflow for checking git-lfs files by @rafaelroquetto in #1194
Use struct with pid and Go routine addr for Go BPF maps by @marctc in #1182
Fix linting/compilation on Darwin environments by @mariomac in #1199
Add metrics to measure latency of k8s informer by @marctc in #1200
Extract ReplicaSet name from pod name by @mariomac in #1202
Try to fix unmounting of BPF FS during integration tests by @mariomac in #1205
Remove ReplicaSet informer by @mariomac in #1204
Use struct with pid and Go routine addr for Go BPF maps by @marctc in #1201
Discover service names from process env vars by @grcevski in #1195
Add option to skip ConfigMap check by @marevers in #1208
Use only the required informers by @mariomac in #1210
Allow configuring informer resync time by @mariomac in #1216
Automatic update of offsets.json by @github-actions in #1220
update helm chart to use Beyla 1.8.4 by @mariomac in #1223
Account for deleted files in workflow files by @rafaelroquetto in #1218
Always decorate k8s_owner_name by @mariomac in #1226
Make EBPF tracer config visible by @mariomac in #1222
Move already instrumented executable log messages to Debug level by @mariomac in #1227
Automatic update of offsets.json by @github-actions in #1230
Revert "Add metrics to measure latency of k8s informer (#1200)" by @marctc in #1214
Revert "Add some default settings for beyla application metrics (#1184)" by @mariomac in #1231
update helm chart version before re-releasing by @mariomac in #1233
Don't wait for BPF unmount more than 5 seconds by @grcevski in #1238
Rework TC context propagation to use the IP options by @grcevski in #1237
Fix traces sampler by @grcevski in #1240
Revert: reducing scope of informer by @mariomac in #1245
Unify HTTP SSL, K probes and NodeJS tracer in a single tracer by @marctc in #1215
fix flaky K8s network integration test by @mariomac in #1250
Fix edge condition with kafka request parsing by @grcevski in #1252
Share bpf maps internally and remove pinning / bpffs requirement by @rafaelroquetto in #1251
Automatic update of offsets.json by @github-actions in #1257
Better Java context propagation by @grcevski in #1260
Update vendored dependencies & fix Darwin compilation by @mariomac in #1262
Use informer code from beyla-k8s-cache by @mariomac in #1256
Restoring disabled informers in beyla-k8s-cache by @mariomac in #1264
parse sql host address and port by @esara in #1255
Automatic update of offsets.json by @github-actions in #1268
Docs information architecture refactor pass 1 by @grafsean in #1259
Split tc programs from generic tracer by @rafaelroquetto in #1267
Enabling external informer by @mariomac in #1266
K8s integration tests: export logs before killing beyla by @mariomac in #1274
flaky language detection test: add extra logs by @mariomac in #1275
Replace drone by github actions for image publishing by @mariomac in #1271
Fix wrong language detection by @mariomac in #1276
Moving here the code from beyla-k8s-meta repository by @mariomac in #1278
Helm chart: enable profile_port by @mariomac in #1272
Fix network flows flaky test by @mariomac in #1282
K8s cache: fix coverage report and add graceful stop by @mariomac in #1281
Move K8s cache Docker publish actions here by @mariomac in #1284
Remote K8s meta service: wait for synchronization at startup by @mariomac in #1283
Check third-party licenses on PR by @mariomac in #1285
K8s env vars by @grcevski in #1279
Unblock remote cache synchronization by @mariomac in #1289
Typo in docs by @duncan485 in #1288
VM tests: explicit install bash by @rafaelroquetto in #1293
Fix the return value of bpf_strstr_tp_loop when it does not meet the … by @tsint in #1294
Try to fix build-push-to-dockerhub by @mariomac in #1292
Fix repository in docker push action by @mariomac in #1297
separate image builders by architecture by @mariomac in #1298
GitHub Actions: revert separate docker publish builders by @mariomac in #1299
GitHub Action, docker build: replace arm64 runner by amd64 runner by @mariomac in #1300
regenerate BPF binaries after PR #1294 by @mariomac in #1295
Fix docker image generation with LFS by @mariomac in #1301
Fix codecov flags of VM integration tests by @mariomac in #1303
Tune up versioned docker release scripts by @mariomac in #1302
Add configuration options to Kube Cache service by @mariomac in #1304
add namespace to server and peer name across namespaces by @esara in #1247
Complete the work on TCP packet context propagation by @grcevski in #1290
Nuke nodejs uretprobes by @rafaelroquetto in #1305
Change kafka to use Statement instead of Othernamespace by @grcevski in #1306
Use topic provided in the key first by @grcevski in #1307
(Experimental) Trace context propagation via HTTP headers by @rafaelroquetto in #1291
Add K8s metadata cache service to helm chart by @mariomac in #1296
Update opentelemetry collector library to 0.112.0 by @mariomac in #1310
Ensure http clients can nest under SQL for Go by @grcevski in #1308
Fix missing store cleanup on podsByContainer by @grcevski in #1312
Helm chart: remove duplicity of labels by @mariomac in #1313
Add Formatting to Variable Name by @SeamusGrafana in #1316
Helm chart: fix cache port configuration by @mariomac in #1319
Add missing mutex in kube store functions by @marctc in #1320
Fix potential deadlock in store.go by @mariomac in #1321
Refactor TC code for code reuse by @grcevski in #1314
Refactoring of the L7 CP BPF code and some other tweaks by @grcevski in #1315
K8s store: fix access to mutex to avoid concurrent map read/write by @mariomac in #1328
Update github workflows to use upload-artifact@v4 by @marctc in #1329
Automatic update of offsets.json by @github-actions in #1334
Informers cache: don't send updates for non-meaningful Pod/Service updates by @mariomac in #1330
Helm chart: allow setting limits to beyla cache by @mariomac in #1335
Add default excluded services to our Beyla Helm chart by @grcevski in #1332
Fix crash on start by @mariomac in #1337
Automatic update of offsets.json by @github-actions in #1340
Reverting kubernetes version library by @mariomac in #1341
Fix memory leak on kubernetes metadata store by @mariomac in #1342
Improve k8s env parsing by @grcevski in #1339
Fix imagepullpolicy in helm chart for k8s cache by @mariomac in #1343
Fix crash on k8s container env parsing by @mariomac in #1344
Automatic update of offsets.json by @github-actions in #1345
Stop flooding cache logs on client disconnection/context cancelation by @mariomac in #1347
Add timeout to grpc server.Send by @mariomac in #1350
K8s cache: Improving performance of client cancellation by @mariomac in #1353
make sure cache service connection stops on error by @mariomac in #1355
Asynchronous synchronization of Beyla cache by @mariomac in #1358
Enable batching for traces by @marctc in #1352
Kube meta store: Make sure all the object metadata is deleted by @mariomac in #1359
Some informer optimizations by @mariomac in #1360
Revert "K8s cache: Improving performance of client cancellation (#1353)" by @mariomac in #1362
internal instrumentation for k8s cache by @mariomac in #1365
Use scratch for the rest of Beyla images by @marctc in #1373
Ensure ring buf maps have sane max entries values by @rafaelroquetto in #1374
Fix helm chart selector conflicts and update Beyla version to 1.8.8 by @mariomac in #1376
Fix flaky unit test by @mariomac in #1378
Helm chart: removed unneeded exposed port from Cache Service by @mariomac in #1380
Rename Beyla cache internal metrics to use attributes by @mariomac in #1382
Fix rounding function for max entries by @rafaelroquetto in #1381
Fix environment variable name for configuring otel traces features by @bjor-joh in #1387
Set instance ID from pod:container and let setting metadata from annotations by @mariomac in #1391
Rename svc.ID to svc.Attrs by @mariomac in #1393

New Contributors

@duncan485 made their first contribution in #1288
@tsint made their first contribution in #1294
@bjor-joh made their first contribution in #1387

Full Changelog: v1.8.8...v1.9.0