github tphakala/birdnet-go nightly-20251012
Nightly Build 20251012

Pre-release

🚀 Major Features & Enhancements

RTSP Stream Health Monitoring with Real-Time SSE (#1382)

  • Feature: Comprehensive API endpoints for monitoring RTSP stream health status
  • REST Endpoints (authenticated):
    • GET /api/v2/streams/health - Detailed health status of all streams
    • GET /api/v2/streams/health/:url - Stream-specific health status
    • GET /api/v2/streams/status - High-level summary with healthy/unhealthy counts
  • SSE Streaming (authenticated):
    • GET /api/v2/streams/health/stream - Real-time push updates via Server-Sent Events
    • 10 distinct event types (stream_added, state_change, health_recovered, error_detected, etc.)
    • Rate limit: 5 connections/minute per IP
    • Change detection based on 1-second polling
    • 30-second heartbeat, 30-minute max duration per connection
  • Data Provided:
    • Process state tracking (idle, starting, running, restarting, backoff, circuit_open, stopped)
    • Error diagnostics with troubleshooting steps
    • Error history (last 10 errors per stream)
    • State transition history
    • Data flow metrics (bytes received, bytes/second)
    • Restart counts and timestamps
  • Security: URL credentials automatically sanitized, authentication required
  • Impact: Real-time visibility into stream health for monitoring dashboards and integrations

FFmpeg Stream Reliability Improvements (PR Series #1372, #1374, #1375, #1376, #1377)

A comprehensive series of improvements to address RTSP stream stability issues (#1264):

Process State Machine (#1372)

  • Problem: RTSP stream issues were difficult to debug because it was unclear what state FFmpeg processes were in
  • Solution: Implemented explicit state machine with 7 distinct states:
    • StateIdle - Stream created but not started
    • StateStarting - FFmpeg process starting
    • StateRunning - Process running and processing audio
    • StateRestarting - Restart requested
    • StateBackoff - Exponential backoff before retry
    • StateCircuitOpen - Circuit breaker cooldown
    • StateStopped - Permanently stopped
  • Features:
    • Thread-safe state transitions with logging
    • State transition history (bounded to 100 entries)
    • State visible in logs and health checks
    • Simplified IsRestarting() logic (20 lines → 12 lines)
  • Impact: Better debugging and monitoring capabilities, clearer visibility into stream lifecycle
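
A minimal sketch of a mutex-guarded state machine with a bounded transition history, assuming nothing beyond the seven state names and the 100-entry bound listed above. The StateMachine, StateChange, and maxHistory names are hypothetical and do not claim to match birdnet-go's internals.

```go
package main

import (
	"fmt"
	"log"
	"sync"
	"time"
)

// ProcessState mirrors the seven states listed in the release notes.
type ProcessState int

const (
	StateIdle ProcessState = iota // created but not started (zero value)
	StateStarting
	StateRunning
	StateRestarting
	StateBackoff
	StateCircuitOpen
	StateStopped
)

var stateNames = [...]string{"idle", "starting", "running", "restarting", "backoff", "circuit_open", "stopped"}

func (s ProcessState) String() string {
	if s >= 0 && int(s) < len(stateNames) {
		return stateNames[s]
	}
	return "unknown"
}

// StateChange records one transition.
type StateChange struct {
	From, To ProcessState
	At       time.Time
	Reason   string
}

const maxHistory = 100 // bound on retained transitions

// StateMachine is a hypothetical tracker; its zero value starts in StateIdle.
type StateMachine struct {
	mu      sync.Mutex
	state   ProcessState
	history []StateChange
}

// Transition moves to a new state, logs it, and appends to the bounded history.
func (m *StateMachine) Transition(to ProcessState, reason string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	c := StateChange{From: m.state, To: to, At: time.Now(), Reason: reason}
	m.history = append(m.history, c)
	if len(m.history) > maxHistory {
		m.history = m.history[len(m.history)-maxHistory:] // drop the oldest entries
	}
	m.state = to
	log.Printf("state %s → %s (%s)", c.From, c.To, reason)
}

// State returns the current state.
func (m *StateMachine) State() ProcessState {
	m.mu.Lock()
	defer m.mu.Unlock()
	return m.state
}

// History returns a copy so callers cannot mutate internal state.
func (m *StateMachine) History() []StateChange {
	m.mu.Lock()
	defer m.mu.Unlock()
	out := make([]StateChange, len(m.history))
	copy(out, m.history)
	return out
}

func main() {
	var m StateMachine
	m.Transition(StateStarting, "StartStream called")
	m.Transition(StateRunning, "first audio received")
	fmt.Println(m.State(), len(m.History())) // running 2
}
```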

RTSP Transport Change Detection (#1374)

  • Problem: Changing RTSP transport (TCP ↔ UDP) in config didn't affect running streams
  • Solution: Automatic detection and restart of streams when transport changes
  • Features:
    • Compares stream transport against configuration
    • Automatic stop/restart with new transport
    • Clean state transitions via state machine
    • Clear logging: "🔄 Transport changed for rtsp://... tcp → udp (restarting)"
  • Impact: Configuration changes take effect immediately without manual intervention
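
The detection step amounts to comparing each running stream's transport with the configured value and cycling only the mismatches. In this sketch, Stream, reconcileTransport, and the stop/start hooks are hypothetical stand-ins for the real stream manager; the log line imitates the format quoted above.

```go
package main

import (
	"fmt"
	"strings"
)

// Stream is a hypothetical minimal view of a running RTSP stream.
type Stream struct {
	URL       string
	Transport string // "tcp" or "udp"
}

// transportChanged reports whether a running stream's transport differs from the configured one.
func transportChanged(s Stream, configured string) bool {
	return !strings.EqualFold(s.Transport, configured)
}

// reconcileTransport stops and restarts every stream whose transport no longer
// matches configuration and returns the URLs that were restarted.
func reconcileTransport(streams []Stream, configured string,
	stop func(url string) error, start func(url, transport string) error) []string {
	var restarted []string
	for _, s := range streams {
		if !transportChanged(s, configured) {
			continue
		}
		fmt.Printf("🔄 Transport changed for %s %s → %s (restarting)\n", s.URL, s.Transport, configured)
		if err := stop(s.URL); err != nil {
			fmt.Printf("stop failed for %s: %v\n", s.URL, err)
			continue
		}
		if err := start(s.URL, configured); err != nil {
			fmt.Printf("restart failed for %s: %v\n", s.URL, err)
			continue
		}
		restarted = append(restarted, s.URL)
	}
	return restarted
}

func main() {
	streams := []Stream{{URL: "rtsp://camera1/stream", Transport: "tcp"}}
	noop := func(string) error { return nil }
	restarted := reconcileTransport(streams, "udp", noop, func(string, string) error { return nil })
	fmt.Println(restarted)
}
```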

lastDataTime Reset and Zero Time Handling (#1375)

  • Problem: Confusing "inactive for 0 seconds" log messages for streams that never received data
  • Root Cause: lastDataTime was not properly reset when starting a new FFmpeg process
  • Solution:
    • Explicit reset of lastDataTime to zero time in processAudio()
    • Reset totalBytesReceived for clean state
    • Enhanced zero time logging: "never received data" instead of "0 seconds ago"
    • Improved silence detection logging with process runtime context
  • Testing: Comprehensive test coverage with race detector
  • Impact: Clear distinction between "never" and "X seconds ago" in logs

Stream Watchdog for Automatic Recovery (#1376)

  • Problem: Streams could get stuck in unhealthy states indefinitely
  • Solution: Automatic watchdog that detects and force-resets stuck streams
  • Configuration:
    • 15-minute threshold for force reset
    • 5-minute check interval
    • 30-second stop/start delay
    • Cooldown period to prevent thrashing
  • Features:
    • Defensive stream removal if stop fails
    • Full reset cycle: Stop → force-remove if needed → cleanup delay → StartStream
    • Clear logging with 🚨 emoji for easy identification
    • Complements existing health checks (handles long-term stuck states)
  • Impact: Automatic recovery from prolonged unhealthy states without manual intervention

Better Context Cancellation Diagnostics (#1377)

  • Enhancement: Uses context.WithCancelCause (added in Go 1.20) to record why each context was cancelled
  • Features:
    • Meaningful cancellation causes logged
    • Specific error messages about why contexts were cancelled
    • Improved troubleshooting for unexpected shutdowns
    • Cleaner code with reduced cyclomatic complexity
  • Impact: Easier debugging in production environments

🐛 Critical Bug Fixes & Performance Improvements

Notification URL Generation for Reverse Proxies (#1366)

  • Problem: Users behind reverse proxies (nginx, Cloudflare Tunnel) got notification URLs with localhost:8080 instead of their public hostname
  • Root Cause: security.host config was empty, so BuildBaseURL defaulted to localhost
  • Solution: Hybrid host resolution strategy with priority chain:
    1. security.host config (explicit user setting)
    2. BIRDNET_HOST environment variable (Docker/container-friendly) ⭐ NEW
    3. localhost fallback (with warning log)
  • Configuration Options:
    • Config file: security.host: "birdnet.home.arpa"
    • Environment variable: BIRDNET_HOST=birdnet.home.arpa
    • Docker: -e BIRDNET_HOST=birdnet.home.arpa
  • Testing: 40+ test cases covering real-world scenarios, edge cases, and priority resolution
  • Impact: Correct notification URLs for all deployment scenarios
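
The priority chain reduces to "first non-empty value wins, otherwise warn and fall back". In this sketch resolveHost is a hypothetical helper, not birdnet-go's BuildBaseURL; getenv is injected so the chain can be tested without touching the real environment, and trimming whitespace is an assumption.

```go
package main

import (
	"fmt"
	"log"
	"os"
	"strings"
)

// resolveHost applies the priority chain:
// 1. security.host from config, 2. BIRDNET_HOST env var, 3. "localhost" with a warning.
func resolveHost(configHost string, getenv func(string) string) (host string, isFallback bool) {
	if h := strings.TrimSpace(configHost); h != "" {
		return h, false
	}
	if h := strings.TrimSpace(getenv("BIRDNET_HOST")); h != "" {
		return h, false
	}
	log.Println("warning: security.host and BIRDNET_HOST are unset; notification URLs will use localhost")
	return "localhost", true
}

func main() {
	host, fallback := resolveHost("", os.Getenv)
	fmt.Printf("host=%s fallback=%v\n", host, fallback)
}
```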

Analytics Unique Species Count Capped at 100 (#1367)

  • Problem: Analytics Overview showed at most 100 unique species, confusing users when the Species page showed the actual count (e.g., 163)
  • Root Cause: Hardcoded limit: '100' parameter in the fetchSummaryData() API request
  • Solution: Removed the hardcoded limit so the backend returns the full dataset
  • Why Safe:
    • Backend has 30-second query timeout
    • Small data volume (163 species ≈ 8 KB JSON)
    • Frontend only displays count, not rendering all items
    • Species page already fetches all without issues
  • Impact: Accurate unique species count across all pages

Spectrogram Generation Nil Pointer Panic (#1360)

  • Problem: Nil pointer dereference panic in spectrogram generation when the logger was nil
  • Root Cause: spectrogramLogger could be nil in edge cases during initialization or when file logger creation failed
  • Solution:
    • Added safe logger accessor: getSpectrogramLogger() with fallback to slog.Default()
    • Enhanced init() fallback chain: file logger → default logger → stdout logger
    • Added nil checks throughout logger calls
  • Impact: Reliable spectrogram generation without crashes
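
A nil-safe accessor means call sites never dereference a possibly-nil logger. This sketch assumes the logger is held in an atomic.Pointer for concurrent safety, which may differ from the real spectrogramLogger variable; since slog.Default never returns nil, the stdout branch is purely defensive.

```go
package main

import (
	"log/slog"
	"os"
	"sync/atomic"
)

// spectrogramLogger stays nil until initialization succeeds.
var spectrogramLogger atomic.Pointer[slog.Logger]

// getSpectrogramLogger never returns nil: it falls back to slog.Default().
func getSpectrogramLogger() *slog.Logger {
	if l := spectrogramLogger.Load(); l != nil {
		return l
	}
	return slog.Default()
}

// initSpectrogramLogger tries a file logger, then the default logger, then stdout.
// The log file is intentionally left open for the lifetime of the process.
func initSpectrogramLogger(path string) {
	if f, err := os.OpenFile(path, os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0o644); err == nil {
		spectrogramLogger.Store(slog.New(slog.NewTextHandler(f, nil)))
		return
	}
	if l := slog.Default(); l != nil {
		spectrogramLogger.Store(l)
		return
	}
	spectrogramLogger.Store(slog.New(slog.NewTextHandler(os.Stdout, nil)))
}

func main() {
	getSpectrogramLogger().Info("logging before init is safe")
	initSpectrogramLogger("/nonexistent-dir/spectrogram.log") // fails and falls back
	getSpectrogramLogger().Info("spectrogram generated", "file", "example.png")
}
```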

Zombie FFmpeg/SoX Processes on Raspberry Pi (#1368)

  • Problem: Zombie processes accumulated on Raspberry Pi during spectrogram generation, causing failures after the first 5 recordings
  • Root Causes:
    • Missing Wait() after Kill() - killed SoX but didn't wait for exit
    • No timeout on Wait() - could block indefinitely on hung processes
    • Incomplete cleanup on error paths
  • Solution:
    • New helper functions: waitWithTimeout() and waitWithTimeoutErr() with 5-second timeout
    • Deferred cleanup ensures Wait() is ALWAYS called after Kill()
    • Enhanced logging with process PID and lifecycle tracking
    • Go 1.23+ optimizations with CommandContext
  • Impact: Spectrograms can be generated repeatedly without zombie processes accumulating on resource-constrained devices

Range Filter Daily Update Race Condition (#1369)

  • Problem: Species outside the configured range filter list were being detected
  • Root Cause: Multiple concurrent goroutines could trigger daily range filter updates simultaneously, causing species list to flip-flop
  • Solution:
    • Added atomic ShouldUpdateRangeFilterToday() function with mutex
    • Only the first goroutine on any given day returns true
    • Immediately updates LastUpdated to prevent duplicate updates
    • Enhanced logging and error handling
  • Testing: 100 concurrent goroutines verified with race detector
  • Impact: Consistent species filtering with exactly one range filter update per day

📚 Documentation & Developer Experience

  • Notification URL Configuration Guide (#1366): Setup methods, reverse-proxy guidance, examples, and troubleshooting
  • Stream Health API Documentation (#1382): Comprehensive guides with example requests/responses, error context details, state semantics, and integration tips

🎯 Developer Notes

This release focuses on RTSP stream reliability and monitoring capabilities. The FFmpeg improvements series (#1372, #1374, #1375, #1376, #1377) provides a solid foundation for diagnosing and automatically recovering from stream issues that users have reported in #1264. The new health monitoring API (#1382) enables real-time monitoring dashboards and integrations.

Key architectural improvements include:

  • Explicit state machine tracking for clear visibility into stream lifecycle
  • Automatic watchdog recovery for long-term stuck states
  • Better context cancellation diagnostics for production debugging
  • Real-time health monitoring via SSE for external integrations

These improvements make the system more observable, reliable, and easier to troubleshoot in production environments.


Full Changelog: nightly-20251008...nightly-20251012

Contributors: Special thanks to all contributors who helped identify, test, and resolve these issues.
