tphakala/birdnet-go nightly-20251012 on GitHub

🚀 Major Features & Enhancements

RTSP Stream Health Monitoring with Real-Time SSE (#1382)

Feature: Comprehensive API endpoints for monitoring RTSP stream health status
REST Endpoints (authenticated):
- GET /api/v2/streams/health - Detailed health status of all streams
- GET /api/v2/streams/health/:url - Stream-specific health status
- GET /api/v2/streams/status - High-level summary with healthy/unhealthy counts
SSE Streaming (authenticated):
- GET /api/v2/streams/health/stream - Real-time push updates via Server-Sent Events
- 10 distinct event types (stream_added, state_change, health_recovered, error_detected, etc.)
- Rate limit: 5 connections/minute per IP
- Change detection driven by 1-second polling
- 30-second heartbeat, 30-minute max duration per connection
Data Provided:
- Process state tracking (idle, starting, running, restarting, backoff, circuit_open, stopped)
- Error diagnostics with troubleshooting steps
- Error history (last 10 errors per stream)
- State transition history
- Data flow metrics (bytes received, bytes/second)
- Restart counts and timestamps
Security: URL credentials automatically sanitized, authentication required
Impact: Real-time visibility into stream health for monitoring dashboards and integrations

FFmpeg Stream Reliability Improvements (PR Series #1372, #1374, #1375, #1376, #1377)

A comprehensive series of improvements to address RTSP stream stability issues (#1264):

Process State Machine (#1372)

Problem: RTSP stream issues were hard to debug because it was unclear which state each FFmpeg process was in
Solution: Implemented an explicit state machine with 7 distinct states:
- StateIdle - Stream created but not started
- StateStarting - FFmpeg process starting
- StateRunning - Process running and processing audio
- StateRestarting - Restart requested
- StateBackoff - Exponential backoff before retry
- StateCircuitOpen - Circuit breaker cooldown
- StateStopped - Permanently stopped
Features:
- Thread-safe state transitions with logging
- State transition history (bounded to 100 entries)
- State visible in logs and health checks
- Simplified IsRestarting() logic (20 lines → 12 lines)
Impact: Better debugging and monitoring capabilities, clearer visibility into stream lifecycle
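One way to model the states and the bounded history in Go, as a sketch rather than BirdNET-Go's actual implementation. stateTracker, Transition, and the lowercase state names (borrowed from the health API's process-state list) are assumptions. The tracker records and logs every transition under a mutex but does not enforce which transitions are legal.

```go
package main

import (
	"fmt"
	"log"
	"sync"
	"time"
)

// ProcessState mirrors the seven states listed above.
type ProcessState int

const (
	StateIdle ProcessState = iota
	StateStarting
	StateRunning
	StateRestarting
	StateBackoff
	StateCircuitOpen
	StateStopped
)

var stateNames = [...]string{"idle", "starting", "running", "restarting", "backoff", "circuit_open", "stopped"}

func (s ProcessState) String() string {
	if s >= 0 && int(s) < len(stateNames) {
		return stateNames[s]
	}
	return fmt.Sprintf("unknown(%d)", int(s))
}

// Transition is one entry in the bounded history.
type Transition struct {
	From, To ProcessState
	Reason   string
	At       time.Time
}

const maxHistory = 100 // history bound from the release notes

// stateTracker guards the current state and its history with a mutex.
// Its zero value starts in StateIdle.
type stateTracker struct {
	mu      sync.Mutex
	current ProcessState
	history []Transition
}

// transition records a state change, logs it, and trims history to maxHistory entries.
func (t *stateTracker) transition(to ProcessState, reason string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	tr := Transition{From: t.current, To: to, Reason: reason, At: time.Now()}
	t.current = to
	t.history = append(t.history, tr)
	if len(t.history) > maxHistory {
		t.history = t.history[len(t.history)-maxHistory:] // keep only the newest entries
	}
	log.Printf("stream state %s → %s (%s)", tr.From, tr.To, reason)
}

// State returns the current state.
func (t *stateTracker) State() ProcessState {
	t.mu.Lock()
	defer t.mu.Unlock()
	return t.current
}

// History returns a copy so callers cannot mutate the tracker's slice.
func (t *stateTracker) History() []Transition {
	t.mu.Lock()
	defer t.mu.Unlock()
	out := make([]Transition, len(t.history))
	copy(out, t.history)
	return out
}

func main() {
	var tracker stateTracker
	tracker.transition(StateStarting, "StartStream called")
	tracker.transition(StateRunning, "audio data flowing")
	tracker.transition(StateBackoff, "process exited unexpectedly")
	for _, h := range tracker.History() {
		fmt.Printf("%s → %s: %s\n", h.From, h.To, h.Reason)
	}
}
```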

RTSP Transport Change Detection (#1374)

Problem: Changing the RTSP transport (TCP ↔ UDP) in the config didn't affect already-running streams
Solution: Automatic detection and restart of streams when transport changes
Features:
- Compares stream transport against configuration
- Automatic stop/restart with new transport
- Clean state transitions via state machine
- Clear logging: "🔄 Transport changed for rtsp://... tcp → udp (restarting)"
Impact: Configuration changes take effect immediately without manual intervention
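A sketch of the transport check, assuming a single configured transport value that applies to all RTSP streams. runningStream and reconcileTransport are hypothetical, and the restart callback stands in for the stop/start sequence that the state machine performs. The log line follows the format quoted above.

```go
package main

import (
	"fmt"
	"strings"
)

// runningStream is a hypothetical record of a stream and the transport it was started with.
type runningStream struct {
	URL       string
	Transport string // "tcp" or "udp"
}

// reconcileTransport restarts every stream whose transport differs from the configured
// value and returns how many were restarted successfully.
func reconcileTransport(streams []runningStream, configured string, restart func(url, transport string) error) int {
	want := strings.ToLower(strings.TrimSpace(configured))
	restarted := 0
	for i := range streams {
		have := strings.ToLower(streams[i].Transport)
		if have == want {
			continue // already using the configured transport
		}
		fmt.Printf("🔄 Transport changed for %s %s → %s (restarting)\n", streams[i].URL, have, want)
		if err := restart(streams[i].URL, want); err != nil {
			fmt.Printf("restart of %s failed: %v\n", streams[i].URL, err)
			continue
		}
		streams[i].Transport = want
		restarted++
	}
	return restarted
}

func main() {
	streams := []runningStream{
		{URL: "rtsp://cam1/stream", Transport: "tcp"},
		{URL: "rtsp://cam2/stream", Transport: "udp"},
	}
	n := reconcileTransport(streams, "udp", func(url, transport string) error {
		return nil // a real implementation would stop and restart the FFmpeg process here
	})
	fmt.Println("streams restarted:", n)
}
```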

lastDataTime Reset and Zero Time Handling (#1375)

Problem: Confusing "inactive for 0 seconds" log messages for streams that never received data
Root Cause: lastDataTime was not reset when starting a new FFmpeg process
Solution:
- Explicit reset of lastDataTime to zero time in processAudio()
- Reset totalBytesReceived for clean state
- Enhanced zero time logging: "never received data" instead of "0 seconds ago"
- Improved silence detection logging with process runtime context
Testing: Comprehensive test coverage, run with the race detector
Impact: Clear distinction between "never" and "X seconds ago" in logs

Stream Watchdog for Automatic Recovery (#1376)

Problem: Streams could get stuck in unhealthy states indefinitely
Solution: Automatic watchdog that detects and force-resets stuck streams
Configuration:
- 15-minute threshold for force reset
- 5-minute check interval
- 30-second stop/start delay
- Cooldown period to prevent thrashing
Features:
- Defensive stream removal if stop fails
- Full reset cycle: Stop → force-remove if needed → cleanup delay → StartStream
- Clear logging with 🚨 emoji for easy identification
- Complements existing health checks (handles long-term stuck states)
Impact: Automatic recovery from prolonged unhealthy states without manual intervention

Better Context Cancellation Diagnostics (#1377)

Enhancement: Using Go 1.20+ context.WithCancelCause for better diagnostics
Features:
- Meaningful cancellation causes logged
- Specific error messages about why contexts were cancelled
- Improved troubleshooting for unexpected shutdowns
- Cleaner code with reduced cyclomatic complexity
Impact: Easier debugging in production environments

🐛 Critical Bug Fixes & Performance Improvements

Notification URL Generation for Reverse Proxies (#1366)

Problem: Users behind reverse proxies (nginx, Cloudflare Tunnel) got notification URLs with localhost:8080 instead of their public hostname
Root Cause: When security.host was empty, BuildBaseURL fell back to localhost
Solution: Hybrid host resolution strategy with priority chain:
1. security.host config (explicit user setting)
2. BIRDNET_HOST environment variable (Docker/container-friendly) ⭐ NEW
3. localhost fallback (with warning log)
Configuration Options:
- Config file: security.host: "birdnet.home.arpa"
- Environment variable: BIRDNET_HOST=birdnet.home.arpa
- Docker: -e BIRDNET_HOST=birdnet.home.arpa
Testing: 40+ test cases covering real-world scenarios, edge cases, and priority resolution
Impact: Correct notification URLs for all deployment scenarios
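A sketch of the priority chain above. resolveHost is a hypothetical name (the notes only mention BuildBaseURL), and the environment lookup is injected as a function so the chain can be exercised without touching the real environment.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// resolveHost picks the host for notification links:
// 1. security.host from config, 2. BIRDNET_HOST env var, 3. localhost with a warning.
func resolveHost(configHost string, getenv func(string) string) string {
	if h := strings.TrimSpace(configHost); h != "" {
		return h
	}
	if h := strings.TrimSpace(getenv("BIRDNET_HOST")); h != "" {
		return h
	}
	fmt.Fprintln(os.Stderr, "warning: security.host and BIRDNET_HOST are unset; notification URLs will use localhost")
	return "localhost"
}

func main() {
	// Empty security.host, so BIRDNET_HOST (or the localhost fallback) is used.
	fmt.Println("notification host:", resolveHost("", os.Getenv))
}
```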

Analytics Unique Species Count Capped at 100 (#1367)

Problem: Analytics Overview showed at most 100 unique species, confusing users when the Species page showed the actual count (e.g., 163)
Root Cause: Hardcoded limit: '100' parameter in the fetchSummaryData() API request
Solution: Removed the hardcoded limit so the backend returns the full dataset
Why Safe:
- Backend has a 30-second query timeout
- Small data volume (163 species ≈ 8 KB JSON)
- Frontend only displays the count and does not render every item
- Species page already fetches all species without issues
Impact: Accurate unique species count across all pages

Spectrogram Generation Nil Pointer Panic (#1360)

Problem: Nil pointer dereference panic in spectrogram generation when logger was nil
Root Cause: spectrogramLogger could be nil in edge cases during initialization or when file logger creation failed
Solution:
- Added safe logger accessor: getSpectrogramLogger() with fallback to slog.Default()
- Enhanced init() fallback chain: file logger → default logger → stdout logger
- Added nil checks throughout logger calls
Impact: Reliable spectrogram generation without crashes
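A sketch of a nil-safe logger accessor with log/slog (Go 1.21+). The package-level variable, the RWMutex, setSpectrogramLogger, newSpectrogramLogger, and the JSON file handler are assumptions; only the accessor name and the slog.Default() fallback come from the notes. Since slog.Default() never returns nil, the stdout branch is purely defensive.

```go
package main

import (
	"log/slog"
	"os"
	"sync"
)

var (
	spectrogramLoggerMu sync.RWMutex
	spectrogramLogger   *slog.Logger // may stay nil if file logger setup fails
)

// getSpectrogramLogger never returns nil: it falls back to slog.Default().
func getSpectrogramLogger() *slog.Logger {
	spectrogramLoggerMu.RLock()
	l := spectrogramLogger
	spectrogramLoggerMu.RUnlock()
	if l != nil {
		return l
	}
	return slog.Default()
}

// setSpectrogramLogger swaps the logger safely for concurrent readers.
func setSpectrogramLogger(l *slog.Logger) {
	spectrogramLoggerMu.Lock()
	spectrogramLogger = l
	spectrogramLoggerMu.Unlock()
}

// newSpectrogramLogger follows the fallback chain: file logger → default logger → stdout.
// The file stays open for the life of the process, as a long-lived logger's would.
func newSpectrogramLogger(path string) *slog.Logger {
	if f, err := os.OpenFile(path, os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0o644); err == nil {
		return slog.New(slog.NewJSONHandler(f, nil))
	}
	if d := slog.Default(); d != nil {
		return d
	}
	return slog.New(slog.NewTextHandler(os.Stdout, nil))
}

func main() {
	// Safe even though no spectrogram logger has been configured yet.
	getSpectrogramLogger().Info("generating spectrogram", "file", "example.wav")
}
```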

Zombie FFmpeg/SoX Processes on Raspberry Pi (#1368)

Problem: Zombie processes accumulated on Raspberry Pi during spectrogram generation, causing failures after the first 5 recordings
Root Causes:
- Missing Wait() after Kill() - killed SoX but didn't wait for exit
- No timeout on Wait() - could block indefinitely on hung processes
- Incomplete cleanup on error paths
Solution:
- New helper functions: waitWithTimeout() and waitWithTimeoutErr() with 5-second timeout
- Deferred cleanup ensures Wait() is ALWAYS called after Kill()
- Enhanced logging with process PID and lifecycle tracking
- Go 1.23+ optimizations with CommandContext
Impact: Spectrogram generation no longer accumulates zombie processes, even after many runs on resource-constrained devices

Range Filter Daily Update Race Condition (#1369)

Problem: Species outside the configured range filter list were being detected
Root Cause: Multiple concurrent goroutines could trigger the daily range filter update simultaneously, causing the species list to flip-flop
Solution:
- Added ShouldUpdateRangeFilterToday(), which performs the check and the update claim atomically under a mutex
- Only the first goroutine on any given day receives true
- LastUpdated is updated immediately to prevent duplicate updates
- Enhanced logging and error handling
Testing: Verified with 100 concurrent goroutines under the race detector
Impact: Consistent species filtering with exactly one range filter update per day

📚 Documentation & Developer Experience

Notification URL Configuration Guide (#1366): Setup methods, reverse-proxy guidance, examples, and troubleshooting
Stream Health API Documentation (#1382): Comprehensive guides with example requests/responses, error context details, state semantics, and integration tips

🎯 Developer Notes

This release focuses on RTSP stream reliability and monitoring capabilities. The FFmpeg improvements series (#1372, #1374, #1375, #1376, #1377) provides a solid foundation for diagnosing and automatically recovering from stream issues that users have reported in #1264. The new health monitoring API (#1382) enables real-time monitoring dashboards and integrations.

Key architectural improvements include:

- Explicit state machine tracking for clear visibility into stream lifecycle
- Automatic watchdog recovery for long-term stuck states
- Better context cancellation diagnostics for production debugging
- Real-time health monitoring via SSE for external integrations

These improvements make the system more observable, reliable, and easier to troubleshoot in production environments.

Full Changelog: nightly-20251008...nightly-20251012

Contributors: Special thanks to all contributors who helped identify, test, and resolve these issues.
