🚀 Major Features & Enhancements
RTSP Stream Health Monitoring with Real-Time SSE (#1382)
- Feature: Comprehensive API endpoints for monitoring RTSP stream health status
- REST Endpoints (authenticated):
- GET /api/v2/streams/health - Detailed health status of all streams
- GET /api/v2/streams/health/:url - Stream-specific health status
- GET /api/v2/streams/status - High-level summary with healthy/unhealthy counts
- SSE Streaming (authenticated):
- GET /api/v2/streams/health/stream - Real-time push updates via Server-Sent Events
- 10 distinct event types (stream_added, state_change, health_recovered, error_detected, etc.)
- Rate limit: 5 connections/minute per IP
- Intelligent change detection with 1-second polling
- 30-second heartbeat, 30-minute max duration per connection
- Data Provided:
- Process state tracking (idle, starting, running, restarting, backoff, circuit_open, stopped)
- Error diagnostics with troubleshooting steps
- Error history (last 10 errors per stream)
- State transition history
- Data flow metrics (bytes received, bytes/second)
- Restart counts and timestamps
- Security: URL credentials automatically sanitized, authentication required
- Impact: Real-time visibility into stream health for monitoring dashboards and integrations
FFmpeg Stream Reliability Improvements (PR Series #1372, #1374, #1375, #1376, #1377)
A comprehensive series of improvements to address RTSP stream stability issues (#1264):
Process State Machine (#1372)
- Problem: Difficult to debug RTSP stream issues - unclear what state FFmpeg processes are in
- Solution: Implemented explicit state machine with 7 distinct states:
- StateIdle - Stream created but not started
- StateStarting - FFmpeg process starting
- StateRunning - Process running and processing audio
- StateRestarting - Restart requested
- StateBackoff - Exponential backoff before retry
- StateCircuitOpen - Circuit breaker cooldown
- StateStopped - Permanently stopped
- Features:
- Thread-safe state transitions with logging
- State transition history (bounded to 100 entries)
- State visible in logs and health checks
- Simplified IsRestarting() logic (20 lines → 12 lines)
- Impact: Better debugging and monitoring capabilities, clearer visibility into stream lifecycle
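A rough sketch of how such a state machine could look is given here. The state names and the 100-entry history bound come from the notes above. The stream and transition types, the transitionTo method, and the set of states that IsRestarting treats as "restarting" are assumptions made for illustration, not the project's actual code.

```go
package main

import (
	"sync"
	"time"
)

// ProcessState enumerates the seven states listed in the release notes.
type ProcessState int

const (
	StateIdle ProcessState = iota
	StateStarting
	StateRunning
	StateRestarting
	StateBackoff
	StateCircuitOpen
	StateStopped
)

const maxHistory = 100 // history bound stated in the notes

// transition is a hypothetical history record.
type transition struct {
	From, To ProcessState
	At       time.Time
}

// stream is a hypothetical holder of process state.
type stream struct {
	mu      sync.Mutex
	state   ProcessState
	history []transition
}

// transitionTo changes state under the mutex and records the change,
// dropping the oldest entry once the history is full.
func (s *stream) transitionTo(next ProcessState) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.state == next {
		return
	}
	if len(s.history) == maxHistory {
		copy(s.history, s.history[1:])
		s.history = s.history[:maxHistory-1]
	}
	s.history = append(s.history, transition{From: s.state, To: next, At: time.Now()})
	s.state = next
}

// IsRestarting reports whether the stream is between runs.
// Which states count here is an assumption of this sketch.
func (s *stream) IsRestarting() bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	switch s.state {
	case StateStarting, StateRestarting, StateBackoff, StateCircuitOpen:
		return true
	}
	return false
}
```

With an explicit state field, a check like IsRestarting reduces to a single switch instead of combining several flags, which is one plausible reason the real function got shorter.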
RTSP Transport Change Detection (#1374)
- Problem: Changing RTSP transport (TCP ↔ UDP) in config didn't affect running streams
- Solution: Automatic detection and restart of streams when transport changes
- Features:
- Compares stream transport against configuration
- Automatic stop/restart with new transport
- Clean state transitions via state machine
- Clear logging: "🔄 Transport changed for rtsp://... tcp → udp (restarting)"
- Impact: Configuration changes take effect immediately without manual intervention
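The comparison step can be sketched as follows. The runningStream type and streamsToRestart function are hypothetical, and the case-insensitive comparison is an assumption of this sketch.

```go
package main

import "strings"

// runningStream is a hypothetical record of an active stream.
type runningStream struct {
	URL       string
	Transport string // "tcp" or "udp"
}

// streamsToRestart returns the streams whose current transport differs
// from the configured one. Comparison ignores case.
func streamsToRestart(running []runningStream, configured string) []runningStream {
	var out []runningStream
	for _, s := range running {
		if !strings.EqualFold(s.Transport, configured) {
			out = append(out, s)
		}
	}
	return out
}
```

Each returned stream would then be stopped and started again with the new transport.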
lastDataTime Reset and Zero Time Handling (#1375)
- Problem: Confusing "inactive for 0 seconds" log messages for streams that never received data
- Root Cause: lastDataTime not properly reset when starting new FFmpeg process
- Solution:
- Explicit reset of lastDataTime to zero time in processAudio()
- Reset totalBytesReceived for clean state
- Enhanced zero time logging: "never received data" instead of "0 seconds ago"
- Improved silence detection logging with process runtime context
- Testing: Comprehensive test coverage with race detector
- Impact: Clear distinction between "never" and "X seconds ago" in logs
Stream Watchdog for Automatic Recovery (#1376)
- Problem: Streams could get stuck in unhealthy states indefinitely
- Solution: Automatic watchdog that detects and force-resets stuck streams
- Configuration:
- 15-minute threshold for force reset
- 5-minute check interval
- 30-second stop/start delay
- Cooldown period to prevent thrashing
- Features:
- Defensive stream removal if stop fails
- Full reset cycle: Stop → force-remove if needed → cleanup delay → StartStream
- Clear logging with 🚨 emoji for easy identification
- Complements existing health checks (handles long-term stuck states)
- Impact: Automatic recovery from prolonged unhealthy states without manual intervention
Better Context Cancellation Diagnostics (#1377)
- Enhancement: Using Go 1.20+ context.WithCancelCause for better diagnostics
- Features:
- Meaningful cancellation causes logged
- Specific error messages about why contexts were cancelled
- Improved troubleshooting for unexpected shutdowns
- Clean code with reduced cyclomatic complexity
- Impact: Easier debugging in production environments
🐛 Critical Bug Fixes & Performance Improvements
Notification URL Generation for Reverse Proxies (#1366)
- Problem: Users behind reverse proxies (nginx, Cloudflare Tunnel) got notification URLs with localhost:8080 instead of their public hostname
- Root Cause: security.host config was empty, BuildBaseURL defaulted to localhost
- Solution: Hybrid host resolution strategy with priority chain:
- security.host config (explicit user setting)
- BIRDNET_HOST environment variable (Docker/container-friendly) ⭐ NEW
- localhost fallback (with warning log)
- Configuration Options:
- Config file: security.host: "birdnet.home.arpa"
- Environment variable: BIRDNET_HOST=birdnet.home.arpa
- Docker: -e BIRDNET_HOST=birdnet.home.arpa
- Testing: 40+ test cases covering real-world scenarios, edge cases, and priority resolution
- Impact: Correct notification URLs for all deployment scenarios
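The priority chain can be sketched as shown here. The resolveHost and resolveHostFromEnv functions are hypothetical and are not the project's BuildBaseURL. Only the BIRDNET_HOST variable name and the resolution order come from these notes.

```go
package main

import "os"

// resolveHost applies the priority chain: explicit config value, then the
// BIRDNET_HOST environment variable, then "localhost". The second return
// value is true when the fallback was used, so the caller can log a warning.
func resolveHost(configHost string, getenv func(string) string) (string, bool) {
	if configHost != "" {
		return configHost, false
	}
	if env := getenv("BIRDNET_HOST"); env != "" {
		return env, false
	}
	return "localhost", true
}

// resolveHostFromEnv is the variant that reads the real process environment.
func resolveHostFromEnv(configHost string) (string, bool) {
	return resolveHost(configHost, os.Getenv)
}
```

Passing the environment lookup as a function keeps the priority logic testable without modifying process-wide state.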
Analytics Unique Species Count Capped at 100 (#1367)
- Problem: Analytics Overview showed maximum of 100 unique species, causing user confusion when Species page showed actual count (e.g., 163)
- Root Cause: Hardcoded limit: '100' parameter in fetchSummaryData() API request
- Solution: Removed hardcoded limit, let backend handle full dataset
- Why Safe:
- Backend has 30-second query timeout
- Small data volume (163 species ≈ 8 KB JSON)
- Frontend only displays the count and does not render all items
- Species page already fetches all without issues
- Impact: Accurate unique species count across all pages
Spectrogram Generation Nil Pointer Panic (#1360)
- Problem: Nil pointer dereference panic in spectrogram generation when logger was nil
- Root Cause: spectrogramLogger could be nil in edge cases during initialization or when file logger creation failed
- Solution:
- Added safe logger accessor: getSpectrogramLogger() with fallback to slog.Default()
- Enhanced init() fallback chain: file logger → default logger → stdout logger
- Added nil checks throughout logger calls
- Impact: Reliable spectrogram generation without crashes
Zombie FFmpeg/SoX Processes on Raspberry Pi (#1368)
- Problem: Zombie processes accumulated on Raspberry Pi during spectrogram generation, causing failures after first 5 recordings
- Root Causes:
- Missing Wait() after Kill() - killed SoX but didn't wait for exit
- No timeout on Wait() - could block indefinitely on hung processes
- Incomplete cleanup on error paths
- Solution:
- New helper functions: waitWithTimeout() and waitWithTimeoutErr() with 5-second timeout
- Deferred cleanup ensures Wait() is ALWAYS called after Kill()
- Enhanced logging with process PID and lifecycle tracking
- Go 1.23+ optimizations with CommandContext
- Impact: Unlimited spectrogram generations without zombie process accumulation on resource-constrained devices
Range Filter Daily Update Race Condition (#1369)
- Problem: Species detected outside configured range filter list
- Root Cause: Multiple concurrent goroutines could trigger daily range filter updates simultaneously, causing species list to flip-flop
- Solution:
- Added atomic ShouldUpdateRangeFilterToday() function with mutex
- Only first goroutine on any given day returns true
- Immediately updates LastUpdated to prevent duplicate updates
- Enhanced logging and error handling
- Testing: 100 concurrent goroutines verified with race detector
- Impact: Consistent species filtering with exactly one range filter update per day
📚 Documentation & Developer Experience
- Notification URL Configuration Guide (#1366): Setup methods, reverse-proxy guidance, examples, and troubleshooting
- Stream Health API Documentation (#1382): Comprehensive guides with example requests/responses, error context details, state semantics, and integration tips
🎯 Developer Notes
This release focuses on RTSP stream reliability and monitoring capabilities. The FFmpeg improvements series (#1372, #1374, #1375, #1376, #1377) provides a solid foundation for diagnosing and automatically recovering from stream issues that users have reported in #1264. The new health monitoring API (#1382) enables real-time monitoring dashboards and integrations.
Key architectural improvements include:
- Explicit state machine tracking for clear visibility into stream lifecycle
- Automatic watchdog recovery for long-term stuck states
- Better context cancellation diagnostics for production debugging
- Real-time health monitoring via SSE for external integrations
These improvements make the system more observable, reliable, and easier to troubleshoot in production environments.
Full Changelog: nightly-20251008...nightly-20251012
Contributors: Special thanks to all contributors who helped identify, test, and resolve these issues.