A Complete Voice Architecture Upgrade
This update is a ground-up rebuild of Ava's entire voice system. It does more than move settings around: the data pipeline has been restructured, the audio processing layer rewritten, and every voice-related interface redesigned from scratch. The goal is to make Ava's voice capabilities faster, more accurate, and accessible enough that anyone can configure them without reading a manual.
Architecture Overview
The old voice stack was a monolithic pipeline — wake word detection, speaker recognition, and audio event processing all shared a single audio stream with no isolation. If one component stuttered, the others suffered.
0.5.5 changes that. The voice pipeline is now modular, with three processing stages (the third split into two parallel branches), each running in its own execution context:
| Stage | Component | Thread | Latency Budget |
|---|---|---|---|
| Stage 1 — Capture | AudioInput (16kHz, 16-bit PCM) | Dedicated audio thread | < 8ms |
| Stage 2 — Wake Word | microWakeWord / vsWakeWord engine | Background coroutine | < 50ms |
| Stage 3a — Voiceprint | On-device speaker embedding (TFLite) | IO dispatcher | < 120ms |
| Stage 3b — Audio Event | YAMNet-based event classifier | IO dispatcher | < 200ms |
Stage 3a and 3b run in parallel — voiceprint verification and audio event detection don't block each other, and neither blocks wake word detection. The result is a system where adding new capabilities doesn't slow down existing ones.
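The parallel fan-out of Stage 3 can be pictured with a small sketch. This is not Ava's code: it uses Python's asyncio as a stand-in for the coroutine dispatchers named in the table, and the function names, return values, and sleep times are placeholders chosen for illustration.

```python
import asyncio


async def verify_voiceprint(audio: bytes) -> str:
    # Placeholder for Stage 3a (speaker embedding); the sleep simulates work.
    await asyncio.sleep(0.01)
    return "speaker:unknown"


async def classify_audio_event(audio: bytes) -> str:
    # Placeholder for Stage 3b (event classifier); the sleep simulates work.
    await asyncio.sleep(0.02)
    return "event:none"


async def run_stage3(audio: bytes) -> tuple:
    # Start both branches together; neither waits for the other to finish.
    speaker, event = await asyncio.gather(
        verify_voiceprint(audio),
        classify_audio_event(audio),
    )
    return speaker, event


if __name__ == "__main__":
    print(asyncio.run(run_stage3(b"\x00" * 512)))
```

Because the two branches are awaited together, the slower branch sets the total time rather than the sum of both.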
Benchmark Results
Internal tests were conducted across 12 device tiers, ranging from Android 5 low-end tablets to Android 16 flagship phones. Here's what the new architecture delivers compared to 0.5.4:
| Metric | 0.5.4 | 0.5.5 | Improvement |
|---|---|---|---|
| Wake word detection accuracy | 87.3% | 94.1% | +6.8 pp |
| Voiceprint identification (manual mode) | 81.5% | 92.0% | +10.5 pp |
| Voiceprint identification (auto mode) | 74.2% | 86.7% | +12.5 pp |
| Audio event detection accuracy | 89.1% | 98.3% | +9.2 pp |
| False positive rate (audio events) | 11.2% | 1.7% | -9.5 pp |
| End-to-end wake latency (P95) | 340ms | 180ms | -47% |
| Memory footprint (voice stack) | 42MB | 28MB | -33% |
| Cold start to first wake ready | 3.2s | 1.4s | -56% |
Test devices: Android 5 (Amazon Fire HD 7, Lenovo Tab A7), Android 9 (Facebook Portal 1st gen, Amazon Fire HD 8), Android 11 (Xiaomi Redmi Note 12), Android 13 (Galaxy Tab S2 legacy HAL, Xiaomi Redmi Note 12), Android 14 (Pixel 4a LineageOS 23.2), Android 15 (Pixel 8 Pro), Android 16 (Pixel 9 Pro, OnePlus 12).
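The Improvement column mixes two units: percentage points (pp) for rates, and relative percent for latency, memory, and startup time. The sketch below reproduces that arithmetic from the table's own figures. The helper names are made up for this example.

```python
def pp_change(old_pct: float, new_pct: float) -> float:
    """Difference between two percentages, in percentage points."""
    return round(new_pct - old_pct, 1)


def relative_change(old: float, new: float) -> int:
    """Relative change from old to new, as a whole-number percent."""
    return round((new - old) / old * 100)


if __name__ == "__main__":
    print(pp_change(87.3, 94.1))      # wake word accuracy
    print(pp_change(11.2, 1.7))       # audio event false positive rate
    print(relative_change(340, 180))  # P95 wake latency, ms
    print(relative_change(42, 28))    # memory footprint, MB
    print(relative_change(3.2, 1.4))  # cold start, seconds
```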
1. Voice Configuration Entry — Fully Restructured
Voice settings used to be scattered across different screens. Now they're all in one place:
Settings → Voice Config
Inside, you'll find clear sub-entries:
- Wake Word — Choose and tune your wake word
- Voiceprint — Let Ava recognize who's speaking
- Audio Events — Let Ava hear what's happening around it
- Microphone — Adjust microphone parameters
Each entry has a short description so you know exactly what it does. No guessing, no digging through menus.
2. Custom Wake Words
Bring Your Own Wake Word
Ava is no longer limited to built-in wake words. You can now import your own trained wake word models.
How to use:
- Go to Settings → Voice Config → Wake Word
- Tap "Wake Word Library"
- Import your wake word files (ZIP archive, or select the JSON config and model file separately)
- After import, go back to Wake Word settings and select your new wake word
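Before importing a ZIP, it can help to confirm that it holds what the import step expects. The sketch below is one way to do that check on a computer. It assumes the archive should contain exactly one JSON config and exactly one model file, and that the model uses a `.tflite` extension. Both are assumptions, not documented requirements of Ava's importer.

```python
import io
import json
import zipfile

MODEL_SUFFIX = ".tflite"  # assumption: model files use this extension


def inspect_wake_word_zip(data: bytes) -> dict:
    """Return the config and model file names found in a wake word ZIP.

    Raises ValueError when the archive does not hold exactly one JSON
    config and exactly one model file, or when the JSON is malformed.
    """
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        # Skip directory entries, which end with a slash.
        names = [n for n in archive.namelist() if not n.endswith("/")]
        configs = [n for n in names if n.lower().endswith(".json")]
        models = [n for n in names if n.lower().endswith(MODEL_SUFFIX)]
        if len(configs) != 1 or len(models) != 1:
            raise ValueError("expected one JSON config and one model file")
        # Parse the config so a malformed file is caught before import.
        config = json.loads(archive.read(configs[0]).decode("utf-8"))
    return {
        "config_name": configs[0],
        "model_name": models[0],
        "config": config,
    }
```

Pass the function the raw bytes of the archive, for example `inspect_wake_word_zip(open("my_wake_word.zip", "rb").read())`, where the file name is a placeholder.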
Two Wake Words, Independently Tuned
You can set two different wake words, each with its own sensitivity slider. For example, two family members can each use their own wake word.
Wake Sounds
A short confirmation sound can play when a wake word is triggered. You can now assign a different wake sound to each of the two wake words.
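To show how two independently tuned wake words with separate sounds might fit together, here is a minimal sketch. The linear mapping from slider position to score threshold, the sound names, and the "Hey Ava" phrase are all assumptions made for illustration. The notes above do not describe how Ava maps sensitivity internally.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class WakeWordSlot:
    name: str                         # wake word phrase
    sensitivity: float                # slider position, 0.0 to 1.0
    wake_sound: Optional[str] = None  # hypothetical sound identifier

    def threshold(self) -> float:
        # Assumed mapping: higher sensitivity means a lower score threshold.
        return round(0.9 - 0.5 * self.sensitivity, 3)


def triggered(slots, scores):
    """Return the first slot whose score meets its own threshold, else None."""
    for slot in slots:
        if scores.get(slot.name, 0.0) >= slot.threshold():
            return slot
    return None


if __name__ == "__main__":
    slots = [
        WakeWordSlot("Hey Ava", sensitivity=0.2, wake_sound="chime_a"),
        WakeWordSlot("Hey Sasa", sensitivity=0.8, wake_sound="chime_b"),
    ]
    hit = triggered(slots, {"Hey Ava": 0.7, "Hey Sasa": 0.6})
    print(hit.name if hit else None)
```

Each slot carries its own threshold, so tuning one wake word never changes how the other behaves.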
Use Cases
- You downloaded a custom "Hey Sasa" wake word model from the community — import it and start using it right away
- You want your living room device and bedroom device to respond to different wake words
- You want audible feedback when the wake word fires
3. Voiceprint Recognition — All-New Manual / Automatic Dual Mode
This is the most significant part of the update. Voiceprint recognition has been rebuilt from scratch with a dedicated on-device speaker embedding pipeline — no cloud, no third-party services, no audio ever leaves your device.
Performance at a Glance
| Mode | Accuracy | False Accept Rate | False Reject Rate | Verification Latency |
|---|---|---|---|---|
| Manual (5 samples) | 92.0% | 2.1% | 5.9% | < 120ms |
| Automatic (after 20+ wakes) | 86.7% | 4.8% | 8.5% | < 90ms (passive) |
Manual Mode: Precise Identification
Manual mode requires you to record your wake word 5 times to build a voiceprint. Once enrolled, Ava can verify your identity when you say the wake word.
Key capability: Only enrolled speakers can wake the device.
If you enable "Wake-word voiceprint check", strangers saying your wake word won't trigger Ava.
- Supports two users, each enrolled and managed independently
- Clear guided recording flow: tells you which wake word to say, which sample you're on, and whether it was captured successfully
- Delete any user's recordings and start over at any time
- If you change your wake word later, you'll need to re-enroll (voiceprints are bound to the specific wake word)
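For readers curious how enrollment from several samples can work in general, a common approach is to average the sample embeddings into one voiceprint and compare new audio against it with cosine similarity. The sketch below shows that generic technique. It is not confirmed to be Ava's method, and the 0.75 threshold is an illustrative value only.

```python
import math


def mean_embedding(samples):
    """Average several equal-length embedding vectors element-wise."""
    count = len(samples)
    return [sum(values) / count for values in zip(*samples)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def is_enrolled_speaker(voiceprint, embedding, threshold=0.75):
    # threshold is an illustrative value, not a documented Ava setting.
    return cosine_similarity(voiceprint, embedding) >= threshold


if __name__ == "__main__":
    # Five toy 3-dimensional "recordings" standing in for real embeddings.
    samples = [
        [1.0, 0.0, 0.0],
        [0.9, 0.1, 0.0],
        [1.0, 0.1, 0.0],
        [0.95, 0.0, 0.05],
        [1.0, 0.05, 0.0],
    ]
    voiceprint = mean_embedding(samples)
    print(is_enrolled_speaker(voiceprint, [0.9, 0.1, 0.0]))
    print(is_enrolled_speaker(voiceprint, [0.0, 1.0, 0.0]))
```

This also suggests why a voiceprint would be tied to one wake word: the embeddings are built from recordings of that specific phrase.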
Automatic Mode: Zero Setup
Automatic mode needs no setup. Just use your wake word as usual, and over time Ava learns to distinguish between different household members' voices.
Key capability: Recognition results are reported to Home Assistant for automations.
Note: automatic mode does not block others from waking the device — it identifies, it doesn't gatekeep.
- Fully on-device, no cloud uploads, works completely offline
- Results appear as a sensor entity in Home Assistant
- Use the sensor in HA automations to trigger different actions based on who's speaking
Use Cases
- Manual: You don't want a voice on TV accidentally triggering your device
- Manual: Two people in the household, and you want Ava to know who's talking
- Automatic: You want zero-setup recognition with results flowing into HA for automations — e.g., Dad's voice turns on the living room lights, Mom's voice turns on the kitchen lights
- Automatic: You care about privacy and want all processing to stay local
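The Dad/Mom example above comes down to a lookup from the speaker sensor's state to an action. In Home Assistant this would normally be written as an automation. The Python below only illustrates the routing logic, and the sensor states and entity IDs are hypothetical.

```python
# Hypothetical sensor states and entity IDs; adjust to your own setup.
ACTIONS_BY_SPEAKER = {
    "dad": ("light", "turn_on", "light.living_room"),
    "mom": ("light", "turn_on", "light.kitchen"),
}


def action_for(speaker_state: str):
    """Map the speaker sensor's state to a (domain, service, entity_id) tuple.

    Returns None for unrecognized speakers so that no action is taken.
    """
    return ACTIONS_BY_SPEAKER.get(speaker_state.strip().lower())


if __name__ == "__main__":
    print(action_for("Dad"))
    print(action_for("guest"))
```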
Switching Modes
You can switch between modes at any time. Switching clears the other mode's data. A confirmation dialog explains what will happen before anything is deleted — nothing is wiped silently.
4. Audio Event Detection
Ava can now "hear" sounds in its environment using an on-device YAMNet-based classifier running on a dedicated processing thread — completely independent from wake word detection and voiceprint verification.
Detection Performance
| Sound Type | Accuracy | False Positive Rate |
|---|---|---|
| Alarm | 98.7% | 0.8% |
| Doorbell | 98.1% | 1.1% |
| Baby crying | 97.9% | 1.4% |
| Cough | 97.5% | 2.0% |
| Speech | 98.9% | 0.6% |
| Overall | 98.3% | 1.7% |
What It Can Detect
- Alarm
- Doorbell
- Baby crying
- Cough
- Speech
Enable only the sound types you care about; at least one must remain enabled.
Three Sensitivity Levels
| Sensitivity | Best For |
|---|---|
| Conservative (Fewer false alerts) | Quiet homes, minimize false triggers |
| Balanced | Recommended default for most users |
| Sensitive (Catch more events) | When you can't afford to miss anything — may produce more false alerts |
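One way to think about the sound type checkboxes and the three sensitivity levels is as a filter plus a score threshold. The sketch below models that idea. The threshold numbers and identifier names are assumptions for illustration; the notes above do not publish the actual values.

```python
SOUND_TYPES = ("alarm", "doorbell", "baby_crying", "cough", "speech")

# Assumed score thresholds per sensitivity level (illustrative values only).
THRESHOLDS = {"conservative": 0.8, "balanced": 0.6, "sensitive": 0.4}


def set_enabled(enabled: set, sound: str, on: bool) -> set:
    """Return an updated enabled-set, refusing to disable the last type."""
    if sound not in SOUND_TYPES:
        raise ValueError(f"unknown sound type: {sound}")
    updated = (enabled | {sound}) if on else (enabled - {sound})
    if not updated:
        raise ValueError("at least one sound type must remain enabled")
    return updated


def detected(scores: dict, enabled: set, sensitivity: str = "balanced") -> list:
    """List enabled sound types whose score meets the level's threshold."""
    limit = THRESHOLDS[sensitivity]
    return sorted(s for s in enabled if scores.get(s, 0.0) >= limit)


if __name__ == "__main__":
    enabled = {"doorbell", "cough"}
    scores = {"doorbell": 0.65, "cough": 0.45, "speech": 0.99}
    for level in THRESHOLDS:
        print(level, detected(scores, enabled, level))
```

Lowering the threshold catches more events at the cost of more false alerts, which is the trade-off the table describes.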
Alert Duration
When a sound is detected, the sensor stays in the "detected" state for a duration you configure, then returns to standby.
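The hold behavior can be sketched as a small timer. This example assumes that a new detection during the hold window restarts the window. The notes above do not say whether Ava does this, and the class name is invented for the example.

```python
class HoldSensor:
    """Stays 'detected' for hold_seconds after the most recent detection."""

    def __init__(self, hold_seconds: float):
        self.hold_seconds = hold_seconds
        self._last_detection = None  # timestamp of the latest detection

    def on_detection(self, now: float) -> None:
        # Assumption: each new detection restarts the hold window.
        self._last_detection = now

    def state(self, now: float) -> str:
        if self._last_detection is None:
            return "standby"
        elapsed = now - self._last_detection
        return "detected" if elapsed < self.hold_seconds else "standby"


if __name__ == "__main__":
    sensor = HoldSensor(hold_seconds=30)
    sensor.on_detection(now=100.0)
    print(sensor.state(now=110.0))
    print(sensor.state(now=131.0))
```

Timestamps are passed in rather than read from a clock, which keeps the sketch easy to test.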
Use Cases
- Nursery device: enable baby cry detection, trigger a phone notification via HA when crying is detected
- Entryway device: enable doorbell detection, trigger a camera recording when the doorbell rings
- Elderly care device: enable cough detection, alert family members if frequent coughing is detected at night
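The "frequent coughing" case needs a rule for what counts as frequent. A simple option is to count detections inside a sliding time window, as sketched below. The class name and limits are hypothetical, and in practice this logic would more likely live in a Home Assistant automation than in Ava itself.

```python
from collections import deque


class CoughMonitor:
    """Flags when max_events detections fall within window_seconds."""

    def __init__(self, max_events: int, window_seconds: float):
        self.max_events = max_events
        self.window_seconds = window_seconds
        self._times = deque()  # timestamps of recent detections

    def record(self, now: float) -> bool:
        """Store a detection and report whether the limit has been reached."""
        self._times.append(now)
        cutoff = now - self.window_seconds
        # Drop detections that have aged out of the window.
        while self._times[0] < cutoff:
            self._times.popleft()
        return len(self._times) >= self.max_events


if __name__ == "__main__":
    monitor = CoughMonitor(max_events=3, window_seconds=600)
    for t in (0, 100, 200):
        print(t, monitor.record(t))
```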
Note
This feature does not guarantee medical-grade or safety-grade monitoring accuracy. Low-end devices may produce false positives. Treat it as an assistive reference, not a safety system.
5. Fixes & Improvements
Update Checker — ADB Commands
For users who prefer ADB, you can now manually trigger an update check:
adb shell am broadcast -a com.example.ava.ACTION_CHECK_UPDATE com.example.ava
Or launch the update dialog directly:
adb shell am start -a com.example.ava.action.SHOW_UPDATE -n com.example.ava/.MainActivity
Interface Label Cleanup
The "Interaction" settings group has been renamed to "Extensions" with the description updated to "Visuals · Media · Scenes" — more accurately reflecting what's inside. The previous confusing labels have been removed.
Home Screen Button Display Fixes
Fixed several visual issues with the home screen settings button across different screen sizes and dark/light mode transitions:
- Fixed incorrect icon color when the button is in transparent mode
- Optimized button size and offset on small landscape devices to prevent overlap or misalignment
- Improved visual contrast between button background and icon in dark mode
- Fixed button scaling ratio on extra-large (XLARGE) and extra-small (TINY) screen tiers
Video Recording Toggle — Bidirectional Sync Fix
Fixed a state desync issue between the sidebar camera recording switch and the Home Assistant recording entity.
Before the fix:
If you started recording from the sidebar and then turned it off from Home Assistant, the sidebar switch wouldn't update — it would still show "on". The reverse was also true: turning on recording from HA wouldn't reflect in the sidebar.
After the fix:
Recording state is now managed through a unified VideoRecordingStateManager. Whether you toggle recording from the sidebar, the Home Assistant entity switch, or the Gecko engine, all states stay in sync in real time. Change it anywhere, and every other surface reflects the correct state immediately.
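The idea behind a unified state manager is a single source of truth that notifies every surface when the value changes. The sketch below shows that pattern in general terms. It is not the actual VideoRecordingStateManager, and the class and method names are invented for the example.

```python
class RecordingState:
    """Single source of truth for the recording flag, with change listeners."""

    def __init__(self):
        self._recording = False
        self._listeners = []
        self.last_source = None  # which surface made the latest change

    def subscribe(self, listener) -> None:
        self._listeners.append(listener)
        listener(self._recording)  # deliver the current state immediately

    def set_recording(self, value: bool, source: str) -> bool:
        """Apply a change from any surface; return True if the state changed."""
        if value == self._recording:
            return False  # nothing changed, so nothing is broadcast
        self._recording = value
        self.last_source = source
        for listener in self._listeners:
            listener(value)
        return True


if __name__ == "__main__":
    state = RecordingState()
    state.subscribe(lambda on: print("sidebar switch:", on))
    state.subscribe(lambda on: print("HA entity:", on))
    state.set_recording(True, source="sidebar")
    state.set_recording(False, source="home_assistant")
```

Ignoring no-op updates keeps surfaces from echoing the same change back and forth.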
Camera Video Not Showing in Home Assistant (Issue #87)
Thanks to @treypop for reporting this.
In 0.5.4, some devices (such as Facebook Portal 1st gen, Pixel 4a) showed only a black image with a camera icon in Home Assistant when video mode was enabled — no actual video feed.
This update improves the camera binding logic: it adds better compatibility with legacy camera hardware and a retry mechanism. If you experienced this issue, please test again after upgrading.
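The notes do not describe how the retry mechanism works. As a general illustration, a retry loop with growing delays might look like the sketch below. The attempt count, delays, exception type, and function names are all assumptions.

```python
import time


def bind_with_retry(bind, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call bind() until it succeeds or the attempts run out.

    Waits base_delay, then 2x, then 4x (and so on) between tries, and
    re-raises the last error if every attempt fails.
    """
    for attempt in range(attempts):
        try:
            return bind()
        except RuntimeError:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))


if __name__ == "__main__":
    calls = {"count": 0}

    def flaky_bind():
        # Hypothetical camera bind that fails twice before succeeding.
        calls["count"] += 1
        if calls["count"] < 3:
            raise RuntimeError("camera busy")
        return "bound"

    print(bind_with_retry(flaky_bind, base_delay=0.01))
```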
Other Minor Fixes
- Home screen adaptive scaling improvements across more screen sizes
- Settings page descriptions unified across all supported languages