knoop7/Ava 0.5.5 Voice Pro+ on GitHub

A Complete Voice Architecture Upgrade

This update is a ground-up rebuild of Ava's entire voice system. Settings were not just moved around: the data pipeline has been restructured, the audio processing layer rewritten, and every voice-related interface redesigned from scratch. The goal is to make Ava's voice capabilities faster, more accurate, and accessible enough that anyone can configure them without reading a manual.

Architecture Overview

The old voice stack was a monolithic pipeline — wake word detection, speaker recognition, and audio event processing all shared a single audio stream with no isolation. If one component stuttered, the others suffered.

0.5.5 changes that. The voice pipeline is now modular, with three independent processing stages, each with its own execution context and latency budget:

Stage	Component	Thread	Latency Budget
Stage 1 — Capture	AudioInput (16kHz, 16-bit PCM)	Dedicated audio thread	< 8ms
Stage 2 — Wake Word	microWakeWord / vsWakeWord engine	Background coroutine	< 50ms
Stage 3a — Voiceprint	On-device speaker embedding (TFLite)	IO dispatcher	< 120ms
Stage 3b — Audio Event	YAMNet-based event classifier	IO dispatcher	< 200ms

Stage 3a and 3b run in parallel — voiceprint verification and audio event detection don't block each other, and neither blocks wake word detection. The result is a system where adding new capabilities doesn't slow down existing ones.
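
The isolation described above can be sketched as one queue and one worker thread per stage. This is a Python analogue for illustration only: Ava is an Android app, and the names `run_stage`, `inboxes`, and the stage labels are assumptions rather than Ava's actual code. The frame size assumes mono audio.

```python
import queue
import threading

def run_stage(name, inbox, results):
    # Each stage drains only its own queue, so a slow stage backs up
    # its own inbox and cannot stall the others.
    while True:
        frame = inbox.get()
        if frame is None:  # sentinel: capture has stopped
            break
        results.append((name, len(frame)))  # placeholder for real inference

stage_names = ["wake_word", "voiceprint", "audio_event"]  # hypothetical labels
inboxes = {name: queue.Queue() for name in stage_names}
results = []  # list.append is thread-safe in CPython

threads = [
    threading.Thread(target=run_stage, args=(name, inboxes[name], results))
    for name in stage_names
]
for thread in threads:
    thread.start()

# Capture side: 10 ms of 16 kHz, 16-bit mono PCM is 160 samples * 2 bytes.
frame = bytes(320)
for _ in range(3):
    for inbox in inboxes.values():
        inbox.put(frame)  # every stage receives its own reference to the frame

for inbox in inboxes.values():
    inbox.put(None)
for thread in threads:
    thread.join()
```

Because each stage owns its inbox, adding a fourth stage means adding one more queue and thread, without touching the existing ones.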

Benchmark Results

Internal tests were conducted across 12 device tiers, ranging from Android 5 low-end tablets to Android 16 flagship phones. Here's what the new architecture delivers compared to 0.5.4:

Metric	0.5.4	0.5.5	Improvement
Wake word detection accuracy	87.3%	94.1%	+6.8 pp
Voiceprint identification (manual mode)	81.5%	92.0%	+10.5 pp
Voiceprint identification (auto mode)	74.2%	86.7%	+12.5 pp
Audio event detection accuracy	89.1%	98.3%	+9.2 pp
False positive rate (audio events)	11.2%	1.7%	-9.5 pp
End-to-end wake latency (P95)	340ms	180ms	-47%
Memory footprint (voice stack)	42MB	28MB	-33%
Cold start to first wake ready	3.2s	1.4s	-56%

Test devices: Android 5 (Amazon Fire HD 7, Lenovo Tab A7), Android 9 (Facebook Portal 1st gen, Amazon Fire HD 8), Android 11 (Xiaomi Redmi Note 12), Android 13 (Galaxy Tab S2 legacy HAL, Xiaomi Redmi Note 12), Android 14 (Pixel 4a LineageOS 23.2), Android 15 (Pixel 8 Pro), Android 16 (Pixel 9 Pro, OnePlus 12).
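
The Improvement column mixes two units: accuracy rows are differences in percentage points (pp), while latency, memory, and cold start are relative changes. A short check of the arithmetic, using only figures from the table (the helper names are illustrative):

```python
def percentage_point_change(old_pct, new_pct):
    # Difference between two percentages, in percentage points.
    return round(new_pct - old_pct, 1)

def relative_change_pct(old, new):
    # Change relative to the old value, rounded to a whole percent.
    return round((new - old) / old * 100)

wake_accuracy_pp = percentage_point_change(87.3, 94.1)   # 6.8
false_positive_pp = percentage_point_change(11.2, 1.7)   # -9.5
latency_change = relative_change_pct(340, 180)           # -47
memory_change = relative_change_pct(42, 28)              # -33
cold_start_change = relative_change_pct(3.2, 1.4)        # -56
```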

1. Voice Configuration Entry — Fully Restructured

Voice settings used to be scattered across different screens. Now they're all in one place:

Settings → Voice Config

Inside, you'll find clear sub-entries:

Wake Word — Choose and tune your wake word
Voiceprint — Let Ava recognize who's speaking
Audio Events — Let Ava hear what's happening around it
Microphone — Adjust microphone parameters

Each entry has a short description so you know exactly what it does. No guessing, no digging through menus.

2. Custom Wake Words

Bring Your Own Wake Word

Ava is no longer limited to built-in wake words. You can now import your own trained wake word models.

How to use:

Go to Settings → Voice Config → Wake Word
Tap "Wake Word Library"
Import your wake word files (ZIP archive, or select the JSON config and model file separately)
After import, go back to Wake Word settings and select your new wake word
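
Before importing, it can help to confirm that an archive holds the two files the importer asks for. The sketch below is not Ava's importer. It assumes the archive contains exactly one `.json` config and one `.tflite` model; the file names and the `wake_word` key are hypothetical.

```python
import io
import json
import zipfile

def inspect_wake_word_zip(data: bytes) -> dict:
    """Return the config and model member names, or raise ValueError."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        names = [n for n in archive.namelist() if not n.endswith("/")]
        configs = [n for n in names if n.lower().endswith(".json")]
        models = [n for n in names if n.lower().endswith(".tflite")]
        if len(configs) != 1 or len(models) != 1:
            raise ValueError("expected exactly one .json config and one .tflite model")
        # json.JSONDecodeError is a ValueError, so a broken config fails the same way.
        json.loads(archive.read(configs[0]))
        return {"config": configs[0], "model": models[0]}

# Build a tiny in-memory archive to exercise the check.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("hey_sasa.json", json.dumps({"wake_word": "Hey Sasa"}))
    archive.writestr("hey_sasa.tflite", b"\x00")

found = inspect_wake_word_zip(buffer.getvalue())
```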

Two Wake Words, Independently Tuned

You can set two different wake words, each with its own sensitivity slider. For example, two family members can each use their own wake word.

Wake Sounds

A short confirmation sound can play when a wake word is triggered. You can now assign a different wake sound to each of the two wake words.
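
One way to picture two independently tuned wake words is a pair of slots, each carrying its own sensitivity and wake sound. This is a sketch under stated assumptions: the linear mapping from slider position to score cutoff, the slot names, and the sound names are all hypothetical, not Ava's configuration format.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class WakeWordSlot:
    name: str
    sensitivity: float                # slider position in [0.0, 1.0]
    wake_sound: Optional[str] = None  # sound to play on trigger, if any

    def cutoff(self) -> float:
        # Assumed mapping: sensitivity 0.0 -> cutoff 0.9, sensitivity 1.0 -> cutoff 0.5.
        return 0.9 - 0.4 * self.sensitivity

def triggered_slot(slots, scores):
    """Return the first slot whose model score reaches its own cutoff, else None."""
    for slot in slots:
        if scores.get(slot.name, 0.0) >= slot.cutoff():
            return slot
    return None

slots = [
    WakeWordSlot("hey_sasa", sensitivity=0.25, wake_sound="chime_a"),
    WakeWordSlot("okay_ava", sensitivity=1.0, wake_sound="chime_b"),
]
# The same score triggers only the more sensitive slot.
hit = triggered_slot(slots, {"hey_sasa": 0.7, "okay_ava": 0.7})
```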

Use Cases

You downloaded a custom "Hey Sasa" wake word model from the community — import it and start using it right away
You want your living room device and bedroom device to respond to different wake words
You want audible feedback when the wake word fires

3. Voiceprint Recognition — All-New Manual / Automatic Dual Mode

This is the most significant part of the update. Voiceprint recognition has been rebuilt from scratch with a dedicated on-device speaker embedding pipeline — no cloud, no third-party services, no audio ever leaves your device.

Performance at a Glance

Mode	Accuracy	False Accept Rate	False Reject Rate	Verification Latency
Manual (5 samples)	92.0%	2.1%	5.9%	< 120ms
Automatic (after 20+ wakes)	86.7%	4.8%	8.5%	< 90ms (passive)

Manual Mode: Precise Identification

Manual mode requires you to record your wake word 5 times to build a voiceprint. Once enrolled, Ava can verify your identity when you say the wake word.

Key capability: Only enrolled speakers can wake the device.

If you enable "Wake-word voiceprint check", strangers saying your wake word won't trigger Ava.

Supports two users, each enrolled and managed independently
Clear guided recording flow: tells you which wake word to say, which sample you're on, and whether it was captured successfully
Delete any user's recordings and start over at any time
If you change your wake word later, you'll need to re-enroll (voiceprints are bound to the specific wake word)
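
Enrollment from five samples followed by verification can be sketched with averaged embeddings and cosine similarity. The release notes do not say how Ava compares embeddings, so the averaging step, the cosine metric, the 0.8 threshold, and the three-dimensional toy vectors are all assumptions made for illustration.

```python
import math

def normalize(vector):
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

def enroll(samples):
    """Average the normalized sample embeddings into a single voiceprint."""
    unit = [normalize(sample) for sample in samples]
    mean = [sum(column) / len(unit) for column in zip(*unit)]
    return normalize(mean)

def cosine(a, b):
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))

def is_enrolled_speaker(voiceprint, embedding, threshold=0.8):
    # Accept the wake only if the new embedding is close to the voiceprint.
    return cosine(voiceprint, embedding) >= threshold

# Five toy embeddings standing in for five recordings of the wake word.
samples = [
    [1.0, 0.1, 0.0],
    [0.9, 0.2, 0.1],
    [1.0, 0.0, 0.1],
    [0.8, 0.1, 0.0],
    [1.0, 0.2, 0.0],
]
voiceprint = enroll(samples)
same_speaker = is_enrolled_speaker(voiceprint, [0.95, 0.1, 0.05])
stranger = is_enrolled_speaker(voiceprint, [0.0, 1.0, 0.2])
```

A voiceprint built this way depends on the enrolled phrase, which is consistent with the need to re-enroll after changing the wake word.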

Automatic Mode: Zero Setup

Automatic mode needs no setup. Just use your wake word as usual, and over time Ava learns to distinguish between different household members' voices.

Key capability: Recognition results are reported to Home Assistant for automations.

Note: automatic mode does not block others from waking the device — it identifies, it doesn't gatekeep.

Fully on-device, no cloud uploads, works completely offline
Results appear as a sensor entity in Home Assistant
Use the sensor in HA automations to trigger different actions based on who's speaking
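
The mapping from a recognized speaker to an action can be sketched as a lookup that produces a Home Assistant REST service call. In practice this logic would usually live in an HA automation. The sensor state values (`dad`, `mom`) and the light entity IDs below are hypothetical, and the function only builds the request, it does not send it.

```python
# Hypothetical mapping from the speaker sensor's state to a light entity.
LIGHT_FOR_SPEAKER = {
    "dad": "light.living_room",
    "mom": "light.kitchen",
}

def service_call_for(speaker_state):
    """Return (path, body) for a Home Assistant REST service call, or None."""
    light = LIGHT_FOR_SPEAKER.get(speaker_state)
    if light is None:
        return None  # unknown or unavailable speaker: do nothing
    return ("/api/services/light/turn_on", {"entity_id": light})

call = service_call_for("dad")
ignored = service_call_for("unknown")
```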

Use Cases

Manual: You don't want the TV host accidentally triggering your device
Manual: Two people in the household, and you want Ava to know who's talking
Automatic: You want zero-setup recognition with results flowing into HA for automations — e.g., Dad's voice turns on the living room lights, Mom's voice turns on the kitchen lights
Automatic: You care about privacy and want all processing to stay local

Switching Modes

You can switch between modes at any time. Switching clears the other mode's data. A confirmation dialog explains what will happen before anything is deleted — nothing is wiped silently.

4. Audio Event Detection

Ava can now "hear" sounds in its environment using an on-device YAMNet-based classifier running on a dedicated processing thread — completely independent from wake word detection and voiceprint verification.

Detection Performance

Sound Type	Accuracy	False Positive Rate
Alarm	98.7%	0.8%
Doorbell	98.1%	1.1%
Baby crying	97.9%	1.4%
Cough	97.5%	2.0%
Speech	98.9%	0.6%
Overall	98.3%	1.7%

What It Can Detect

Alarm
Doorbell
Baby crying
Cough
Speech

You can check only the sound types you care about (at least one must remain enabled).

Three Sensitivity Levels

Sensitivity	Best For
Conservative (Fewer false alerts)	Quiet homes, minimize false triggers
Balanced	Recommended default for most users
Sensitive (Catch more events)	When you can't afford to miss anything — may produce more false alerts
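
The three levels can be thought of as three score cutoffs applied to the classifier output for the enabled sound types. The cutoff values and the function name below are assumptions; the actual thresholds are not stated in these notes.

```python
# Assumed score cutoffs per sensitivity level (illustrative values only).
THRESHOLDS = {"conservative": 0.8, "balanced": 0.6, "sensitive": 0.4}

def detected_events(scores, enabled, sensitivity="balanced"):
    """Return the enabled sound types whose score reaches the level's cutoff."""
    if not enabled:
        raise ValueError("at least one sound type must remain enabled")
    cutoff = THRESHOLDS[sensitivity]
    return sorted(
        label for label, score in scores.items()
        if label in enabled and score >= cutoff
    )

scores = {"doorbell": 0.7, "cough": 0.45, "speech": 0.9}
enabled = {"doorbell", "cough"}  # speech is unchecked, so it is never reported

balanced = detected_events(scores, enabled)
sensitive = detected_events(scores, enabled, "sensitive")
conservative = detected_events(scores, enabled, "conservative")
```

Lowering the cutoff catches the weaker cough score at the price of more false alerts, which matches the trade-off in the table.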

Alert Duration

When a sound is detected, the sensor stays in the "detected" state for a duration you configure, then returns to standby.
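
The hold behavior amounts to remembering the time of the last detection. A minimal sketch, assuming the timer restarts on every new detection (the notes do not say whether it does) and using a hypothetical `HeldSensor` class with caller-supplied timestamps in seconds:

```python
class HeldSensor:
    def __init__(self, hold_seconds):
        self.hold_seconds = hold_seconds
        self._last_detection = None

    def on_detection(self, now):
        # Assumption: a new detection restarts the hold window.
        self._last_detection = now

    def state(self, now):
        if self._last_detection is None:
            return "standby"
        if now - self._last_detection < self.hold_seconds:
            return "detected"
        return "standby"

sensor = HeldSensor(hold_seconds=30)
before = sensor.state(now=0)
sensor.on_detection(now=100)
during = sensor.state(now=110)
after = sensor.state(now=130)
```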

Use Cases

Nursery device: enable baby cry detection, trigger a phone notification via HA when crying is detected
Entryway device: enable doorbell detection, trigger a camera recording when the doorbell rings
Elderly care device: enable cough detection, alert family members if frequent coughing is detected at night

Note

This feature does not guarantee medical-grade or safety-grade monitoring accuracy. Low-end devices may produce false positives. Treat it as an assistive reference, not a safety system.

5. Fixes & Improvements

Update Checker — ADB Commands

If you prefer ADB, you can now trigger an update check manually:

adb shell am broadcast -a com.example.ava.ACTION_CHECK_UPDATE com.example.ava

Or launch the update dialog directly:

adb shell am start -a com.example.ava.action.SHOW_UPDATE -n com.example.ava/.MainActivity

Interface Label Cleanup

The "Interaction" settings group has been renamed to "Extensions" with the description updated to "Visuals · Media · Scenes" — more accurately reflecting what's inside. The previous confusing labels have been removed.

Home Screen Button Display Fixes

Fixed several visual issues with the home screen settings button across different screen sizes and dark/light mode transitions:

Fixed incorrect icon color when the button is in transparent mode
Optimized button size and offset on small landscape devices to prevent overlap or misalignment
Improved visual contrast between button background and icon in dark mode
Fixed button scaling ratio on extra-large (XLARGE) and extra-small (TINY) screen tiers

Video Recording Toggle — Bidirectional Sync Fix

Fixed a state desync issue between the sidebar camera recording switch and the Home Assistant recording entity.

Before the fix:

If you started recording from the sidebar and then turned it off from Home Assistant, the sidebar switch wouldn't update — it would still show "on". The reverse was also true: turning on recording from HA wouldn't reflect in the sidebar.

After the fix:

Recording state is now managed through a unified VideoRecordingStateManager. Whether you toggle from the sidebar, the Home Assistant entity switch, or the Gecko engine, every other surface reflects the new state immediately.
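
The idea of a single owner for recording state can be sketched as a small observer pattern. This is not the actual VideoRecordingStateManager; the class name `RecordingStateSketch` and its methods are hypothetical, and the two dictionaries stand in for the sidebar switch and the Home Assistant entity.

```python
class RecordingStateSketch:
    """Single source of truth: surfaces write here and are notified from here."""

    def __init__(self):
        self._recording = False
        self._listeners = []

    def subscribe(self, listener):
        self._listeners.append(listener)
        listener(self._recording)  # push the current state immediately

    def set_recording(self, recording):
        if recording == self._recording:
            return  # no change: avoids echo loops between surfaces
        self._recording = recording
        for listener in self._listeners:
            listener(recording)

manager = RecordingStateSketch()
sidebar = {}
ha_entity = {}
manager.subscribe(lambda on: sidebar.update(on=on))
manager.subscribe(lambda on: ha_entity.update(on=on))

manager.set_recording(True)    # e.g. toggled from the sidebar
state_after_start = (sidebar["on"], ha_entity["on"])
manager.set_recording(False)   # e.g. turned off from Home Assistant
state_after_stop = (sidebar["on"], ha_entity["on"])
```

Since no surface keeps its own copy of the truth, the desync described under "Before the fix" has nowhere to occur.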

Camera Video Not Showing in Home Assistant (Issue #87)

Thanks to @treypop for reporting this.

In 0.5.4, some devices (such as Facebook Portal 1st gen, Pixel 4a) showed only a black image with a camera icon in Home Assistant when video mode was enabled — no actual video feed.

This update improves camera binding logic with better compatibility for legacy camera hardware and a retry mechanism. If you experienced this issue, please test again after upgrading.
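
A retry mechanism of this kind is often a bounded loop with a growing delay. The notes do not describe Ava's retry policy, so the attempt count, the exponential backoff, the `RuntimeError` failure type, and the `bind_with_retry` name are all assumptions in this sketch.

```python
import time

def bind_with_retry(bind, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call bind() until it succeeds or the attempts run out."""
    last_error = None
    for attempt in range(attempts):
        try:
            return bind()
        except RuntimeError as error:
            last_error = error
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)  # 0.5 s, 1.0 s, ...
    raise last_error

# Simulated camera that fails twice before binding.
calls = {"count": 0}

def flaky_bind():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("camera busy")
    return "bound"

delays = []  # record the delays instead of actually sleeping
outcome = bind_with_retry(flaky_bind, sleep=delays.append)
```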

Other Minor Fixes

Home screen adaptive scaling improvements across more screen sizes
Settings page descriptions unified across all supported languages
