✨ Highlights
A big release: two new voice models, on-device image upscaling, a Gemini Nano vision upgrade,
a much-improved Markdown/LaTeX renderer, new language models, and a pile of fixes.
🎙️ Voice & Audio
- SenseVoice: fast multilingual speech-to-text (new). A new card in the Voice tab. Transcribes
Chinese, English, Japanese, Korean and Cantonese, runs fully offline, and is roughly 5× faster
than Whisper on CPU. Includes a live "listening" preview while you talk, a multi-message transcript
log (copy/delete/clear), a language picker, punctuation/number formatting, and optional emotion &
audio-event tags.
- Supertonic: fast multilingual text-to-speech (new). A new card in the Voice tab. Lightweight
(~66M params) on-device speech synthesis in English, Korean, Spanish, Portuguese and French, with
multiple built-in voices and adjustable speed. Fully offline: text never leaves your device.
🖼️ Image
- AI Image Upscaling / Super-Resolution (new). A new "Upscale" feature in the image tab. Pick any
photo, enhance and enlarge it 4× on-device, and save the result to your gallery — fully offline.
Choose between XLSR (fastest, tiny), Real-ESRGAN General (balanced) and Real-ESRGAN x4plus
(highest quality). All three models are bundled in the app, so no download is required. Photos are
auto-rotated to the correct orientation before upscaling.
- Gemini Nano Vision: visual overlays (stock build). Pose detection now draws a skeleton overlay
and Face Mesh draws a 468-point mesh directly on the camera preview and still images (previously
text-only). Plus: copy buttons on every vision result, an adjustable live refresh rate
(Fast/Balanced/Slow/Power-saver) with a Freeze/Resume toggle, front/rear camera switching on
all modes, and image upload from your gallery.
🤖 Models
- New language models: TinyLlama 1.1B, Phi-4-mini, TinySwallow 1.5B, VibeThinker 1.5B, and Qwen3 8B.
- Models browser organized by type. The model list is now grouped into Language models /
Speech-to-Text / Text-to-Speech / Image generation / Other instead of one flat list.
- Clearer model guidance. Gemma 4 E2B is labelled "Recommended" and E4B "Best overall for
flagship devices," with cleaned-up, less cluttered model descriptions.
✍️ Markdown & LaTeX rendering
- Major rendering fixes (#42). Headers, bullet/numbered lists and bold text now render correctly
even when mixed with inline math on the same line; bold text that spans a math expression no longer
shows literal ** markers; and wide display equations now scroll instead of being clipped.
🧹 Interface
- Removed promotional banners/links from the MCP and Agent screens for a cleaner first-run experience
(sample-prompt chips kept).
- Saving an upscaled image now shows a "Saving…" indicator and writes the file safely: leaving the
screen mid-save no longer produces a cropped or partial image in your gallery.
🛠️ Fixes & under the hood
- Fixed a crash on some Snapdragon NPU devices when using vision/audio models (#59).
- Leftover model files are cleaned up after app updates (#61).
- Fixed a speech-recognition hang on GrapheneOS (custom-ROM build, #65).
- Fixed a crash when opening the settings dialog with small-context-window models.
- Fixed the notification tap target (deep link).
- Updated to Android SDK 37.
Build variants
- Main (stock Android): includes the Gemini Nano Hub & Vision features (AICore modes need AICore /
Pixel 9+; ML Kit vision runs on any device).
- custom-rom-support (GrapheneOS / LineageOS / CalyxOS): includes SenseVoice, Supertonic and
Upscaling, but no features that depend on Google services.