jegly/Box v1.0.12 on GitHub

✨ Highlights

A big release: two new voice models, on-device image upscaling, a Gemini Nano vision upgrade,
a much-improved Markdown/LaTeX renderer, new language models, and a pile of fixes.

🎙️ Voice & Audio

SenseVoice — fast multilingual speech-to-text (new). A new card in the Voice tab. Transcribes
Chinese, English, Japanese, Korean and Cantonese, runs fully offline, and is roughly 5× faster
than Whisper on CPU. Includes a live "listening" preview while you talk, a multi-message transcript
log (copy/delete/clear), a language picker, punctuation/number formatting, and optional emotion &
audio-event tags.
Supertonic — fast multilingual text-to-speech (new). A new card in the Voice tab. Lightweight
(~66M params) on-device speech synthesis in English, Korean, Spanish, Portuguese and French, with
multiple built-in voices and adjustable speed. Fully offline — text never leaves your device.

🖼️ Image

AI Image Upscaling / Super-Resolution (new). A new "Upscale" feature in the image tab. Pick any
photo, enhance and enlarge it 4× on-device, and save the result to your gallery — fully offline.
Choose between XLSR (fastest, tiny), Real-ESRGAN General (balanced) and Real-ESRGAN x4plus
(highest quality). All three models are bundled in the app — no download required. Photos are
auto-rotated correctly before upscaling.
Gemini Nano Vision: visual overlays (stock build). Pose detection now draws a skeleton overlay
and Face Mesh draws a 468-point mesh directly on the camera preview and still images (previously
text-only). Plus: copy buttons on every vision result, an adjustable live refresh rate
(Fast/Balanced/Slow/Power-saver) with a Freeze/Resume toggle, front/rear camera switching on
all modes, and image upload from your gallery.

🤖 Models

New language models: TinyLlama 1.1B, Phi-4-mini, TinySwallow 1.5B, VibeThinker 1.5B, and Qwen3 8B.
Models browser organized by type. The model list is now grouped into Language models /
Speech-to-Text / Text-to-Speech / Image generation / Other instead of one flat list.
Clearer model guidance. Gemma 4 E2B is labelled "Recommended" and E4B "Best overall for
flagship devices," and the model descriptions have been cleaned up and decluttered.

✍️ Markdown & LaTeX rendering

Major rendering fixes (#42). Headers, bullet/numbered lists and bold text now render correctly
even when mixed with inline math on the same line; bold that spans a math expression no longer
shows literal **; and wide display equations scroll instead of being clipped.
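For readers curious why bold and inline math interfere, here is a minimal sketch of one common way to avoid it: hide each math span behind a placeholder, parse bold, then restore the math. This is an illustration only, not Box's renderer. The `$...$` delimiter, the `<b>` output, and the `render_inline` name are all assumptions made for the example.

```python
import re

# Assumed syntax: inline math is wrapped in single dollar signs, e.g. $a*b$.
MATH = re.compile(r"\$[^$\n]+\$")
BOLD = re.compile(r"\*\*(.+?)\*\*")


def render_inline(line: str) -> str:
    """Turn **bold** into <b> tags without letting math content confuse the bold parser."""
    stash: list[str] = []

    def protect(match: re.Match) -> str:
        # Remember the math span and leave an opaque NUL-delimited index behind.
        stash.append(match.group(0))
        return f"\x00{len(stash) - 1}\x00"

    # 1. Swap every math span for a placeholder.
    protected = MATH.sub(protect, line)
    # 2. Parse bold on the placeholder text, so '*' inside math is invisible here.
    bolded = BOLD.sub(r"<b>\1</b>", protected)
    # 3. Put the original math spans back, untouched.
    return re.sub(r"\x00(\d+)\x00", lambda m: stash[int(m.group(1))], bolded)


print(render_inline("**energy $E = m*c**2$ is conserved**"))
```

Without step 1, the bold pattern would close on the `**` inside the math expression and leave a stray literal `**` at the end of the line, which is the kind of symptom described above.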

🧹 Interface

Removed promotional banners/links from the MCP and Agent screens for a cleaner first-run experience
(sample-prompt chips kept).
Saving an upscaled image now shows a "Saving…" indicator and writes safely — leaving the screen
mid-save no longer produces a cropped/partial image in your gallery.
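These notes do not say how the safe save is implemented. As a general illustration of the idea, a partial file can be kept out of the final location by writing to a temporary file first and renaming it only once the write has finished. The sketch below is language-agnostic in spirit and written in Python; `save_atomically` and the `.part` suffix are hypothetical names, not taken from the app.

```python
import os
import tempfile


def save_atomically(data: bytes, dest_path: str) -> None:
    """Write data to dest_path so readers never observe a half-written file."""
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    # Create the temp file in the destination directory so the final rename
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before renaming
        # Replace the destination in one step; an interrupted save leaves
        # only the temp file behind, never a truncated destination.
        os.replace(tmp_path, dest_path)
    except BaseException:
        # Clean up the partial temp file, then re-raise the original error.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

If the save is interrupted before the rename, the destination either keeps its previous contents or does not exist at all.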

🛠️ Fixes & under the hood

Fixed a crash on some Snapdragon NPU devices when using vision/audio models (#59).
Leftover model files are cleaned up after app updates (#61).
Fixed a speech-recognition hang on GrapheneOS (custom-ROM build, #65).
Fixed a crash opening the settings dialog on small-context-window models.
Fixed the notification tap target (deep link).
Updated to Android SDK 37.

Build variants

Main (stock Android): includes the Gemini Nano Hub & Vision features (AICore modes need
AICore / Pixel 9+; ML Kit vision runs on any device).
custom-rom-support (GrapheneOS / LineageOS / CalyxOS): includes SenseVoice, Supertonic and
Upscaling; omits features that depend on Google services.
