Box v1.0.12 (GitHub: jegly/Box)


✨ Highlights

A big release: two new voice models, on-device image upscaling, a Gemini Nano vision upgrade,
a much-improved Markdown/LaTeX renderer, new language models, and a pile of fixes.


🎙️ Voice & Audio

  • SenseVoice — fast multilingual speech-to-text (new). A new card in the Voice tab. Transcribes
    Chinese, English, Japanese, Korean and Cantonese, runs fully offline, and is roughly 5× faster
    than Whisper on CPU. Includes a live "listening" preview while you talk, a multi-message transcript
    log (copy/delete/clear), a language picker, punctuation/number formatting, and optional emotion &
    audio-event tags.
  • Supertonic — fast multilingual text-to-speech (new). A new card in the Voice tab. Lightweight
    (~66M params) on-device speech synthesis in English, Korean, Spanish, Portuguese and French, with
    multiple built-in voices and adjustable speed. Fully offline — text never leaves your device.

🖼️ Image

  • AI Image Upscaling / Super-Resolution (new). A new "Upscale" feature in the image tab. Pick any
    photo, enhance and enlarge it 4× on-device, and save the result to your gallery — fully offline.
    Choose between XLSR (fastest, tiny), Real-ESRGAN General (balanced) and Real-ESRGAN x4plus
    (highest quality). All three models are bundled in the app — no download required. Photos are
    auto-rotated correctly before upscaling.
  • Gemini Nano Vision: visual overlays (stock build). Pose detection now draws a skeleton overlay
    and Face Mesh draws a 468-point mesh directly on the camera preview and still images (previously
    text-only). Plus: copy buttons on every vision result, an adjustable live refresh rate
    (Fast/Balanced/Slow/Power-saver) with a Freeze/Resume toggle, front/rear camera switching on
    all modes, and image upload from your gallery.

🤖 Models

  • New language models: TinyLlama 1.1B, Phi-4-mini, TinySwallow 1.5B, VibeThinker 1.5B, and Qwen3 8B.
  • Models browser organized by type. The model list is now grouped into Language models /
    Speech-to-Text / Text-to-Speech / Image generation / Other instead of one flat list.
  • Clearer model guidance. Gemma 4 E2B is labelled "Recommended" and E4B "Best overall for
    flagship devices," with cleaned-up, less cluttered model descriptions.

✍️ Markdown & LaTeX rendering

  • Major rendering fixes (#42). Headers, bullet/numbered lists and bold text now render correctly
    even when mixed with inline math on the same line; bold that spans a math expression no longer
    shows literal **; and wide display equations scroll instead of being clipped.
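
To illustrate the kind of problem the bold-across-math fix addresses, here is a minimal Python sketch of one common approach: stash inline math spans behind placeholders, apply bold formatting to the remaining text, then restore the math. This is not Box's renderer. The function name `render_line`, the `<b>`/`<math>` output tags, and the `$...$` inline-math delimiter are assumptions made for the example.

```python
import re

# Assumed syntax for this sketch: $...$ for inline math, **...** for bold.
INLINE_MATH = re.compile(r"\$(.+?)\$")
BOLD = re.compile(r"\*\*(.+?)\*\*")
PLACEHOLDER = re.compile(r"\x00(\d+)\x00")


def render_line(line: str) -> str:
    """Render bold and inline math on one line (hypothetical helper)."""
    math_spans = []

    def stash(match):
        # Replace each math span with an opaque placeholder so that
        # characters inside it (such as **) are never read as Markdown.
        math_spans.append(match.group(1))
        return f"\x00{len(math_spans) - 1}\x00"

    protected = INLINE_MATH.sub(stash, line)

    # Bold markers can now pair up even when math sits between them.
    html = BOLD.sub(r"<b>\1</b>", protected)

    def restore(match):
        return f"<math>{math_spans[int(match.group(1))]}</math>"

    return PLACEHOLDER.sub(restore, html)


print(render_line("**energy $E = mc^2$ is conserved**"))
```

Protecting the math first means a `**` inside an expression (for example `$a**b$`) is left untouched, while bold text that wraps an expression still closes correctly.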

🧹 Interface

  • Removed promotional banners/links from the MCP and Agent screens for a cleaner first-run experience
    (sample-prompt chips kept).
  • Saving an upscaled image now shows a "Saving…" indicator and writes safely — leaving the screen
    mid-save no longer produces a cropped/partial image in your gallery.
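
The release notes do not say how the safe write is implemented. One widely used pattern for avoiding partial files is to write to a temporary file and rename it into place only once the write has finished. The Python sketch below shows that pattern under stated assumptions: `save_atomically` and the `.part` suffix are hypothetical names, and the app itself may use a different mechanism.

```python
import os
import tempfile


def save_atomically(data: bytes, dest_path: str) -> None:
    """Write data so that dest_path never holds a partial file (sketch)."""
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    # Create the temporary file in the destination directory so the final
    # rename stays on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push bytes to disk before publishing
        # os.replace swaps the finished file into place in one step.
        os.replace(tmp_path, dest_path)
    except BaseException:
        # If the save is interrupted, discard the incomplete temp file.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

With this pattern an interrupted save leaves either the previous file or no file at the destination, never a truncated one.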

🛠️ Fixes & under the hood

  • Fixed a crash on some Snapdragon NPU devices when using vision/audio models (#59).
  • Leftover model files are cleaned up after app updates (#61).
  • Fixed a speech-recognition hang on GrapheneOS (custom-ROM build, #65).
  • Fixed a crash opening the settings dialog on small-context-window models.
  • Fixed the notification tap target (deep link).
  • Updated to Android SDK 37.

Build variants

  • Main (stock Android): includes the Gemini Nano Hub & Vision features (needs AICore / Pixel 9+ for
    AICore modes; ML-Kit vision runs on any device).
  • custom-rom-support (GrapheneOS / LineageOS / CalyxOS): SenseVoice, Supertonic and Upscaling are all
    included; no Google-services-dependent features.
