Changelog
- Pre-release channel
- Kill other running instances of VOCR.
- Store API key in Keychain (see the Keychain sketch after this list). To remove the API key stored in the old preference file:
  - Quit VOCR.
  - Delete ~/Library/Preferences/com.chikim.VOCR.plist permanently with Command+Option+Delete.
  - Reboot.
- Added permission request for Notification Center
- Fixed the menu not working after closing a window
- Logger recreates its log file if the file is deleted.
- Check for updates at launch
- Increased the request timeout to 10 minutes
- Play sound when VOCR is launched and ready.
- Announce available updates through Notification Center
- Fixed an error when encountering an Ollama model with no families.
- Realtime OCR shortcut toggles the feature.
- Autoupdater
- Implemented logger
- Ask which Ollama model to use if multiple CLIP models are found.
- You can also select an Ollama model by simply clicking Ollama in the model menu.
- Ask for a prompt after taking a screenshot.
- New prompt for explore
- Explore no longer generates images meant for debugging.
- Presents the same menu whether launched by shortcut or by clicking the status bar icon.
- Reports more detailed errors when a request fails.
- Cancels the previous request when making a new one
- Ollama support
- Use the original screenshot resolution instead of the window resolution in points, except in explore mode.
- New workflow: Use Command+Control+Shift+W/V to set the target to a window/VOCursor and perform the OCR scan. After that, features such as realtime OCR, explore, and ask will use that target.
- Reset shortcuts if the set of features changes after an update
- Bug fix: global shortcuts were sometimes not active
- Customize shortcuts
- Token usage is reported at the end of the description
- Support for a system prompt for GPT
- Setting to reuse the last prompt without asking
- Save last screenshot
- Dismiss the menu with Command+Z instead of Esc if realtime or navigation is active.
- You can just press Return to ask GPT without editing the prompt.
- Changed the diff algorithm so realtime OCR is less verbose (see the diff sketch after this list).
- Realtime OCR remains active at its initial location, allowing you to move the VOCursor during the process. To perform realtime OCR in a different location, stop the OCR, move the VOCursor, then restart realtime OCR.
- Realtime OCR of VOCursor: Command+Control+Shift+R
- Object detection can be toggled from the settings.
- OCR Window: Command+Control+Shift+W
- OCR VOCursor: Command+Control+Shift+V
- Ask GPT about VOCursor: Command+Control+Shift+A
- Settings: Command+Control+Shift+S
- Faster screenshot of VOCursor
- Open an image file in VOCR from Finder to ask GPT about it
- The GPT response is copied to the clipboard, so you can paste it somewhere if you miss it.
- Object detection through rectangles: finds boxes without text, such as icons (see the rectangle-detection sketch after this list).
- Moved save OCR result to the menu.
- Moved target window to settings menu.
- Auto Scan: thanks @vick08
- Readme improvements: thanks @ssawczyn
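
For the Keychain entry above, here is a minimal sketch of saving an API key as a generic-password item with the Security framework. The service and account names are illustrative placeholders, not VOCR's actual values.

```swift
import Foundation
import Security

// Minimal sketch: save an API key as a generic-password Keychain item.
// Service and account names are illustrative placeholders.
func saveAPIKey(_ key: String,
                service: String = "com.chikim.VOCR",
                account: String = "OpenAI") -> Bool {
    guard let data = key.data(using: .utf8) else { return false }
    let query: [String: Any] = [
        kSecClass as String: kSecClassGenericPassword,
        kSecAttrService as String: service,
        kSecAttrAccount as String: account,
    ]
    // Remove any existing item first so the key can be replaced in place.
    SecItemDelete(query as CFDictionary)
    var attributes = query
    attributes[kSecValueData as String] = data
    return SecItemAdd(attributes as CFDictionary, nil) == errSecSuccess
}
```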
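
For the realtime OCR diff entry, this is a minimal sketch of the idea using the standard library's CollectionDifference; it is not necessarily the algorithm VOCR ships.

```swift
// Sketch: return only lines that are new in the current OCR result, so
// realtime OCR announces changes instead of rereading the whole screen.
func addedLines(previous: [String], current: [String]) -> [String] {
    let diff = current.difference(from: previous)
    return diff.insertions.compactMap { change in
        if case let .insert(_, line, _) = change { return line }
        return nil
    }
}
```

For example, `addedLines(previous: ["OK"], current: ["OK", "Cancel"])` returns `["Cancel"]`.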
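
For the rectangle-based object detection entry, here is a minimal sketch using Vision's rectangle detector; the thresholds are arbitrary and VOCR's actual detection pipeline may differ.

```swift
import Vision
import CoreGraphics

// Sketch: find rectangular regions (icons, buttons, other boxes without text)
// in a screenshot with Vision's rectangle detector. Thresholds are arbitrary.
func detectRectangles(in image: CGImage) throws -> [CGRect] {
    let request = VNDetectRectanglesRequest()
    request.maximumObservations = 0   // 0 means no limit
    request.minimumConfidence = 0.5
    request.minimumSize = 0.01        // ignore very small boxes
    let handler = VNImageRequestHandler(cgImage: image, options: [:])
    try handler.perform([request])
    let observations = request.results as? [VNRectangleObservation] ?? []
    return observations.map { $0.boundingBox } // normalized (0-1) coordinates
}
```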
The GPT features use GPT-4V and require your own OpenAI API key.
The usage cost reported by VOCR is an estimate. For official usage and cost, refer to the Usage Dashboard on the OpenAI website, where you can also set a monthly limit and alerts.
The Explore feature only works with GPT, and the location information returned by the model is extremely unreliable and inaccurate.
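
As a rough illustration of where the API key is used, here is a minimal sketch of a vision request against OpenAI's chat completions endpoint. The model name, prompt, and token limit are placeholders; VOCR's actual request code may differ.

```swift
import Foundation

// Sketch: send a screenshot and a prompt to OpenAI's chat completions API.
// The API key goes in the Authorization header; model, prompt, and max_tokens
// are placeholders rather than VOCR's real values.
func describeScreenshot(pngData: Data, apiKey: String) async throws -> Data {
    var request = URLRequest(url: URL(string: "https://api.openai.com/v1/chat/completions")!)
    request.httpMethod = "POST"
    request.timeoutInterval = 600 // matches the 10-minute timeout mentioned above
    request.setValue("Bearer \(apiKey)", forHTTPHeaderField: "Authorization")
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    let body: [String: Any] = [
        "model": "gpt-4-vision-preview",
        "messages": [[
            "role": "user",
            "content": [
                ["type": "text", "text": "Describe this screenshot."],
                ["type": "image_url",
                 "image_url": ["url": "data:image/png;base64,\(pngData.base64EncodedString())"]]
            ]
        ]],
        "max_tokens": 1000
    ]
    request.httpBody = try JSONSerialization.data(withJSONObject: body)
    let (data, _) = try await URLSession.shared.data(for: request)
    return data // raw JSON; the description is in choices[0].message.content
}
```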
Instructions for Ollama
- Download and install Ollama.
- Open Terminal and type "ollama run llava" without the quotes.
- Wait until you get the ">>> Send a message" prompt.
- Then type /bye and press Return.
- Quit Terminal.
- Go to VOCR menu > Settings > Models and select Ollama
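
Once the llava model has been pulled, Ollama serves a local REST API on port 11434 by default. The sketch below shows how a client can send a screenshot to it; the endpoint and JSON fields follow Ollama's public /api/generate API and the prompt is a placeholder, so VOCR's own code may differ.

```swift
import Foundation

// Sketch: ask a locally running Ollama server (default port 11434) to
// describe a screenshot with the llava model. Field names follow Ollama's
// /api/generate API; the prompt is an illustrative placeholder.
func askLlava(pngData: Data, prompt: String = "Describe this screenshot.") async throws -> Data {
    var request = URLRequest(url: URL(string: "http://localhost:11434/api/generate")!)
    request.httpMethod = "POST"
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    let body: [String: Any] = [
        "model": "llava",
        "prompt": prompt,
        "images": [pngData.base64EncodedString()], // base64-encoded screenshot
        "stream": false
    ]
    request.httpBody = try JSONSerialization.data(withJSONObject: body)
    let (data, _) = try await URLSession.shared.data(for: request)
    return data // JSON whose "response" field holds the description
}
```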
Experimental
These features may not make it into the public release.
- Identify object when navigation is active: Command+Control+I
- Explore window with GPT: Command+Control+Shift+E
- An option to use a local model such as LLaVA via llama.cpp instead of GPT.
Warning: Setting up your own llama.cpp server is quite involved.