VOCR v2.0.0-alpha.16

Pre-release · 10 months ago

Changelog

  • Asks which model Ollama should use if multiple CLIP models are found.
  • You can also select a model for Ollama by just clicking Ollama in the model menu.
  • Ask for a prompt after taking a screenshot. Fixes #29
  • New prompt for explore
  • Explore no longer generates images meant for debugging.
  • Presents the same menu whether launched by shortcut or by clicking the status bar.
  • Reports more errors when a request fails.
  • Cancels the previous request when making a new one.
  • Ollama support
  • Uses the original screenshot resolution instead of the window resolution in points, except in explore mode.
  • New workflow: Use Command+Control+Shift+W/V to set the target to a window/VOCursor and perform the OCR scan. After that, features such as real-time OCR, explore, and ask will use that target.
  • Resets shortcuts if the set of features changes after an update.
  • Bug fix: global shortcuts sometimes not active
  • Customize shortcuts
  • Reports token usage at the end of the description.
  • Support system prompt for GPT
  • Setting to toggle reusing the last prompt without asking.
  • Save last screenshot
  • Dismiss the menu with Command+Z instead of Esc if realtime OCR or navigation is active.
  • You can just press Return to ask GPT without editing the prompt.
  • Changed diff algorithm for less verbose realtime OCR.
  • Realtime OCR remains active at its initial location, allowing you to move the VOCursor during the process. To perform realtime OCR in a different location, stop the OCR, move the VOCursor, then restart realtime OCR.
  • Realtime OCR of VOCursor: Command+Control+Shift+r
  • Able to toggle object detection from the settings.
  • OCR Window: Command+Control+Shift+w
  • OCR VOCursor: Command+Control+Shift+v
  • Ask GPT about VOCursor: Command+Control+Shift+a
  • Settings: Command+Control+Shift+S
  • Faster screenshot of VOCursor
  • Open an image file in VOCR from Finder to ask GPT.
  • The GPT response is copied to the clipboard, so you can paste it somewhere if you miss it.
  • Object detection through rectangles: any boxes without text, such as icons.
  • Moved save OCR result to the menu.
  • Moved target window to settings menu.
  • Auto scan: Thanks @vick08
  • Readme Improvement: Thanks @ssawczyn
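The "less verbose" diff for realtime OCR mentioned above comes down to announcing only the lines that changed between successive scans, rather than rereading everything. A rough illustration of the idea with plain shell tools (not VOCR's actual algorithm; the menu-item lines here are made up):

```shell
# Two successive OCR passes of the same region (hypothetical output).
printf 'File\nEdit\nView\n' > prev.txt
printf 'File\nEdit\nView\nHelp\n' > curr.txt

# Report only lines added since the previous pass.
diff prev.txt curr.txt | grep '^>' | sed 's/^> //'
# prints "Help"
```

Only the new line "Help" would be spoken, instead of the whole region again.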

The GPT features use GPT-4V and require your own OpenAI API key.
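For reference, a GPT-4V request of the kind such a feature makes looks roughly like the following call to OpenAI's chat completions endpoint. This is an illustrative sketch, not VOCR's actual request: the prompt is made up and the base64 image data is truncated.

```shell
# Requires OPENAI_API_KEY in the environment; the image is sent inline
# as a base64 data URL (truncated here).
curl https://api.openai.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "gpt-4-vision-preview",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "Describe this screenshot."},
        {"type": "image_url",
         "image_url": {"url": "data:image/png;base64,iVBORw0..."}}
      ]
    }],
    "max_tokens": 300
  }'
```

The JSON response includes a `usage` object with prompt and completion token counts, which is the kind of figure VOCR now reports at the end of a description.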

The Explore feature only works with GPT, and the location information the model returns is extremely unreliable and inaccurate.

Instructions for Ollama

  • Download and install Ollama.
  • Open Terminal and type "ollama run llava" without the quotes.
  • You can also run "ollama run llava:13b" to download a bigger model for better accuracy at a slower speed.
  • Wait until you get the ">>> Send a message" prompt.
  • Type /bye and press Return.
  • Quit Terminal.
  • Go to the VOCR menu > Settings > Models and select Ollama.
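The steps above condense to a short terminal session. Note that `ollama pull` (shown here as an alternative) downloads the model without entering the interactive chat at all:

```shell
# Download the LLaVA weights without entering the interactive chat:
ollama pull llava

# Or, as in the steps above, start an interactive session; the first run
# downloads the weights, and /bye at the ">>>" prompt exits the chat:
ollama run llava

# Larger variant, better accuracy at a slower speed:
ollama pull llava:13b
```

Either way, the background Ollama service stays available afterwards, which is what VOCR talks to once you select Ollama in Settings > Models.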

Experimental

These features may not make it into the public release.

  • Identify object when navigation is active: Command+Control+I
  • Explore window with GPT: Command+Control+Shift+e
  • An option to switch to a local model, such as LLaVA via llama.cpp, instead of GPT.

Warning: Setting up your own llama.cpp server is quite involved.
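As a rough sketch of what that setup involves (the model and projector filenames are placeholders, and llama.cpp's build targets and flags change between versions, so check its current README):

```shell
# Build llama.cpp from source.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make

# Start the bundled HTTP server with a LLaVA-style model. MODEL.gguf and
# MMPROJ.gguf are placeholders for the model weights and the multimodal
# projector file, which you download separately.
./server -m MODEL.gguf --mmproj MMPROJ.gguf --port 8080
```

You would then point the local-model option at the server's address and port.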
