Guidance 0.2.2

Better support for OpenAI, and we fixed some bugs too (or at least we tried to)!

The 0.2.1 changes are included here as well, since we didn't record them previously.

Added

Token probabilities and backtracking support in the Jupyter widget.
Support for offline notebook rendering.
Higher-level AST for representing various input/output types for models (including JSON outputs, image inputs, special tokens, selects, joins, strings, etc.)
Interpreter abstraction layer between Model and Engine classes, allowing for flexible dispatching of various AST node types, e.g. using the OpenAI structured output API for JSON generation.
More complete support for OpenAI models, including image inputs, audio inputs and outputs, and JSON outputs via their structured output API.
Experimental support for vLLM and LiteLLM models.
Expanded support for more tokenizers, using llguidance's native implementations where possible.
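
The Interpreter layer described above dispatches each AST node type to backend-specific handling. A minimal sketch of that dispatch pattern follows. Every name in it (LiteralNode, JsonNode, Interpreter, GrammarInterpreter, RemoteInterpreter, the visit_ methods) is hypothetical and chosen for illustration; none of it is taken from the guidance codebase.

```python
from dataclasses import dataclass


# Hypothetical AST node types, named for illustration only.
@dataclass
class LiteralNode:
    text: str


@dataclass
class JsonNode:
    schema: dict


class Interpreter:
    """Dispatches each node to a handler named after the node's class."""

    def run(self, node) -> str:
        handler = getattr(self, f"visit_{type(node).__name__}", None)
        if handler is None:
            raise NotImplementedError(
                f"{type(self).__name__} does not support {type(node).__name__}"
            )
        return handler(node)

    def visit_LiteralNode(self, node: LiteralNode) -> str:
        # Literal text needs no model call on any backend.
        return node.text


class GrammarInterpreter(Interpreter):
    """Stand-in for a local backend that constrains decoding itself."""

    def visit_JsonNode(self, node: JsonNode) -> str:
        return f"constrained decoding over schema keys {sorted(node.schema)}"


class RemoteInterpreter(Interpreter):
    """Stand-in for a remote backend with a server-side structured output API."""

    def visit_JsonNode(self, node: JsonNode) -> str:
        return f"structured output request with schema keys {sorted(node.schema)}"
```

The same JsonNode yields different behavior depending on which interpreter runs it, and an interpreter with no handler for a node type fails loudly instead of silently ignoring the node.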

Removed

The bench module has been removed. JSON schema is now benchmarked here.
Wall time metric has been temporarily removed from the notebook widget.
A number of remote endpoints are no longer supported, including Cohere, Gemini, GoogleAI, VertexAI, and Anthropic (note: previous support was extremely limited). We intend to provide support for these models under the new Interpreter framework in a future release.

Changed

The notebook widget is now visually more compact: controls and metrics take up less space to prioritize the output area.
An exception is now raised when using a with role block inside a stateless function (previously, these failed silently).
The AzureOpenAI and AzureInference model constructors have been partially refactored; they will be stabilized in a future release.

Fixed

The dependency guidance-stitch is now pinned. If you run into notebook visualization issues, please run pip install -U guidance-stitch independently.
The majority of key errors in notebook widgets are fixed.
Fixed a crash in metric generation that occurred on some operating systems.
Occasionally, a token would be duplicated onscreen in a notebook; the notebook view should now match str(lm).
Fixed the "HybridCache not subscriptable" bug when using non-legacy transformers Cache objects.
Fixed a bug that skewed token probabilities when masking was applied.
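
For readers unfamiliar with why masking matters for reported token probabilities, here is a generic illustration. It is not a reconstruction of the bug above, whose details are not given in these notes. It only shows that the same logits produce different probabilities depending on whether disallowed tokens are masked out before normalization. The function names and the toy logits are made up for the example.

```python
import math


def softmax(logits):
    """Convert a token -> logit mapping into probabilities."""
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(value - peak) for tok, value in logits.items()}
    total = sum(exps.values())
    return {tok: value / total for tok, value in exps.items()}


def masked_softmax(logits, allowed):
    """Softmax over the allowed tokens only; masked tokens get probability 0."""
    kept = {tok: value for tok, value in logits.items() if tok in allowed}
    probs = softmax(kept)
    return {tok: probs.get(tok, 0.0) for tok in logits}


# Toy logits (made up): the grammar forbids "yes" at this position.
logits = {"yes": 2.0, "no": 1.0, "maybe": 0.0}
unmasked = softmax(logits)
masked = masked_softmax(logits, allowed={"no", "maybe"})

for tok in logits:
    print(f"{tok:>5}  unmasked={unmasked[tok]:.3f}  masked={masked[tok]:.3f}")
```

Mixing the two distributions, for example sampling from the masked one while displaying values from the unmasked one, is one way a display of token probabilities can end up misleading.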