Guidance 0.2.4

Better sampling, better metrics, llama-cpp-python fixes (update to latest please!), uncountable visualization fixes.

Added

Allow changing sampling_params (top_p, top_k, min_p, repetition_penalty) on the fly via Model.with_sampling_params(...)
Add top_k tokens back into vis after temporary removal in previous refactor
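
As a rough illustration of the on-the-fly pattern, the sketch below uses a hypothetical stand-in class (ToyModel, not Guidance's Model) whose with_sampling_params returns a modified copy and leaves the original untouched. Whether Guidance copies or mutates, and the exact argument shape it accepts, are assumptions here; check the library's own documentation before relying on either.

```python
import copy


class ToyModel:
    """Hypothetical stand-in for illustration; not guidance.models.Model."""

    def __init__(self, name):
        self.name = name
        self.sampling_params = {}

    def with_sampling_params(self, params):
        # Return a shallow copy carrying the merged settings.
        # The object the method was called on is left unchanged.
        new = copy.copy(self)
        new.sampling_params = {**self.sampling_params, **params}
        return new


base = ToyModel("demo")
creative = base.with_sampling_params({"top_p": 0.95, "top_k": 50})
strict = creative.with_sampling_params({"top_k": 1, "repetition_penalty": 1.1})
```

Returning a copy keeps earlier handles usable with their old settings, so different sampling settings can be applied to different parts of one program.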

Removed

Model.token_count removed in favor of (currently private) Model._get_usage().output_tokens
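
Code that has to run against both older and newer releases could bridge the removal with a small helper like the one below. This is only a sketch: output_token_count is a hypothetical name, the stand-in objects exist purely for demonstration, and because _get_usage() is private its shape may change without notice.

```python
from types import SimpleNamespace


def output_token_count(model):
    """Return the generated-token count, preferring the newer accessor."""
    get_usage = getattr(model, "_get_usage", None)
    if callable(get_usage):
        # Newer releases: read the count from the usage object.
        return get_usage().output_tokens
    # Older releases: fall back to the removed attribute.
    return model.token_count


# Stand-in objects for demonstration only (not real Guidance models).
new_style = SimpleNamespace(_get_usage=lambda: SimpleNamespace(output_tokens=12))
old_style = SimpleNamespace(token_count=7)
```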

Changed

Bookkeeping of metrics such as input_tokens, output_tokens, ff_tokens, token_savings, and avg_latency_ms has been added to State and is now accessible via the (private for now) Model._get_usage(). This replaces the bookkeeping that was previously attached to Engine instances.
Factory functions create_azure_openai_model() and create_azure_aifoundry_model() for accessing models hosted in AzureAI
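
For logging, the usage metrics named above could be collected into a plain dict. The helper below is a minimal sketch: usage_snapshot and USAGE_FIELDS are hypothetical names, the field list simply mirrors the metrics named in these notes, and any field missing from the usage object is reported as None rather than raising.

```python
from types import SimpleNamespace

# Field names taken from the release notes; the real usage object may expose more.
USAGE_FIELDS = (
    "input_tokens",
    "output_tokens",
    "ff_tokens",
    "token_savings",
    "avg_latency_ms",
)


def usage_snapshot(model):
    """Copy the known usage metrics into a dict, tolerating missing fields."""
    usage = model._get_usage()
    return {name: getattr(usage, name, None) for name in USAGE_FIELDS}


# Stand-in object for demonstration only (not a real Guidance model).
fake_model = SimpleNamespace(
    _get_usage=lambda: SimpleNamespace(input_tokens=10, output_tokens=4, ff_tokens=2)
)
snapshot = usage_snapshot(fake_model)
```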

Fixed

Intermittent double widget render fixed.
Widget doesn't always complete running, fixed.
Widget backtracking bug fixed
Widget now always shows both inputs and outputs; previously this sometimes failed.
TraceHandler forests stripped of extra trace nodes, which sometimes caused render glitches.
Widget latency displays now render.
Widget early race condition resolved (sometimes widget is ready after backend is firing messages)
Various linting and build improvements
Tokens generated with OpenAI now correctly tagged as generated for vis
Fix compatibility with llama-cpp-python 0.3.12, bump dependency from 0.3.9 to 0.3.12 (first contrib: @jovemexausto)
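
To confirm that an environment picked up the bumped dependency, a check along these lines could be used. It is a sketch using only the standard library: the helper names are hypothetical, treating 0.3.12 as the minimum is an assumption drawn from the bump described above, and the version parsing is deliberately simple (leading numeric components only).

```python
from importlib import metadata

# Assumed minimum, based on the dependency bump described in these notes.
MIN_LLAMA_CPP = (0, 3, 12)


def parse_version(text):
    """Return the leading numeric components, e.g. "0.3.12" -> (0, 3, 12)."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component, e.g. "post1"
        parts.append(int(digits))
    return tuple(parts)


def llama_cpp_is_current(installed=None):
    """True if the given (or installed) llama-cpp-python version meets the minimum."""
    if installed is None:
        try:
            installed = metadata.version("llama-cpp-python")
        except metadata.PackageNotFoundError:
            return False
    return parse_version(installed) >= MIN_LLAMA_CPP
```

Calling llama_cpp_is_current() with no argument inspects the installed package; passing a version string makes the comparison easy to test in isolation.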