oobabooga/text-generation-webui v1.12


Backend updates

  • Transformers: bump to 4.43 (adds Llama 3.1 support).
  • ExLlamaV2: bump to 0.1.8 (adds Llama 3.1 support).
  • AutoAWQ: bump to 0.2.6 (adds Llama 3.1 support).

UI updates

  • Color text between quote characters in chat and chat-instruct modes.
  • Prevent LaTeX from being rendered for inline "$", which broke phrases like "apples cost $1, oranges cost $2".
  • Make the markdown cache unbounded and clear it when switching to another chat. The cache exists because markdown conversion is CPU-intensive; with an unbounded cache, every message in a full 128k context stays cached, keeping the UI responsive in long conversations.
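
One way to implement the inline "$" fix is to treat "$...$" as math only when the delimiters do not look like currency. The regex below is a hypothetical heuristic for illustration, not the project's actual rule: it skips an opening "$" that is immediately followed by a digit (currency like "$1") or whitespace.

```python
import re

# Hypothetical heuristic: recognize $...$ as inline math only when the
# opening "$" is not followed by a digit or whitespace and the closing
# "$" is not preceded by whitespace.
INLINE_MATH = re.compile(r"\$(?!\d)(?!\s)([^$]+?)(?<!\s)\$")

def render_inline_math(text: str) -> str:
    # Wrap recognized math spans; leave currency amounts untouched.
    return INLINE_MATH.sub(r"<math>\1</math>", text)

print(render_inline_math("apples cost $1, oranges cost $2"))  # unchanged
print(render_inline_math("the identity $e^{i\\pi}+1=0$ holds"))
```

A digit-based check like this keeps "apples cost $1, oranges cost $2" intact while still rendering genuine math spans such as "$x+y$".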
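
The unbounded-cache-cleared-on-switch pattern can be sketched with `functools.lru_cache`. The function names and the toy conversion here are illustrative assumptions, not the webui's actual code:

```python
import functools

@functools.lru_cache(maxsize=None)  # unbounded: every converted message stays cached
def convert_to_markdown(message: str) -> str:
    # Stand-in for the CPU-intensive markdown -> HTML conversion.
    return message.replace("**bold**", "<b>bold</b>")

def switch_chat():
    # Clearing on chat switch keeps the cache from growing across conversations
    # while still caching every message of the active one.
    convert_to_markdown.cache_clear()

convert_to_markdown("hello **bold** world")
convert_to_markdown("hello **bold** world")      # second call is a cache hit
print(convert_to_markdown.cache_info().hits)     # 1
switch_chat()
print(convert_to_markdown.cache_info().currsize) # 0
```

Since each chat message is converted at most once per session, re-rendering a long conversation becomes a series of dictionary lookups instead of repeated parsing.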

Bug fixes

  • Fix a race condition that caused the default character to not be loaded correctly on startup.
  • Fix Linux shebangs (#6110). Thanks @LuNeder.

Other changes

  • Make the Google Colab notebook use the one-click installer instead of its own Python environment for better stability.
  • Disable flash-attention on Google Colab by default, as the GPUs it provides do not support it.
