Changes
- Major UI optimization: use the morphdom library to make incremental updates to the Chat tab during streaming (#6653). With this:
  - CPU usage is drastically reduced for long contexts or high tokens/second.
  - The UI no longer becomes sluggish in those scenarios.
  - You can select and copy text or code from previous messages during streaming, as those elements remain static under the "morphing" operations performed by morphdom. Only what has changed gets updated.
- Add a button to copy the raw message content below each chat message.
- Add a button to regenerate the reply below the last chat message.
- Activate "auto_max_new_tokens" by default, to avoid having to "continue" the chat reply every 512 tokens.
- Installer:
  - Update Miniconda to 24.11.1 (latest version). Note: Miniconda is only used during the initial setup.
  - Make the checksum verification for the Miniconda installer more robust on Windows, to account for systems where it was previously failing to execute at all.
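
For readers unfamiliar with installer checksum verification, here is a minimal Python sketch of the general idea. It is not the project's installer code, and these notes do not say how the installer performs the check. The SHA-256 algorithm and the helper names `sha256_of_file` and `verify_installer` are assumptions made for illustration.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large installers are never fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_installer(path: Path, expected_hex: str) -> bool:
    """Hypothetical helper: compare the file digest with a published one.

    Surrounding whitespace and letter case in the expected value are ignored,
    since different tools print digests in different styles.
    """
    actual = sha256_of_file(path)
    return hmac.compare_digest(actual, expected_hex.strip().lower())
```

Hashing in Python rather than calling an external tool is one way to avoid depending on a system utility that may be missing or may fail to run on some machines.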
Bug fixes
- Unescape backslashes in html_output (#6648). Thanks @mamei16.
- Fix the gallery extension (#6656). Thanks @TheLounger.
- HTML: Fix quote pair RegEx matching for all quote types (#6661). Thanks @Th-Underscore.
Backend updates
- Transformers: bump to 4.48.
- flash-attention: bump to 2.7.3.