Adds new Anthropic Claude 3 models.
- Backend now uses the
messagesAPI for Claude 2.1+ models. - Adds the
systemmessage parameter in Claude settings.
Adds browser-sandboxed Python with pyodide
You can now run Python in a safe sandbox entirely in the browser, provided you do not need to import third-party libraries.
The web-hosted version at chainforge.ai/play now has Python evaluators unlocked:
The local version of ChainForge includes a toggle to turn sandboxing on or off:
If you turn sandboxing off, you go back to the previous Python evaluator, executed on your local machine through the Flask backend. In the non-sandboxed eval node you can import any libraries available in your Python environment.
Why sandboxing?
The benefit of sandboxing is that ChainForge can now be used to execute Python code generated by LLMs, using eval() or exec() in your evaluation function. This was possible before but dangerous and unsafe. Benchmarks that do not rely on third-party libraries, like HumanEvals at pass@1 rate, could be run within ChainForge entirely in the web browser (if anyone wants to set this up, let me know!).