SWE-agent is SOTA on offensive cybersecurity
SWE-agent EnIGMA (Enhanced Interactive Generative Model Agent) is SOTA on offensive cybersecurity challenges, with a 3.3x improvement over previous agents on the NYU CTF challenge dataset. The EnIGMA project introduces several new features that are available to all SWE-agent use cases, such as Interactive Agent Tools and a Summarizer to handle long outputs.
Major additions
- Capability to run over CTF challenges
- Interactive Agent Tools, including gdb
- Summarizers to handle long outputs
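To illustrate why a summarizer helps with long outputs, here is a minimal sketch of one simple strategy: keep the first and last few lines of a command's output and replace the middle with a notice. This is not SWE-agent's actual summarizer; the function name `summarize_output` and the parameters `max_lines` and `keep` are hypothetical and chosen only for this example.

```python
def summarize_output(output: str, max_lines: int = 20, keep: int = 5) -> str:
    """Shorten long command output by keeping only its head and tail.

    Hypothetical helper for illustration; not part of SWE-agent.
    """
    if keep < 1 or 2 * keep > max_lines:
        raise ValueError("keep must be >= 1 and 2 * keep must be <= max_lines")

    lines = output.splitlines()
    if len(lines) <= max_lines:
        # Short outputs are passed through unchanged.
        return output

    omitted = len(lines) - 2 * keep
    notice = f"[... {omitted} lines omitted ...]"
    # Keep the first and last `keep` lines and mark the gap explicitly,
    # so the reader of the summary knows content was dropped.
    return "\n".join(lines[:keep] + [notice] + lines[-keep:])


if __name__ == "__main__":
    long_output = "\n".join(f"line {i}" for i in range(100))
    print(summarize_output(long_output))
```

A head-and-tail cut is the simplest possible approach; a real summarizer might instead save the full output to a file or ask a model to condense it.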
Smaller additions
- Add filemap command in the spirit of repomap by @samuela in #619
- Create config to run human eval style challenges by @ofirpress in #658
- Add claude 3.5 sonnet to models by @carlosejimenez in #601
- Enh: Warn if scrolling >= 3 times by @klieret in #626
- feat: support deepseek-coder LLM by @jcraftsman in #638
- Enh: Make timeout for agent commands configurable by @klieret in #674
- Add support for new gpt-4o-mini model by @ivan4722 in #693
- Groq Models Integration by @MohammedNagdy in #721
- Make log level configurable; add TRACE level by @klieret in #612
Fixes
- Compatibility with SWE-bench 2.0 by @klieret in #671
- ensure variables work in special command docstring by @forresty in #628
- Important fix: Catch CostLimitExceeded in retry because of format/block by @klieret in #682
- Fix: Handle empty traj in should_skip by @klieret in #616
- Fix for end-marker communicate: Exit status always 0/invalid by @klieret in #644
- Fix: Insufficient quoting of git commit message by @klieret in #646
- Fix nonsensical trajectory formatting for PRs by @klieret in #647
- Fix: sweunexpected keyword 'python_version' by @klieret in #692
- Fix: Use LONG_TIMEOUT for pre_install commands by @klieret in #695
- Fix: UnboundLocalError when catching decoding issue by @klieret in #709
- Also create empty patch files for completeness by @klieret in #725
- Fix: Raise ContextWindowExceeded instead of exit_cost by @klieret in #727
- Fix: Deal with non-utf8 encoded bytes in comm by @klieret in #731
- Fix: Handle spaces in repo names by @klieret in #734
- Fix: Ensure utils is part of package by @klieret in #742
- Fix: Submitting ' ' in human mode crashes container by @klieret in #749
- Fix: Block su as command by @klieret in #752
- Fix: SWE_AGENT_MODEL_MAX_RETRIES needs casting by @klieret in #757
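The last fix reflects a general pitfall: environment variables are always read as strings, so a numeric setting must be cast before it is used in comparisons or loops. The sketch below shows one way to do that. The helper `read_int_env` and the fallback value are hypothetical and are not taken from the SWE-agent code base; only the variable name `SWE_AGENT_MODEL_MAX_RETRIES` comes from the fix above.

```python
import os


def read_int_env(name: str, default: int) -> int:
    """Read an integer setting from the environment.

    Hypothetical helper for illustration; not part of SWE-agent.
    """
    raw = os.environ.get(name)
    if raw is None or raw.strip() == "":
        # Variable unset or empty: fall back to the caller's default.
        return default
    try:
        # os.environ values are strings, so cast explicitly.
        return int(raw)
    except ValueError as exc:
        raise ValueError(f"{name} must be an integer, got {raw!r}") from exc


if __name__ == "__main__":
    # The fallback of 3 is an assumption made for this example only.
    max_retries = read_int_env("SWE_AGENT_MODEL_MAX_RETRIES", default=3)
    print(max_retries)
```

Without the cast, a comparison such as `attempt < max_retries` would raise a `TypeError` as soon as the variable is set, because an `int` cannot be compared with a `str`.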
New Contributors
🎉 @talorabr, @udiboy1209, @haoranxi, @NickNameInvalid, @rollingcoconut joined the team to build EnIGMA 🎉
- @carlosejimenez made their first contribution in #601
- @samefarrar made their first contribution in #606
- @hubstrauss made their first contribution in #625
- @samuela made their first contribution in #619
- @forresty made their first contribution in #628
- @jcraftsman made their first contribution in #638
- @ivan4722 made their first contribution in #693
- @JoshuaPurtell made their first contribution in #703
- @MohammedNagdy made their first contribution in #721
- @pdemro made their first contribution in #729
Full Changelog: v0.6.1...v0.7.0