SWE-agent is SOTA on offensive cybersecurity

SWE-agent EnIGMA (Enhanced Interactive Generative Model Agent) achieves state-of-the-art results on offensive cybersecurity challenges, with a 3.3x improvement over previous agents on the NYU CTF challenge dataset. The EnIGMA project introduces several new features that are available to all SWE-agent use cases, such as Interactive Agent Tools and a Summarizer to handle long outputs.

Major additions

Capability to run over CTF challenges
Interactive Agent Tools, including gdb
Summarizers to handle long outputs
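
To illustrate the idea behind summarizing long outputs, here is a minimal sketch. It is not the SWE-agent implementation: the function name `summarize_output`, its parameters, and the line thresholds are all hypothetical. It assumes one simple strategy, which is to pass short outputs through unchanged and to replace long ones with a short preview plus a pointer to a file holding the full text.

```python
import tempfile
from pathlib import Path


def summarize_output(output: str, max_lines: int = 20, preview_lines: int = 5) -> str:
    """Return short outputs unchanged; shorten long ones to a preview.

    Hypothetical helper for illustration only, not part of SWE-agent.
    """
    lines = output.splitlines()
    if len(lines) <= max_lines:
        return output

    # Save the full output to a file so nothing is lost and it can be inspected later.
    with tempfile.NamedTemporaryFile(
        "w", suffix=".log", delete=False, encoding="utf-8"
    ) as fh:
        fh.write(output)
        saved_path = Path(fh.name)

    # Show only the first few lines, followed by a note on where the rest is.
    preview = "\n".join(lines[:preview_lines])
    return (
        f"{preview}\n"
        f"[output truncated: {len(lines)} lines total, full output saved to {saved_path}]"
    )
```

A design like this keeps the agent's context window small while still letting it open the saved file when the details matter.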

Smaller additions

Add filemap command in the spirit of repomap by @samuela in #619
Create config to run human eval style challenges by @ofirpress in #658
Add claude 3.5 sonnet to models by @carlosejimenez in #601
Enh: Warn if scrolling >= 3 times by @klieret in #626
feat: support deepseek-coder LLM by @jcraftsman in #638
Enh: Make timeout for agent commands configurable by @klieret in #674
Add support for new gpt-4o-mini model by @ivan4722 in #693
Groq Models Integration by @MohammedNagdy in #721
Make log level configurable; add TRACE level by @klieret in #612

Fixes

Compatibility with SWE-bench 2.0 by @klieret in #671
ensure variables work in special command docstring by @forresty in #628
Important fix: Catch CostLimitExceeded in retry because of format/block by @klieret in #682
Fix: Handle empty traj in should_skip by @klieret in #616
Fix for end-marker communicate: Exit status always 0/invalid by @klieret in #644
Fix: Insufficient quoting of git commit message by @klieret in #646
Fix nonsensical trajectory formatting for PRs by @klieret in #647
Fix: sweunexpected keyword 'python_version' by @klieret in #692
Fix: Use LONG_TIMEOUT for pre_install commands by @klieret in #695
Fix: UnboundLocalError when catching decoding issue by @klieret in #709
Also create empty patch files for completeness by @klieret in #725
Fix: Raise ContextWindowExceeded instead of exit_cost by @klieret in #727
Fix: Deal with non-utf8 encoded bytes in comm by @klieret in #731
Fix: Handle spaces in repo names by @klieret in #734
Fix: Ensure utils is part of package by @klieret in #742
Fix: Submitting ' ' in human mode crashes container by @klieret in #749
Fix: Block su as command by @klieret in #752
Fix: SWE_AGENT_MODEL_MAX_RETRIES needs casting by @klieret in #757
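
As background for the last fix: environment variables are always read as strings in Python, so a numeric setting has to be cast before it is used as a number. The sketch below shows the general pattern only. The helper name `max_retries_from_env` and the default value of 3 are assumptions for illustration and are not taken from the SWE-agent code base.

```python
import os


def max_retries_from_env(default: int = 3) -> int:
    """Read SWE_AGENT_MODEL_MAX_RETRIES and cast it to int.

    Hypothetical helper; the default of 3 is an assumed placeholder.
    """
    raw = os.environ.get("SWE_AGENT_MODEL_MAX_RETRIES")
    if raw is None:
        return default
    # os.environ values are str, so "5" must become 5 before numeric use.
    return int(raw)
```

Without the cast, the string value would be passed on as-is to code that expects an integer.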

New Contributors

🎉 @talorabr, @udiboy1209, @haoranxi, @NickNameInvalid, @rollingcoconut joined the team to build EnIGMA 🎉

@carlosejimenez made their first contribution in #601
@samefarrar made their first contribution in #606
@hubstrauss made their first contribution in #625
@samuela made their first contribution in #619
@forresty made their first contribution in #628
@jcraftsman made their first contribution in #638
@ivan4722 made their first contribution in #693
@JoshuaPurtell made their first contribution in #703
@MohammedNagdy made their first contribution in #721
@pdemro made their first contribution in #729

princeton-nlp/SWE-agent v0.7.0 SWE-agent EnIGMA (0.7.0) on GitHub
