🚀 OpenRouter only for BYO key
- When using BYO key mode (either cloud or self-hosted), you can now use Plandex with only an OpenRouter.ai account and `OPENROUTER_API_KEY` set. A separate OpenAI account is no longer required.
- You can still use a separate OpenAI account if desired by setting the `OPENAI_API_KEY` environment variable in addition to `OPENROUTER_API_KEY` (see the example below). OpenAI models are then called directly via the OpenAI API, which is slightly faster and cheaper.
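A minimal sketch of the shell setup, with placeholder key values:

```bash
# Required for BYO key mode: your OpenRouter.ai API key
export OPENROUTER_API_KEY=...

# Optional: also set an OpenAI key so OpenAI models are called directly
# via the OpenAI API (slightly faster and cheaper)
export OPENAI_API_KEY=...

# Start the Plandex REPL
plandex
```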
🧠 New Models
Gemini
- Google's Gemini 2.5 Pro Preview is now available as a built-in model, and is the new default model when context is between 200k and 1M tokens.
- A new `gemini-preview` model pack has been added, which uses Gemini 2.5 Pro Preview for planning and coding, and default models for other roles. You can use this pack by running the REPL with the `--gemini-preview` flag (`plandex --gemini-preview`), or with `\set-model gemini-preview` from inside the REPL. Because this model is still in preview, a fallback to Gemini 1.5 Pro is used on failure.
- Google's Gemini Flash 2.5 Preview is also now available as a built-in model. While it's not currently used by default in any built-in model packs, you can use it with `\set-model` or a custom model pack.
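For quick reference, both ways of enabling the new pack described above:

```bash
# From your shell: launch the REPL with the gemini-preview model pack
plandex --gemini-preview

# Or, from inside a running REPL:
\set-model gemini-preview
```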
OpenAI
- OpenAI's o4-mini is now available as a built-in model with `high`, `medium`, and `low` reasoning effort levels. o3-mini has been replaced by the corresponding o4-mini models across all model packs, with a fallback to o3-mini on failure. This improves Plandex's file edit reliability and performance with no increase in cost. o4-mini-medium is also the new default planning model for the `cheap` model pack.
- OpenAI's o3 is now available as a built-in model with `high`, `medium`, and `low` reasoning effort levels. Note that if you're using Plandex in BYO key mode, OpenAI requires an organization verification step before you can use o3.
- o3-high is the new default planning model for the `strong` model pack, replacing o1. Due to the verification requirement for o3, the `strong` pack falls back to o4-mini-high for planning if o3 is not available.
- OpenAI's gpt-4.1, gpt-4.1-mini, and gpt-4.1-nano have been added as built-in models, replacing gpt-4o and gpt-4o-mini in all model packs that previously used them.
- gpt-4.1 is now used as a large-context fallback for the default `coder` role, effectively increasing the context limit for the implementation phase from 200k to 1M tokens.
- gpt-4.1 is also the new `coder` model in the `cheap` model pack, and the new main planning and coding model in the `openai` model pack (see the pack-switching example below).
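Switching between these packs should mirror the `\set-model gemini-preview` usage shown earlier; the pack names are the ones listed in this release:

```bash
# From inside the REPL, pick whichever pack you want:
\set-model strong   # o3-high planning, falls back to o4-mini-high if o3 is unavailable
\set-model cheap    # o4-mini-medium planning, gpt-4.1 coder
\set-model openai   # gpt-4.1 as the main planning and coding model
```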
🛟 Model Fallbacks
- To better incorporate newly released models and preview models that may have initial reliability or capacity issues, a more robust fallback and retry system has been implemented. This allows new models to be introduced faster in the future while still maintaining a high level of reliability.
- Fallbacks for 'context length exceeded' errors have also been improved: these errors now trigger an automatic fallback to a model with a larger context limit, if one is defined in the model pack. This fixes issues like #232, where the stream errored with a 400 or 413 when context was exceeded instead of falling back correctly.
💰 Gemini Caching
- Gemini models now support prompt caching, significantly reducing costs and latency during planning, implementation, and builds.
🤫 Quieter Reasoning
- When using the Claude 3.7 Sonnet thinking model in the `reasoning` and `strong` model packs, reasoning output is no longer included by default. This clears up issues caused by specially formatted output that Plandex acts on being duplicated between the reasoning and the main output. It also feels a bit more relaxed to keep the reasoning behind the scenes, even though there can be a longer wait for the initial output.
💻 REPL Improvements
- Better handling of incorrect or mistyped commands in the REPL. Previously, suggestions were only offered for mistyped backslash commands; now any input that looks like a command, with or without the backslash, triggers command suggestions rather than being sent straight to the AI model, which could waste tokens over a minor typo or a missing backslash.
☁️ Plandex Cloud
- If you started a free trial of Plandex Cloud in BYO key mode, you can now switch to a trial of Integrated Models mode from your billing dashboard (use `\billing` from the REPL to open it).
- When trialing Integrated Models mode, you're now warned when your trial credits balance drops below $1.00.
- In Integrated Models mode, the number of credits required to send a prompt is now much lower, so you can use more of your credits before getting an 'Insufficient credits' message.
🐞 Bug Fixes
- Fix for 'Plan replacement failed' error during file edits on Windows that was caused by mismatched line endings.
- Fix for 'tool calls not supported' error for custom models that use the XML output format (#238).
- Fix for errors in some roles with Anthropic models when only a single system message was sent (#208).
- Fix for potential back-pressure issue with large/concurrent project map operations.
- Plandex Cloud: fix for JSON parsing error on the payment form when the card is declined. It will now show the proper error message.