I’ve been working with Kilo CLI for the last day, so these are first impressions. They come after a few months of working with Codex CLI. I’m using GPT-5.5 medium in both Kilo and Codex.
The Kilo AI CLI has some cool features. I really appreciate the ability to connect with your existing provider. I like the auto-model routing. The interface is pretty cool, with a lot of themes to choose from. There is also a persistent status section in the CLI that shows usage, context available, tokens used, and modified files.
However, I’ve consistently run into issues since I started using Kilo. They’re not with the interface or the functionality of the software; they’re with the underlying tooling. Something about Kilo causes the LLM to do things that I don’t want it to do. Here’s an excerpt from my AGENTS.md file.
When the user pastes review feedback, follow the Review Feedback section below. Treat the feedback as a claim to verify, explain the conclusion, recommend the smallest correct fix, and ask before implementing unless the user explicitly asks for changes.
...
## Review Feedback
When addressing merge request or code review feedback, treat each comment as a claim to verify, not as an instruction to apply automatically.
- First confirm whether the reported problem is real by reading the relevant code paths and, when practical, reproducing or reasoning through the behavior.
- Validate suggested solutions before implementing them; they may be incomplete, too broad, or aimed at the wrong boundary.
- If the issue is valid, choose the smallest correct fix that changes the fewest lines and preserves existing intended behavior.
- If the issue is not valid, explain why with concrete code references or behavior.
- If a minimal fix could hide an inconsistent state, distinguish the legitimate case from the suspicious case and handle each explicitly.
That’s pretty clear about what I expect: no changes unless they are verified with me, or unless I explicitly tell the agent to fix something. Over the last day I’ve given feedback to the agent and it continues to ignore my instructions, opting instead to make a change without verifying it with me. That’s frustrating.
Kilo Modes
Kilo comes with several modes out of the box: Code, Plan, Ask, Debug, and Orchestrator. Now, this frustration could be my fault for using the tool wrong. I started in Code mode. Perhaps Code mode means that Kilo strongly tells the LLM to write code, even when that contradicts my instructions. It could be my fault for not starting in Ask mode and then flipping to Code mode.
But… seriously. I’m not going to sit here switching modes all day when I expect the tool to be smart enough to follow instructions. If the tool is in Code mode and the instructions say “ask before coding”, then the tool should ask. The mode should be inferred from the prompt. If the prompt is a question, assume an answer is wanted. If the prompt is “fix this issue”, assume a combination of Debug and Code is wanted. If the prompt is “implement this feature”, assume Orchestrator and Code is likely wanted.
Maybe that’s just me. But switching between modes feels like busywork.
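To make the idea concrete, here is a rough sketch of the kind of prompt-based inference I have in mind. This is not how Kilo works internally; the `infer_modes` function, the keyword lists, and the lowercase mode names are all hypothetical and only illustrate the three rules above. Anything ambiguous falls back to asking, which matches my “ask before coding” instruction.

```python
import re

# Illustrative keyword lists only; a real tool would need something smarter.
QUESTION_STARTS = ("what", "why", "how", "when", "where", "who", "which",
                   "is", "are", "does", "do", "can", "should", "could")
FIX_WORDS = ("fix", "debug", "resolve")
BUILD_WORDS = ("implement", "build", "create")


def infer_modes(prompt: str) -> list[str]:
    """Guess which mode(s) a prompt calls for, per the rules in this post."""
    text = prompt.strip().lower()
    words = re.findall(r"[a-z']+", text)
    first = words[0] if words else ""

    # A question wants an answer, not a code change.
    if text.endswith("?") or first in QUESTION_STARTS:
        return ["ask"]
    # "Fix this issue" wants investigation plus a change.
    if any(word in FIX_WORDS for word in words):
        return ["debug", "code"]
    # "Implement this feature" wants planning plus a change.
    if any(word in BUILD_WORDS for word in words):
        return ["orchestrator", "code"]
    # When in doubt, ask before touching anything.
    return ["ask"]


if __name__ == "__main__":
    for example in ("Why does this test fail?",
                    "Fix this issue",
                    "Implement this feature"):
        print(example, "->", infer_modes(example))
```

Keyword matching like this is obviously crude, but the point stands: the prompt itself usually carries enough signal that I shouldn’t have to pick the mode by hand.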
Kilo Tooling
I have RTK set up on my machine and integrated into my AI agents. The tool exists as rtk and is callable by the agents. However, the code I was writing was in PHP, and I don’t have PHP installed on my system. I do have it running in a container, though. My AGENTS.md file has explicit instructions telling the agent to look at my local machine notes, and those notes explain that this system uses Fedora, with Podman for containers.
Today, Kilo failed hard on a few things:
– It gave up on rtk. It attempted to use rtk php, and when that failed it assumed rtk did not exist and stopped using it.
– It did not read my local notes. Because it did not read them (despite being told to in the AGENTS.md file), it failed to validate my code using PHP.
– It tried to use Docker multiple times even though the local notes tell it to use Podman.
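For reference, the behavior I expected is not complicated. Below is a minimal sketch, under my own assumptions, of how an agent could prefer Podman and run a PHP syntax check inside a container. The helper names `pick_runtime` and `php_lint_command`, the container name `app`, and the file path are hypothetical; `podman exec` and `php -l` are the only real commands involved.

```python
import shutil
from typing import Callable, Optional, Sequence


def pick_runtime(
    preferred: Sequence[str] = ("podman", "docker"),
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return the first container runtime in `preferred` that is on PATH."""
    for name in preferred:
        if which(name):
            return name
    return None


def php_lint_command(runtime: str, container: str, path: str) -> list[str]:
    """Build a command that syntax-checks a PHP file inside a running container."""
    # `php -l` only lints the file; it does not execute it.
    return [runtime, "exec", container, "php", "-l", path]


if __name__ == "__main__":
    runtime = pick_runtime()
    if runtime is None:
        print("No container runtime found on PATH")
    else:
        # "app" and the path are placeholders for whatever the local notes name.
        print(" ".join(php_lint_command(runtime, "app", "src/Example.php")))
```

The order of `preferred` is the whole point: the local notes say Podman, so Podman gets tried first and Docker is only a fallback.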
Working Verdict
I really like the concept of Kilo. Super cool interface, super cool themes, super cool status window, an auto-model selector. Kilo has a lot of good stuff going for it.
But actually using Kilo has been an incredibly frustrating experience. Despite all the cool and the hype, when the rubber meets the road Kilo is not up to snuff: it is not reliable, and it ignores explicit directions.
My plan is to use Pi.dev. I’ve heard that it will work with the Kilo gateway. I probably won’t come back to Kilo, but who knows, they might update the tool to be more reliable. If they do, I’ll try it again.











