This past week I programmed significantly more than usual. At work there's a small side project - a console Kotlin application, running on CI, that exports the design system from Figma, and which I maintain solo from time to time. That's where I was programming. Since I don't come back to it very often, I forget what clever node-traversal logic is written there, not to mention that there always have been, and always will be, workarounds for the imperfections of the designers' Figma files.
But actually, that's not what this is about. The point is that every few months breaking changes from the designers pile up, or something new needs to be added. You have to go in and rework something, and it's scary, because breaking things is easy as pie. Especially since lately I've been vibe-coding everything in there; whether that's good or bad is a separate question, by the way, but it's definitely addictive. This is the point where I realize I need tests. Honestly, they're not even as necessary in the main project, where there are always dozens of people involved in one way or another who can catch problems.
So I decided to combine business with pleasure: I armed myself with an agent and set out to cover the project with tests. Not just as an end in itself, but also to get a feel for AI-assisted testing and reflect on it, so I could carry the experience over to the big project later. Here's what I realized. The thoughts are fairly predictable, but when it comes to writing tests I'm completely self-taught, and many things take me a long time to figure out.
As with any other code, the agent writes tests poorly if you don't set boundaries for it. I've said this many times: a "how to write tests" doc is an absolute must-have. Framework, approaches, examples, style, what to pay attention to and what not to.
If you approach the task naively and just ask it to write tests for some existing class or function, you'll get mocks on top of mocks driven by mocks. In some cases the test code turns into a meaningless test of the mocks themselves, not to mention that mock-heavy tests are fragile and unpleasant to maintain. IMO, if there's an option to write a fake implementation, that's almost always cleaner. I'm not denying that mocks are sometimes needed, but they have to be used deliberately, not slapped onto everything the way the agent wants to at first.
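To make the contrast concrete, here's a minimal sketch of what I mean by a fake: a small in-memory implementation with real behaviour that can be reused across many tests. All the names here are hypothetical, not the actual project's API.

```kotlin
import kotlin.test.Test
import kotlin.test.assertEquals

// Everything below is illustrative; the real project's types and names differ.
data class Node(val id: String, val name: String, val children: List<Node> = emptyList())

interface NodeSource {
    fun fetchDocument(fileKey: String): List<Node>
}

// Code under test: walks the document tree and picks out icon nodes.
class IconCollector(private val source: NodeSource) {
    fun collectIcons(fileKey: String): List<Node> {
        fun walk(node: Node): List<Node> {
            val self = if (node.name.startsWith("icon/")) listOf(node) else emptyList()
            return self + node.children.flatMap { walk(it) }
        }
        return source.fetchDocument(fileKey).flatMap { walk(it) }
    }
}

// The fake: an in-memory implementation with real behaviour,
// reusable across tests and indifferent to how exactly it gets called.
class FakeNodeSource(private val documents: Map<String, List<Node>>) : NodeSource {
    override fun fetchDocument(fileKey: String): List<Node> = documents.getValue(fileKey)
}

class IconCollectorTest {
    @Test
    fun collectsIconsFromNestedFrames() {
        val source = FakeNodeSource(
            mapOf(
                "file-1" to listOf(
                    Node("1:1", "icons", children = listOf(Node("1:2", "icon/arrow")))
                )
            )
        )

        val icons = IconCollector(source).collectIcons("file-1")

        assertEquals(listOf("icon/arrow"), icons.map { it.name })
    }
}
```

The mock-library version of the same test would stub fetchDocument per test and, too often, end up asserting that the stub was called - which is exactly the "test of mocks" problem.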
The funniest thing, by the way, is that it's exactly the same with people. If you ask an average developer to cover their code with tests without teaching them good practices, you'll get the same nonsense.
So most of the work consisted of refactoring the source code itself along the way so that good tests could be written for it: inverting dependencies, hiding code that's irrelevant to the tests behind interfaces, all that stuff. Naturally, I didn't do this by hand either - with my feet on the table and my hands behind my head, I dictated a stream of thoughts to the agent through superwhisper. I edited the code from time to time, of course, but overall it works.
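Roughly, that refactoring looks like this before/after sketch, with made-up names; the real classes and interfaces are different.

```kotlin
import java.net.URI
import java.net.http.HttpClient
import java.net.http.HttpRequest
import java.net.http.HttpResponse

// Before: hard-wired HTTP client and filesystem access,
// so any test has to mock the network or touch real files.
class StyleExporterBefore {
    private val http = HttpClient.newHttpClient()

    fun export(fileKey: String, token: String) {
        val request = HttpRequest.newBuilder(
            URI.create("https://api.figma.com/v1/files/$fileKey")
        ).header("X-Figma-Token", token).build()
        val body = http.send(request, HttpResponse.BodyHandlers.ofString()).body()
        java.io.File("styles.json").writeText(transform(body))
    }

    private fun transform(raw: String): String = raw // the interesting logic lives here
}

// After: the I/O sits behind two small interfaces injected through the constructor,
// so the interesting transformation logic can be tested against fakes.
interface FigmaFileSource {
    fun fetchFile(fileKey: String): String
}

interface StyleSink {
    fun write(name: String, content: String)
}

class StyleExporter(
    private val source: FigmaFileSource,
    private val sink: StyleSink,
) {
    fun export(fileKey: String) {
        val raw = source.fetchFile(fileKey)
        sink.write("styles.json", transform(raw))
    }

    private fun transform(raw: String): String = raw // the interesting logic lives here
}
```

With this shape, the HTTP client and the file writer each get a trivial fake, and the traversal and transformation logic - the part that actually breaks when designers change things - is what the tests exercise.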
And after the refactoring, the tests themselves landed on the first try, at a quality where I didn't write a single line by hand the entire time. For several thousand lines of "production" code, there ended up being even more lines of tests. Coverage in the key places skyrocketed, and I had a blast adding a couple of new features with Claude.
Conclusions? You still need to think. Either you, or someone through written documentation, has to hold a vision of excellence that gets transmitted to the neural networks. When that vision exists, this already works great. But training yourself to have one is quite a task in itself.