The LLM as Teammate: Rethinking Software Development

In this era, Artificial Intelligence engines now help people build software. I am not talking about embedding AI capabilities into existing applications — that is another topic altogether — but about using AI to write software itself, removing the programming language barrier for the person building it. In the past, if we needed to build software with a particular programming language, we needed to be proficient in it, or we needed to know someone who was. With AI, I could be less proficient in a platform yet still build something meaningful with it. As long as the LLM has proficiency in that programming language, all is well. And with the LLM joining as part of the "programming team," the limitation of working hours almost vanishes. If I still have sufficient LLM token quota, even starting some coding work at 11 PM is possible — no need to wake a programmer for a task. In some respects that is a win, but in others we get less work-life balance, and the livelihood of the junior programmer is more at risk.

The Culprit for Low Quality Software

In the past, we had programmers with various levels of skill. The quality of the software product would be determined during the testing process, which we usually performed at the end of each iteration or development milestone. Even if a programmer's work was not quite on par with others, we would usually find out during the integration testing phase — after unit testing performed by each programmer — and sometimes in the SIT (system integration testing) phase or, worse, the UAT (user acceptance testing) phase. As a software engineer I would tell myself that if there is an issue in the end product, we are doing something wrong in either the testing phase or the design phase — incomplete requirements or inconsistent design — and try not to blame the person doing the programming itself.

In this GPT and Claude era, we have various LLMs at various price points. We also have open-weight models like Qwen, where the only cost is that of the base hardware and electricity — which we would assume is less than what we would pay OpenAI or Anthropic. In several programming tasks, I found that deploying open-weight models has limitations such as a much smaller token context window and intermittent response timeouts — these I would call physical limitations. On the other hand, the more I tried open-weight variations, I found that the process has more dependency on the specification provided by the user. If the user has incomplete domain knowledge or technical knowledge, they will be burned more often by incomplete coding, or the LLM will just go into an endless loop of code, test, recode, retest. Increasingly, I find that with the pricier LLMs we could omit parts of the prompt or instruction and the LLM will figure it out on its own — it has enough inner knowledge to complete the missing parts. Using a less capable LLM carries the risk that the work will not be finished in the required timeframe (say, 30 minutes for a simple task). Give the same prompt to a more capable LLM, and it finishes the same task within 10 minutes. This runs counter to the mindset that software issues are more likely rooted in incomplete requirements or inconsistent design. For this matter, I can only point to the incapable LLM. More cost will often get you more quality and more speed. The culprit for low quality software can simply be using the wrong LLM for the job.

Roles And Specialization

In my line of work, we usually have specialization: one person is more capable in Java backend programming, another is a business analyst who only deals with requirements and analysis. Or one person is stronger in Python while another is stronger in JavaScript. Using LLMs, specialization becomes more about the role we assign to a particular LLM agent. The same LLM can be given different roles, but their underlying capabilities remain the same — they simply focus on one aspect of development versus another. You can also ask the same LLM to handle several roles at once, just by asking. It is simply a matter of which hat you put on the LLM. For software development teams, specialization is often divided by platform (Python vs. .NET) or by the separation of frontend and backend teams. This is where the LLM resembles a chameleon: its color depends entirely on the surface it is asked to work on.

In theory, we could fine-tune an LLM to give it deeper knowledge in one area of software development. But in current conditions, the cost and effort are not worth it — the more specialized the LLM, the smaller the market it can serve, and individual companies may not have the cost, hardware, or time to prepare a fine-tuned model. A more practical current solution is the Skills feature, where a master skill list enables the LLM to load specific instructions for a given domain when needed. This works well for surface-level specialization. For truly deep expertise, however, a skills file alone is unlikely to substitute for a purpose-trained model — that gap remains an open challenge.

Documentation

Software documentation more often than not becomes stale during the development process. In human teams with stale documentation, knowledge is distilled in the heads of individual team members, and effective communication between them is what allows deep, system-specific issues to be resolved. Using LLMs, missing documentation does not become a blocker in the same way — an LLM can read source code and analyze each module's mechanism directly. The real blocker becomes sprawling source code where variable names or method names do not match their actual behavior, forcing the LLM to read every single line to understand what is happening. The context window limit becomes very visible in these cases. A better approach is to have the LLM summarize each module or class into an index file and refer to it for future tasks. For this reason, short and up-to-date documentation plays a more pragmatic role in LLM-assisted software development than it ever did in purely human teams.

Where Does This Leave Us?

The arrival of capable LLMs in the software development workflow is neither a simple upgrade nor a straightforward threat — it is a shift in the texture of the work itself. The language barrier is lower, the hours are more flexible, and a single developer can now wear many hats by delegating to an AI agent. But the trade-offs are real: the choice of LLM matters enormously and has direct cost implications, junior roles face genuine disruption, and the old defenses against poor quality — rigorous testing phases, strong team communication — are being replaced by new ones, namely the quality of your prompts, the clarity of your requirements, and the freshness of your documentation.

If anything, working with LLMs has made the fundamentals of good software engineering more visible, not less. Clear requirements, consistent design, and readable code still determine the quality of the end product. The LLM just makes it harder to hide when those fundamentals are missing.

This article used AI-assisted polish, 88% human and 12% AI

Inventor's Paradox