When it comes to AI-based tools and agents for software engineering, there is both an unfortunate amount of pessimism and a nauseating amount of hype as to what AI can and cannot do. As an engineer who both uses these tools daily and has been building on GenAI since GPT-2, I wanted to write up an opinionated perspective in defense of using more AI for engineering.
Over the last few months, while many engineers have dramatically ramped up their usage of AI-based developer tools, it’s understandable why many software engineers (SWEs) are still pessimistic. They might dislike the idea because:
Large Language Models (LLMs) for code constantly and consistently hallucinate, adding slop code to clean up and creating frustration (not only for them but for their peers and code reviewers)
Writing code is only a small part of the SWE role (at least for higher SWE levels) so these coding tools don’t actually help the “important” things
Their enterprise codebases are too complex and often heavily underdocumented — sure AI can work well for toy projects but not in large-scale repositories
They might worry AI developer tools, if more widely adopted, would mean they would lose their job
Perhaps surprisingly, I think these are all somewhat true. However, counterintuitively, I also believe that any person/organization that uses these as the reason they don’t adopt AI in their developer workflows is doing it wrong.
Just as ChatGPT (no matter how smart, e.g. GPT-10) can’t predict lottery numbers, today’s models won’t universally provide value unless you work with their innate limitations, strengths, and context. Rather than thinking of these tools as a drop-in replacement for your role, it’s best to see them as a new kind of framework for how you solve problems and build applications.
In this post, I’ll define this new kind of framework and what this potentially means for the modern SWE role and SaaS companies.
Today’s Models
Models have come a long way from being just an intelligent autocomplete, with modern tools able to understand large parts of codebases and write complete sections of code with minimal guidance. Under the hood, the tools almost always directly leverage a variety of large language models optimized for either speed (quickly and cheaply predicting what you plan to type next) or intelligence (long-form refactors, code Q&A, etc.; now typically frontier models like OpenAI’s o1 or Anthropic’s Sonnet 3.5).
There are a ton of up-and-coming products in the space (see Gartner’s Magic Quadrant) but my personal use breaks down into:
Cursor - a VS Code based IDE with several LLM integrations (tab complete, chat-with-code, and a more powerful code “composer” for implementing several files)
This accounts for ~15% of the code I’m writing. I often start with a high-level plan for a fixed set of files and let it edit away, then make small fixes/follow-ups and put up the pull request.
e.g. “Fix files B, C, D to all follow the same API as file A”, “Add do_xyz() here using the foo.bar package, insert unit tests into /path/to/tests”
Typically the only times it’s not doing the heavy typing are when I feel the change is faster to type than the prompt or when I happen to be experimenting with something without much of a higher-level plan in mind.
o1 via a ChatGPT-like UI - OpenAI’s latest model used via the API (chat-like Q&A)
This is for everything non-code:
Writing/editing design plans, presentations, documentation
Learning about a new technology, framework, API, etc.
Summarizing work into status updates
Summarizing feature requests into a roadmap/themes
Finding information (via custom RAG-based plugins)
I avoid using ChatGPT itself just because I prefer limitless usage along with usage-based pricing.
I’ve found o1 notably better at following a long list of instructions and more ambiguous multi-step questions but otherwise GPT-4o and Sonnet 3.5 work great for these as well.
My expectation is that as the foundational LLMs these tools are tied to significantly improve over the coming years (as we throw much more compute at them), so will the AI developer tools themselves. They will expand their understanding to entire enterprise codebases (certain products may say they can do this now; they do not) and execute increasingly ambiguous and complex implementations.
The AI Engineering Framework
Motivation
While AGI may take years to decades, engineers using AI as just a developer tool have the potential to dramatically increase both their speed and effective level. With this speed and quality increase, there will be natural selection of (SaaS) companies that come out ahead due to their ability to ship features faster, with higher reliability, and with lower organizational overhead. Early adopters may see the most advantage (having a workforce and set of processes tuned to be “AI-augmented”) while the rest of the industry catches up as this type of development becomes more mainstream and the tools mature.
In this AI-engineer augmented future, a (SaaS) company’s coding talent, team, and velocity will potentially be less of a defining factor than:
Its ability to design high-value products that attract and build attachment with customers
Its ability to architect its R&D and customer value in a way that can scale with the capabilities and cost of AI models
Meeting AI in the Middle
Despite today’s models reaching “PhD” level, they struggle fundamentally in larger codebases when it comes to assumptions (“why do things need to be set up a certain way”) and context (“what other things exist that I need to keep in mind”). The framework, in the immediate term, is to remove the need for assumptions and context from your codebase. If it helps, think of this as your strategy to take advantage of a group of 10,000 interns who work for free and will have less than 1 week to understand and develop high-quality features. Some of these suggestions may seem like good engineering practices generally; however, the ROI of implementing them is tremendously higher in the context of using AI tools.
Reduce Assumptions
Lean towards generic abstractions and keep business-specific ones to a minimum. Ask an LLM to architect your feature and move towards the “default” path.
Emphasize building dummy-proof verification interfaces for every type of change in the codebase. If the AI makes the wrong assumption about an API, it should be trivial to verify what’s wrong, why, and feed that back to the AI developer tool.
Prefer the “standard” way of doing a given thing. Whether that means picking a well-known language, framework, or even cloud service.
Avoid runtime ambiguity as much as possible in the code itself using typed languages, well-defined structs, example data, and verbose naming schemes.
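As a rough sketch of the typed-struct, example-data, and verification points above, the idea might look something like this in Python. The `Invoice` record, `EXAMPLE_INVOICE`, and `verify_invoice` are hypothetical names made up for illustration, not taken from any real codebase:

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Invoice:
    """Hypothetical record; verbose field names carry units and meaning."""
    customer_id: str
    amount_usd: Decimal
    due_date_iso8601: str  # e.g. "2025-01-31"


# Example data lives next to the type so a tool (or a person) can see a valid instance.
EXAMPLE_INVOICE = Invoice(
    customer_id="cust_123",
    amount_usd=Decimal("49.99"),
    due_date_iso8601="2025-01-31",
)


def verify_invoice(invoice: Invoice) -> list[str]:
    """Return human-readable problems; an empty list means no problems were found."""
    problems = []
    if not invoice.customer_id.startswith("cust_"):
        problems.append(
            f"customer_id={invoice.customer_id!r} must start with 'cust_' "
            "(see EXAMPLE_INVOICE)"
        )
    if invoice.amount_usd <= 0:
        problems.append(f"amount_usd={invoice.amount_usd} must be positive")
    return problems


if __name__ == "__main__":
    bad = Invoice(customer_id="123", amount_usd=Decimal("-5"), due_date_iso8601="2025-01-31")
    for problem in verify_invoice(bad):
        print(problem)
```

The point of returning plain-text problems (rather than raising deep inside the stack) is that a wrong assumption by the tool turns into a message that says what is wrong and why, which can be pasted straight back into the prompt.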
Reduce Context
Strive for every developer workflow to require a single page (or even zero pages) of documentation to complete, with hyper-intuitive underlying abstractions. An example of this is having an internal infra provisioning system with the same level of complexity (i.e. very low) as serverless platforms like Netlify and Modal.
Reduce the need for cross-referencing documentation or code across different sources and formats. If a given feature requires an obvious change to file A, then the best place for content related to that type of change is in file A itself.
Make the codebase as modular as possible, with implementations requiring context only from the immediately related files or package. Also lean towards microservices as a way to organize functionality, performance, and data boundaries.
Use verification processes (e.g. tests) that can be easily understood over a text (or basic visual) interface. This might look like consolidating CI/CD outputs, profiling, and UI/UX renders into a single piece of feedback for the AI tool.
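To make the last point a bit more concrete, here is one possible sketch of folding several verification steps into a single piece of text feedback. `CheckResult` and `build_feedback` are hypothetical helpers, and the check names are placeholders for whatever your CI/CD, profiling, or rendering steps actually produce:

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    """Outcome of one hypothetical verification step (tests, lint, profiling, ...)."""
    name: str
    passed: bool
    details: str = ""


def build_feedback(results: list[CheckResult]) -> str:
    """Merge all check results into one plain-text report, failures first."""
    ordered = sorted(results, key=lambda r: r.passed)  # False sorts before True
    lines = []
    for result in ordered:
        status = "PASS" if result.passed else "FAIL"
        lines.append(f"[{status}] {result.name}")
        # Only failures carry details, to keep the report short.
        if result.details and not result.passed:
            lines.append(f"    {result.details}")
    failed = sum(1 for r in results if not r.passed)
    lines.append(f"{failed} of {len(results)} checks failed")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_feedback([
        CheckResult("unit tests", True),
        CheckResult("type check", False, "billing.py:42: expected Decimal, got float"),
    ]))
```

Putting failures first and keeping everything in one string is a design choice for this sketch: the tool gets a single, short message to react to instead of several logs in different formats.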
Role of Engineers
Now that the codebase is set up for AI, what do the engineers do? Besides being the ones building in these reduced assumptions and context, it’s reasonable to expect the SWE role to evolve with these tools.
They’ll:
Act more like architects than coders, focusing even more of their time on designing (rather than implementing) the interfaces and abstractions to build their product. Often defining the codebase declaratively and in natural language.
Be organized in flatter organizational hierarchies with fewer engineers assigned to a given problem area. I’m not sure if you could get it down to a single-person billion-dollar company but minimally an organization could scale far more sub-linearly relative to the complexity of the application it’s building.
Be rewarded for having high-level knowledge over several engineering domains as opposed to depth in a specific one. A super-infra-security-ml-full-stack engineer who knows enough to design an application but not the nuances of particular domain-specific code patterns, cloud services, or 3rd party libraries.
Care less about code reviews, pull requests, and tech-debt. Software verification is still critical but it’s more likely to occur at a higher level of change than the existing patch → PR → review → merge cycle.
An interesting side-effect of all of this is that, as far as I can tell, the current entry-level SWE role (often focusing on well-defined software tasks) could be deprecated. My expectation is that a demand for more experienced SWE architect roles will still exist and over time colleges (and bootcamps) will adapt their curriculum to train for consistent AI-gaps in the role. For new engineers, this might look like a shift to product and AI tool/management curricula as opposed to learning to write code.
Resolving Objections
Going back to the objections stated at the beginning:
AIs hallucinate → It’s because your codebase requires too many assumptions.
AIs can only help with concrete coding tasks → The framework can apply to not just code but more generally to your processes around designing, tracking, roadmapping, etc.
AIs can’t help with enterprise codebases → Your codebase requires too many assumptions and each change requires too much context.
AIs will replace human engineers → In the near to medium term, human engineers will be a critical partner for AI tools writing code.
Upcoming Challenges
I see three common challenges with a shift to engineering that’s AI-augmented.
How do you transition your codebase into one where AI’s effective?
It’s non-trivial to go from a codebase where a complex change may require the expertise and tribal knowledge of a 3-year company veteran to one where that change could be completed with little documentation by a team of intern-level AIs. The transition might start with certain specific workflows being migrated until the entire stack can be used with AI effectively.
How do you maintain systems that were not fully written by human engineers?
You’d hope that the AI that wrote the alert you are currently being paged for also wrote up a runbook. Just as the codebase should reduce the context the AI needs, the inverse should hold as well: the SREs/on-calls should also need minimal context to respond and recover their online systems.
How do you trust code not written by human engineers and produced by 3rd party models?
I regularly look through the source code for those “I built ABC app in N days, with only K experience, using AI tool XYZ” posts and I can confirm that they are rife with serious security vulnerabilities. I expect this needs to be resolved with a mix of AI using secure-by-default building blocks and coding tool providers establishing security certifications. Humans write insecure code too, but this shouldn’t be a trade-off made for the adoption of AI developer tools.
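One way to picture a “secure-by-default building block” is a small helper that makes the safe path the only path. The sketch below uses Python’s standard `sqlite3` module; the `users` table and the `find_users_by_email` helper are assumptions made up for this example:

```python
import sqlite3


def find_users_by_email(conn: sqlite3.Connection, email: str) -> list[tuple]:
    """Look up users with a parameterized query; the input is never spliced into SQL."""
    cursor = conn.execute("SELECT id, email FROM users WHERE email = ?", (email,))
    return cursor.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
    conn.execute("INSERT INTO users (email) VALUES (?)", ("a@example.com",))
    # A classic injection string is treated as a literal value and matches nothing.
    print(find_users_by_email(conn, "' OR '1'='1"))
    print(find_users_by_email(conn, "a@example.com"))
```

If generated code can only reach the database through helpers like this, a whole class of mistakes (string-built SQL) is harder for either a model or a human to introduce.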
Conclusion
AI tools are rapidly transforming software engineering, and embracing them is essential for staying ahead. By adapting our codebases to reduce assumptions and context, we enable AI to be more effective collaborators. This shift allows engineers to focus on higher-level design and architecture, redefining our roles in the process.
I felt it’d be more interesting to write on this topic with a fairly opinionated stance on what will work and what will happen over the next few years. As a result, there are a decent number of predictions that will probably be wrong, but I still look forward to seeing firsthand how the industry evolves as these models become more intelligent. It’s also likely much of this applies to other industries where AI tools for X (e.g., cybersecurity, sales, and many more) cause significant shifts in how today’s roles operate.