Software Development in the Age of the LLM
Over the last three months, I’ve radically altered my workflows and processes when writing code. I've been thinking deeply and critically about my own developer tooling, about what to build and when, and about how I build it. I'm biased by 10 years of building software by hand, but it's undeniable that the software world is undergoing a massive change. I’m fully AI-pilled. Every time someone asks me to build something, I say “Start the stopwatch,” and between 10 and 90 minutes later I come back and say “Stop the stopwatch.” I can knock out tasks that took weeks in a matter of minutes.
Vibes and Stakes
I’m currently working with a CTO from a past job, and he made an analogy that fits the software development process in the age of LLMs really well: the average commuter only needs a Honda Civic, but to win races you still need custom cars and years of learned skill, and there’s always a market for mechanics. The barrier to entry has gotten so low that regression to the mean has all code moving toward Honda Civics, which is great for every industry expert from worlds that software has underserved. It’s true democratization of the field, much like the iPhone camera and photography, but it also creates a need for engineering outside of the mean: the F1 crews of the product world.
Engineers in this world think primarily at a high level: in systems, patterns, and scale. But every bolt matters, and all of that high-level thought needs to be backed by well-considered low-level details. These engineers need to understand from the start what building extensibly looks like, with solid architecture and an eye for bugs and security, knowing that one day the product will probably need an ISO 27001 or SOC 2 audit (or, if you are really unlucky, FedRAMP+, NIST overlays, and an ATO). When the cost to build software goes to zero, this is where the experience and vision of senior engineers ensure that products work, and work well. Maintenance cost is non-zero, and customer trust is high stakes, even if building the software is low stakes. Engineering is no longer turning the wrench; it’s defining the torque specs for the impact drivers: we don’t need to know what the APIs are or what the schema looks like, we need to know how systems break and what scalable architecture looks like. Engineering needs craft.
Post-Code
Between 2017 and 2020, I built “AI” experiences for Google. This included things like smart mirrors powered by custom models that could recognize specific models of sunglass frames and pull prices and reviews, fleet-tracking RC cars, and inventory optimization for grocery stores. One thing I quickly realized was that the models themselves were mostly irrelevant; the underlying data was the real secret sauce. I’m not sure if this is still the case, but at the time certain datasets were export controlled. A dataset built by an academic lab in China was impossible to use in America, and American research datasets were tightly restricted in turn. These were ITAR restrictions, the same rules that govern arms deals.
Today, software is not a moat. It’s far from dead, but it’s not a moat. When I can spend 90 minutes building a complete tool that uses computer vision to classify objects, measure them, and recommend U-Line boxes for shipping, I know it’s only a matter of time until every existing tool is completely rebuilt in-house. The only things that will keep someone from eating your lunch are proprietary data, expert processes, and specialized context.
Software is cheap, ideas are everywhere, but what do you know better than anyone else?
Systems and Gates
In the software world we call tools that are counterintuitive, prone to user error, and generally dangerous “footguns.” Footguns make it easy to do the wrong thing, may actively prevent you from doing the right thing, and will shoot you in the foot before you realize it. It takes foresight to avoid footguns, so we think in “ramps” and “rails” (flows that force users into doing the right things), “safe defaults,” “fail-closed systems,” and other codified mechanisms that let engineers unconsciously work safely and force systems to behave safely. Many of these practices come from industrial settings (“every regulation is written in blood”) and NTSB investigations, and they have helped create a discipline that, despite its lack of a regulated credentialing process, has still managed to be, generally, safe.
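The difference between a footgun and a safe default is easy to show in code. Here is a minimal sketch (the names and the toy policy table are illustrative, not from any real system) contrasting a fail-open permission check with a fail-closed one:

```python
# Fail-open vs fail-closed, in miniature. The policy table maps
# user -> resource -> allowed. What happens on a lookup miss is the
# whole difference.

def check_access_fail_open(policy, user, resource):
    try:
        return policy[user][resource]
    except KeyError:
        return True   # footgun: anyone the policy forgot gets access

def check_access_fail_closed(policy, user, resource):
    try:
        return policy[user][resource]
    except KeyError:
        return False  # safe default: deny anything not explicitly allowed

policy = {"alice": {"reports": True}}
```

With the fail-open version, a missing entry silently widens permissions; with the fail-closed version, a broken or incomplete policy can only ever deny.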
In the LLM-powered world, everything is a footgun. Safety and governance matter more than ever, and we need to build systems that incorporate them foundationally. That means starting with “rails”: our design patterns and spec-driven development. We test and monitor thoroughly against every change, and we avoid deferring our deployment gates to agents. Engineers are anchors; without them, agents drift into cascading failure.
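One way to keep a deployment gate out of an agent’s hands is to make the gate structurally ignore anything the agent reports. A hypothetical sketch (the `Change` shape and field names are mine, purely for illustration):

```python
from dataclasses import dataclass

@dataclass
class Change:
    tests_passed: bool
    human_approved: bool
    agent_self_report: str  # whatever the agent claims; the gate ignores it

def may_deploy(change: Change) -> bool:
    # Fail closed: deployment requires passing tests AND explicit human
    # approval. There is deliberately no code path where the agent's own
    # assessment can open the gate.
    return change.tests_passed and change.human_approved
```

The point is not the three lines of logic but the absence of an override path: an agent can be arbitrarily confident and the gate still stays shut until a human approves.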
Death of the Dashboard
I got a master’s degree in something akin to HCI. At the time, ubicomp and wearables were all the rage, Raspberry Pis were brand new, and the world thought the shrinking of compute would bring all technology into physical space. We talked about animism, how technology could have a personality and fit into everyday space, and about tangible computing. My classmates went on to design the voice UIs for all of our home assistants. With LLMs, and even more so with agents, we are finally seeing technology leave our monitors. Background agents can send updates to wherever we are, and we no longer need our IDE open or our browser logged in to a third-party platform to accomplish our work. The interface has always been a window into the system, and we no longer need that window.
Two ideas I’ve found enlightening in my time at Google are “surfaces” and “modality.” The systems themselves are substructures, and the mode of interaction can be anything: voice, text, image, video, screen, Slack workflow, OS notification, on and on to infinity. This is headless engineering, and the surface of the future is wherever you are, not just the webpage.
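The headless framing above can be sketched as a core that emits events and knows nothing about how they are rendered. This is an illustrative sketch, not any real Google API; the class names are mine:

```python
from typing import Protocol

class Surface(Protocol):
    """Any modality that can render a message: voice, Slack, a notification."""
    def deliver(self, message: str) -> None: ...

class SlackSurface:
    def __init__(self) -> None:
        self.sent: list[str] = []
    def deliver(self, message: str) -> None:
        self.sent.append(f"[slack] {message}")

class NotificationSurface:
    def __init__(self) -> None:
        self.sent: list[str] = []
    def deliver(self, message: str) -> None:
        self.sent.append(f"[notify] {message}")

def broadcast(surfaces: list[Surface], event: str) -> None:
    # The headless core never knows which windows into it exist;
    # any object implementing deliver() is a valid surface.
    for surface in surfaces:
        surface.deliver(event)
```

Adding a new modality means adding a new `Surface` implementation; the substructure is untouched, which is what makes the surface “wherever you are.”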
Practical Magick
- Planning Mode first: Plan everything; planning should take far longer than implementation. Every prompt is a technical design doc (TDD), every plan is a low-level design (LLD), and every revision is a chance to proactively catch a bug. More than half of your time should be spent planning: a good balance is 60% planning, 30% testing and validating, 10% implementation.
- Prompt for your grandma (or, asking questions the Amazon way): Every prompt needs a clear problem, a clear goal, the success criteria, and anything you’ve tried so far. If you can’t explain it in natural language, the LLM probably can’t infer it correctly.
- Human in the Loop: Approve all changes, review all code, and verify deployments. We are the experts, and we are ultimately liable for the code our LLMs produce. It’s on us to spot the edge cases, fix the bugs, and maintain these systems.
- Build your skills and personas: LLM skills are one of the highest-leverage investments when building with AI. Spend the day it takes to create a baseline set of skills covering all of the engineering hats you usually wear: one for infra, one for API design, one for system design, one for database design, one for each language, one for testing patterns. Personas with strong system prompts, like CTOs and product managers, also help for sanity testing and ideating.
- Iterate: Tight feedback loops are key. Constantly iterate on processes and changes, with a critical eye for rough edges.
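The “prompt for your grandma” checklist above can even be enforced as a rail. This is a toy helper of my own invention (the function and its fields are illustrative, not part of any tool) that refuses to assemble a prompt missing the required pieces:

```python
def build_prompt(problem: str, goal: str, success_criteria: tuple[str, ...],
                 tried_so_far: tuple[str, ...] = ()) -> str:
    # Rails for prompting: a prompt without a problem, goal, and success
    # criteria is rejected rather than sent half-formed.
    if not (problem and goal and success_criteria):
        raise ValueError("problem, goal, and success criteria are all required")
    lines = [f"Problem: {problem}", f"Goal: {goal}", "Success criteria:"]
    lines += [f"- {c}" for c in success_criteria]
    if tried_so_far:
        lines.append("Tried so far:")
        lines += [f"- {t}" for t in tried_so_far]
    return "\n".join(lines)
```

If you can’t fill in all four fields in natural language, that is usually the signal that you, not the LLM, haven’t finished thinking yet.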