You can try this out at prompt-stack.sshh.io (GitHub).
Last weekend, I built my own v0 from scratch, with more than 50% of the code written by AI.
Recently, I’ve been exploring some of the latest tools for building quick no-code MVPs with AI. Historically, I’ve been a bit pessimistic about no-code site builders because, as a software engineer, they don’t solve my core problem with 'code' approaches: I often still have to spend time learning (and getting stuck on) how to use them to achieve my goals. With these AI-based tools, however, the pitch is that rather than spending time setting up boilerplate, tweaking styles, or learning tool nuances, you can just type 'build me an app to do xyz' and AI will build it for you.
After using and evaluating v0 by Vercel, Bolt.new, and Replit AI (both for work and personal use), I came up with a wishlist for what these tools could do:
Support completely full-stack arbitrary web apps — these tools mainly focus on the frontend, but why not go all the way (e.g., build the Flask routes as well)?
Support good parallel collaboration — Several people should be able to work on different parts of the app in parallel and iterate on each other’s changes.
Be open source — Another big ask for a SaaS product, but being able to fork and/or self-host these tools would be awesome for Bring-Your-Own-Stack or BYO-infrastructure use cases.
Charge purely based on usage — I know this is a lot to ask of any SaaS product but for personal use I’m fairly sensitive to paying for something during a month I didn’t end up using it.
In this article, I’ll share how I built my own AI no-code site builder and how I used AI to do it.
Prompt Stack
With these issues in mind and my recent pitches on just how fast AI engineering tools can make you, I decided to try to build my take on these AI app builders and do it all within one weekend. I wanted to use AI tools to build an AI tool that builds other tools (that are potentially AI as well) — very meta.
The night before, I settled on the scope:
It should be good enough that I, for personal use, would prefer it over subscription alternatives.
It should have a clean UI and be functional on mobile.
It should, for the most part, solve all of the issues listed above.
Out of scope: implementing the adapters for other stacks (it should just be in a state where it’s trivial to add others), user/team/admin-management tools, having a low question-to-app latency, and extensive prompt optimization.
I then drafted the tech design below and coded it up over the next two days. As of writing this article, I’ve spent ~2 more eng days (4 days total), but the core functionality for a basic end-to-end demo was complete within 48 hours of writing up the design. There’s no easy way to pull the actual number, but I think it’s fair to say at least 50% of the code was written by AI through chat-based prompting rather than through traditional typing or auto-complete.
Success & Pain of Today’s AI IDEs
(All of the comments I have about AI IDEs in this section refer specifically to Cursor + Sonnet-3.5, but I’d say they generalize to the state of most tools in this domain.)
One side goal for this project was to pressure test how useful and fast an AI IDE could be, and how big a project I could take on with one — especially in the context of building an app from the ground up. Based on how it went, I’ve divided my experience into three phases: The Setup, The Fix Up, and Feature Flow.
The Setup
I started by plugging my tech design doc into Cursor Composer and just letting it rip, with follow-ups here and there to steer it towards what I wanted. Within a few minutes it had written hundreds of lines of code, mostly what I would consider project-personalized boilerplate, for both the frontend and backend.
This probably reduced what would have been ~3 hours of work into 30 minutes and helped initialize a lot of standard design patterns that I probably would have skipped over and needed to refactor in later (e.g. it set up the 'right' way to organize Next.js and FastAPI projects — both of which I’d not really used before!).
The Fix Up
When I started to take a look at some of the feature code and actually run it, it was clear that the AI had made quite a few mistakes:
It picked stale and incompatible packages in both my package.json and requirements.txt.
It re-implemented the same code multiple times, especially on the frontend; it created several API wrappers and duplicate components that were all slightly different but should have been shared.
It missed quite a few edge cases in both the UI and the backend endpoints that I think I would have caught if I had written the code myself. What made this especially painful was that, because so much of the code was written by the AI, these cases were much harder than expected to debug.
I estimate that about 1-2 hours were spent cleaning up bad code and fixing issues to make the app runnable. Doing the math, that’s still a net-positive improvement over doing things by hand based on time saved during The Setup but it was a bit grueling just how much had to be fixed.
Some thoughts:
Using a global instructions file (in this case .cursorrules) was key to maintaining the organization and context of the codebase for complex multi-file full-stack changes.
AI IDEs might need to own keeping packages and models up-to-date (e.g. by appending the latest relevant docs/versions into system prompts).
AI-generated foundations save hours initially but can create debugging headaches. This is obvious to many critics of AI developer tools, but this was the first time I genuinely hit something like it (and, in fairness, only after 1k+ lines of AI-generated code).
Feature Flow
Once the codebase was cleaned up and I regained context into what was going on, I entered a flow state of progressively iterating on specific features. This ended up being several loops of:
Identify something that should be improved.
Ask Cursor to implement it.
Minor cleanup (often with just follow-up prompts).
Test.
Some fairly complex features took basically just a few short prompts (< 30 min with AI; each would have been a few hours without):
Making everything look nice on mobile
Adding image/sketch/screenshot uploads and prompting
Adding Anthropic as a model provider
Adding user+team settings page, tables, and endpoints
This is where it really shined with entire core features taking just a few minutes to implement (and implement correctly!).
Prompt Stack’s Stack
For those interested in the technical “how” this was built, here’s the stack:
Backend
The app and Postgres backend are hosted on Railway (zero-config app hosting). The project websites themselves are hosted in ephemeral Modal sandboxes, each of which has an SSL tunnel to allow external connections.
The server is Python FastAPI (chosen just because I’d never used it before) and includes a WebSocket server for managing live project chats.
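To make the chat transport concrete, here’s a minimal sketch of what a project-chat WebSocket endpoint could look like in FastAPI. The route path, message format, and run_agent_turn helper are illustrative assumptions, not Prompt Stack’s actual code.

```python
# Minimal sketch of a project-chat WebSocket endpoint (illustrative; not the real code).
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()


async def run_agent_turn(project_id: str, user_message: str):
    """Placeholder for the agent loop; yields streamed response chunks."""
    yield f"(echo from project {project_id}) {user_message}"


@app.websocket("/ws/projects/{project_id}/chat")
async def project_chat(websocket: WebSocket, project_id: str):
    await websocket.accept()
    try:
        while True:
            user_message = await websocket.receive_text()
            # Stream the agent's partial output back to the client as it is generated.
            async for chunk in run_agent_turn(project_id, user_message):
                await websocket.send_json({"type": "chunk", "content": chunk})
            await websocket.send_json({"type": "done"})
    except WebSocketDisconnect:
        pass  # the client closed the chat
```

The main design point is that the agent’s partial output can be streamed back chunk by chunk over the same socket the user sends messages on.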
Since the Modal sandboxes take a bit of time to spin up and only support a timeout field for termination, additional cron tasks (sketched after this list):
Preallocate sandbox volumes to reduce startup times
Terminate sandboxes after project inactivity
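Roughly, those two cron jobs could look like the sketch below. The APScheduler-based scheduling and every helper function are placeholders for illustration; the real implementation (and Modal’s own APIs) will differ.

```python
# Hypothetical sketch of the sandbox-lifecycle cron jobs; the scheduler choice and all
# helpers are placeholders, not the actual implementation or Modal's API.
from datetime import datetime, timedelta, timezone

from apscheduler.schedulers.asyncio import AsyncIOScheduler

WARM_POOL_SIZE = 3
INACTIVITY_LIMIT = timedelta(minutes=30)

scheduler = AsyncIOScheduler()


async def count_unclaimed_volumes() -> int:
    return 0  # placeholder: how many pre-built volumes are waiting to be claimed


async def preallocate_volume() -> None:
    pass  # placeholder: build a sandbox volume ahead of time


async def list_idle_sandboxes(cutoff: datetime) -> list:
    return []  # placeholder: sandboxes with no project activity since `cutoff`


async def terminate_sandbox(sandbox) -> None:
    pass  # placeholder: tear the sandbox down


@scheduler.scheduled_job("interval", minutes=5)
async def preallocate_sandbox_volumes():
    # Keep a small pool of pre-built volumes so new project sandboxes start quickly.
    missing = WARM_POOL_SIZE - await count_unclaimed_volumes()
    for _ in range(max(missing, 0)):
        await preallocate_volume()


@scheduler.scheduled_job("interval", minutes=10)
async def terminate_inactive_sandboxes():
    # Sandboxes only support a hard timeout, so idle projects are cleaned up here instead.
    cutoff = datetime.now(timezone.utc) - INACTIVITY_LIMIT
    for sandbox in await list_idle_sandboxes(cutoff):
        await terminate_sandbox(sandbox)
```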
Frontend
I chose NextJS/Tailwind/Shadcn because it looked nice, and I hadn’t had much experience with NextJS before.
Chat content is rendered with react-markdown, with custom plugins to support the thinking and output-file syntaxes generated by the AI agent.
AI-end
I split LLM usage into “cheap” and “smart” use cases (a rough code sketch of the split follows the list below).
Cheap: Project naming, chat naming, and follow-up questions are just basic prompts using gpt-4o-mini.
Smart: Originally I went with OpenAI gpt-4o, but swapping to sonnet-3.5 was a night-and-day difference. It’s actually so much better for full-file coding that I would go as far as saying that this use case just didn’t work with OpenAI models.
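To make the split concrete, here’s roughly what the two tiers might look like in code. The wrapper functions, the max_tokens value, and the exact Claude model ID are assumptions for illustration.

```python
# Illustrative sketch of the cheap/smart split; wrappers and exact model IDs are assumptions.
import anthropic
from openai import OpenAI

openai_client = OpenAI()
anthropic_client = anthropic.Anthropic()

CHEAP_MODEL = "gpt-4o-mini"                 # naming, follow-up questions
SMART_MODEL = "claude-3-5-sonnet-20241022"  # full-file code generation


def complete_cheap(prompt: str) -> str:
    resp = openai_client.chat.completions.create(
        model=CHEAP_MODEL,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content


def complete_smart(prompt: str) -> str:
    resp = anthropic_client.messages.create(
        model=SMART_MODEL,
        max_tokens=8192,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.content[0].text
```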
As for agent features:
Planning was done by streaming the results of a planning prompt with markdown headings that were parsed out on the frontend (first sketch after this list).
Code generation was done with just special text code blocks. The agent can omit parts of the code, so for each file, the cheap model uses the agent’s partial code blocks to patch and regenerate the entire file (second sketch below).
Common coding errors (e.g. using the wrong component versions) were resolved with dynamically injected prompts that trigger on common problematic patterns (third sketch below).
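To illustrate the planning step, here’s a toy Python sketch of splitting a streamed planning response on markdown headings; in Prompt Stack the parsing happens incrementally in the Next.js frontend, and the section names and sample text here are made up.

```python
# Toy illustration of parsing a streamed planning response into heading/body sections.
import re
from typing import Dict, Iterable

HEADING_RE = re.compile(r"^#+\s+(.*)$", re.MULTILINE)


def parse_plan_sections(stream: Iterable[str]) -> Dict[str, str]:
    """Accumulate streamed chunks, then split them into {heading: body} sections."""
    text = "".join(stream)
    sections: Dict[str, str] = {}
    matches = list(HEADING_RE.finditer(text))
    for i, match in enumerate(matches):
        start = match.end()
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections[match.group(1).strip()] = text[start:end].strip()
    return sections


chunks = ["## Overview\nA todo ap", "p with auth.\n## Files\n- app/page.tsx\n"]
print(parse_plan_sections(chunks))
# {'Overview': 'A todo app with auth.', 'Files': '- app/page.tsx'}
```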
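The per-file patch step might look roughly like this; the prompt wording and the cheap_llm callable (e.g. the complete_cheap wrapper from the earlier sketch) are assumptions, not the actual prompts.

```python
# Hypothetical per-file patch step: the cheap model expands the agent's partial code
# block into a complete file. The prompt wording is an assumption.
from typing import Callable

APPLY_PATCH_PROMPT = (
    "You are updating the file {path}.\n\n"
    "Current contents:\n{current}\n\n"
    "Partial update from the coding agent (unchanged sections may be elided):\n{partial}\n\n"
    "Return the complete updated file and nothing else."
)


def apply_partial_code_block(
    path: str, current: str, partial: str, cheap_llm: Callable[[str], str]
) -> str:
    # `cheap_llm` is e.g. the complete_cheap() wrapper from the earlier sketch.
    prompt = APPLY_PATCH_PROMPT.format(path=path, current=current, partial=partial)
    return cheap_llm(prompt)
```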
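Finally, the dynamic prompt injection could be as simple as a pattern-to-hint table; the patterns and hint text below are purely illustrative, not Prompt Stack’s actual rules.

```python
# Illustrative pattern-to-hint table for dynamically injected prompts.
import re

PATTERN_HINTS = [
    (
        re.compile(r"componentWillMount"),
        "Use React function components and hooks; legacy class lifecycle methods break this template.",
    ),
    (
        re.compile(r"require\("),
        "Use ES module imports; CommonJS require() is not supported in this Next.js setup.",
    ),
]


def build_injected_hints(generated_code: str) -> str:
    """Return extra system-prompt text for any problematic pattern found in the code."""
    hints = [hint for pattern, hint in PATTERN_HINTS if pattern.search(generated_code)]
    return "\n".join(hints)
```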
Conclusions
While there’s a lot of hype around the “acceleration” we’ll see over the next few years with bigger and better AI models, I like to think this type of project is kind of what it looks like. AI makes it exponentially faster to build other AI tools that in turn make other workflows orders of magnitude faster. I’d be surprised if the tools Anthropic/OpenAI use to evaluate and build new models aren’t also being built at least partially by the previous versions of those same models. For the next decade, we will see a huge feedback loop of AI making AI better — and shaping how we solve everyday engineering problems faster.