It started with an article.
Denislav Gavrilov published a piece called “Clopus 02: A 24-Hour Claude Code Run”. His experiment in letting Claude Code run autonomously made me realize I could do the same.
Cost had always held me back from building an AI-run company. API usage is unpredictable. You set something loose, it hallucinates into a loop, and suddenly you’re staring at a $400 bill for a weekend experiment. But Claude Code runs on a subscription. Fixed cost. I already knew what it could do—research, documentation, planning, orchestration, creativity. The subscription model meant I could finally experiment without the meter running.
So I went exploring.
The Loop
The core mechanism is embarrassingly simple. A bash script watches an Obsidian file for new tasks:
```bash
while true; do
  if grep -q "## PENDING:" "$GOALS" 2>/dev/null; then
    # New work queued: hand it to Claude Code and let it run unattended
    claude -p "$PROMPT" --dangerously-skip-permissions
    sleep 5
  else
    # Nothing to do: check again in 30 seconds
    sleep 30
  fi
done
```

When the script sees a task marked ## PENDING:, it spawns Claude Code with a prompt explaining its role. Claude reads the task, does the work, updates a status file, and marks the task as done.
The --dangerously-skip-permissions flag is what makes it autonomous. It bypasses all of Claude Code’s permission prompts—no confirmation dialogs, no human approval needed. Claude can read, write, execute, and delete without asking.
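Queuing work is just appending to that goals file. A minimal sketch of what that might look like; the task wording and layout of the entry are my own illustration, not necessarily the exact format the repo uses:

```bash
# Append a new pending task to the goals note the loop is watching.
# "$GOALS" is the same Obsidian file from the watcher script; the task
# below is purely illustrative.
cat >> "$GOALS" <<'EOF'

## PENDING: Add a dark-mode OG template
Match the existing gradient template, but with a dark background.
EOF
```

Within thirty seconds the loop notices the new ## PENDING: heading, spawns Claude Code, and the task gets picked up.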
I ran a few experiments—single agents on single projects. It worked. Then I tried queuing tasks before going to bed and waking up to find them done. That worked too.
But it was still one agent, one project, one task at a time. What if I built a team?
The Agency: A Multi-Agent Company
I asked Claude to help me architect a complete software company with dedicated agents to fulfill different company roles. It came up with this:
| Agent | Role |
|---|---|
| Dispatcher | Triages requests, breaks them into tasks, routes to specialists |
| Architect | Designs systems, makes technical decisions, writes specifications |
| Developer | Implements features, writes code |
| QA | Tests functionality, finds bugs, runs security checks |
| Reviewer | Final quality gate, approves work for shipping |
Each agent has its own personality file, its own goals queue, its own status updates. They communicate through markdown files in a handoffs/ directory. The Architect writes specs and hands them to the Developer. The Developer builds and hands to QA. QA tests and hands to the Reviewer. The Reviewer approves or sends back for changes.
A kanban board (board.md) shows work flowing through the pipeline in real-time. I can watch it in Obsidian as tasks move from Inbox to Design to Development to QA to Review to Done.
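On disk the whole company is just folders and markdown. The layout is roughly this shape (illustrative; the exact file names in the repo may differ):

```
agency/
├── agency.sh          # starts and stops the agent loops
├── board.md           # the kanban board
├── handoffs/          # markdown handoffs between agents
└── agents/
    ├── dispatcher/    # personality.md, goals.md, status.md per agent
    ├── architect/
    ├── developer/
    ├── qa/
    └── reviewer/
```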
One command starts the entire company:
```bash
./agency.sh start
```

Five agents wake up. They check for work. They wait.
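Under the hood, start does nothing exotic: it launches one watcher loop per agent. A simplified sketch of what that might look like—the real script does more, and the file names here are assumptions rather than the repo's exact layout:

```bash
#!/usr/bin/env bash
# agency.sh (sketch): spawn one watcher loop per agent role.
# Assumes the while-loop from earlier is saved as watch-loop.sh and that
# each agent keeps its files under agents/<name>/.
AGENTS=(dispatcher architect developer qa reviewer)

start() {
  : > .agency-pids
  for agent in "${AGENTS[@]}"; do
    GOALS="agents/$agent/goals.md" \
    PROMPT="$(cat "agents/$agent/personality.md")" \
      ./watch-loop.sh > "agents/$agent/agent.log" 2>&1 &
    echo $! >> .agency-pids
    echo "started $agent (pid $!)"
  done
}

case "$1" in
  start) start ;;
  stop)  xargs kill < .agency-pids; rm -f .agency-pids ;;
  *)     echo "usage: $0 start|stop" ;;
esac
```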
The Prompt That Started Everything
I dropped a single message into the inbox:
Build a Profitable micro-SaaS
It’s 2025, soon to be 2026. Building software has never been easier, and indie hackers are eating good. AI is the name of the game, and it’s our time to profit off it. Your task is researching, validating, and building a micro-SaaS that can make money. Every decision should be based on real need for such an app.
Then I walked away.
What Happened Next
The Dispatcher picked up the request and created a project file. It assigned the research phase to the Architect with clear acceptance criteria: identify problems, validate demand, recommend ONE idea.
The Architect produced genuine market research. Five ideas emerged:
- AI-Powered Accessibility Checker for SMBs
- Privacy-First Analytics for Indie Makers
- Screenshot/OG Image API for Developers
- Content Repurposing Tool for Solopreneurs
- Email Deliverability Monitor for Cold Outreach
Each was scored on demand, competition, monetization, feasibility, and time to market. The Architect cited sources: Indie Hackers posts, Medium articles, competitor pricing pages.
The winner was an OG Image API. The reasoning:
This idea wins because it sits at the intersection of real need (every site needs OG images), technical feasibility (we can build this well), clear differentiation (simpler than alternatives), and proven monetization (SaaS with clear tiers).
The Architect named it OGSnap: “Beautiful OG images via URL. No design skills needed.”
The Build
The Architect produced a 386-line technical specification—file structures, database schemas, API contracts, a five-phase implementation plan. It included a warning:
SSRF protection: Proxy external images—don’t pass directly to Puppeteer.
That warning would become important later.
The Developer picked up the spec and started building. Phase by phase, handoffs flowed through the pipeline: code to QA, test reports to Reviewer, approvals back to Developer. The kanban board updated in real-time.
QA tested each phase. 40 tests passed. The Reviewer signed off.
Listening to the Agents Think
Every agent’s thinking is documented. I can open their status files and logs and watch them reason through problems.
When the build failed because a gradient template was missing, I could open the Architect's log and watch it work through the problem.
The agents understand their place in the pipeline. They visualize it in their status updates:
```
Architect [PENDING] → Developer [BLOCKED] → QA [BLOCKED] → Reviewer [BLOCKED]
    ↑
    Working on: Template audit, market research, value proposition
```

I was watching Billions on my laptop when, out of nowhere, a browser window opened through the Playwright MCP, took screenshots of the app for review, and closed itself again. That was the Reviewer agent.
The Security Catch
The Reviewer surprised me.
During Phase 2 review, the Reviewer read through the image generation code and noticed something:
The avatar and logo URL parameters from user input are directly embedded into HTML that Puppeteer renders. This allows Server-Side Request Forgery (SSRF) attacks where an attacker could:

- Access internal network resources
- Scan internal ports
- Access cloud metadata endpoints (e.g., http://169.254.169.254/)
- Use the file:// protocol to read local files
The Reviewer remembered the Architect’s warning—“SSRF protection: Proxy external images”—and flagged that it wasn’t implemented. It provided two fix options with actual code:
```typescript
function isExternalUrl(url: string): boolean {
  try {
    const parsed = new URL(url);
    // Only allow http(s) — rejects file://, javascript:, ftp://, etc.
    if (!['http:', 'https:'].includes(parsed.protocol)) return false;
    const hostname = parsed.hostname.toLowerCase();
    // Reject localhost, the 192.168.* private range, and the cloud metadata IP
    if (
      hostname === 'localhost' ||
      hostname === '127.0.0.1' ||
      hostname.startsWith('192.168.') ||
      hostname === '169.254.169.254'
    ) {
      return false;
    }
    return true;
  } catch {
    return false;
  }
}
```

The Developer implemented the fix. QA then ran 31 security tests:
| Test Category | Payloads Tested | Result |
|---|---|---|
| Localhost variants | localhost, 127.0.0.1, 0.0.0.0 | BLOCKED |
| Private IP ranges | 10.x.x.x, 192.168.x.x, 172.16.x.x | BLOCKED |
| Cloud metadata | 169.254.169.254, metadata.google.internal | BLOCKED |
| Protocol attacks | file:///etc/passwd, javascript:, ftp:// | BLOCKED |
| Valid external URLs | https://example.com/avatar.png | ALLOWED |
All 31 tests passed. The SSRF vulnerability was caught and fixed by autonomous agents before any human saw the code.
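I can't show QA's exact test harness here, but the simplest version of such a check is throwing hostile payloads at the endpoint and asserting they get rejected. A rough sketch, assuming a local dev server and an /api/og route that takes an avatar parameter (both are assumptions on my part):

```bash
# Hypothetical SSRF smoke test: every payload below should be rejected.
BASE="http://localhost:3000/api/og"   # assumed local dev URL and route

payloads=(
  "http://localhost/admin"
  "http://169.254.169.254/latest/meta-data/"
  "file:///etc/passwd"
)

for p in "${payloads[@]}"; do
  status=$(curl -s -o /dev/null -w "%{http_code}" \
    --get "$BASE" --data-urlencode "title=test" --data-urlencode "avatar=$p")
  echo "$p -> HTTP $status"   # expect a 4xx, never a rendered image
done
```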
The Result
The first run—from inbox message to production-ready build—took about an hour.
The git commit tells the story:
```
83 files changed, 8,567 insertions(+)
```

What the Agency built:
- Full authentication system (GitHub OAuth, magic links)
- Dashboard with API key generation
- Usage tracking and statistics
- 5 image generation templates
- Stripe integration with 4 pricing tiers ($0/9/29/79)
- Landing page, pricing page, documentation
- SSRF-protected image generation API
- Error boundaries and loading states
- Mobile-responsive design
The board showed the final status:
| Metric | Value |
|---|---|
| MVP Phases | 5/5 COMPLETE |
| Total QA Tests | 100/100 PASS |
| Security Issues | 0 (1 found and fixed) |
| Build Status | PASSING |
I had to wire up Supabase’s GitHub OAuth and run some database migrations. But when I tested the API—constructing a URL with title, author, and template parameters—a beautiful OG image came back.
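For the curious, the request is just a GET with query parameters. Something in this spirit; the route and parameter names are from memory, so treat them as illustrative rather than the real API:

```bash
# Hypothetical request shape: title, author, and template as query params.
curl -s --get "http://localhost:3000/api/og" \
  --data-urlencode "title=I Built an AI Agency" \
  --data-urlencode "author=me" \
  --data-urlencode "template=gradient" \
  -o og-image.png
```

Point a page's og:image meta tag at a URL like that and every page gets its own card.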
It worked.
What This Means
I’m not claiming the Agency replaced a team of humans. The code needs review. The product needs iteration. The market assumptions need validation with real users.
The agents made decisions. The Architect chose OGSnap over four other ideas based on analysis. QA wrote test cases I wouldn’t have thought to write.
The file-based communication system means everything is readable. I can open Obsidian and see exactly what each agent was thinking, what decisions they made, and why. The handoffs read like documentation from a real team.
The fixed cost model changes the economics. I didn’t pay per token. I paid my monthly subscription and let the agents run until they were done. For experiments like this, that predictability matters.
The whole thing runs on bash scripts and markdown files. No framework to learn, no orchestration platform to configure. Just Claude Code, some shell scripts, and a folder structure.
What This Means for YOU
If you have a Claude Code subscription, try this. If you prefer Opencode, build the equivalent.
Money-making is just one application. How often do we developers look at repetitive tasks and think, I wish there was an app for this? How often do we build internal tools just for ourselves? Constantly.
Now any problem, any repeated workflow, can be automated at a predictable cost.
What’s Next
OGSnap sits in my projects folder, ready for deployment. The technology works. Now I have to figure out what to do with it.
Honestly, I had no clue about the OG image problem space before this. I didn’t understand what pain point the app solved—despite it being built on my laptop. There’s something magical about that.
Since the first run, I’ve restructured the company and asked the Agency to make the app more lovable. We built new features this way, but there were moments I had to step into the chaos—break things down, voice my opinion. There’s a spectrum from no AI to full vibe-coding with zero human involvement. I’m still searching for the sweet spot.
I see this as a thought experiment in efficiency and coordination—exploring how much control to give autonomous agents and what synchronization patterns work best. It’s about figuring out what makes teams excel, using AI to replicate the human interactions that happen when people work together.
Oh, and by the way: the OG image for this article was generated with the AI-built micro-SaaS.
_The Agency is open source on GitHub. Fair warning: it works, but it's not polished. Since writing this article I have reworked the company structure, because token usage was too high and the tempo of work too slow. I'm sharing this to let others experiment with their own setups—I'll keep iterating on mine to figure out what actually makes sense in terms of speed, cost, and coordination. Fork it, break it, make it better._