I Built an Autonomous AI Company and Told It to Make Me Money

How I went from a simple task-running script to a multi-agent AI company and built a SaaS in under an hour.

It started with an article.

Denislav Gavrilov published a piece called “Clopus 02: A 24-Hour Claude Code Run”. His experiment in letting Claude Code run autonomously made me realize I could do the same.

Cost had always held me back from building an AI-run company. API usage is unpredictable. You set something loose, it hallucinates into a loop, and suddenly you’re staring at a $400 bill for a weekend experiment. But Claude Code runs on a subscription. Fixed cost. I already knew what it could do—research, documentation, planning, orchestration, creativity. The subscription model meant I could finally experiment without the meter running.

So I went exploring.


The Loop

The core mechanism is embarrassingly simple. A bash script watches an Obsidian file for new tasks:

# Watch the goals file; when a task is marked PENDING, hand it to Claude Code.
while true; do
    if grep -q "## PENDING:" "$GOALS" 2>/dev/null; then
        claude -p "$PROMPT" --dangerously-skip-permissions
        sleep 5    # brief pause before checking for the next task
    else
        sleep 30   # no pending work; poll again in thirty seconds
    fi
done

When the script sees a task marked ## PENDING:, it spawns Claude Code with a prompt explaining its role. Claude reads the task, does the work, updates a status file, and marks the task as done.
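
A task entry is just a markdown heading with a grep-able marker. An entry might look like this (the exact format is up to you, as long as the script and the prompt agree on it):

## PENDING: Add a pricing page to the landing site
Context: reuse the existing layout components.

When the work is finished, the agent flips the marker to something like ## DONE: so the next grep pass skips it.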

The --dangerously-skip-permissions flag is what makes it autonomous. It bypasses all of Claude Code’s permission prompts—no confirmation dialogs, no human approval needed. Claude can read, write, execute, and delete without asking.

I ran a few experiments—single agents on single projects. It worked. Then I tried queuing tasks before bed and waking up to find them done. That worked too.

But it was still one agent, one project, one task at a time. What if I built a team?


The Agency: A Multi-Agent Company

I asked Claude to help me architect a complete software company with dedicated agents to fulfill different company roles. It came up with this:

| Agent | Role |
| --- | --- |
| Dispatcher | Triages requests, breaks them into tasks, routes to specialists |
| Architect | Designs systems, makes technical decisions, writes specifications |
| Developer | Implements features, writes code |
| QA | Tests functionality, finds bugs, runs security checks |
| Reviewer | Final quality gate, approves work for shipping |

Each agent has its own personality file, its own goals queue, its own status updates. They communicate through markdown files in a handoffs/ directory. The Architect writes specs and hands them to the Developer. The Developer builds and hands to QA. QA tests and hands to the Reviewer. The Reviewer approves or sends back for changes.
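
A handoff is just another markdown file. A hypothetical Architect-to-Developer handoff might look like this (file name and fields are illustrative, not a fixed schema):

handoffs/architect-to-developer-001.md

From: Architect
To: Developer
Task: Implement Phase 1 of the OGSnap spec
Spec: see the technical specification in the project folder
Acceptance criteria:
- API route renders an OG image from query parameters
- Invalid input returns a clear error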

A kanban board (board.md) shows work flowing through the pipeline in real-time. I can watch it in Obsidian as tasks move from Inbox to Design to Development to QA to Review to Done.
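
The board itself is plain markdown, which is why Obsidian can render it live. Stripped down, it looks something like this (a sketch, not the exact file):

## Inbox
## Design
## Development
- OGSnap: Phase 2 implementation (Developer)
## QA
- OGSnap: Phase 1 test run (QA)
## Review
## Done
- OGSnap: market research (Architect)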

One command starts the entire company:

./agency.sh start

Five agents wake up. They check for work. They wait.
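
Under the hood, agency.sh is essentially the watcher loop multiplied by five. A minimal sketch, assuming each agent has its own folder with a goals file and a personality prompt, and that the loop above lives in a watch.sh (the real script has more plumbing):

# One watcher loop per agent, each running as its own background process
for agent in dispatcher architect developer qa reviewer; do
    GOALS="agents/$agent/goals.md" \
    PROMPT="$(cat "agents/$agent/personality.md")" \
    ./watch.sh &
done
wait    # keep the company alive until every loop exits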


The Prompt That Started Everything

I dropped a single message into the inbox:

Build a Profitable micro-SaaS

It’s 2025, soon to be 2026. Building software has never been easier, and indie hackers are eating good. AI is the name of the game, and it’s our time to profit off it. Your task is researching, validating, and building a micro-SaaS that can make money. Every decision should be based on real need for such an app.

Then I walked away.


What Happened Next

The Dispatcher picked up the request and created a project file. It assigned the research phase to the Architect with clear acceptance criteria: identify problems, validate demand, recommend ONE idea.

The Architect produced genuine market research. Five ideas emerged:

  1. AI-Powered Accessibility Checker for SMBs
  2. Privacy-First Analytics for Indie Makers
  3. Screenshot/OG Image API for Developers
  4. Content Repurposing Tool for Solopreneurs
  5. Email Deliverability Monitor for Cold Outreach

Each was scored on demand, competition, monetization, feasibility, and time to market. The Architect cited sources: Indie Hackers posts, Medium articles, competitor pricing pages.

The winner was an OG Image API. The reasoning:

This idea wins because it sits at the intersection of real need (every site needs OG images), technical feasibility (we can build this well), clear differentiation (simpler than alternatives), and proven monetization (SaaS with clear tiers).

The Architect named it OGSnap: “Beautiful OG images via URL. No design skills needed.”


The Build

The Architect produced a 386-line technical specification—file structures, database schemas, API contracts, a five-phase implementation plan. It included a warning:

SSRF protection: Proxy external images—don’t pass directly to Puppeteer.

That warning would become important later.

The Developer picked up the spec and started building. Phase by phase, handoffs flowed through the pipeline: code to QA, test reports to Reviewer, approvals back to Developer. The kanban board updated in real-time.

QA tested each phase. 40 tests passed. The Reviewer signed off.


Listening to the Agents Think

Every agent’s thinking is documented. I can open their status files and logs and watch them reason through problems.

When the build failed because a gradient template was missing, I watched the Architect analyze the situation in its log.

The agents understand their place in the pipeline. They visualize it in their status updates:

Architect [PENDING] → Developer [BLOCKED] → QA [BLOCKED] → Reviewer [BLOCKED]
     ↑
  Working on: Template audit, market research, value proposition

I was watching Billions on my laptop when, out of nowhere, a browser window opened through the Playwright MCP, took screenshots of the app for review, and closed itself again. That was the Reviewer agent.


The Security Catch

The Reviewer surprised me.

During Phase 2 review, the Reviewer read through the image generation code and noticed something:

The avatar and logo URL parameters from user input are directly embedded into HTML that Puppeteer renders. This allows Server-Side Request Forgery (SSRF) attacks where an attacker could:

  • Access internal network resources
  • Scan internal ports
  • Access cloud metadata endpoints (e.g., http://169.254.169.254/)
  • Use file:// protocol to read local files

The Reviewer remembered the Architect’s warning—“SSRF protection: Proxy external images”—and flagged that it wasn’t implemented. It provided two fix options with actual code; this is one of them:

function isExternalUrl(url: string): boolean {
	try {
		const parsed = new URL(url);
		// Reject non-HTTP(S) schemes: file://, javascript:, ftp://, etc.
		if (!['http:', 'https:'].includes(parsed.protocol)) return false;
		const hostname = parsed.hostname.toLowerCase();
		if (
			hostname === 'localhost' ||
			hostname === '0.0.0.0' ||
			hostname === 'metadata.google.internal' ||
			hostname.startsWith('127.') ||                  // loopback
			hostname.startsWith('10.') ||                   // private range
			hostname.startsWith('192.168.') ||              // private range
			hostname.startsWith('169.254.') ||              // link-local, incl. cloud metadata
			/^172\.(1[6-9]|2[0-9]|3[01])\./.test(hostname)  // private 172.16.0.0/12
		) {
			return false;
		}
		return true;
	} catch {
		// Anything that fails to parse as a URL is rejected outright
		return false;
	}
}

The Developer implemented the fix. QA then ran 31 security tests:

| Test Category | Payloads Tested | Result |
| --- | --- | --- |
| Localhost variants | localhost, 127.0.0.1, 0.0.0.0 | BLOCKED |
| Private IP ranges | 10.x.x.x, 192.168.x.x, 172.16.x.x | BLOCKED |
| Cloud metadata | 169.254.169.254, metadata.google.internal | BLOCKED |
| Protocol attacks | file:///etc/passwd, javascript:, ftp:// | BLOCKED |
| Valid external URLs | https://example.com/avatar.png | ALLOWED |

All 31 tests passed. The SSRF vulnerability was caught and fixed by autonomous agents before any human saw the code.
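
If you want to spot-check an endpoint like this yourself, the tests boil down to throwing hostile payloads at the URL parameters and expecting rejections. A rough sketch (host, route, and parameter names are assumptions, not OGSnap’s actual API):

# Fire SSRF payloads at the image endpoint; blocked ones should return 4xx.
for payload in "http://localhost/a.png" "http://10.0.0.1/a.png" \
               "http://169.254.169.254/" "file:///etc/passwd"; do
    code=$(curl -s -o /dev/null -w "%{http_code}" -G "http://localhost:3000/api/og" \
        --data-urlencode "title=test" --data-urlencode "avatar=$payload")
    echo "$code  $payload"
done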


The Result

The first run—from inbox message to production-ready build—took about an hour.

The git commit tells the story:

83 files changed, 8,567 insertions(+)

What the Agency built:

  • Full authentication system (GitHub OAuth, magic links)
  • Dashboard with API key generation
  • Usage tracking and statistics
  • 5 image generation templates
  • Stripe integration with 4 pricing tiers ($0/9/29/79)
  • Landing page, pricing page, documentation
  • SSRF-protected image generation API
  • Error boundaries and loading states
  • Mobile-responsive design

The board showed the final status:

| Metric | Value |
| --- | --- |
| MVP Phases | 5/5 COMPLETE |
| Total QA Tests | 100/100 PASS |
| Security Issues | 0 (1 found and fixed) |
| Build Status | PASSING |

I had to wire up Supabase’s GitHub OAuth and run some database migrations. But when I tested the API—constructing a URL with title, author, and template parameters—a beautiful OG image came back.
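
The request is just a GET with query parameters. Hypothetically (host and route are placeholders; title, author, and template are the parameters I actually passed):

curl -o og.png "http://localhost:3000/api/og?title=My%20First%20Post&author=Jane&template=gradient"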

It worked.


What This Means

I’m not claiming the Agency replaced a team of humans. The code needs review. The product needs iteration. The market assumptions need validation with real users.

The agents made decisions. The Architect chose OGSnap over four other ideas based on analysis. QA wrote test cases I wouldn’t have thought to write.

The file-based communication system means everything is readable. I can open Obsidian and see exactly what each agent was thinking, what decisions they made, and why. The handoffs read like documentation from a real team.

The fixed cost model changes the economics. I didn’t pay per token. I paid my monthly subscription and let the agents run until they were done. For experiments like this, that predictability matters.

The whole thing runs on bash scripts and markdown files. No framework to learn, no orchestration platform to configure. Just Claude Code, some shell scripts, and a folder structure.


What This Means for YOU

If you have a Claude Code subscription, try this. If you prefer Opencode, build the equivalent.

Money-making is just one application. How often do we developers look at repetitive tasks and think, I wish there was an app for this? How often do we build internal tools just for ourselves? Constantly.

Now any problem, any repeated workflow, can be automated at a predictable running cost.


What’s Next

OGSnap sits in my projects folder, ready for deployment. The technology works. Now I have to figure out what to do with it.

Honestly, I had no clue about the OG image problem space before this. I didn’t understand what pain point the app solved—despite it being built on my laptop. There’s something magical about that.

Since the first run, I’ve restructured the company and asked the Agency to make the app more lovable. We built new features this way, but there were moments I had to step into the chaos—break things down, voice my opinion. There’s a spectrum from no AI to full vibe-coding with zero human involvement. I’m still searching for the sweet spot.

I see this as a thought experiment in efficiency and coordination—exploring how much control to give autonomous agents and what synchronization patterns work best. It’s about figuring out what makes teams excel, using AI to replicate the human interactions that happen when people work together.

Oh, and by the way: the OG image for this article was generated with the AI-built micro-SaaS.


_The Agency is open source on GitHub. Fair warning: it works, but it’s not polished. Since writing this article I’ve reworked the company structure, as token usage was too high and the tempo of work too slow. I’m sharing this so others can experiment with their own setups; I’ll keep iterating on mine to figure out what actually makes sense in terms of speed, cost, and coordination. Fork it, break it, make it better._