
10 May 2026 · 4 min read · AI Agents, Next.js, Supabase, Architecture, LangGraph

The 'AI Agent' Stack is a Lie (and Why You Still Need It)

Most companies are building AI agents wrong, treating them like glorified chatbots wrapped in expensive infrastructure. Here is how I build agents that actually ship code.


I have a controversial opinion: most of you don't need AI agents.

At least, not the kind the hype cycle is selling you right now.

Every other pitch deck I see in Bangkok’s tech scene is about 'autonomous agents' that will revolutionize business. The reality I see on the ground is slightly different. It’s usually a brittle wrapper around OpenAI’s API that hallucinates when asked to do anything more complex than a 'Hello World'.

But when they work, they change everything. At Thea Tech Solutions, we’ve moved past simple chat completions. We are building systems that act.

If you want to put an AI agent into production without burning your AWS budget on hallucination retries, you have to stop thinking of it as a chatbot and start thinking of it as a microservice with a probabilistic core.

Here is the modern architecture I am using to make these things reliable.

The 'Chatbot' Trap

The biggest mistake I see is engineers trying to fit agent logic into a standard MVC controller. You send a prompt, you get a completion. It works for a demo, but it fails in production because it lacks state and tool verification.

An agent needs to be a loop, not a request-response cycle. It needs to perceive, reason, act, and verify. If you just dump a user's request into gpt-4o, you are praying the model gets the context right. I don't like praying in production. I like deterministic guardrails.

The Architecture: LangGraph + Supabase

I stopped using raw LangChain for complex orchestration months ago. It’s too messy for anything beyond simple chains. Right now, the most robust tool for production agents is LangGraph.

Why? Because it treats the agent workflow as a state machine. This is crucial. If an agent fails to call a tool, you can route it to a human or a fallback routine without crashing the whole user session.
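To make the state-machine idea concrete, here is a minimal sketch in plain TypeScript. This is not LangGraph itself, just the shape of the pattern; the node names and the `fallback` route are illustrative:

```typescript
// Minimal agent-as-state-machine sketch. Each node returns the name of
// the next node; a failed tool call routes to "fallback" instead of throwing.
type AgentState = { input: string; result?: string; error?: string };
type NodeName = "classify" | "execute" | "fallback" | "done";
type Node = (state: AgentState) => NodeName;

const nodes: Record<Exclude<NodeName, "done">, Node> = {
  classify: (state) => {
    // In production this would be a cheap model call that picks a tool.
    return state.input.toLowerCase().includes("invoice") ? "execute" : "fallback";
  },
  execute: (state) => {
    try {
      state.result = `checked status for: ${state.input}`; // stand-in for a real tool call
      return "done";
    } catch (err) {
      state.error = String(err);
      return "fallback"; // route to recovery instead of crashing the session
    }
  },
  fallback: (state) => {
    state.result = "Sorry, I couldn't handle that. A human will follow up.";
    return "done";
  },
};

function runGraph(input: string): AgentState {
  const state: AgentState = { input };
  let current: NodeName = "classify";
  while (current !== "done") {
    current = nodes[current](state);
  }
  return state;
}
```

The real graph has more nodes (validation, memory reads), but the shape is the same: explicit states and explicit edges, which means every transition is loggable and debuggable.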

Here is how I architect it:

  • The Orchestrator (Node.js/Next.js API Route): This is the entry point. It doesn't know about AI. It just knows about business logic.
  • The Graph (LangGraph): This lives server-side. It defines the flow: Input -> Intent Classification -> Tool Router -> Execution -> Validation.
  • Memory (Supabase/Postgres): Do not rely on the context window alone. We dump conversation history and session state into Supabase. The agent queries this table to 'remember' context across requests.
The Tooling Problem

    The 'intelligence' of an agent is useless if it can't touch your data. But giving an LLM access to your SQL database is a nightmare waiting to happen.

    I use a strict 'Tool Use' pattern. The LLM does not write SQL. It never writes SQL. It calls a TypeScript function.

    For example, if a user asks, "Did my invoice get paid?", the agent doesn't query the database. It outputs a structured JSON object:

    {
      "tool": "check_invoice_status",
      "parameters": {
        "invoice_id": "INV-2024-001"
      }
    }
    

    My backend receives this, validates the parameters with Zod, and then executes a pre-vetted Supabase RPC function. The result is passed back to the LLM to formulate a natural language response.

    This keeps your data layer secure. The LLM is just a glue layer between the user's intent and your TypeScript functions.

    Why I Picked Supabase Over Pinecone

Everyone talks about vector databases. Pinecone is great, but for 90% of my clients, adding another vendor is overkill.

    Supabase has pgvector, and it is performant enough for RAG (Retrieval-Augmented Generation) on a dataset of up to a few million documents.

    Here is the stack I recommend for a standard 'Knowledge Base' agent:

  • Ingest: PDF/Docx text is extracted.
  • Chunk: Text is split (e.g., 1000 characters).
  • Embed: OpenAI text-embedding-3-small model (cheaper and better than ada-002).
  • Store: Vectors stored directly in Supabase.
    This saves me the headache of managing a separate vector index and keeps the latency low because the data and the vectors are in the same Postgres instance.
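The chunking step is simple enough to show in full. A minimal sketch — the 1,000-character size comes from the pipeline above, while the 200-character overlap is my own default, not gospel:

```typescript
// Split extracted text into overlapping chunks before embedding.
// Overlap keeps sentences that straddle a boundary retrievable from both sides.
function chunkText(text: string, chunkSize = 1000, overlap = 200): string[] {
  if (chunkSize <= overlap) throw new Error("chunkSize must exceed overlap");
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    chunks.push(text.slice(start, start + chunkSize));
    start += chunkSize - overlap;
  }
  return chunks;
}
```

Each chunk then goes through text-embedding-3-small and lands in a pgvector column, right next to the row it came from.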

    The Cost Reality

    Running agents is expensive. Not just in compute, but in API tokens.

    If you route every user message through gpt-4o, you will go broke. I implement a 'cascading model' strategy.

  • Router: Use a cheap model like gpt-4o-mini or Llama 3 (hosted on Groq or Cloudflare Workers AI) to classify intent.
  • Execute: If the intent is complex (e.g., 'Refund this order and email the customer'), escalate to gpt-4o.
    This simple step cut our token costs by ~70% last month for a client's support bot.
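The cascade is just a routing function in front of the model call. A sketch — the model names reflect the setup above, but the classifier here is a keyword stub standing in for the cheap-model call:

```typescript
// Cascading model strategy: a cheap classifier picks the intent,
// and only complex intents get routed to the expensive model.
type Intent = "simple" | "complex";

// Stand-in for a gpt-4o-mini / Llama 3 classification call.
function classifyIntent(message: string): Intent {
  const complexSignals = ["refund", "cancel", "escalate", "email"];
  return complexSignals.some((w) => message.toLowerCase().includes(w))
    ? "complex"
    : "simple";
}

function pickModel(message: string): string {
  // Cheap model answers FAQs; the expensive model handles multi-step actions.
  return classifyIntent(message) === "complex" ? "gpt-4o" : "gpt-4o-mini";
}
```

In production the classifier is itself a model call, but it costs a fraction of a cent, and most traffic never escalates past it.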

    Deployment: Keep it Serverless

    I deploy these agents on Vercel (for Next.js frontends) or Cloudflare Workers.

    Why Cloudflare? Because AI agents often involve chaining multiple API calls. If you are running on a server in a single region, the latency to OpenAI's API can vary. Cloudflare Workers run everywhere. Plus, their new AI binding lets you run smaller open-source models (like Gemma or Llama) directly on the edge, completely bypassing the OpenAI tax for simple tasks.

    The Takeaway

    Don't build a 'chatbot.' Build a service.

    The winning stack for me right now isn't about finding the smartest model. It's about control.

    Use LangGraph for the brain, Supabase for the memory, and TypeScript for the hands. If you can't debug your agent's workflow using standard developer tools (like tracing and logging), you aren't building software; you're just gambling with prompts.

    Treat the AI as a probabilistic function in your codebase, not the product itself. That is how you get from a cool demo to a production-ready system.

