The AI Agent Stack is Broken (And How We're Fixing It)

Most developers are over-engineering their AI agent infrastructure. Here is why I'm moving away from complex chains and back to simple, stateless functions.

I spent the last three months trying to turn a simple customer support bot into a fully autonomous "agent." The result? A tangled mess of LangChain graphs, state management nightmares, and a bill for OpenAI API tokens that made my accountant cry.

We are over-engineering the AI stack. I see it constantly in the Bangkok tech scene and in my work at Thea Tech Solutions. Everyone wants to build the next JARVIS, but they end up with a glitchy chatbot that hallucinates refund policies.

This week, I stripped our entire agent architecture down to the studs. I moved away from heavy orchestration frameworks and went back to basics. The code is faster, cheaper, and—ironically—smarter.

Here is why the current obsession with complex agent chains is a trap and what I am using instead.

The Complexity Trap

When I started integrating LLMs into production, I fell into the same trap as everyone else. I grabbed a framework. I started defining tools, building chains, and managing memory vectors. It felt powerful. I could tell the system to "research this, summarize that, and format it as JSON."

But the failure rate was unacceptable. In a chain of five calls, if one link hallucinates or fails a schema validation, the whole request dies. Debugging a multi-step agent loop is hell. You cannot easily unit-test a "reasoning" step.
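To put a rough number on it: if each step in a chain succeeds independently with probability p, a chain of n steps succeeds end to end with probability p^n. The 95% per-step figure below is a hypothetical, not a measurement from our system, and real failures are rarely fully independent.

```typescript
// Probability that every step in an n-step chain succeeds, assuming each
// step succeeds independently with probability p (a simplifying assumption).
function chainSuccessRate(p: number, n: number): number {
  if (p < 0 || p > 1) throw new RangeError('p must be in [0, 1]');
  if (!Number.isInteger(n) || n < 0) throw new RangeError('n must be a non-negative integer');
  return p ** n;
}

// Hypothetical: five LLM calls, each 95% reliable on its own.
const rate = chainSuccessRate(0.95, 5);
console.log(`End-to-end success: ${(rate * 100).toFixed(1)}%`); // "End-to-end success: 77.4%"
```

Five steps that each look reliable still fail almost one request in four.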

Worse, the latency was killing the user experience. Users don't want to wait fifteen seconds for an agent to "think" about a password reset. They want an answer now.

My New Stack: Less is More

I have pivoted to a "dumb wrapper, smart model" approach. I stopped trying to outsmart the model with Python code and started trusting the model's context window.

1. Orchestration? Just Use Functions

I dropped LangChain and its cousins. Now, I write standard TypeScript functions. If an agent needs to check an order status in Supabase, I write a function called getOrderStatus. I pass the function's name, description, and JSON schema to the LLM via the OpenAI SDK or Vercel AI SDK.

The model decides when to call the function. It doesn't need a complex graph to tell it that.
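Routing the model's choice back to your code is a plain lookup. In OpenAI's Chat Completions API, each tool call carries an `id`, a `function.name`, and the arguments as a JSON-encoded string. A minimal sketch under those assumptions; `dispatchToolCalls`, `ToolFn`, and the local `ToolCall`/`ToolMessage` types are hypothetical names that only mirror the SDK's shape rather than importing it.

```typescript
// Local types mirroring the shape of a Chat Completions tool call.
interface ToolCall {
  id: string;
  function: { name: string; arguments: string }; // arguments is a JSON string
}

interface ToolMessage {
  role: 'tool';
  tool_call_id: string;
  content: string;
}

type ToolFn = (args: Record<string, unknown>) => Promise<unknown>;

// No graph, no planner: look the tool up by name and run it.
async function dispatchToolCalls(
  calls: ToolCall[],
  registry: Record<string, ToolFn>,
): Promise<ToolMessage[]> {
  const results: ToolMessage[] = [];
  for (const call of calls) {
    const { name, arguments: rawArgs } = call.function;
    let content: string;
    // Own-property check so names like "toString" don't hit Object.prototype.
    if (!Object.prototype.hasOwnProperty.call(registry, name)) {
      content = JSON.stringify({ error: `Unknown tool: ${name}` });
    } else {
      try {
        const args = JSON.parse(rawArgs) as Record<string, unknown>;
        const result = await registry[name](args);
        content = JSON.stringify(result ?? null);
      } catch (err) {
        // Report the failure to the model instead of crashing the request.
        content = JSON.stringify({ error: String(err) });
      }
    }
    results.push({ role: 'tool', tool_call_id: call.id, content });
  }
  return results;
}
```

Each returned message goes back to the model as a `role: 'tool'` message, and the model then writes its final answer. Errors come back as data rather than exceptions, so one bad call doesn't take down the whole request.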

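```typescript
import OpenAI from 'openai';
import { supabase } from './supabase'; // project's configured Supabase client

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// Each tool pairs the JSON schema the model sees with the plain
// async function our code actually runs.
const tools = {
  getOrderStatus: {
    description: 'Get the current status of a specific order ID',
    parameters: {
      type: 'object',
      properties: {
        orderId: { type: 'string' },
      },
      required: ['orderId'],
    },
    execute: async (args: { orderId: string }) => {
      const { data, error } = await supabase
        .from('orders')
        .select('status, total')
        .eq('id', args.orderId)
        .single();
      if (error) throw error; // surface DB errors instead of returning null
      return data;
    },
  },
};

// Pass the schemas to the model in the Chat Completions `tools` format;
// the model decides whether and when to call them.
export async function ask(userMessage: string) {
  const completion = await openai.chat.completions.create({
    model: 'gpt-4o-mini',
    messages: [{ role: 'user', content: userMessage }],
    tools: Object.entries(tools).map(([name, tool]) => ({
      type: 'function' as const,
      function: { name, description: tool.description, parameters: tool.parameters },
    })),
  });
  return completion.choices[0].message; // may include tool_calls to execute
}
```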

This is deterministic. I can test getOrderStatus without invoking an LLM. I can cache the results. I can swap the database implementation without breaking the agent.
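To make the testing and swapping concrete, here is a minimal sketch of the same tool written against a small storage interface instead of the Supabase client directly. `Order`, `OrderStore`, `makeGetOrderStatus`, `InMemoryOrderStore`, and `withCache` are hypothetical names; a production version would add a Supabase-backed `OrderStore` whose `findOrder` runs the same `orders` query as the block above.

```typescript
// Hypothetical order shape, matching the columns selected from `orders`.
interface Order {
  status: string;
  total: number;
}

// The tool depends on this interface, not on Supabase.
interface OrderStore {
  findOrder(id: string): Promise<Order | null>;
}

// Factory that binds the tool to whichever store you pass in.
function makeGetOrderStatus(store: OrderStore) {
  return async (args: { orderId: string }) => {
    const order = await store.findOrder(args.orderId);
    // A structured "not found" lets the model tell the user what happened.
    return order ?? { error: `Order ${args.orderId} not found` };
  };
}

// In-memory store for unit tests: no network, no database, no LLM.
class InMemoryOrderStore implements OrderStore {
  private orders: Map<string, Order>;
  constructor(orders: Map<string, Order>) {
    this.orders = orders;
  }
  async findOrder(id: string): Promise<Order | null> {
    return this.orders.get(id) ?? null;
  }
}

// Caching is just another OrderStore that wraps a real one (TTL in ms).
function withCache(store: OrderStore, ttlMs: number): OrderStore {
  const cache = new Map<string, { value: Order | null; expires: number }>();
  return {
    async findOrder(id: string) {
      const hit = cache.get(id);
      if (hit && hit.expires > Date.now()) return hit.value;
      const value = await store.findOrder(id);
      cache.set(id, { value, expires: Date.now() + ttlMs });
      return value;
    },
  };
}
```

Because order status changes over time, the cache uses a TTL rather than caching forever; the right TTL depends on how stale an answer your support flow can tolerate.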

2. State Management is a Database Problem

Stop resending the entire chat history in the LLM's context window on every request. It's expensive and usually unnecessary.

I treat chat history like any other user data. I store messages in Supabase. When a user sends a message, I fetch the last 10 messages, inject them into the system prompt, and send the request.
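A minimal sketch of the "last 10 messages into the system prompt" step, kept as a pure function so it runs without a database. `StoredMessage` and `buildSystemPrompt` are hypothetical, as is the row shape. With supabase-js, the fetch itself would typically be a select on your messages table with `.order('created_at', { ascending: false }).limit(10)`; that query is left out here.

```typescript
// Hypothetical row shape for a messages table.
interface StoredMessage {
  role: 'user' | 'assistant';
  content: string;
  createdAt: string; // ISO-8601 UTC timestamp, so string order = time order
}

// Append the most recent `limit` turns, oldest first, to the base system prompt.
function buildSystemPrompt(base: string, rows: StoredMessage[], limit = 10): string {
  const sorted = [...rows].sort((a, b) => a.createdAt.localeCompare(b.createdAt));
  const recent = sorted.slice(Math.max(0, sorted.length - limit));
  if (recent.length === 0) return base;
  const transcript = recent.map((m) => `${m.role}: ${m.content}`).join('\n');
  return `${base}\n\nRecent conversation:\n${transcript}`;
}
```

Sorting inside the function means it works whether the query returns rows newest-first (as the `.limit(10)` query above would) or oldest-first.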

For "long-term memory" (facts about the user), I use a simple RAG (Retrieval-Augmented Generation) setup. I vectorize key user profile data and store it in Supabase's pgvector. Before the LLM answers, I run a similarity search to pull relevant context.
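pgvector runs this search inside Postgres: its `<=>` operator returns cosine distance (1 minus cosine similarity), and an index keeps it fast. The in-memory sketch below shows the same ranking logic so the idea is visible without a database. `ProfileFact`, `cosineSimilarity`, and `topK` are hypothetical names, and the embeddings would come from whichever embedding model you use.

```typescript
// Hypothetical record: one piece of user profile text plus its embedding.
interface ProfileFact {
  id: string;
  text: string;
  embedding: number[];
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('Embedding dimensions differ');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k facts most similar to the query embedding, best match first.
function topK(query: number[], facts: ProfileFact[], k: number): ProfileFact[] {
  return facts
    .map((fact) => ({ fact, score: cosineSimilarity(query, fact.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map(({ fact }) => fact);
}
```

Only the top few facts go into the prompt, which is what keeps the token count low.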

This keeps the token count low and the context relevant.

3. The "Evals" Safety Net

The biggest fear with agents is that they go rogue. You cannot ship an agent to production without a safety net.

I use a dual-layer approach:

• Input Guardrails: Before the user's prompt hits the LLM, a smaller, faster model (like GPT-4o-mini or a local Llama 3 instance) checks for malicious intent or prompt injection.

• Output Validation: I enforce a schema on the model's output, including tool-call arguments, with a validation library like Zod (or simple invariant-style assertions for smaller checks). If the agent tries to call a tool with a string where an integer is required, the code catches it and asks the model to correct itself before executing.
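A minimal sketch of both layers, with the models injected as functions so it runs without any API. `Classifier` stands in for the small guardrail model, `ModelCall` for the main model, and the refund tool with its `orderId` and integer `amountCents` arguments is hypothetical. The hand-written `validateRefundArgs` plays the role a Zod schema's `safeParse` would, and the conversation is simplified to an array of strings.

```typescript
// Hypothetical signatures for the two models.
type Classifier = (input: string) => Promise<'safe' | 'unsafe'>;
type ModelCall = (messages: string[]) => Promise<string>; // raw JSON tool arguments

// Hypothetical tool arguments: amountCents must be an integer.
interface RefundArgs {
  orderId: string;
  amountCents: number;
}

type Validation = { ok: true; value: RefundArgs } | { ok: false; error: string };

// Stand-in for a schema library: parse, then check each field's type.
function validateRefundArgs(raw: string): Validation {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return { ok: false, error: 'output is not valid JSON' };
  }
  if (typeof parsed !== 'object' || parsed === null) return { ok: false, error: 'expected a JSON object' };
  const { orderId, amountCents } = parsed as Record<string, unknown>;
  if (typeof orderId !== 'string') return { ok: false, error: 'orderId must be a string' };
  if (typeof amountCents !== 'number' || !Number.isInteger(amountCents)) {
    return { ok: false, error: 'amountCents must be an integer' };
  }
  return { ok: true, value: { orderId, amountCents } };
}

async function guardedToolArgs(
  userInput: string,
  classify: Classifier,
  callModel: ModelCall,
  maxAttempts = 3,
): Promise<RefundArgs> {
  // Layer 1: input guardrail runs before the main model sees anything.
  if ((await classify(userInput)) === 'unsafe') throw new Error('Input rejected by guardrail');

  const messages = [userInput];
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const raw = await callModel(messages);
    const result = validateRefundArgs(raw);
    if (result.ok) return result.value; // only now is it safe to execute the tool
    // Layer 2: feed the validation error back and ask for a correction.
    messages.push(raw, `Your last output was invalid (${result.error}). Return corrected JSON.`);
  }
  throw new Error(`No valid tool arguments after ${maxAttempts} attempts`);
}
```

Capping the retries matters: a model that keeps producing bad arguments should fail loudly rather than loop and burn tokens.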

The Infrastructure Reality

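Hosting these agents doesn't require a Kubernetes cluster. In fact, it's usually a bad idea: it adds operational overhead without touching the real bottleneck. Most of a request's latency is spent waiting on the inference provider, so what you can control is keeping that hop short by running as close to the provider as possible.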

I deploy my agent backends on Vercel or Cloudflare Workers. Their lightweight serverless runtimes keep cold starts short, and serverless scaling suits the sporadic, bursty traffic of a chat application.

For background tasks, like an agent that needs to process a large PDF or run a long data analysis, I offload the work to a queue. I use Inngest or AWS SQS to trigger a worker. The user gets an "I'm working on it" message, and the worker processes the job asynchronously, updating the database when done.
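A minimal sketch of that handoff, with the queue and the database injected so it runs locally. `PdfJob`, `JobStore`, `handleUpload`, and `processPdfJob` are hypothetical names. In a real deployment, `enqueue` would publish an Inngest event or send an SQS message, and the queue service, not your request handler, would invoke the worker.

```typescript
// Hypothetical job payload.
interface PdfJob {
  jobId: string;
  conversationId: string;
  fileUrl: string;
}

// Hypothetical persistence interface (e.g. a jobs table in Supabase).
interface JobStore {
  setStatus(jobId: string, status: 'queued' | 'done' | 'failed', result?: string): Promise<void>;
}

// Request path: record the job, enqueue it, and reply immediately.
async function handleUpload(
  job: PdfJob,
  enqueue: (job: PdfJob) => Promise<void>,
  store: JobStore,
): Promise<string> {
  await store.setStatus(job.jobId, 'queued');
  await enqueue(job);
  return "I'm working on it. I'll update this conversation when the analysis is done.";
}

// Worker path: runs later, then writes the result back to the database.
async function processPdfJob(
  job: PdfJob,
  analyze: (fileUrl: string) => Promise<string>,
  store: JobStore,
): Promise<void> {
  try {
    const summary = await analyze(job.fileUrl);
    await store.setStatus(job.jobId, 'done', summary);
  } catch (err) {
    await store.setStatus(job.jobId, 'failed', String(err));
    throw err; // let the queue's retry policy decide what happens next
  }
}
```

The chat request never waits on the slow work; the frontend only needs to watch the job's status row.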

The Takeaway

The hype around autonomous agents is just that—hype. We aren't at the stage where we can set a bot loose and trust it to manage our SaaS.

Focus on building robust tools that the LLM can use. Trust the model's reasoning, but verify its inputs and outputs. Keep your state in a database, not in the context window.

If you are starting a new AI project today, don't install a framework. Write a function. Call an API. Ship it. You can add the complexity later if you actually need it.