I have seen the same pattern in dozens of repositories this month. A developer spins up a Next.js app, drops in an API key for OpenAI, and builds a UI that sends a prompt to the gpt-4-turbo endpoint and streams the text back to a div. They call it an "AI Feature." I call it a missed opportunity.
We are past the era of the chatbot. If your application simply talks to the user and forgets the context the moment the page reloads, you are building a toy. Real value in 2024 comes from Agents—systems that can reason, act, and retain state.
Running Thea Tech Solutions here in Bangkok, I have had to pivot my clients' thinking from "ChatGPT wrappers" to "Agentic Workflows." The difference is night and day. A chatbot answers questions. An agent gets work done.
The Problem with Stateless Chats
The standard LLM integration is stateless. It is a request-response cycle. You send a prompt, you get a completion. This is fine for generating a blog post or refactoring a function, but it is terrible for productivity tools.
If I ask an AI to "Draft an email to the supplier," and then follow up with "Send it," a standard implementation fails. It does not know who the supplier is, it does not have access to my email client, and it certainly cannot send anything. It just generates text.
To move forward, we need three specific layers: Context, Tools, and Memory.
1. Context: RAG is Not a Buzzword
The first step is grounding the model in your data. At Thea Tech, we use Supabase for almost everything, and their pgvector extension makes this trivial.
I do not dump my entire database into the prompt. That is expensive and slow. Instead, I use Retrieval-Augmented Generation (RAG). When a user sends a query, I generate an embedding for that query, query the vector store for the nearest neighbors, and inject those results into the system prompt.
The Stack:
* Database: Supabase (Postgres + pgvector)
* Embedding Model: text-embedding-3-small (cheaper and faster than text-embedding-ada-002)
* Orchestration: Next.js API Route
This ensures the agent isn't hallucinating facts about my business. It is reading the documentation before it speaks.
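To make the retrieval step concrete, here is a minimal in-memory sketch of the same nearest-neighbor idea. The `retrieveContext` and `buildSystemPrompt` helpers and the toy three-dimensional embeddings are illustrative stand-ins: in the real stack the similarity search runs inside pgvector and the embeddings come from text-embedding-3-small.

```js
// Toy document store. In production these rows live in a Supabase table
// with a pgvector `embedding` column of 1536 dimensions, not 3.
const docs = [
  { text: "Thea Tech invoices suppliers on net-30 terms.", embedding: [0.9, 0.1, 0.0] },
  { text: "Shipments are tracked with TN-prefixed IDs.",   embedding: [0.1, 0.9, 0.0] },
  { text: "Office hours are 9:00-18:00 ICT.",              embedding: [0.0, 0.1, 0.9] },
];

// Cosine similarity — the same metric pgvector's distance operators expose.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rank documents by similarity to the query embedding and keep the top k.
function retrieveContext(queryEmbedding, store, k = 2) {
  return [...store]
    .sort((x, y) => cosine(queryEmbedding, y.embedding) - cosine(queryEmbedding, x.embedding))
    .slice(0, k)
    .map((d) => d.text);
}

// Inject only the retrieved rows into the system prompt — never the whole table.
function buildSystemPrompt(contextChunks) {
  return "Answer using only this context:\n" + contextChunks.map((c) => "- " + c).join("\n");
}

const hits = retrieveContext([0.85, 0.2, 0.05], docs, 1);
console.log(buildSystemPrompt(hits));
```

The shape is always the same: embed, rank, inject. Only the storage engine changes between this sketch and production.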
2. Tools: Giving the AI Hands
This is the biggest shift. Modern models (GPT-4o, Claude 3.5 Sonnet) support Function Calling (or tool calling). This allows the LLM to output a structured JSON object asking your backend to execute a specific function, rather than just text.
I recently built a logistics dashboard where the agent could query shipment status. Instead of the LLM guessing the tracking number, I defined a tool:
```js
const tools = [
  {
    type: "function",
    function: {
      name: "get_shipment_status",
      description: "Get the current status of a shipment by tracking ID",
      parameters: {
        type: "object",
        properties: {
          tracking_id: {
            type: "string",
            description: "The tracking ID, e.g., TN-12345"
          }
        },
        required: ["tracking_id"]
      }
    }
  }
];
```
When the user asks, "Where is shipment TN-12345?", the model stops generating text and returns a function call. My backend executes the SQL query against Supabase, gets the result, and feeds it back to the model. The model then formulates the final answer for the user.
This is the loop of an Agent. It is not just chatting; it is querying your API.
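That feed-back step can be sketched as a small dispatcher. The assistant message below mimics the `tool_calls` shape the Chat Completions API returns; the `toolHandlers` map and its canned shipment record are hypothetical stand-ins for the real Supabase query.

```js
// Map tool names to implementations. In the logistics dashboard this
// handler runs a SQL query against Supabase; here it is a canned lookup.
const toolHandlers = {
  get_shipment_status: ({ tracking_id }) =>
    ({ tracking_id, status: "IN_TRANSIT", eta: "2024-06-02" }),
};

// Execute every tool call the model asked for and package the results as
// `tool` messages to append to the conversation before the next model turn.
function dispatchToolCalls(toolCalls) {
  return toolCalls.map((call) => {
    const handler = toolHandlers[call.function.name];
    const args = JSON.parse(call.function.arguments); // arguments arrive as a JSON string
    return {
      role: "tool",
      tool_call_id: call.id,
      content: JSON.stringify(handler(args)),
    };
  });
}

// Example assistant message containing a tool call, in the shape the
// Chat Completions API returns.
const modelMessage = {
  role: "assistant",
  tool_calls: [
    {
      id: "call_1",
      type: "function",
      function: {
        name: "get_shipment_status",
        arguments: '{"tracking_id":"TN-12345"}',
      },
    },
  ],
};

const toolMessages = dispatchToolCalls(modelMessage.tool_calls);
// These messages go back to the model, which writes the final user-facing answer.
console.log(toolMessages[0].content);
```
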
3. Memory: The Assistants API
For a long time, I rolled my own memory management. I stored conversation threads in Redis and managed the context window manually. It was a pain. You have to truncate old messages, summarize summaries, and handle token limits.
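For reference, the manual approach boils down to enforcing a token budget yourself. This sketch uses a rough characters-per-token estimate (a real implementation would use an actual tokenizer) and simply evicts the oldest turns; `truncateHistory` is an illustrative helper, not a library API.

```js
// Rough token estimate: ~4 characters per token for English text.
// Good enough for budgeting; use a real tokenizer for exact counts.
const estimateTokens = (msg) => Math.ceil(msg.content.length / 4);

// Drop the oldest non-system messages until the history fits the budget.
// The system prompt is always preserved.
function truncateHistory(messages, maxTokens) {
  const system = messages.filter((m) => m.role === "system");
  const rest = messages.filter((m) => m.role !== "system");
  let total = [...system, ...rest].reduce((n, m) => n + estimateTokens(m), 0);
  while (rest.length > 1 && total > maxTokens) {
    total -= estimateTokens(rest.shift()); // evict the oldest turn
  }
  return [...system, ...rest];
}
```

Every request has to pass through something like this, and it still loses information silently, which is exactly the chore the Assistants API takes off your plate.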
I have recently started using OpenAI's Assistants API for production clients, and despite the initial latency issues, it solves the memory problem elegantly.
The Assistants API handles Threads and Runs server-side. I do not need to send the chat history back and forth. I just pass the Thread ID, and OpenAI manages the state. It supports Code Interpreter (for data analysis) and Retrieval (for searching over uploaded files) out of the box.
Why this matters: If a user uploads a CSV of sales data and asks, "Create a Python script to forecast next month's revenue," the Assistant spins up a sandboxed Python environment, writes the code, executes it, and returns the chart. I did not have to write a single line of Python execution logic. That is powerful.
The Architecture I Recommend
If you are building an AI feature today, do not start with the chat.completions endpoint. Start with the three layers above: a Next.js API route as the orchestration layer, Supabase with pgvector for context, a set of tool definitions for actions, and the Assistants API (or your own thread store) for memory.
This setup decouples your logic from the model. If OpenAI goes down or changes their pricing, you can swap the model provider without rewriting your database logic.
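A minimal way to get that decoupling is a thin provider interface. The classes below are stubs, not real SDK wrappers; the point is that application code depends only on a `complete(messages)` method, so swapping vendors touches one file.

```js
// A minimal provider interface: the rest of the app only ever calls
// `provider.complete(messages)`, never a vendor SDK directly.
class OpenAIProvider {
  // In production this would wrap the official SDK; stubbed here so the
  // sketch stays self-contained and runnable.
  async complete(messages) {
    return `openai:${messages.at(-1).content}`;
  }
}

class AnthropicProvider {
  async complete(messages) {
    return `anthropic:${messages.at(-1).content}`;
  }
}

// Application code depends on the interface. Database access, RAG, and
// tool dispatch live here and never change when the vendor does.
async function answer(provider, question) {
  return provider.complete([{ role: "user", content: question }]);
}
```

Swapping models then becomes a configuration change, not a rewrite.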
The Practical Takeaway
Stop asking, "How can I add chat to my app?" Start asking, "What job do I want the AI to do?"
If the job requires knowledge of your data, implement RAG. If the job requires interacting with your system, implement Tool Calling. If the job requires long-term reasoning, use the Assistants API.
The complexity increases, but so does the value. A chatbot is a novelty. An agent that can query your database, update your CRM, and draft emails is a product.