MCP for Databases
Ask a plain-English business question and AI will happily give you a query. It looks right. It runs. The answer is wrong.
We ran into this at Adrez while building AI access to our Snowflake warehouse. The hard part was not getting the agent to write SQL. It was getting it to understand what the data actually means.
That is the subtle failure mode. The query is plausible, the execution succeeds, and the answer is still wrong. This post walks through the pattern we used to fix that.
The problem
You ask an AI agent: “monthly revenue by region.”
Without any schema, the agent guesses. It invents table names, picks plausible column names, and writes a query that looks like SQL but has no connection to your actual database.
So you give it the schema. Now it knows the tables and columns. It writes this:
```sql
SELECT
  r.name AS region,
  date_trunc('month', o.created_at) AS month,
  sum(o.amount) AS revenue
FROM orders o
JOIN regions r ON r.id = o.region_id
GROUP BY 1, 2
ORDER BY 2
```

Better. The tables exist, the columns are real. But the query is still wrong.
The table is called transactions, not orders. The right time column
is completed_at, not created_at, and the query includes refunded orders
that should be excluded.
The schema tells the agent what exists, not what things mean, which columns to prefer, or which filters matter. That’s what a data catalog is for.
Handing over the catalog
We could try to stuff the whole catalog into the prompt, or into the agent’s context. We tried that, of course. It didn’t work well.
Even with large context windows¹, a real catalog is too big, too noisy, and too expensive to carry on every request.
So we tried a few other ways too.
Skills were nice in theory. In practice they required a CLI, stuffing the catalog into reference files felt wrong, and agents often ignored it anyway.
Custom agents gave us more flexibility. They also gave us unreliable tool calls, retries, and extra cost. And there was no way to use them from our AI assistants (Claude Cowork, Codex, …).
Snowflake Cortex was interesting. But our catalog lives outside of Snowflake, and we wanted more control than it could give us.
MCP got us closer, but two practical problems remained. Agents still pulled more context than they needed, and tool calling was still hard to control reliably.
A better approach
Then a new MCP pattern started showing up: Code Mode². Instead of exposing MCP tools directly to the agent, we could expose an API. The agent would write sandboxed code against it. That meant fewer roundtrips and less noise in the context.
Later on Cloudflare published Code Mode MCP.
Their use case was APIs. Ours was a data warehouse.
The interface was small: search and execute tools.
For our use case, those two tools were enough.
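The post doesn't spell out the tool signatures, so here is a rough sketch of what that two-tool surface could look like; all names and shapes below are assumptions, not the actual implementation:

```typescript
// Hypothetical two-tool surface. Both tools take agent-written JavaScript
// source and run it in a sandbox; only execute can reach the warehouse.
interface SearchResult {
  entities: Record<string, unknown>; // matched catalog entities
  joinPlan: unknown[];               // how the matched tables join
}

interface DatabaseMcp {
  search(code: string): Promise<SearchResult>; // runs against the catalog
  execute(code: string): Promise<unknown>;     // runs against the warehouse
}

// Toy in-memory stand-in, just to show the call shape.
const mcp: DatabaseMcp = {
  async search(_code) {
    return { entities: {}, joinPlan: [] };
  },
  async execute(_code) {
    return [];
  },
};
```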
How a question becomes a query
The next question was how to structure the catalog so the agent could use it. We ended up with a stripped-down Apache Atlas-style catalog in a single JSON file:
- a glossary that maps business terms to tables and columns,
- entity definitions with preferred time columns and default filters,
- and relationships that describe how tables join.
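To make those three parts concrete, here is a hypothetical single-entity slice of such a file, written as an object literal. The field names mirror the search result the agent gets back later in the post; everything else is an assumption:

```typescript
// Hypothetical slice of the catalog; in practice this is one JSON file.
const CATALOG = {
  glossary: [
    // business term -> entity (and optionally the column that carries it)
    { term: "revenue", entity: "orders", column: "amount" },
    { term: "region", entity: "region" },
  ],
  entities: [
    {
      name: "orders",
      table: "transactions", // business name and physical table differ
      time: { completed_at: ["revenue reporting", "monthly trends"] },
      default_filters: ["status != 'refunded'"],
    },
    { name: "region", table: "regions" },
  ],
  relationships: [{ from: "transactions", to: "regions", key: "region_id" }],
};
```

Keeping it this small is the point: one file, versioned alongside the warehouse, with nothing the agent doesn't need.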
With Code Mode, the agent writes JavaScript that runs in an isolated sandbox.
In search, the agent is instructed to write code that uses the CATALOG object.
We tell it its shape with a TypeScript definition.
The agent itself doesn’t have direct access to the CATALOG.
To the agent, search looks like this:
```
Find the right entities and columns, then build a join plan.
Inspect a `CATALOG` object with this shape:

declare const CATALOG: {
  glossary: GlossaryItem[];
  entities: Entity[];
  relationships: Relationship[];
};

Write JavaScript as an arrow function.
```

Back to our question: “monthly revenue by region.”
The agent executes search with this input:
```javascript
() => {
  const terms = ["revenue", "order", "orders", "region", "regions"];
  const entities = findEntities(CATALOG, terms);
  const joinPlan = findRelationships(CATALOG, entities).getJoinPlan();
  return { entities, joinPlan };
}
```

And it would get back something like this:
```json
{
  "entities": {
    "orders": {
      "table": "transactions",
      "time": { "completed_at": ["revenue reporting", "monthly trends"] },
      "default_filters": ["status != 'refunded'"]
    },
    "region": { "table": "regions" }
  },
  "joinPlan": [{ "from": "transactions", "to": "regions", "key": "region_id" }]
}
```

It’s minimal but contains everything needed to write correct SQL. And it’s exactly what the agent asked for: entities and a join plan.
orders maps to transactions, completed_at is the right time column for those use cases,
refunded rows should be excluded, and transactions joins to regions on region_id.
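The helpers in that agent code (findEntities, findRelationships) aren't shown in the post. A plausible findEntities, assuming a catalog whose glossary maps business terms to entity names, could be as small as this:

```typescript
// Hypothetical findEntities-style helper: match the agent's search terms
// against the glossary, then return the entity definitions those terms
// point at. The catalog shape here is an assumption.
type Catalog = {
  glossary: { term: string; entity: string }[];
  entities: { name: string; table: string; [k: string]: unknown }[];
};

function findEntities(catalog: Catalog, terms: string[]) {
  const names = new Set(
    catalog.glossary
      .filter((g) => terms.includes(g.term))
      .map((g) => g.entity),
  );
  return catalog.entities.filter((e) => names.has(e.name));
}

// Example: "revenue" resolves through the glossary to the orders entity,
// which is backed by the transactions table.
const catalog: Catalog = {
  glossary: [
    { term: "revenue", entity: "orders" },
    { term: "region", entity: "region" },
  ],
  entities: [
    { name: "orders", table: "transactions" },
    { name: "region", table: "regions" },
  ],
};
const hits = findEntities(catalog, ["revenue", "region"]);
```

The glossary lookup is what lets "revenue" land on transactions rather than a nonexistent orders table.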
We could have used RAG or a graph database, but a single JSON catalog was easier to version and update as the warehouse changed.
Execute
With the search result, the agent now builds the correct query:
```sql
SELECT
  r.name AS region,
  date_trunc('month', t.completed_at) AS month,
  sum(t.amount) AS revenue
FROM transactions t
JOIN regions r ON r.id = t.region_id
WHERE t.status != 'refunded'
GROUP BY 1, 2
ORDER BY 2
```

Now the agent hits the real table, uses completed_at, and applies the refund filter we actually care about.
The agent presents the query for review, then runs it through execute. The pattern is the same as search:
it writes JavaScript against a small execution API rather than querying the database directly.
That does not have to be a single query. The agent can run a few queries, combine the results in code, and turn them into an Excel file, a chart, or a shareable HTML page.
If you then ask, “Break that down by sales channel,” it does not need to start from scratch. It already knows how to search the catalog and only needs the extra context for that dimension.
Guardrails
Each tool call runs through Cloudflare’s Dynamic Worker Loader: a Worker starts, runs the JavaScript, and shuts down. It is sandboxed and only gets the API we expose. It cannot access other resources, fetch external sites, or do anything outside that boundary.
Access control still lives in the database. We use row-level security and column masking for PII, so the agent only sees what that role is allowed to see. The MCP layer does not add separate exceptions.
The shift
If AI is going to be a real interface to company data, it cannot just write SQL that looks valid. It has to understand what the business is actually asking.
That means going beyond schema, beyond prompt stuffing, and beyond the idea that a query that runs is good enough. A valid query is not the same thing as a correct answer.
For us, the shift was simple. Do not hand the agent all the knowledge. Give it a reliable way to find the right knowledge when it needs it.
That is what made the difference. The agent stopped being a confident guesser and became something people could actually trust.

Cloudflare’s Code Mode pattern and platform made this possible.
Martin Křeček built the catalog and keeps it alive. Martin Hapl became our power user, stress-tested everything, and shaped it with a lot of good ideas. Vojta Dlápal gave the post editorial care and feedback, which in practice means he wrote the beginning and the end.
¹ Typical context windows are now ~200K tokens for Anthropic and OpenAI models.
² Cloudflare’s Code Mode and Anthropic’s Code execution with MCP.