Field Guide · AI Integration · v1 · 2026

When you say AI,
what do you mean?

Most clients arrive with a sense of urgency and no map. 'We need AI' usually means one of five distinct things, depending on how deep into the stack you want to go. This is the stack. Where you sit on it shapes what we build, what it costs, and what can quietly fail.

Premise

“The model is rarely the bottleneck. What sits around the model usually is.”

Most AI projects stall on the same handful of things: the data is unreliable, the agent has access it shouldn't, the prompt is fighting the model, or no one defined what 'working' looks like. In 2024, Gartner predicted that at least 30% of generative AI projects would be abandoned after proof of concept by the end of 2025.

Below is a working stack of five layers. Each is real engineering. Each unlocks specific capabilities and quietly fails in specific ways. Reading this won't make you an AI engineer. It will tell you which layer you're asking about.

Layer 01

Data Integration

Connecting the systems where work already lives.

The first layer is plumbing. Before a model can do anything useful, the things you already use (Xero for invoices, SharePoint for documents, a CRM for client details, Asana or Linear for project state) have to be reachable, fresh, and permissioned correctly. This is two-thirds of the work in a real AI project, and almost none of the demo.

What this layer enables

Weekly reporting that pulls real numbers from Xero, HubSpot, and your project tools, instead of describing what reporting could look like in a deck.
Internal search across the documents your team already uses, with permissions honoured at the retrieval layer rather than retrofitted later.
The substrate every layer above this assumes. Without it, nothing else works the way the deck promised.
Workflow automation that triggers off real events (a new lead, an overdue invoice, a tagged document) rather than calendar reminders.

What it won't do yet

Answer free-form questions about your documents. That's the next layer, vectorization, and it doesn't come for free.
Give an agent the ability to write back into your systems. Read access is one layer; scoped write access is two layers up.
Stay useful when data is stale or fragmented across sources. Most projects that quietly fail at this layer never get debugged. Teams stop trusting them.
Skip discovery. Connecting the wrong systems faster doesn't help anyone.
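
One way to catch stale data before a team stops trusting it is to check each source's last successful sync against an agreed limit. The sketch below is illustrative only: the source names, the age limits, and the `stale_sources` helper are assumptions, not part of any particular connector.

```python
from datetime import datetime, timedelta, timezone

# Maximum acceptable age per source; the values are illustrative assumptions.
MAX_AGE = {
    "xero_invoices": timedelta(hours=24),
    "crm_contacts": timedelta(hours=6),
}

def stale_sources(last_synced: dict, now: datetime) -> list:
    """Return sources whose last successful sync is too old or unknown."""
    stale = []
    for source, limit in MAX_AGE.items():
        synced_at = last_synced.get(source)
        if synced_at is None or now - synced_at > limit:
            stale.append(source)
    return stale

# Fixed timestamps so the example is reproducible.
now = datetime(2026, 1, 15, 12, 0, tzinfo=timezone.utc)
last_synced = {
    "xero_invoices": now - timedelta(hours=3),   # fresh
    "crm_contacts": now - timedelta(days=2),     # two days old: stale
}
flagged = stale_sources(last_synced, now)
```

A report or agent can refuse to answer from a flagged source, or say so in its output, rather than quietly using old numbers.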

Layer 02

Vectorization

Turning your data into something a model can search by meaning.

Once data is reachable, the second layer turns it into vectors: numeric fingerprints where things with similar meanings end up near each other. A model can find 'the contract clause about cancellation' even when the document doesn't use the word cancellation. This is also where the 2025/26 term 'context layer' lives: verified business meaning, identity resolution, freshness, and policy sitting between raw data and any agent that touches it.
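
A minimal sketch of 'search by meaning', assuming the vectors already exist. The three-number vectors and clause texts below are hand-written for illustration; real embeddings come from an embedding model and have hundreds or thousands of dimensions. `cosine_similarity` and `search` are hypothetical helper names.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy, hand-written vectors standing in for real embeddings (assumption).
CLAUSES = {
    "Either party may terminate this agreement with 30 days' notice.": [0.9, 0.1, 0.0],
    "Invoices are payable within 14 days of issue.": [0.1, 0.9, 0.1],
    "All deliverables remain confidential.": [0.0, 0.2, 0.9],
}

def search(query_vector, top_k=1):
    """Return the top_k clause texts ranked by similarity to the query."""
    ranked = sorted(
        CLAUSES.items(),
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:top_k]]

# Pretend this is the embedding of "contract clause about cancellation".
query = [0.8, 0.2, 0.1]
best = search(query)[0]
```

Here `best` is the termination clause, even though it never uses the word cancellation. A production system would swap the dictionary for a vector index, but the ranking idea is the same.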

What this layer enables

RAG chatbots (retrieval-augmented generation), where the model answers from cited passages of your real documents rather than guessing.
Semantic search across contracts, policies, internal wikis, and historical client work, by meaning rather than keyword.
Deduplication and record matching across CRM systems that disagree about the same customer.
Classification and routing: auto-tag tickets, categorise incoming requests, surface anomalies in invoice patterns before anyone notices.

What it won't do yet

Make a model smarter. Vectorization is a lookup primitive, not a reasoning upgrade.
Replace business logic. Vectors return what's similar; they can't tell you what's allowed.
Save you from RAG over unreliable sources. If your policy documents contradict each other, the chatbot will repeat the contradictions with citations.
Act on what it finds. The model can locate the right document; it can't update it, file a ticket, or notify anyone until the next layer is in place.

Layer 03

Agentic Tooling

Giving agents scoped, role-aware access to your systems.

The third layer is where AI starts taking action. Each agent gets a manifest of tools: scoped API endpoints with names like GetProjectDetails or CreateInvoiceDraft. Each tool has internal role-based access control, so a project management agent can see tasks and deadlines but never touches pricing or margin. The protocol most teams are converging on is MCP, Anthropic's open Model Context Protocol, which solved the problem of every model needing a custom integration.
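
A sketch of what a scoped manifest can look like. This is not the MCP wire format; it only illustrates the role check described above. The role names, the stub handlers, and `call_tool` are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Tool:
    name: str
    allowed_roles: frozenset
    handler: Callable[..., dict]

def get_project_details(project_id: str) -> dict:
    # Stub standing in for a real project-system API call (assumption).
    return {"project_id": project_id, "tasks": 12, "next_deadline": "2026-03-01"}

def create_invoice_draft(client_id: str, amount: float) -> dict:
    # Stub standing in for a real accounting API call (assumption).
    return {"client_id": client_id, "amount": amount, "status": "draft"}

MANIFEST = {
    "GetProjectDetails": Tool(
        "GetProjectDetails", frozenset({"pm_agent", "finance_agent"}), get_project_details
    ),
    "CreateInvoiceDraft": Tool(
        "CreateInvoiceDraft", frozenset({"finance_agent"}), create_invoice_draft
    ),
}

def call_tool(agent_role: str, tool_name: str, **kwargs) -> dict:
    """Run a tool only if it exists and the agent's role may use it."""
    tool = MANIFEST.get(tool_name)
    if tool is None:
        raise KeyError(f"Unknown tool: {tool_name}")
    if agent_role not in tool.allowed_roles:
        raise PermissionError(f"{agent_role} may not call {tool_name}")
    return tool.handler(**kwargs)

details = call_tool("pm_agent", "GetProjectDetails", project_id="P-1")
```

The check lives in the tool layer, not in the prompt. A PM agent that asks for `CreateInvoiceDraft` gets a `PermissionError` no matter how it was instructed.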

What this layer enables

CRM-aware draft assistants that pull client context, past proposals, and pricing rules, draft the email, and never auto-send.
Lead enrichment where the model reads (a company's website, a news article) but isn't trusted to invent (a funding round, a headcount).
Customer-facing chatbots with scoped tools (getOrderStatus, requestReturn) and explicit refusals on out-of-scope questions.
Internal write agents that take real action on your behalf: file a ticket, schedule a follow-up, post a status update, draft an invoice.

What it won't do yet

Survive the Air Canada precedent. In 2024 a British Columbia tribunal held the airline liable after its chatbot told a customer he could claim a bereavement fare retroactively, which its policy didn't allow. The root problem wasn't the model alone: the data layer and tool scope didn't enforce the policy. Anything customer-facing needs RAG over policy, scoped tools, and explicit refusals, or it's a liability.
Stay safe with over-permissive tools. The most common production failure at this layer is an agent inheriting the user's full permissions rather than the task's. Scope tools to the job, not the identity.
Make judgement calls under pressure. Tools execute; they don't deliberate. That's the next layer.
Be trusted without audit logging. Every tool call has to be traceable. Agents fail in ways humans don't, and you need the trace to find them.
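
Traceability can start as a thin wrapper around every tool. The sketch below assumes an in-memory list as the log; a real deployment would write to an append-only store. `audited`, `AUDIT_LOG`, and `file_ticket` are hypothetical names.

```python
import time
from functools import wraps

AUDIT_LOG = []  # Stand-in for an append-only store (assumption).

def audited(tool_name):
    """Decorator that records every call to a tool, including failures."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(agent_id, **kwargs):
            entry = {"ts": time.time(), "agent": agent_id,
                     "tool": tool_name, "args": kwargs}
            try:
                result = fn(**kwargs)
                entry["outcome"] = "ok"
                return result
            except Exception as exc:
                entry["outcome"] = f"error: {type(exc).__name__}"
                raise
            finally:
                # Runs on success and on failure, so no call goes unlogged.
                AUDIT_LOG.append(entry)
        return wrapper
    return decorator

@audited("fileTicket")
def file_ticket(title):
    if not title:
        raise ValueError("title is required")
    return {"ticket_id": 101, "title": title}

file_ticket("support_agent", title="Printer offline")
try:
    file_ticket("support_agent", title="")
except ValueError:
    pass  # The failed call is still recorded in AUDIT_LOG.
```

Both calls end up in the log: one marked ok, one marked with the error type. That second entry is the one you need when an agent fails in a way a human wouldn't.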

Layer 04

Harness & Skills

The scaffolding that shapes how an agent behaves.

The fourth layer is the harness: everything around the model that turns a single text completion into reliable behaviour. The inner harness lives inside each model call: system prompts, tool definitions, retrieved context, output schemas. The outer harness lives outside: evals, retries, guardrails, observability, human-in-the-loop. On top of both, Anthropic's Skills (released October 2025) are small folders of procedural know-how: a SKILL.md file telling the model how to do something specific, optionally with scripts the model can run. Skills teach the model how; tools give the model access to what.
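
A small sketch of the outer harness at work: validate the model's output against a schema, retry on failure, and escalate when retries run out. `fake_model` stands in for a real model call, and the schema, key names, and retry count are assumptions for illustration.

```python
import json

REQUIRED_KEYS = {"summary", "risk_level"}
ALLOWED_RISK = {"low", "medium", "high"}

def validate(raw: str) -> dict:
    """Output schema check: a JSON object with the expected keys and values."""
    data = json.loads(raw)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(data, dict) or not REQUIRED_KEYS.issubset(data):
        raise ValueError("missing required keys")
    if data["risk_level"] not in ALLOWED_RISK:
        raise ValueError("risk_level out of range")
    return data

def run_with_harness(call_model, prompt: str, max_attempts: int = 3) -> dict:
    """Retry until the output passes validation, else escalate to a human."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        raw = call_model(prompt, attempt)
        try:
            return validate(raw)
        except ValueError as exc:
            last_error = exc
    raise RuntimeError(
        f"escalate to a human after {max_attempts} attempts: {last_error}"
    )

def fake_model(prompt: str, attempt: int) -> str:
    # Stand-in for a real model call (assumption): the first reply is malformed.
    if attempt == 1:
        return "Sure! Here is the summary..."
    return '{"summary": "On track", "risk_level": "low"}'

result = run_with_harness(fake_model, "Summarise project status as JSON.")
```

The first reply fails validation, the second passes, and the caller only ever sees a well-formed result or an explicit escalation.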

What this layer enables

Brand-voice Skills: your tone, structure, and language locked into every draft, not re-guessed each time.
Procedural Skills like 'how we run a discovery call' or 'how we structure a creative brief', encoded as repeatable, model-readable steps.
Reliable multi-step workflows: research, draft, review, hand off to a human at the right moment, with the handoff defined rather than improvised.
Evals: measurable definitions of what 'working' looks like, so you can tell when behaviour drifts before your users do.
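
An eval can be as small as a list of inputs paired with checks. The sketch below is a toy: `draft_reply` stands in for the agent under test, and the cases, checks, and 0.9 threshold are assumptions chosen for illustration.

```python
def draft_reply(ticket: str) -> str:
    # Stand-in for the agent under test (assumption).
    if "refund" in ticket.lower():
        return "I've passed your refund request to our billing team."
    return "Thanks for getting in touch. A teammate will follow up shortly."

# Each case pairs an input with a check that encodes what 'working' means.
EVAL_CASES = [
    ("I want a refund for order 1182", lambda out: "billing" in out),
    ("Can you promise delivery by Friday?", lambda out: "promise" not in out.lower()),
    ("Hello", lambda out: len(out) < 200),
]

def run_evals(agent, cases, threshold=0.9):
    """Return (pass_rate, passed_gate) so a drop can block a release."""
    passed = sum(1 for prompt, check in cases if check(agent(prompt)))
    pass_rate = passed / len(cases)
    return pass_rate, pass_rate >= threshold

pass_rate, ok = run_evals(draft_reply, EVAL_CASES)
```

Run the same cases after every prompt change or model update. A falling pass rate is the drift signal, and it arrives before a user complaint does.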

What it won't do yet

Survive the lethal trifecta. Simon Willison's framing: any agent with access to private data, exposure to untrusted content, and an external communication channel is vulnerable to indirect prompt injection. No alignment fix patches this. The architecture has to remove one leg of the trifecta.
Replace judgement. Skills encode procedure; they don't decide whether the procedure should run.
Outperform a strong model by stacking scaffolding around a weak one. Over-scaffolding is the most common failure mode of 2024-era agent frameworks, and it doesn't age well.
Stay useful when no one maintains the evals. An agent without eval discipline drifts the way a forgotten service drifts: silently, then all at once.
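
The trifecta is an architectural property, so it can be checked at configuration time. The sketch below assumes each agent declares its capabilities as three flags; `AgentConfig` and `assert_deployable` are hypothetical names, and a real check would derive the flags from the agent's actual tool manifest.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AgentConfig:
    name: str
    reads_private_data: bool          # e.g. CRM records, internal documents
    sees_untrusted_content: bool      # e.g. inbound email, scraped web pages
    can_communicate_externally: bool  # e.g. send email, call arbitrary URLs

def trifecta_legs(config: AgentConfig) -> list:
    """List which of the three risky capabilities this agent has."""
    legs = []
    if config.reads_private_data:
        legs.append("private data")
    if config.sees_untrusted_content:
        legs.append("untrusted content")
    if config.can_communicate_externally:
        legs.append("external communication")
    return legs

def assert_deployable(config: AgentConfig) -> None:
    """Refuse any configuration that combines all three legs."""
    if len(trifecta_legs(config)) == 3:
        raise ValueError(
            f"{config.name}: all three legs present; remove one before deploying"
        )

# Reads private data and inbound email, but has no outbound channel.
inbox_triage = AgentConfig("inbox_triage", True, True, False)
assert_deployable(inbox_triage)  # passes: only two legs
```

An agent with all three flags set fails the check, which forces the design conversation about which leg to remove before anything ships.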

Layer 05

Cowork & Orchestration

Where agents become teammates.

The fifth layer is what the people in your business touch. Cowork is the surface (Claude Cowork from Anthropic, Microsoft Copilot Cowork running Claude inside the M365 Graph, or a custom front-end built for your team) where a named agent shows up alongside humans. Not a chatbot. A coworker with a defined role: a project management agent for the PM team, a creative strategy agent for strategists, a finance agent for operations. Each one inherits the four layers below it: real data, real tools, real Skills, real evals.

What this layer enables

A creative strategy agent that knows your client history, helps draft positioning, and never gets near pricing or contracts.
A PM agent that tracks tasks, surfaces risks, and drafts status updates, using tools scoped to the PM role and never more access than the human PM has.
Cross-team workflows that span the systems your team already lives in (Slack to CRM to calendar) without anyone learning a new app.
Adoption that sticks, because the agent fits an existing job rather than asking a team to invent one.

What it won't do yet

Make a poorly scoped agent useful. Agents that try to do everything fail; narrow, named coworkers outperform open-ended assistants by a wide margin.
Create trust through installation. Teams trust agents they've shaped: ones whose evals they've watched move, whose mistakes they've corrected. Trust comes from time and use.
Compensate for missing layers. A Cowork-style agent without scoped tools or harness discipline is a more confident chatbot, with the same liability profile.
Be your only AI work. The most reliable AI integrations are three to four narrow coworkers, not one wide assistant.

Constants

What stays the same, regardless of where you start.

AI integration is rarely a single project. It's a sequence of smaller ones. You enter the stack at the point that fits your data and your team. These four constants hold either way.

Discovery first

We map your data, your workflows, and what 'working' looks like before we wire in any model. Every project that skipped this step came back to do it later, more expensively.

Narrow scope

One coworker, one workflow, one layer at a time. We ship something useful before we ship something ambitious. Expanding scope after launch is cheap; debugging an over-scoped launch is not.

Human-in-the-loop

Every write tool starts with a human approval step. We remove the step when the metrics say it's safe to, not when the demo asks us to.
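
A sketch of an approval gate, assuming writes are proposed into a queue and only run after a person signs off. `ApprovalQueue`, `PendingAction`, and the `post_status_update` handler are hypothetical names for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class PendingAction:
    tool: str
    args: dict
    status: str = "pending"  # pending -> approved/rejected -> executed

@dataclass
class ApprovalQueue:
    actions: list = field(default_factory=list)

    def propose(self, tool: str, **args) -> PendingAction:
        """The agent proposes a write; nothing runs yet."""
        action = PendingAction(tool, args)
        self.actions.append(action)
        return action

    def decide(self, action: PendingAction, approved: bool) -> None:
        """A human reviewer approves or rejects the proposal."""
        action.status = "approved" if approved else "rejected"

    def execute(self, action: PendingAction, handlers: dict):
        """Run the write only after explicit approval."""
        if action.status != "approved":
            raise PermissionError(f"{action.tool} is {action.status}, not approved")
        result = handlers[action.tool](**action.args)
        action.status = "executed"
        return result

def post_status_update(channel: str, text: str) -> dict:
    # Stub standing in for a real messaging API call (assumption).
    return {"channel": channel, "text": text, "posted": True}

queue = ApprovalQueue()
handlers = {"post_status_update": post_status_update}
action = queue.propose("post_status_update", channel="#project-x", text="On track")
queue.decide(action, approved=True)
outcome = queue.execute(action, handlers)
```

Removing the approval step later is a one-line policy change in `decide`. The approve and reject history collected here is the metric that tells you when that change is safe.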

Ongoing tuning

AI projects are gardens, not statues. Prompts, Skills, and evals drift as the model updates and your business changes. We stay involved past launch. Anyone who doesn't is selling you the wrong thing.

Where to next

Tell us what you're trying to do.

If you've read this far, you already know more about AI integration than most agencies will tell you. The next step is a conversation. Tell us the problem you're solving and the systems you're already using, and we'll tell you which layer you're asking about, and whether the project is one we should take on.

Start the Conversation
