AI Brand Governance: Beyond Design.md

Developers have quietly solved AI brand governance in code. A design.md file dropped in a repo, plus an IDE, a build pipeline, and a technical eye, and AI-generated output ships on-brand. That same fix fails everywhere else: in Claude Projects, in Custom GPTs, in the embedded AI sitting inside Outlook, Google Docs, PowerPoint, and Slack. Those surfaces account for roughly ninety-five percent of where AI actually puts your brand, and a markdown file cannot reach them. The fix is structural, and it has a name: brand runtime.

There is a pattern forming inside every company shipping AI-assisted work, and the people closest to it usually do not recognize it as a pattern. It’s this: the developers and the technical operators have figured out on-brand generative design, and the rest of the org has not.

Watch a savvy frontend engineer or a designer-developer use a coding tool. They quickly grab the right design source files and references, and drop a design.md file and a folder of assets into the repo. They may even have created, by accident or on purpose, some additional files that tell AI coding tools to read those files automatically. They update and refine rules over time: no centered body copy, use the wordmark not the logomark, here is the gradient. They generate a landing page or a microsite, and the output is on-brand. It won’t beat what your senior designer would do, but it is on-brand. Inside a working coding tool, with a pile of .yaml files, some markdown files, and a folder of images, they have assembled something that gets them most of the way there.

This makes AI-forward brand and content teams feel optimistic. AI’s visual marketing moment has arrived, and they want in. But as the rest of the enterprise org’s magical discovery of “oh wow, it’s making a PPT deck for me” turns to “how do I get this in our branding?” it makes some leaders nervous. The things that work surprisingly well and consistently in AI coding environments govern maybe five percent of where your brand is being applied (or not) with AI.

The other ninety-five percent is in chat and in cowork/copilot environments. It’s your content team using Claude or ChatGPT to draft .pptx decks, one-pagers, “kinda cool” .html files, and customer emails. It’s your sales team using Gemini to turn discovery calls into semi-branded proposal docs and decks. It’s your product marketer in Copilot turning research notes into a slide deck. In many restricted orgs, few of those people are in a coding tool. Many of them don’t have a design.md file, let alone a way to always get the current one. None of them have a build step. And every single one of these outputs is a brand artifact going to a customer.

The format that works pretty well in code doesn’t quite work in the other AI tools used by most general business users. And the reason is structural, not a matter of better prompting.

What code actually solved

A coding tool reading a design.md works because the file is not doing the heavy lifting. The IDE is. The build is. The component library is. The developer is.

Think about what surrounds a markdown file in a code environment:

A filesystem that already scopes context. Your repo is the project. There is no ambiguity about which brand applies.

A build gate that catches anti-patterns. Preflight scripts, lint rules, CI checks. If the AI emits a centered body element or a forbidden token, a decent agentic system checks and corrects it (or increasingly, doesn’t introduce these elements in the first place). The governance is in the pipeline, not in the document.

A component system that enforces tokens at the framework layer. You do not have to remind a developer not to use #FF6B35 directly. They import a Button, and the Button knows the color. Tailwind themes, design token packages, shadcn registries. The tokens are mechanical to apply.

A technical designer who can read the brand and interpret edge cases. If the AI emits something that is technically compliant but creatively off, the designer notices and fixes it. They are not asking the markdown file alone to do the noticing.
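To make the build gate concrete, here is a minimal sketch of a preflight check, assuming generated markup arrives as a string. The function name, the regular expressions, and the choice of violations are illustrative assumptions, not a description of any particular team's pipeline. The hex value is the one used as an example above.

```python
import re

# Hypothetical rule set: raw hex values that must come from a token instead.
FORBIDDEN_RAW_COLORS = {"#ff6b35"}

HEX_COLOR = re.compile(r"#[0-9a-fA-F]{6}\b")
# Naive check for inline-styled, centered paragraphs (illustrative only).
CENTERED_BODY = re.compile(r"<p[^>]*text-align:\s*center", re.IGNORECASE)


def preflight(markup: str) -> list[str]:
    """Return human-readable violations found in generated markup."""
    violations = []
    for match in HEX_COLOR.finditer(markup):
        if match.group(0).lower() in FORBIDDEN_RAW_COLORS:
            violations.append(
                f"raw brand color {match.group(0)} used instead of a token"
            )
    if CENTERED_BODY.search(markup):
        violations.append("centered body copy")
    return violations


if __name__ == "__main__":
    draft = '<p style="text-align: center; color: #FF6B35">Hello</p>'
    for problem in preflight(draft):
        print("FAIL:", problem)
```

A check like this would run in CI or as an agent's self-review step. A real gate would parse the markup rather than pattern-match it, but the point stands: the rule lives in the pipeline, not in the document.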

In code, design.md does not have to be smart. It has to be present. The IDE, the build, and the developer do the rest. The markdown file is the least important piece of governance in the entire stack, even though it is the piece everyone points to.

Why chat falls apart

Now picture the same brand markdown loaded into Claude Projects, a Custom GPT, a Gemini Gem, or a Notion AI workspace. Strip away the IDE, the build, the components, and the technical user. What is left?

A static document. Loaded every session, every prompt, every output. There is no scope. The marketer writing a LinkedIn post gets your motion design tokens, your PDF pagination rules, and your favicon specs, all consuming the same context window as your voice rules and proof points.

There is no build gate. If the LLM drafts a claim that is still in market test, the voice.md and design.md alone won’t catch it. If it uses the gradient where it should have used the flat color, nobody catches it because these outputs aren’t being routed for design review. The gate is a human reading the output afterward, which means the gate is “did the author remember to check.”

There is no component system. The visual rules exist as descriptions of components, not as components themselves. “Use the H1 style for hero headlines” only works if the model knows what your H1 style is, applied where, in what container, with what spacing. The markdown can describe it, and with the right custom instructions, the chat tool might even follow it…until it doesn’t. It might read the SVG code and create the logo faithfully, or it might choose to make its own approximation.

I believe these errors usually come from well-intentioned people who’ve been told to do more with less, and who are excited to deliver visual communication of ideas faster (or for the first time on their own). Most people in a typical business are not technical, and they lack the time or even the org-level permissions to create a durable fix in their allowed AI tools. They are content marketers, sales reps, CSMs, product marketers writing copy at 11 PM. They ask for a draft and they expect it to ship, because that is the promise of chat. They want first-draft compliance, and a markdown file is not going to give it to them.

The format that worked in code does not work in chat for one reason: chat strips away most of the governance and tools required for on-brand AI output.

Why cowork is the hardest case

Cowork is the surface that should worry every brand leader the most, and it is the surface most companies are not even thinking about yet.

Cowork is the embedded AI inside the tools where work already happens. The compose box in Outlook. The drafting pane in Google Docs. The slide assistant in PowerPoint. The reply suggester in Slack. The recap generator in your CRM. The agent that triages your support inbox. These are not optional surfaces. They are not opt-in. They are getting deployed at the org level, on by default, for thousands of users at once.

The user did not summon the AI. The AI is sitting inside the artifact they were already going to make. They open a slide and the AI offers to draft it. They start a doc and the AI suggests the opening. They reply to a customer thread and the AI completes the sentence. Every one of those moments is a brand application event, and there is no “load my brand kit” step.

Cowork needs brand governance that arrives without being requested. It has to be org-deployed, not per-user uploaded. It has to be surface-aware, because the rules for a sales email differ from the rules for a board memo. It has to be confidence-tiered by audience, because what you can claim externally is not what you say internally. It needs to be auditable, because the head of brand needs to see what shipped this week. And it has to be admin-controlled, because the IT and security teams will not deploy something they cannot govern at the policy layer.

A markdown file in a shared drive cannot deploy at that altitude.

The hierarchy of AI brand governance

If you map out what should govern brand application at each AI surface, the framework is the same in every column, but the implementation is completely different. There are four layers.

Layer 1: The identity source of truth

Layer one is the identity source of truth. The brand itself. Tokens, narratives, proof points, voice rules, anti-patterns, application rules, performance evidence. This lives in one place. It is structured, versioned, and queryable. It is the canonical source. Notion, a structured repo, a graph database, whatever. The point is that there is one of it, and it does not get edited per session, per surface, or per team.
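As a minimal sketch of what "structured, versioned, and queryable" could mean in practice, assume the source is a small immutable record set. The class names, field names, rule IDs, and version string below are hypothetical. The rule text is borrowed from examples earlier in this piece.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    rule_id: str
    kind: str   # e.g. "token", "voice", "anti_pattern", "proof_point"
    body: str


@dataclass(frozen=True)
class BrandSource:
    version: str              # bumped whenever the brand changes
    rules: tuple[Rule, ...]   # immutable: not edited per session or team

    def query(self, kind: str) -> list[Rule]:
        """Return every rule of one kind, in source order."""
        return [rule for rule in self.rules if rule.kind == kind]


# Illustrative content only.
SOURCE = BrandSource(
    version="1.0.0",
    rules=(
        Rule("layout-01", "anti_pattern", "No centered body copy."),
        Rule("logo-01", "application", "Use the wordmark, not the logomark."),
        Rule("color-01", "token", "brand.primary = #FF6B35"),
    ),
)

if __name__ == "__main__":
    for rule in SOURCE.query("anti_pattern"):
        print(SOURCE.version, rule.rule_id, rule.body)
```

The storage technology matters less than the shape: one versioned object that tools query by rule kind, rather than a narrative document that each reader interprets.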

Layer 2: The distribution pack

Layer two is the distribution pack. The same source compiled to a surface-specific bundle. A .brand folder for code agents. A Chat Pack for Claude Projects and Custom GPTs. A Gem Pack for Gemini. A NotebookLM pack for citation-grounded answers. An MCP (Model Context Protocol) server for live retrieval. A Cowork plugin for the org’s existing AI deployments. The format changes by surface; the source does not. When the brand updates, every pack recompiles. There are no rogue copies drifting in fifty private workspaces.
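One way to picture the compile step, as a sketch under stated assumptions: the source is a plain dictionary, and each surface has its own small compiler function. The surface names, the file path inside the .brand folder, and the output formats are hypothetical. Real pack formats would depend on each platform.

```python
import json

# Hypothetical single source of truth.
SOURCE = {
    "version": "1.0.0",
    "rules": [
        {"id": "layout-01", "body": "No centered body copy."},
        {"id": "logo-01", "body": "Use the wordmark, not the logomark."},
    ],
}


def to_code_pack(source: dict) -> dict[str, str]:
    """Compile to files for a code agent: filename -> file content."""
    return {".brand/rules.json": json.dumps(source, indent=2, sort_keys=True)}


def to_chat_pack(source: dict) -> str:
    """Compile to a plain-text instruction block for a chat workspace."""
    lines = [f"Brand rules (source version {source['version']}):"]
    lines += [f"- {rule['body']}" for rule in source["rules"]]
    return "\n".join(lines)


COMPILERS = {"code": to_code_pack, "chat": to_chat_pack}


def compile_all(source: dict) -> dict:
    """Recompile every pack from the same source in one pass."""
    return {surface: build(source) for surface, build in COMPILERS.items()}


if __name__ == "__main__":
    print(compile_all(SOURCE)["chat"])
```

Because every pack is derived from the same object and stamped with its version, a brand update means rerunning `compile_all`, not hunting down hand-edited copies.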

Layer 3: The runtime context

Layer three is the runtime context. This is what the AI actually loads at the moment of creation, scoped to the task. A static social graphic loads typography, color, and asset class rules, not motion. A LinkedIn post loads voice, persona, and Active proof points, not the brand history. A slide deck loads layout patterns and the audience-appropriate claim tier, not the web kit. The runtime is the smart part of the system. A markdown file cannot do this. A loader can. The savvy code-tool user has assembled this themselves through their IDE and their judgment. Chat and cowork users do not have an IDE. The platform has to provide it.
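The loader idea can be sketched in a few lines. The task names and scopes mirror the examples in the paragraph above. The rule records, the `status` field, and the reading of "Active" as a status value are assumptions made for illustration.

```python
# Which rule kinds each task is allowed to load (from the examples above).
TASK_SCOPES = {
    "social_graphic": {"typography", "color", "asset_class"},
    "linkedin_post": {"voice", "persona", "proof_point"},
    "slide_deck": {"layout", "claim_tier"},
}

# Hypothetical rules, as they might come out of the source of truth.
RULES = [
    {"kind": "typography", "body": "Headlines use the display face."},
    {"kind": "motion", "body": "Ease-out transitions only."},
    {"kind": "voice", "body": "Plain, direct sentences."},
    {"kind": "proof_point", "status": "active",
     "body": "Claim cleared for external use."},
    {"kind": "proof_point", "status": "market_test",
     "body": "Claim still in market test."},
    {"kind": "brand_history", "body": "Founding story."},
]


def load_context(task: str, rules: list[dict]) -> list[dict]:
    """Return only the rules in scope for this task."""
    scope = TASK_SCOPES[task]
    in_scope = [rule for rule in rules if rule["kind"] in scope]
    # Assumption: proof points carry a status, and only active ones load.
    return [
        rule for rule in in_scope
        if rule["kind"] != "proof_point" or rule.get("status") == "active"
    ]


if __name__ == "__main__":
    for rule in load_context("linkedin_post", RULES):
        print(rule["kind"], "->", rule["body"])
```

For a LinkedIn post, this loads the voice rule and the active proof point, and leaves motion, brand history, and the market-test claim out of the context window entirely.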

Layer 4: The output gate

Layer four is the output gate. Preflight in code. Rubric scoring in chat. Admin policy in cowork. Confidence-tier enforcement at the deployment layer. The gate type changes by surface; the principle is constant. Catch the violations before they ship. Log the misses. Feed the misses back into the source. This is what turns a brand system into a system that actually compounds. Without it, you are generating volume at the same quality plateau and calling it productivity.
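The catch, log, and feed-back loop might look like the following sketch, which assumes the simplest possible rule type: a set of retired phrases. The class name, log format, and example phrases are hypothetical. A real gate would also score voice rubrics and enforce claim tiers.

```python
from dataclasses import dataclass, field


@dataclass
class OutputGate:
    retired_phrases: set[str]
    log: list[dict] = field(default_factory=list)

    def check(self, draft: str, surface: str) -> list[str]:
        """Return retired phrases found in the draft, logging each one."""
        lowered = draft.lower()
        hits = sorted(p for p in self.retired_phrases if p in lowered)
        for phrase in hits:
            self.log.append(
                {"surface": surface, "phrase": phrase, "status": "caught"}
            )
        return hits

    def record_miss(self, phrase: str, surface: str) -> None:
        """Log a violation found after shipping and add it to the rules."""
        phrase = phrase.lower()
        self.log.append(
            {"surface": surface, "phrase": phrase, "status": "missed"}
        )
        self.retired_phrases.add(phrase)


if __name__ == "__main__":
    gate = OutputGate({"best-in-class"})
    print(gate.check("Our best-in-class platform", "chat"))
    gate.record_miss("synergy", "cowork")
    print(gate.check("Our synergy engine", "cowork"))
```

The second call to `check` catches a phrase the gate did not know about until a miss was recorded, which is the compounding behavior described above in miniature.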

Every AI surface needs all four layers. The code surface has them because the surrounding infrastructure provides them almost without your trying. The chat and cowork surfaces have to be designed for them on purpose.

Verbal and visual, fired at the same moment

There is a second reason the code workaround obscures the real problem. Code is mostly a visual surface. Tokens, components, layout. The copy is whatever the developer types or pastes from a doc. Verbal identity barely shows up in code, which is why a design.md can get away with being a visual spec.

Chat and cowork do not work that way. Every artifact a knowledge worker produces with AI fires both halves of the brand at once.

The slide needs the right typography and the right narrative for the audience. The email needs the right tone and the right call to action and the right proof point at the right confidence tier. The blog post needs the right opening pattern, the right asset class, the right link to product, and a closing that does not use a phrase your brand has retired. The customer recap needs the right voice register and the right framing of what the customer said versus what you concluded.

A brand system that only governs visual fails in chat and cowork. A brand system that only governs verbal fails in chat and cowork. The two halves have to be governed by the same source, compiled into the same pack, and fired into the same runtime call. Most companies have these as two separate documents owned by two separate teams, neither of which knows that AI is now applying both halves to every artifact a knowledge worker produces.

The cost of having design and copy as separate governance disciplines used to be inconsistency at the margins. The cost now is brand drift at scale, on every surface, every day.

Where this is going

Three shifts are converging, and they are visible already in any company taking AI deployment seriously.

The brand becomes a runtime, not a document. The PDF goes away. The Figma library does not go away, but it stops being the source of truth. The source becomes a structured, queryable system that compiles to every surface where AI touches the brand. The artifact your designers used to ship is downstream of the artifact your platform team now owns. This is uncomfortable for a lot of brand leaders. It is what lets the brand actually scale.

Governance moves from documentation to infrastructure. Anti-patterns become preflight rules. Voice rules become rubric scores. Confidence tiers become deployment gates. The brand team stops chasing violations across twelve surfaces and starts owning the source that all surfaces compile from. The work shifts from policing output to engineering the system that produces it.

The org chart adjusts. Brand becomes a platform discipline. The head of brand owns a runtime, not a guideline. Creative ops owns the gates, not the review queue. The CMO stops asking “are we on brand this quarter” and starts asking “what did the runtime catch this week and what slipped through.” This is the same arc that happened in product analytics fifteen years ago and in martech ten years ago. It is now happening in brand, and it is happening in months, not years, because AI is forcing the timeline.

What to do if you are reading this and recognizing the problem

If your dev team has stumbled into a workable AI brand compliance pattern in code, that is good news. It means at least one part of your organization has figured out the right approximate shape: scoped context, build gates, component-level enforcement, technical judgment in the loop. Keep that. Do not dismantle it. But understand what it is governing and what it is not.

If you have not extended that thinking into chat and cowork, you have brand governance for the smallest share of your AI surface area. The rest is generating customer-facing content right now, on platforms you do not see into, with no gate between the output and the recipient.

The work to fix it is not a procurement exercise. It is a structural one. You need a single source of brand truth that is queryable, not narrative. You need distribution packs for every AI surface your org actually uses, not the one your most technical team prefers. You need a runtime that scopes context to the task, not a markdown file that dumps everything. And you need gates that match the surface, because preflight in code, rubric scoring in chat, and admin policy in cowork are different mechanisms solving the same problem.

This is the work we are doing inside Brandcode (working title), our framework for portable, runtime brand governance. It is also the work the brand engineering practice at Column Five was built to do for clients whose AI rollout is outpacing their ability to govern it. The brand you have already invested in is not the problem. The container you are storing it in is.

The investment deserves a runtime, not a readme. And the runtime needs to reach every surface where AI puts your brand in front of someone, not just the one your developers happen to control.

Frequently asked questions

What is AI brand governance?

AI brand governance is the set of mechanisms that keep AI-generated outputs on-brand across every surface where AI is producing content for your company: code agents, chat tools, and the embedded AI inside Outlook, Google Docs, PowerPoint, and Slack. It is not a single document. It is a system with four layers: an identity source of truth, distribution packs per surface, a runtime that scopes context at the moment of creation, and an output gate that catches violations before they ship.

Why doesn’t a design.md or brand.md file work outside of code?

Inside a coding tool, the markdown file is the least important piece of governance in the stack. The IDE, the build pipeline, the component library, and the developer are doing the actual work of enforcing the brand. Chat and cowork strip away the IDE, the build, the components, and the technical user, and ask the markdown file to do all four jobs at once. A static document cannot replace four layers of governance infrastructure.

What is a brand runtime?

A brand runtime is a system that compiles a single source of brand truth into surface-specific bundles, loads only the rules relevant to the task at the moment of creation, and runs an output gate matched to the surface. It replaces the static brand PDF and the brand-guidelines microsite with infrastructure that AI tools can actually consume and apply at the moment a brand artifact gets created.