Bhavik Mehta

Google's Open Knowledge Format (OKF), Explained Simply

2026-07-01 · 10 min read
#AI #AI Agents #Open Standards #Google Cloud

What Is the Open Knowledge Format (OKF)?

The Open Knowledge Format (OKF) is a free, open standard from Google Cloud for writing down what your organization knows so that AI agents can read and use it. An OKF bundle is just a folder of markdown files with a little bit of structured info at the top of each file. No database, no special app, no vendor lock-in — just files anyone (and any AI) can read.

Google published OKF v0.1 on June 12, 2026. The idea is small on purpose, but the problem it solves is big. Let me walk through it in plain language.

[Figure: scattered, disconnected knowledge sources on the left resolving into a clean green node-graph of connected markdown files on the right, illustrating Google's Open Knowledge Format for AI agents.]

The Problem: AI Agents Are Only As Smart As Their Context

An AI model on its own knows a lot about the world in general, but almost nothing about your company. It doesn't know that your orders table joins to customers on customer_id, or that "weekly active users" is calculated a specific way that only one senior engineer remembers.

Today, that knowledge is scattered everywhere:

  • Data catalogs with their own private APIs
  • Wikis, Notion pages, and Confluence docs
  • Comments buried inside code
  • Slack threads and, honestly, people's heads

Every team building an AI agent ends up solving the exact same problem from scratch: how do I gather all this scattered context and feed it to the model? And every catalog vendor rebuilds the same data models in their own incompatible way. Google summed it up well — the knowledge stays "locked behind whichever surface created it."

So when your AI agent tries to answer a simple question like "How do I compute weekly active users from our event stream?", it has to stitch together an answer from a dozen tools that don't talk to each other. Most of the time, it guesses. And a confident wrong guess is worse than no answer at all.

The Mental Model: A Wiki That AI Actually Keeps Updated

Here's the key insight behind OKF. People have tried keeping personal wikis and internal knowledge bases forever. They almost always rot. Why? Because keeping them tidy is boring, repetitive bookkeeping. Someone renames a table, and now fifteen pages point to the wrong place, and nobody wants to fix all fifteen.

Google built OKF on top of what Andrej Karpathy called the "LLM-wiki" pattern. The observation is simple but sharp:

"LLMs don't get bored, don't forget to update a cross-reference, and can touch 15 files in one pass. The bookkeeping that causes humans to abandon personal wikis is exactly what LLMs are good at."

In other words: the reason wikis fail is exactly the thing AI is good at. So instead of asking humans to maintain a knowledge base, you let AI agents help write and update it — and you store it as plain files that live right next to your code.

This pattern was already showing up everywhere on its own: Obsidian vaults wired into coding agents, CLAUDE.md and AGENTS.md files in repos, "metadata-as-code" folders on data teams. OKF just gives that messy, organic pattern one shared shape so different tools can agree on it.

How OKF Works: Just Folders, Files, and Markdown

The whole format is deliberately boring, and that's the best thing about it. An OKF "bundle" is a directory tree that looks like this:

sales/
  index.md
  datasets/
    index.md
    orders_db.md
  tables/
    index.md
    orders.md
    customers.md
  metrics/
    index.md
    weekly_active_users.md

That's it. Each .md file describes one concept — a table, a dataset, a metric, a playbook, a runbook, an API endpoint, or literally anything you want to capture. Because it's markdown, it renders nicely on GitHub, opens in any text editor, and can be searched with normal tools. And because it's just files, you can ship it as a .tar.gz, host it in a git repo, or mount it on any filesystem.
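To make "it's just files" concrete, here is a minimal sketch of how a consumer might enumerate the concepts in a bundle. The function name `list_concepts` is my own invention, not part of the spec, and I'm assuming the two reserved filenames described later in this post (index.md and log.md) should be skipped because they are not concepts themselves.

```python
from pathlib import Path

# Reserved filenames described in this post; everything else is a concept.
RESERVED = {"index.md", "log.md"}


def list_concepts(bundle_root: Path) -> list[str]:
    """Return bundle-root-relative paths of concept files, sorted."""
    paths = []
    for path in bundle_root.rglob("*.md"):
        if path.name in RESERVED:
            continue
        # Express each path the way bundle-root links are written: "/tables/orders.md".
        paths.append("/" + path.relative_to(bundle_root).as_posix())
    return sorted(paths)
```

Pointing this at the sales/ tree above would yield one path per dataset, table, and metric file. Nothing here is OKF-specific tooling; it is the standard library walking a directory, which is rather the point.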

What a Single Concept File Looks Like

Every concept file has two parts: a small block of structured info at the top (called YAML frontmatter), and a normal markdown body below it.

---
type: BigQuery Table
title: Orders
description: One row per completed customer order.
resource: https://console.cloud.google.com/bigquery?p=acme&d=sales&t=orders
tags: [sales, revenue]
timestamp: 2026-05-28T14:30:00Z
---
 
# Schema
| Column | Type | Description |
|--------|------|-------------|
| order_id | STRING | Globally unique order identifier. |
| customer_id | STRING | Links to the customers table. |
 
# Joins
Joined with [customers](/tables/customers.md) on customer_id.

The top part is machine-friendly. The bottom part is human-friendly. The magic is that it's the same file — a person and an AI agent read the exact same thing, with no translation step in between.
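Here is a rough sketch of how a reader could split a concept file into its two parts. It is deliberately naive: it only understands flat `key: value` lines and inline lists like `[sales, revenue]`, which is enough for the example above but is not a real YAML parser. In practice you would likely hand the frontmatter to a proper YAML library. The name `split_frontmatter` is hypothetical.

```python
def split_frontmatter(text: str) -> tuple[dict[str, object], str]:
    """Split a concept file into (frontmatter fields, markdown body)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter block at all
    end = None
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            end = i
            break
    if end is None:
        return {}, text  # opening delimiter without a closing one
    fields: dict[str, object] = {}
    for line in lines[1:end]:
        if ":" not in line:
            continue
        # Split on the first colon only, so URLs and timestamps stay intact.
        key, _, value = line.partition(":")
        value = value.strip()
        if value.startswith("[") and value.endswith("]"):
            fields[key.strip()] = [v.strip() for v in value[1:-1].split(",") if v.strip()]
        else:
            fields[key.strip()] = value
    body = "\n".join(lines[end + 1:])
    return fields, body
```

Fed the Orders file above, this returns a dict with `type`, `title`, `description`, `resource`, `tags`, and `timestamp`, plus the markdown body as a string. Values stay as plain strings here; a real YAML parser would also handle quoting, nesting, and typed values.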

The One Rule You Can't Break

OKF is almost aggressively relaxed about what you must include. There is exactly one required field: type. Everything else — title, description, resource, tags, timestamp — is optional but recommended.

The type field just tells a reader what kind of thing this is: BigQuery Table, API Endpoint, Playbook, Reference, whatever you like. Nobody registers these types with a central authority. You pick descriptive names, and tools are expected to gracefully handle types they've never seen before.
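A small sketch of what that rule might look like in a consumer: reject a file only when type is missing, warn about absent recommended fields, and fall back to a generic description for types the tool has never seen. The function names and the handler table are my own assumptions for illustration, not anything the spec defines.

```python
# Optional-but-recommended fields, as listed in this post.
RECOMMENDED = ("title", "description", "resource", "tags", "timestamp")


def check_concept(fields: dict) -> tuple[bool, list[str]]:
    """Return (is_valid, warnings). Only a missing type makes a file invalid."""
    type_value = fields.get("type")
    if not isinstance(type_value, str) or not type_value.strip():
        return False, ["missing required field: type"]
    warnings = [f"missing recommended field: {name}"
                for name in RECOMMENDED if name not in fields]
    return True, warnings


def describe(fields: dict) -> str:
    """Summarize a concept, falling back gracefully for unknown types."""
    # Hypothetical per-type handlers; a real tool would have its own set.
    handlers = {
        "BigQuery Table": lambda f: f"table {f.get('title', '(untitled)')}",
    }
    fallback = lambda f: f"{f['type']}: {f.get('title', '(untitled)')}"
    return handlers.get(fields["type"], fallback)(fields)
```

The design choice worth noticing is the fallback: an unfamiliar type such as "Runbook" still produces something useful instead of an error.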

Linking Concepts Together

Concepts connect to each other using ordinary markdown links:

Joined with [customers](/tables/customers.md) on customer_id.

A link starting with / is read from the bundle's root. A link starting with ./ is relative to the current file. Every link is treated as a simple connection between two concepts, and the surrounding sentence explains what the relationship actually means.

This turns a flat folder of files into a graph — a web of connected knowledge that's richer than the folder structure alone. That graph is exactly what an AI agent needs to reason about how your systems fit together.

Note: OKF says broken links are fine. A link can point to a concept that hasn't been written yet — it just marks something worth documenting later. This is a big deal, because it means agents can generate knowledge in pieces without breaking the whole bundle.
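The linking rules are simple enough to sketch. The code below resolves link targets to bundle-root paths, builds the graph, and collects links to files that don't exist yet instead of failing on them. The names `resolve_link` and `build_graph` are hypothetical, and I'm assuming that any target not starting with / (including bare names like customers.md and ../ paths) is relative to the current file; the post only spells out the / and ./ cases.

```python
import posixpath
import re
from typing import Optional

# Matches markdown links of the form [text](target) and captures the target.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def resolve_link(target: str, current_file: str) -> Optional[str]:
    """Resolve a link target to a bundle-root path, or None if it leaves the bundle."""
    if "://" in target or target.startswith(("#", "mailto:")):
        return None  # external URL or in-page anchor, not a concept link
    target = target.split("#", 1)[0]  # drop any trailing anchor
    if target.startswith("/"):
        return posixpath.normpath(target)  # read from the bundle root
    base = posixpath.dirname(current_file)
    return posixpath.normpath(posixpath.join(base, target))


def build_graph(files: dict[str, str]) -> tuple[dict[str, set[str]], set[str]]:
    """Map each file to the concepts it links to; also return not-yet-written targets."""
    edges: dict[str, set[str]] = {}
    dangling: set[str] = set()
    for path, text in files.items():
        targets = set()
        for raw in LINK_RE.findall(text):
            resolved = resolve_link(raw, path)
            if resolved is None:
                continue
            targets.add(resolved)
            if resolved not in files:
                dangling.add(resolved)  # allowed: something worth documenting later
        edges[path] = targets
    return edges, dangling
```

Here `files` maps bundle paths such as /tables/orders.md to their markdown text. The dangling set works as a to-do list rather than an error report, which matches the spirit of the note above.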

Two Special Files

There are two reserved filenames you'll see in bundles:

  • index.md — a table of contents for a folder. It lists the concepts inside with short descriptions, so an agent can scan a summary before diving into full files. This is called progressive disclosure — show the map first, the details on demand.
  • log.md — a change history, newest entries first, grouped by date. It records what was added or updated and when.

Both are optional. If they're missing, nothing breaks.
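As a sketch of progressive disclosure, here is one way a tool could generate an index.md from the frontmatter of the files in a folder. The exact layout of an index file (a heading followed by a bullet per concept) is my assumption, and `render_index` is a made-up name; check the spec for whatever layout it actually recommends.

```python
def render_index(folder_title: str, concepts: list[dict[str, str]]) -> str:
    """Render a table of contents: one bullet per concept, with a short description."""
    lines = [f"# {folder_title}", ""]
    for concept in sorted(concepts, key=lambda c: c["path"]):
        # Fall back to the path when a concept has no title.
        title = concept.get("title") or concept["path"]
        entry = f"- [{title}]({concept['path']})"
        description = concept.get("description", "")
        if description:
            entry += f": {description}"
        lines.append(entry)
    return "\n".join(lines) + "\n"
```

Each dict needs a `path` key, with `title` and `description` optional, mirroring the fact that those frontmatter fields are optional too. An agent can read the resulting summary first and open a full concept file only when a description looks relevant.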

Why "Boring on Purpose" Is the Whole Point

It would have been easy for Google to ship yet another cloud service with an SDK, an account, and an API you have to integrate against. They deliberately didn't. OKF is built on three principles worth calling out:

  1. Minimally opinionated. It only requires the type field. It doesn't force you into a fixed schema, so it can describe knowledge Google never imagined.
  2. Producer and consumer are independent. The format is the contract. Whatever tool writes the files and whatever tool reads them can be swapped out separately, as long as both agree on the format.
  3. It's a format, not a platform. No proprietary account. No required SDK. It's vendor-neutral, which means it survives moving between systems and companies.

That last point is the quiet revolution here. Because OKF is just text files in version control, your organization's knowledge stops being trapped inside one vendor's product. You can hand a bundle to a partner company, and their agents can read it too.

What Google Is Shipping Alongside the Spec

A spec on its own is just a document. To make OKF useful on day one, Google also released a few reference tools:

  • An enrichment agent that walks through your BigQuery datasets and automatically drafts OKF files for your tables and views — filling in schemas, descriptions, and how tables join together.
  • A static HTML visualizer that turns any OKF bundle into an interactive graph you can click through in your browser. No backend, no data leaving your machine.
  • Sample bundles built from real public datasets (GA4 e-commerce, Stack Overflow, and Bitcoin) so you can see complete, working examples instead of toy snippets.
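To give a feel for the producer side, here is a toy sketch of drafting a concept file from a table schema. This is not Google's enrichment agent and makes no claim about how it works; `render_concept` and its parameters are hypothetical, and the sketch does no YAML escaping, so it only suits simple values.

```python
def render_concept(type_name: str, title: str, description: str,
                   columns: list[tuple[str, str, str]],
                   tags: tuple[str, ...] = ()) -> str:
    """Draft a concept file: frontmatter plus a schema table.

    columns holds (name, type, description) triples.
    """
    front = ["---", f"type: {type_name}", f"title: {title}",
             f"description: {description}"]
    if tags:
        front.append(f"tags: [{', '.join(tags)}]")
    front.append("---")
    body = ["", "# Schema",
            "| Column | Type | Description |",
            "|--------|------|-------------|"]
    body += [f"| {name} | {col_type} | {desc} |" for name, col_type, desc in columns]
    return "\n".join(front + body) + "\n"
```

Because the format is the contract, a file written this way can be read by any consumer that understands frontmatter and markdown, regardless of which tool produced it.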

Google's own Knowledge Catalog product can also ingest OKF bundles and serve them to agents, but — importantly — you don't need it. The format works fine on its own.

The Honest Limitations

OKF is genuinely useful, but it's early, and pretending otherwise would be dishonest.

It's v0.1, not a finished standard. Google says plainly this is "a starting point, not a finished standard." The format will change as more people build tools around it. If you adopt it now, expect to update your bundles as the spec matures.

"Minimally opinionated" cuts both ways. Because the only required field is type, and because anyone can invent their own type names, two teams can describe the same kind of thing in totally different ways. There's no central dictionary of types. That freedom keeps the format flexible, but it means real interoperability depends on communities settling on shared conventions over time — something a spec can't force.

Some critics call it "just a folder." And they're not entirely wrong. There's no compression, no runtime, no clever indexing built in. Whether OKF becomes a true industry standard or stays a nice convention depends on whether other vendors actually adopt it, not on the format itself. A standard only matters when more than one company uses it.

Stale knowledge is still a risk. OKF makes knowledge easy to write. It doesn't guarantee anyone keeps it accurate. The timestamp field and log.md help you spot old content, but an out-of-date OKF file will still confidently mislead an agent. The LLM-wiki idea assumes agents help keep it fresh — that only works if you actually wire that loop up.
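One cheap guard is to flag files whose timestamp is older than some threshold. A minimal sketch follows; the 90-day default is an arbitrary number I picked, and treating a missing timestamp as stale is my own choice, not something the spec mandates.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as 2026-05-28T14:30:00Z."""
    # Older Python versions reject a trailing "Z", so spell out the UTC offset.
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        # Assumption: timestamps without an offset are treated as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def is_stale(timestamp: Optional[str], now: datetime,
             max_age: timedelta = timedelta(days=90)) -> bool:
    """Return True when a concept is older than max_age or has no timestamp."""
    if not timestamp:
        return True  # unknown age is treated as suspect
    return now - parse_timestamp(timestamp) > max_age
```

Pass a timezone-aware `now`, for example `datetime.now(timezone.utc)`. A check like this only tells you a file is old, not that it is wrong, so it works best as a trigger for an agent or a human to re-verify the content.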

When Should You Actually Use This?

Use OKF if you're building AI agents that need real, company-specific context and you're tired of every agent reinventing the same messy context-gathering step. It shines when your knowledge already lives close to code, or when you want that knowledge to be portable across tools and teams.

Skip it, for now, if you have a single, tightly integrated stack where one vendor's catalog already serves your one agent well. In that narrow case, adopting an early v0.1 format adds work without much payoff yet.

If you want to see where this fits into a broader AI stack, it pairs naturally with the retry logic, prompt versioning, and cost tradeoffs I covered in Practical Patterns for Building with AI APIs. And if you'd like to see what I build with these ideas, take a look at my projects.

So Is OKF Actually New?

Honestly? No. And that's the most interesting thing about it.

Google didn't invent a technology here. They looked at what the whole industry was already doing — CLAUDE.md and AGENTS.md files in repos, Obsidian vaults wired into coding agents, /docs folders every team quietly depends on, metadata-as-code on data teams — and gave it a name. We've all been feeding agents context through plain .md files for a while now. Nobody planned it. It just happened, because it works.

You can be cynical about this. There's a real argument that OKF is Google planting a flag on an open convention so its own Knowledge Catalog product becomes the natural place to host and serve those bundles. Name the standard, ship the "reference" tooling, and you're suddenly the vendor everyone integrates with — for a pattern nobody owned before. The format is free and open; the gravity it creates toward Google Cloud is not an accident.

But here's the other side, and I think it's the truer one: the pattern was everywhere, and everyone did it slightly differently, so none of it was portable. Your AGENTS.md and my CLAUDE.md and their data-team wiki couldn't talk to each other. OKF's contribution isn't invention — it's streamlining a workaround the entire industry independently stumbled into, and giving it just enough shared structure (the type field, the linking rules, the reserved files) that different tools can finally agree.

That's not nothing. Sometimes the most valuable move isn't building new technology — it's naming the thing everyone's already doing so it stops being fifty incompatible dialects and becomes one language. Whether this specific spec wins doesn't really matter. The idea it formalizes — that organizational knowledge should be plain, portable, and agent-readable — was already winning before Google gave it a logo.

I use AI tools to help research and draft posts. The ideas, opinions, and takes are mine. Verify anything technical or time-sensitive before acting on it.