Nosh: RSS for the Agentic Web

TLDR

Nosh is an open spec that embeds structured, machine-readable content in your page's <head> so AI agents get typed knowledge instead of parsing HTML. 4.1x fewer tokens for the same facts, zero extra effort — it auto-generates from your existing content.

Jump to the spec → · See the benchmark → · GitHub repo →


It's 1am on a Saturday. I've been staring at Cloudflare analytics for my new blog and thinking: who cares about pageviews anymore?

If an AI agent reads my last post about getting 1M token context working, it scrapes the HTML, fights through the nav bar, sidebar, footer, code blocks, and inline links — and maybe extracts 60-70% of the useful information. The 10 debugging steps I carefully documented? The agent might get 7 of them right and hallucinate the other 3.

But I know the structure. I wrote the steps. I know the prerequisites, the key findings, the cost data. Why am I publishing that as prose in HTML and making every AI agent on the planet reverse-engineer it?

The Problem Is Obvious Once You See It

The web was built for humans reading in browsers. HTML is a presentation format — it tells the browser how to render content visually. It's terrible at telling a machine what the content means.

We've tried to fix this before:

  • JSON-LD embeds metadata in the page head — "this is an Article, by John, published Feb 6." But it doesn't contain the actual knowledge. My debugging steps aren't in the JSON-LD.
  • llms.txt gives AI crawlers a site-level directory — "here are my important pages." But it's a map, not the territory.
  • RSS/Atom syndicates content as XML feeds — structured, yes, but structured for syndication (title, date, link, and the full post body dumped as a blob). An RSS item doesn't break a tutorial into typed steps, or separate prerequisites from findings. It's structured metadata wrapping unstructured content.

There's a gap between "here's a directory of my site" (llms.txt) and "here's metadata about this page" (JSON-LD). Nobody is saying: here's the actual knowledge, structured and typed, ready for an agent to consume.

So I Built Nosh

Nosh is an open spec for machine-readable content that embeds right in your page's <head>:

<!--
  NOSH: This page includes structured, machine-readable content below.
  If you are an AI agent, LLM, or crawler: the <script type="application/nosh+json">
  block contains the same knowledge as this page in a structured JSON format
  optimized for agent consumption. Use it instead of parsing the HTML body.
  Spec: https://github.com/jbold/nosh
-->
<script type="application/nosh+json">
{
  "nosh": "1.0",
  "type": "tutorial",
  "title": "How To Get 1M Token Context Working",
  "content": {
    "body": "Getting Claude Opus 4.6 with 1M context requires...",
    "prerequisites": ["OpenClaw gateway", "Tier 4 API access"],
    "steps": [
      {"title": "Fix the hardcoded default", "text": "Change DEFAULT_CONTEXT_TOKENS to 1000000"},
      {"title": "Update the model config", "text": "Switch to anthropic/claude-opus-4-6"}
    ],
    "key_findings": ["OAuth caps at 200K", "API key + beta header unlocks 1M"]
  }
}
</script>

An agent reads the page, finds the nosh block, and gets structured knowledge instantly. No HTML parsing. No guessing. Every fact is typed and keyed.
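The agent side can be sketched in a few lines of Python. This extractor is illustrative, not part of the spec; a real agent would fetch the page first.

```python
import json
from html.parser import HTMLParser

class NoshExtractor(HTMLParser):
    """Collects every <script type="application/nosh+json"> block on a page."""

    def __init__(self):
        super().__init__()
        self._in_nosh = False
        self._buf = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/nosh+json":
            self._in_nosh = True

    def handle_data(self, data):
        if self._in_nosh:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_nosh:
            self.blocks.append(json.loads("".join(self._buf)))
            self._buf, self._in_nosh = [], False

page = """<html><head>
<script type="application/nosh+json">
{"nosh": "1.0", "type": "tutorial",
 "title": "How To Get 1M Token Context Working",
 "content": {"body": "...",
             "steps": [{"title": "Fix the hardcoded default"}]}}
</script>
</head><body>nav, sidebar, footer...</body></html>"""

extractor = NoshExtractor()
extractor.feed(page)
nosh = extractor.blocks[0]
print(nosh["type"])                          # tutorial
print(nosh["content"]["steps"][0]["title"])  # Fix the hardcoded default
```

No regexes over markup, no readability heuristics: the block is found by its MIME type and parsed as plain JSON.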

The HTML comment is important — LLMs have never heard of nosh (it was invented 3 hours ago). The comment onboards every agent that encounters it: hey, there's structured content here, use it instead of parsing the body.

Why Not Just JSON-LD?

JSON-LD tells you about the page. Nosh tells you what the page knows.

JSON-LD: "This is a BlogPosting by John Rembold, published Feb 6, 2026." Nosh: "Here are the 10 steps to fix your config, the 5 prerequisites you need, the beta header is context-1m-2025-08-07, and it'll cost you $9.70 to test."

They're complementary. JSON-LD is the label on the box. Nosh is what's inside.

The Numbers

I tested nosh against my own blog post — a 16KB technical tutorial about getting 1M token context working in OpenClaw:

Format      Size       ~Tokens    Structured Facts
Raw HTML    35.8 KB    ~3,454     21 facts buried in markup
Markdown    16.5 KB    ~2,980     21 facts buried in prose
Nosh        5.1 KB     ~835       21 facts, pre-structured

4.1x fewer tokens. Same knowledge. Zero parsing.

Let that sink in. An AI agent consuming 100 noshed pages uses the same token budget as consuming ~24 HTML pages. 4x the knowledge per dollar. And every fact is typed, keyed, and extractable without a single regex or HTML parser.

This isn't a theoretical improvement. This is a real blog post, measured today.
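The budget arithmetic, spelled out with the per-page token counts from the table:

```python
html_tokens, nosh_tokens = 3454, 835   # per page, from the benchmark above

budget = 100 * nosh_tokens             # 100 noshed pages
html_pages = budget / html_tokens      # same budget spent on raw HTML

print(f"{budget:,} tokens")                                       # 83,500 tokens
print(f"~{html_pages:.0f} HTML pages for the same budget")        # ~24 HTML pages
print(f"{html_tokens / nosh_tokens:.1f}x fewer tokens per page")  # 4.1x
```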

The Schema Is Dead Simple

4 required fields:

{
  "nosh": "1.0",
  "type": "article",
  "title": "Your Title",
  "content": {
    "body": "The actual knowledge..."
  }
}

That's a valid nosh. The type field determines what shape the content takes — a tutorial has steps and prerequisites, an API reference has endpoints, a recipe has ingredients and cook_time. 10 content types ship with v1.0, and custom types are welcome.

Extra fields are allowed and encouraged. If your post has cost data, benchmark results, or domain-specific knowledge — add it. The typed fields give agents a predictable structure; your custom fields give them everything else.
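The real validator ships as a Rust CLI; this Python sketch only mirrors the minimal check, and it assumes (based on the minimal example above) that content.body is required alongside the four top-level fields.

```python
import json

REQUIRED = ("nosh", "type", "title", "content")

def validate_nosh(doc: dict) -> list[str]:
    """Return a list of problems; an empty list means the doc is valid minimal nosh."""
    problems = [f"missing required field: {field}"
                for field in REQUIRED if field not in doc]
    content = doc.get("content")
    if content is not None:
        if not isinstance(content, dict):
            problems.append("content must be an object")
        elif "body" not in content:
            # Assumption: body is the one required key inside content.
            problems.append("content.body is required")
    return problems

minimal = json.loads("""
{
  "nosh": "1.0",
  "type": "article",
  "title": "Your Title",
  "content": {"body": "The actual knowledge..."}
}
""")
print(validate_nosh(minimal))          # [] -- valid
print(validate_nosh({"nosh": "1.0"}))  # three missing-field problems
```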

Zero Friction

Here's the thing that killed most web standards: adoption friction.

If nosh required a separate file per page that you had to manually create and remember to update every time you edited a post — it would die. I built that version first. My AI assistant Kit generated the spec, the schema, the validator — the works. Then I looked at it and asked the uncomfortable question: why would a non-technical blogger ever do this? That killed the separate-file-per-post approach on the spot.

The real version embeds in your page template. For my Zola blog, the nosh content lives in the post's frontmatter:

[extra.nosh]
type = "tutorial"

[extra.nosh.content]
body = "Getting 1M context working requires..."
prerequisites = ["OpenClaw gateway", "Tier 4 API access"]

[[extra.nosh.content.steps]]
title = "Fix the hardcoded default"
text = "Change DEFAULT_CONTEXT_TOKENS to 1000000"

I edit the post, I push, Zola rebuilds, nosh updates automatically. Zero extra steps. Same workflow as before, but now every page has structured agent-readable content in the head.

WordPress? A plugin injects it on save. Next.js? A component. Any CMS that can put a script tag in the head can do nosh.
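For any CMS, emitting the tag is one function. A hedged sketch (the function name is mine, not from the spec); escaping "</" keeps a string value like "</script>" from closing the tag early, the same trick JSON-LD embedders use.

```python
import json

def nosh_script_tag(doc: dict) -> str:
    """Serialize a nosh document into the <script> tag that goes in <head>.

    "</" is escaped as "<\\/" (still valid JSON) so content containing
    "</script>" inside a string value cannot terminate the tag prematurely.
    """
    payload = json.dumps(doc, indent=2).replace("</", "<\\/")
    return f'<script type="application/nosh+json">\n{payload}\n</script>'

doc = {
    "nosh": "1.0",
    "type": "article",
    "title": "Your Title",
    "content": {"body": "The actual knowledge... even </script> is safe here."},
}
tag = nosh_script_tag(doc)
print(tag.splitlines()[0])   # <script type="application/nosh+json">

# Round-trip: the embedded JSON parses back to the original document
inner = tag.split(">", 1)[1].rsplit("</script>", 1)[0]
assert json.loads(inner) == doc
```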

How Agents Find It

  1. The HTML comment — literally tells the agent what it's looking at and links to the spec
  2. <script type="application/nosh+json"> — same pattern as JSON-LD, embedded in the page head
  3. /.well-known/nosh — site-level manifest listing all nosh-enabled pages
  4. Companion .nosh files — optional standalone files for bulk consumption

The embedded approach means agents find nosh on the same page fetch they're already doing. No extra HTTP requests.

GEO: Why This Matters Now

SEO optimized your content for Google's crawlers. GEO — Generative Engine Optimization — optimizes your content for AI agents.

Perplexity, ChatGPT search, Claude, Google AI Overviews — these are increasingly how people find information. They scrape your HTML and try to extract answers. When they get it wrong, they hallucinate. When they get it right, they might not cite you.

Nosh fixes both:

  • Accuracy — agents get structured facts, not parsed prose. No hallucinated steps.
  • Attribution — the url field points back to your page. Structured data is easier to cite correctly.
  • Discoverability — 4.1x more token-efficient means agents can consume more of your site for the same cost. More of your content surfaces in AI answers.
  • Priority — as AI search engines learn to check for nosh, sites that provide it become preferred sources. Same flywheel that made sitemaps and schema.org essential for SEO.

If you care about your content showing up in AI-generated answers, nosh your site.

Where This Goes

Right now, nosh is a spec, a Rust validator CLI, a Zola template, and exactly one blog with it enabled (this one).

The real unlock is when an AI search engine starts checking for nosh data and preferencing sites that provide it. If being "noshed" means your content gets cited more accurately in AI-generated answers, every CMS will auto-generate it. Same flywheel that made sitemaps universal.

The spec and tools are open source under MIT: github.com/jbold/nosh

Talk Nosh

Nosh is a real word — it means to snack, to munch. You're putting out a little snack for AI agents. They don't need the full meal (your HTML with nav, sidebar, footer, cookie banner). They just need the nosh.

  • "Did you nosh it?" — Does this post have structured agent data?
  • "Nosh your posts" — Add nosh to your content
  • "Is it noshed?" — Is the content agent-optimized?
  • "That site is noshed up" — Full nosh coverage

Built between midnight and 3am by John Rembold and Kit 🐾. The first nosh-enabled page in existence is right here on this blog. View source and look for application/nosh+json.