From Semantic Web Struggles to Block Protocol: A New Hope for Structured Data

The Web's Structural Shortcomings

Since the 1990s, the web has primarily served as a platform for publishing documents meant for human eyes. These documents are built with HTML, a language that provides minimal structure—enough to indicate a paragraph, emphasize a word, or create a list. Add in some CSS for visual flair, and you get pages that are pleasant to read but largely opaque to machines.

Source: www.joelonsoftware.com

Consider a simple example: you mention a book on a webpage.

Goodnight Moon by Margaret Wise Brown
Illustrated by Clement Hurd
Harper & Brothers, 1947
ISBN 0-06-443017-0

A human reader recognizes this as a book reference. But a naive computer program sees only a bold title and some text. There is no explicit indication that this is a published work with an author, illustrator, publisher, and ISBN. That’s the level of structure the web has offered for decades—enough for people, but not enough for machines.
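
To make the point concrete, here is a minimal sketch of what such a naive program might extract. The HTML snippet is an assumption (the original page's markup is not shown here), and `TextCollector` and `extract_runs` are hypothetical names chosen for illustration. It uses only Python's standard `html.parser` module.

```python
from html.parser import HTMLParser

# Hypothetical markup for the book mention; the real page's HTML may differ.
SNIPPET = (
    "<p><b>Goodnight Moon</b> by Margaret Wise Brown<br>"
    "Illustrated by Clement Hurd<br>"
    "Harper &amp; Brothers, 1947<br>"
    "ISBN 0-06-443017-0</p>"
)


class TextCollector(HTMLParser):
    """Records each run of text together with the tags that enclose it."""

    def __init__(self):
        super().__init__()
        self.open_tags = []
        self.runs = []

    def handle_starttag(self, tag, attrs):
        if tag != "br":  # <br> is a void element and never gets closed
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()

    def handle_data(self, data):
        self.runs.append((tuple(self.open_tags), data))


def extract_runs(html: str) -> list:
    """Return (enclosing_tags, text) pairs found in the given HTML."""
    collector = TextCollector()
    collector.feed(html)
    collector.close()
    return collector.runs


if __name__ == "__main__":
    for tags, text in extract_runs(SNIPPET):
        print(tags, repr(text))
```

Everything the parser recovers is labeled with `p` or `b`: paragraph and bold. Nothing in the result says "title", "author", or "ISBN"; a program would have to guess those roles from wording and position.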

The Semantic Web Vision

As early as 1999, Tim Berners-Lee articulated a dream: a web where computers could analyze all data, including content, links, and transactions. In his book Weaving the Web, he wrote:

“I have a dream for the Web [in which computers] become capable of analyzing all the data on the Web – the content, links, and transactions between people and computers. A ‘Semantic Web’, which makes this possible, has yet to emerge, but when it does, the day-to-day mechanisms of trade, bureaucracy and our daily lives will be handled by machines talking to machines.”

To realize this vision, the web needed semantic markup: extra information embedded in pages that explicitly describes the meaning of content. Vocabularies like schema.org define types for entities such as books, people, and events. Developers can then use formats like RDFa, Microdata, or JSON-LD to annotate their HTML, telling machines: “Hey, this is a book!”
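
As a rough illustration, the book from the earlier example could be annotated with JSON-LD along these lines. This is a sketch, not a recommended production setup: the property names (`name`, `author`, `illustrator`, `publisher`, `datePublished`, `isbn`) come from the schema.org Book type, while the helper functions `book_jsonld` and `as_script_tag` are hypothetical names introduced here.

```python
import json


def book_jsonld() -> dict:
    """Describe the example book using the schema.org vocabulary."""
    return {
        "@context": "https://schema.org",
        "@type": "Book",
        "name": "Goodnight Moon",
        "author": {"@type": "Person", "name": "Margaret Wise Brown"},
        "illustrator": {"@type": "Person", "name": "Clement Hurd"},
        "publisher": {"@type": "Organization", "name": "Harper & Brothers"},
        "datePublished": "1947",
        "isbn": "0-06-443017-0",
    }


def as_script_tag(data: dict) -> str:
    """Wrap JSON-LD in the script element used to embed it in an HTML page."""
    payload = json.dumps(data, indent=2, ensure_ascii=False)
    # Escape "</" so a stray "</script>" in the data cannot end the element early.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


if __name__ == "__main__":
    print(as_script_tag(book_jsonld()))
```

The resulting `<script>` element sits alongside the visible text. Readers see the same page as before, while a crawler that understands JSON-LD can tell that the page describes a book and who wrote it.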

The Implementation Gap

But theory and practice rarely aligned. Adding semantic markup was, and still is, hard work. After crafting a beautiful, human-readable blog post, the last thing most creators want to do is wrestle with obscure schemas and syntax. Unless there is already a machine reading their pages, the incentive to add structured data is weak. As a result, after two decades, remarkably few web pages carry meaningful semantic annotations. The Semantic Web remains more promise than reality.

Why The Gap Persists

The core problem is ease of use. Semantic markup feels like extra homework: you must learn a schema, choose a format, and manually embed code. This is a significant barrier for writers, designers, and other content creators who are not technical specialists. Without immediate reward (like better search visibility or rich snippets), most give up.

The Block Protocol: A Fresh Approach

Enter the Block Protocol. This emerging standard aims to close the gap by making structured data a natural, effortless part of content creation. Instead of asking publishers to add markup after the fact, the Block Protocol embeds semantic structure directly into the blocks used to compose a page. A “book block” would automatically encode the title, author, ISBN, and other fields in a machine-readable way—no extra effort required.

How It Works

The protocol defines a system of typed blocks that each know their own meaning. When you insert a block for a book, a person, or a recipe, the block itself carries the semantic schema. The block’s data is stored in a structured format such as JSON: the block renders it as readable content on the page, while the same data stays machine-parseable underneath. This eliminates the need for a separate annotation step.
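
The idea of a typed block can be sketched in a few lines. To be clear, this is not the Block Protocol's actual API or data format. `BlockType`, `Block`, the `BOOK` type, and the field names are all hypothetical, invented here only to show how one piece of structured data can serve both a human view and a machine view.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockType:
    """A hypothetical block type: a name, required fields, and a display template."""
    name: str
    required: tuple
    template: str


@dataclass
class Block:
    """A block instance: structured data checked against its type."""
    block_type: BlockType
    data: dict

    def __post_init__(self):
        missing = [key for key in self.block_type.required if key not in self.data]
        if missing:
            raise ValueError(f"{self.block_type.name} block is missing: {missing}")

    def render(self) -> str:
        """Human view: fill the display template from the structured data."""
        return self.block_type.template.format(**self.data)

    def to_json(self) -> str:
        """Machine view: the same data, labeled with its type."""
        return json.dumps(
            {"type": self.block_type.name, "data": self.data}, sort_keys=True
        )


# Illustrative type definition; the field names are assumptions.
BOOK = BlockType(
    name="book",
    required=("title", "author", "isbn"),
    template="{title} by {author} (ISBN {isbn})",
)

if __name__ == "__main__":
    block = Block(
        BOOK,
        {
            "title": "Goodnight Moon",
            "author": "Margaret Wise Brown",
            "isbn": "0-06-443017-0",
        },
    )
    print(block.render())
    print(block.to_json())
```

The author only fills in the fields. Because the type travels with the data, the readable rendering and the machine-readable record come from the same source, and no second annotation pass is needed.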

Progress And Promise

Early implementations show promise. The Block Protocol is being adopted by content management systems, site builders, and even individual developers. It offers a bridge between the unstructured web of today and the semantically rich web of tomorrow. By lowering the barrier to entry, it could finally realize Berners-Lee’s dream—not by asking everyone to learn RDF, but by integrating structure into the very act of publishing.

As one advocate puts it: “People will only add semantic markup to their web pages if doing so is as easy as writing a sentence.” The Block Protocol may be the tool that makes that possible.

Conclusion

The web’s lack of structure has been a bottleneck for decades. The Semantic Web vision provided a roadmap, but the implementation proved too cumbersome. The Block Protocol offers a pragmatic next step: embed meaning into content from the start, so that machines can finally read what humans write without extra work from authors. If this approach gains traction, the web could become a more intelligent, interconnected space for everyone.
