In our previous post, “Architecting for AI,” we laid out a blueprint for a modern Enterprise Information Architecture built on three essential layers: Data, Information, and Knowledge. This architecture provides the trustworthy foundation needed for both human collaboration and reliable AI. But for that foundation to be solid, the Information Layer (where metadata is held and organized) must be built with discipline and a forward-thinking strategy.
So, let’s zoom in on that foundational layer. We’re going to explore a crucial shift: moving from treating data as a technical afterthought to managing it as a strategic asset. That journey starts by admitting that our old tools have let us down and that it’s time for a new, product-centric way of thinking.
A New Foundation: Data as a Product
The fundamental shift required in the Information Layer begins with a new unit of value: the data product. Unlike a raw table or a one-off report, a data product is an asset engineered from the ground up to be consumed. It’s treated with the same rigor as a software product, which means it must embody a set of core principles, often called its “-ilities”:
- Discoverable, Addressable, and Trustworthy: At a minimum, a consumer has to be able to find the product, access it through a stable interface, and trust that its quality meets defined standards.
- Reusable and Composable: This is where the real value is unlocked. Reusability prevents teams from constantly reinventing the wheel, saving time and effort. But composability is even more powerful; it allows teams to confidently combine multiple data products into new, more complex solutions, accelerating innovation in ways previously unimaginable.
Embracing this product-centric mindset immediately forces a critical question: Are the tools we’ve relied on for years, especially the traditional data catalog, actually designed to support this new world of curated, composable products?
The Broken Promise of the Traditional Data Catalog
For years, the data catalog was hailed as the definitive solution for data discovery. The promise was a centralized, searchable registry where anyone could find the data they needed, understand its structure, and put it to use. The reality? Most data catalogs never delivered on that promise in complex, real-world environments. They became digital graveyards of metadata, plagued by a fatal disconnect between the data itself and the documentation meant to describe it.
The core problem was that they were designed as passive, after-the-fact inventories. A data engineering team would create a dataset, and a separate governance team would attempt to document it later. This separation, combined with automated harvesting, led to critical failures:
- From Registry to Junkyard: Automated harvesting created thousands of unmanaged assets. Without curation, finding anything valuable in the resulting sprawl was simply overwhelming.
- Governance as a Bottleneck: Instead of enabling people, governance efforts created complex, manual processes that everyone saw as a burden, not a benefit.
- The Trust Deficit: With outdated information and missing context, users quickly learned they couldn’t trust the catalog. They stopped using it, and the investment became worthless.
The failure wasn’t in the catalog’s design, but in our expectations. We asked a tool built for passive, technical metadata to manage dynamic, human concepts. We burdened it with accountability and ownership (which are organizational challenges) and demanded deep semantic understanding when it was only equipped to show the technical “what.”
However, this doesn’t render the traditional catalog obsolete; its role is simply being redefined. It still serves a critical function as a foundational, technical registry. It excels at answering ground-level questions: “What is the precise schema of this table?” or “Which columns exist and what are their data types?” In this capacity, it’s an indispensable map of the technical terrain. We just need to stop asking it to be more than it is.
A Paradigm Shift: From Raw Data to Managed Products
So what’s the solution? It’s simple, at least in concept: Stop treating data like a technical byproduct and start treating it as a first-class product. A true data product isn’t just a table or a file. It’s a durable, reusable digital asset that is designed for consumption. Like any good product, it has a clear owner, a defined lifecycle, and a commitment to quality through Service Level Agreements (SLAs).
To be effective, a data product is intentionally engineered for reuse and composability. It’s not built for a single report but as a reliable building block for other teams to discover and integrate into their work. This shift from a project mindset to a product mindset is the first and most important step in building a modern data architecture. To make it a reality, we need new processes and tools built to treat data products as first-class citizens.
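To make the product mindset concrete, here is a minimal sketch of a data product as a first-class asset. All class and field names are illustrative, not a standard; the point is that the owner, version, stable interface, and SLAs travel with the product rather than living in a separate spreadsheet.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class SLA:
    """A hypothetical quality commitment attached to a data product."""
    freshness_hours: int      # max age of the data before it is considered stale
    completeness_pct: float   # minimum share of non-null required fields
    availability_pct: float   # uptime commitment for the product's interface

@dataclass
class DataProduct:
    """A data product as a first-class asset: owned, versioned, consumable."""
    name: str
    owner: str                # an accountable domain team, not an anonymous pipeline
    version: str              # products evolve through a defined lifecycle
    output_port: str          # stable, addressable interface (e.g. a view or API path)
    sla: SLA
    tags: list[str] = field(default_factory=list)

# Published as a reliable building block for other teams, not a one-off report.
customer_360 = DataProduct(
    name="customer-360",
    owner="crm-domain-team",
    version="2.1.0",
    output_port="warehouse://crm/customer_360_v2",
    sla=SLA(freshness_hours=24, completeness_pct=99.0, availability_pct=99.5),
    tags=["customer", "certified"],
)
print(customer_360.owner)  # crm-domain-team
```

Because the SLA is part of the asset itself, a consumer can check what they are signing up for before they build on it.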
“Shift Left”: How to Make Data Products a Reality
Creating high-quality data products requires a fundamental change in process, one that directly corrects the failures of the past. For too long, governance has been a reactive, centralized function—an afterthought applied downstream from data creation. This model was bound to fail because the teams trying to govern lacked context, and the teams creating the data lacked any real incentive to care about its eventual use.
The shift to a data-as-a-product model, with its decentralized ownership and accountability, provides the perfect opportunity to fix this. We can finally “shift left,” embedding governance duties directly into the development lifecycle of the data product itself. This is made practical through the Data Product Descriptor.
More than just a schema or a contract, the descriptor is a comprehensive specification that defines the product in its entirety before it is built. It captures everything: the technical schema, the business meaning, the ownership, quality metrics, and access policies.
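As a rough illustration, a descriptor might look something like the structure below, expressed here as a Python dictionary. The field names are hypothetical, not those of any particular descriptor specification; what matters is that technical shape, business meaning, ownership, quality, and access live in one artifact.

```python
# A minimal, illustrative Data Product Descriptor.
# Field names are invented for this sketch; real specifications define their own schemas.
descriptor = {
    "name": "customer-360",
    "version": "2.1.0",
    "owner": {"team": "crm-domain-team", "contact": "crm-team@example.com"},
    "description": "Unified view of customer master data for downstream domains.",
    "schema": {  # the technical shape of the output
        "customer_id": "string",
        "lifetime_value": "decimal(12,2)",
        "last_updated": "timestamp",
    },
    "semantics": {  # the business meaning a technical catalog cannot capture
        "lifetime_value": "Total net revenue attributed to the customer to date.",
    },
    "quality": {"freshness_hours": 24, "completeness_pct": 99.0},
    "access": {"policy": "pii-restricted", "request_via": "data-marketplace"},
}
print(descriptor["owner"]["team"])  # crm-domain-team
```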
By making the descriptor a mandatory part of the development workflow, governance becomes a prerequisite, not a cleanup task. Crucially, it places accountability where it belongs. The domain teams building the data product are now also formally accountable for how it is represented and consumed. What they publish via the descriptor is what appears in the Data Product Catalog and Marketplace. This ensures the information is not only accurate at the moment of creation but is also maintained by the people who know the data best.
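Making the descriptor mandatory can be as simple as a validation gate in the product’s delivery pipeline: if required fields are missing or ownership is unassigned, the product cannot be published. A minimal sketch, assuming the descriptor is a plain dictionary and the required-field list is one an organization would define for itself:

```python
# Hypothetical publication gate: governance as a prerequisite, not a cleanup task.
REQUIRED_FIELDS = {"name", "version", "owner", "schema", "quality", "access"}

def validate_descriptor(descriptor: dict) -> list[str]:
    """Return a list of problems; an empty list means the product may be published."""
    problems = [f"missing field: {f}" for f in sorted(REQUIRED_FIELDS - descriptor.keys())]
    if "owner" in descriptor and not descriptor["owner"].get("team"):
        problems.append("owner.team must name an accountable domain team")
    return problems

# The pipeline fails before an unowned, undocumented product ever ships.
draft = {"name": "orders", "version": "0.1.0", "owner": {"team": ""}}
issues = validate_descriptor(draft)
print(issues)
```

Run in CI, a non-empty result blocks the release, which is exactly the “shift left” the text describes: the people who build the product fix its metadata before publication, not after.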
This kind of public accountability sparks a powerful cultural shift. Suddenly, a team’s work isn’t just another forgotten entry in a sea of thousands of tables. It’s a product on a shelf. This visibility naturally encourages a process of curation and pride. A well-designed catalog and marketplace don’t just display data; they incentivize teams to polish, document, and support their data products, knowing they represent a formal offering to the organization.
Putting It All Together
This new product-centric approach doesn’t mean we discard old tools, but rather that we evolve them into a more sophisticated, multi-layered architecture. A modern data hub consists of three distinct layers, each with a specific purpose.
1. The Foundation: The Technical Data Catalog

At the bottom of it all, the traditional data catalog still has a job; it just isn’t the one we originally gave it. Think of it as a low-level, automated registry of physical assets. It’s perfect for answering purely technical questions: “What’s the schema?” or “Where is this asset located?” We stop asking it to understand business meaning or ownership (duties it consistently failed at) and let it excel as the map of our technical terrain.
2. The Curated Shelf: The Data Product Catalog

The Data Product Catalog is where information from the Data Product Descriptor comes to life. It’s where the Data Product Team’s accountability is made real. They populate this layer with the crucial context the technical metadata lacks: clear ownership, semantic meaning, lineage, quality scores, and SLAs. Unlike the passive registry beneath it, the Data Product Catalog is built from the ground up to promote and enable reusability. It’s not just a list; it’s a high-trust showcase of certified products ready for others to build with.
3. The Consumer Storefront: The Data Marketplace

The final layer is built upon the curated catalog to serve the end user. If the catalog is the well-organized warehouse, the marketplace is the intuitive storefront. Its purpose is to make finding and using data products as seamless as possible. The marketplace handles access control, tracks usage to reveal which products are most valuable, and provides an e-commerce-like experience for the data consumer. It is the primary interface for discovery and consumption across the entire enterprise.
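To make the division of labor between the three layers concrete, here is a deliberately simplified sketch. Every class, method, and record in it is invented for illustration: the technical catalog answers only schema-and-location questions, the product catalog holds the curated context, and the marketplace combines the two for the consumer.

```python
class TechnicalCatalog:
    """Layer 1: passive registry of physical assets; answers only technical questions."""
    def __init__(self):
        self._schemas = {"warehouse://crm/customer_360_v2": {"customer_id": "string"}}

    def schema_of(self, location: str) -> dict:
        return self._schemas[location]

class DataProductCatalog:
    """Layer 2: the curated shelf, populated from each product's descriptor."""
    def __init__(self):
        self._products = {
            "customer-360": {
                "owner": "crm-domain-team",
                "location": "warehouse://crm/customer_360_v2",
                "sla": {"freshness_hours": 24},
            }
        }

    def lookup(self, name: str) -> dict:
        return self._products[name]

class Marketplace:
    """Layer 3: the storefront; merges curated context with technical detail."""
    def __init__(self, products: DataProductCatalog, technical: TechnicalCatalog):
        self.products, self.technical = products, technical

    def browse(self, name: str) -> dict:
        product = self.products.lookup(name)
        return {**product, "schema": self.technical.schema_of(product["location"])}

market = Marketplace(DataProductCatalog(), TechnicalCatalog())
print(market.browse("customer-360")["owner"])  # crm-domain-team
```

Note that the consumer never queries the technical registry directly; it remains the foundation, while the curated and storefront layers are what people actually touch.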
By progressing on this journey—from fixing the failures of the old catalog, to embracing a product-first mindset, and finally to building this modern, three-layer hub—organizations can construct an Information Layer that is not only organized but also resilient, scalable, and truly ready for the future.