Lakehouse format convergence & data interoperability

Open table formats like Delta Lake and Apache Iceberg are converging to enable true interoperability across data platforms.

Introduction

Storage formats are no longer just file layout details—they are architectural decisions. In today’s cloud-native data ecosystems, open table formats like Delta Lake and Apache Iceberg have become the foundation for how enterprises enforce schema, manage transactions, control data versions and interoperate across engines. But until recently, these formats were siloed—fragmenting teams, duplicating data and constraining platform design.

That’s changing.

With Delta Lake UniForm, Iceberg v3 and major updates from Databricks and Snowflake in 2025, the industry is converging on a new principle: write once, read anywhere. For platform architects and data leaders, this signals more than convenience—it redefines the relationship between storage, compute and governance.

Format fragmentation: The history of open table formats

Delta Lake and Iceberg were each born to solve real problems:

  • Delta Lake offered transactional reliability for cloud data lakes and deep integration with Apache Spark.

  • Iceberg focused on distributed metadata, partition evolution and multi-engine compatibility from the ground up.

But for years their incompatible metadata structures and protocol differences forced organizations to choose—or worse, to maintain parallel datasets across formats and engines.

Delta Lake UniForm: Interoperability for the lakehouse

With the introduction of Delta Lake UniForm, a single Delta Lake table can now expose Iceberg-compatible metadata, enabling multiple compute engines like Snowflake, Databricks, Trino and BigQuery to read from it as if it were a native Delta Lake or Iceberg table.

Key capabilities:

  • One physical dataset, multiple compatible metadata layers (Delta + Iceberg + Hudi)

  • No duplication or format-specific copies

  • Asynchronous metadata syncs with minimal write overhead

  • Fully open source and available in Delta Lake 3.x

UniForm doesn’t convert the table—it extends its compatibility. This transforms the format from a binding decision into an implementation detail.
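
For teams on Delta Lake 3.x, enabling UniForm is a table-property change rather than a migration. Below is a minimal PySpark sketch using the property names documented for Delta Lake UniForm; the catalog, schema and table names are hypothetical.

```python
from pyspark.sql import SparkSession

# Assumes a Spark session configured with Delta Lake 3.x
# (delta-spark on the classpath, Delta SQL extensions enabled).
spark = SparkSession.builder.appName("uniform-demo").getOrCreate()

# Create a Delta table that also maintains Iceberg-readable metadata.
spark.sql("""
    CREATE TABLE IF NOT EXISTS sales.orders (
        order_id   BIGINT,
        amount     DECIMAL(10, 2),
        order_date DATE
    )
    USING DELTA
    TBLPROPERTIES (
        'delta.enableIcebergCompatV2'          = 'true',
        'delta.universalFormat.enabledFormats' = 'iceberg'
    )
""")

# Existing tables can be upgraded in place: the Parquet data files are
# untouched, and Iceberg metadata is generated asynchronously after commits.
spark.sql("""
    ALTER TABLE sales.orders SET TBLPROPERTIES (
        'delta.enableIcebergCompatV2'          = 'true',
        'delta.universalFormat.enabledFormats' = 'iceberg'
    )
""")
```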

Diagram showing Delta Lake UniForm metadata layers. A single copy of table data stored in Parquet format supports multiple metadata layers including Delta Lake, Iceberg and Hudi.

Apache Iceberg’s expanding role in the open data format ecosystem

The growing reach of Delta Lake UniForm doesn’t replace Apache Iceberg—it amplifies it. Instead of competing head-on, Iceberg is evolving to coexist and integrate within this new interoperability landscape.

Iceberg’s momentum remains strong. Rather than becoming obsolete, it is becoming more interoperable—embraced by multiple engines, governed by multiple catalogs and integrated into shared storage layers.

This alignment sets the stage for format convergence to scale—not through format replacement but through catalog-centered design.

The moment of interoperability: Databricks acquires Tabular and aligns Delta with Iceberg

In June 2024, Databricks acquired Tabular, the company founded by the original creators of Apache Iceberg. This move wasn’t about vendor consolidation—it was a strategic acknowledgment of where the open lakehouse is heading. For the first time, Delta Lake and Iceberg communities are aligned under one strategic direction.

The move signals more than consolidation. It reflects a shift toward:

  • Protocol-level alignment: Shared roadmaps are driving compatibility in commit handling, schema evolution and versioning mechanics.

  • Interoperability over competition: The focus is no longer on replacing formats but on enabling a shared catalog interface across engines and ecosystems.

  • Metadata as architecture: With catalogs like Unity and Polaris embracing both formats, the metadata becomes the control layer—governing access, lineage and trust.

Crucially, both Iceberg (Apache Software Foundation) and Delta Lake (Linux Foundation) remain under open governance, preserving neutrality and community trust. But with overlapping roadmaps, joint catalog strategies and shared architectural goals, format convergence has moved from theoretical to inevitable.

This sets the stage for a platform model where catalogs unify, formats interoperate and engines coexist—without sacrificing governance or performance.

The rise of the data catalog layer in lakehouse platforms

As open formats converge, the next battleground isn’t storage—it’s the metadata catalog. The real enabler of interoperability is not format compatibility alone but shared catalogs that govern access, lineage and structure across engines.

Recent industry developments have made this shift explicit.

Snowflake Horizon + Polaris: Catalog-centric data governance

Snowflake introduced Horizon as a unified governance layer across its platform—managing access control, lineage, policies and schema evolution not only for internal Snowflake tables but also for externally managed Apache Iceberg tables.

To enable this, Snowflake released Polaris Catalog, an open-source implementation of the Iceberg REST Catalog API. Polaris allows external query engines like Trino, Spark and Athena to discover and read Iceberg tables governed by Snowflake using open protocols rather than vendor-specific integration.

This decouples compute from governance and positions Snowflake as a catalog-first platform, not just a database.
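
From the consumer side, because Polaris implements the Iceberg REST Catalog API, an external Spark cluster can register it like any other Iceberg catalog. A minimal sketch, assuming the Apache Iceberg Spark runtime is on the classpath; the endpoint, credentials and table names below are placeholders.

```python
from pyspark.sql import SparkSession

# Standard Apache Iceberg configuration for a REST-based catalog.
# The URI, credential and warehouse values are placeholders.
spark = (
    SparkSession.builder.appName("polaris-reader")
    .config("spark.sql.catalog.polaris", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.polaris.type", "rest")
    .config("spark.sql.catalog.polaris.uri", "https://<account>.example.com/api/catalog")
    .config("spark.sql.catalog.polaris.credential", "<client_id>:<client_secret>")
    .config("spark.sql.catalog.polaris.warehouse", "analytics")
    .getOrCreate()
)

# Tables governed in Snowflake are discovered and read through open
# Iceberg semantics; no proprietary Snowflake connector is involved.
spark.sql("SHOW NAMESPACES IN polaris").show()
spark.sql("SELECT * FROM polaris.sales.orders LIMIT 10").show()
```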

Databricks Unity Catalog: Open source metadata control plane

In parallel, Databricks open-sourced Unity Catalog, its own metadata control plane, enabling:

  • Format-agnostic governance (e.g., Delta, Iceberg)

  • Support for unified catalog APIs across compute engines

  • A foundation for ecosystem-wide interoperability

By open-sourcing Unity Catalog, Databricks formalized its role as a neutral metadata catalog manager, not just a Spark-native solution.

REST catalog APIs: The new standard for interoperability

The Iceberg REST catalog specification is fast becoming the industry standard. Instead of relying on Hive Metastore or engine-coupled metadata, platforms are adopting REST-based catalogs that:

  • Enable consistent, governed access to a single physical dataset across diverse query engines (such as Spark, Trino, Snowflake and Athena) by exposing a format-compatible metadata layer over open protocols.

  • Allow metadata—including table schemas, version history and access policies—to be managed centrally yet discovered and queried from engines operating in different cloud environments. This enables seamless interoperability across AWS, Azure and Google Cloud Platform.

  • Support platform designs in which cloud object storage, metadata catalogs and compute engines operate as loosely coupled components, so that governance policies can be enforced independently of where data is stored or how it is queried.

The catalog is now the center of gravity. In a converging landscape, catalogs aren’t just where metadata lives—they’re where platform strategy, policy and trust now reside.
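
Part of why adoption has been fast is that the spec itself is a small set of HTTP routes. A hedged sketch of the discovery flow against any Iceberg REST-compliant catalog (Polaris, Unity Catalog and others), using Python’s requests library; the base URL and token are placeholders.

```python
import requests

# Placeholder endpoint and token; any Iceberg REST-compliant catalog
# serves these v1 routes.
BASE = "https://catalog.example.com/api/catalog/v1"
HEADERS = {"Authorization": "Bearer <token>"}

# Discover the namespaces the caller is authorized to see.
namespaces = requests.get(f"{BASE}/namespaces", headers=HEADERS).json()
print(namespaces["namespaces"])

# List the tables in one namespace.
tables = requests.get(f"{BASE}/namespaces/sales/tables", headers=HEADERS).json()
print(tables["identifiers"])

# Load a table: the response includes the current metadata location and
# snapshot state, which any engine can use to plan a consistent read.
table = requests.get(f"{BASE}/namespaces/sales/tables/orders", headers=HEADERS).json()
print(table["metadata-location"])
```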

Architectural implications: Designing format-neutral data platforms

For enterprise data platform architects, format convergence has deep design consequences. At the heart of it is a new platform baseline—one where data format, engine and governance decouple cleanly.

1. Unified cloud storage with multi-engine access

Architects can now design for a single governed dataset, written in Delta Lake format with UniForm enabled and stored once in cloud object storage (e.g., S3, Azure Data Lake Storage, Google Cloud Storage). This dataset can be accessed by:

  • Business intelligence tools and federated query engines via Iceberg-compatible REST-based catalog APIs (e.g., Snowflake, Trino, Athena, BigQuery)

  • Data engineering and machine learning (ML) training pipelines via native Delta Lake APIs in Apache Spark and Databricks

  • Analytics and artificial intelligence (AI) inference workloads in Snowflake and other platforms that recognize Iceberg tables

Thanks to Iceberg metadata exposure and REST-based catalog standards, these engines can read from the same physical data without duplication or extract, transform and load (ETL) bridging. Metadata mediates discovery, schema resolution and snapshot-based reads, enabling true interoperability.
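
Putting the pieces together, a single pipeline can write through native Delta APIs while downstream consumers read the same files through an Iceberg catalog. A sketch under the same assumptions as the earlier examples (a UniForm-enabled sales.orders table and a registered "polaris" REST catalog):

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Producer path: engineering and ML pipelines append via native Delta APIs.
new_orders = spark.range(100).select(
    F.col("id").alias("order_id"),
    F.lit(19.99).cast("decimal(10,2)").alias("amount"),
    F.current_date().alias("order_date"),
)
new_orders.write.format("delta").mode("append").saveAsTable("sales.orders")

# Consumer path: BI and federated engines read the same Parquet files
# through the Iceberg metadata layer -- no copy, no ETL bridge.
spark.table("polaris.sales.orders") \
    .groupBy("order_date").agg(F.sum("amount").alias("daily_total")) \
    .show()
```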

2. Catalogs as the central data control plane

Rather than just referencing formats, platforms are now centered on shared catalog layers that apply policies, manage schemas and enforce lineage independent of the compute engine.

3. Format-agnostic data products and interoperability

Teams can publish datasets with consistent contracts and governance and then expose them through format interfaces suited to consumer needs. The format becomes a delivery detail—not a limitation.

4. Multicloud, multiplatform interoperability for analytics

With Delta UniForm and Iceberg catalogs aligned, organizations can build multi-engine pipelines that support:

  • Distributed compute across clouds

  • Metadata-aware compliance checks

  • Unified lineage even with diverse readers

Diagram of a unified lakehouse architecture. Snowflake, Databricks and Trino connect to a unified metadata catalog and governance layer that manages Delta Lake, Iceberg and Hudi metadata over a single copy of table data in Parquet format.

New capabilities enabled by format convergence

  • Write interoperability: Current approaches focus on read access, but shared write semantics (e.g., bidirectional Delta-Iceberg updates) are already on their way.

  • Governance standardization: Catalog APIs are converging. Expect common lineage and policy standards across vendors.

  • Format evolution at scale: Features like streaming-compatible reads, deletion vectors and liquid clustering will extend format adaptability, supporting high-volume ingestion, compaction and query optimization at scale.

  • AI-aware metadata models: Formats will encode feature lineage, inference traceability and retrieval optimization as AI becomes a first-class workload.

Final thoughts: Why the future is catalog-centric and format-agnostic

The convergence of Delta Lake and Iceberg isn’t just a compatibility milestone—it marks a deeper architectural shift. It tells us that the storage format is no longer the boundary of a platform. The new boundary is the catalog: the layer that governs how data is exposed, trusted and reused across multiple tools and teams.

For enterprise data platform architects and engineers, this convergence unlocks a new kind of design thinking:

  • You can architect and build data platforms where format is an implementation detail, not a constraint.

  • You can enforce governance and lineage across multiple compute engines from a single control plane.

  • You can design for data reuse, not redundancy, with a “write once, serve anywhere” model.

And critically, as AI, ML operations and information retrieval-based intelligent systems expand, interoperability becomes a compliance requirement, not just an engineering convenience.

The path forward is clear: Open table formats will continue to evolve, but the real architectural advantage comes from treating catalogs, metadata and governance as first-class components of your modern data platform.


Rahul Joshi, Distinguished Data Engineer & Director, Card Tech

Rahul Joshi is a Distinguished Data Engineer at Capital One with over 19 years of experience in building cloud-native data and analytics ecosystems that power intelligent products, decisioning systems and analytics at scale. Currently, his focus is within Card Tech, where he spearheads the data and analytics platform architecture for modern credit card core systems managing credit card accounts and transaction data. Prior to Capital One, Rahul held tech leadership roles at EY and IBM, delivering next-gen data and advanced analytics solutions for regional and global financial institutions. He writes about modern data architecture, cloud-native analytics and enterprise data strategies.
