Databricks acquires Tabular to create a common data lakehouse standard

Databricks, the analytics and AI giant, has acquired data management company Tabular for an undisclosed amount. (CNBC reports that Databricks paid more than $1 billion.)

According to Tabular co-founder Ryan Blue, he and Tabular’s two other co-founders, Daniel Weeks and Jason Reid, will join Databricks in some capacity. There, they will work to unify the customer bases and communities of Tabular and Databricks.

“Joining Databricks means there will be more contributions from our new colleagues,” Blue writes in a blog post. “By doing so, we ensure that our approach to (our community) does not change.”

Tabular, founded by Blue, Weeks and Reid in 2021, offers data management products based on Apache Iceberg, a project that Blue and Weeks developed while at Netflix and later donated to the Apache Software Foundation. Iceberg is a high-performance, open source table format that is optimized for very large datasets and lets multiple data engines work with the same tables.

Iceberg competed with Databricks’ Delta Lake in the format wars for data lakehouses – data architectures designed to store large amounts of raw data while providing structure and management functions. Although both Iceberg and Delta Lake use the Apache Parquet data storage format, they are incompatible in key respects.

Soon, however, Delta Lake and Iceberg are expected to converge. Following the acquisition news, Databricks and Tabular have committed to working toward a common standard.

“(We will) work to improve support for Iceberg across the Databricks platform,” Blue said. “Our goal is to improve interoperability so that you can enjoy the incredible work of both communities without having to worry about the underlying format.”

The market for data lakehouses is huge – according to the MIT Tech Review, around 74% of organizations have one – and so, from Databricks’ perspective, bringing Tabular into its family of companies was probably the obvious choice. After all, fewer competing data lakehouse formats – or, alternatively, platforms supporting multiple formats – make the Databricks platform more attractive to enterprise customers, even if those formats aren’t proprietary to the vendor.

In a blog post co-authored by Databricks CEO Ali Ghodsi and Chief Architect Reynold Xin, Databricks says it intends to “work closely” with the Iceberg and Delta Lake communities to “bring interoperability to the formats themselves.”

“This acquisition underscores our commitment to open formats and open source data in the cloud,” the blog post reads. “This is a long journey, which will probably take several years to complete in the (data lakehouse) communities.”

Prior to the acquisition, San Jose-based Tabular raised $37 million in venture capital from investors including Andreessen Horowitz, Zetta Venture Partners and Altimeter Capital. Databricks says it expects the purchase to be completed during the second quarter of 2024, subject to customary closing conditions.

techcrunch
