Databricks says it solved the decades-old data pipeline problem that's been slowing AI agents
For decades, data professionals have struggled to manage operational and analytical databases together without introducing latency or degrading performance. Agents made the problem structural. A system that reasons continuously and acts on live data cannot tolerate a pipeline between itself and the information it needs to act on. At the Data + AI Summit on Tuesday, Databricks announced two products aimed at collapsing that infrastructure. Lakehouse//RT delivers millisecond query latency directly on governed Delta and Iceberg tables, eliminating the dedicated real-time serving tier that enterprises have maintained alongside their lakehouses. LTAP, short for Lake Transactional/Analytical Processing, stores Postgres-native transactional data in Delta and Iceberg format from the point of write, removing the ETL pipelines that have connected operational and analytical systems for decades.
Reynold Xin, co-founder of Databricks, described a simpler data stack as "the holy grail for agents" in a briefing with VentureBeat. He argued that as users vibe code more applications, the agents reasoning analytically on top of those apps need the underlying infrastructure out of the way so they can move fast. "The agents really prefer a much simpler stack, because they can move way faster," he said.
LTAP bets on storage-layer unification where HTAP tried engine convergence
Vendors have tried many approaches over the decades to unify analytical and transactional data. In 2014, analyst firm Gartner coined the term HTAP, short for Hybrid Transactional/Analytical Processing, to describe vendors that attempted to unify the two types of databases. MemSQL (now SingleStore), SAP HANA and Oracle's MySQL HeatWave are among the many HTAP products on the market. LTAP is Databricks' answer to HTAP, using the Lakebase architecture to unify data at the storage layer rather than at the engine level.
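Storing transactional data in Delta and Iceberg format from the point of write means converting row-shaped records into a columnar layout. Delta tables store their data in Parquet files, and Parquet is Iceberg's default file format. The Python sketch below shows why that conversion tends to shrink data. It rests on stated assumptions: the order schema and synthetic rows are hypothetical, and dictionary encoding, delta encoding and per-column zlib compression stand in for the encodings real columnar formats use. It is not Databricks' implementation of LTAP or Lakebase.

```python
import random
import zlib

# Hypothetical order schema, used only for this illustration.
STATUSES = ["pending", "shipped", "delivered", "returned"]
REGIONS = ["us-east", "eu-west", "ap-south"]


def make_rows(n, seed=0):
    """Generate synthetic OLTP-style rows: (order_id, status, region, amount)."""
    rng = random.Random(seed)
    return [
        (1000 + i, rng.choice(STATUSES), rng.choice(REGIONS), round(rng.uniform(5, 500), 2))
        for i in range(n)
    ]


def row_layout(rows):
    """Row-oriented bytes: all fields of one record stored next to each other."""
    return "".join(f"{oid},{st},{rg},{amt}\n" for oid, st, rg, amt in rows).encode()


def dictionary_encode(values):
    """Dictionary header, then one 1-byte code per value (assumes < 256 distinct values)."""
    dictionary = sorted(set(values))
    code = {v: i for i, v in enumerate(dictionary)}
    return ",".join(dictionary).encode() + b"|" + bytes(code[v] for v in values)


def column_chunks(rows):
    """Row-to-column conversion: one encoded byte chunk per column."""
    ids, statuses, regions, amounts = zip(*rows)
    # Delta-encode the increasing key: first id, then differences (mostly 1s).
    deltas = [ids[0]] + [b - a for a, b in zip(ids, ids[1:])]
    return {
        "order_id": ",".join(map(str, deltas)).encode(),
        "status": dictionary_encode(statuses),
        "region": dictionary_encode(regions),
        "amount": ",".join(map(str, amounts)).encode(),
    }


def compare_layouts(n=10_000):
    """Return raw and zlib-compressed sizes (bytes) for both layouts."""
    rows = make_rows(n)
    raw = row_layout(rows)
    row_compressed = len(zlib.compress(raw, 9))
    # Compress each column's data separately, as columnar formats do.
    column_compressed = sum(
        len(zlib.compress(chunk, 9)) for chunk in column_chunks(rows).values()
    )
    return {
        "row_raw": len(raw),
        "row_compressed": row_compressed,
        "column_compressed": column_compressed,
    }


if __name__ == "__main__":
    sizes = compare_layouts()
    for name, size in sizes.items():
        print(f"{name:>18}: {size:>7} bytes")
    print(f"raw rows / compressed columns: {sizes['row_raw'] / sizes['column_compressed']:.1f}x")
```

On this synthetic data the columnar version comes out smaller because each column holds similar values side by side. The delta-encoded keys collapse to a run of 1s, and the low-cardinality status and region columns shrink to one small code per row. The exact ratios depend entirely on the data and say nothing about the figures Databricks reports for production workloads.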
Lakebase is Databricks' serverless, cloud-based PostgreSQL database service, which became generally available in February. "HTAP to us is kind of more of a failure of the industry rather than a success," Xin said.
LTAP targets the storage layer instead of the query layer. Lakebase previously stored Postgres data in Postgres format on object storage, so the data had to be converted before the Lakehouse's analytical engines could use it efficiently. With LTAP, transactional data lands directly in Delta or Iceberg format, and analytical workloads read that same copy. Postgres remains the transactional engine; Spark and the Lakehouse remain the analytical engine. "The whole point is, hey, you use the best tool for the job at the query engine level, we just make sure underlying storage is a single copy of the data," Xin said.
The central engineering challenge is latency. Object storage response times run in the seconds range, far too slow for OLTP workloads that require sub-millisecond performance. Lakebase addresses this with a caching layer between Postgres compute instances and object storage. The key design decision is where the columnar conversion happens: idle CPU capacity in that caching layer converts rows to columns before the data lands in object storage. "When you convert data from row to column, it compresses more than 10 times, typically, so now you substantially reduce the network cost of that basic caching layer between that caching layer and the object stores," Xin said.
Lakehouse//RT delivers millisecond query latency on live lakehouse data without a separate serving tier
Lakehouse//RT is Databricks' answer to the dedicated real-time serving tier. Enterprises have maintained that separate system alongside their lakehouses to handle low-latency queries, at the cost of data copies, split governance and pipeline complexity that agents cannot work around.
Key capabilities of Lakehouse//RT include:
Reyden compute engine: Built specifically for high-concurrency, low-latency serving, Reyden queries Delta and Iceberg tables directly without moving data out of the lakehouse.
Latency and throughput: Lakehouse//RT delivers sub-100ms latency at 12,000 queries per second, with response times as low as 10ms on smaller datasets and up to 16x better performance than existing dedicated serving stacks.
Governance and data access: Every query runs within Unity Catalog's governance framework with no separate permissions layer, no data copies and no ingestion pipelines.
Analysts see the agentic framing and open format approach as the real differentiators
The problem both products address is well-documented among enterprise data teams, but analysts draw a distinction between the pain point and the specific claim Databricks is making. "Enterprises have had HTAP, streaming, cloud warehouses, and operational stores for years," Stephanie Walter, Practice Lea