In concert with its Next ’24 event in Tokyo this week, Google Cloud is making an array of announcements across its entire data platform, including the analytical and operational sides.
Platforms involved include flagships like BigQuery and Spanner, but also supporting components like Looker, Analytics Hub, Cloud SQL and Bigtable. While the big headlines involve generative AI (GenAI), data lake-relevant support for open table formats and core database capabilities like SQL support, full-text search, distributed counters and geo-distributed functionality are also part of the mix.
Some of the new capabilities are being released to general availability (GA), while others are entering public or private preview. But taken as a whole, the new features represent a sweeping upgrade to everything Google Cloud is doing with data.
And in the context of the innovations announced earlier this summer at Microsoft Build, Snowflake Data Cloud Summit, and Databricks Data and AI Summit, it’s clear that Google Cloud should be considered an important member of the cloud data, analytics, and AI cohort.
AI Aplenty
In the AI world, Google Cloud is taking the sensible approach of leveraging all the GenAI and ML goodness in Vertex AI without making data professionals detour to that platform. Instead, the company is previewing tight integration between Vertex AI and both BigQuery and Spanner.
The previously available BigQuery Machine Learning (BQML) is one part of this, serving as a bridge to Vertex AI. Another is Spanner’s ability to function as a vector database, supporting Retrieval Augmented Generation (RAG) applications and the fine-tuning of large language model (LLM) foundation models. Together, these features are meant to let customers chat, search, generate, process and stream with GenAI on enterprise data.
The reverse application of GenAI is there as well: Google is embedding its Gemini technology within its “Data Cloud.” In BigQuery, there’s SQL code generation and explanation, Python code generation, and partitioning/clustering recommendations. The new Data Canvas platform, embedded in BigQuery Studio, brings visual and AI-driven data discovery, modeling and analysis. Semantic data discovery, automated Python notebook generation, and collaboration for data analysts are in the mix as well. All of these capabilities are GA.
On the preview side, there’s GenAI-assisted visual data prep, which offers automated schema matching, proposes transformations and data quality rules (with enforcement), and generates SQL pipelines for BigQuery. New Data Agents assist BI workloads through customized LLM-driven query and analysis and the generation of insights and workflows. A preview integration of Gemini into Looker adds automatic slide generation and “Assistants” for generating formulas, visualizations and LookML modeling code.
What about conversational analytics, to help Google’s BI catch up with the likes of Copilot for Power BI and ThoughtSpot Sage? The company says that’s coming soon.
Data Lake Fundamentals
In the data lake/lakehouse arena, Google is enhancing BigQuery by bringing its support for the Delta Lake and Iceberg open table formats to GA and adding support for the Apache Hudi format in preview. It’s also adding, in preview, native support for Apache Spark data processing and Apache Kafka streaming data. The Spark support lets customers write Spark code in Python and run it directly from BigQuery as stored procedures. The Kafka support automates operations and security and allows for easy ingestion of streaming data into BigQuery.
And speaking of real-time data, a preview release allows Google Cloud Pub/Sub data streams to be shared via Google Cloud’s Analytics Hub, which will also gain enhanced data publisher capabilities as a GA feature, plus integration with Google Cloud Marketplace in preview.
Operational Databases Still Indispensable
On the operational data side, there is a range of new capabilities as well, spanning both AI-relevant features and core database advances. Google’s Spanner geo-distributed database is where many of these innovations are landing. To start with, Google Cloud is adding graph capabilities to the platform as a preview feature, with full adoption of the Graph Query Language (GQL) standard.
Graph capabilities in Spanner are made possible through new index types and mappings and allow for dedicated edge/node tables or integration with conventional relational tables. Graph has applications in GenAI but also in the building of knowledge graphs, recommendation engines, IT topology, social networks and fraud detection.
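The idea of mapping ordinary relational rows onto graph nodes and edges can be illustrated with a minimal Python sketch. It is not Spanner’s API and does not use GQL. The `accounts` and `transfers` tables and the helper functions are hypothetical, chosen to mirror the fraud-detection use case above: rows from one table become vertices, rows from another become directed edges, and a bounded traversal looks for money moving in a circle.

```python
from collections import defaultdict

# Hypothetical relational tables: plain rows, as they might sit in ordinary SQL tables.
accounts = [
    {"id": 1, "owner": "alice"},
    {"id": 2, "owner": "bob"},
    {"id": 3, "owner": "carol"},
]
transfers = [  # each row is treated as a directed edge: from_id -> to_id
    {"from_id": 1, "to_id": 2, "amount": 500},
    {"from_id": 2, "to_id": 3, "amount": 450},
    {"from_id": 3, "to_id": 1, "amount": 400},
]


def build_graph(node_rows, edge_rows, src="from_id", dst="to_id"):
    """Map node rows to vertices (keyed by id) and edge rows to an adjacency list."""
    nodes = {row["id"]: row for row in node_rows}
    adjacency = defaultdict(list)
    for row in edge_rows:
        adjacency[row[src]].append(row[dst])
    return nodes, adjacency


def find_cycles_from(start, adjacency, max_hops=3):
    """Return paths that leave `start` and return to it within `max_hops` edges."""
    cycles = []
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        for nxt in adjacency.get(node, []):
            if nxt == start and len(path) > 1:
                cycles.append(path + [start])  # closed loop back to the start account
            elif nxt not in path and len(path) < max_hops:
                stack.append((nxt, path + [nxt]))  # keep walking, avoiding revisits
    return cycles


nodes, adjacency = build_graph(accounts, transfers)
print(find_cycles_from(1, adjacency))  # [[1, 2, 3, 1]]
```

A graph-capable database would express the same traversal as a declarative pattern query backed by indexes rather than an in-memory loop; the sketch only shows why keeping nodes and edges in relational tables is enough to support it.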
New vector search capabilities will directly support RAG application development by enabling developers to pull relevant context from Spanner databases and send it along with the prompts they submit to LLMs for GenAI queries. A new full-text search capability, also in preview, is relevant here as well. It enables keyword searches on data without needing external platforms like Elastic or Apache Solr.
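The retrieval step of RAG can be sketched in Python as a brute-force nearest-neighbor search followed by prompt assembly. This is a generic illustration, not Spanner’s vector search API. The rows, the toy three-dimensional embeddings and the helper names are hypothetical; real embeddings come from an embedding model, and a database would use its own distance functions and indexes rather than scanning every row.

```python
import math

# Hypothetical rows: each text chunk stored alongside a precomputed embedding.
# Real embeddings have hundreds of dimensions; these toy 3-D vectors are for illustration.
rows = [
    {"id": "doc-1", "text": "Refunds are processed within 5 business days.", "embedding": [0.9, 0.1, 0.0]},
    {"id": "doc-2", "text": "Our office is closed on public holidays.", "embedding": [0.0, 0.2, 0.9]},
    {"id": "doc-3", "text": "Refund requests need an order number.", "embedding": [0.8, 0.3, 0.1]},
]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_embedding, rows, k=2):
    """Brute-force search: rank every row by similarity to the query, keep the best k."""
    ranked = sorted(
        rows,
        key=lambda r: cosine_similarity(query_embedding, r["embedding"]),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, context_rows):
    """Prepend the retrieved context to the user's question before it goes to an LLM."""
    context = "\n".join(f"- {r['text']}" for r in context_rows)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Hypothetical embedding of the question "How long do refunds take?"
query_embedding = [0.85, 0.2, 0.05]
hits = top_k(query_embedding, rows)
print([r["id"] for r in hits])  # ['doc-1', 'doc-3']
prompt = build_prompt("How long do refunds take?", hits)
```

The holiday-closure row is left out of the prompt because its vector points in a different direction, which is the point of retrieval: only relevant enterprise data travels to the model.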
Don’t need those capabilities and don’t want to pay for them? Spanner will soon support tier-based pricing through the availability of three distinct editions: Standard, Enterprise, and Enterprise Plus, the last two of which will include the new graph and vector/full-text search capabilities. Configurable read-only replicas, a new managed autoscaler and new incremental backup capabilities will be exclusive to those editions as well.
At the highest end, preview capabilities around dual-region operation (within the same country) and geo-partitioning of data will be available only in the Enterprise Plus edition. Two other new capabilities, reverse ETL from BigQuery and scheduled backups, will be available in all three editions.
By adding vector, full-text and graph capabilities to Spanner’s existing support for key-value, relational and analytics workloads, Google Cloud now views Spanner’s Enterprise editions as true multi-model databases. It also says these different models can be used together, without forcing you to pick one up front for your database.
Bigtable, SQL Server, and Big Red
One NoSQL model Spanner doesn’t support is the wide column/column family variety. That is the province of Cloud Bigtable, the wide column database Google pioneered some two decades ago, which inspired the open source Apache HBase project. Cloud Bigtable supports the HBase API, but that API can be a bit cumbersome. It’s imperative rather than declarative, and often requires several lines of code to perform fairly straightforward operations.
That’s not a huge problem for column family database enthusiasts, for whom that API is second nature and who can write the necessary code easily, but the need for that level of familiarity has limited the platform’s appeal.
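The imperative-versus-declarative contrast can be sketched in Python using a nested dictionary as a stand-in for a wide column table (row key, then column family, then column qualifier, then value). This is not the HBase or Bigtable client API; the table contents and the `scan_imperative` helper are hypothetical. The point is how much navigation code a simple filter needs when you work at the cell level.

```python
# Hypothetical wide column layout: row key -> column family -> column qualifier -> value.
table = {
    "user#001": {"profile": {"name": "Ana", "country": "JP"}, "stats": {"logins": 42}},
    "user#002": {"profile": {"name": "Ben", "country": "US"}, "stats": {"logins": 7}},
    "user#003": {"profile": {"name": "Chie", "country": "JP"}},  # sparse: no "stats" family
}


def scan_imperative(table, prefix, family, qualifier, predicate):
    """Imperative style: walk rows in key order, dig into family/qualifier, skip missing cells."""
    results = []
    for row_key in sorted(table):  # wide column stores keep rows sorted by key
        if not row_key.startswith(prefix):
            continue
        cell = table[row_key].get(family, {}).get(qualifier)
        if cell is not None and predicate(cell):
            results.append(row_key)
    return results


japan_users = scan_imperative(table, "user#", "profile", "country", lambda v: v == "JP")
print(japan_users)  # ['user#001', 'user#003']

# The declarative equivalent is a single statement (generic SQL, not Bigtable's exact dialect):
#   SELECT row_key FROM users WHERE row_key LIKE 'user#%' AND country = 'JP'
```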
To address this challenge, Google Cloud is adding SQL support to Bigtable, in preview. The addition of SQL support for querying data will likely broaden Bigtable’s appeal and, the company says, will accelerate migrations of Apache Cassandra and HBase implementations over to Bigtable.
So too will the GA release of “distributed counters,” which allow Bigtable to store aggregations and update them as new data is written to the database, rather than calculating and storing those aggregations after the fact or generating them at query time. Think of it as keeping an always-available running total (or minimum or maximum) rather than generating it, on-demand, later. This is extremely useful for both general and operational analytics.
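The aggregate-on-write idea behind distributed counters can be sketched in Python. This is a conceptual illustration, not Bigtable’s API; the `RunningAggregate` class and `write_sale` function are hypothetical. Each write folds the new value into a stored count, sum, minimum and maximum, so reads return the current aggregate without rescanning raw rows. The `merge` method shows why the approach suits a distributed system: sum, min and max are associative and commutative, so partial aggregates built on different nodes can be combined in any order.

```python
from dataclasses import dataclass


@dataclass
class RunningAggregate:
    """Aggregate-on-write state for one key (hypothetical helper)."""
    count: int = 0
    total: float = 0.0
    minimum: float = float("inf")
    maximum: float = float("-inf")

    def add(self, value: float) -> None:
        # Fold one new value into the running aggregates at write time.
        self.count += 1
        self.total += value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    def merge(self, other: "RunningAggregate") -> "RunningAggregate":
        # Combine partial aggregates, e.g. from two nodes; order does not matter.
        return RunningAggregate(
            self.count + other.count,
            self.total + other.total,
            min(self.minimum, other.minimum),
            max(self.maximum, other.maximum),
        )


# Hypothetical per-key aggregates, e.g. one per store.
aggregates: dict[str, RunningAggregate] = {}


def write_sale(key: str, amount: float) -> None:
    # Update the aggregate as part of the write, so queries never recompute it.
    aggregates.setdefault(key, RunningAggregate()).add(amount)


for amount in (19.5, 5.0, 42.0):
    write_sale("store-7", amount)

agg = aggregates["store-7"]
print(agg.count, agg.total, agg.minimum, agg.maximum)  # 3 66.5 5.0 42.0
```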
Moving beyond Google’s own database technology, Google Cloud is also announcing enhanced support scenarios for Microsoft SQL Server and various Oracle technologies.
On the SQL Server side, Google’s Cloud SQL is gaining a new Enterprise Plus edition for SQL Server (as a GA offering), joining counterparts already in place for the open-source PostgreSQL and MySQL databases, and offering both a “four-nines” (99.99%) availability SLA and advanced disaster recovery.
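To make the “four-nines” figure concrete, the arithmetic below converts an availability percentage into an allowed-downtime budget. It is a back-of-the-envelope sketch. The helper name is hypothetical, a 365-day year and a 30-day month are assumed, and the actual measurement window, exclusions and definition of downtime are set by Google’s SLA terms, which it does not model.

```python
MINUTES_PER_YEAR = 365 * 24 * 60          # 525,600 (ignoring leap years)
MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60   # 43,200


def downtime_budget_minutes(availability_pct: float, period_minutes: int) -> float:
    """Maximum downtime permitted over a period at the given availability percentage."""
    return period_minutes * (1 - availability_pct / 100)


for pct in (99.9, 99.95, 99.99):
    yearly = downtime_budget_minutes(pct, MINUTES_PER_YEAR)
    monthly = downtime_budget_minutes(pct, MINUTES_PER_30_DAY_MONTH)
    print(f"{pct}%: {yearly:.1f} min/year, {monthly:.2f} min per 30-day month")
# The 99.99% line reads: 52.6 min/year, 4.32 min per 30-day month
```

Each extra nine cuts the downtime budget by a factor of ten, which is why moving from 99.9% to 99.99% typically requires features such as automatic failover rather than just better hardware.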
On the Oracle side, Google Cloud customers will be able to run Big Red’s Exadata and Autonomous Database in Google data centers, and run Oracle Database and Oracle Applications on Google Compute Engine virtual machines.
Bring Out Your Workloads
Every company in the data/analytics world is adding GenAI features, and with these announcements, Google is making clear it’s no exception.
But the full complement of announcements is clearly meant to enhance Google Cloud’s appeal for every variety of data workload, and it seems aimed at getting customers of other clouds and data platforms to migrate to Google Cloud.
As an overt manifestation of this aim, Google Cloud is offering significant data migration incentives, designed specifically to help customers move their data warehouse and data lake/lakehouse workloads to BigQuery and Cloud Dataproc (Google’s open-source data lake platform), respectively.
These incentives include up to 100% first-year Google Cloud Credits and up to $250K in data egress credits, so customers can run old and migrated systems in parallel without additional cost. They also include migration services implementation credits for tapping Google’s expertise and tools.
While Google Cloud may be in third place among the big hyperscalers generally, its position in the data race has been far more competitive. In an era when generative AI is so high on the enterprise priority list, and GenAI efforts can succeed only on a solid data and analytics foundation, Google Cloud’s data bona fides may be key to its advancing in the cloud wars overall.
Will data win the day and propel Google Cloud to new heights? Google clearly sees merit in that bet, given both the scalability and the capabilities of its data cloud.
The post Google Cloud Adds GenAI, Core Enhancements Across Data Platform appeared first on The New Stack.
For its Next Tokyo '24 event, Google Cloud is adding GenAI and core enhancements across BigQuery, Spanner, Looker, Cloud SQL, and Bigtable.