Google fuses SQL, Python, and Spark in Colab Enterprise push • The Register

Machine Learning


Google integrates SQL, Python and Apache Spark in one place, promising a single notebook environment for machine learning and data analysis.

Other prominent vendors in the data science and analytics market are also trying to close the divide between SQL-based analysis and the machine learning workbench.

Yasmeen Ahmad, managing director of Google Data Cloud, said the biggest barrier to data science productivity is context switching: retrieving data from a database or data warehouse in SQL, exporting it and loading it into a Python notebook for machine learning, and configuring a separate Spark cluster along the way. After that, users may switch to BI tools just to visualize the results, she said.

“Our priority is not only to deliver predictive models, but to eliminate this friction by creating a single, intelligent environment architected for how teams need to design, build, and deploy.”

To that end, Google is previewing a raft of extensions to Colab Enterprise notebooks, which work with its BigQuery data warehouse and Vertex AI machine learning platform.

Within Colab Enterprise notebooks, Google is previewing native SQL cells, so users can explore data in SQL, view the results as BigQuery DataFrames (Pythonic dataframes backed by BigQuery's engine and machine learning APIs), and build models in Python. The Chocolate Factory is also previewing interactive visualization cells that generate editable charts in the same environment, breaking down the barriers between SQL, Python, and visualization, the vendor claims.

Also within Colab Enterprise notebooks, Google offers a data science agent, which it claims has been enhanced to fold tool use into detailed planning, such as calling BigQuery ML for training and inference, BigQuery DataFrames for analysis in Python, or large-scale Spark transformations (now in preview). Google announced BigQuery support for Apache Spark in 2022.

Google is not the only vendor trying to bridge the gap between data analytics and machine learning. Cloud data platform Snowflake, for example, introduced Snowpark Connect in August, based on Spark Connect from the Apache Spark community. Spark Connect employs a client-server architecture that allows client applications to connect to remote Spark clusters.

According to Snowflake, instead of managing a separate Spark cluster, Spark users can run their code from clients tied directly to its analytics engine, allowing modern Spark DataFrame, Spark SQL, and user-defined function code to run inside Snowflake.
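Spark Connect itself is public PySpark API (3.4 and later), so the client-server pattern that Snowpark Connect builds on can be sketched as below. The endpoint and helper name are placeholders for illustration, not anything Snowflake or Google ship:

```python
def connect_to_remote_spark(endpoint: str = "sc://localhost:15002"):
    """Build a Spark session against a remote Spark Connect endpoint.

    Requires pyspark 3.4+ with the connect extras installed; the default
    endpoint here is a placeholder, not a real server.
    """
    # Deferred import so the sketch is readable without pyspark installed.
    from pyspark.sql import SparkSession

    # The client holds no JVM or cluster state of its own; it speaks the
    # Spark Connect gRPC protocol to whatever serves this endpoint, which
    # is what lets a vendor swap its own engine in behind the wire format.
    return SparkSession.builder.remote(endpoint).getOrCreate()
```

The design point is that the heavy engine lives behind the endpoint, so the client application no longer needs to manage a Spark cluster itself.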

Databricks, meanwhile, moved in 2020 to bring SQL support to its data lake environment, which is built on Apache Spark.



