Snowflake Summit 2023: What to Expect

Since its launch in 2014, Snowflake has been one of the most successful companies in the data industry, attracting customers such as Adobe, AXA, BlackRock, Novartis, and Sainsbury’s, and reaching over $2 billion in annual revenue.

At its core, Snowflake was originally a cloud-native data warehouse that separated compute and storage, with great documentation, a clean and user-friendly interface, and pay-as-you-go pricing. It has since grown to offer a host of other tools via its platform.

These include a data marketplace positioned as a neutral home for cloud users to access third-party data, add their own datasets, perform analysis, and present results via various visualization tools, as well as growing tool sets for data scientists and developers to build and deploy data-intensive applications. (Snowflake’s position in the market is nicely captured in the chart below from its 2023 earnings presentation…)

What to see at Snowflake Summit 2023

As CEO Frank Slootman said on the company’s most recent earnings call in May, “Snowflake’s mission is to steadily break down all limits to data users, workloads, applications and new forms of intelligence.” The platform handles clustering, governance, metadata management, partitioning, security and more in a cloud-agnostic SaaS, removing the heavy lifting for users.

Snowflake has been aggressive with acquisitions to realize that vision, agreeing to four acquisitions this year alone, and has worked to counter long-standing criticisms: weak handling of unstructured data, and a lack of good support for the kinds of languages and libraries that data scientists want to use. (Snowflake has made its unstructured data management capabilities generally available, and in early 2022 it added Python tooling. This year it added PyTorch and MLflow plugins, both initially in private preview…)

The Stack will take a closer look at Snowflake’s expanding set of services and the data ecosystem around it at the Snowflake Summit in Las Vegas, starting June 26. Ahead of the event, we sat down with Industry Field CTO Fawad Qureshi to discuss Snowflake’s evolution and the company’s efforts: recent acquisitions, new product suites, challenges and more.

Bringing your applications to the data

As Qureshi puts it, Snowflake helps customers (at least those willing to make the organizational changes required to really deliver on the technology’s promise) break down internal data silos, letting users scale in seconds and share terabytes of data between Snowflake accounts. Later, it added functionality for data sharing between organizations via the Snowflake Marketplace. The next big focus is applications, or as he tells The Stack: “We are working to eliminate application silos.”

Snowflake’s “native application framework” entered private preview in June 2022. The big idea is to make Snowflake the go-to home not just for data warehousing (a term the company is now distancing itself from in favor of “data platform”) but for application development, deployment, and even sales.


Announcing the framework, the company quoted Bucky Moore, a partner at venture capital firm Kleiner Perkins: “Applications using Snowflake core features such as UDFs and stored procedures, as well as the Streamlit integration (currently in development), can be brought to the data cloud via the Snowflake Marketplace and sold to all Snowflake customers.” Moore has also argued that “the natural next step is to use this foundation [the cloud data warehouse] to build full-featured applications that both read and write to the warehouse,” while Andreessen Horowitz partner Martin Casado predicted in 2021 that “every app will be reimplemented on top of a data layer…”

In theory, customers who build their applications on Snowflake’s infrastructure get near-unlimited scalability (expansive, but not infinite), native governance and security, and execution “across all clouds and regions supported by the Snowflake Data Cloud.”

This will help with integration as well as compliance, Qureshi suggests, pointing to GDPR’s “explainable AI” requirements: companies that use personal data for automated decision-making must be able to explain how those decisions are made. “It’s much easier to build a lineage for explainable AI applications,” the field CTO emphasized to The Stack on a June 16 call.

There is much to do first. Cloud data warehouses need to improve streaming data ingestion and processing for applications that provide, or rely on, real-time analytics rather than batch-based pipelines. And Snowflake’s native application framework is currently only in public preview on AWS. It is not clear whether there will be an update on this next week, but expect the theme to be supporting application development on Snowflake, especially AI application development. The Stack will bring the details…

Snowflake Summit 2023 Announcement

But what Qureshi is particularly excited about is geospatial data.

“At Snowflake Summit 2023, we will be talking about many features in this area, not just positional components,” he says.

“The worlds of GIS technology and data analytics remain completely separate,” he says, noting that many GIS tools were not built for cloud data warehouse-scale analysis. (“I was using a piece of GIS software, whose name I will withhold, and it gave me an error saying it could not connect to a table with more than 250,000 rows. When I queried the rows, the query returned in six minutes; that’s a completely different scale,” he emphasizes.)

“How to combine the world of geospatial analytics with ‘traditional’ data analytics to deliver more value will be a very exciting area over the next two to three years. We intend to invest in this space with our ecosystem partners and encourage more and more users to adopt geolocation components in their data analytics workloads…”
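Combining the two worlds can be as simple as using a geospatial predicate inside an ordinary SQL aggregation. A minimal sketch of the idea in Snowflake SQL, where `ST_DWITHIN` is part of Snowflake’s geospatial function set but the `stores` and `customers` tables and their columns are hypothetical, invented purely for illustration:

```sql
-- Hypothetical schema: stores(store_id, location GEOGRAPHY),
-- customers(customer_id, location GEOGRAPHY).
-- Count customers within 5 km of each store: a geospatial join
-- predicate feeding a "traditional" GROUP BY aggregation.
SELECT s.store_id,
       COUNT(c.customer_id) AS nearby_customers
FROM stores s
JOIN customers c
  ON ST_DWITHIN(s.location, c.location, 5000)  -- distance in metres
GROUP BY s.store_id
ORDER BY nearby_customers DESC;
```

The appeal of doing this in the warehouse rather than a desktop GIS tool is exactly the scale point Qureshi makes: the same query shape works whether the tables hold thousands of rows or billions.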

Snowflake: Addressing Costs and More…

Snowflake is evolving rapidly. But the world is not standing still either, and critics are keen to point out that costs can grow quickly. Qureshi said Snowflake is working closely with customers to support their FinOps requirements, and the company offers a variety of tools for proactive cost control.

(“Administrators can set usage quotas, and Resource Monitors can send alerts and automatically suspend accounts when those quotas are reached. If that approach is too radical, administrators can instead have warnings sent as quotas are approached… Auto-suspend policies turn a virtual warehouse off shortly after it goes idle, so resources cost nothing when they are not running, and auto-resume turns the warehouse back on when needed. Administrators can also set time thresholds to limit long-running queries, session or statement timeouts limit accidental usage, and reviewing access history may identify unused tables that can be dropped to reduce costs…”)
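In Snowflake’s SQL, these controls map onto resource monitors and warehouse settings. A minimal sketch, where the monitor and warehouse names (`monthly_quota`, `my_wh`) and the quota and timeout values are illustrative, not a recommendation:

```sql
-- Resource monitor: warn at 80% of the monthly credit quota,
-- suspend the attached warehouse(s) at 100%.
CREATE RESOURCE MONITOR monthly_quota
  WITH CREDIT_QUOTA = 100
  TRIGGERS ON 80 PERCENT DO NOTIFY
           ON 100 PERCENT DO SUSPEND;

-- Attach the monitor and enable auto-suspend/auto-resume, so the
-- warehouse powers down after 60 seconds idle and costs nothing
-- until the next query wakes it.
ALTER WAREHOUSE my_wh SET
  RESOURCE_MONITOR = monthly_quota,
  AUTO_SUSPEND = 60,
  AUTO_RESUME = TRUE;

-- Cap runaway queries in this session at 30 minutes.
ALTER SESSION SET STATEMENT_TIMEOUT_IN_SECONDS = 1800;
```

Statement timeouts can also be set at the warehouse or account level, so guardrails apply even to users who never touch session parameters.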

As the same Bucky Moore of Kleiner Perkins, quoted on Snowflake’s native application framework blog, wrote separately earlier this year: “The separation of storage and compute with BigQuery and Snowflake has revolutionized the industry. However, it can incur unexpectedly high costs, and comes at the price of lock-in to custom storage formats.

“Many people are also beginning to realize that they do not have the ‘big data’ to warrant distributed computing in the first place. These factors, I believe, are contributing to the emergence of new, unbundled OLAP architectures, in which data is stored directly in open-source formats like Hudi and Iceberg, which handle indexing and provide transactional guarantees, and is queried by distributed query engines like Trino, or in-process with DuckDB.

“This allows us to apply the right storage, indexing, and query technology to each use case based on cost, performance, and operational requirements. This is why I’m particularly excited about DuckDB’s in-process columnar analytics experience. At the same time, open source projects such as DataFusion, Polars, and Velox have enabled the development of query engines for use cases previously considered too niche to build for. And as the industry standardizes on Arrow for in-memory data representation, the challenge of sharing data across these new platforms is being resolved. We expect this to drive rapid innovation in analytical databases by commoditizing query execution, which was a key driver of Snowflake’s success.

“The success of this architecture seems likely to chip away at market share for cloud data warehouses…” he wrote.
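The unbundled pattern Moore describes can be sketched in a few lines of DuckDB SQL: the engine runs in-process and reads open file formats directly, with no warehouse in the picture. The file path and column names below are hypothetical, chosen only to show the shape of the approach:

```sql
-- DuckDB, in-process: scan Parquet files straight off disk (or
-- object storage) and aggregate, with no cluster to provision.
SELECT category,
       SUM(amount) AS total
FROM read_parquet('events/*.parquet')   -- hypothetical dataset
GROUP BY category
ORDER BY total DESC;
```

The same files, registered as an Iceberg or Hudi table, could equally be queried by a distributed engine like Trino when the data outgrows a single machine, which is precisely the mix-and-match flexibility the unbundled architecture promises.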

It’s an avant-garde approach, and one that assumes far more engagement with early open source toolkits than many business leaders would like at this stage; Snowflake’s ability to abstract away that complexity will likely continue to appeal to many. But it’s still early days: management believes it can reach $10 billion in product revenue in fiscal 2029 and plans to add another 1,000 employees this year.

Don’t miss it. Subscribe to The Stack now.
