Gemini, Google's family of generative AI models, can now analyze longer documents, codebases, videos, and audio recordings than ever before.
In its keynote at the Google I/O 2024 developer conference on Tuesday, Google announced a private preview of a new version of its current flagship model, Gemini 1.5 Pro, that can take in up to 2 million tokens, double the previous maximum. That makes it the largest input of any commercially available model; the next largest, Anthropic's Claude 3, tops out at 1 million tokens.
In the AI field, “tokens” are subdivided bits of raw data, like the syllables “fan,” “tas” and “tic” in the word “fantastic.” Two million tokens is equivalent to roughly 1.4 million words, two hours of video or 22 hours of audio.
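As a back-of-the-envelope sanity check on those figures (the words-per-token ratio below is an illustrative assumption; real tokenizers vary by language and content):

```python
# Rough conversions for a 2-million-token context window.
# WORDS_PER_TOKEN (~0.7) is an illustrative approximation derived from
# the "2 million tokens ≈ 1.4 million words" figure, not a tokenizer spec.
CONTEXT_TOKENS = 2_000_000
WORDS_PER_TOKEN = 0.7

approx_words = int(CONTEXT_TOKENS * WORDS_PER_TOKEN)
print(f"{CONTEXT_TOKENS:,} tokens is roughly {approx_words:,} words")
```

In practice the ratio depends heavily on the tokenizer and the material; code and non-English text usually consume more tokens per word.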
Beyond being able to analyze large files, models that can take in more tokens can sometimes achieve improved performance.
Unlike models with a small maximum token input (also known as context), 2-million-token-input models such as Gemini 1.5 Pro are less likely to “forget” the content of recent conversations or veer off topic. Large-context models can also, at least hypothetically, better grasp the flow of the data they take in and generate contextually richer responses.
Developers interested in trying Gemini 1.5 Pro with the 2-million-token context can add their name to the waitlist in Google AI Studio, Google's generative AI development tool. (Gemini 1.5 Pro with the 1-million-token context becomes generally available next month across Google's developer services and surfaces.)
Beyond the larger context window, Google says Gemini 1.5 Pro has been “enhanced” over the past few months through algorithmic improvements. According to Google, it is better at code generation, logical reasoning and planning, multi-turn conversation, and audio and image understanding. In the Gemini API and AI Studio, 1.5 Pro can also reason across audio in addition to images and video, and it can be “steered” through a capability called system instructions.
Gemini 1.5 Flash, a faster model
For less demanding applications, Google is launching Gemini 1.5 Flash, a “distilled” version of Gemini 1.5 Pro, in public preview. A small, efficient model built for narrow, high-frequency generative AI workloads, Flash is multimodal just like Gemini 1.5 Pro, with a context window of up to 2 million tokens. That means it can analyze audio, video and images as well as text (though it generates only text).
“Gemini Pro is well-suited to more general or complex, often multi-step reasoning tasks,” Josh Woodward, VP of Google Labs, one of Google's experimental AI divisions, told reporters at a press briefing. “[But] as a developer, you really want to use [Flash] where the speed of the model output matters.”
Woodward added that Flash is particularly well-suited for tasks such as summarization, chat apps, image and video captioning, and data extraction from long documents and tables.
Flash appears to be Google's answer to small, low-cost models served through APIs such as Anthropic's Claude 3 Haiku. Along with Gemini 1.5 Pro, it is very widely available, launching today in more than 200 countries and territories, including the European Economic Area, the U.K. and Switzerland. (The 2-million-token context version, however, is gated behind the waitlist.)
In another update aimed at cost-conscious developers, all Gemini models, not just Flash, will soon gain a feature called context caching, which lets developers cache large amounts of information (say, a knowledge base or a database of research papers) that Gemini models can then access quickly and relatively cheaply (from a per-usage cost perspective).
Also landing in public preview today on Vertex AI, Google's enterprise-focused generative AI development platform, is the Batch API, which offers a more cost-effective way to handle workloads such as classification, sentiment analysis, data extraction and description generation by letting multiple prompts be sent to Gemini models in a single request.
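The batching pattern itself is simple to picture. The payload shape below is an assumption for illustration, not the actual Vertex AI Batch API format: several prompts are bundled into one request body instead of one call per prompt.

```python
import json

# Illustrative sketch of request batching (payload shape is assumed,
# not Vertex AI's actual wire format): N prompts travel in one request.
def build_batch_request(model: str, prompts: list[str]) -> str:
    payload = {
        "model": model,
        "requests": [{"prompt": p} for p in prompts],
    }
    return json.dumps(payload)

batch = build_batch_request(
    "gemini-1.5-pro",
    ["Classify the sentiment of: ...", "Extract all dates from: ..."],
)
print(batch)
```

For high-volume, non-interactive jobs like sentiment analysis over millions of records, this amortizes per-request overhead across the whole batch.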
Another new feature coming to Vertex in preview later this month, controlled generation, lets users define Gemini model output according to a specific format or schema (such as JSON or XML), which Woodward suggests could drive costs down further.
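The idea can be sketched from the consumer's side. The schema syntax below is a stand-in for illustration, not Vertex AI's actual schema format: the caller declares the fields and types the output must carry, and the response can then be checked mechanically instead of parsed from free-form prose.

```python
import json

# Illustrative sketch of schema-constrained output (schema syntax here
# is an assumption, not Vertex AI's actual format). The caller declares
# required fields and types; responses are validated mechanically.
SCHEMA = {"title": str, "year": int}

def validate(raw_output: str, schema: dict) -> dict:
    """Parse a JSON model response and check it against the schema."""
    data = json.loads(raw_output)
    for field, ftype in schema.items():
        if not isinstance(data.get(field), ftype):
            raise ValueError(f"field {field!r} is not a {ftype.__name__}")
    return data

# A well-formed (hypothetical) model response passes validation:
print(validate('{"title": "Gemini 1.5 Pro", "year": 2024}', SCHEMA))
```

When the model itself is constrained to emit this shape, downstream code can drop retry loops and re-prompting for malformed output, which is where the cost savings come from.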
“You can send all of your files to the model once and not have to resend them over and over again,” Woodward said. “This should make the long context [in particular] way more useful, and also more affordable.”