Google releases Gemini Embedding 2 for multimodal AI applications

Google has released Gemini Embedding 2, a multimodal embedding model built on the Gemini architecture.

This model extends beyond previous text-only embedding systems by mapping text, images, video, audio, and documents into a single unified embedding space. It captures semantic meaning across more than 100 languages and supports AI tasks such as retrieval-augmented generation (RAG), semantic search, sentiment analysis, and data clustering.

Gemini Embedding 2

Gemini Embedding 2 uses the multimodal capabilities of the Gemini architecture to generate embeddings from different types of data.

This model supports interleaved multimodal input, allowing developers to combine inputs such as text and images in a single request. This allows the system to capture relationships between different media types and process datasets containing multiple formats.

Main features

Multimodal input support

  • Text: Supports up to 8,192 input tokens
  • Images: Processes up to 6 images per request in PNG and JPEG formats
  • Video: Supports up to 120 seconds of video input in MP4 and MOV formats
  • Audio: Processes audio directly, without the need for transcription
  • Documents: Supports embedding PDF files of up to 6 pages
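
The limits listed above can be checked on the client side before a request is sent. The following is a minimal sketch under stated assumptions: `EmbedRequest` and `check_limits` are hypothetical names invented for illustration and are not part of any Google SDK, and the numbers are simply the ones quoted in this article, which the actual API may enforce differently.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Limits as quoted in the article; the real service may differ.
MAX_TEXT_TOKENS = 8192
MAX_IMAGES = 6
MAX_VIDEO_SECONDS = 120
MAX_PDF_PAGES = 6
IMAGE_FORMATS = {"png", "jpeg"}
VIDEO_FORMATS = {"mp4", "mov"}


@dataclass
class EmbedRequest:
    """Hypothetical summary of one embedding request (not an SDK type)."""
    text_tokens: int = 0
    image_formats: List[str] = field(default_factory=list)
    video_seconds: float = 0.0
    video_format: Optional[str] = None
    pdf_pages: int = 0


def check_limits(req: EmbedRequest) -> List[str]:
    """Return a list of limit violations; an empty list means the request fits."""
    problems = []
    if req.text_tokens > MAX_TEXT_TOKENS:
        problems.append(f"text: {req.text_tokens} tokens exceeds {MAX_TEXT_TOKENS}")
    if len(req.image_formats) > MAX_IMAGES:
        problems.append(f"images: {len(req.image_formats)} exceeds {MAX_IMAGES}")
    # Report each unsupported image format once, regardless of how often it occurs.
    for fmt in sorted({f.lower() for f in req.image_formats}):
        if fmt not in IMAGE_FORMATS:
            problems.append(f"images: unsupported format '{fmt}'")
    if req.video_seconds > MAX_VIDEO_SECONDS:
        problems.append(f"video: {req.video_seconds}s exceeds {MAX_VIDEO_SECONDS}s")
    if req.video_format is not None and req.video_format.lower() not in VIDEO_FORMATS:
        problems.append(f"video: unsupported format '{req.video_format}'")
    if req.pdf_pages > MAX_PDF_PAGES:
        problems.append(f"pdf: {req.pdf_pages} pages exceeds {MAX_PDF_PAGES}")
    return problems


# Example: a request that combines text with two images stays within the limits.
request = EmbedRequest(text_tokens=500, image_formats=["png", "jpeg"])
print(check_limits(request))
```

Returning a list of messages rather than raising on the first problem lets a caller report every violation in a mixed-media request at once.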

Interleaved multimodal input

This model can handle multiple media types within a single request, allowing for contextual understanding between inputs such as images and text.

Matryoshka Representation Learning (MRL)

Gemini Embedding 2 uses Matryoshka Representation Learning, which allows the embedding vector to be scaled down to smaller dimensions. The default dimension is 3,072, and developers can reduce the size to manage storage and performance requirements.
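
To illustrate the storage trade-off, here is a minimal sketch of how a Matryoshka-style embedding is commonly shortened: keep the leading components and rescale the result to unit length. The helper names, the stand-in vector, and the target size of 768 are assumptions chosen for illustration. The article does not say whether the model returns normalized vectors or which sizes it recommends, and in practice the API may let developers request a smaller size directly.

```python
import math


def truncate_embedding(vec, dim):
    """Keep the first `dim` components and rescale them to unit length."""
    if not 0 < dim <= len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return list(head)  # an all-zero vector cannot be normalized
    return [x / norm for x in head]


def storage_bytes(num_vectors, dim, bytes_per_value=4):
    """Raw storage for float32 vectors, ignoring any index overhead."""
    return num_vectors * dim * bytes_per_value


# Stand-in for a real 3,072-dimensional model output (values are made up).
full = [math.sin(i + 1) for i in range(3072)]
small = truncate_embedding(full, 768)

print(len(small))                         # 768
print(storage_bytes(1_000_000, 3072))     # 12288000000 bytes at the default size
print(storage_bytes(1_000_000, 768))      # 3072000000 bytes after truncation
```

For one million vectors stored as 32-bit floats, dropping from 3,072 to 768 dimensions reduces raw storage from about 12.3 GB to about 3.1 GB, at some cost in retrieval quality that should be measured on the target dataset.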

Recommended output dimensions:

Model features

According to Google, this model introduces multimodal embedding support across text, image, video, and audio tasks, and also adds native audio processing capabilities.

Supported use cases

  • Retrieval-augmented generation (RAG)
  • Semantic search
  • Sentiment analysis
  • Data clustering
  • Large-scale data management
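
As an illustration of the semantic search use case, the sketch below ranks items of different media types against a query by cosine similarity, which is possible because all of them share one embedding space. The file names and the 4-dimensional vectors are made up for the example; real vectors would come from the model and would be far longer. A production system would typically delegate this ranking to a vector database rather than a Python loop.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query_vec, index, top_k=2):
    """Return the `top_k` (item_id, score) pairs most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), item_id)
              for item_id, vec in index.items()]
    scored.sort(reverse=True)
    return [(item_id, score) for score, item_id in scored[:top_k]]


# Toy index mixing an image, a document, and an audio file (vectors are made up).
index = {
    "photo_of_cat.jpg": [0.9, 0.1, 0.0, 0.1],
    "cat_care_guide.pdf": [0.8, 0.2, 0.1, 0.0],
    "engine_noise.mp3": [0.0, 0.1, 0.9, 0.3],
}

# Stand-in for the embedding of a text query such as "cats".
query = [1.0, 0.1, 0.0, 0.0]

results = search(query, index)
for item_id, score in results:
    print(item_id, round(score, 3))
```

With these toy values the image and the PDF rank above the audio file, since their vectors point in nearly the same direction as the query.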

Availability

Gemini Embedding 2 is available in public preview through the Gemini API and Vertex AI. Developers can also access the model through integrations with frameworks and vector database tools, including:

  • LangChain
  • LlamaIndex
  • Haystack
  • Weaviate
  • Qdrant
  • ChromaDB

This model can also be used with vector search systems for multimodal data processing.


