Haystack (deepset) Blog · Infra · ~3 min read

Multimodal Search with Gemini Embedding 2 in Haystack
Bilge Yücel (DevRel Engineer) and Stefano Fiorucci (AI/Software Engineer) · March 10, 2026
Build multimodal search systems in Haystack using Gemini Embedding 2 to embed text, images, video, audio, and PDFs in a shared vector space.

Embeddings are the backbone of modern AI applications, from semantic search and recommendation systems to Retrieval-Augmented Generation (RAG). However, most embedding models operate in a single modality, typically focusing only on textual data. Google has introduced Gemini Embedding 2, a fully multimodal embedding model that maps text, images, video, audio, and PDFs into a shared vector space. This means you can search across different types of data using a single embedding model: gemini-embedding-2-preview.

Even better, Haystack supports Gemini Embedding 2 from day 0. Through the Google GenAI x Haystack integration, you can immediately start using the model in your Haystack applications for both text and multimodal embeddings. Let's take a closer look.

Meet Gemini Embedding 2

Gemini Embedding 2 is Google's first fully multimodal embedding model, built on the Gemini architecture. It maps text, images, video, audio, and PDFs into a single unified vector space, enabling cross-modal comparison and retrieval through a shared semantic representation. For example, a text query can retrieve relevant images, an audio clip can match a document, or a video segment can be found using text search. This unified representation makes it easier to build multimodal applications like image search, recommendation systems, and RAG.

The model supports 100+ languages and lets developers choose flexible embedding sizes using Matryoshka Representation Learning (MRL). Depending on the trade-off between storage and accuracy, you can select embedding dimensions up to 3072, with commonly recommended sizes being 768, 1536, or 3072 (the default).
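MRL-trained models pack the most important information into the leading dimensions, so a smaller embedding is obtained by truncating the full vector and renormalizing it to unit length. The sketch below illustrates that mechanic with a synthetic vector; the helper name and the random vector are illustrative, not part of the Gemini API:

```python
import numpy as np

def truncate_mrl(embedding: np.ndarray, dim: int) -> np.ndarray:
    """Keep the leading `dim` components and renormalize to unit length."""
    v = embedding[:dim]
    return v / np.linalg.norm(v)

rng = np.random.default_rng(0)
full = rng.normal(size=3072)
full /= np.linalg.norm(full)      # stand-in for a full 3072-dim embedding

small = truncate_mrl(full, 768)   # compact 768-dim variant for cheaper storage
print(small.shape)                # (768,)
```

Because both the full and the truncated vectors are unit-normalized, cosine similarity scores remain directly comparable at either size; you trade a little accuracy for a 4x storage reduction.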
Gemini Embedding 2 also supports large inputs of up to 8192 tokens, making it suitable for embedding longer documents and complex multimodal inputs. Early benchmarks indicate strong performance across modalities, including a top-5 ranking on the MTEB Multilingual leaderboard for text and state-of-the-art results among proprietary models, with document retrieval performance comparable to Voyage. Check out the official Google documentation for more details.

Using Gemini Embeddings in Haystack

Haystack provides built-in components for generating Gemini embeddings through the Gemini API and Vertex AI. For text data, you can use GoogleGenAITextEmbedder and GoogleGenAIDocumentEmbedder. The GoogleGenAIDocumentEmbedder is typically used during indexing to embed documents before storing them in a vector database.

```python
# pip install haystack-ai google-genai-haystack
from datasets import load_dataset
from haystack import Document
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack_integrations.components.embedders.google_genai import (
    GoogleGenAIDocumentEmbedder,
    GoogleGenAITextEmbedder,
)

document_store = InMemoryDocumentStore(embedding_similarity_function="cosine")

docs = [
    Document(content="The capybara is the largest rodent in the world and is native to South America, where it lives near rivers, lakes, and wetlands. It is highly social and often seen relaxing in groups, spending much of its time swimming or soaking in water. Capybaras communicate through whistles, barks, and purr-like sounds."),
    Document(content="Dogs are domesticated mammals known for their loyalty, intelligence, and strong bond with humans. They have been bred for thousands of years for roles such as companionship, hunting, guarding, and assisting people with various tasks. Different…
```
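Once documents and queries live in the same vector space, retrieval reduces to nearest-neighbor search by cosine similarity, which is what the document store configured above does under the hood. The sketch below shows that ranking step with hand-made stand-in vectors; in a real pipeline the vectors would come from GoogleGenAITextEmbedder and GoogleGenAIDocumentEmbedder, and the 3-dim vectors here are purely illustrative:

```python
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stand-ins for document embeddings (real ones come from the document embedder)
doc_vectors = {
    "capybara": np.array([0.9, 0.1, 0.0]),
    "dog":      np.array([0.1, 0.9, 0.1]),
}
# Stand-in for an embedded query such as "largest rodent"
query = np.array([0.85, 0.15, 0.05])

# Rank documents by similarity to the query, highest first
ranked = sorted(doc_vectors, key=lambda k: cosine_sim(query, doc_vectors[k]), reverse=True)
print(ranked[0])  # capybara
```

Because Gemini Embedding 2 places every modality in this one space, the same ranking logic works unchanged whether the stored vectors came from text, images, audio, video, or PDFs.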
#gemini #multimodal #embeddings