Building High-Performance Vector Search Systems with VertexFSVectorStore: A Comprehensive Guide to Low-Latency Embedding Retrieval on Google Cloud
In the rapidly evolving landscape of AI applications, efficient vector search has become a critical component for production systems. Whether you’re building a semantic search engine, recommendation system, or a RAG (Retrieval-Augmented Generation) application, the ability to quickly retrieve relevant embeddings at scale is essential. Google Cloud’s VertexFSVectorStore offers a powerful solution for implementing high-performance vector search with low latency. In this comprehensive guide, we’ll explore how to leverage this technology to build robust, production-ready AI applications.
Understanding VertexFSVectorStore
VertexFSVectorStore is a vector store implementation from the langchain-google-community package that combines BigQuery as a durable storage layer with the low-latency online serving of Vertex AI Feature Store. This hybrid approach offers a practical balance between data durability and retrieval performance.
What makes VertexFSVectorStore particularly valuable for production applications is its architecture:
- BigQuery serves as the underlying source of truth for your vector data
- Vertex AI Feature Store provides optimized, low-latency retrieval of embeddings
- The system supports similarity search, filtering, and nearest neighbor retrieval by ID
Let’s dive into how you can implement this solution for your own applications.
Setting Up VertexFSVectorStore
To get started with VertexFSVectorStore, you’ll need to initialize it with the appropriate parameters. Here’s a basic example:
from langchain_google_community.bq_storage_vectorstores.featurestore import VertexFSVectorStore
from langchain_openai import OpenAIEmbeddings
# Initialize the embedding model
embedding_model = OpenAIEmbeddings()
# Initialize the vector store
vector_store = VertexFSVectorStore(
    embedding=embedding_model,
    project_id="your-gcp-project-id",
    dataset_name="your-bigquery-dataset",
    table_name="your-bigquery-table",
    location="us-central1",
    embedding_dimension=1536  # Dimension of your embeddings (e.g., OpenAI's text-embedding-ada-002 is 1536)
)
This creates a vector store that uses BigQuery as the durable storage layer and Vertex AI Feature Store for low-latency serving. Synchronization between the two can run on a schedule or be triggered manually; both options are covered later in this guide.
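The example above uses OpenAI embeddings; if you'd rather keep the whole stack on Google Cloud, a Vertex AI embedding model can be swapped in. The snippet below is a minimal sketch, assuming the langchain-google-vertexai package is installed and the text-embedding-004 model (768-dimensional by default) is available in your project:
from langchain_google_vertexai import VertexAIEmbeddings
# Google-hosted embedding model; the model name and dimension are assumptions,
# check which embedding models are available in your region
embedding_model = VertexAIEmbeddings(model_name="text-embedding-004")
vector_store = VertexFSVectorStore(
    embedding=embedding_model,
    project_id="your-gcp-project-id",
    dataset_name="your-bigquery-dataset",
    table_name="your-bigquery-table",
    location="us-central1",
    embedding_dimension=768  # text-embedding-004 returns 768-dimensional vectors
)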
Adding Documents to the Vector Store
Once your vector store is set up, you can add documents to it. VertexFSVectorStore provides several methods for adding data:
Adding Documents from Scratch
from langchain.schema import Document
# Create some example documents
documents = [
    Document(page_content="This is the first document", metadata={"source": "doc1"}),
    Document(page_content="This is the second document", metadata={"source": "doc2"}),
    Document(page_content="This is the third document", metadata={"source": "doc3"})
]
# Add documents to the vector store
doc_ids = vector_store.add_documents(documents)
print(f"Added documents with IDs: {doc_ids}")
Adding Texts with Metadata
# Add texts with metadata
texts = [
    "This is the first text",
    "This is the second text",
    "This is the third text"
]
metadatas = [
    {"category": "general", "priority": "high"},
    {"category": "technical", "priority": "medium"},
    {"category": "business", "priority": "low"}
]
# Add texts to the vector store
text_ids = vector_store.add_texts(texts=texts, metadatas=metadatas)
print(f"Added texts with IDs: {text_ids}")
Adding Pre-computed Embeddings
If you've already computed embeddings for your texts, you can add them directly with add_texts_with_embeddings:
# Add pre-computed embeddings
texts = ["Pre-computed text 1", "Pre-computed text 2"]
embeddings = [
    [0.1, 0.2, 0.3, ...],  # 1536-dimensional vector
    [0.4, 0.5, 0.6, ...]   # 1536-dimensional vector
]
metadatas = [{"source": "precomputed1"}, {"source": "precomputed2"}]
# Add to the vector store; document IDs are generated and returned
ids = vector_store.add_texts_with_embeddings(
    texts=texts,
    embs=embeddings,
    metadatas=metadatas
)
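The IDs returned by these methods can be reused later to look documents up or remove them. A rough sketch follows; the get_documents and delete calls are assumptions based on the base BigQuery vector store interface, so verify the exact signatures against the version you install:
# Assumed API: fetch stored documents back by ID
docs = vector_store.get_documents(ids=text_ids)
for doc in docs:
    print(doc.page_content, doc.metadata)
# Assumed API: remove documents that are no longer needed
vector_store.delete(ids=doc_ids)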
Performing Vector Search
VertexFSVectorStore supports several types of search operations, making it versatile for different use cases:
Basic Similarity Search
# Basic similarity search
query = "Find documents about technical topics"
docs = vector_store.similarity_search(query, k=3)
for doc in docs:
    print(f"Content: {doc.page_content}")
    print(f"Metadata: {doc.metadata}")
    print("---")
Similarity Search with Scores
# Search with relevance scores
query = "Find high priority items"
docs_with_scores = vector_store.similarity_search_with_score(query, k=3)
for doc, score in docs_with_scores:
    print(f"Content: {doc.page_content}")
    print(f"Metadata: {doc.metadata}")
    print(f"Relevance Score: {score}")
    print("---")
Filtering During Search
One of the powerful features of VertexFSVectorStore is the ability to filter results based on metadata:
# Search with metadata filtering
query = "Find technical documents"
filter_dict = {"category": "technical"}
docs = vector_store.similarity_search(
    query,
    k=5,
    filter=filter_dict
)
for doc in docs:
    print(f"Content: {doc.page_content}")
    print(f"Metadata: {doc.metadata}")
    print("---")
Maximal Marginal Relevance (MMR) Search
To balance relevance with diversity in your search results, you can use MMR:
# MMR search for diverse results
query = "Find documents about various topics"
docs = vector_store.max_marginal_relevance_search(
    query,
    k=5,             # Number of documents to return
    fetch_k=10,      # Number of documents to consider before reranking
    lambda_mult=0.5  # Diversity factor (0 = maximum diversity, 1 = maximum relevance)
)
for doc in docs:
    print(f"Content: {doc.page_content}")
    print(f"Metadata: {doc.metadata}")
    print("---")
Advanced Configuration for Production
For production deployments, VertexFSVectorStore offers several advanced configuration options:
Private Service Connect
If you’re working in a secure environment, you can enable Private Service Connect:
import grpc
from google.cloud.aiplatform_v1.services.feature_online_store_service.transports.grpc import FeatureOnlineStoreServiceGrpcTransport
# Create a gRPC channel to the Private Service Connect endpoint
# (plaintext gRPC over the private connection; replace the IP with your PSC address)
channel = grpc.insecure_channel("10.128.0.1:10002")
transport = FeatureOnlineStoreServiceGrpcTransport(channel=channel)
# Initialize with PSC enabled
vector_store = VertexFSVectorStore(
    embedding=embedding_model,
    project_id="your-gcp-project-id",
    dataset_name="your-bigquery-dataset",
    table_name="your-bigquery-table",
    location="us-central1",
    enable_private_service_connect=True,
    transport=transport,
    allowed_projects=["project-1", "project-2"]
)
Customizing Feature Store Configuration
You can customize the Feature Store configuration for your specific needs:
vector_store = VertexFSVectorStore(
    embedding=embedding_model,
    project_id="your-gcp-project-id",
    dataset_name="your-bigquery-dataset",
    table_name="your-bigquery-table",
    location="us-central1",
    online_store_name="my-custom-store",
    view_name="my-custom-view",
    cron_schedule="0 */6 * * *",  # Sync every 6 hours
    distance_measure_type="COSINE_DISTANCE"  # Use cosine distance instead of the default dot product
)
Data Synchronization
To ensure your Feature Store has the latest data from BigQuery, you can manually trigger synchronization:
# Sync data from BigQuery to Feature Store
vector_store.sync()
Converting to BigQueryVectorStore for Batch Operations
For batch processing scenarios, you might want to use BigQuery directly rather than going through the Feature Store. VertexFSVectorStore makes this easy:
# Convert to BigQueryVectorStore for batch operations
bq_vector_store = vector_store.to_bq_vector_store()
# Now you can use BigQueryVectorStore for batch operations
batch_results = bq_vector_store.similarity_search(
    "Batch query example",
    k=100  # Can process larger result sets
)
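The conversion also works in the other direction, which is handy when a batch job finishes and you want to return to low-latency serving. The method name below is assumed from the symmetry between the two stores; confirm it against your installed version:
# Assumed counterpart method: convert back to a Feature Store-backed store for online serving
fs_vector_store = bq_vector_store.to_vertex_fs_vector_store()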
Building a Retriever for LLM Applications
VertexFSVectorStore integrates seamlessly with LangChain’s retriever interface, making it easy to use in RAG applications:
# Create a retriever from the vector store
retriever = vector_store.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 5}
)
# Use the retriever in a chain
from langchain.chains import RetrievalQA
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model_name="gpt-4")
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever
)
# Query the system
response = qa_chain.run("What are the high priority items in our system?")
print(response)
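RetrievalQA is the classic chain interface; on recent LangChain versions you can wire the same retriever into an LCEL-style pipeline instead. A minimal sketch, where the prompt wording is just a placeholder and llm is the ChatOpenAI instance defined above:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
# Prompt that stuffs retrieved context into the question
prompt = ChatPromptTemplate.from_template(
    "Answer the question using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"
)
def format_docs(docs):
    # Join the retrieved documents into a single context string
    return "\n\n".join(doc.page_content for doc in docs)
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
print(rag_chain.invoke("What are the high priority items in our system?"))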
Performance Considerations and Best Practices
When implementing VertexFSVectorStore in production, keep these best practices in mind:
- Embedding Dimension: Choose an appropriate embedding dimension. Higher dimensions provide more expressiveness but consume more storage and processing resources.
- Indexing Strategy: Configure the indexing algorithm based on your needs. For most applications the default settings work well, but you can customize them for specific requirements (see the configuration sketch after this list).
- Sync Schedule: Set an appropriate sync schedule based on how frequently your data changes. More frequent syncs keep data fresh but consume more resources.
- Filtering Columns: Only specify the columns you'll actually use for filtering to optimize performance.
- Distance Metrics: Choose the right distance metric for your use case. DOT_PRODUCT_DISTANCE works well for normalized embeddings like those from OpenAI, while COSINE_DISTANCE might be better for other embedding types.
- Regional Deployment: Place your BigQuery dataset and Feature Store in the same region to minimize latency and data transfer costs.
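To make the indexing and filtering advice concrete, here is a hedged configuration sketch. The filter_columns, crowding_column, and algorithm_config parameter names are assumptions drawn from the Feature Store integration and may differ across versions, so treat this as a starting point rather than a definitive recipe:
# Hypothetical tuning example; parameter names are assumptions, verify them before use
vector_store = VertexFSVectorStore(
    embedding=embedding_model,
    project_id="your-gcp-project-id",
    dataset_name="your-bigquery-dataset",
    table_name="your-bigquery-table",
    location="us-central1",
    cron_schedule="0 */6 * * *",  # Keep the Feature Store in sync every 6 hours
    filter_columns=["category"],  # Expose only the metadata columns you actually filter on
    crowding_column="provider"    # Optional: limit how many near-duplicate results share one provider
    # algorithm_config can optionally hold a TreeAH index configuration object from the
    # Vertex AI SDK; its import path varies by SDK version, so it is omitted here
)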
Example: Complete RAG Application
Let’s put everything together in a complete example of a RAG application using VertexFSVectorStore:
from langchain_google_community.bq_storage_vectorstores.featurestore import VertexFSVectorStore
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain.chains import RetrievalQA
from langchain.schema import Document
# Initialize components
embedding_model = OpenAIEmbeddings()
llm = ChatOpenAI(model_name="gpt-4")
# Initialize vector store
vector_store = VertexFSVectorStore(
    embedding=embedding_model,
    project_id="your-gcp-project-id",
    dataset_name="your-bigquery-dataset",
    table_name="your-bigquery-table",
    location="us-central1",
    embedding_dimension=1536
)
# Add documents (in a real application, you would load these from a source)
documents = [
    Document(page_content="Google Cloud Platform offers a wide range of AI services including Vertex AI.",
             metadata={"category": "cloud", "provider": "google"}),
    Document(page_content="Azure provides machine learning services through Azure ML.",
             metadata={"category": "cloud", "provider": "microsoft"}),
    Document(page_content="AWS offers machine learning capabilities through SageMaker.",
             metadata={"category": "cloud", "provider": "amazon"})
]
# Add documents to the vector store
vector_store.add_documents(documents)
# Create a retriever
retriever = vector_store.as_retriever(
    search_type="similarity",
    search_kwargs={
        "k": 2,
        "filter": {"category": "cloud"}  # Only retrieve cloud-related documents
    }
)
# Create a QA chain
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever
)
# Query the system
response = qa_chain.run("What AI services does Google Cloud offer?")
print(response)
Conclusion
VertexFSVectorStore offers a powerful solution for implementing high-performance vector search in production AI applications on Google Cloud. By combining the storage capabilities of BigQuery with the low-latency serving of Vertex AI Feature Store, it provides an excellent balance between reliability and performance.
The flexibility to perform various types of searches, filter results based on metadata, and seamlessly integrate with LangChain’s retriever interface makes it a versatile choice for a wide range of applications, from semantic search to recommendation systems and RAG applications.
By following the best practices outlined in this guide and leveraging the advanced configuration options, you can build robust, scalable, and efficient vector search systems that meet the demands of production AI applications.
This post was originally written in my native language and then translated using an LLM. I apologize if there are any grammatical inconsistencies.