
Implementing Production-Grade RAG Systems with AWS MemoryDB and LangChain’s InMemoryVectorStore: A Complete Guide to Custom Schemas and Advanced Filtering

Building production-ready Retrieval Augmented Generation (RAG) applications requires a robust vector database that can handle high throughput, maintain low latency, and provide advanced filtering capabilities. AWS MemoryDB combined with LangChain’s InMemoryVectorStore offers a powerful solution for enterprise-grade RAG systems. This article explores how to implement such systems, focusing on custom schemas and advanced filtering techniques.

Introduction to AWS MemoryDB and LangChain Integration

AWS MemoryDB is a durable, in-memory database service compatible with Redis that provides ultra-fast performance with high availability and durability. When paired with LangChain’s InMemoryVectorStore, it becomes an excellent choice for production RAG applications that need to handle large volumes of vector embeddings with sophisticated query capabilities.

The InMemoryVectorStore class in LangChain provides a convenient interface for working with AWS MemoryDB as a vector database. It supports various connection methods, custom schemas, and advanced filtering options that make it suitable for production use cases.

Setting Up the Environment

Before diving into implementation details, let’s set up the necessary environment. You’ll need the Redis Python client to talk to AWS MemoryDB, along with the LangChain packages used in the examples below (langchain-openai provides the embedding and chat models):

pip install redis langchain langchain-aws langchain-openai

Connecting to AWS MemoryDB

The InMemoryVectorStore supports several connection URL schemas:

from langchain_aws.vectorstores.inmemorydb import InMemoryVectorStore
from langchain_openai import OpenAIEmbeddings

# Simple connection
redis_url = "redis://host:port"

# Connection with authentication
redis_url = "redis://username:password@host:port"

# Connection with SSL
redis_url = "rediss://host:port"

# Connection with SSL and authentication
redis_url = "rediss://username:password@host:port"

# Initialize the vector store
embedding_model = OpenAIEmbeddings()
vector_store = InMemoryVectorStore(
    redis_url=redis_url,
    index_name="my_production_index",
    embedding=embedding_model
)

Creating and Populating the Vector Store

There are several ways to create and populate an InMemoryVectorStore:

Method 1: Creating from Documents

from langchain.document_loaders import TextLoader
from langchain.text_splitter import CharacterTextSplitter

# Load and split documents
loader = TextLoader("path/to/document.txt")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)

# Create vector store from documents
vector_store = InMemoryVectorStore.from_documents(
    documents=docs,
    embedding=embedding_model,
    redis_url=redis_url,  # or set the REDIS_URL environment variable
    index_name="my_production_index"
)

Method 2: Creating from Texts

texts = [
    "AWS MemoryDB provides Redis compatibility with durability",
    "LangChain offers tools for building LLM applications",
    "Vector databases store embeddings for semantic search"
]

metadatas = [
    {"source": "aws_docs", "category": "database", "year": 2023},
    {"source": "langchain_docs", "category": "framework", "year": 2022},
    {"source": "vector_db_article", "category": "database", "year": 2023}
]

vector_store = InMemoryVectorStore.from_texts(
    texts=texts,
    embedding=embedding_model,
    metadatas=metadatas,
    redis_url=redis_url,
    index_name="my_production_index"
)

Method 3: Creating and Getting Keys

For cases where you need to track the Redis keys of the stored documents:

vector_store, keys = InMemoryVectorStore.from_texts_return_keys(
    texts=texts,
    embedding=embedding_model,
    metadatas=metadatas,
    redis_url=redis_url,
    index_name="my_production_index"
)
print(f"Document keys: {keys}")

Connecting to an Existing Index

For production systems, you often need to connect to an index that already exists. Depending on your langchain-aws version, you may also need to pass a schema describing the index’s metadata fields (see the schema-export section later):

vector_store = InMemoryVectorStore.from_existing_index(
    embedding=embedding_model,
    index_name="my_production_index",
    redis_url=redis_url
)

Advanced Configuration: Custom Vector Schema

The default vector schema in InMemoryVectorStore uses the FLAT algorithm, which performs exact (brute-force) k-NN search. For large production workloads, HNSW (Hierarchical Navigable Small World) usually offers much faster approximate nearest neighbor search with only a small loss in recall:

# Define custom vector schema
vector_schema = {
    "algorithm": "HNSW",    # Use HNSW instead of FLAT
    "m": 16,                # Number of connections per node
    "ef_construction": 200, # Size of the dynamic candidate list during construction
    "distance_metric": "COSINE"  # Distance metric: COSINE, IP, or L2
}

# Create vector store with custom vector schema
vector_store = InMemoryVectorStore(
    redis_url=redis_url,
    index_name="production_hnsw_index",
    embedding=embedding_model,
    vector_schema=vector_schema
)

Advanced Configuration: Custom Index Schema

One of the most powerful features of InMemoryVectorStore is the ability to customize how metadata is indexed, enabling advanced filtering capabilities:

# Define custom index schema as a dictionary (fields are grouped by type)
index_schema = {
    "tag": [{"name": "category"}],
    "numeric": [{"name": "year"}],
    "text": [{"name": "source"}],
}

# Alternatively, define the schema in a YAML file (e.g. index_schema.yaml)
# and pass its path via the index_schema argument:
#
# tag:
#   - name: category
# numeric:
#   - name: year
# text:
#   - name: source

# Create vector store with custom index schema
vector_store = InMemoryVectorStore(
    redis_url=redis_url,
    index_name="production_filtered_index",
    embedding=embedding_model,
    index_schema=index_schema  # or a path to the YAML file, e.g. "index_schema.yaml"
)

By default, InMemoryVectorStore automatically generates an index schema with the following rules:

  • Strings are indexed as text fields
  • Numbers are indexed as numeric fields
  • Lists of strings are indexed as tag fields
  • All other types are not indexed

You can override these defaults with a custom schema, which is particularly useful for optimizing filter performance.
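To see exactly what schema was generated for your data, you can dump it with write_schema (covered in more detail later). A minimal sketch, assuming the vector store created from the example texts and metadata above:

# Inspect the schema that was auto-generated from the metadata
vector_store.write_schema("generated_schema.yaml")

# For metadata like {"source": "aws_docs", "category": "database", "year": 2023},
# the default rules index source and category as text fields and year as a numeric field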

Example: Custom Schema for Numeric String Fields

If you have string fields that contain numeric values that you want to filter numerically:

# Custom schema that indexes credit_score as a NUMERIC field,
# even though its values are stored as strings
index_schema = {
    "numeric": [{"name": "credit_score"}],
}

# Create vector store with this schema
vector_store = InMemoryVectorStore(
    redis_url=redis_url,
    index_name="credit_data_index",
    embedding=embedding_model,
    index_schema=index_schema
)

# Add documents with credit scores
texts = ["Customer profile information"]
metadatas = [{"credit_score": "720"}]  # String, but will be indexed as numeric

vector_store.add_texts(texts=texts, metadatas=metadatas)

Performing Similarity Searches with Filtering

Now that we’ve set up our vector store with custom schemas, we can perform advanced similarity searches with filtering:

# Basic similarity search
results = vector_store.similarity_search(
    query="Redis compatible database",
    k=5  # Return top 5 results
)

# Similarity search with metadata filtering
from langchain_aws.vectorstores.inmemorydb import InMemoryDBNum, InMemoryDBTag

# Filter for documents from 2023 about databases
# (assumes "category" is indexed as a TAG field and "year" as NUMERIC, per the custom schema above)
filter_expr = (InMemoryDBTag("category") == "database") & (InMemoryDBNum("year") == 2023)

filtered_results = vector_store.similarity_search(
    query="Redis compatible database",
    k=5,
    filter=filter_expr
)

# Similarity search with distance threshold
threshold_results = vector_store.similarity_search(
    query="Redis compatible database",
    k=10,
    distance_threshold=0.2  # Only return results with distance less than 0.2
)
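When picking a distance_threshold, it helps to inspect the raw distances first. A minimal sketch, assuming similarity_search_with_score returns (document, distance) pairs as in the Redis vector store this implementation is based on:

# Inspect raw vector distances to choose a sensible threshold
scored = vector_store.similarity_search_with_score(
    query="Redis compatible database",
    k=10
)
for doc, distance in scored:
    print(f"{distance:.4f} | {doc.page_content[:60]}")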

Advanced Search Techniques

InMemoryVectorStore supports various advanced search techniques:

Maximal Marginal Relevance (MMR) Search

MMR search balances relevance to the query with diversity among the returned results:

mmr_results = vector_store.max_marginal_relevance_search(
    query="database technologies",
    k=5,                # Number of results to return
    fetch_k=20,         # Number of results to consider before reranking
    lambda_mult=0.5     # Balance between relevance (1.0) and diversity (0.0)
)

Similarity Search with Relevance Scores

To get both documents and their relevance scores:

scored_results = vector_store.similarity_search_with_relevance_scores(
    query="vector database technology"
)

for doc, score in scored_results:
    print(f"Document: {doc.page_content[:50]}... | Relevance: {score}")

Managing the Vector Store

Adding More Documents

You can add more documents to an existing vector store:

# new_documents: a list of freshly loaded Document objects
new_docs = text_splitter.split_documents(new_documents)
vector_store.add_documents(new_docs)

# Or add texts directly
new_texts = ["New information about AWS MemoryDB"]
new_metadatas = [{"source": "recent_update", "category": "database", "year": 2023}]
vector_store.add_texts(texts=new_texts, metadatas=new_metadatas)

Deleting Documents

# Delete specific documents by their Redis keys
# (for example, the keys returned by from_texts_return_keys);
# pass redis_url or set the REDIS_URL environment variable
vector_store.delete(ids=keys, redis_url=redis_url)

# delete() requires explicit keys; to remove every document,
# drop the index with delete_documents=True (see the next section)

Dropping an Index

To completely remove an index:

InMemoryVectorStore.drop_index(
    index_name="my_production_index",
    delete_documents=True,
    redis_url=redis_url
)

Using the Vector Store as a Retriever

You can easily convert your vector store into a LangChain retriever for use in chains and agents:

retriever = vector_store.as_retriever(
    search_type="mmr",  # Can be "similarity", "mmr", or "similarity_score_threshold"
    search_kwargs={
        "k": 5,
        "lambda_mult": 0.7,
        "filter": filter_expr
    }
)

# Use the retriever in a chain
from langchain.chains import RetrievalQA
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model_name="gpt-4")
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever
)

result = qa_chain.invoke({"query": "What are the advantages of AWS MemoryDB?"})
print(result["result"])

Exporting and Managing Schemas

For documentation and version control, you can write the schema to a file:

vector_store.write_schema("path/to/schema.yaml")
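The exported file can be kept under version control and reused when reconnecting to the index. A minimal sketch, assuming from_existing_index accepts a schema argument pointing at the YAML file, as in the Redis vector store this implementation is derived from:

# Reconnect to the index later, reusing the exported schema file
vector_store = InMemoryVectorStore.from_existing_index(
    embedding=embedding_model,
    index_name="production_filtered_index",
    redis_url=redis_url,
    schema="path/to/schema.yaml"
)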

Asynchronous Operations

InMemoryVectorStore also supports asynchronous operations, which can be beneficial for high-throughput applications:

import asyncio

async def async_search_example():
    results = await vector_store.asimilarity_search(
        query="Redis compatible database",
        k=5
    )
    return results

results = asyncio.run(async_search_example())
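Because the async methods return coroutines, several queries can also be issued concurrently with asyncio.gather. A small sketch reusing the vector store defined earlier:

async def concurrent_searches():
    queries = [
        "Redis compatible database",
        "LLM application framework",
        "semantic search with embeddings"
    ]
    # Run all similarity searches concurrently
    results = await asyncio.gather(
        *(vector_store.asimilarity_search(query=q, k=5) for q in queries)
    )
    return dict(zip(queries, results))

all_results = asyncio.run(concurrent_searches())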

Best Practices for Production RAG Systems

  1. Choose the right algorithm: Use HNSW instead of FLAT for large datasets to balance search speed and accuracy.

  2. Optimize metadata schema: Define proper field types (TAG, TEXT, NUMERIC) based on how you’ll query the data.

  3. Use TAG fields for exact matching: For fields where you need exact matches (like categories or IDs), use TAG type.

  4. Set appropriate distance thresholds: Use distance_threshold to filter out low-quality matches.

  5. Implement proper error handling: In production, wrap vector store operations in try-except blocks and handle transient connection failures gracefully (a minimal sketch follows this list).

  6. Monitor performance: Keep track of query latency and result quality.

  7. Implement connection pooling: For high-throughput applications, use connection pooling to manage Redis connections.

  8. Regularly backup your data: Even though MemoryDB is durable, implement regular backups.
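As an example of point 5, the following is a minimal sketch of wrapping a similarity search with error handling and a simple retry. The exception types and backoff policy are illustrative assumptions to adapt to your environment:

import logging
import time

from redis import exceptions as redis_exceptions

logger = logging.getLogger(__name__)

def safe_similarity_search(store, query, k=5, retries=3, backoff=0.5):
    """Run a similarity search with a basic retry on transient Redis errors."""
    for attempt in range(1, retries + 1):
        try:
            return store.similarity_search(query=query, k=k)
        except (redis_exceptions.ConnectionError, redis_exceptions.TimeoutError) as exc:
            logger.warning("Search attempt %d/%d failed: %s", attempt, retries, exc)
            if attempt == retries:
                raise
            time.sleep(backoff * attempt)  # simple linear backoff before retrying

results = safe_similarity_search(vector_store, "Redis compatible database")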

Conclusion

AWS MemoryDB combined with LangChain’s InMemoryVectorStore provides a powerful foundation for building production-grade RAG systems. The ability to customize vector and index schemas allows for optimized performance and advanced filtering capabilities, making it suitable for enterprise applications.

By leveraging the techniques discussed in this article—custom schemas, advanced filtering, and different search methods—you can build sophisticated RAG applications that deliver accurate and relevant results efficiently.

Whether you’re building a customer service bot, a document retrieval system, or a knowledge management application, the combination of AWS MemoryDB and LangChain’s InMemoryVectorStore offers the flexibility, performance, and scalability needed for production deployments.

This post was originally written in my native language and then translated using an LLM. I apologize if there are any grammatical inconsistencies.