Building Advanced RAG Applications with FalkorDB Vector Store in LangChain: A Comprehensive Guide

Retrieval Augmented Generation (RAG) has emerged as a powerful paradigm in modern AI applications, allowing language models to access external knowledge for more accurate and contextual responses. One of the key components in any RAG system is the vector store, which enables efficient semantic search across documents. In this guide, we’ll explore how to implement advanced RAG applications using FalkorDB as a vector store within the LangChain framework.

What is FalkorDB Vector Store?

FalkorDB is a graph database that provides vector indexing capabilities, making it an excellent choice for RAG applications. When integrated with LangChain, it allows developers to store embeddings and perform similarity searches efficiently. The FalkorDBVector class in LangChain’s community integrations provides a robust interface for vector storage and retrieval.

Setting Up FalkorDB Vector Store

To get started with FalkorDB in LangChain, you need to install the required package:

pip install falkordb langchain langchain-community openai

Here’s a basic example of how to initialize and use FalkorDB Vector Store:

from langchain_community.vectorstores.falkordb_vector import FalkorDBVector
from langchain_community.embeddings.openai import OpenAIEmbeddings
from langchain_text_splitters import CharacterTextSplitter
from langchain_community.document_loaders import TextLoader

# Connection settings
host = "localhost"
port = 6379

# Load and split documents
raw_documents = TextLoader('data/state_of_the_union.txt').load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
documents = text_splitter.split_documents(raw_documents)

# Initialize embeddings
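# (requires the OPENAI_API_KEY environment variable to be set)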
embeddings = OpenAIEmbeddings()

# Create vector store
vectorstore = FalkorDBVector.from_documents(
    embedding=embeddings,
    documents=documents,
    host=host,
    port=port,
)
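
With the store populated, a quick query confirms everything is wired up; the question below is just a sample against the State of the Union text:

# Quick sanity check: fetch the two closest chunks for a sample query
results = vectorstore.similarity_search("What did the president say about the economy?", k=2)
print(results[0].page_content[:200])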

Configuration Options

FalkorDBVector offers numerous configuration options to tailor it to your specific use case; a combined example follows the lists below:

Basic Connection Settings

  • host: FalkorDB host address
  • port: FalkorDB port number
  • username and password: For connecting to FalkorDB Cloud instances
  • ssl: Whether to use SSL/TLS encryption (default: False)

Data Structure Settings

  • database: Database name (optional, auto-generated if not provided)
  • node_label: Label for nodes storing embeddings (default: “Chunk”)
  • relation_type: Type of relationship for stored embeddings (default: “”)
  • embedding_node_property: Property name for storing embeddings (default: “embedding”)
  • text_node_property: Property name for storing text (default: “text”)

Search Settings

  • distance_strategy: Distance metric to use (default: “EUCLIDEAN”)
  • search_type: Search type, either VECTOR or HYBRID (default: VECTOR)
  • index_type: Vector index type to use
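
Here is a sketch that combines several of these options. It assumes each option name above maps directly to a keyword argument of from_documents, and the database name is made up for illustration, so adjust to match the version of langchain-community you have installed:

# Combined configuration sketch; option names taken from the lists above
vectorstore = FalkorDBVector.from_documents(
    embedding=embeddings,
    documents=documents,
    host="localhost",
    port=6379,
    database="rag_demo",                  # hypothetical name; auto-generated if omitted
    node_label="Chunk",                   # label for nodes storing embeddings
    embedding_node_property="embedding",
    text_node_property="text",
    distance_strategy="COSINE",           # often a better fit for text embeddings
    search_type="HYBRID",                 # combine vector and keyword search
)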

Advanced RAG Operations

Adding Documents

You can add documents to an existing FalkorDB vector store in several ways:

# Adding documents
ids = vectorstore.add_documents(new_documents)

# Adding texts with metadata
texts = ["Text content 1", "Text content 2"]
metadatas = [{"source": "file1.txt"}, {"source": "file2.txt"}]
ids = vectorstore.add_texts(texts, metadatas=metadatas)

# Adding pre-computed embeddings
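# (each vector must match the dimension produced by your embedding model)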
embeddings_list = [[0.1, 0.2, ...], [0.3, 0.4, ...]]
ids = vectorstore.add_embeddings(texts, embeddings_list, metadatas=metadatas)

Similarity Search

FalkorDB supports several search methods:

# Basic similarity search
docs = vectorstore.similarity_search("What is artificial intelligence?", k=5)

# Search with scores
docs_with_scores = vectorstore.similarity_search_with_score("What is artificial intelligence?")

# Search by vector
query_embedding = embeddings.embed_query("What is artificial intelligence?")
docs = vectorstore.similarity_search_by_vector(query_embedding, k=5)
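
similarity_search_with_score returns (Document, score) pairs, so you can inspect how each match scored; how to interpret the score depends on the configured distance strategy:

# Print each result alongside its score
for doc, score in docs_with_scores:
    print(f"{score:.4f}  {doc.page_content[:80]}")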

Maximal Marginal Relevance (MMR)

For more diverse search results, you can use MMR:

# MMR search balances relevance and diversity
docs = vectorstore.max_marginal_relevance_search(
    "What is artificial intelligence?",
    k=5,
    fetch_k=20,
    lambda_mult=0.5  # 0 = max diversity, 1 = max relevance
)

Filtering Results

You can filter search results based on metadata:

# Filter by metadata
filter_dict = {"source": "file1.txt"}
docs = vectorstore.similarity_search(
    "What is artificial intelligence?",
    filter=filter_dict
)

Creating and Managing Indexes

FalkorDB Vector Store allows you to create and manage different types of indexes:

# Create a new node index
vectorstore.create_new_node_index(
    node_label="Document",
    embedding_node_property="vector",
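    # 1536 matches the output size of OpenAI's text-embedding-ada-002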
    embedding_dimension=1536
)

# Create a new relationship index
vectorstore.create_new_index_on_relationship(
    relation_type="RELATED_TO",
    embedding_node_property="similarity",
    embedding_dimension=1536
)

# Create a keyword index for hybrid search
vectorstore.create_new_keyword_index(
    text_node_properties=["title", "content"]
)

Working with Existing Indexes

You can also connect to existing indexes:

# Connect to an existing node index
existing_store = FalkorDBVector.from_existing_index(
    embedding=embeddings,
    node_label="Document"
)

# Connect to an existing relationship index
existing_rel_store = FalkorDBVector.from_existing_relationship_index(
    embedding=embeddings,
    relation_type="RELATED_TO"
)

# Connect to an existing graph
existing_graph_store = FalkorDBVector.from_existing_graph(
    embedding=embeddings,
    database="my_database",
    node_label="Document",
    embedding_node_property="embedding",
    text_node_properties=["title", "content"]
)
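
However you connect, the resulting object behaves like any other LangChain vector store, so the standard query methods apply:

# Query the reconnected store just like a freshly created one
docs = existing_graph_store.similarity_search("What is artificial intelligence?", k=3)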

Building a Complete RAG Application

Let’s put everything together to create a complete RAG application using FalkorDB Vector Store:

from langchain.chains import RetrievalQA
from langchain_community.llms import OpenAI
from langchain_community.vectorstores.falkordb_vector import FalkorDBVector
from langchain_community.embeddings.openai import OpenAIEmbeddings
from langchain_community.document_loaders import DirectoryLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# 1. Load documents
loader = DirectoryLoader('./data', glob="**/*.txt")
documents = loader.load()

# 2. Split documents
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200
)
splits = text_splitter.split_documents(documents)

# 3. Initialize embeddings
embeddings = OpenAIEmbeddings()

# 4. Create vector store
vectorstore = FalkorDBVector.from_documents(
    embedding=embeddings,
    documents=splits,
    host="localhost",
    port=6379,
    search_type="HYBRID",  # Use hybrid search for better results
    pre_delete_collection=True  # Start fresh
)

# 5. Create a retriever
retriever = vectorstore.as_retriever(
    search_type="mmr",  # Use MMR for diverse results
    search_kwargs={"k": 5, "fetch_k": 20, "lambda_mult": 0.7}
)

# 6. Create a QA chain
llm = OpenAI(temperature=0)
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever
)

# 7. Query the system
query = "What are the main challenges in implementing AI systems?"
result = qa_chain.invoke({"query": query})  # invoke() supersedes the deprecated run()
print(result["result"])

Performance Optimization Tips

To get the best performance from FalkorDB Vector Store in your RAG application:

  1. Choose the right distance strategy: For text embeddings, “COSINE” often works better than “EUCLIDEAN” (see the sketch after this list).

  2. Use hybrid search: Combining vector search with keyword search can improve relevance.

  3. Tune your k values: Experiment with different values for k and fetch_k in MMR searches.

  4. Optimize chunk size: Smaller chunks may increase precision but reduce context, while larger chunks do the opposite.

  5. Use metadata filters: Implement filters to narrow down search results based on document metadata.
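
As a concrete illustration of tips 1 and 5, the sketch below builds a cosine-based store and applies a metadata filter at query time; the source value is made up for illustration:

# Cosine distance plus a metadata filter, combining tips 1 and 5
vectorstore = FalkorDBVector.from_documents(
    embedding=embeddings,
    documents=splits,
    host="localhost",
    port=6379,
    distance_strategy="COSINE",
)
docs = vectorstore.similarity_search(
    "What are the main challenges in implementing AI systems?",
    k=5,
    filter={"source": "reports/ai_challenges.txt"},  # hypothetical metadata value
)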

Conclusion

FalkorDB Vector Store provides a powerful foundation for building advanced RAG applications within the LangChain ecosystem. Its graph database capabilities, combined with efficient vector indexing, make it an excellent choice for applications requiring semantic search over large document collections.

By leveraging the features discussed in this guide, you can build sophisticated RAG applications that provide accurate, contextual, and diverse responses to user queries. Whether you’re building a question-answering system, a document search engine, or a knowledge base interface, FalkorDB Vector Store offers the flexibility and performance needed for production-grade RAG applications.

This post was originally written in my native language and then translated using an LLM. I apologize if there are any grammatical inconsistencies.
