Comprehensive Guide to Integrating Amazon Bedrock LLMs with LangChain: Authentication, Guardrails, and Streaming Implementation
Amazon Bedrock has emerged as a powerful platform for accessing foundation models from leading AI companies. Combined with LangChain’s flexible framework, developers can build sophisticated AI applications with enterprise-grade features. This guide explores how to implement AWS Bedrock models in LangChain applications, covering authentication, guardrails, streaming, and more.
Introduction to BedrockLLM in LangChain
The BedrockLLM class in LangChain provides a seamless interface to Amazon Bedrock's language models. It inherits from both the LLM and BedrockBase classes and implements the standard Runnable interface, which provides powerful methods such as with_config, with_retry, and more.
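Because BedrockLLM is a Runnable, the standard invocation and configuration methods are available out of the box. Here is a minimal sketch (assuming AWS credentials are already configured and the Titan model is enabled in your account):
from langchain_aws import BedrockLLM

llm = BedrockLLM(model_id="amazon.titan-text-express-v1", region_name="us-west-2")

# invoke() runs a single prompt; batch() runs several prompts
response = llm.invoke("Summarize what Amazon Bedrock is in one sentence.")
responses = llm.batch(["Define RAG.", "Define fine-tuning."])

# with_config() returns the same model with runtime tags/metadata attached
tagged_llm = llm.with_config({"tags": ["bedrock-demo"]})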
Authentication Setup
AWS Bedrock requires proper authentication. LangChain's BedrockLLM supports multiple authentication methods:
Method 1: Using AWS Credentials
from langchain_aws import BedrockLLM

# Using explicit credentials
llm = BedrockLLM(
    model_id="amazon.titan-text-express-v1",
    aws_access_key_id="YOUR_ACCESS_KEY_ID",
    aws_secret_access_key="YOUR_SECRET_ACCESS_KEY",
    aws_session_token="YOUR_SESSION_TOKEN",  # Optional for temporary credentials
    region_name="us-west-2"
)
Method 2: Using AWS Credential Profiles
# Using a specific profile from ~/.aws/credentials
llm = BedrockLLM(
    model_id="amazon.titan-text-express-v1",
    credentials_profile_name="bedrock-profile",
    region_name="us-west-2"
)
Method 3: Using Environment Variables
LangChain can also read credentials from environment variables:
AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
AWS_SESSION_TOKEN
AWS_REGION or AWS_DEFAULT_REGION
# Credentials will be automatically loaded from environment variables
llm = BedrockLLM(
    model_id="amazon.titan-text-express-v1"
)
Model Configuration
When initializing BedrockLLM, you need to specify which model to use:
# For standard foundation models
llm = BedrockLLM(
    model_id="anthropic.claude-v2",
    region_name="us-west-2"
)

# For custom or provisioned models (using an ARN)
llm = BedrockLLM(
    model_id="arn:aws:bedrock:us-west-2:123456789012:provisioned-model/MyCustomModel",
    provider="anthropic",  # Provider must be specified for ARN models
    region_name="us-west-2"
)
Advanced Configuration with model_kwargs
You can customize model behavior by passing model-specific parameters:
llm = BedrockLLM(
    model_id="anthropic.claude-v2",
    model_kwargs={
        "temperature": 0.7,
        "max_tokens_to_sample": 500,
        "top_p": 0.9
    }
)
Implementing Guardrails
Bedrock supports guardrails to ensure safe and compliant AI responses. With LangChain, you can easily implement these guardrails:
import boto3

# Create (or reuse) a boto3 Bedrock runtime client
bedrock_client = boto3.client("bedrock-runtime", region_name="us-west-2")

llm = BedrockLLM(
    model_id="anthropic.claude-v2",
    client=bedrock_client,
    guardrails={
        "guardrailId": "my-guardrail-id",
        "guardrailVersion": "DRAFT"
    }
)
To enable tracing for guardrails and react when a guardrail intervenes, define a custom async callback handler that checks the intervention reason in on_llm_error:
from typing import Any

from langchain_core.callbacks import AsyncCallbackHandler

class BedrockAsyncCallbackHandler(AsyncCallbackHandler):
    async def on_llm_error(self, error: BaseException, **kwargs: Any) -> None:
        reason = kwargs.get("reason")
        if reason == "GUARDRAIL_INTERVENED":
            # Handle guardrail intervention
            print("Guardrail prevented potentially harmful content")
            # Log the event or take alternative actions
Then attach the handler and enable tracing when creating the LLM:
llm = BedrockLLM(
    model_id="anthropic.claude-v2",
    guardrails={
        "guardrailId": "my-guardrail-id",
        "guardrailVersion": "DRAFT",
        "trace": True
    },
    callbacks=[BedrockAsyncCallbackHandler()]
)
Streaming Implementation
Streaming allows for incremental responses, improving user experience for longer generations:
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
# Enable streaming
llm = BedrockLLM(
    model_id="anthropic.claude-v2",
    streaming=True,
    callbacks=[StreamingStdOutCallbackHandler()]
)
# Use the stream method
for chunk in llm.stream("Explain quantum computing in simple terms"):
    # Process each chunk as it arrives
    print(chunk, end="", flush=True)
For async implementations:
async for chunk in llm.astream("Explain quantum computing in simple terms"):
    # Process each chunk asynchronously (process_chunk is a placeholder coroutine)
    await process_chunk(chunk)
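If you are not already inside an event loop, you can drive the async stream with asyncio. A minimal sketch, reusing the llm instance from above and simply printing each chunk:
import asyncio

async def stream_answer() -> None:
    async for chunk in llm.astream("Explain quantum computing in simple terms"):
        # Print each chunk as soon as it arrives
        print(chunk, end="", flush=True)

asyncio.run(stream_answer())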
Token Management
LangChain's BedrockLLM provides methods to manage token usage:
# Get token count for a string
token_count = llm.get_num_tokens("How does quantum computing work?")
print(f"This prompt uses {token_count} tokens")
# Get token count for messages
from langchain.schema import HumanMessage, SystemMessage
messages = [
    SystemMessage(content="You are a helpful assistant"),
    HumanMessage(content="Explain quantum computing")
]
token_count = llm.get_num_tokens_from_messages(messages)
Error Handling and Retries
Implement retry logic for handling transient errors:
from langchain.globals import set_debug
# Enable debug logging
set_debug(True)
# Create a Bedrock LLM with retry logic
llm_with_retry = BedrockLLM(
    model_id="anthropic.claude-v2"
).with_retry(
    stop_after_attempt=3,
    wait_exponential_jitter=True
)
# The LLM will now retry up to 3 times on failures
response = llm_with_retry.invoke("Explain the theory of relativity")
Caching Responses
Enable caching to improve performance and reduce costs:
from langchain.cache import InMemoryCache
from langchain.globals import set_llm_cache
# Set up a cache
set_llm_cache(InMemoryCache())
# Enable caching on the LLM
llm = BedrockLLM(
    model_id="anthropic.claude-v2",
    cache=True
)
# First call will hit the API
response1 = llm.invoke("What is the capital of France?")
# Second identical call will use the cache
response2 = llm.invoke("What is the capital of France?")
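If you want cached responses to survive process restarts, LangChain also provides a SQLite-backed cache that can be swapped in the same way. A minimal sketch (the database path is arbitrary):
from langchain.cache import SQLiteCache
from langchain.globals import set_llm_cache

# Persist cached completions in a local SQLite file instead of in memory
set_llm_cache(SQLiteCache(database_path=".langchain_cache.db"))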
Fallbacks and Chaining Models
You can set up fallback mechanisms between different models:
from langchain.schema.runnable import RunnableWithFallbacks
# Primary model
primary_llm = BedrockLLM(
    model_id="anthropic.claude-v2"
)

# Fallback model
fallback_llm = BedrockLLM(
    model_id="amazon.titan-text-express-v1"
)

# Create a chain with fallback
llm_with_fallback = primary_llm.with_fallbacks(
    fallbacks=[fallback_llm],
    exceptions_to_handle=(Exception,)
)
# If the primary model fails, the fallback will be used
response = llm_with_fallback.invoke("Explain machine learning")
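Because prompts and models are both Runnables, the fallback-wrapped model can also be composed into a chain with the | operator. A minimal sketch using a simple prompt template:
from langchain.prompts import PromptTemplate

prompt = PromptTemplate.from_template(
    "Summarize the following topic in two sentences: {topic}"
)

# The fallback behavior is preserved inside the composed chain
chain = prompt | llm_with_fallback
summary = chain.invoke({"topic": "machine learning"})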
Saving and Loading Models
LangChain allows you to save and load model configurations:
# Save model configuration
llm.save("bedrock_model_config.yaml")
# Load model configuration (load_llm reads the saved YAML/JSON file)
from langchain_community.llms.loading import load_llm

loaded_llm = load_llm("bedrock_model_config.yaml")
Complete Example: Building a Chatbot with BedrockLLM
Let’s put everything together in a complete example:
from langchain_aws import BedrockLLM
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder

# Initialize the BedrockLLM with streaming and guardrails
llm = BedrockLLM(
    model_id="anthropic.claude-v2",
    region_name="us-west-2",
    streaming=True,
    callbacks=[StreamingStdOutCallbackHandler()],
    model_kwargs={
        "temperature": 0.7,
        "max_tokens_to_sample": 1000
    },
    guardrails={
        "guardrailId": "my-content-filter",
        "guardrailVersion": "DRAFT"
    }
)

# Create a conversation memory that stores prior messages under "history"
memory = ConversationBufferMemory(return_messages=True)

# Create a prompt template; ConversationChain expects exactly the
# "history" and "input" variables, so the topic is fixed in the system message
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful AI assistant that specializes in quantum computing."),
    MessagesPlaceholder(variable_name="history"),
    ("human", "{input}"),
])

# Create a conversation chain
conversation = ConversationChain(
    llm=llm,
    memory=memory,
    prompt=prompt
)

# Start the conversation
response = conversation.invoke({
    "input": "Explain superposition in simple terms"
})

# Continue the conversation; memory carries the earlier exchange forward
follow_up = conversation.invoke({
    "input": "How does that relate to quantum entanglement?"
})
Conclusion
Integrating Amazon Bedrock models with LangChain provides a powerful combination for building robust AI applications. The BedrockLLM class offers a comprehensive set of features, including flexible authentication, guardrails implementation, streaming capabilities, and more.
By following this guide, you can effectively leverage AWS Bedrock’s foundation models while taking advantage of LangChain’s extensive framework for creating sophisticated AI applications with proper security, performance optimization, and error handling.
Whether you’re building a simple chatbot or a complex AI system, the combination of Bedrock and LangChain provides the tools you need to create production-ready applications that harness the power of large language models safely and effectively.
This post was originally written in my native language and then translated using an LLM. I apologize if there are any grammatical inconsistencies.