Integrating ChatFriendli with LangChain: A Practical Implementation Guide for Production Applications
Introduction
Integrating powerful language models into your production applications requires reliable, flexible tools that can scale with your needs. ChatFriendli, a chat model accessible through LangChain’s ecosystem, provides developers with a robust solution for implementing conversational AI capabilities. This guide explores how to implement ChatFriendli in your LangChain applications, covering everything from basic setup to advanced features and best practices.
Getting Started with ChatFriendli
ChatFriendli is a chat model that implements LangChain's BaseChatModel interface, making it compatible with the broader LangChain ecosystem. Before diving into implementation, you'll need to install the necessary packages:
pip install -U langchain-community friendli-client
Basic Configuration
To use ChatFriendli, you’ll need to authenticate with your personal access token. You can provide this in two ways:
from langchain_community.chat_models import ChatFriendli
# Option 1: Using environment variables
# Set FRIENDLI_TOKEN in your environment variables
chat_model = ChatFriendli()
# Option 2: Providing the token directly
chat_model = ChatFriendli(friendli_token="your_personal_access_token")
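If you need to target a particular model or tune generation behavior, ChatFriendli also accepts a model name and common sampling parameters at construction time. The exact parameter set depends on your friendli-client and langchain-community versions, so treat the values below as a sketch rather than a definitive reference (the model name shown is just an example):
# Optionally pin the model and tune generation parameters
# (verify the available fields against your installed versions)
chat_model = ChatFriendli(
    model="meta-llama-3.1-8b-instruct",  # example model name; substitute your own
    max_tokens=512,
    temperature=0.7,
)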
Core Usage Patterns
ChatFriendli implements the standard Runnable interface from LangChain, which provides a consistent way to interact with various models. Here’s how to use it for basic chat interactions:
from langchain_core.messages import HumanMessage, SystemMessage
# Create message list
messages = [
SystemMessage(content="You are a helpful assistant specializing in Python programming."),
HumanMessage(content="How do I read a CSV file in Python?")
]
# Get a response
response = chat_model.invoke(messages)
print(response.content)
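Because ChatFriendli is a Runnable, it also composes directly with other LangChain components using the pipe operator. Here is a minimal sketch that chains a prompt template to the model; the template text is purely illustrative:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

# Build a simple LCEL chain: prompt -> model -> string output
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant specializing in Python programming."),
    ("human", "{question}"),
])
chain = prompt | chat_model | StrOutputParser()
print(chain.invoke({"question": "How do I read a CSV file in Python?"}))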
Streaming Responses
For applications that benefit from real-time responses, ChatFriendli supports streaming:
# Stream the response
for chunk in chat_model.stream(messages):
    print(chunk.content, end="", flush=True)
For asynchronous applications:
import asyncio

async def stream_response():
    async for chunk in chat_model.astream(messages):
        print(chunk.content, end="", flush=True)

asyncio.run(stream_response())
Advanced Features
Token Management
ChatFriendli provides methods to help you manage token usage, which is crucial for staying within context limits:
# Count tokens in a message
token_count = chat_model.get_num_tokens_from_messages(messages)
print(f"This conversation uses {token_count} tokens")
# Count tokens in a string
text_token_count = chat_model.get_num_tokens("Hello, how can I assist you today?")
print(f"This text uses {text_token_count} tokens")
Caching Responses
To improve performance and reduce API calls, you can enable caching:
from langchain_core.globals import set_llm_cache
from langchain_core.caches import InMemoryCache
# Set up a global cache
set_llm_cache(InMemoryCache())
# Enable caching on the model
chat_model = ChatFriendli(cache=True)
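With the cache in place, repeating an identical call is served from memory instead of hitting the API again. A quick, informal way to see the effect is to time two identical invocations:
import time

# First call goes to the API; the identical second call is answered from the cache
start = time.time()
chat_model.invoke(messages)
print(f"First call: {time.time() - start:.2f}s")

start = time.time()
chat_model.invoke(messages)
print(f"Second call (cached): {time.time() - start:.2f}s")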
Tool Binding
ChatFriendli supports tool binding, allowing you to provide external tools that the model can use to enhance responses:
from langchain_core.tools import BaseTool

class WeatherTool(BaseTool):
    name: str = "get_weather"
    description: str = "Get the current weather in a given location"

    def _run(self, location: str) -> str:
        # Implement actual weather fetching logic here
        return f"It's sunny in {location} with a temperature of 72°F"
# Bind the tool to the model
model_with_tools = chat_model.bind_tools([WeatherTool()])
# Use the model with tools
response = model_with_tools.invoke([
HumanMessage(content="What's the weather like in San Francisco?")
])
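Binding tools only advertises them to the model; your application still has to execute any tool calls the model decides to make. Assuming the response carries standard LangChain tool calls, a minimal handling loop could look like this:
# If the model chose to call the weather tool, run it and inspect the result
weather_tool = WeatherTool()
for tool_call in response.tool_calls:
    if tool_call["name"] == "get_weather":
        result = weather_tool.invoke(tool_call["args"])
        print(result)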
Error Handling with Fallbacks
For production applications, implementing fallbacks is crucial for reliability:
from langchain_openai import ChatOpenAI  # requires the langchain-openai package

# Create a fallback model
fallback_model = ChatOpenAI()
# Set up the primary model with fallback
robust_model = chat_model.with_fallbacks(
fallbacks=[fallback_model],
exceptions_to_handle=(Exception,)
)
# Now if ChatFriendli fails, it will automatically try the fallback
Structured Output
For applications requiring structured data, you can format the model’s output to match a specific schema:
from pydantic import BaseModel, Field
from typing import List
class MovieRecommendation(BaseModel):
    title: str = Field(description="The title of the movie")
    year: int = Field(description="The year the movie was released")
    genre: str = Field(description="The primary genre of the movie")
    reasons: List[str] = Field(description="Reasons why this movie is recommended")
# Create a structured output model
structured_model = chat_model.with_structured_output(MovieRecommendation)
# Get structured recommendations
recommendations = structured_model.invoke([
HumanMessage(content="Recommend me a sci-fi movie from the 90s")
])
# Access the structured data
print(f"Title: {recommendations.title}")
print(f"Year: {recommendations.year}")
print(f"Genre: {recommendations.genre}")
print("Reasons:")
for reason in recommendations.reasons:
    print(f"- {reason}")
Asynchronous Processing for Batch Operations
When processing multiple requests, ChatFriendli supports efficient batch operations:
# Prepare multiple inputs
batch_inputs = [
[HumanMessage(content="What is machine learning?")],
[HumanMessage(content="Explain neural networks")],
[HumanMessage(content="How does natural language processing work?")]
]
# Process in batch
results = chat_model.batch(batch_inputs)
# Process asynchronously
async def process_batch_async():
    return await chat_model.abatch(batch_inputs)

batch_results = asyncio.run(process_batch_async())
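Batch calls fan out requests concurrently, so it is worth bounding that concurrency to stay within your endpoint's limits. You can do this through the standard Runnable config; the limit of 5 below is just an example value:
# Cap how many requests run at once
bounded_results = chat_model.batch(batch_inputs, config={"max_concurrency": 5})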
Monitoring and Debugging
For production applications, monitoring is essential. ChatFriendli supports callbacks and event streaming:
from langchain_core.callbacks import StdOutCallbackHandler

# Create a callback handler
handler = StdOutCallbackHandler()

# Pass the handler through the standard Runnable config
response = chat_model.invoke(
    messages,
    config={"callbacks": [handler]},
)
# Stream events for detailed monitoring
async def monitor_events():
    async for event in chat_model.astream_events(
        messages,
        version="v2",  # Use the latest event schema
    ):
        print(f"Event: {event['event']}, Data: {event['data']}")
asyncio.run(monitor_events())
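Beyond printing to stdout, you can write your own handler to feed metrics into whatever monitoring stack you use. The sketch below only measures call latency; the class and attribute names are our own, not part of LangChain:
import time
from langchain_core.callbacks import BaseCallbackHandler

class LatencyLogger(BaseCallbackHandler):
    """Logs how long each chat model call takes."""

    def on_chat_model_start(self, serialized, messages, **kwargs):
        self._start = time.time()

    def on_llm_end(self, response, **kwargs):
        print(f"Call finished in {time.time() - self._start:.2f}s")

response = chat_model.invoke(messages, config={"callbacks": [LatencyLogger()]})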
Performance Optimization
To optimize performance in production environments, consider these strategies:
- Implement caching as shown earlier to reduce duplicate API calls
- Use streaming for responsive user interfaces
- Configure retry logic for resilience:
resilient_model = chat_model.with_retry(
stop_after_attempt=3,
wait_exponential_jitter=True
)
- Set a request rate limit to manage API usage:
from langchain_core.rate_limiters import InMemoryRateLimiter
rate_limiter = InMemoryRateLimiter(
    requests_per_second=10,     # sustained request rate
    check_every_n_seconds=0.1,  # how often to check whether a request may proceed
    max_bucket_size=10,         # maximum burst size
)
limited_model = ChatFriendli(rate_limiter=rate_limiter)
Conclusion
Integrating ChatFriendli with LangChain provides a powerful foundation for building production-ready conversational AI applications. By leveraging the features described in this guide, you can create robust, efficient, and flexible systems that take full advantage of modern language models.
Remember that successful implementation requires attention to error handling, performance optimization, and proper integration with your existing application architecture. With the right approach, ChatFriendli can become a valuable component in your AI-powered applications.
Additional Resources
- Explore the LangChain documentation for more details on the broader ecosystem
- Check the Friendli client documentation for the latest updates and features
- Consider exploring other LangChain components like memory systems, agents, and chains to build more complex applications
By following this guide, you should be well-equipped to implement ChatFriendli in your LangChain applications and leverage its capabilities for your production needs.
This post was originally written in my native language and then translated using an LLM. I apologize if there are any grammatical inconsistencies.