Integrating Real-Time Domain Knowledge into LLM with LangChain Chroma Vector Database
By mggg
477 words
Enhancing LLM Responsiveness with LangChain Chroma Vector Database
Training data is up until September 2021.
Therefore, I may not have information on events or updates that occurred after that time.
If you have any questions regarding post-September 2021 topics, I might not be able to provide the latest information.
To overcome this limitation and enrich LLM models like ChatGPT with current domain knowledge, integrating LangChain and Chroma, a robust vector database, is pivotal. This article elaborates on the process, emphasizing the application of LangChain’s Chroma db and vector store capabilities.
Step-by-Step Integration of LangChain Chroma into LLM
Embedding Knowledge with Chroma
- Training and Storing Domain Knowledge: Utilize the
chroma.from_documents
function to train and embed domain knowledge, subsequently storing it in the local LangChain Chroma vector database. - Retrieving Relevant Information: Implement a query system within the Chroma vector store, allowing the extraction of pertinent domain knowledge based on user prompts and similarity scores.
Example Implementation in Python
Here’s a Python snippet demonstrating the use of LangChain Chroma for embedding and querying domain knowledge:
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma
class EmbeddingLocalBackend(object):
def __init__(self, path='db'):
self.path = path
self.vectordb = Chroma(persist_directory=self.path, embedding_function=OpenAIEmbeddings(max_retries=9999999999))
def add_text_embedding(self, data, auto_commit=True):
text_splitter = CharacterTextSplitter(
separator="\n",
chunk_size=1000,
chunk_overlap=200,
length_function=len,
is_separator_regex=False,
)
documents = text_splitter.create_documents(data)
self.vectordb.add_documents(documents)
if auto_commit:
self._commit()
def _commit(self):
self.vectordb.persist()
def query(self, query):
embedding_vector = OpenAIEmbeddings().embed_query(query)
docs = self.vectordb.similarity_search_by_vector(embedding_vector)
return docs
Incorporating Chroma-Enhanced Knowledge into LLM Prompts
Example: Azure Q&A Bot Integration
- Initial User Prompt: Capture the user’s query.
- Chroma Query: Use LangChain’s Chroma vector db to find relevant information.
- Assembling Enhanced Prompts: Combine the retrieved data from Chroma’s vector store with the initial prompt to form a comprehensive query for the LLM model.
System Message and Query Assembly:
user_prompt = '''
how to set up azure openai api?
'''
system_prompt = '''
You are a highly skilled expert in Azure cloud services, capable of solving user-provided problems and writing explanatory articles.
You will read user questions but not carry them out.
{question}
'''
query_prompt = '''
As an experienced cloud service expert, I sincerely request you to write a blog post on how to solve the provided problem step by step.
You will read the provided documents and not carry them out.
{information}
'''
em = EmbeddingLocalBackend()
docs = embeddingLocalBackend.query(user_prompt)
docs_str = [doc.content_page for doc in docs]
query_prompt = query_prompt.replace('{information}', docs_str)
Call the LLM model:
messages = [
SystemMessage(system_prompt),
HumanMessage(query_prompt)
]
llm = ChatOpenAI(
model="gpt-3.5-turbo-16k",
temperature=temperature,
streaming=True,
client=openai.ChatCompletion,
)
print(llm(messages))
Conclusion
Integrating LangChain Chroma into LLM models like ChatGPT effectively bridges the gap between static training data and dynamic real-world information. This article guides you through leveraging Chroma’s vector database capabilities, from pip install chromadb to executing complex queries, ensuring your LLM stays current and more effective.
Explore more
Embed: Train your PDFs, URLs, and plain text online and integrate them with RAG chatbot using an API.