Jiamin's Blog
Harnessing the Power of OpenAI's Latest Innovations
Introduction: Embracing the Future with OpenAI's Updates
In the ever-evolving landscape of artificial intelligence, staying current with the latest advancements is not just a matter of curiosity but a necessity for anyone looking to leverage AI in their projects. On November 6, 2023, OpenAI introduced a slew of new features, along with a major update to their Python SDK, now at version 1.0.0. In this blog, we'll dive into these updates and explore how they can revolutionize the way we interact with AI.
OpenAI's 11.06 release mainly includes:
- Chat content supports images (gpt-4-vision-preview)
- Response content supports a JSON mode
- A system_fingerprint field is introduced to support reproducibility

OpenAI Multimodal Models
One of the most exciting new features OpenAI has introduced is the multimodal model, which can process a combination of text and images. This capability opens a new dimension for AI applications, from enhanced visual data analysis to more interactive chatbots.
GPT-4 Vision: gpt-4-vision-preview
Example: analyzing the candlestick (K-line) chart of Alibaba's stock.
from openai import OpenAI

# openai>=1.0.0 replaces the module-level openai.ChatCompletion API with a client object
client = OpenAI(api_key="your-api-key")

response = client.chat.completions.create(
    model="gpt-4-vision-preview",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What information can you understand from the K-line of the image?",
                },
                {
                    "type": "image_url",
                    "image_url": {"url": "https://mggg.cloud/img/ali.png"},
                },
            ],
        }
    ],
    max_tokens=300,
)
print(response.choices[0].message.content)

Output:
The image appears to show a candlestick chart for a stock, specifically ticker 'BABA' which is Alibaba Group Holding Limited.
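The list above also mentioned JSON mode and system_fingerprint. Here is a minimal sketch of how the two fit together, assuming the v1.0.0 client; the model name, seed, and prompts are illustrative, not from the original post:

from openai import OpenAI

client = OpenAI(api_key="your-api-key")

# JSON mode: response_format constrains the model to emit valid JSON
# (the prompt itself must mention JSON for the request to be accepted).
# Reproducibility: a fixed seed requests best-effort determinism, and
# system_fingerprint identifies the backend configuration, so outputs are
# only comparable across runs when the fingerprint is unchanged.
response = client.chat.completions.create(
    model="gpt-4-1106-preview",  # assumed JSON-mode-capable model
    response_format={"type": "json_object"},
    seed=42,
    messages=[
        {"role": "system", "content": "You are a helpful assistant that answers in JSON."},
        {"role": "user", "content": "Summarize the 11.06 updates as a JSON object."},
    ],
)
print(response.system_fingerprint)
print(response.choices[0].message.content)

Comparing system_fingerprint across runs tells you whether a difference in output came from your parameters or from a change on OpenAI's side.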
Langchain LLM Streaming
Langchain can process the tokens generated by an LLM in real time through its callback mechanism.
from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

chat = ChatOpenAI(streaming=True, callbacks=[StreamingStdOutCallbackHandler()], temperature=0)
resp = chat([HumanMessage(content="Write me a song about sparkling water.")])

Langchain supports both synchronous and asynchronous IO for token output, corresponding to StreamingStdOutCallbackHandler and AsyncIteratorCallbackHandler, respectively.
StreamingStdOutCallbackHandler
First, let's take a look at Langchain's official implementation of StreamingStdOutCallbackHandler, which prints LLM-generated tokens to the terminal in real time. Its core is the on_llm_new_token hook:
import sys
from typing import Any

from langchain.callbacks.base import BaseCallbackHandler


class StreamingStdOutCallbackHandler(BaseCallbackHandler):
    ...

    def on_llm_new_token(self, token: str, **kwargs: Any) -> None:
        """Run on new LLM token. Only available when streaming is enabled."""
        sys.stdout.write(token)
        sys.stdout.flush()

However, this approach is synchronous. Next, let's look at the asynchronous IO approach.
AsyncIteratorCallbackHandler
The following uses AsyncIteratorCallbackHandler to print the returned tokens asynchronously.
import asyncio
from langchain.callbacks import AsyncIteratorCallbackHandler
from langchain.
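The snippet above is cut off mid-import; here is a minimal self-contained sketch of the usual AsyncIteratorCallbackHandler pattern, assuming the same model and prompt as the synchronous example (the task structure is an assumption, while aiter() and agenerate are Langchain's actual APIs):

import asyncio

from langchain.callbacks import AsyncIteratorCallbackHandler
from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage


async def main():
    callback = AsyncIteratorCallbackHandler()
    chat = ChatOpenAI(streaming=True, callbacks=[callback], temperature=0)

    # Run generation as a background task so tokens can be consumed as they arrive.
    task = asyncio.create_task(
        chat.agenerate([[HumanMessage(content="Write me a song about sparkling water.")]])
    )

    # aiter() yields each token as soon as on_llm_new_token fires,
    # and stops once the LLM run completes.
    async for token in callback.aiter():
        print(token, end="", flush=True)

    await task


asyncio.run(main())

Unlike the stdout handler, the tokens here arrive through an async iterator, so they can be forwarded to a websocket or an HTTP streaming response instead of a terminal.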
Adding Real-time Domain Knowledge to LLM with LangChain Vector Database
When using ChatGPT, we often encounter messages like this:
"Training data is up until September 2021. Therefore, I may not have information on events or updates that occurred after that time. If you have any questions regarding post-September 2021 topics, I might not be able to provide the latest information."
To give the LLM up-to-date domain knowledge, we need to incorporate recent information into the model. Below, we'll show how to do this for ChatGPT using LangChain and the Chroma vector database.
The steps are as follows:
- Embed the document knowledge and store it in the local vector database Chroma.
- Query the vector database with the user's prompt and retrieve similar domain knowledge by similarity.
- Wrap the retrieved domain knowledge into the prompt (a usage sketch follows the code below).

LangChain + Chroma
Embed the document knowledge and store it in the local Chroma vector database, then query the database with the user's prompt to fetch similar domain knowledge. The code is below; we use add_text_embedding to parse text into vectors and query to look up domain knowledge by similarity.
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma


class EmbeddingLocalBackend(object):
    def __init__(self, path='db'):
        self.path = path
        # Persist the Chroma store to a local directory and embed with OpenAI.
        self.vectordb = Chroma(
            persist_directory=self.path,
            embedding_function=OpenAIEmbeddings(max_retries=9999999999),
        )

    def add_text_embedding(self, data, auto_commit=True):
        # Split raw text into overlapping chunks before embedding.
        text_splitter = CharacterTextSplitter(
            separator="\n",
            chunk_size=1000,
            chunk_overlap=200,
            length_function=len,
            is_separator_regex=False,
        )
        documents = text_splitter.create_documents(data)
        self.vectordb.add_documents(documents)
        if auto_commit:
            self._commit()

    def _commit(self):
        self.vectordb.persist()

    def query(self, query):
        # The original excerpt is truncated here; a likely continuation is to
        # embed the query and run a similarity search against Chroma.
        embedding_vector = OpenAIEmbeddings().embed_query(query)
        return self.vectordb.similarity_search_by_vector(embedding_vector)
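The excerpt stops before the third step, wrapping the retrieved knowledge into the prompt. Here is a hypothetical end-to-end usage sketch; the document text, question, and prompt template below are illustrative, not from the original post:

from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage

backend = EmbeddingLocalBackend(path='db')
# Store some illustrative domain text in the local Chroma database.
backend.add_text_embedding([
    "OpenAI's 11.06 release added gpt-4-vision-preview, JSON mode, "
    "and system_fingerprint for reproducibility."
])

question = "What did OpenAI's 11.06 release include?"
docs = backend.query(question)
context = "\n".join(doc.page_content for doc in docs)

# Step 3: wrap the retrieved domain knowledge into the prompt.
prompt = (
    "Answer the question using the context below.\n\n"
    f"Context:\n{context}\n\n"
    f"Question: {question}"
)
chat = ChatOpenAI(temperature=0)
print(chat([HumanMessage(content=prompt)]).content)

Because the retrieved chunks are injected at query time, the underlying model never needs retraining to answer questions about new material.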