Artificial intelligence (AI) has become the defining technological trend of recent years, with applications across a wide range of fields. New terms keep popping up in AI that you should know to stay at the forefront of this tech revolution.
One such term is LangChain, an AI framework that allows you to build LLM applications effortlessly. Interest in the term ‘LangChain’ within the AI community has grown sharply over the past few years.
If you are wondering how LangChain can help you develop AI applications, you have come to the right place. This article highlights the most popular LangChain use cases to help you build your AI-powered applications.
8 Amazing LangChain Use Cases
LangChain is a powerful framework for building applications that leverage the capabilities of large language models (LLMs). Let’s examine its use cases and how they can be used to create, test, and deploy applications.
Before exploring the LangChain use cases, you must check the availability of your data. The data you use to train your model might reside in diverse sources, where it loses context and becomes hard to work with. It is therefore essential to extract and store this data in a single repository where it can be easily accessed. In this situation, no-code data movement platforms like Airbyte can help streamline data integration.
Airbyte provides 350+ pre-built data connectors to extract and load your data into a destination. Let’s look at some of the key features it offers:
- GenAI Workflows: Airbyte supports RAG-specific transformations that let you perform complex tasks, such as chunking and embedding. This feature enables you to effortlessly load and store data in a single operation.
- Connector Development Kit: Airbyte enables you to create custom connectors within minutes with the help of its Connector Development Kit (CDK).
- PyAirbyte: Airbyte provides a Python library, PyAirbyte, that allows you to extract data using Airbyte connectors, transform it as per your requirements, and then store it in a local cache.
Here’s how PyAirbyte allows you to enhance LangChain use cases by providing data accessibility for model training. Let’s assume you want to extract data from a CSV file present in Google Drive and convert it into a format that LangChain can access.
But, before performing data extraction steps, ensure you install PyAirbyte on your local machine. To achieve this, execute the code below in your preferred code editor like Jupyter Notebook:
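PyAirbyte is published on PyPI as the airbyte package:

```bash
pip install airbyte
# inside a Jupyter notebook, use: %pip install airbyte
```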
You can execute the code below to perform data extraction. Remember to change the file path, the Google Drive credentials, and other placeholders in the code given below:
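A minimal sketch; the folder URL and service-account JSON are placeholders, and the config keys follow the source-google-drive connector spec:

```python
import airbyte as ab

# Configure the Google Drive source connector (values are placeholders)
source = ab.get_source(
    "source-google-drive",
    config={
        "folder_url": "https://drive.google.com/drive/folders/<your-folder-id>",
        "credentials": {
            "auth_type": "Service",
            "service_account_info": '{"type": "service_account", ...}',
        },
    },
    install_if_missing=True,
)
```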
Verify the configurations and credentials by running ‘check’:
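```python
# Validate the connector configuration and credentials
source.check()
```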
The code below reads data from the CSV file in Google Drive and converts it into a list of records, which you can then process into a format LangChain understands:
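```python
# Select the stream backed by your CSV file and read it into a local cache
source.select_all_streams()
read_result = source.read()

# The stream name depends on your connector setup; replace the placeholder
records = list(read_result["<your-stream-name>"])
```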
Convert the read data into document objects and add them to the list:
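```python
from langchain.schema import Document

documents = []
for record in records:
    # Each CSV row becomes one LangChain Document (an illustrative mapping)
    documents.append(Document(page_content=str(record)))
```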
Print a single row of the CSV file:
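```python
print(documents[0].page_content)
```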
You can now split this list into smaller elements and store it in vector databases, which will eventually help you train your LLM application. Here’s an in-depth tutorial that will guide you on how to build an end-to-end RAG pipeline.
Let’s get into the primary use cases of LangChain now and learn how it can be beneficial in building efficient LLM apps.
Summarization
Summarization is one of the most basic use cases of LLMs and LangChain. It enables you to summarize the content of important documents, including articles, chat histories, medical papers, legal documents, and research papers.
Document length matters a great deal, as LLMs are limited in the number of tokens they can process at once. This is why larger texts must be broken into smaller segments.
To build a summarizer for large amounts of data, you can use two common approaches.
One is stuff, which means simply stuffing all the documents into a single prompt. The other is map-reduce, which maps the original document into smaller chunks, processes each chunk, and combines the results. Before getting into the summarization process, ensure you satisfy the prerequisites.
Prerequisites
Install the necessary packages and set up the environment variables.
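For example (the snippets in this article assume the classic pre-0.1 langchain package layout; newer releases move some of these imports into langchain_community and langchain_openai):

```bash
pip install langchain openai
```

```python
import os

# Assumption: replace with your own key; never hard-code real keys in production
os.environ["OPENAI_API_KEY"] = "sk-..."
```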
Let’s summarize short and long text documents using OpenAI and LangChain.
Summarizing Short Text
For short text, you do not need to define a chain, as the word count stays within the model’s token limit.
Create your LLM. In classic LangChain versions, the default model is 'text-davinci-003'; you can change it later if you want:
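```python
from langchain.llms import OpenAI

llm = OpenAI(temperature=0, model_name="text-davinci-003")
```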
Create a template for your summarizer:
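The template below is an illustrative example that asks for a child-friendly summary:

```python
template = """
%INSTRUCTIONS:
Please summarize the following piece of text.
Respond in a manner that a 5 year old would understand.

%TEXT:
{text}
"""
```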
Create a LangChain prompt template that you can insert values to later:
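```python
from langchain import PromptTemplate

prompt = PromptTemplate(
    input_variables=["text"],
    template=template,
)
```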
You can provide a text to summarize:
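The passage below about the fossil Prototaxites is an illustrative input, chosen to match the output shown further down:

```python
confusing_text = """
For roughly 130 years, the giant fossil Prototaxites confused scientists.
Some researchers classified it as a lichen, others as a fungus, and still
others as a kind of tree, but its enormous size made it hard to match
with any known group of organisms.
"""
```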
Finally, you must create the final prompt from the provided confusing text:
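```python
final_prompt = prompt.format(text=confusing_text)
print(final_prompt)
```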
You must pass the prompt to the LLM model and print the summary output:
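```python
output = llm(final_prompt)
print(output)
```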
Output:
For 130 years, people argued about what Prototaxites was. Some thought it was a lichen, some thought it was a fungus, and some thought it was a tree. But no one could agree. It was so big that it was hard to figure out what it was.
Summarizing Larger Text
Summarizing more extensive texts can become complicated because they exceed the model’s token limit. Fortunately, LangChain provides a load_summarize_chain function to facilitate summarization of larger texts. The Python code below uses this function to summarize one of Paul Graham’s startup essays.
Import the necessary libraries:
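```python
from langchain.llms import OpenAI
from langchain.chains.summarize import load_summarize_chain
from langchain.text_splitter import RecursiveCharacterTextSplitter

llm = OpenAI(temperature=0)
```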
You can open up a large document to summarize:
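The file path below is an assumption; point it at any long text file, then split it into chunks the model can handle:

```python
with open("data/PaulGrahamEssays/good.txt", "r") as f:
    text = f.read()

# Split the essay into documents that fit within the token limit
text_splitter = RecursiveCharacterTextSplitter(
    separators=["\n\n", "\n"], chunk_size=5000, chunk_overlap=350
)
docs = text_splitter.create_documents([text])
```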
Get your chain ready to use. You can set chain_type to map_reduce or stuff, and pass verbose=True (optional) if you want to see what is being sent to the LLM:
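```python
chain = load_summarize_chain(llm=llm, chain_type="map_reduce", verbose=True)
```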
The chain runs through different documents, summarizes the chunks, and then produces a summary from them, storing it inside the output variable:
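```python
output = chain.run(docs)
print(output)
```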
Output:
This essay looks at the idea of benevolence in startups and how it can help them succeed. It explains how benevolence can improve morale, make people want to help, and help startups be decisive. It also looks at how markets have evolved to value potential dividends and potential earnings and how users dislike their new operating system. The author argues that starting a company with benevolent aims is currently undervalued and that Y Combinator's motto of "Make something people want" is a powerful concept.
Chatbots
Chatbots are the most common use case of LLMs, enabling you to deploy bots that can maintain long conversations with users. They are commonly niche-specific, which is why chatbots use retrieval-augmented generation (RAG) over private data to answer domain-specific questions.
With the help of a memory element and natural language processing (NLP), chatbots can perform real-time conversations with users. Some of the most common chatbot implementations are NexusGPT, ChatBase, Capital One’s Eno, and H&M’s Kik Chatbot. Let’s look at the Python code used to create a chatbot using LangChain.
Import the chat-specific component:
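```python
from langchain.chat_models import ChatOpenAI
from langchain.schema import AIMessage, HumanMessage, SystemMessage

chat = ChatOpenAI(temperature=0.7)
```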
Input:
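The system message here is an assumption, chosen to match the playful replies shown below:

```python
chat(
    [
        SystemMessage(content="You are a playful bot that answers questions about fruit with a joke"),
        HumanMessage(content="Is a pear a fruit?"),
    ]
)
```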
Output: Yes, a pear is a fruit of confusion!
Input:
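Passing the model’s previous reply back as an AIMessage gives the conversation memory of the earlier turn (the follow-up question is an assumption):

```python
chat(
    [
        SystemMessage(content="You are a playful bot that answers questions about fruit with a joke"),
        HumanMessage(content="Is a pear a fruit?"),
        AIMessage(content="Yes, a pear is a fruit of confusion!"),
        HumanMessage(content="Which fruit did Eve eat in the Garden of Eden?"),
    ]
)
```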
Output: I think it was the fruit of knowledge!
Agents
LLM agents are robust AI systems capable of generating complex but contextually relevant text. These models can think through a problem, remember previous conversations, and adjust their responses using tools according to certain conditions.
LLM agents have become a trending topic in artificial intelligence. AutoGPT and BabyAGI are examples of advanced LLM agents. Let’s explore how LangChain can help you easily create your AI agent. The agent below pulls data from Google to answer questions.
Import LangChain agents:
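```python
from langchain.agents import initialize_agent, load_tools
from langchain.llms import OpenAI

llm = OpenAI(temperature=0)
```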
You must also import the necessary tools:
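A SerpApi key is assumed here so the agent can search Google:

```python
toolkit = load_tools(["serpapi"], llm=llm, serpapi_api_key="<your-serpapi-key>")

agent = initialize_agent(
    toolkit, llm, agent="zero-shot-react-description", verbose=True
)
```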
Input:
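```python
response = agent({"input": "What is the capital of Canada?"})
print(response["output"])
```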
Output: Ottawa is the capital of Canada.
Interacting with APIs
Connecting LLMs with APIs can expand their capabilities and enable natural language interaction with APIs. For example, LLMs can interact with a weather API to get real-time weather updates and answer users' queries.
Below are the steps required to enable an LLM to interact with an API.
Import the required libraries:
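```python
from langchain.chains import APIChain
from langchain.llms import OpenAI
```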
Run the following command to create an LLM model:
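```python
llm = OpenAI(temperature=0)
```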
The LangChain APIChain will read through the documentation to identify endpoints.
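As a sketch, classic LangChain ships with documentation for the free Open-Meteo weather API that you can hand to the chain:

```python
from langchain.chains.api import open_meteo_docs

chain = APIChain.from_llm_and_api_docs(
    llm, open_meteo_docs.OPEN_METEO_DOCS, verbose=True
)
```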
Make an API call:
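```python
chain.run("What is the current temperature in Munich, Germany, in degrees Fahrenheit?")
```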
Understanding Code
Code understanding is among the most important LangChain use cases. Recently, LLM tools like GitHub Copilot and Amazon CodeWhisperer have gained popularity due to their code-assist features. These tools help people worldwide, even non-technical professionals, understand complex code repositories.
In addition to code understanding, professionals can build applications on top of complex codebases that they would otherwise not have been able to use. Let’s look at how you can develop your personal coding assistant with the help of LangChain.
Import the os library that can interact with your operating system:
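```python
import os
```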
Import the vector support libraries:
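```python
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import FAISS
```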
Import the LangChain model:
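```python
from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA

llm = ChatOpenAI(model_name="gpt-3.5-turbo")
```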
You must import text splitter libraries:
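```python
from langchain.document_loaders import TextLoader
from langchain.text_splitter import CharacterTextSplitter
```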
Run a for loop to iterate through each folder:
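The repository path is an assumption; this walks the folder tree and loads every Python file (here, a local clone of the thefuzz library, to match the query below):

```python
root_dir = "thefuzz"
docs = []

for dirpath, dirnames, filenames in os.walk(root_dir):
    for file in filenames:
        if file.endswith(".py"):
            try:
                loader = TextLoader(os.path.join(dirpath, file), encoding="utf-8")
                docs.extend(loader.load())
            except Exception:
                pass  # skip files that fail to load
```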
Get your retriever ready:
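```python
# Split the code files, embed the chunks, and build a FAISS index to retrieve from
splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
split_docs = splitter.split_documents(docs)

embeddings = OpenAIEmbeddings()
docsearch = FAISS.from_documents(split_docs, embeddings)

qa = RetrievalQA.from_chain_type(
    llm=llm, chain_type="stuff", retriever=docsearch.as_retriever()
)

query = "What function do I use if I want to find the most similar item in a list of items?"
print(qa.run(query))
```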
Output:
You can use the process.extractOne() function from thefuzz package to find the most similar item in a list of items. Here's an example:
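The snippet from the model’s answer would look something like this (the choices list is reconstructed from the output described below):

```python
from thefuzz import process

choices = ["apple", "banana", "orange"]
process.extractOne("pineapple", choices)
```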
This would output (u'apple', 36), which means that the most similar item to “pineapple” in the list of choices is “apple”, with a similarity score of 36.
Querying Tabular Data
In many real-world applications, data resides in tabular form. Querying this data can enable you to extract useful insights and create strategies that improve business performance. LangChain provides the capabilities to perform operations on this data using natural language, making querying tabular data one of the essential LangChain use cases.
Here’s how you can query tabular data using LangChain and San Francisco Trees Data.
You must import the required libraries:
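In classic LangChain, these all live in the top-level package (newer releases move SQLDatabaseChain into langchain_experimental):

```python
from langchain import OpenAI, SQLDatabase, SQLDatabaseChain
```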
Create an OpenAI LLM model:
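```python
llm = OpenAI(temperature=0)
```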
Define your data path:
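The SQLite path is an assumption; it should point at a local copy of the San Francisco Trees dataset:

```python
sqlite_db_path = "data/San_Francisco_Trees.db"
db = SQLDatabase.from_uri(f"sqlite:///{sqlite_db_path}")

# Ask a natural-language question against the tabular data
db_chain = SQLDatabaseChain(llm=llm, database=db, verbose=True)
db_chain.run("How many species of trees are there in San Francisco?")
```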
Extraction
Extraction is the process of pulling specific pieces of information out of a large text document. It is usually performed with output parsing, which organizes the extracted data in a structured format, like a spreadsheet, to make it analysis-ready.
This process includes extracting and loading specific information from text into a database or extracting parameters from user queries to make API calls. Kor is an example of an LLM library that lets you extract data from text.
Here is the code that you can execute to create your own extraction application using LangChain.
Import libraries to construct your chat messages:
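```python
from langchain.prompts import ChatPromptTemplate, HumanMessagePromptTemplate
from langchain.schema import HumanMessage
```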
You can use a chat model like gpt-3.5-turbo:
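```python
from langchain.chat_models import ChatOpenAI

chat_model = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)
```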
To parse outputs and get structured data back:
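```python
from langchain.output_parsers import ResponseSchema, StructuredOutputParser
```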
Let’s look at two different approaches to performing extraction.
1. Simple Extraction
In a simple extraction, you must provide a prompt with instructions for the type of output you want.
Make your prompt, which combines the instructions with the fruit names:
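The instructions and fruit list below are illustrative, reconstructed to match the output shown afterwards:

```python
instructions = """
You will be given a sentence with fruit names. Extract those fruit names,
assign an emoji to each, and return the result as a Python dictionary.
"""

fruit_names = "Apple, Pear, this is a kiwi"

# Combine the instructions with the fruit names
prompt = instructions + fruit_names
```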
Call your LLM:
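```python
output = chat_model([HumanMessage(content=prompt)])

print(output.content)
print(type(output.content))
```

Note that the result comes back as a plain string, which is exactly the limitation the next approach removes.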
Output:
{'Apple': '🍎', 'Pear': '🍐', 'kiwi': '🥝'}
<class 'str'>
2. LangChain Response Schema
LangChain’s response schema assists you by auto-generating format instructions for the LLM. With this feature, you don’t need to worry about the prompt engineering required to get structured results. The schema’s output parser then reads the LLM-generated output and turns it into a Python object you can work with.
Follow the code below to extract the name of a song and artist from a given user prompt:
Use the code below to get the schema you want:
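```python
# Define the fields you want the LLM to return
response_schemas = [
    ResponseSchema(name="artist", description="The name of the musical artist"),
    ResponseSchema(name="song", description="The name of the song the artist plays"),
]
```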
This parser will match the LLM output against the schema and return the structured result to you:
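```python
output_parser = StructuredOutputParser.from_response_schemas(response_schemas)
```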
Get the format instructions that LangChain makes:
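```python
format_instructions = output_parser.get_format_instructions()
print(format_instructions)
```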
Create the prompt template that brings it all together:
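The user prompt below is an illustrative example, chosen to match the output that follows:

```python
prompt = ChatPromptTemplate(
    messages=[
        HumanMessagePromptTemplate.from_template(
            "Given a command from the user, extract the artist and song names.\n"
            "{format_instructions}\n{user_prompt}"
        )
    ],
    input_variables=["user_prompt"],
    partial_variables={"format_instructions": format_instructions},
)

query = prompt.format_prompt(user_prompt="I really like So Young by Portugal. The Man")
output = chat_model(query.to_messages())

parsed = output_parser.parse(output.content)
print(parsed)
print(type(parsed))
```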
Output:
{'artist': 'Portugal. The Man', 'song': 'So Young'}
<class 'dict'>
Automated Scientific Literature Review
LangChain provides a procedure for evaluating the accuracy of responses obtained from a summarizer. This use case can help review scientific research papers to determine whether the information received is correct or incorrect. For this LangChain use case, you must first create a summarizer that extracts data from research papers. Then, apply evaluation criteria to gain insights into the results.
Execute the code below to perform an evaluation:
Import embeddings, store, and retrieval libraries:
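```python
from langchain.chains import RetrievalQA
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import FAISS
```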
Import model and doc loader:
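```python
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import TextLoader

llm = ChatOpenAI(temperature=0)
```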
Load the essay:
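The essay path is an assumption; use any paper or essay you want to review:

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

loader = TextLoader("data/PaulGrahamEssays/worked.txt")
documents = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=3000, chunk_overlap=400)
docs = text_splitter.split_documents(documents)
```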
You must create the embeddings and the document search index:
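```python
embeddings = OpenAIEmbeddings()
docsearch = FAISS.from_documents(docs, embeddings)

# input_key tells the chain which dictionary key holds the question
chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=docsearch.as_retriever(),
    input_key="question",
)
```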
Now, you can start your eval_chain:
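The question/answer pairs below are illustrative examples to grade the chain against:

```python
from langchain.evaluation.qa import QAEvalChain

question_answers = [
    {
        "question": "Which company sold the microcomputer kit that his friend built himself?",
        "answer": "Healthkit",
    },
    {
        "question": "What was the small city he talked about in the city that is the financial capital of USA?",
        "answer": "Yorkville",
    },
]

# Run the retrieval chain over the questions to collect its predictions
predictions = chain.apply(question_answers)

eval_chain = QAEvalChain.from_llm(llm)
```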
The eval_chain will grade itself. The code below helps the eval_chain know where the different parts are:
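```python
graded_outputs = eval_chain.evaluate(
    question_answers,
    predictions,
    question_key="question",
    prediction_key="result",
    answer_key="answer",
)
print(graded_outputs)
```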
Output:
[{'text': ' CORRECT'}, {'text': ' INCORRECT'}]
How To Build Your Perfect LangChain Pipeline Using Airbyte?
Training an LLM can be difficult, as it requires a huge amount of data in the right context for appropriate results. If the data you train your LLM on is not of good quality, the results obtained might not be as accurate as you want them to be.
The data you want to train your LLM on might be present in various sources. Integrating this data into a single repository can be beneficial. However, the integration process can become cumbersome, consuming a lot of your time and resources.
This is where you can utilize no-code ELT tools like Airbyte with LangChain and orchestration tools like Dagster to build optimized data pipelines. Let’s consider a real-world example where you extract data from your Salesforce account and use it to train an LLM model to derive sales insights. Before getting started, the first step is to ensure that all the necessary libraries are properly installed. Follow the code below to do so:
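A plausible set of packages for this walkthrough (exact versions may vary):

```bash
pip install dagster dagster-webserver dagster-airbyte langchain openai faiss-cpu
```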
Step 1: Extract Data Using Airbyte
- Log in to your Airbyte account. On the left panel of the page, click on Sources.
- Search for Salesforce in the Search connector box on the Set up a new source page. Then, select the available Salesforce option.
- On the next page, authenticate your Salesforce account to configure it as a source. Click on Set up source.
- After configuring Salesforce as a source, click on the Destinations tab on the left panel. In the Destination search bar, search for JSON and select the Local JSON option.
- Specify the local destination path from which Dagster will be able to access the data.
Step 2: Configure Dagster Pipeline
- Create a new “ingest.py” file to configure software-defined assets for Dagster.
- Fetch existing connections from the Airbyte instance using the load_assets_from_airbyte_instance function. Use AirbyteResource with host and port information to define the Airbyte instance.
- Load the raw JSONL files from Airbyte to the LangChain document using the AirbyteJSONLoader.
- You must set stream_name to the specific stream of records in Airbyte that you want to make accessible to the LLM, as shown in the sketch after this list.
- Use the RecursiveCharacterTextSplitter to split the documents into chunks to fit into the LLM.
- Now, you can generate embeddings for the document and save the vectorstore content to a file.
- Define how to manage IO and export the asset definitions for Dagster.
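Putting the bullets above together, ingest.py might look like the sketch below. The Airbyte host and port, the local JSON path, and the stream name are assumptions you must adapt:

```python
# ingest.py -- a sketch of the Dagster software-defined assets
import pickle

from dagster import AssetKey, Definitions, asset
from dagster_airbyte import AirbyteResource, load_assets_from_airbyte_instance
from langchain.document_loaders import AirbyteJSONLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS

# Fetch the existing connections from the local Airbyte instance
airbyte_instance = AirbyteResource(host="localhost", port="8000")
airbyte_assets = load_assets_from_airbyte_instance(
    airbyte_instance, key_prefix="airbyte_asset"
)

# The Salesforce stream to expose to the LLM (an assumption)
stream_name = "opportunity"

@asset(non_argument_deps={AssetKey(["airbyte_asset", stream_name])})
def raw_documents():
    # Load the raw JSONL file written by the Local JSON destination
    loader = AirbyteJSONLoader(
        f"/tmp/airbyte_local/{stream_name}/_airbyte_raw_{stream_name}.jsonl"
    )
    return loader.load()

@asset
def documents(raw_documents):
    # Split the documents into chunks that fit into the LLM context window
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
    return splitter.split_documents(raw_documents)

@asset
def vectorstore(documents):
    # Embed the chunks and persist the vector store to disk
    vs = FAISS.from_documents(documents, OpenAIEmbeddings())
    with open("vectorstore.pkl", "wb") as f:
        pickle.dump(vs, f)

# Export the asset definitions; Dagster's default IO manager handles the rest
defs = Definitions(assets=[airbyte_assets, raw_documents, documents, vectorstore])
```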
Step 3: Load Your Data
- Set the OpenAI API key (see the commands after this list).
- Launch Dagster by executing the dagster dev command shown below.
- To interact with your pipeline and manage assets, navigate to Dagster at http://127.0.0.1:3000/asset-groups.
- You can either click the Materialize button in the Dagster UI to materialize all the assets or run the materialize command shown below. This step runs all the tasks mentioned above, from extracting Salesforce data into JSON files to creating a local vector database file, vectorstore.pkl.
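The commands for this step might look like this (assuming the ingest.py file from Step 2):

```bash
# Set the OpenAI API key
export OPENAI_API_KEY="sk-..."

# Launch the Dagster UI
dagster dev -f ingest.py

# Or materialize every asset from the command line instead of the UI
dagster asset materialize --select "*" -f ingest.py
```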
Step 4: Create an Application with LangChain
After creating a Dagster pipeline, you can use the stored embeddings with LangChain to develop a question-answering (QA) application.
- Create a new Python file, query.py (a full sketch follows this list).
- Open the vectorstore.pkl file to load the embeddings into your script.
- Initialize OpenAI LLM and RetrievalQA and use local_vectorstore to retrieve relevant documents.
- Implement a continuous QA loop with a prompt, enabling the user to ask questions.
- Run the QA bot using the code below.
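Putting those steps together, query.py might look like the sketch below (file names mirror the Step 2 sketch and are assumptions):

```python
# query.py -- a sketch of the QA application
import pickle

from langchain.chains import RetrievalQA
from langchain.llms import OpenAI

# Load the embeddings produced by the Dagster pipeline
with open("vectorstore.pkl", "rb") as f:
    local_vectorstore = pickle.load(f)

# Use the local vector store to retrieve relevant documents per question
qa = RetrievalQA.from_chain_type(
    llm=OpenAI(temperature=0),
    chain_type="stuff",
    retriever=local_vectorstore.as_retriever(),
)

# Continuous QA loop
if __name__ == "__main__":
    while True:
        question = input("Ask a question about your Salesforce data (empty to quit): ")
        if not question:
            break
        print(qa.run(question))
```

Run the bot with python query.py.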
During this procedure, the LLM receives questions from the user. Each question is embedded and compared against the stored embeddings in the vector store; the closest matches are identified, and the LLM formulates an answer based on those matches.
Key Takeaways
LangChain is a robust framework that lets you design complex LLM applications that deliver value across a wide range of domains. Some of the most common LangChain use cases include building a chatbot, a text summarizer, and an AI agent.
It can also be a helpful component that enables you to interact with APIs in natural language, understand code, and query data for decision-making. Knowing how to build LLM applications from scratch can provide you with a thorough understanding of how to leverage their potential.
FAQs
What Are the Key LangChain Use Cases?
There are multiple LangChain use cases that can benefit you. It can enable you to interact with APIs, understand complex code, query and extract data, perform automated reviews, and build AI applications like chatbots.
What Is RAG?
RAG, or Retrieval Augmented Generation, is a technique that enhances LLMs' performance by feeding them relevant information. It augments the LLM model's knowledge with domain-specific information, enabling the LLM to produce better responses.