The langchain-airbyte package integrates LangChain with Airbyte. It provides AirbyteLoader, a document loader that can load data from any Airbyte source into LangChain as documents.
This notebook demonstrates how to use langchain_airbyte to load data from an Airbyte source (a GitHub repository), store that data in a vector database, and run basic question answering over it using FAISS and OpenAI embeddings.
1) OpenAI API key
2) GitHub personal access token
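One way to keep both credentials out of the notebook itself is to read them from environment variables. The sketch below assumes that approach. `OPENAI_API_KEY` is the variable the OpenAI client reads by default, while `GITHUB_TOKEN` is a hypothetical name chosen here for illustration.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Usage (commented out so the sketch runs without real credentials):
# openai_key = require_env("OPENAI_API_KEY")
# github_token = require_env("GITHUB_TOKEN")  # hypothetical variable name
```

Failing early with a named variable tends to be easier to debug than an authentication error raised later by the source or the embedding model.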
Let's start by installing all the required dependencies.
First, create a virtual environment, then install the dependencies inside it.
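After creating and activating a virtual environment (for example with `python -m venv .venv`), a quick check like the one below can report which packages are still missing. The package list is an assumption based on the tools named in this walkthrough (langchain-airbyte, OpenAI embeddings, FAISS). Adjust it to whatever your notebook actually imports.

```python
import importlib.util

# Assumed mapping of import name -> pip package name for this walkthrough.
REQUIRED = {
    "langchain_airbyte": "langchain-airbyte",
    "langchain_openai": "langchain-openai",
    "langchain_community": "langchain-community",
    "langchain_text_splitters": "langchain-text-splitters",
    "faiss": "faiss-cpu",
}


def missing_packages(required: dict[str, str]) -> list[str]:
    """Return the pip names of packages whose import name cannot be found."""
    return [
        pip_name
        for module_name, pip_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


missing = missing_packages(REQUIRED)
if missing:
    print("Inside your virtual environment, run: pip install " + " ".join(missing))
else:
    print("All required packages are importable.")
```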
Now we use AirbyteLoader to fetch data from the source-github source.
You may use any other source you need, as long as you adjust the stream and configuration to match it.
Don't forget to fill in all the fields the source requires.
Refer to the Airbyte documentation for your source to see which fields it requires.
For more information about this package, refer to the langchain-airbyte documentation.
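As a rough sketch of this step, the loader call could look like the following. The `source`, `stream`, and `config` arguments are part of AirbyteLoader, but the configuration field names for source-github (`credentials`, `personal_access_token`, `repositories`) are assumptions that should be checked against the connector documentation, and `build_github_config` and `load_commits` are hypothetical helper names.

```python
def build_github_config(token: str, repository: str) -> dict:
    """Build a source-github config (field names are assumptions; verify them)."""
    if "/" not in repository:
        raise ValueError("repository must look like 'owner/name'")
    return {
        "credentials": {"personal_access_token": token},
        "repositories": [repository],
    }


def load_commits(token: str, repository: str):
    """Load the commits stream of a repository as LangChain documents."""
    # Imported lazily so the config helper can be used without the package installed.
    from langchain_airbyte import AirbyteLoader

    loader = AirbyteLoader(
        source="source-github",
        stream="commits",
        config=build_github_config(token, repository),
    )
    return loader.load()


# Usage (needs network access and a valid token):
# docs = load_commits(github_token, "owner/name")
```

AirbyteLoader also accepts a `template` argument for controlling how each record is rendered into document text, which is worth setting once you know which record fields you want in the content.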
The last step of converting data to documents ensures that the raw data (GitHub commits) is converted into a standardized format that includes both the main content and any associated metadata.
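To make the "content plus metadata" shape concrete, here is a small stand-in written with a plain dataclass. LangChain's own `Document` class has the same two attributes, `page_content` and `metadata`. The record fields used below (`sha`, `message`, `author`) are hypothetical and do not necessarily match the real commits stream.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleDocument:
    """Minimal stand-in for a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def commit_to_document(record: dict) -> SimpleDocument:
    """Put the commit message in the content and everything else in metadata."""
    metadata = {key: value for key, value in record.items() if key != "message"}
    return SimpleDocument(page_content=record.get("message", ""), metadata=metadata)


# Hypothetical record, for illustration only.
example = commit_to_document({"sha": "abc123", "message": "Fix login bug", "author": "ann"})
print(example.page_content)
print(example.metadata)
```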
Large documents are split into smaller chunks to make them easier to handle. Splitting also improves retrieval, since a smaller chunk can match a specific query more precisely than a long document that covers many topics.
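The idea behind chunking can be illustrated with a simplified character-based splitter. This is not the splitter used in the notebook. In a LangChain pipeline a text splitter class such as `RecursiveCharacterTextSplitter` would typically do this job, and the sizes below are arbitrary assumptions.

```python
def split_text(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into fixed-size chunks that overlap by `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current chunk reaches the end of the text.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


print(split_text("abcdefghij", chunk_size=4, overlap=1))
```

The overlap keeps a little shared context between neighbouring chunks, so a sentence cut at a boundary is still partly visible in both.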
The document chunks are converted into vectors using an embedding model (OpenAI embeddings).
These vectors are then stored in a FAISS vector store, which indexes them and enables fast retrieval of the vectors most similar to a query.
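Similarity search can be pictured with a brute-force toy version. The sketch below ranks hand-made two-dimensional vectors by cosine similarity. It is only a stand-in for intuition: real embeddings have far more dimensions, and FAISS uses its own index structures and distance metrics rather than this loop.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], index: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the k texts whose vectors are most similar to the query vector."""
    ranked = sorted(
        index.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Toy vectors, invented for illustration.
toy_index = {
    "fix login bug": [1.0, 0.0],
    "update README": [0.0, 1.0],
    "patch auth flow": [0.9, 0.1],
}
print(top_k([1.0, 0.0], toy_index, k=2))
```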
Finally, we run question answering over the indexed data.
When a query is made, the vector store retrieves relevant document chunks based on their vector similarity to the query. The language model (OpenAI) then generates answers based on the retrieved chunks.
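A minimal retrieve-then-generate sketch of this flow follows, assuming the chunks are LangChain documents and that `OPENAI_API_KEY` is set. The helper names, the prompt wording, and the model name are assumptions made for illustration, and the notebook itself may use a ready-made chain instead.

```python
def build_prompt(question: str, contexts: list[str]) -> str:
    """Combine retrieved chunk texts and the question into a single prompt."""
    context_block = "\n\n".join(contexts)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {question}"
    )


def build_index(chunks):
    """Embed the chunks and store them in a FAISS vector store."""
    # Imported lazily so build_prompt can be used without these packages.
    from langchain_community.vectorstores import FAISS
    from langchain_openai import OpenAIEmbeddings

    return FAISS.from_documents(chunks, OpenAIEmbeddings())


def answer_question(question: str, vector_store, k: int = 4) -> str:
    """Retrieve the k most similar chunks and ask the model to answer from them."""
    from langchain_openai import ChatOpenAI

    hits = vector_store.similarity_search(question, k=k)
    llm = ChatOpenAI(model="gpt-4o-mini")  # model name is an assumption
    reply = llm.invoke(build_prompt(question, [doc.page_content for doc in hits]))
    return reply.content


# Usage (needs network access and an OpenAI API key):
# store = build_index(chunks)
# print(answer_question("What changed in the latest commits?", store))
```

Building the index once and reusing it across questions avoids re-embedding every chunk on each query.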