Perform RAG with Vectara

Learn how to use data stored in Airbyte's Vectara destination to perform RAG.


This tutorial showcases how to use data stored in Airbyte's Vectara destination to perform Retrieval-Augmented Generation (RAG).

Prerequisites

1) OpenAI API Key:

  • Create an OpenAI Account: Sign up for an account on OpenAI.
  • Generate an API Key: Go to the API section and generate a new API key. For detailed instructions, refer to the OpenAI documentation.

2) Vectara Customer ID, Corpus ID, and API Key:

  • Create a Vectara Account: Sign up for an account on Vectara.
  • Customer ID: In the Vectara Console, click the profile icon in the top right to find your Customer ID.
  • Corpus ID: The Corpora page in your Vectara account lists the corpora you've created. Note down the Corpus ID you want to use.
  • Generate an API Key: In the API Keys section of the Vectara Console, generate a new API key for that corpus.

Install Dependencies

As in any other Python project, the first step is to install the required dependencies!

# Add virtual environment support in Google Colab
!apt-get install -qq python3.10-venv

# Install required packages
%pip install --quiet openai langchain-openai tiktoken pandas langchain_community

Set Up Environment Variables

Next, we configure the required credentials. Replace the placeholders with your own values.

import os

os.environ["VECTARA_CUSTOMER_ID"] = "YOUR_VECTARA_CUSTOMER_ID"
os.environ["VECTARA_CORPUS_ID"] = "YOUR_VECTARA_CORPUS_ID"
os.environ["VECTARA_API_KEY"] = "YOUR_VECTARA_API_KEY"
os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_API_KEY"

Initialize Vectara Vector Store

We start by initializing the Vectara Vector Store and then see what the data in Vectara looks like.

import pandas as pd
from langchain_community.vectorstores import Vectara

# Initialize Vectara vector store
vectara = Vectara(
    vectara_customer_id=os.getenv("VECTARA_CUSTOMER_ID"),
    vectara_corpus_id=os.getenv("VECTARA_CORPUS_ID"),
    vectara_api_key=os.getenv("VECTARA_API_KEY")
)

def fetch_vectara_data():
    # Simulate fetching data from Vectara
    data = {
        "document_id": [1, 2, 3],
        "document_content": ["Content of doc 1", "Content of doc 2", "Content of doc 3"],
        "metadata": ["Metadata1", "Metadata2", "Metadata3"],
        "embedding": ["[0.1, 0.2, ...]", "[0.3, 0.4, ...]", "[0.5, 0.6, ...]"]
    }
    df = pd.DataFrame(data)
    return df

# show data
data_frame = fetch_vectara_data()
print(data_frame)
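
The frame above only simulates the shape of the data. To peek at what Airbyte actually loaded into your corpus, you can run a broad similarity search against the store we just initialized. This is only a sketch; the query string and k value are arbitrary.

# Inspect a few real documents from the corpus (query string and k are arbitrary)
sample_docs = vectara.similarity_search(query="airbyte", k=3)
for doc in sample_docs:
    print(doc.metadata)
    print(doc.page_content[:200], "...")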

Embedding and similarity search with Vectara

Here we convert the user's query into embeddings using OpenAI and retrieve similar chunks from Vectara based on that query.

from openai import OpenAI
from langchain_openai import OpenAIEmbeddings
from typing import List
from rich.console import Console


client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

# Convert user's query into a vector array to prep for similarity search
def get_embedding_from_openai(query) -> List[float]:
    print(f"Embedding user's query -> {query}...")
    embeddings = OpenAIEmbeddings(openai_api_key=client.api_key)
    return embeddings.embed_query(query)

# Use Vectara to find matching chunks
def get_similar_chunks_from_vectara(query: str) -> List[str]:
    print("\nRetrieving similar chunks...")
    try:
        results = vectara.similarity_search(query=query)
        chunks = [result.page_content for result in results]
        print(f"Found {len(chunks)} matching chunks!")
        return chunks
    except Exception as e:
        print(f"Error in retrieving chunks: {e}")
        return []
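
Before wiring these helpers into the full pipeline, it's worth a quick sanity check. Note that get_similar_chunks_from_vectara passes the raw query text to Vectara, which handles embedding on its side; the explicit OpenAI embedding step is included to make the RAG flow visible. The sample question below is only an example.

# Quick sanity check of the helpers (the question is only an example)
sample_query = "What is Airbyte used for?"

query_vector = get_embedding_from_openai(sample_query)
print(f"Query embedded into a vector of length {len(query_vector)}")

matching_chunks = get_similar_chunks_from_vectara(sample_query)
for chunk in matching_chunks[:2]:
    print(chunk[:200])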

Building the RAG pipeline and asking a question

Finally, we use OpenAI to query our data!
The three main steps of a RAG pipeline are:

  • Embedding the incoming query
  • Running a similarity search to find matching chunks
  • Sending the chunks to the LLM for completion

# Use OpenAI to complete the response
def get_completion_from_openai(question, document_chunks: List[str], model_name="gpt-3.5-turbo"):
    print(f"\nSending chunks to OpenAI (LLM: {model_name}) for completion...")
    chunks = "\n\n".join(document_chunks)

    response = client.chat.completions.create(
        model=model_name,
        messages=[
            {"role": "system", "content": "You are an Airbyte product assistant. Answer the question based on the context. Do not use any other information. Be concise."},
            {"role": "user", "content": f"Context:\n{chunks}\n\n{question}\n\nAnswer:"}
        ],
        max_tokens=150
    )
    return response.choices[0].message.content.strip()

# Putting it all together
def get_response(query, model_name="gpt-3.5-turbo"):
    chunks = get_similar_chunks_from_vectara(query)

    if len(chunks) == 0:
        return "I am sorry, I do not have the context to answer your question."
    else:
        return get_completion_from_openai(query, chunks, model_name)

# Ask a question
query = 'What data do you have?'
response = get_response(query)

Console().print(f"\n\nResponse from LLM:\n\n[blue]{response}[/blue]")

