End-to-end RAG using Salesforce, Airbyte Cloud and Weaviate

Join our newsletter to get all the insights on the data stack

In this tutorial post, we will guide you through establishing a complete Retrieval-Augmented Generation (RAG) pipeline using Airbyte Cloud, Salesforce, and Weaviate.

We will demonstrate how to seamlessly import vector data into Weaviate via an Airbyte connection and then utilize OpenAI for Retrieval-Augmented Generation (RAG).

Prerequisites

Airbyte Cloud Account : If you are new to Airbyte Cloud ,Sign up here
Salesforce Account : Have your Salesforce Account with Enterprise access signed in , on your browser.
Weaviate Cluster : Have your cluster's URL and API Key ready! You can find it here. Click on the drop down button on your cluster and get the details.
OpenAI API Key:
- Create an OpenAI Account: Sign up for an account on OpenAI.
- Generate an API Key: Go to the API section and generate a new API key. For detailed instructions, refer to the OpenAI documentation.

Setup the Salesforce Source

To setup the source Salesforce in Airbyte Cloud, follow these steps :

In the Left Sidebar, Click on Sources

On Top Right Side, Click on + New source

Now Search for salesforce, and finally select salesforce

Enter a source Name: Enter a name which will help you identify this source ,later on!

Click on "Authenticate your Salesforce account" .(You will need to login , in case you have not yet!)

Finally click on setup the source on bottom right of the screen!

For a more detailed guide visit here

Set up the Weaviate destination

Follow these steps: In the Left Sidebar: Click on Destinations

On Top Right Side: Click on + New destination

Now search for Weaviate and finally select it

Start Configuring the Weaviate destination in Airbyte:

Destination name: Provide a friendly name.
Embeddings: Choose OpenAI and enter your API key.
Indexing: In Public Endpoint : Enter your cluster's REST Endpoint URL given on the homepage. In API Token enter your API Key

To get a more detailed overview of Vecatara destination, visit this

Set up the connection

In the Left Sidebar: Click on Connections->click on new connection -> Select S3 Source->

On Top Right Side: Click on + New connection

Define Source : Select Salesforce

Define Destination : Select Weaviate

Select streams : Now you will be able to see all streams available in Salesforce , Activate the streams you want and click next on the bottom right conner

Now select schedule of jobs and click setup the connection.

Now we can successfully sync data from S3 to Weaviate

Retrieval-Augmented Generation (RAG) with Weaviate

RAG elevates language models by extracting pertinent information from a database, enabling them to generate precise and contextually rich responses. In this section, we'll walk you through the process of setting up RAG with Weaviate.

For your convenience and quick reference, we have provided a Google Colab notebook. Feel free to explore and experiment with the fully operational RAG code in Google Colab .

You can change the collection and property according to your own needs!

collection="Lead"
property="name"
def get_similar_chunks_from_weaviate(query: str) -> List[str]:
    try:
        embedding = get_embedding_from_openai(query)
        near_vector = {
            "vector": embedding
        }
        result = weaviate_client.query.get(collection, [property]).with_near_vector(near_vector).do()

        if 'data' in result and 'Get' in result['data'] and collection in result['data']['Get']:
            chunks = [res[property] for res in result['data']['Get'][collection]]
            return chunks
        else:
            print("Unexpected result format:", result, flush=True)
            return []
    except Exception as e:
        print(f"Error during Weaviate query: {e}", flush=True)
        return []

The get_response function is designed to handle a user's query, search for relevant document segments in Weaviate, and produce an accurate contextual response using OpenAI's language model.

Essentially, this function seamlessly combines querying Weaviate for pertinent data and utilising OpenAI to generate a coherent, contextually appropriate answer based on that data.

query = 'How many lead work in BNY?'
response = get_response(query)

print(f"\n\nResponse from LLM:\n\n{response}", flush=True)

Conclusion

In this tutorial, we illustrated how to harness Weaviate and OpenAI for Retrieval-Augmented Generation (RAG), demonstrating the seamless integration of data from Weaviate and the power of OpenAI's language models.

This dynamic duo allows you to build intelligent AI-driven applications, such as chatbots, capable of tackling complex questions with ease.

Weaviate takes the hassle out of managing and retrieving vector data, making it an indispensable tool for efficient and scalable data integration.

This, in turn, supercharges your AI solutions, enabling them to deliver top-notch, context-aware responses based on thorough data analysis.

About the Author

Should you build or buy your data pipelines?

Download our free guide and discover the best approach for your needs, whether it's building your ELT solution in-house or opting for Airbyte Open Source or Airbyte Cloud.

Download now

Join our newsletter to get all the insights on the data stack

Should you build or buy your data pipelines?

About the Author

About the Author

Join our newsletter to get all the insights on the data stack

Prerequisites

Setup the Salesforce Source

Set up the Weaviate destination

Set up the connection

Retrieval-Augmented Generation (RAG) with Weaviate

Conclusion

About the Author

About the Author

Should you build or buy your data pipelines?

Similar use cases