AI Prompt Best Practices with Airbyte & Cursor

December 9, 2024


AI development tools are a game changer for developer productivity, but it’s incredibly difficult to keep up with all the new tools, agents, and models that come out almost daily. For the past month or so, I’ve been building a lot of scripts and apps using Cursor. Used well, Cursor can save you hours or days of development and debugging effort. Used badly, it can send you down a rabbit hole of inefficient, or downright wrong, code. I want to share what I’ve learned about how best to use Cursor for data apps.

If you are not familiar with Cursor, it is an AI-enabled IDE. Cursor lets you specify which LLM you want to use to ask questions or compose your app. It comes with a free version that limits the number of prompts you can run per month. I’m using the paid plan, but these best practices will work on any version. In fact, they are pretty good tips in general when using AI tools to build apps. To take advantage of all the new features, make sure you have the latest version, which is 0.43 at the time of writing. I’m also using the claude-3.5 model from Anthropic, which seems to have the best knowledge of coding languages and practices, but you can use whatever model you prefer. The tips below are model independent.

Set up your environment

Let’s say I want to build a Python data app using Airbyte to move the data. I’m going to use the Faker data source, which is provided out of the box. Faker returns sample data for e-commerce transactions: products, purchases, users, etc. I also know that I want to use PyAirbyte as the library. Go ahead and create a local directory on your machine called python-cursor and set up a virtual environment:


python3.11 -m venv venv

Now, open the python-cursor folder in Cursor, and press Cmd-I on a Mac to open Composer. Composer is where you will create prompts and compose your app. Finally, activate your virtual environment by entering the following command in the terminal:


source venv/bin/activate


Best Practice 1: Reference documentation

Cursor allows you to @mention different types of artifacts such as code, GitHub, folders, etc. I’ll go into more detail on how you can use these shortly, but always start a prompt by giving it as much context as you can. I’ve found that pointing Cursor to the relevant docs is critical for it to know which library or framework you want to use. I typically add documentation via the IDE settings, instead of Composer, so that it’s always available when I need to add context. Since I am using PyAirbyte, I will add the docs homepage, https://airbytehq.github.io/PyAirbyte/airbyte.html, via Cursor > Settings > Cursor Settings > Features.

Now, in any prompt, I can tap @ > Documentation > PyAirbyte and know it has the right context.

Create the prompt

Now it is time to create the prompt to build our app. At this point it’s important to acknowledge that any AI tool is an assistant, not a developer. The more specific information you provide in your prompt, especially when starting from scratch, the better your results will be. This may change as new models like o1 from OpenAI get closer to agentic AI that can infer the goals behind your prompt, but we are not quite there yet. Here is the prompt I am going to use.

Create a python app that connects to my cloud workspace. Get the source called “source-faker” and read the stream called “users”. Use Streamlit as the UI. Authenticate to Airbyte using bearer auth and use python-dotenv to store all credentials. @pyairbyte

Best Practice 2: Airbyte AI prompt template

Let’s break down the structure of the prompt. You can use this as your typical Airbyte prompt pattern in Cursor:

[use Airbyte self-managed or cloud] [select the connector (include prefix source- or destination- in the name)] [tell it the streams you are interested in] [tell it what you want to do with the connection] [tell it what you want to do with the results] [set up auth] [set up coding preferences] [link docs to avoid hallucinations]

As you can see, I already know quite a bit about how to develop with Airbyte. I’ve told the prompt that I want cloud, which source connector and data stream I’m interested in, which UI framework I want to use, and some other developer preferences like authentication and how to handle API keys. Finally, I’ve given the prompt the PyAirbyte docs for context. The more information and structure you provide, the more accurate your results will be.
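If you reuse this pattern often, it can help to assemble the prompt from named slots so that none of them is forgotten. The following is only a minimal sketch: the function name build_prompt, the slot names, and the example wording (a paraphrase of the prompt above) are all hypothetical and not part of Cursor or Airbyte.

```python
# Slot names are hypothetical labels for the bracketed parts of the template.
TEMPLATE_SLOTS = (
    "deployment",   # Airbyte self-managed or cloud
    "connector",    # connector name, including the source- / destination- prefix
    "streams",      # the streams you are interested in
    "action",       # what to do with the connection
    "results",      # what to do with the results
    "auth",         # how to authenticate
    "preferences",  # coding preferences
    "docs",         # @mention of the docs you indexed in Cursor
)


def build_prompt(**slots: str) -> str:
    """Join the template slots in order, refusing to build an incomplete prompt."""
    missing = [name for name in TEMPLATE_SLOTS if not slots.get(name, "").strip()]
    if missing:
        raise ValueError(f"Missing prompt slots: {', '.join(missing)}")
    return " ".join(slots[name].strip() for name in TEMPLATE_SLOTS)


# Example values paraphrase the prompt used in this article.
EXAMPLE_SLOTS = {
    "deployment": "Create a python app that connects to my cloud workspace.",
    "connector": 'Get the source called "source-faker".',
    "streams": 'I am interested in the stream called "users".',
    "action": "Read that stream.",
    "results": "Use Streamlit as the UI.",
    "auth": "Authenticate to Airbyte using bearer auth.",
    "preferences": "Use python-dotenv to store all credentials.",
    "docs": "@pyairbyte",
}

if __name__ == "__main__":
    print(build_prompt(**EXAMPLE_SLOTS))
```

Raising on a missing slot is a deliberate choice here: an incomplete prompt tends to be where the model starts guessing.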

Best Practice 3: Always add a connector prefix

Airbyte automatically adds source- or destination- to the name of your connectors. When using AI, make sure you add this prefix in the name of the connector you want to access.
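If you generate prompts from a script, a small helper can make sure the prefix is never dropped. This is a sketch under assumptions: the helper name with_connector_prefix is hypothetical, and lowercasing the name assumes your connector names are lowercase, as in source-faker.

```python
def with_connector_prefix(name: str, kind: str = "source") -> str:
    """Return the connector name with a source- or destination- prefix.

    Names that already carry either prefix are returned unchanged
    (apart from trimming whitespace and lowercasing, which is an assumption).
    """
    if kind not in ("source", "destination"):
        raise ValueError("kind must be 'source' or 'destination'")
    cleaned = name.strip().lower()
    if cleaned.startswith(("source-", "destination-")):
        return cleaned
    return f"{kind}-{cleaned}"


if __name__ == "__main__":
    print(with_connector_prefix("faker"))
    print(with_connector_prefix("snowflake", kind="destination"))
```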

Go ahead and run the prompt.

Cursor does an amazing job of generating all the code you need. Go ahead and accept the code.

Best Practice 4: Have familiarity with the API or framework you are using

Cursor, like most AI models, draws on information scraped from the web. In this example, it is looking at the PyAirbyte docs as well as the GitHub repository for the code. One of the challenges is that the repository isn’t always the same thing as the released library you are using. For example, there is an open PR in the PyAirbyte repository to merge a new function, list_connections(). Currently this is a private function. When I first started working with AI prompts, Cursor would generate code that was correct per the repo but incorrect against the actual library. This is where you still need to be the developer and use AI as your assistant. I am sure this will change over time, but right now I encourage you to have some familiarity with the API or framework you are working with, so you don’t spend time fixing errors of this kind.
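Before trusting a generated call, one quick way to confirm that it exists in the release you actually installed (rather than only in the repository) is a plain attribute check. The helper below is a generic sketch using only the standard library; the name has_public_attr is hypothetical, and the demonstration uses the json module so that it runs anywhere.

```python
import importlib


def has_public_attr(module_name: str, attr_path: str) -> bool:
    """Return True if the installed module exposes the dotted attr_path publicly.

    A path component starting with an underscore is treated as private,
    so it counts as "not available" for the purposes of this check.
    """
    try:
        obj = importlib.import_module(module_name)
    except ImportError:
        return False
    for part in attr_path.split("."):
        if part.startswith("_") or not hasattr(obj, part):
            return False
        obj = getattr(obj, part)
    return True


if __name__ == "__main__":
    # Standard-library demonstration.
    print(has_public_attr("json", "dumps"))
    print(has_public_attr("json", "list_connections"))
    # With PyAirbyte installed you would pass its import name, "airbyte",
    # and the dotted path of whatever call Cursor generated.
```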

Getting back to our app, Composer does an amazing job of telling you everything you need to get your dependencies set up. It has told me which libraries I need to install, how to install them, and how to run the Streamlit app when I am ready.

Obtain Airbyte keys

Cursor has also created a .env file for storing my API keys, because my prompt told it to use the python-dotenv library.

To obtain these keys, log into Airbyte, get your access token, and add it to AIRBYTE_CLOUD_API_KEY.

Then, get your workspace ID from the URL and add it to AIRBYTE_WORKSPACE_ID.
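The code Cursor generates will vary, but reading these two values usually looks something like the sketch below. It assumes the variable names shown above; the function name load_airbyte_credentials is hypothetical, and the fallback to plain environment variables when python-dotenv is not installed is my addition, not something Cursor produced.

```python
import os

REQUIRED_KEYS = ("AIRBYTE_CLOUD_API_KEY", "AIRBYTE_WORKSPACE_ID")


def load_airbyte_credentials(environ=None) -> dict:
    """Return the Airbyte settings, failing early if either one is missing.

    When no mapping is passed in, values come from os.environ, after
    python-dotenv (if installed) has loaded the .env file into it.
    """
    if environ is None:
        try:
            from dotenv import load_dotenv  # provided by python-dotenv
            load_dotenv()  # looks for a .env file and loads it into os.environ
        except ImportError:
            pass  # fall back to variables already set in the shell
        environ = os.environ
    missing = [key for key in REQUIRED_KEYS if not environ.get(key)]
    if missing:
        raise RuntimeError(f"Missing settings in .env: {', '.join(missing)}")
    return {key: environ[key] for key in REQUIRED_KEYS}
```

Failing early with a clear message is easier to debug than an authentication error that surfaces later inside the Streamlit app.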

Run your app

You are ready to run your app. In the terminal, type:


streamlit run app.py

There it is! An Airbyte data app generated entirely by Cursor. Cool, huh? Well, most of the time. The reality is that even using the same prompt, there have been times when the code required some work. This is where treating AI tools as assistants is critical, and Best Practice 4 will help you quickly debug any errors.

Additional Prompts: Add visualizations

An important part of working with AI is understanding that you will be using many prompts in your development process. Once you have the initial app up and running, you may need to debug issues, add functionality, and so on. Cursor does a really great job of this, especially with its recent 0.43 release. I can add additional prompts and see, via a diff, the code it is changing. I can also allow it to make changes to a specific file only, using the @code context, or allow it to update the entire codebase using @codebase.

In our first prompt, we just wanted to return the results and display them in a table. Now, let’s add a new prompt to visualize the results. Here’s the prompt:

Take the results and create a bar chart and plot ages. Use Streamlit’s in-build bar chart libraries.

Running the prompt, I can instantly see what’s changed. I can accept or decline the updates, and even rollback to previous versions.
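To make the intent of that prompt concrete, here is a rough sketch of the data preparation behind an age bar chart. It is not the code Cursor generated: the helper name age_counts is hypothetical, and it assumes each user record is a dictionary with an integer age field.

```python
from collections import Counter


def age_counts(users) -> dict:
    """Count users per age, skipping records without an integer age.

    Returns a dict ordered by age, e.g. {25: 1, 30: 2}.
    """
    counts = Counter(
        user["age"] for user in users if isinstance(user.get("age"), int)
    )
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    sample = [{"age": 30}, {"age": 25}, {"age": 30}, {"name": "no age"}]
    print(age_counts(sample))
```

In the Streamlit app, counts like these would typically be converted to a pandas Series or DataFrame and passed to st.bar_chart.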

With my changes accepted, running the app gives me my completed app.

There you have it: my completed app and some best practices for using prompts in Cursor to generate apps with Airbyte. Perhaps the most important aspect of working with these new AI tools is understanding how to structure your prompts correctly. The prompt template included in this blog should save you a significant amount of time on your next project. I went through about a dozen tweaks and variations until I could get it to reliably generate code that didn’t require significant debugging and refactoring.



About the Author

Alex Cuoci is a product lead at Airbyte focused on launching the context platform for software engineers building AI agents. Previously, Alex led product for Airbyte's offerings for data engineering teams. Before joining Airbyte, Alex was a product manager at Datadog, where he shipped observability products for asynchronous and serverless applications.
