This tutorial is the first part of the Series “Building an Open Data Stack”, where we explore different use cases with open-source tools and open standards. We will implement an Airbyte monitoring dashboard with dbt and Metabase on a locally deployed Airbyte instance. It will essentially allow you to get an operational view of your current running syncs and a high-level overview of what happened in the past.
Today you have several options to either monitor via Airbyte UI by clicking through the syncs and logs or set up a more sophisticated system by channeling metrics events through Prometheus. Our dashboard approach is interesting, as you can see everything at one glance, add filters and drill-downs, or correlate with other data you might have.
ℹ️All code shown here is open on GitHub in the Open Data Stack Repo. Feel free to fork and play around with it and create issues or PRs for improvements 🙂.
📝 All Metrics are implemented with dbt views; therefore, you can put any BI tool on top. We use Metabase, but I also tried Superset and Rill Data.
🚨This tutorial is not meant to use in production as we query the production database with Airbyte. But it could be if you replicate Airbyte tables to a read-only schema or database. You could also copy the data to BigQuery/Snowflake and run dbt on that data. Or follow the event-based approach mentioned exposing these metrics.
You probably use Airbyte already, and thus the interest in monitoring. If so, please jump to step 3, where we create the needed views and dashboard. If you have yet to use Airbyte and would like to follow along, continue with the next two chapters.
To understand how Airbyte logs data in Postgres, let’s look at what tables are available in OSS. We’ll focus on these six:
For more information or the complete list, check out the docs.
Next, let’s clone the Airbyte repo and add one small thing to make the Postgres database of Airbyte available outside of docker to access with dbt and Metabase. The container is called <span class="text-style-code">airbyte-db</span>.
Now let’s edit <span class="text-style-code">docker-compose.yaml</span> and add these two lines below:
After we added the two lines above, we started up Airbyte as we would generally with:
If everything worked fine, Airbyte should be running on localhost:8000, and you should see with <span class="text-style-code">docker ps</span> that the port <span class="text-style-code">5435</span> is forwarded like this:
If you do not have any source you'd like to sync, you can quickly add some demo connectors to check the monitoring dashboard with <span class="text-style-code">source-connector=Faker</span> where you can specify the number of rows with count.
As a destination, you can use a Postgres database. If you haven’t one running locally, you can start up a brand new one with:
See more details and step-by-step guides on Replicate Data Between PostgreSQL Databases
Make sure you create a connection with Airbyte (again, check the above tutorial when in doubt), and run some syncs so you have some historical runs that will show up in the dashboard.
After we have set up Airbyte and started some syncs to have some data to visualize, we can now create views on top of the Postgres tables that get updated on each Airbyte sync, including configs you define in the UI. These tables are JSON blobs, and it can be challenging to unnest to a tabular table format. That's what the dbt views are doing, plus adding some metrics to visualize.
📝 Performance: Everything is based on views that allow you to see everything in real-time. The disadvantage is that it will get slower the more data you query. If your dashboards get too slow, you have two options: filtering down to reduce the amount of data queried or toggling the <span class="text-style-code">materialize=view</span> to <span class="text-style-code">table</span>. Changing to table materialization would require a <span class="text-style-code">dbt run</span> each time you want new data, but the queries would be pre-aggregated and faster.
Suppose you followed along with the port forwarding from step 1 above in <span class="text-style-code">docker-compose.yaml</span>, we should be able to connect to the Airbyte Postgres database with the following connection (default password, if not changed, is "docker"):
And see the following tables of Airbyte:
Next, we want to create the views on top of it to convert the JSON fields into queryable tables. For that, we need to install the dbt dependencies and then run <span class="text-style-code">dbt run</span> to create them.
Running and configure above profile automatically up by changing to directory <span class="text-style-code">transformation_dbt</span> and simply run (more details on dbt_readme):
This will essentially create the virtual env in a folder called <span class="text-style-code">.venv</span>, activate that virtual environment, set up the dbt_profile and run dbt to create the views needed.
Once we have set up the dependencies above, we can check the views on top of the Airbyte database which should now be created in schemas <span class="text-style-code">monitoring_core</span> and <span class="text-style-code">monitoring_mart</span>:
If you followed the above steps and you didn’t make any changes to the dbt models, this step is not needed.
In case you would like to change your dbt models to add columns or certain measures, you can run dbt manually with <span class="text-style-code">dbt run</span> (make sure you activated the virtual environment with <span class="text-style-code">poetry shell</span>). If everything works out, you should see something like this:
The next step is to start Metabase and open the predefined dashboard. This dashboard contains measures such as <span class="text-style-code">success_rate, attempt_duration, volume_rows</span> and <span class="text-style-code">volume_mb</span>.
To start up, you must download the metabase.jar here and place it into <span class="text-style-code">visualization/metabase/metabase.jar</span>. When done, run the following commands to start Metabase:
Metabase will run on localhost:3000. As you run from the existing Metabase database you might be asked to log in. Please use the username <span class="text-style-code">email@example.com</span> and password <span class="text-style-code">airbyte123</span> if so. If you want to know more details about Metabase, check out their docs.
The opened Airbyte Monitoring Dashboard should look something like this:
If you trigger a new run and refresh the Metabase dashboard, you'll see live data with one sync with status <span class="text-style-code">running</span>.
As we created views inside the Postgres database with measures defined, we can exchange the BI tool with others or integrate it with your notebook or data apps.
In the end, it was more complex than we needed to copy data out of Postgres (which defeats the purpose of live data but is still interesting to try). If you are interested, I added some descriptions with some example data to play with on the GitHub repo.
ℹ️ About Rill Data: Rill makes creating and consuming metrics easy by combining a SQL-based data modeler, real-time database, and metrics dashboard into a single product—a simple alternative to complex BI stacks.
I hope this tutorial is helpful to you with either lot of syncs to have a historical overview or an operational dashboard of what's currently going on. This tutorial is to get you started. It's not meant to be complete by any means. There are many angles to build on top of and customize your specific reporting needs.
One example is adding more detailed error messages about why a job failed if that is important. Furthermore, check the columns on the current views for more available data you can report; some more columns are there and can be added to Metabase or your BI tool of choice.
Finally, feel free to add new dbt views with a PR or comment on what to add; happy to chat on Slack or GitHub. As part of the series, the next step might be to add a Semantic Layer to define all Metrics once in a descriptive manner outside of Metabase.
Get all your ELT data pipelines running in minutes with Airbyte.