In this article, we will be leveraging Airbyte - an open-source data integration platform - and Metabase - an open-source tool that lets everyone in your company ask questions and learn from data - to build the GitHub activity dashboard above.
Airbyte provides us with a rich set of source connectors, one of which is the GitHub connector, which allows us to pull data from a GitHub repository. We are going to use this connector to get the data of the Airbyte repo and copy it into a Postgres database destination. We will then connect this database to Metabase in order to create the activity dashboard.
If you prefer to follow along with a video, we have a walkthrough of this recipe on our YouTube channel. Watch that here!
What we will need:
You can skip this step if you already have Airbyte on your machine.
To set up Airbyte on your machine, make sure you have Docker and Docker Compose set up, as well as git. Then, open a terminal, go to the location you want to download Airbyte into, and run:
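Assuming the official Airbyte repository on GitHub, the clone command is:

```shell
# Clone the Airbyte repository (it contains the Docker Compose setup)
git clone https://github.com/airbytehq/airbyte.git
```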
You will need to go into the cloned airbyte repo by running cd airbyte, and then run:
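With the standalone docker-compose binary, that command is:

```shell
# Build and start the Airbyte containers (run from inside the airbyte repo)
docker-compose up
```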
Or if you are on the newest version of the Docker CLI you can run:
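If your Docker CLI bundles the Compose plugin, the equivalent is:

```shell
# Same as above, using the Compose plugin built into newer Docker CLI versions
docker compose up
```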
The above command will create and start the Airbyte containers. After it's done, you can access Airbyte at http://localhost:8000/ (you can go ahead and set up your preferences, then leave the Airbyte web app open, as we will come back to it shortly).
This database will be the destination for the data coming in from GitHub. To set this up, we will run a Postgres container via Docker like so:
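Putting together the flags described below (container name, password, port mapping, and detached mode), the command looks like this:

```shell
# Run a Postgres container named github-destination in the background,
# setting its password to "password" and mapping container port 5432 to host port 3003
docker run --name github-destination -e POSTGRES_PASSWORD=password -p 3003:5432 -d postgres
```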
If you are running the above command for the first time, it will download the Postgres image from the Docker Hub registry and then run it as a container with the name github-destination.
We are also setting the database password by passing an environment variable of POSTGRES_PASSWORD with a value of password.
We are also binding the container's 5432 port to port 3003 on our host machine. Finally, we are running the container in the background with the -d flag.
Now that we have set up the destination database, let's head over to the Airbyte web app and create a connection from the Airbyte GitHub source to our Postgres database.
Back in the Airbyte web app in your browser, click on the new source button in the top right corner of the app to go to the page to add a new Airbyte source.
Enter the name github-source as the source name, click the drop-down, and select GitHub as the source type. After selecting the GitHub source type, you will be presented with two text boxes. The first is for the repository you want. In this box, type in airbytehq/airbyte. Then, in the second box, provide a GitHub access token, which you can obtain from here.
Make sure you grant the token the repo and write:discussion permissions. After you've filled in all the fields, hit the Set up source button.
If the setup was successful, you will be taken to the destination screen where you will add a new destination.
Click the add destination button, and, in the drop-down that follows, click add new destination. You will then see a page where you can add the destination name. Type in the name we gave the Postgres container we created earlier (github-destination), and then choose Postgres as the destination type.
Afterwards, you will be presented with some text boxes to enter the database connection details. Enter the values for the Postgres container we created earlier:
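Assuming the official postgres image defaults (the database and superuser are both named postgres) plus the values we passed to docker run, the connection details are:

```
Host:      localhost
Port:      3003
Database:  postgres   (the image's default database)
Username:  postgres   (the image's default superuser)
Password:  password
```

If Airbyte's container cannot reach localhost on your machine (common on Docker Desktop), try host.docker.internal as the host instead.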
Then click the basic normalization toggle to turn it on, as we want Airbyte to normalize the data coming in from GitHub. Overall, the UI should look like this:
Then click on the Set up destination button. If your database credentials are correct, the Postgres destination will be set up, and you will then need to make the connection from the source (GitHub) to the destination (Postgres).
You should check the boxes that are checked in the screenshot below, then set the Sync frequency drop-down to every hour so that Airbyte attempts to replicate data hourly. Then, click on the Set up connection button.
You will be taken to the source page; click on the source and you will see the status of your sync. The sync may take a while to complete.
When it's done, you will see the status change from running to Succeeded. The byte count refers to the amount of data Airbyte has pulled from GitHub into your Postgres database.
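You can also verify the sync from the terminal by listing the tables Airbyte created in the destination (the exact table names depend on the streams you selected and Airbyte's normalization):

```shell
# List the tables Airbyte created in the destination database
docker exec -it github-destination psql -U postgres -c '\dt'
```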
That wasn’t a lot of work, was it? You can pat yourself on the back, as you just synced data from GitHub to a Postgres database.
Let's move on to connecting that database to Metabase, so we can start creating our dashboard.
Metabase is an open-source analytics tool that you can get started with in a couple of ways: using the .jar file, installing it as a Mac app, or using a Docker container.
For this tutorial, we will be using the Mac app (you can alternatively look up the installation that suits you here). If you are on a Mac, you can download the Metabase app from here. After downloading, launch the app and complete the account registration, and you will see the Metabase dashboard:
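If you would rather not install the Mac app, the Docker route (using the official metabase/metabase image) is a one-liner; Metabase will then be available at http://localhost:3000/:

```shell
# Run Metabase in the background, exposed on port 3000
docker run -d -p 3000:3000 --name metabase metabase/metabase
```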
In order to set up our database, we will click on the cog (settings) icon in the menu bar of the Metabase app and choose the admin option to be taken to the admin view, where we can add a database:
Click on the add a database button and you will be presented with a form. Select PostgreSQL as your database of choice and fill out the connection parameters, which are those of the PostgreSQL database we created to hold the data from GitHub.
Fill out the details to match the credentials of the PostgreSQL database we created earlier.
Afterwards, hit save when you are done entering the database credentials, and your database will be loaded into Metabase, ready for us to start creating our dashboard.
Metabase is based on asking questions of your data, so to build our GitHub dashboard we will be asking the following questions:
Note: We mostly need to work with the datetime type in order to compute the evolution metrics for our dashboard. If you notice that a field you need is not of the appropriate type - for example, starred_at being a VARCHAR instead of a datetime - you can go to the Admin view in Metabase, click on the Data Model option, and tell Metabase to cast that field.
So the general workflow we will be using with Metabase is to click on the Ask Question button on top of the menu bar. Then pick the type of question you want to ask. We will start off with a simple question:
Then you choose the data you want to work with. If everything is configured correctly, a data source called Airbyte_GitHub should show up. Next, select the table you want to work with; since for this first question we want the number of stargazers, we select the Stargazers table.
You will then see the data in the table chosen. For this first one, we will see the data in the Stargazers table.
To compute the answer to our first question, hit the Summarize button, group by Starred At, and summarize by Count, and you’ll have your very first question answered: the number of new stargazers per day.
Hit the blue Save button in the top right to save the question, and make sure you are using the line visualization. When saved, Metabase will prompt you to add it to a dashboard, which is exactly what we want!
Click on create a new dashboard, name your dashboard, and hit create. Then you can size your dashboard card to your liking, hit save, and we have our first question answered!
For the rest of the questions we need to ask, the steps are pretty similar so we will just mention the table we are using and show the visualization as well as the settings we used in arriving there.
Table: Pull Requests
We will be filtering on the Author Association field to filter out any comments from collaborators.
In this article, we have worked through getting data from a GitHub repo using the GitHub Airbyte connector and storing that data in a PostgreSQL database. We then set up Metabase and asked questions to visualize the data.
Here is the finished GitHub dashboard with our visualizations on Metabase:
Here are other questions you can get answers to using the same process: