Ink-credible Data People: Airbyte OSS Maintainer Yiyang Li

Kicking off December, we're bringing you our next ink-credible data person: Yiyang Li, an Airbyte open source contributor and maintainer based out of Seattle, Washington. Yiyang is a self-described tech enthusiast who has spent his career developing enterprise SaaS products and has recently grown to love open source projects and the community around them. Keep reading for some tips on submitting your first PR, the difference between being a contributor and a maintainer, and his fav Airbyte Slack channel.

PS, Yiyang also participated in Hacktoberfest this year. Check out the results.

How/when did you first discover Airbyte? What was your Aha moment?

My team built a powerful analytics tool that fetches data on behalf of our customers. However, every integration with an external data source had its own code base, so we had to reinvent the wheel for each new source. I was asked to build a new platform containing the shared modules so that we could lower the development cost.

We realized that Airbyte offers almost all the sources we already support and plan to support. It’s an open source project, and the license is friendly enough for us to host a private version. All we needed was to develop the destination connector so that Airbyte could load data for us. Easy!

What is your top tip for using Airbyte that others may not know?

The Airbyte platform is adaptive, powerful and flexible. You can mix and match the dependent services like database and logging. Consequently, the deployment configuration can be complex, and it gets more complex as more underlying services are swapped. Given this, I would suggest using the default setup and running it in Docker locally, and if you like the platform, starting a free trial with Airbyte Cloud. You may even realize that the pricing model is pretty reasonable.

Favorite Airbyte Slack channel:

#shameless-plugs. I am new to the data engineering ecosystem, and I have learned a lot from the random podcasts, mini tech talks and blogs that people share in that channel.

We can’t thank you enough for your contributions and now your work as a maintainer! Can you tell us any learnings you’ve had along the way? 

I appreciate working with other community members and collaborating on my work with the Okta source connector. I worked for Okta for almost 6 years and developed a few public APIs. In order to learn Airbyte, I chose to enhance the Okta source connector and fetch data from the API I am most familiar with. I added data streams related to roles and permissions. At the same time, I noticed that the Okta connector was about to go GA in Airbyte, and I was happy to assist with the GA journey, which is part of the reason I got involved with the program. I’ve enjoyed working with contributors who made improvements to my implementation and added missing unit tests. Although I didn’t talk to them, I learned quite a lot from reading their code and reflecting on their suggestions. My code reviewers were always supportive, and they even helped set up the Okta account so that the integration tests ran as expected.

Why did you decide to contribute and then apply to become a maintainer? 

Initially I was excited about the bounty system: contributing to an open source project and getting compensated. Now that I’ve gone through onboarding, I realize I’m getting much more value than just the financial incentives. I’ve learned a lot as a maintainer, and I appreciate being a part of the community and interacting with other contributors.

How does being a contributor differ from being a maintainer? What do you like about being a maintainer?

Being a maintainer is a bigger commitment and has its own rewards. When you are assigned a new pull request to review, you have to read the API doc, set up the integration test and understand the existing codebase. It’s more of a learning curve to overcome than being assigned a bug fix or feature request. It’s more work, but you really learn a lot because you need to understand the deployment cycle, including the test setup, code standards, and the CI tools.

Tip for first time contributors? 

I have only contributed to Airbyte connectors, so I can only speak to that part.

Everything is based on a schema. So, be patient in learning the schema and be prepared to read the documentation again and again. I feel like there is a three-level learning curve, so hang in there.

  • Level 1 is the schema language: JSON or YAML.
  • Level 2 is the connectors platform, where you will learn how all types of connectors are defined through the same types: AirbyteCatalog and ConfiguredAirbyteCatalog.
  • Level 3 is the data in the schema, which is related to the specific connector (e.g. GitHub) you plan to develop or enhance.

Once you’ve completed the three levels, you will see the magic: everything just works and connects. The UI renders correctly, the job runs, and the right amount of logs is produced. Try to be patient; you will get there.
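
To make the three levels a little more concrete, here is a minimal sketch in plain Python. It is an illustration only, not code from the Okta connector: the `roles` and `users` streams, their fields, and the helper functions `build_catalog` and `configure_catalog` are all hypothetical. The dictionary shapes follow the general layout of an Airbyte catalog (a list of streams, each with a name and a JSON schema) and of a configured catalog (the selected streams plus how to sync them), but the connector documentation remains the authority on the exact fields.

```python
import json

# Level 1: the schema language. A JSON Schema describing one record.
# The field names here are hypothetical, chosen only for illustration.
role_schema = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "properties": {
        "id": {"type": "string"},
        "label": {"type": ["null", "string"]},
    },
}
user_schema = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "properties": {"id": {"type": "string"}},
}


def build_catalog(schemas):
    """Level 2: wrap each schema in a stream entry, catalog-style.

    Hypothetical helper; it mirrors the shape of an AirbyteCatalog.
    """
    return {
        "streams": [
            {
                "name": name,
                "json_schema": schema,
                "supported_sync_modes": ["full_refresh"],
            }
            for name, schema in schemas.items()
        ]
    }


def configure_catalog(catalog, selected):
    """Level 2, continued: pick streams and say how each should be synced.

    Hypothetical helper; it mirrors the shape of a ConfiguredAirbyteCatalog.
    """
    return {
        "streams": [
            {
                "stream": stream,
                "sync_mode": "full_refresh",
                "destination_sync_mode": "overwrite",
            }
            for stream in catalog["streams"]
            if stream["name"] in selected
        ]
    }


# Level 3: the connector-specific data, i.e. which streams exist and
# which ones this particular sync should read.
catalog = build_catalog({"roles": role_schema, "users": user_schema})
configured = configure_catalog(catalog, {"roles"})
print(json.dumps(configured, indent=2))
```

Reading the printed output from the inside out retraces the three levels: the JSON schema sits inside a stream, and the stream sits inside the configured catalog that a sync would consume.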

Tip for those looking to become a maintainer? 

Give specific feedback

Err on the side of being verbose and provide detailed feedback. If you want to propose a suggestion or an alternative solution, explain the issue it addresses first. Since the codebase is huge, it helps to link to other pieces of code that follow the pattern.

Be kind and empathetic

Don’t make assumptions about why the author took a particular approach. Airbyte is powerful and thus complex, so they might not have the context you have or have overcome the learning curve you did.

Trust and be grateful to your mentor at Airbyte

Rest assured, they will pay you after the PR is merged. If you are stuck, always ask your mentor; they are very supportive. In fact, the whole Airbyte community is nice. As long as you want to learn, don’t feel ashamed to ask.

A few fun facts:

Most used emoji 💪 

Spring or Fall? Spring

Favorite account to follow for data engineering info? Seattle Data Guy (PS Seattle Data Guy ranked #1 newsletter in the Airbyte Community Survey this year!)
