The Drip | July 2022 Airbyte Product Updates

Hey everyone, welcome to the July edition of The Drip, where we take you downstream to cover highlights from our changelog, our community, and anything Airbyte-related.

Airbyte Turns Two

A couple of weeks ago, we celebrated Airbyte’s second birthday! To mark the occasion, the team took two days off. We also want to thank the community for being with us these last two years and helping us achieve our vision of solving data integration. Here’s to many more!

Connectors Progress

In July, we had a lot of movement with connectors! GitHub, Google Analytics, Instagram, TikTok Marketing, Bing Ads, and S3 have been moved to General Availability. These are just the highlights; see the full patch notes at the end of this post for the complete list of changes we’ve made!

Octavia CLI Improvements

We’ve made a lot of improvements to the Octavia CLI. Here are some of the notable changes.

You can now set custom HTTP headers on the requests the CLI makes to the Airbyte server! If your instance is secured with basic auth or IAP, this lets you use the CLI against it. It is also a significant step toward making the CLI compatible with Airbyte Cloud once the public API is released.
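To make the basic auth case concrete, here is a minimal sketch of how the header value is built under the standard HTTP Basic scheme. The username and password are placeholders, and the helper name is ours, not part of the CLI; check the Octavia CLI docs for the exact option used to pass the resulting header.

```python
import base64


def basic_auth_header(username: str, password: str) -> dict:
    """Build a standard HTTP Basic Authorization header (RFC 7617)."""
    # Basic auth is "username:password", base64-encoded.
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {token}"}


# Placeholder credentials, for illustration only.
headers = basic_auth_header("airbyte", "password")
print(headers)
# {'Authorization': 'Basic YWlyYnl0ZTpwYXNzd29yZA=='}
```

The resulting name and value pair is what you would hand to the CLI as a custom header.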

We’ve also added the ability to enable normalization or custom dbt transformations from within your YAML configurations. You can now also easily switch between your different Airbyte instances and deploy the same configuration to several of them.

If you haven’t used the CLI just yet, check out the full docs here.

Per-Stream State to OSS

The team has been working hard on getting per-stream state into OSS, and the time has finally come! Per-stream state unlocks the ability to sync data only for affected streams, which also paves the way for a future where incremental sync becomes the norm! We are very excited about this and will continue to expand on it. You can check out the progress of this here. Docs are coming soon.
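As a rough illustration of why this matters, the sketch below keeps one cursor per stream, so resetting a single stream leaves the others free to resume incrementally. The state shape and function names are hypothetical and are not Airbyte's actual state message format.

```python
from copy import deepcopy

# Hypothetical state shape for illustration: one cursor per stream.
state = {
    "users": {"updated_at": "2022-07-01T00:00:00Z"},
    "orders": {"updated_at": "2022-07-15T00:00:00Z"},
}


def reset_stream(state: dict, stream: str) -> dict:
    """Drop the saved cursor for one stream and leave the others untouched."""
    new_state = deepcopy(state)
    new_state.pop(stream, None)
    return new_state


def plan_sync(state: dict, streams: list) -> dict:
    """Decide, per stream, whether to resume from a cursor or start over."""
    return {s: ("incremental" if s in state else "full") for s in streams}


after_reset = reset_stream(state, "orders")
print(plan_sync(after_reset, ["users", "orders"]))
# {'users': 'incremental', 'orders': 'full'}
```

With a single state blob for the whole connection, the same reset would have forced every stream to start over.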

OpenTelemetry Metrics

We have added an OpenTelemetry interface to Airbyte OSS so that you can export some built-in metrics when using Airbyte at scale. We have also added some actionable metrics, which you can check out here.

As an example, we can run Airbyte locally alongside an OpenTelemetry Collector. The collector exposes port 4317 on localhost as the endpoint that receives the metrics. You can check out the docs here on how to set up OpenTelemetry and start collecting metrics!
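Before pointing Airbyte at the collector, it can help to confirm that something is actually listening on that port. Here is a small sketch using only the Python standard library; the function name is ours, and it only checks that a TCP connection can be opened, not that the listener is really a collector.

```python
import socket


def collector_reachable(host: str = "localhost", port: int = 4317,
                        timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection raises OSError on refusal or timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if collector_reachable():
        print("Something is listening on localhost:4317")
    else:
        print("Nothing is listening on localhost:4317 yet")
```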

Wrap Up

And that’s all we have for July’s edition of The Drip. Thanks for reading through. If you have any questions:

  • Please join our Slack community to talk to us on the Airbyte team as well as other fantastic folks in the community!
  • Also sign up for our Newsletter to keep up with the state of the art in Data Integration and the broader Data Engineering Ecosystem!

Full Patch notes:

New and improved features

  • Source S3: now GA
  • Source Postgres: now in beta (#14326)
  • New alpha source: Glassfrog (#13868)
  • New alpha source: Kyriba (#12748)
  • New alpha source: Elasticsearch (#14118)
  • New alpha source: (#13390)

Octavia CLI improvements (full docs):

  • Users can easily switch between Airbyte instances and deploy the same configurations on multiple instances.
  • Users can enable normalization or custom dbt transformations from their YAML configurations.
  • Users can set custom HTTP headers on the requests made to the Airbyte server. It allows users with instances secured with basic auth or IAP to use the CLI, and it’s a significant step to make the CLI compatible with Airbyte Cloud once we release the public API.
  • Users can import existing remote resources to a local octavia project with octavia import
  • Users can get existing resources (#13254)
  • Users can retrieve the JSON configuration from a remote resource using octavia get. This can be useful for some scripting and orchestration use cases.

Core Airbyte features:

  • Per Stream State
  • Per-Stream state new flow (#14634)
  • Release per stream to the OSS project (#15008)
  • Display new per-stream and global state to users when viewing connection settings (#15020)
  • Base Normalization: handle airbyte_type from stream schema in normalization (#13591)
  • Security: Upgrade platform to openjdk:19-slim-bullseye (#14971)
  • Self Hosting: Refactor OSS Helm Charts (#14794)
  • CDK: Add support for enabling debug from command line and some basic general debug logs (#14521)
  • CDK: Add a schema_generator tool (#13518)
  • Docs: improve doc for contributing locally (#14661)
  • Docs: Airbyte Cloud’s Single vs. Multiple Workspaces (#14608)

New Connector features:

  • Redshift, Databricks, Snowflake, S3 Destinations: Make S3 output filename configurable (#14494)
  • Destination S3: update INSTANCE_PROFILE to use AWSDefaultProfileCredential (#14231)
  • Destination Oracle: custom JDBC parameters (#13841)
  • Source Amazon Seller Partner: add FBA storage fees report (#14625)
  • Source Amazon Seller Partner: Add new streams (#13604)
  • Source Azure Table Storage: Add incremental append capability (#14212)
  • Source BingAds: expose hourly/daily/weekly/monthly options from configuration (#13801)
  • Source add order_statuses and increase pagesize to 1000 (#14752)
  • Source File: add user-agent option (#14488)
  • Source File: Add YAML format (#14588)
  • Source Gitlab: add GroupIssueBoards stream (#13252)
  • Source PayPal Transaction: added OAuth2.0, fixed bug with normalization (#15000)
  • Source Postgres: make initial cdc waiting time configurable (#14451)
  • Source Okta: add User_Role_Assignments and Group_Role_Assignments stream (#14556)
  • Source Okta: add GroupMembers stream (#14380)
  • Source Okta: OAuth2.0 authorization method (#14710)
  • Source Okta: add custom roles stream (#14610)
  • Source Notion: add OAuth authorization for source-notion connector (#14706)
  • Source Tiktok Marketing: Video metrics stream (#13650)
