There was an awesome debate on DBT’s Slack last week discussing mainly about 2 things:
If you’re already on DBT’s Slack, here is the thread’s URL. Even Fivetran’s CEO was involved in the debate.
In this article, we want to discuss the second point and go over the different points mentioned by each party. The first point will come in another article. It’s more relevant to discuss whether an OSS approach makes sense before drilling down into the different alternatives.
We’ll go over the main challenges that companies face and see which approach fits best. We’ll call “commercial companies” the ones with a commercial software product, and “OSS companies” the ones with an open-source approach.
Here is a table summarizing the points mentioned in the debate.
To better understand this table, we invite you to read the list of challenges each approach faces below.
This might be the most challenging part for the open-source approach, but there are actually choices that can make an OSS approach even stronger than a commercial one on that matter.
In this case, a company supports a limited number of connectors (the most used ones) and actively maintains them. They know when there is a schema change and when they need to update the connector, and can be pretty responsive on that, if the organization is responsive and scales well.
However, the more connectors, the more difficult it is for a commercial company to keep the same level of maintenance across all connectors. In an ideal world, the organization will grow linearly with the number of connectors. But most often, there are inefficient processes, so every organization will reach a limit. The more efficient they are, the higher the limit.
When you look at Singer.io, you would think that an OSS approach is at a serious disadvantage here. But Singer.io has actually made several significant poor choices here:
So can we actually achieve a large number of high-quality well-maintained connectors with an open-source approach? The answer is probably, but only if the following conditions are met:
A last note here: when users use a connector on prod and see it fail, they will be able to fix it without waiting for external support teams (as for commercial companies), and the whole community will be able to benefit from the fix.
Every company is opinionated about the way it wants to use a connector; each has specific needs.
Commercial companies have a ROI-driven relationship with connectors. Any additional work on any edge case will need to have some ROI, or the company will have a hard time scaling. That also means that the commercial company will most likely have very high quality connectors for the most used ones. But it can’t afford to keep that level of quality across all integrations, and thus won’t be addressing their corner cases. With their finite resources, it will always be a trade-off. Commercial companies can’t address the long tail of integrations and can’t meet all unique needs.
Furthermore, in the case a client’s needs are not met, the client won’t be able to customize the existing connector to their needs. That is the limitation of this approach.
It is actually not obvious that an OSS approach would easily address all unique corner cases. Contributors build the connectors with their use cases and needs in mind; they won’t be building a high quality connector to address most companies’ needs.
However, there are ways to avoid that:
Being open source lets any engineer adapt any pre-built connectors to their needs, so every company will be able to address their own needs. The question is more about how much effort would be needed for each of their cases.
All existing commercial products have their pricing indexed in some way on the volume of data transferred and the number of connectors used. That means that teams cannot use those services without always thinking about costs.
On the other hand, open-source products will base their pricing on the features used and not the volume of data as it is self-hosted. So once adopted, teams do not have to worry about costs and can use all connectors freely.
This is an easy one. When dealing with a commercial product, if you face an issue, the only thing you can do is contact their support, and hope that your issue will be prioritized in the list of issues the company’s engineering team must resolve. You won’t be able to know when the issue will be fixed. It can take hours at best, or months.
With open source, you don’t need to wait on anybody. You can fix the integration yourself. Having this ability is important if the integration is vital to your business.
By resources, we mean either budget or the engineering time to work on the integrations. At Airbyte, we don’t like the word resources when talking about people. But, in this context. it is the only common one we could think of.
The truth is that in a lot of enterprises, it can be easier to justify hiring a new employee than getting even a modest budget approved for an external vendor. Depending on the open-source project, most or all of the job could be already done. That’s one of the reasons why most companies use open-source technologies rather than external vendors.
Once consolidated, the data is used by different teams: business intelligence / analysts, data scientists, finance teams, customer success, product or marketing team, etc. That is the beauty of products like Fivetran: anybody can use them and anyone can add more data.
One would think that ease of use is exclusive to commercial products, but that is actually not true. For instance, Gitlab and many other OSS projects have UIs that are usable by a non-technical audience. Airbyte has one and it is one of its primary features.
We’re talking about data, and most probably this involves customer data. All companies are subject to privacy compliance laws, such as GDPR, CCPA, HIPAA, etc. As a matter of fact, above a certain stage (about 100 employees) in a company, all external products need to go through a security compliance process that can take several months.
With open source, teams just use the technology directly without such processes. The adoption is frictionless, and the engineering need can be met overnight.
Commercial products mostly sit in the cloud. So if you have to replicate data from an internal database to another, it makes no sense to have the data move through the external vendor. In addition to the vendor’s costs, you need to add egress costs and the issue it poses in terms of privacy.
Open-source products are mostly self-hosted, so they don’t have this problem.
One of the comments in the thread was that “open source is hard without a direct revenue channel to support it.” That is actually true for any company. If you have a hard time getting revenues, you can’t scale the team or your effort.
According to Bessemer, open-source companies grow faster in revenues than many cloud leaders.
“The general argument for open-source software that applies to all software: Having access to the source code means users have freedom--the freedom to change the software how they see fit and fix issues.”
“If you spent months or years using, say, Fivetran, and all of a sudden Fivetran goes out of business or significantly changes its business, you could run the software yourself, even if just for a period of time. I’ve been there when it’s happened with other software; it’s not pretty, and sometimes you only have days or weeks of heads up.”
“If it’s open source, I can run it without Fivetran, so for special circumstances, like compliance or high volume, I don’t have to deal with legal or cost limitations.”
“I can see the code running my integration; I can more quickly debug issues. I can also easily create integrations anyone can choose to use. Tools like Fivetran and Stitch only work a percentage of the time; open source allows me to fill in that gap without creating one-off scripts (or Fivetran lambda functions) or going back and forth for weeks or months with support on a bug that I could fix in a few hours.”
If the open-source project is well structured and well thought through, there will be no points where a commercial approach can be better. The issue is that, until now, there were no good open-source projects that could alleviate the dangers to this approach.
If you like how we think an open-source project can address all the possible concerns by building a single platform with a single repo, a defined data protocol, a ready-to-use UI, and decoupling of EL and normalization, you should definitely have a look at what we’re building at Airbyte. We want to become the open-source standard for data integrations.
Let us know if you think we missed anything. Our goal is to see things from all perspectives and to keep this article up to date.
Get all your ELT data pipelines running in minutes with Airbyte.