As an analytics engineer with nearly 7 years of experience under my belt, I’ve worked at both startup and enterprise companies, each of which handles tooling decisions very differently. Over the last few years, I’ve been able to work with home-built tools and many different types of modern data stack tools. While self-built tools seem great in theory, in almost every interaction I’ve had, they’ve been lacking in some capacity.
While working for a large enterprise organization, my entire line of business was abruptly assigned to a project migrating data models from an in-house legacy tool to a better, external tool. I was required to learn a completely new language, one essentially made up just for this tool, and overhaul all of the code to a more scalable option.
The migration took dozens of engineers and multiple months to complete. Not only did the company spend engineering resources when it first decided to build out this tool, but it also maintained the tool for years and then eventually migrated to a better option anyway. Talk about a waste of time and money!
After this experience, I tend to always err on the side of caution whenever someone wants to build out an ELT tool in-house, especially when there’s a perfectly viable option on the market.
The Hidden Costs of In-House ELT Solutions
When building an ELT solution in-house, there are many costs to consider, especially those related to human and technical resources.
Human Resources Costs
It’s easy to forget about the engineers who actually need to build an in-house solution. There need to be enough engineers, and they need to have the skills to take on the work (or have extra time to learn them). They also need to be dedicated to maintaining and scaling the solution. There are a lot of different factors at play here. Let’s take a closer look at each one of them.
New Positions and Salaries: Hiring the Right Team
When building an ELT tool in-house, you need more engineers on your team than you would think.
It’s really difficult to utilize existing engineers and create the space for them to dedicate time and effort to the building process. After all, you don’t just need a person to build out a tool and be done with it. You need someone who will constantly be monitoring it, fixing bugs, and iterating on the initial designs.
In addition to sourcing specialized data engineers, you need to consider the other roles involved in creating an ELT tool.
You will most likely need to hire a data architect, product manager and engineering manager to keep the project on track and ensure a workable product at the end.
Data engineers aren’t the only people involved in building a tool from scratch; the tool will also need to follow basic UX principles, offer an easy-to-navigate design, and be properly planned.
Keep in mind that hiring for new positions also takes lots of time! Factor in the time your HR team will need to spend finding the ideal candidates to take on this project.
Product Lifecycle Planning: More Than Just Building
Unfortunately, scalable solutions can’t be built overnight. Technical projects like building an ELT tool from scratch require proper scoping and planning to be completed the right way.
In addition to the considerations in the initial planning phase like how the tool will function and what features it will have, there will be significant considerations throughout the entire product lifecycle.
You will need to iterate on what is first built and maybe even add new features. It’s not a one-and-done kind of project.
A huge benefit of modern data stack tools is the large, talented team behind them. They are always adding new features, staying up to date with data engineering trends, and building based on user requests. They have a whole team working on product feature development alone.
The teams behind these modern tools understand and solve issues that you aren’t even aware of, but are guaranteed to eventually run into yourself.
Learning New Skills: The Cost of Training Engineers
As mentioned before, unless you hire engineers specifically for the role of building out an in-house ELT solution, your engineers will most likely need to learn new technical skills.
Accounting for the time spent learning something new and working through the learning curve is crucial when planning any large project like this.
Keep in mind that the engineers will also have to learn what is and isn’t working about the tool as it's used over time. This means more time and money spent maintaining a tool and ensuring it's usable in your data stack.
Ongoing Maintenance: Keeping the Tool Up and Running
Whenever you build something from scratch, you need to consider all of the maintenance involved. There’s a reason why modern data stack tools release frequent updates to their products: code is always breaking, making constant adjustments necessary.
Not to mention, tools need to be maintained to keep up with the ever-changing data landscape. I mean, now that AI has become so popular, your tool should be able to handle unstructured data, right?!
All jokes aside, maintenance is the most time-consuming aspect of any tool once it is built out. Nothing is perfect! Bugs and small fixes in a tool need to be prioritized above all else because of how they can affect the tool’s ability to work properly.
And once you add one feature, chances are users will only request more to enable them to successfully do their jobs.
Putting Out Fires: Dealing with Unexpected Issues
What happens if an analytics engineer tries to ingest data for a critical sales data model and the in-house data ingestion tool breaks? It is up to the data engineers who built and maintained that tool to fix it, AND FAST!
Instead of having an entire Slack community working to resolve an issue, or a huge team of data ingestion experts working together, you only have a few select engineers. This leaves it to you and only you to discover what is blocking your data pipeline from running as expected.
Data Downtime: The Real Impact on Business Operations
Speaking of putting out fires, you also need to consider the data downtime that occurs as a result of these data fires. Data downtime refers to the time the business stakeholders are left without data (or with incorrect data) once an issue arises.
In an ideal world, data downtime wouldn’t occur at all. In a realistic world, it is kept minimal and is barely noticed by business teams.
While data downtime can happen with modern data stack tools as well, it happens less often because of the service level agreements (SLAs) vendors commit to when you sign a contract. They also want you to continue using the tool, so they have every incentive to keep downtime as low as possible and provide you with the most value!
When upgrading tools and deploying new features, minimizing data downtime is a major consideration in the strategy that sophisticated tooling teams use.
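To make the idea of data downtime measurable, here is a minimal Python sketch that totals downtime from a list of incidents and turns it into an availability figure you could compare against a vendor's SLA. Everything in it is an illustrative assumption: the function names, the incident timestamps, and the 30-day reporting window are made up, and incidents are assumed not to overlap.

```python
from datetime import datetime, timedelta


def total_downtime(incidents):
    """Sum the duration of each (start, resolved) incident pair.

    Assumes incidents do not overlap; overlapping ones would be double counted.
    """
    return sum((resolved - start for start, resolved in incidents), timedelta())


def availability(incidents, period_start, period_end):
    """Fraction of the reporting period during which data was available."""
    period = period_end - period_start
    return 1 - total_downtime(incidents) / period


# Hypothetical incidents in a 30-day window: one 2-hour and one 4-hour outage.
incidents = [
    (datetime(2024, 1, 3, 9, 0), datetime(2024, 1, 3, 11, 0)),
    (datetime(2024, 1, 17, 14, 0), datetime(2024, 1, 17, 18, 0)),
]
window_start = datetime(2024, 1, 1)
window_end = datetime(2024, 1, 31)

print(total_downtime(incidents))
print(f"{availability(incidents, window_start, window_end):.4%}")
```

Tracking the same number for an in-house tool gives you something concrete to hold up against the uptime an external vendor commits to in its contract.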
Technical Resources
Human resources aren’t the only cost to consider when building an ELT solution in-house; you also need to consider the cost of technical resources. Just because you are building it doesn’t mean there aren’t still infrastructure-related expenses!
Hosting Costs: The Price of Infrastructure
Remember that even when building your own solution, you still deploy the code to infrastructure that can host what you build. You may even need to pay to store your data if that is a critical component of your ELT tool. Hosting your code on AWS isn’t free; in fact, it’s not even cheap. To ensure your setup is cost-effective, compare vendors and cloud platforms to find the best rates for your infrastructure needs.
Opportunity Costs: What Are You Sacrificing?
When one project is prioritized, something else always falls to the back burner. While we engineers wish we could do it all, we can’t. One thing must fall by the wayside when another becomes top-of-mind. With this in mind, it’s important to ask yourself: is this project the best use of my engineering team’s time?
If another project can create a bigger impact on the business, it may be worth allocating resources to that instead. I see a lot of engineering time wasted on smaller projects that could be outsourced, when it could be spent on the one thing that really moves the needle.
Factors to Consider Before Building In-House
In addition to the resources it requires to build out an ELT tool in-house, there are a few other aspects to consider when deciding how to move forward.
Existing Tools on the Market: Is There a Better Option?
First, is there another tool that has built out what you want to build? If there is already a tool on the market that fits your needs, it may make the most sense to adopt that tool.
Also, ask yourself if there are any cons to using this pre-built tool. If there are cons, are there ways of mitigating them? If a tool has the key features you need and minimal downsides, you’ll free up significantly more human resources by bringing it into your data stack.
If the costs seem too high, look at how the costs of this tool compare to the human and technical costs required to build in-house. Does one offer significantly more savings than the other? This will hopefully make the decision to use an external tool much easier!
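One way to make this comparison concrete is a back-of-the-envelope calculation. The Python sketch below is only an illustration under stated assumptions: the class, the function names, and every number (team size, salary, timeline, hosting bill, subscription price) are hypothetical placeholders, not real pricing. It also simplifies by charging hosting for the whole horizon and by treating maintenance as a fixed fraction of the original team.

```python
from dataclasses import dataclass


@dataclass
class BuildEstimate:
    engineers: int               # people dedicated to the build
    annual_salary: float         # fully loaded cost per engineer, per year
    build_months: float          # estimated time to a first usable version
    buffer_months: float         # padding for the learning curve and iteration
    maintenance_fraction: float  # share of the team kept on upkeep afterwards
    annual_hosting: float        # infrastructure spend per year


def build_cost(est: BuildEstimate, horizon_years: float) -> float:
    """Total cost of building and then running the tool over the horizon."""
    build_years = (est.build_months + est.buffer_months) / 12
    team_cost_per_year = est.engineers * est.annual_salary
    build_phase = team_cost_per_year * build_years
    run_years = max(horizon_years - build_years, 0.0)
    run_phase = team_cost_per_year * est.maintenance_fraction * run_years
    hosting = est.annual_hosting * horizon_years
    return build_phase + run_phase + hosting


def buy_cost(annual_subscription: float, horizon_years: float) -> float:
    """Total cost of an external tool over the same horizon."""
    return annual_subscription * horizon_years


# Hypothetical inputs; replace them with your own estimates and vendor quotes.
estimate = BuildEstimate(
    engineers=3,
    annual_salary=150_000,
    build_months=6,
    buffer_months=3,
    maintenance_fraction=0.5,
    annual_hosting=24_000,
)
print(build_cost(estimate, horizon_years=3))  # 915750.0
print(buy_cost(60_000, horizon_years=3))      # 180000
```

The exact figures matter less than the exercise: putting salaries, maintenance, and hosting next to a subscription price over the same time horizon makes the tradeoff much easier to discuss.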
We recommend checking out these ELT tools for best-in-class solutions:
Weighing the Tradeoffs: Time vs. Value
Next, think about how you want your team to spend the next 6 months to 1 year. Is their time best spent on building out this new tool? Or are there other initiatives that would provide more value to the business? When considering tradeoffs, you always need to think in terms of business value.
What other important engineering projects are in the pipeline? Will the business suffer more when time is allocated to building an in-house solution or when money is spent on an external tool?
It’s typically a good idea to add a few months to the estimated project time as well. Remember, most of your engineers will not be experts in building out an ELT tool from scratch! There will be a learning curve and lots of iteration.
The Potential for Things to Go Wrong: Understanding the Risks
Building a new tool or data pipeline from scratch is always a risk. There is a risk the process will take longer than expected. There is a risk that it will cost more than expected. There is a risk that you may not even end up with the product you dreamed of!
Make sure you understand all of the risks and unknowns involved with a project like this. You must prepare for the worst and hope for the best. What are all the things that could go wrong? Do you have a way to handle all of these worst-case scenarios? How do these worst cases compare to the cons of using a pre-built tool?
Conclusion
Choosing between building your own data infrastructure and using an out-of-the-box tool is a decision that requires extensive research and thought. Instead of rushing into it, involve executives, engineering teams, financial teams, and data teams to get everyone’s point of view.
Spend time weighing the pros and cons of each solution and determining how you would handle the curve balls thrown your way.
I ended up leaving the company I worked for when I was assigned to migrate data models from a legacy tool to an external one. Learning how to use an internal tool, where I wasn’t gaining transferable skills, wasn’t something that interested me as a growing analytics engineer.
Many employees feel the same way I did when assigned to projects like this. Most engineers would rather work with popular modern data stack tools that aren’t super specific to the business and its needs.
This is just another thing to consider when deciding to build an ELT solution in-house. Make sure your team understands what they are signing up for and that they are on board for the ride.
If you go with an external tool, evaluate the pros and cons, and spend proper time trialing the tools to understand how they work and whether they fit your use case.
If you’re looking for a data ingestion tool to scale with you and your company, sign up for a free trial of Airbyte. You can also consult the (really useful) cost estimator calculator to know your Airbyte cost upfront!