What is an ELT data pipeline?
An ELT pipeline is a data pipeline that extracts (E) data from a source, loads (L) the data into a destination, and then transforms (T) data after it has been stored in the destination. The ELT process that is executed by an ELT pipeline is often used by the modern data stack to move data from across the enterprise into analytics systems.
In this article, I will introduce you to the key ELT concepts, and will cover ELT-related questions that you may have, including the following:
- What is an ELT pipeline?
- What is an ELT tool?
- What are the ELT steps?
- What is an ETL pipeline?
- What is the difference between ELT and ETL?
- What are the benefits of ELT versus ETL?
Let’s get started!
What is an ELT pipeline?
ELT is an acronym that refers to the three steps that are executed when moving data from a source to a destination system: extract, load, and transform. First raw data is read (extracted) from a source system, then it is written (loaded) into a destination system, and finally the data is modified (transformed) after being written to the destination. ELT is often used for data integration into a database (Postgres, MySQL, etc), a data warehouse (BigQuery, Snowflake, etc.) or a data lake (S3, GCS, etc.). Once the data is loaded into the destination, dbt is commonly used for the creation and management of SQL statements that are executed by the destination to transform the data. The ELT process is demonstrated in the following image:
What is an ELT tool?
An extract, load, transform (ELT) tool executes the ELT pipeline that is used for moving data between systems. Historically, because the number of sources and destinations that were deployed in an enterprise were limited, enterprises may have created custom scripts or tools to move data between their systems. However, with the explosion of systems that are generating and collecting data, this approach has become infeasible. It is therefore becoming increasingly common to make use of fully-featured ELT tools that support hundreds of sources and destinations rather than building custom solutions. Airbyte is an example of an open-source ELT tool that meets these requirements.
What are the ELT steps?
Now that I have covered ELT at a high level, let's dive into the details of each step that is executed by an ELT pipeline.
ELT step one: Extract data
Extracting data from a source system is one of the most important aspects of ELT, as this sets the stage for the next steps. Your ELT solution must be flexible enough to extract data from a multitude of systems, in different formats, with different structures, and via different APIs. Common formats for source data include relational data, XML, JSON, and files. Airbyte supports hundreds of data sources and their associated data formats.
ELT tools often support various replication modes. The replication mode that is chosen determines how data is extracted from the source, which data is extracted from the source, and how often source data is sent to the destination. For example, incremental replication modes only extract new or modified data from the source during a given sync run. On the other hand, full refresh replication modes read the entire source dataset during a given sync run. In the case that change data capture replication is used, then a log of changes made to the source is read from the source.
ELT step two: Load data
The Load phase of ELT is responsible for writing (loading) data that was extracted from the source into a destination system. ELT pipelines are often used for data integration into a database, a data warehouse or a data lake. In the ELT approach, the data that is extracted from the source system should be loaded into the destination in a raw and unmodified form. Depending on the replication mode that has been selected, the raw table in the destination will either be overwritten or appended to. Additionally, given the breadth of destinations that may be used for storing data in an enterprise, an ELT tool should be able to send data into a multitude of systems. Airbyte supports many destinations.
ELT step three: Transform data
Data transformation is the process of converting data from one format into a different format. Reasons for doing this could be to optimize the data for a different use case than it was originally intended for, or to meet the requirements for storing data in a different system. Data transformation may involve steps such as cleansing, normalizing, structuring, validating, sorting, joining, or enriching data.
In the ELT process, data transformation is done on the data after it has been loaded into the destination – and the transformation may be performed inside the destination system itself. For example, if Airbyte is used for driving data into a data warehouse, the data that has been loaded into the raw data table can be normalized into the final (transformed) table by executing SQL commands in the data warehouse. This is done with the help of dbt, which is used to manage and create the SQL that is executed during the transformation stage.
What is an ETL pipeline?
ETL is an acronym that stands for extract, transform and load. ETL is an alternative to the ELT paradigm for moving data between systems. An ETL pipeline reads (extracts) data from one system, modifies (transforms) the data, and then writes (loads) the transformed data into a destination system. ETL is demonstrated in the following image:
What is the difference between ETL and ELT?
The main difference between ETL pipelines and ELT pipelines is that in the ETL process, data is transformed before being sent to the destination, while in the ELT approach, raw data is first stored in the destination and then it is transformed. The ETL paradigm emerged in the 1970s. However, the cost of memory and storage has plummeted over the years, as shown in the image below. This has made ELT pipelines, which store both the raw and transformed data in the destination, economically feasible.
In an ELT pipeline, transformation of the raw data is often done by new tools such as dbt (data build tool) which make it easy to create, manage, and execute SQL statements that are used to transform data inside the destination. ELT is often associated with the modern data stack, and has several advantages over traditional ETL.
What are the benefits of ELT versus ETL?
Business needs and data analytics requirements change quickly. However, if you use ETL to transform your data before it is loaded into your destination, you are baking in assumptions about how that data will be used in the future. Changes to your requirements will require changes to your ETL pipeline, which may require specialized knowledge, heightened security privileges, and data engineering support. Additionally, once the ETL pipeline has been modified, previously synced data will need to be re-synced if you wish to apply the new ETL logic to it.
In contrast to ETL, the ELT approach matches the reality of data needs that rapidly change. Because the ELT process transforms data after it has been loaded, it is not necessary to know in-advance exactly how the data will be used – new transformations can be performed on the raw data as the need arises. Furthermore, analysts can always access the original raw data, as subsequent transformations will not have compromised its integrity. This gives analysts autonomy from data engineers, since it is not necessary to modify ingest pipelines in order to transform the data in a new way. If a raw and unmodified version of the data exists in the destination, it can be re-transformed in the future without the need for a resync of data from source systems.
Furthermore, functionality such as dbt has made it easy to transform data in destinations such as databases and data warehouses. This allows data analysts, data scientists, and data engineers to transform the data however they like, as often as they like, with tools that they are already familiar with. This is in contrast to the ETL approach, for which transforming data may depend on specialized data engineers that can access and operate complex ETL pipelines.
This article has given a brief overview of ELT including a definition of ELT pipelines, a discussion of what an ELT tool is, and an overview of the ETL process. It has also covered ETL steps, discussed the difference between ETL and ELT pipelines, and talked about the benefits of ELT over ETL.
Using Airbyte’s open-source data integration platform allows you to leverage community-built and quality-assured ELT connectors. Furthermore, in the case that you can’t find an ELT connector that suits your requirements, Airbyte makes it easy to build your own custom ELT connectors by using our connector development kit.
If you haven’t yet tried out Airbyte to fulfill your ELT and data integration needs then you may wish to try Airbyte Cloud for a managed experience, or download Airbyte Open-Source to deploy it in your own infrastructure!