The Evolution of The Data Engineer: A Look at The Past, Present & Future

Thalia Barrera
October 19, 2022
15 min read

The Future: Where is The Data Engineer Going?

It seems that most, if not all, of the data engineering trends point to increased abstraction, simplification, and maturity of the field by adopting software engineering best practices. What does this mean for the future? I see four main tendencies that can be summarized as follows:

  • Data tools will decrease in complexity while adding more functionality and features.
  • Specialization will increase, giving rise to new roles within data engineering.
  • The gap between data producers and consumers will narrow.
  • Data management will improve thanks to the adoption of DataOps.

Let’s go deeper into the above tendencies.

If we take a step back, we can see that one of the data engineer’s primary focuses has always been establishing and maintaining connections between data sources and destinations via elaborate pipelines. A noteworthy development here, and one that perfectly exemplifies the tendency toward simpler, easy-to-use tools, is managed data connectors.

As Alex Gronemeyer mentions when asked about working with data connectors, “It was really interesting bringing in data from business systems, it just takes a couple minutes to set up a new connector and get data flowing in, and that was something I'd never experienced before. Then, when I wanted to start modeling a new dataset to be used in a report downstream, most of my work focused on the data modeling, cleansing, and joining things together. I didn't have to bake in another week of time just to get the data and see what it looked like; that was already taken care of.”

Airbyte is an open-source tool that provides hundreds of off-the-shelf data connectors. For example, you could create a data pipeline from Postgres to Snowflake without writing code. This new generation of data tools is exciting and appealing even for highly technical professionals: by removing the need to write yet another ELT script, these tools free up time and bandwidth for initiatives that matter more to their company. This trend shows no sign of slowing down.

The immediate result of simplified tools that allow any data practitioner, such as a data analyst or data scientist, to set up a data pipeline in minutes is that data engineers are no longer bottlenecks. Self-serve analytics will likely continue to empower downstream data consumers in the future.

As mentioned before, new roles in the data world might emerge, just like the analytics engineer role that appeared in the early 2020s. An analytics engineer is a professional who most likely started their career as a business or data analyst, so they’re well versed in SQL and building dashboards. Self-serve platforms and transformation tools like dbt give them greater autonomy from data engineers. We might see more of these specialized roles appear in the future.

Companies are ingesting more data than ever before, thanks to the expanded capabilities given by improved data tools. As more stakeholders interact with data throughout its lifecycle and make decisions based on it, being able to trust the data has become critical. Because of that, data quality will remain at the top of a data team's priority list.

Increased focus on data quality has recently led to the emergence of a new role that, I believe, will continue to grow: data reliability engineer, a specialization of the data engineering role that focuses on data quality and availability. Data reliability engineers apply DevOps best practices to data systems, such as CI/CD, service-level agreements (SLAs), monitoring, and observability. 
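To make the idea of an SLA on data more concrete, here is a minimal sketch of a freshness check, one of the simplest monitors a data reliability engineer might run. The function name, its parameters, and the six-hour threshold are all assumptions made for illustration; they don't come from any particular tool.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def check_freshness(
    last_loaded_at: datetime,
    max_staleness: timedelta,
    now: Optional[datetime] = None,
) -> bool:
    """Return True if the dataset was loaded recently enough to meet its SLA.

    `last_loaded_at` is the timestamp of the most recent successful load
    (hypothetically read from pipeline metadata). `now` can be injected to
    make the check deterministic in tests.
    """
    current_time = now if now is not None else datetime.now(timezone.utc)
    return current_time - last_loaded_at <= max_staleness


# Illustrative SLA: the table must never be more than six hours stale.
FRESHNESS_SLA = timedelta(hours=6)

if __name__ == "__main__":
    last_load = datetime.now(timezone.utc) - timedelta(hours=2)
    status = "OK" if check_freshness(last_load, FRESHNESS_SLA) else "SLA BREACHED"
    print(f"orders table freshness: {status}")
```

In practice a check like this would run on a schedule and feed an alerting system, so that a breach is noticed before data consumers run into stale dashboards.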

Titles and responsibilities may also shift on the other side of the spectrum, where software engineering and data engineering meet. The shift may be propelled by data apps that combine software and analytics. It’s possible that in the future, software engineers will need to be well-versed in data engineering. With the advent of streaming and event-driven architectures, the separation between upstream backend systems and downstream analytics will fade.

The trend of data producers becoming more conscious of analytics and data science use cases will continue to grow. Data contracts are already seeing increasing adoption: a data contract is an agreement between the owner of a source system and the team responsible for ingesting its data into a data pipeline. Their adoption suggests tighter coupling between producers and consumers in the future.
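One way to picture a data contract is as a schema that the producer promises to honor and that the ingesting team can check records against. The sketch below is only an illustration under that assumption: the `FieldSpec` class, the `ORDERS_CONTRACT` fields, and `validate_record` are hypothetical names, not part of any specific data contract tool.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class FieldSpec:
    """One field the producer promises to deliver."""
    name: str
    type_: type
    nullable: bool = False


# Hypothetical contract for an "orders" source table.
ORDERS_CONTRACT = [
    FieldSpec("order_id", int),
    FieldSpec("customer_email", str),
    FieldSpec("amount_cents", int),
    FieldSpec("coupon_code", str, nullable=True),
]


def validate_record(record: dict[str, Any], contract: list[FieldSpec]) -> list[str]:
    """Return human-readable violations; an empty list means the record conforms."""
    violations = []
    for spec in contract:
        if spec.name not in record:
            violations.append(f"missing field: {spec.name}")
            continue
        value = record[spec.name]
        if value is None:
            if not spec.nullable:
                violations.append(f"null not allowed: {spec.name}")
        elif not isinstance(value, spec.type_):
            violations.append(
                f"wrong type for {spec.name}: expected {spec.type_.__name__}, "
                f"got {type(value).__name__}"
            )
    return violations


if __name__ == "__main__":
    record = {"order_id": "1001", "customer_email": "ana@example.com", "amount_cents": 2599}
    for problem in validate_record(record, ORDERS_CONTRACT):
        print(problem)
```

Running a check like this at ingestion time means a breaking change in the source system surfaces as an explicit contract violation rather than as a silently broken report downstream.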

If we look at the big picture – beyond technology or tools – the data ecosystem is moving towards increased collaboration between stakeholders. This has led to the development of new mindsets, such as DataOps. As defined by Gartner: “DataOps is a collaborative data management practice focused on improving the communication, integration, and automation of data flows between data managers and data consumers across an organization.”

At the end of the day, the explosion of new data tools and practices converges on a persistent set of problems: managing data, working better together, and providing value. This area will improve dramatically in the coming years.

Some wonder whether all of the circumstances mentioned above will lead to the disappearance of data engineers. I don’t believe that will be the case. More sophisticated tools, the narrowing gap between producers and consumers, and the implementation of DataOps mean that data engineers will focus on more strategic tasks, acting less as intermediaries and more as advisors and enablers of automation.

Titles and responsibilities will also morph, potentially rendering the term “data engineer” obsolete in favor of more specialized titles. But data engineering will always be necessary, as companies increasingly rely on data and require the development of new data-driven infrastructure and processes.

The future data engineer will be responsible for designing flexible data architectures that adapt to changing needs. That includes making decisions about tools and processes that provide the most value to the business.
