Written mostly in Rust, Cube’s data processing and storage are based on the Arrow DataFusion query execution framework, which uses Apache Arrow as its in-memory format. In particular, the core of Cube, the cache layer called Cube Store, is built 100% in Rust.
Vector.dev: A high-performance observability data pipeline for pulling system data (logs, metadata)
ROAPI: Create full-fledged APIs for slowly moving datasets without writing a single line of code
Meilisearch: Lightning Fast, Ultra Relevant, and Typo-Tolerant search engine
Tauri: A framework for building tiny, blazingly fast binaries for all major desktop platforms
Yew: A modern Rust framework for creating multi-threaded front-end web apps with WebAssembly.
Rust vs. Python
The downside of Rust is its learning curve, which is much steeper than that of other languages such as Python. That's why most Rust programs in data engineering will, for a long time, ship with a Python wrapper so they can be integrated into Python data pipelines. Moving to Rust is also a shift from an interpreted language such as Python to a compiled one with a more functional programming (FP) style, which Rust certainly supports.
📝 The upside and downside of the Python language
What makes Python popular right now:
* It’s old
* It’s beginner-friendly
* It’s versatile
The downsides of Python:
* Speed / Multithreading
* Scope
* Mobile Development
* Runtime Errors
Newer programming languages tend to follow the functional programming approach. Functional languages such as Scala (with Akka) and Elixir have emerged, along with multi-paradigm languages such as Julia, Kotlin (one of the fastest-growing languages since Google made it the preferred language for Android development), and Rust.
GoLang seems to be a good compiled programming language used in DevOps.
Elixir ships with servers that monitor data pipelines and with retries built into the language; no framework is needed. That makes it an excellent fit for data engineering, and it could replace parts of the data orchestrators.
Rust as a Primary Language?
Let's see an example of a modern data pipeline integrating with Airbyte, dbt, and some ML models in Python.
Each step can have errors and data mismatches. That's why we have orchestration frameworks such as Dagster, which push you to write functional code, following the concept of Functional Data Engineering. Python itself is also seeing broad adoption of type hints and of a more functional programming style. Or, to bring up an example from another language, consider JavaScript and the rise of TypeScript.
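To make the functional idea concrete, here is a minimal sketch in plain Rust (standard library only) of a pure transformation plus an idempotent partition overwrite. The `RawEvent` and `DailyRevenue` types, the function names, and the in-memory `BTreeMap` standing in for a warehouse table are all hypothetical and chosen for illustration; this is not how Dagster or any other orchestrator implements it.

```rust
use std::collections::BTreeMap;

/// One raw event as it might arrive from an upstream source (hypothetical schema).
#[derive(Debug, Clone, PartialEq)]
struct RawEvent {
    day: String, // partition key, e.g. "2023-01-01"
    user_id: u32,
    amount_cents: i64,
}

/// Aggregated output row for one day and one user.
#[derive(Debug, Clone, PartialEq)]
struct DailyRevenue {
    day: String,
    user_id: u32,
    total_cents: i64,
}

/// Pure transformation: no I/O and no hidden state, so the same
/// input always produces the same output for a given partition.
fn daily_revenue(events: &[RawEvent], day: &str) -> Vec<DailyRevenue> {
    let mut totals: BTreeMap<u32, i64> = BTreeMap::new();
    for e in events.iter().filter(|e| e.day == day) {
        *totals.entry(e.user_id).or_insert(0) += e.amount_cents;
    }
    // BTreeMap iterates in key order, so the output order is deterministic.
    totals
        .into_iter()
        .map(|(user_id, total_cents)| DailyRevenue {
            day: day.to_string(),
            user_id,
            total_cents,
        })
        .collect()
}

/// Idempotent "write": replace the whole partition instead of appending,
/// so re-running a failed or repeated step cannot duplicate rows.
/// The BTreeMap is a stand-in for a partitioned table.
fn overwrite_partition(
    store: &mut BTreeMap<String, Vec<DailyRevenue>>,
    day: &str,
    rows: Vec<DailyRevenue>,
) {
    store.insert(day.to_string(), rows);
}
```

Calling `daily_revenue` and then `overwrite_partition` twice for the same day leaves the store in the same state as calling them once, which is the property that makes retries and backfills safe.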
❓ The exciting question to me is whether Rust will be adopted as a primary language and can do data orchestration work.
We typically load data into a data frame and then transform it or add some business logic within our data pipelines. This could be done efficiently with Rust, Apache Arrow, and DataFusion, which offer type safety and a good ecosystem. Time will tell.
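As a rough illustration of what "type-safe" buys you in a load-and-transform step, the sketch below parses comma-separated lines into a typed struct and applies a small piece of business logic. It deliberately uses only the Rust standard library rather than Arrow or DataFusion; the `Order` schema, the `ParseError` variants, and the function names are assumptions made up for this example.

```rust
/// A typed record (hypothetical schema): once a row is an `Order`,
/// downstream code cannot confuse its fields or their types.
#[derive(Debug, PartialEq)]
struct Order {
    id: u64,
    country: String,
    amount_cents: i64,
}

/// Every way a row can be malformed is an explicit, typed value.
#[derive(Debug, PartialEq)]
enum ParseError {
    WrongFieldCount(usize),
    BadId(String),
    BadAmount(String),
}

/// Parse one "id, country, amount_cents" line into an `Order`.
fn parse_order(line: &str) -> Result<Order, ParseError> {
    let fields: Vec<&str> = line.split(',').map(str::trim).collect();
    if fields.len() != 3 {
        return Err(ParseError::WrongFieldCount(fields.len()));
    }
    let id = fields[0]
        .parse::<u64>()
        .map_err(|_| ParseError::BadId(fields[0].to_string()))?;
    let amount_cents = fields[2]
        .parse::<i64>()
        .map_err(|_| ParseError::BadAmount(fields[2].to_string()))?;
    Ok(Order { id, country: fields[1].to_string(), amount_cents })
}

/// Load step: split good rows from bad ones instead of failing silently.
fn load_orders(input: &str) -> (Vec<Order>, Vec<ParseError>) {
    let mut orders = Vec::new();
    let mut errors = Vec::new();
    for line in input.lines().filter(|l| !l.trim().is_empty()) {
        match parse_order(line) {
            Ok(order) => orders.push(order),
            Err(err) => errors.push(err),
        }
    }
    (orders, errors)
}

/// Business logic (hypothetical rule): total order value for one country.
fn total_for_country(orders: &[Order], country: &str) -> i64 {
    orders
        .iter()
        .filter(|o| o.country == country)
        .map(|o| o.amount_cents)
        .sum()
}
```

The compiler forces the caller to handle both halves of the `load_orders` result, so data mismatches surface at the load step rather than deep inside a later transformation. A columnar engine would express the same idea through a schema instead of a struct.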
Will Rust Be the Programming Language for Data Engineers?
Rust is a multi-use language and gets the job done for many problems of a data engineer. But the data engineering space is dominated by Python (and SQL) and will stay that way for the foreseeable future. There is no "until people fully move into Rust". It's hard to express how many tools and frameworks are written in Python to interoperate with other Python tools. It's pretty hard to imagine that inertia changing substantially in the next decade.
The Rust projects we have seen above are excellent and will continue to grow for vital and core components, but for them to be helpful for the average data engineer, they will need a Python interface. What was once supposed to be Scala will now be Rust: a backend tooling language for tasks that need fast and well-maintained code, with a Python wrapper on top.
Writing libraries in Rust feels more like writing long-term infrastructure than writing in higher-level languages such as Python, Java, or other JVM languages.
What do you think? What is your take on Rust for data engineers?
Resources to Learn More on the Topic
Suppose you want to be up and running within minutes: Karim Jedda has an article carefully exploring the Rust programming ecosystem as a Python developer of 10+ years, checking how to do everyday programming tasks and what the tooling looks like. Shared Services of Canada did a hands-on example with Rust, converting raw archive files into JSON for data analysis. Or read Mehdi Ouazza's article, where he debates the Battle for Data Engineer's Favorite Programming Language.
Or do you want to get hands-on and search for an example project? How about building an Airbyte Delta Lake Destination (Python interface) with delta-rs?
Simon is a Data Engineer and Technical Author at Airbyte. He is dedicated, empathetic, and entrepreneurial with 15+ years of experience in the data ecosystem. He enjoys maintaining awareness of new innovative and emerging open-source technologies.