BigQuery Legacy SQL vs Standard SQL

February 13, 2024
20 mins

Google BigQuery features two SQL dialects that users can employ to query data—Legacy SQL and Standard SQL. Legacy SQL is the syntax introduced by Google for the BigQuery platform, while Standard SQL fits closer to widely accepted ANSI SQL standards. BigQuery’s SQL options may initially seem identical, but understanding the key differences between Legacy SQL and Standard SQL is essential. This article will dive into the core distinctions between BigQuery’s Legacy and Standard SQL capabilities across syntax, functionality, portability with other tools, and recommended use cases. 

What is Legacy SQL?

BigQuery Legacy SQL was the initial version of SQL used by Google. It was created to provide a SQL interface for querying and analyzing large datasets stored in BigQuery. The syntax and query structure of Legacy SQL is different from Standard SQL. 

Legacy SQL supports simple SQL operations like SELECT, ORDER BY, and WHERE clauses. In addition, it supports INNER, [FULL | RIGHT | LEFT] OUTER, and CROSS JOIN operations. However, if left unspecified, the default is INNER. With Legacy SQL, you can also use various window functions like RANK(), ROW_NUMBER(), FIRST_VALUE(), AVG(), MIN(), MAX(), etc.

Why Legacy SQL? 

Legacy SQL arose because it provided users with a simple and familiar SQL interface to interact with their data. However, as the technology and SQL standards evolved, Google introduced Standard SQL in BigQuery to align with broader industry standards, which offered more features and enhanced compatibility.

Here are some critical syntax and query structure details when working with BigQuery’s Legacy SQL: 

  • Legacy SQL uses legacy syntax like # instead of -- for comments. 

# This is a Legacy SQL comment

  • Table data is referenced using [dataset.table] convention in the query.
 
#legacySQL
SELECT
word
FROM
[ bigquery-public-data:samples.shakespeare ] ;
 
  • Legacy SQL has limited support for things like JOINs, window functions, etc. Queries are more straightforward as a result. 
 
 #legacySQL
SELECT
mother_age,
  COUNT
(mother_age) total
FROM
[ bigquery-public-data:samples.natality]
WHERE
state NOT IN (
SELECT
state
FROM
( SELECT
state,
COUNT (state) total
FROM
[ bigquery-public-data:samples.natality ]                
GROUP BY
state                
ORDER BY
total DESC
LIMIT 10))
AND mother_age > 50
GROUP BY
mother_age
ORDER BY
mother_age DESC
 
  • Each window function of Legacy SQL requires an OVER clause, which specifies the top and bottom of the window. The OVER clause components (partitioning, ordering, and framing) give additional control over the window.
 
#legacySQL
SELECT
name,
value,
AVG (value)
OVER
(ORDER BY value
ROWS	BETWEEN 1 PRECEDING AND CURRENT ROW )
AS MovingAverage
FROM
( SELECT "a" AS name, 0 AS value ),
( SELECT "b" AS name, 1 AS value ),
( SELECT "c" AS name, 2 AS value ),
( SELECT "d" AS name, 3 AS value ),
( SELECT "e" AS name, 4 AS value );
 

Benefits of Legacy SQL 

  • Compatibility with Existing Codebase: If a significant amount of code is written in Legacy SQL, maintaining compatibility with the existing codebase might be a reason to continue using it. Migrating to Standard SQL would require updating and testing all current queries. 
  • Minimal Learning Curve for Legacy Users: Users already familiar with Legacy SQL may find it more convenient to continue using it. Switching to Standard SQL might require additional training and adjustments. 
  • Transition Flexibility: Organizations may continue using Legacy SQL for existing projects while gradually transitioning to Standard SQL for new projects. 
  • Stability and Mature Syntax: Legacy SQL might be better suited for specific use cases or scenarios in some cases. Users should evaluate their requirements and consider whether Legacy SQL provides advantages for their data processing needs. 

What is Standard SQL?

Standard SQL refers to BigQuery’s SQL dialect that aligns with standard ANSI SQL syntax and capabilities. It adheres closely to syntax and functionality defined in ANSI SQL standards widely used across databases. Standard SQL supports ANSI SQL elements like subqueries, common table expressions (CTE), window functions, CASE statements, etc. 

Data handling and manipulations leverage Standard SQL methods like EXTRACT, DATE_TRUNC, and DATETIME data types rather than legacy options. Joins between tables use ANSI SQL JOIN clause semantics and operators like INNER JOIN, LEFT JOIN, etc. 

Overall, it provides a portable SQL environment that is familiar to other database platforms like PostgreSQL, SQL Server, and MySQL. 

How does it align with ANSI SQL standards?

Here’s how BigQuery’s Standard SQL aligns with the ANSI SQL standard: 

  1. Syntax and Semantics: Standard SQL adheres to the syntax and semantics defined by the ANSI SQL standard. This ensures a consistent and interoperable way of interfacing with relational databases. 
  2. Basic SQL Operations: Standard SQL includes fundamental SQL operations such as SELECT, INSERT, UPDATE, DELETE, and CREATE statements. These operations are part of the ANSI SQL standard and provide a common foundation for database interaction. 
  3. Data Types: Standard SQL defines a set of data types for representing different kinds of data. ANSI SQL standardizes these data types, ensuring consistency across database systems.
  4. Joins and Relationships: Standard SQL supports defining relationships between tables and performing joins. ANSI SQL specifies the syntax and semantics for various types of joins, allowing for consistent query execution.
  5. Constraints: StandardSQL includes features for defining constraints on tables, such as primary keys, foreign keys, unique constraints, and check constraints. These constraints follow the guidelines set by the ANSI SQL standard. Here, 
  • The PRIMARY KEY is the unique key in a database; a table cannot have more than one unique key. 
  • The FOREIGN KEY is a field in one table labeled as a primary key in another. 
  • The UNIQUE constraint ensures that every data item in a table is unique. 
  • The CHECK constraint limits the value range that the user placed in a column. 

Benefits of Standard SQL

Standard SQL in Google BigQuery provides several benefits compared to using Legacy SQL: 

  • Standardization and Familiarity: Standard SQL adheres to the SQL 2011 standard, making it more consistent with other relational databases. Users familiar with traditional SQL syntax will find it easier to work with Standard SQL.
  • Improved Handling of Semi-Structured Data: It offers enhanced support for nested and repeated data structures commonly found in a semi-structured format like JSON. This way, you can efficiently handle various data types. 
  • Advanced Analytical Functions: Standard SQL provides a broad set of analytical functions, including window functions, advanced aggregations, and statistical functions. Users can perform more sophisticated analyses directly within their SQL queries. 
  • User-Defined Functions (UDFs): It allows the creation and use of User-Defined Functions (UDFs), enabling users to write custom functions in JavaScript. The extensibility enhances the capabilities of SQL queries by incorporating user-defined logic. 
  • Compatibility with BI Tools: Standard SQL is often the preferred choice for integrating various BI tools and data visualization platforms, including Google Data Studio. This compatibility makes it easier for users to connect and analyze data using their preferred tools.

Example of Standard SQL

 
-- creating a new database

CREATE DATABASE IF NOT EXISTS employeeDB;
USE employee;

CREATE TABLE IF NOT EXISTS employees (
	employee_id INT PRIMARY KEY,
	first_name VARCHAR (255) NOT NULL,
	last_name VARCHAR (255) NOT NULL,
	job_title VARCHAR (255),
	salary DECIMAL (10,2)
);

-- Insert data into the employees table 

INSERT INTO employees (first_name, last_name, job_title, salary)
VALUES
	(‘John’, ‘Adams’, ‘Data Engineer’, 80000.00),
	(‘Joe’, ‘Smith’, ‘Software Engineer’, 70000.00),
	(‘Jack’, ‘Wilson’, ‘Manager’, 60000.00);

-- retrieving information 

SELECT*FROM employees; 

-- retrieving specific column 

SELECT first_name, job_title FROM employees;

-- filter data with WHERE clause 

SELECT*FROM employees WHERE salary > 80000;

-- update data

UPDATE employees SET job_title = ‘Senior Data Engineer’ WHERE first_name = ‘Jane’ AND last_name = ‘Smith’ ;

-- INSERT new data

INSERT INTO employees (first_name, last_name, job_title, salary)
VALUES (‘Emily’, ‘Jones’, ‘Marketing Specialist’, 76000.00);

-- DELETE information

DELETE FROM employees WHERE last_name = ‘Johnson’ ;
 

BigQuery Legacy SQL vs Standard SQL

Legacy SQL

Standard SQL

It does not adhere to SQL standards; it has a unique syntax specific to BigQuery.

Standard SQL adheres to the SQL 2011 standard for consistency with traditional databases.

Legacy SQL uses special syntax, like square brackets, for array scripting. 

It follows standard SQL conventions for familiarity. 

Limited support for nested data structures (e.g., JSON).

Enhanced support for nested data structures (e.g., JSON). 

Legacy SQL supports JavaScript-based User-defined Functions (UDFs). 

Support both JavaScript and SQL-based User-Defined Functions (UDFs). 
It offers a set of analytical functions with potentially limited capabilities.  Standard SQL offers a broader range of analytical functions, including window functions. 
Security features are present but with less granularity. 

Enhanced security features with finer control over access and permissions.

Primarily deals with flat, table-based structures. 

It supports structured data types like arrays and structures for flexible data modeling. 
Compatible with BI tools but less preferred for seamless integration.  Preferred for seamless integration with BI and data visualization tools (e.g., Power BI). 

Legacy SQL is still supported but receives fewer features and optimizations. 

Actively developed with the regular introduction of new features and optimizations. 

Using Airbyte to Enhance Efficiency Beyond SQL Dialects

While understanding the difference between the two SQL dialects is important to enhance the efficiency of your workflows, the technique you use to move data into your BigQuery data warehouse can further enhance your productivity. 

By incorporating Airbyte into your workflow, you can seamlessly automate the ELT process, facilitating the extraction and loading of data from various sources into Google BigQuery. This integration provides flexibility and scalability and enhances data management efficiency for analytics and reporting requirements. Airbyte is an open-source platform with 350+ data connectors, allowing you to connect to various data sources. However, if you wish to create your custom connector to connect with specific databases, APIs, or systems, Airbyte provides you with the connector development kit (CDK), where you can build your connectors in only 10 min without any code. 

Conclusion 

When deciding between BigQuery Legacy SQL and Standard SQL, considerations like an existing codebase, data complexity, and desired functionality play a pivotal role. Legacy SQL suits projects valuing simplicity and backward compatibility, while Standard SQL offers standardization and compatibility with modern tools. Transition decisions should align with specific project needs, ensuring optimal performance and ease of use. 

Limitless data movement with free Alpha and Beta connectors
Introducing: our Free Connector Program
The data movement infrastructure for the modern data teams.
Try a 14-day free trial