No items found.

A Comprehensive Guide to BigQuery Data Types

A guide that will inform you briefly about every BigQuery data type.

Should you build or buy your data pipelines?

Download our free guide and discover the best approach for your needs, whether it's building your ELT solution in-house or opting for Airbyte Open Source or Airbyte Cloud.

Download now

Google BigQuery is one of the widely known data warehouses today. That's because of many reasons, such as its storage capacity, analytical capabilities, and more. One of the stand-out features of BigQuery is its data types. It supports lots of different data types ranging from numerical to complex arrays. These data types give the flexibility to handle complex data. 

However, it can be challenging to work with its different data types at the start. Therefore, to make things easier, here is a guide that will inform you briefly about every BigQuery data type. So, let's get started. 

Google BigQuery Data Types

Every data management platform that allows you to store and manipulate data has its own data types to display data in logical order. The available data types in BigQuery can be grouped into the following categories: 

  • Numeric (INT64, Numeric, Bignumeric, Float64). 
  • Boolean.
  • String. 
  • Bytes. 
  • Time (Date, time, datetime, and timestamp).
  • Geography. 
  • Array. 
  • Struct. 
  • JSON.

As you import the data into BigQuery, you must map the data types from the sources to the above data types of BigQuery. 

Let's discover each of the data types in BigQuery individually: 

Numeric Data Type

These are some of the most common data types used in calculations, analyses, and data reporting. You can break down Numeric types into the following four categories: 

  • INT64.
  • NUMERIC. 
  • BIGNUMERIC.
  • FLOAT64. 

Integers or INT64 are decimal digits or hexadecimal values prefixed with '0x' or '0X'. You can prefix an integer by "+" or "-" to represent positive or negative values. The values in this data type can't contain fractional or decimal components Examples of integer values are 123, 0xABC, and -123

The NUMERIC and BIGNUMERIC values are numeric with fixed decimal precision and scale. Precision is the number of digits a number can contain, and scale is how many digits appear after a decimal point. 

Floating points or Float64 are numbers with approximate numeric values with decimals or fractional components. These values include non-number values: Nan, +inf, and -inf. An example of a float value are 123.4567, 58., and .14

Boolean Data Type

Boolean or BOOL values are represented by the keywords TRUE or FALSE but can also be NULL. The values of these data types are sorted in this order, from the least to greatest: NULL, FALSE, TRUE.

String Data Type

STRING types refer to the variable length data operating in Unicode characters rather than bytes. The input string values must be UTF-8 encoded. 

To quote a string, you must use either single (') or double (") quotation marks. It can also be triple-quoted with a group of three single ("') or three double ("""). Some examples of string values are: "abc", "it's", 'it \'s', """ abc""","' it'"

Strings are frequently used by online businesses to store any user-generated data, including logins, email addresses, etc. 

Bytes Data Type

BYTES represents variable-length data but operates on raw bytes rather than Unicode characters. Therefore, strings and bytes are separate and can't be used interchangeably. However, the syntax of strings and bytes are quite similar; they must be quoted in single, double, or triple quotation marks. Bytes are commonly used when certain strings need to be encoded into a common programming format. 

Date Data Type

DATE data type represents the Gregorian calendar date independent of the time zone. It doesn't represent a specific 24-hour period. Rather, a given data value represents a different 24-hour period in different time zones. For example, a PST date type defined as 2023-05-01 will represent a period between 5 p.m. on May 1st and 5 p.m. on May 2nd, while for Korea standard time, the date type will represent a period between 9 a.m. on May 2nd and 9 a.m. on May 1st. 

The canonical format of this data type is - YYYY-[M]M-[D]D. Here, the year is four digits, the month is one or two digits, and the date is one or two digits, respectively. 

Time Data Type

As you'll see in a digital watch, a TIME data type represents time. Like the date type, the time data type can refer to different times according to the time zone. The canonical format of this is: [H]H:[M]M:[S]S[.DDDDDD|.F]. Where:

  • [H]H: One or digit hours (valid values range from 00 to 23). 
  • [M]M: One or two-digit minutes (valid values from 00 to 59).
  • [S]S: One or two-digit seconds (valid values range from 00 to 60).
  • [.F]: It can go upto six fractional digits (microsecond precision).

Datetime Data Type

DATETIME data type displays not only the date but also the time value independent of the time zone. It includes the year, month, day, hours, minutes, seconds, and even subseconds. 

The format of this data type is: 

civil_date_part [time_part]civil_date_part:YYYY-[M]M-[D]Dtime_part:{|T|t}[H]H:[M]M:[S]S[.F]

In the above format, the civil_date_part and time_part are the same as the Date and time data type.

Timestamp Data Type

A TIMESTAMP or timestamp value represents the absolute point in time, independent of the time zone, with microsecond precision. The display of timestamps for human readability usually includes Gregorian data, a time, and a time zone in an implementation-dependent format. An example of the canonical format for the timestamp data type is "2014-09-27 12:30:00.45-08".

Geography Data Type

A GEOGRAPHY is a collection of points, line strings, and polygons represented as a point set or a subset of the earth's surface. The points specify a particular location using the same longitude and latitude values as on the GPS. The most basic syntax for this data type is specifying the points with x and y coordinates: POINT(x_coordinate y_coordinate). Learn more about this data type here

Array Data Type

ARRAY is an ordered list of one or more elements of non-array values. Every element in the array must share the same data type. You can declare an array type using ARRAY <T> syntax. Examples of array declarations are ARRAY <INT64>, ARRAY <BYTES (5)>, ARRAY <STRING>. A string value will look like this: ARRAY<string>['x', 'y', 'xy']. 

When working with arrays, if an array contains a NULL value, BigQuery will return an error. 

BigQuery array data types are very common when working with large data sets as they eliminate the need to declare thousands of variables. 

Struct Data Type

A STRUCT is a container of ordered fields, each with a type which is required and field name (optional). You can freely mix data types within a struct data type. There are two ways to define a struct: one is a typeless struct, and the other is typed. 

The typeless struct uses a tuple, and the syntax for that is as follows (expr1, expr2 [, … ]). This struct includes STRUCT (1,2,3) and STRUCT('abc')

The syntax of the typed struct looks like this: STRUCT<[field_name] field_type, ...>( expr1 [, ... ]). Examples of this struct are STRUCT<int64>(5) and STRUCT<date>("2011-05-05")

A STRUCT data type has many use cases, as it can contain any BigQuery and combine different types. 

JSON Data Type

A JSON data type represents a lightweight data-interchange format with which you might be familiar. You can expect these canonicalization behaviors when creating a value with JSON type: 

  • Booleans, nulls, and strings are preserved exactly. 
  • It can store integers in the range of -9,223,372,036,854,775,808 (minimal) to 18,446,744,073,709,551,615 (maximal) and floating point numbers within a domain of FLOAT64.
  • The order of each value in an array is preserved exactly.
  • The order of object members is not guaranteed or preserved. 

The syntax of BigQuery JSON data type is JSON 'formatted data'. An example can be JSON' {"name": "Cooper", "forname": Alice}'

BigQuery Integration With Airbyte

The above section in the article will help you deal with data types. However, data types are not the only thing if you are managing data. Therefore, to deal with other aspects of data management, such as data integration, you should use SaaS tools like Airbyte. 

Airbyte is an ELT tool that can help you automate data integration from multiple sources of your choice to BigQuery. With robust orchestration capabilities, an intuitive user interface, and integration with data transformation tools like dbt, it streamlines complex data integration processes. 

You can use Airbyte to centralize data (of any BigQuery-supported data types) in BigQuery without writing any code. 

Sign up (if you haven't already) with Airbyte Cloud today to handle data integration tasks easily. 

Conclusion 

To bring the data from outside data sources to the BigQuery database, you must have a detailed knowledge of the data types BigQuery supports. When you pick the right data type, you can enhance data integrity, analytical capabilities, and system performance. Following the guide above, you can easily learn about all the data types BigQuery supports with their syntax and examples. 

If you want to move from data type mapping to data integration in BigQuery, you can use Airbyte Cloud. The tool has an extensive library of more than 350+ pre-built connectors to smooth the process of data ingestion from various operational systems to BigQuery.

Should you build or buy your data pipelines?

Download our free guide and discover the best approach for your needs, whether it's building your ELT solution in-house or opting for Airbyte Open Source or Airbyte Cloud.

Download now

Similar use cases

No similar recipes were found, but check back soon!