A guide that will inform you briefly about every BigQuery data type.
Download our free guide and discover the best approach for your needs, whether it's building your ELT solution in-house or opting for Airbyte Open Source or Airbyte Cloud.
Download our free guide and discover the best approach for your needs, whether it's building your ELT solution in-house or opting for Airbyte Open Source or Airbyte Cloud.
Google BigQuery is one of the widely known data warehouses today. That's because of many reasons, such as its storage capacity, analytical capabilities, and more. One of the stand-out features of BigQuery is its data types. It supports lots of different data types ranging from numerical to complex arrays. These data types give the flexibility to handle complex data.
However, it can be challenging to work with its different data types at the start. Therefore, to make things easier, here is a guide that will inform you briefly about every BigQuery data type. So, let's get started.
Every data management platform that allows you to store and manipulate data has its own data types to display data in logical order. The available data types in BigQuery can be grouped into the following categories:
As you import the data into BigQuery, you must map the data types from the sources to the above data types of BigQuery.
Let's discover each of the data types in BigQuery individually:
These are some of the most common data types used in calculations, analyses, and data reporting. You can break down Numeric types into the following four categories:
BigQuery integers data type store signed integers. It's best for storing values like product IDs, counts, or years without decimals. BigQuery supports various integer sizes, permitting you to select the most efficient option based on your data range. Here's a breakdown of commonplace INT sizes:
Choosing the proper INT length guarantees efficient storage usage. For example, if you simply have product IDs ranging from 1 to 1000, the usage of INT64 might be overkill. INT32 could be a more space-saving option in this case.
Example:
A table storing consumer IDs can be defined with an INT64 column named user_id.
Size of the Data Type: 8 bytes of storage per value.
The FLOAT data type provides numbers with decimal points. It's normally used for clinical data, measurements, or economic calculations. However, FLOAT has inherent boundaries because of its internal illustration. It can result in mild inaccuracies while storing and appearing operations on specific decimal values.
Here are important factors to remember:
Example:
A column storing clinical measurements like temperature variations may be described as:
Size of the Data Type: 8 bytes of storage per value.
The NUMERIC data type is designed for specific decimal value garage and calculations. It offers a large benefit over FLOAT: it specifies the exact quantity of digits and decimal places a fee can preserve. This guarantees accurate effects when managing economic data, clinical measurements, or any situation wherein specific decimal values are essential.
Here's what makes NUMERIC stand out:
While NUMERIC gives advanced precision, it requires extra storage area in comparison to FLOAT. Consider the exchange-off between garage efficiency and the extent of precision your data needs while selecting between these data types.
Syntax:
Example:
A column storing currency values with two decimal locations can be described as:
Here, 10 represents the whole quantity of digits (which include integer and decimal places), and 2 indicates the variety of decimal places.
Size of the Data Type: 16 bytes.
While BigQuery offers the NUMERIC data type for decimal values, BIGNUMERIC takes precision to an entirely new degree. It's designed to address extremely massive numbers with as much as 76.76 digits of precision, inclusive of nine decimal locations. This makes BIGNUMERIC ideal for economic calculations, medical data evaluation, or any situation requiring notably high-accuracy decimal values.
Usage: BIGNUMERIC is suitable for situations requiring pretty unique calculations or dealing with extraordinarily big numerical values, together with monetary calculations or medical computations.
Syntax:
Example:
A scientific dataset storing astronomical distances with excessive precision can be described as:
Here, 30 represents the total wide variety of digits and 10 indicates the variety of decimal locations.
Size of the Data Type: 32 bytes.
Boolean or BOOL values are represented by the keywords TRUE or FALSE but can also be NULL. The values of these data types are sorted in this order, from the least to greatest: NULL, FALSE, TRUE.
Example: Indicating product availability (BOOLEAN in_stock). A table storing patron account status may have an is_active column with a Boolean data type. TRUE shows an active account, while FALSE signifies an inactive one.
Size of the Data Type: 1 byte
STRING types refer to the variable length data operating in Unicode characters rather than bytes. The input string values must be UTF-8 encoded.
To quote a string, you must use either single (') or double (") quotation marks. It can also be triple-quoted with a group of three single ("') or three double ("""). Some examples of string values are: "abc", "it's", 'it \'s', """ abc""","' it'".
Strings are frequently used by online businesses to store any user-generated data, including logins, email addresses, etc.
Example: Storing client names or product descriptions (STRING customer_name, product_description)
Size of the Data Type: 2 bytes+UTF-8 encoded string
BYTES represents variable-length data but operates on raw bytes rather than Unicode characters. Therefore, strings and bytes are separate and can't be used interchangeably. However, the syntax of strings and bytes are quite similar; they must be quoted in single, double, or triple quotation marks. Bytes are commonly used when certain strings need to be encoded into a common programming format.
Size of the Data Type: 2 bytes according to cell
DATE data type represents the Gregorian calendar date independent of the time zone. It doesn't represent a specific 24-hour period. Rather, a given data value represents a different 24-hour period in different time zones. For example, a PST date type defined as 2023-05-01 will represent a period between 5 p.m. on May 1st and 5 p.m. on May 2nd, while for Korea standard time, the date type will represent a period between 9 a.m. on May 2nd and 9 a.m. on May 1st.
The canonical format of this data type is - YYYY-[M]M-[D]D. Here, the year is four digits, the month is one or two digits, and the date is one or two digits, respectively.
Example: Storing order dates (DATE order_date)
Size of the Data Type: 8 bytes
As you'll see in a digital watch, a TIME data type represents time. Like the date type, the time data type can refer to different times according to the time zone. The canonical format of this is: [H]H:[M]M:[S]S[.DDDDDD|.F]. Where:
Example: Storing transaction timestamps (TIME transaction_time)
Size of the Data Type: 8 bytes
DATETIME data type displays not only the date but also the time value independent of the time zone. It includes the year, month, day, hours, minutes, seconds, and even subseconds.
The format of this data type is:
In the above format, the civil_date_part and time_part are the same as the Date and time data type.
Size of the Data Type: 8 bytes
A TIMESTAMP or timestamp value represents the absolute point in time, independent of the time zone, with microsecond precision. The display of timestamps for human readability usually includes Gregorian data, a time, and a time zone in an implementation-dependent format. An example of the canonical format for the timestamp data type is "2014-09-27 12:30:00.45-08".
Size of the Data Type: 8 bytes
A GEOGRAPHY is a collection of points, line strings, and polygons represented as a point set or a subset of the earth's surface. The points specify a particular location using the same longitude and latitude values as on the GPS. The most basic syntax for this data type is specifying the points with x and y coordinates: POINT(x_coordinate y_coordinate). Learn more about this data type here.
Example: Storing consumer locations (GEOGRAPHY user_location)
Size of the Data Type: 16 bytes
ARRAY is an ordered list of one or more elements of non-array values. Every element in the array must share the same data type. You can declare an array type using ARRAY <T> syntax. Examples of array declarations are ARRAY <INT64>, ARRAY <BYTES (5)>, ARRAY <STRING>. A string value will look like this: ARRAY<string>['x', 'y', 'xy'].
When working with arrays, if an array contains a NULL value, BigQuery will return an error.
BigQuery array data types are very common when working with large data sets as they eliminate the need to declare thousands of variables.
Example: Storing a list of product categories for each object (ARRAY<STRING> product_categories)
A STRUCT is a container of ordered fields, each with a type which is required and field name (optional). You can freely mix data types within a struct data type. There are two ways to define a struct: one is a typeless struct, and the other is typed.
The typeless struct uses a tuple, and the syntax for that is as follows (expr1, expr2 [, … ]). This struct includes STRUCT (1,2,3) and STRUCT('abc').
The syntax of the typed struct looks like this: STRUCT<[field_name] field_type, ...>( expr1 [, ... ]). Examples of this struct are STRUCT<int64>(5) and STRUCT<date>("2011-05-05").
A STRUCT data type has many use cases, as it can contain any BigQuery and combine different types.
Example: Storing consumer data like name, email, and deal with (STRUCT consumer call STRING, e-mail STRING, address STRUCT street_address STRING, town STRING )
A JSON data type represents a lightweight data-interchange format with which you might be familiar. You can expect these canonicalization behaviors when creating a value with JSON type:
The syntax of BigQuery JSON data type is JSON 'formatted data'. An example can be JSON' {"name": "Cooper", "forname": Alice}'.
Example: Storing product data with diverse attributes (JSON product_details)
By following these strategies, you could quickly decide the data varieties of columns for your BigQuery tables, permitting you to better understand and work with your data.
Changing the data form of a column in Google BigQuery may be achieved through various techniques, depending in your precise requirements and possibilities. Here are a few common techniques:
BigQuery only allows changing a column's data type to a compatible type. For example, you cannot directly change a STRING column to an INT64 column if the string values contain non-numeric characters.
By using those techniques, you could efficiently alter the data types of columns in your BigQuery tables to better match your data processing and evaluation needs.
The above section in the article will help you deal with data types. However, data types are not the only thing if you are managing data. Therefore, to deal with other aspects of data management, such as data integration, you should use SaaS tools like Airbyte.
Airbyte is an ELT tool that can help you automate data integration from multiple sources of your choice to BigQuery. With robust orchestration capabilities, an intuitive user interface, and integration with data transformation tools like dbt, it streamlines complex data integration processes.
You can use Airbyte to centralize data (of any BigQuery-supported data types) in BigQuery without writing any code.
Sign up (if you haven't already) with Airbyte Cloud today to handle data integration tasks easily.
To bring the data from outside data sources to the BigQuery database, you must have a detailed knowledge of the data types BigQuery supports. When you pick the right data type, you can enhance data integrity, analytical capabilities, and system performance. Following the guide above, you can easily learn about all the data types BigQuery supports with their syntax and examples.
If you want to move from data type mapping to data integration in BigQuery, you can use Airbyte Cloud. The tool has an extensive library of more than 350+ pre-built connectors to smooth the process of data ingestion from various operational systems to BigQuery.