No items found.

A Comprehensive Guide to BigQuery Data Types

A guide that will inform you briefly about every BigQuery data type.

Should you build or buy your data pipelines?

Download our free guide and discover the best approach for your needs, whether it's building your ELT solution in-house or opting for Airbyte Open Source or Airbyte Cloud.

Download now

Should you build or buy your data pipelines?

Download our free guide and discover the best approach for your needs, whether it's building your ELT solution in-house or opting for Airbyte Open Source or Airbyte Cloud.

Download now

Google BigQuery is one of the widely known data warehouses today. That's because of many reasons, such as its storage capacity, analytical capabilities, and more. One of the stand-out features of BigQuery is its data types. It supports lots of different data types ranging from numerical to complex arrays. These data types give the flexibility to handle complex data. 

However, it can be challenging to work with its different data types at the start. Therefore, to make things easier, here is a guide that will inform you briefly about every BigQuery data type. So, let's get started. 

Google BigQuery Data Types

Every data management platform that allows you to store and manipulate data has its own data types to display data in logical order. The available data types in BigQuery can be grouped into the following categories: 

  • Numeric (INT64, Numeric, Bignumeric, Float64). 
  • Boolean.
  • String. 
  • Bytes. 
  • Time (Date, time, datetime, and timestamp).
  • Geography. 
  • Array. 
  • Struct. 
  • JSON.

As you import the data into BigQuery, you must map the data types from the sources to the above data types of BigQuery. 

Let's discover each of the data types in BigQuery individually: 

Numeric Data Type

These are some of the most common data types used in calculations, analyses, and data reporting. You can break down Numeric types into the following four categories: 

INT (Integer)

BigQuery integers data type store signed integers. It's best for storing values like product IDs, counts, or years without decimals. BigQuery supports various integer sizes, permitting you to select the most efficient option based on your data range. Here's a breakdown of commonplace INT sizes:

  • INT64: This is the default integer type in BigQuery, capable of storing numbers from -9223372036854775808 to 9223372036854775807.
  • INT32: This size stores integers among -2,147,483,648 and 2,147,483,647, appropriate for maximum everyday data desires.

Choosing the proper INT length guarantees efficient storage usage. For example, if you simply have product IDs ranging from 1 to 1000, the usage of INT64 might be overkill. INT32 could be a more space-saving option in this case.


column_name INT64 

Example:  

A table storing consumer IDs can be defined with an INT64 column named user_id.

 
user_id INT64

Size of the Data Type: 8 bytes of storage per value.

FLOAT (Floating-Point)

The FLOAT data type provides numbers with decimal points. It's normally used for clinical data, measurements, or economic calculations. However, FLOAT has inherent boundaries because of its internal illustration. It can result in mild inaccuracies while storing and appearing operations on specific decimal values.

Here are important factors to remember:

  • FLOAT precision is restricted. While it could represent a huge range of numbers, the exactness of decimal places might be compromised.
  • For eventualities requiring high-precision decimal calculations, BigQuery gives opportunity data types like NUMERIC (defined below).
 
column_name FLOAT64

Example: 

A column storing clinical measurements like temperature variations may be described as:

 
temperature FLOAT64

Size of the Data Type: 8 bytes of storage per value.

NUMERIC (Decimal)

The NUMERIC data type is designed for specific decimal value garage and calculations. It offers a large benefit over FLOAT: it specifies the exact quantity of digits and decimal places a fee can preserve. This guarantees accurate effects when managing economic data, clinical measurements, or any situation wherein specific decimal values are essential.

Here's what makes NUMERIC stand out:

  • User-defined precision and scale: You can define the entire wide variety of digits (precision) and the variety of digits after the decimal factor (scale) to perfectly shape your data necessities.
  • High accuracy: Unlike FLOAT, NUMERIC removes rounding mistakes and keeps the precise decimal cost all through calculations.

While NUMERIC gives advanced precision, it requires extra storage area in comparison to FLOAT. Consider the exchange-off between garage efficiency and the extent of precision your data needs while selecting between these data types.

Syntax:

 
column_name NUMERIC(precision, scale)

Example:

A column storing currency values with two decimal locations can be described as:

 
price NUMERIC(10, 2)

Here, 10 represents the whole quantity of digits (which include integer and decimal places), and 2 indicates the variety of decimal places.

Size of the Data Type: 16 bytes.

BIGNUMERIC

While BigQuery offers the NUMERIC data type for decimal values, BIGNUMERIC takes precision to an entirely new degree. It's designed to address extremely massive numbers with as much as 76.76 digits of precision, inclusive of nine decimal locations. This makes BIGNUMERIC ideal for economic calculations, medical data evaluation, or any situation requiring notably high-accuracy decimal values.

Usage: BIGNUMERIC is suitable for situations requiring pretty unique calculations or dealing with extraordinarily big numerical values, together with monetary calculations or medical computations.

Syntax:

 
column_name BIGNUMERIC(precision, scale)

Example:

A scientific dataset storing astronomical distances with excessive precision can be described as:

 
distance BIGNUMERIC(30, 10)

Here, 30 represents the total wide variety of digits and 10 indicates the variety of decimal locations.

Size of the Data Type: 32 bytes.

Boolean Data Type

Boolean or BOOL values are represented by the keywords TRUE or FALSE but can also be NULL. The values of these data types are sorted in this order, from the least to greatest: NULL, FALSE, TRUE.

Example: Indicating product availability (BOOLEAN in_stock). A table storing patron account status may have an is_active column with a Boolean data type. TRUE shows an active account, while FALSE signifies an inactive one.

Size of the Data Type: 1 byte

String Data Type

STRING types refer to the variable length data operating in Unicode characters rather than bytes. The input string values must be UTF-8 encoded. 

To quote a string, you must use either single (') or double (") quotation marks. It can also be triple-quoted with a group of three single ("') or three double ("""). Some examples of string values are: "abc", "it's", 'it \'s', """ abc""","' it'"

Strings are frequently used by online businesses to store any user-generated data, including logins, email addresses, etc. 

Example: Storing client names or product descriptions (STRING customer_name, product_description)

Size of the Data Type: 2 bytes+UTF-8 encoded string

Bytes Data Type

BYTES represents variable-length data but operates on raw bytes rather than Unicode characters. Therefore, strings and bytes are separate and can't be used interchangeably. However, the syntax of strings and bytes are quite similar; they must be quoted in single, double, or triple quotation marks. Bytes are commonly used when certain strings need to be encoded into a common programming format. 

Size of the Data Type: 2 bytes according to cell

Date Data Type

DATE data type represents the Gregorian calendar date independent of the time zone. It doesn't represent a specific 24-hour period. Rather, a given data value represents a different 24-hour period in different time zones. For example, a PST date type defined as 2023-05-01 will represent a period between 5 p.m. on May 1st and 5 p.m. on May 2nd, while for Korea standard time, the date type will represent a period between 9 a.m. on May 2nd and 9 a.m. on May 1st. 

The canonical format of this data type is - YYYY-[M]M-[D]D. Here, the year is four digits, the month is one or two digits, and the date is one or two digits, respectively.

Example:  Storing order dates (DATE order_date)

Size of the Data Type: 8 bytes 

Time Data Type

As you'll see in a digital watch, a TIME data type represents time. Like the date type, the time data type can refer to different times according to the time zone. The canonical format of this is: [H]H:[M]M:[S]S[.DDDDDD|.F]. Where:

  • [H]H: One or digit hours (valid values range from 00 to 23). 
  • [M]M: One or two-digit minutes (valid values from 00 to 59).
  • [S]S: One or two-digit seconds (valid values range from 00 to 60).
  • [.F]: It can go upto six fractional digits (microsecond precision).

Example:  Storing transaction timestamps (TIME transaction_time)

Size of the Data Type: 8 bytes

Datetime Data Type

DATETIME data type displays not only the date but also the time value independent of the time zone. It includes the year, month, day, hours, minutes, seconds, and even subseconds. 

The format of this data type is: 

civil_date_part [time_part]civil_date_part:YYYY-[M]M-[D]Dtime_part:{|T|t}[H]H:[M]M:[S]S[.F]

In the above format, the civil_date_part and time_part are the same as the Date and time data type.

Size of the Data Type: 8 bytes

Timestamp Data Type

A TIMESTAMP or timestamp value represents the absolute point in time, independent of the time zone, with microsecond precision. The display of timestamps for human readability usually includes Gregorian data, a time, and a time zone in an implementation-dependent format. An example of the canonical format for the timestamp data type is "2014-09-27 12:30:00.45-08".

Size of the Data Type: 8 bytes

Geography Data Type

A GEOGRAPHY is a collection of points, line strings, and polygons represented as a point set or a subset of the earth's surface. The points specify a particular location using the same longitude and latitude values as on the GPS. The most basic syntax for this data type is specifying the points with x and y coordinates: POINT(x_coordinate y_coordinate). Learn more about this data type here

Example:  Storing consumer locations (GEOGRAPHY user_location)

Size of the Data Type: 16 bytes

Array Data Type

ARRAY is an ordered list of one or more elements of non-array values. Every element in the array must share the same data type. You can declare an array type using ARRAY <T> syntax. Examples of array declarations are ARRAY <INT64>, ARRAY <BYTES (5)>, ARRAY <STRING>. A string value will look like this: ARRAY<string>['x', 'y', 'xy']. 

When working with arrays, if an array contains a NULL value, BigQuery will return an error. 

BigQuery array data types are very common when working with large data sets as they eliminate the need to declare thousands of variables. 

Example: Storing a list of product categories for each object (ARRAY<STRING> product_categories)

Struct Data Type

A STRUCT is a container of ordered fields, each with a type which is required and field name (optional). You can freely mix data types within a struct data type. There are two ways to define a struct: one is a typeless struct, and the other is typed. 

The typeless struct uses a tuple, and the syntax for that is as follows (expr1, expr2 [, … ]). This struct includes STRUCT (1,2,3) and STRUCT('abc')

The syntax of the typed struct looks like this: STRUCT<[field_name] field_type, ...>( expr1 [, ... ]). Examples of this struct are STRUCT<int64>(5) and STRUCT<date>("2011-05-05")

A STRUCT data type has many use cases, as it can contain any BigQuery and combine different types. 

Example: Storing consumer data like name, email, and deal with (STRUCT consumer call STRING, e-mail STRING, address STRUCT street_address STRING, town STRING )

JSON Data Type

A JSON data type represents a lightweight data-interchange format with which you might be familiar. You can expect these canonicalization behaviors when creating a value with JSON type: 

  • Booleans, nulls, and strings are preserved exactly. 
  • It can store integers in the range of -9,223,372,036,854,775,808 (minimal) to 18,446,744,073,709,551,615 (maximal) and floating point numbers within a domain of FLOAT64.
  • The order of each value in an array is preserved exactly.
  • The order of object members is not guaranteed or preserved. 

The syntax of BigQuery JSON data type is JSON 'formatted data'. An example can be JSON' {"name": "Cooper", "forname": Alice}'

Example: Storing product data with diverse attributes (JSON product_details)

How to check the data type in Bigquery?

1. Using SQL Commands

  • You can use the INFORMATION_SCHEMA to query metadata about your tables, including the data forms of columns.
  • Execute a SQL query similar to the one after retrieving information about the data forms of columns in a table:
 
SELECT column_name, data_typeFROM `project.dataset.INFORMATION_SCHEMA.COLUMNS`WHERE table_name = 'your_table_name';
  • Replace 'project.dataset' with the right assignment and dataset name, and 'your_table_name' with the call of your table.

2. Using the BigQuery Console

  • Navigate to the BigQuery Console and pick your mission and dataset.
  • Locate the table you're inquisitive about and click on it to view its details.
  • You'll see a listing of columns together with their corresponding data types.

By following these strategies, you could quickly decide the data varieties of columns for your BigQuery tables, permitting you to better understand and work with your data.

How to change the data type in Bigquery?

Changing the data form of a column in Google BigQuery may be achieved through various techniques, depending in your precise requirements and possibilities. Here are a few common techniques:

1. Using SQL Commands

  • You can alter the data form of a column using SQL's ALTER TABLE statement along with the SET DATA TYPE function.
  • Here's an instance SQL query to exchange the data type of a column named column_name in a table named your_table_name:
 
ALTER TABLE `project.dataset.your_table_name`ALTER COLUMN column_name SET DATA TYPE new_data_type;
  • Replace 'project.dataset' with your mission and dataset call, 'your_table_name' with the table name, column_name with the column you need to modify, and new_data_type with the preferred data type.

BigQuery only allows changing a column's data type to a compatible type. For example, you cannot directly change a STRING column to an INT64 column if the string values contain non-numeric characters.

2. Using the BigQuery Console

  • This method offers a visual interface for changing data types.
  • Navigate to the BigQuery Console and discover the table containing the column whose data type you want to change.
  • Click on the table to open its details.
  • Find the column you desire to regulate and click on its data type.
  • It provides a dropdown menu that shows compatible data types based on the current data.

By using those techniques, you could efficiently alter the data types of columns in your BigQuery tables to better match your data processing and evaluation needs.

BigQuery Integration With Airbyte

The above section in the article will help you deal with data types. However, data types are not the only thing if you are managing data. Therefore, to deal with other aspects of data management, such as data integration, you should use SaaS tools like Airbyte. 

Airbyte is an ELT tool that can help you automate data integration from multiple sources of your choice to BigQuery. With robust orchestration capabilities, an intuitive user interface, and integration with data transformation tools like dbt, it streamlines complex data integration processes. 

You can use Airbyte to centralize data (of any BigQuery-supported data types) in BigQuery without writing any code. 

Sign up (if you haven't already) with Airbyte Cloud today to handle data integration tasks easily. 

Conclusion 

To bring the data from outside data sources to the BigQuery database, you must have a detailed knowledge of the data types BigQuery supports. When you pick the right data type, you can enhance data integrity, analytical capabilities, and system performance. Following the guide above, you can easily learn about all the data types BigQuery supports with their syntax and examples. 

If you want to move from data type mapping to data integration in BigQuery, you can use Airbyte Cloud. The tool has an extensive library of more than 350+ pre-built connectors to smooth the process of data ingestion from various operational systems to BigQuery.

Similar use cases

No similar recipes were found, but check back soon!