How to Create a Database Schema in PostgreSQL: An Ultimate Guide
PostgreSQL is a free and open-source object-relational database management system. Its support for both SQL and JSON types makes it suitable for small projects and large-scale enterprise applications. As a relational database, PostgreSQL requires a well-defined database schema to organize data in a tabular format. The schema serves as a blueprint that defines the structure of the data objects and their relationships.
As a result, creating a schema is a foundational step in PostgreSQL database management. This article offers a detailed guide to help you create a database schema in PostgreSQL.
What Is PostgreSQL Schema?
In an object-relational DBMS like PostgreSQL, a schema is defined as a logical container that helps you hold database objects like tables. A schema can also include indexes, views, operators, data types, sequences, and functions. Within a single PostgreSQL database, you can have one or more schemas. Each schema provides a way to manage and isolate database objects, making it easier to organize large and complex databases.
By default, you can only access objects in schemas you own. To allow access to other schemas, the owner must grant the USAGE privilege. Additionally, creating objects in another schema requires the CREATE privilege on that schema.
Why Schemas Matter in PostgreSQL?
Schemas in PostgreSQL are fundamental for organizing and managing database objects. Here’s why they are essential:
- User Isolation: Schemas enable multiple users to access the same database without interfering with each other’s data or objects. Each user or group can have their schema, ensuring that their work remains independent.
- Efficient Data Organization: Schemas help you logically group database objects to manage and navigate complex databases. This well-defined structure enhances the database's maintainability.
- Avoiding Object Name Collisions: When third-party applications are integrated into a PostgreSQL database, their objects can be stored in separate schemas. The use of separate schemas allows you to prevent naming conflicts with the existing objects in the database.
- Improved Security: By assigning permissions at the schema level, PostgreSQL enables more granular control over who can access or modify certain data.
- Streamline Development: You can work in your own schema to develop and test new features in your application without affecting the production environment.
Things to Know Before Creating PostgreSQL Schemas
Before you create a database schema in PostgreSQL, it is essential to understand several key aspects to ensure the schema is designed properly. Here are the things to consider:
PostgreSQL Schema Hierarchy
The PostgreSQL schema hierarchy refers to the structural organization of database objects within PostgreSQL. Following this hierarchy, you can manage database objects efficiently, ensure accessibility, and maintain data separation. Let’s see the PostgreSQL schema hierarchy in detail:
Cluster
The cluster is the highest level in the PostgreSQL schema hierarchy. It consists of multiple named databases managed by a single PostgreSQL server instance. Users of the cluster may not have permission to access every database within it. Access is granted based on the privileges assigned to the user for specific databases.
Database
A database is a collection of schemas for different types of database objects. You can access the data of a specific database within a cluster by connecting to it.
Schema
A schema is a namespace within a database that allows you to organize and group related objects. Following are the various objects within schemas:
- Tables: Allows you to store structured data in rows and columns.
- Views: These are virtual tables that help you represent the result of a SQL query.
- Functions: A block of reusable code to perform computations.
- Sequences: It is an auto-incrementing numeric generator used for primary keys.
- Indexes: These are the data structures that enable you to improve the speed of data retrieval operations.
- Triggers: These actions automatically execute when specific database events occur.
About PostgreSQL Public Schema
The public schema is the default schema in PostgreSQL databases. When you create a new database in PostgreSQL, a public schema is automatically created. Any object, such as a table or views generated without specifying a schema, will be stored in the public schema. This way, all users can access the public schema to create, modify, or drop objects unless permissions are specifically restricted.
While a public schema is convenient for initial development, it’s best to use separate custom schemas in a production environment to organize data better.
Schema Naming Conventions
When creating schemas in PostgreSQL, following a consistent naming convention is essential for clarity, especially in complex applications with many users and objects. Let’s check out a few guidelines while naming schemas:
- Add Descriptive Names: The schema name should reflect its purpose. For example, a schema for user-related data can be named users.
- Avoid Reserved Keywords: You must not utilize reserved PostgreSQL keywords like select, table, or user as schema names.
- Use Lowercase Letters: PostgreSQL is case-sensitive. It is generally recommended to utilize lowercase letters for schema names, as PostgreSQL automatically converts unquoted names to lowercase.
- Underscores for Multiple Words: You can use underscores (_) to separate words in a schema name instead of spaces or camel case. For example, user_data instead of UserData or User Data.
- Prefix for Grouping: If there are multiple schemas serving similar purposes, you can apply a prefix to group them logically. For instance, sales_orders, sales_payments, and sales_stock.
Schema Search Path
The schema search path allows you to determine the order in which PostgreSQL looks for database objects when they are referenced without a schema qualifier. This feature is important because it helps you define the precedence of schemas in case the same object name exists in multiple schemas.
To understand this more clearly, let’s first create a table object “user_info” in a schema named “user_schema”:
However, writing fully qualified names, such as specifying both schema and object names, in PostgreSQL is tedious. To simplify this, PostgreSQL allows the use of unqualified names, such as just the table name.
Here, by default, PostgreSQL first searches for objects in the public schema, followed by any other schemas specified by the user or application. You can customize the search path to prioritize certain schemas using the SET search_path command.
For example, to enable PostgreSQL to first look for objects in a user-created schema “user_schema”, use the syntax:
If it does not find the object in user_schema, PostgreSQL will help you search in the public schema.
How to Create a Schema in PostgreSQL?
In this section, you will learn about how to create a database schema in PostgreSQL.
Prerequisites:
- Download and install PostgreSQL from the official website.
Steps:
- Launch pgAdmin, a built-in GUI tool for PostgreSQL, or open your terminal to use psql commands. Here, let’s use CLI with psql commands to create a database schema in PostgreSQL.
- Connect to your PostgreSQL database server instance using the following command:
- Define a new schema using any of the following syntax:
Let’s understand each of the parameters in the above syntax:
Here are a few examples to create a schema by using these parameters:
Creating a schema named “finance” owned by manager_role
Creating a schema only if it doesn't already exist
Creating a schema with additional elements
Working With PostgreSQL Schema
Once you create a database schema within a PostgreSQL database, you can manage it by modifying, dropping, or moving objects across multiple schemas.
Modifying Schemas
In PostgreSQL, you can modify the definition of a schema that you own using the ALTER SCHEMA command. Using this command, you can change the name of an existing schema using the RENAME TO clause:
Where the name is the existing schema name, and new_name represents the new schema name.
This command also allows you to change the owner to a new owner:
However, renaming a schema requires the CREATE privilege for the database. Changing the owner also needs CREATE permission to act as the new owner.
Dropping Schemas
If you are a PostgreSQL database owner or a superuser, you can remove a schema within the database using the DROP SCHEMA command. The syntax is as follows:
Here are the parameters:
- IF EXISTS: A clause that helps you prevent an error if the specified schema does not exist. Instead, a notice is displayed.
- name: The name of the schema to be dropped.
- CASCADE: An option that allows you to automatically remove all objects within the schema along with any dependent objects.
- RESTRICT: A keyword that enables you to ensure that the schema is not dropped if it contains any objects. This is the default behavior.
Moving Objects Between Schemas
In PostgreSQL, you can move objects like tables, views, sequences, and functions from one schema to another using the ALTER command.
Here is the syntax:
Where,
- OBJECT_TYPE: Specifies the type of object to move.
- object _name: Name of the object you want to move.
- new_schema_name: Target schema to which the object will be moved.
Let’s see the examples:
Move a Table
Changing the employees table from the current schema to finance
Move a Sequence
Shifting the sequence employee_id_seq to the finance schema
Move a View
Moving the sales_summary view to the analytics schema
Best Practices to Follow While Creating a Schema in PostgreSQL
Here are the best practices you must consider for a well-defined database schema design in PostgreSQL:
- Use a consistent naming convention for your schema names to identify the data you need faster.
- Create separate schemas for each user or application to control access to data instead of dumping every object in a public schema.
- Create a spreadsheet, database table, or plain text file to keep track of which users, roles, or applications own which schemas.
- Grant privileges on schemas rather than tables to restrict the user’s access to only the tables within that schema.
- Set the search path to include only the schemas you need. Overloading the search path can degrade performance as PostgreSQL has to check multiple schemas for objects.
- When creating schemas in scripts that may run more than once, use the IF NOT EXISTS clause to prevent errors.
- Regularly review and refactor schemas to remove unused or redundant ones.
- You must use the RESTRICT option when dropping a schema to avoid accidentally removing a schema that contains important objects.
- Only leverage the CASCADE option if you are sure about deleting all related database objects.
To leverage your data stored in PostgreSQL effectively for analysis, it should be properly structured, continuously updated, and well-integrated with data from various sources. Tools like Airbyte can help you achieve this seamlessly.
Why Airbyte for PostgreSQL Integration?
Airbyte provides a user-friendly interface and over 400 connectors. It helps you extract data from various sources (APIs, databases, flat files) and load it into your PostgreSQL database. This ensures your data remains centralized and ready for analysis.
During pipeline setup, you can mention how Airbyte should handle schema changes in the source for each connection. You have the flexibility to refresh this configuration at any time. For Cloud users, Airbyte automatically checks for schema changes in the source data every 15 minutes, and for Self-managed users at most after 24 hours.
Here are the key features of Airbyte:
- Extensive Connector Development: Airbyte offers 400+ pre-built connectors to help you extract data from multiple sources to a destination of your choice.
- Custom Connector Development: The AI-Assistant feature in the no-code Connector Builder feature allows you to create custom connectors as per your needs. The assistant prefills the necessary configuration fields, reducing the development time.
- Incremental Data Updates: Airbyte supports full refresh, incremental, and resumable full refresh syncs, optimizing data transfer by only updating changed records.
- Streamline GenAI Workflows: To streamline GenAI applications, you can load all your semi-structured and unstructured data into Airbyte-supported vector databases such as Pinecone, Milvus, and Qdrant.
- PyAirbyte: PyAirbyte is an open-source, developer-friendly Python library offered by Airbyte. It provides several utilities to extract data from various sources utilizing Airbyte connectors in your Python workflows.
Conclusion
A schema design is essential for efficiently organizing data within PostgreSQL. Compared to different types of databases, PostgreSQL offers more flexibility with schema management, allowing for the logical grouping of tables, functions, sequences, and other objects. This comprehensive guide showed how to create a database schema in PostgreSQL and modify the database objects within the schema.