Advantages to PostgreSQL table partitioning: How-to guide

In this tutorial, we explain how to use table partitioning in PostgreSQL to split large tables into smaller, more manageable pieces.

PostgreSQL table partitioning: How to partition tables

PostgreSQL is a relational database management system (RDBMS) using structured query language (SQL). It offers features for storing and scaling complex data workloads. This data is stored in "tables" consisting of columns and rows.

As SQL databases grow, the sheer number of tables and the amount of data stored in each table can make it difficult to manage information efficiently. Table partitioning in PostgreSQL refers to splitting a large table into smaller pieces, which can provide benefits such as improved query performance and easier data management.

What is table partitioning in PostgreSQL?

Table partitioning in PostgreSQL is the process of organizing a large table into smaller subsets, called partitions, based on certain criteria.

Here’s a real-world example: Imagine you’re building a large Lego set, and you decide to organize the bricks before building. You might want to sort the bricks into piles by color so that you can find the correct brick more quickly. Each pile represents a partition, and all the partitions together make up the entire Lego set.

In PostgreSQL, partitioning a table means taking all the data stored in that table and dividing it into smaller pieces of related information, like in the toy example above. Each partition stores a subset of the data based on specific criteria, such as a range of values or a certain attribute.

This process can be useful when your PostgreSQL tables become large and unwieldy, allowing for better performance and manageability.

For instance, if you have a table with millions of records, partitioning can help speed up queries by only searching through the relevant partitions instead of the entire table. 

When you’re deciding whether or not to employ this technique, a good rule of thumb is to consider partitioning when the size of the table exceeds the physical memory of the database server. You might also consider partitioning when you’re working with tables containing historical data. As new data comes in, you could use partitioning to separate new data from historical data while keeping all the data contained in the same logical table.

Benefits of partitioning in PostgreSQL

Partitioning tables in PostgreSQL offers a number of significant advantages. These include the following:

Improved query performance

By dividing a large PostgreSQL table into smaller partitions, table partitioning can reduce the amount of data read during a single query. When a query filters on the partition key, PostgreSQL can skip partitions that cannot contain matching rows, which often cuts query time by a substantial margin. Partitioning also lets you reorganize or rebuild indexes one partition at a time rather than across the whole table.
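
One way to see this effect is to ask the planner which partitions a query would touch. The sketch below uses a hypothetical `events` table (its name and columns are assumptions, not part of this guide's later examples) partitioned by month. Because the filter is on the partition key, the plan printed by `EXPLAIN` should reference only the February partition.

```sql
-- Hypothetical demo table, partitioned by month on created_at.
CREATE TABLE events (
    event_id   bigint NOT NULL,
    created_at date   NOT NULL,
    payload    text
) PARTITION BY RANGE (created_at);

CREATE TABLE events_2024_01 PARTITION OF events
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
CREATE TABLE events_2024_02 PARTITION OF events
    FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');

-- The WHERE clause constrains the partition key, so the planner
-- can rule out events_2024_01 without reading it.
EXPLAIN
SELECT count(*)
FROM events
WHERE created_at >= '2024-02-01'
  AND created_at <  '2024-03-01';
```

Running the same `EXPLAIN` without the `WHERE` clause should show both partitions being scanned, which makes the difference easy to compare.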

Faster bulk loads and deletes

Table partitioning can lead to faster bulk loads and deletes because it allows for the addition or removal of individual partitions, which can be accomplished more quickly than loading or deleting data from a single large table.
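
As a rough sketch of what that looks like in practice, the example below loads a year of data into a standalone table, attaches it as a partition, and then removes an older year by dropping its partition. The `measurements` table, its columns and the generated rows are all hypothetical.

```sql
-- Hypothetical demo table partitioned by year.
CREATE TABLE measurements (
    sensor_id   int  NOT NULL,
    recorded_on date NOT NULL,
    reading     numeric
) PARTITION BY RANGE (recorded_on);

CREATE TABLE measurements_2022 PARTITION OF measurements
    FOR VALUES FROM ('2022-01-01') TO ('2023-01-01');

-- Bulk load: fill a standalone table first, then attach it as a partition.
CREATE TABLE measurements_2023
    (LIKE measurements INCLUDING DEFAULTS INCLUDING CONSTRAINTS);

INSERT INTO measurements_2023 (sensor_id, recorded_on, reading)
SELECT 1, d::date, 20.0
FROM generate_series('2023-01-01'::timestamp,
                     '2023-12-31'::timestamp,
                     interval '1 day') AS d;

ALTER TABLE measurements ATTACH PARTITION measurements_2023
    FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');

-- Bulk delete: drop the whole 2022 partition instead of running
-- DELETE FROM measurements WHERE recorded_on < '2023-01-01'.
DROP TABLE measurements_2022;
```

Loading into a separate table first keeps the bulk insert away from the live partitioned table until the data is ready to attach.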

Optimized storage of seldom-used data

With table partitioning, seldom-used data can easily be partitioned off and migrated to a cheaper storage medium.
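
One possible way to do this in PostgreSQL is with tablespaces. The sketch below is illustrative only: the directory path, the `archive_space` tablespace and the `orders_log` table are all assumptions, and creating a tablespace requires superuser rights and an existing, empty directory owned by the PostgreSQL operating system user.

```sql
-- Assumption: /mnt/slow_disk/pg_archive exists, is empty and is owned by
-- the PostgreSQL OS user. The path and tablespace name are hypothetical.
-- CREATE TABLESPACE cannot run inside a transaction block.
CREATE TABLESPACE archive_space LOCATION '/mnt/slow_disk/pg_archive';

-- Hypothetical demo table partitioned by year.
CREATE TABLE orders_log (
    order_id   bigint NOT NULL,
    ordered_on date   NOT NULL,
    details    text
) PARTITION BY RANGE (ordered_on);

CREATE TABLE orders_log_2015 PARTITION OF orders_log
    FOR VALUES FROM ('2015-01-01') TO ('2016-01-01');
CREATE TABLE orders_log_2024 PARTITION OF orders_log
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');

-- Move only the old partition to the cheaper storage. This rewrites the
-- partition's data and locks that partition while it runs.
ALTER TABLE orders_log_2015 SET TABLESPACE archive_space;
```

Queries against `orders_log` keep working unchanged; only the physical location of the 2015 rows differs.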

Types of partitioning

PostgreSQL offers three types of table partitioning: Range, list and hash.

Range partitioning

Range partitioning involves dividing the data from a specific table column into segments based on a specified range of values. Each partition represents a distinct range of values, and there is no overlap between the ranges assigned to different partitions.

Range partitioning is often used to sort information by date. The bounds of each range are inclusive at the lower end and exclusive at the upper end. 

For instance, if a table is being partitioned by purchase date, one partition might cover purchases from January 1st up to April 1st (meaning the last date included would be March 31st). The next partition might then cover purchases from April 1st up to, but not including, July 1st.
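
The boundary behavior is easiest to see with two rows on either side of a bound. This is a minimal sketch using a hypothetical `purchases` table and an arbitrarily chosen year (2024); the `tableoid::regclass` expression shows which partition each row was stored in.

```sql
-- Hypothetical demo table partitioned by quarter.
CREATE TABLE purchases (
    purchase_id   bigint NOT NULL,
    purchase_date date   NOT NULL,
    amount        numeric(10, 2)
) PARTITION BY RANGE (purchase_date);

-- Lower bound is inclusive, upper bound is exclusive.
CREATE TABLE purchases_2024_q1 PARTITION OF purchases
    FOR VALUES FROM ('2024-01-01') TO ('2024-04-01');
CREATE TABLE purchases_2024_q2 PARTITION OF purchases
    FOR VALUES FROM ('2024-04-01') TO ('2024-07-01');

INSERT INTO purchases (purchase_id, purchase_date, amount) VALUES
    (1, '2024-03-31', 19.99),  -- last day covered by Q1
    (2, '2024-04-01', 5.00);   -- first day covered by Q2

-- Show which partition holds each row.
SELECT tableoid::regclass AS partition_name, purchase_id, purchase_date
FROM purchases
ORDER BY purchase_id;
```

The March 31st purchase is stored in `purchases_2024_q1` and the April 1st purchase in `purchases_2024_q2`, since April 1st is excluded from the first range and included in the second.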

List partitioning

List partitioning in PostgreSQL refers to the process of dividing a table into smaller subsets based on discrete categories that have been specified.

For instance, a table holding insurance data might be partitioned by region or state, or by the specific type of claim (e.g., home, auto, life).
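
A single list partition can hold several values, which is handy for grouping states into regions. The following sketch assumes a hypothetical `claims` table and made-up regional groupings.

```sql
-- Hypothetical demo table partitioned by state code.
CREATE TABLE claims (
    claim_id   bigint NOT NULL,
    state_code text   NOT NULL,
    claim_type text
) PARTITION BY LIST (state_code);

-- Each partition lists the discrete key values it accepts.
CREATE TABLE claims_west PARTITION OF claims
    FOR VALUES IN ('CA', 'OR', 'WA');
CREATE TABLE claims_northeast PARTITION OF claims
    FOR VALUES IN ('NY', 'NJ', 'MA');

INSERT INTO claims (claim_id, state_code, claim_type) VALUES
    (1, 'OR', 'auto'),
    (2, 'NY', 'home');

-- Show which partition holds each row.
SELECT tableoid::regclass AS partition_name, claim_id, state_code
FROM claims
ORDER BY claim_id;
```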

Hash partitioning

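Hash partitioning distributes rows based on a hash of the partition key. You choose a modulus and give each partition a remainder; a row is stored in the partition whose remainder matches the hash value of its key divided by the modulus. When the key has many distinct values, this spreads the data roughly evenly across the partitions.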

This method is often used when there is no natural way to partition the data or when the goal is to achieve even distribution.

How to create a partitioned table in PostgreSQL

By following a few steps, even a beginner can start partitioning tables in PostgreSQL. Here's what you need:

  • PostgreSQL installed: First, make sure PostgreSQL is installed on your system. You can download and install it for free from the official website.

  • Access rights: To partition tables, you need sufficient permissions enabled to make database changes, or you need access to a user who has these permissions.

  • Basic SQL knowledge: You don't need to be a SQL wizard to learn to partition tables, but it helps to have a solid knowledge base when it comes to the basic commands.

Now, here's a step-by-step guide to creating a partitioned table in PostgreSQL:

  1. Define your partitioning strategy. Decide which column (the partition key) you want to use to partition your data. From there, you can determine if range, list or hash partitioning makes the most sense.

  2. Create the parent table. Use the CREATE TABLE statement with a PARTITION BY clause that names the method and partition key you chose. The parent table stores no rows itself; it serves as the template for its partitions.

  3. Create your child tables. These are your partitions. Create each one with a CREATE TABLE ... PARTITION OF statement that names the parent table, along with a FOR VALUES clause that specifies which values of the partition key belong in it.

  4. Add data through the parent table. As you use SQL commands to insert data into the parent table, PostgreSQL will use your partitioning key to route each row to the appropriate partition.

  5. Create indexes and constraints. Define indexes and constraints on the parent and child tables to ensure data integrity and improve query performance.

  6. Test and optimize. Once your partitioned table is set up, perform tests to ensure that data is correctly routed to partitions. Monitor and optimize query performance as needed.
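
The whole sequence can be sketched in a few statements. This is a minimal illustration rather than a production setup: the `support_tickets` table, its columns and its index name are hypothetical, and the primary key syntax assumes PostgreSQL 11 or later.

```sql
-- Choose a strategy and create the parent: range partitioning on opened_on.
CREATE TABLE support_tickets (
    ticket_id bigint NOT NULL,
    opened_on date   NOT NULL,
    status    text
) PARTITION BY RANGE (opened_on);

-- Create the child tables (partitions), one per year.
CREATE TABLE support_tickets_2023 PARTITION OF support_tickets
    FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');
CREATE TABLE support_tickets_2024 PARTITION OF support_tickets
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');

-- Insert through the parent; PostgreSQL routes each row.
INSERT INTO support_tickets (ticket_id, opened_on, status) VALUES
    (1, '2023-06-15', 'closed'),
    (2, '2024-02-01', 'open');

-- Indexes and constraints: an index created on the parent is also created
-- on every partition. A primary key must include the partition key.
CREATE INDEX support_tickets_status_idx ON support_tickets (status);
ALTER TABLE support_tickets ADD PRIMARY KEY (ticket_id, opened_on);

-- Test: confirm each row landed in the expected partition.
SELECT tableoid::regclass AS partition_name, ticket_id, opened_on
FROM support_tickets
ORDER BY ticket_id;
```

The final query is a quick routing check: ticket 1 should appear under `support_tickets_2023` and ticket 2 under `support_tickets_2024`.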

Range partitioning example

Let's say you want to create a table holding home, auto and life insurance data for the years 2018 through 2020, and you want to partition it by date.

The first step is to create your partitioned table:

    CREATE TABLE insurance_policies_p (
        agency      text,
        policy_date date NOT NULL,
        policy_type text,
        description text,
        location    text
    ) PARTITION BY RANGE (policy_date);
  

Now you have your partitioned table, but since you haven't defined the partitions themselves, you won't be able to insert data.

To fix this, you now need to define the partitions:

    -- The upper bound is exclusive, so each year ends at January 1 of the next year.
    CREATE TABLE insurance_policies_p_2018 PARTITION OF insurance_policies_p
        FOR VALUES FROM ('2018-01-01') TO ('2019-01-01');
    CREATE TABLE insurance_policies_p_2019 PARTITION OF insurance_policies_p
        FOR VALUES FROM ('2019-01-01') TO ('2020-01-01');
    CREATE TABLE insurance_policies_p_2020 PARTITION OF insurance_policies_p
        FOR VALUES FROM ('2020-01-01') TO ('2021-01-01');
  

Now you can insert data into your table insurance_policies_p, and PostgreSQL will route each row to the correct partition based on the policy date.

If you attempt to insert a new row and its policy date doesn’t fall within the bounds of any partition, PostgreSQL will raise an error. To avoid this, make sure your partitions cover every value you expect to store, or add a default partition to catch everything else.
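
As a hedged illustration of the default-partition approach, the sketch below builds on the range-partitioned insurance_policies_p table and its 2018 through 2020 partitions from this example, and assumes PostgreSQL 11 or later. The partition name and the sample row are made up.

```sql
-- Assumes insurance_policies_p and its yearly partitions already exist.
-- A DEFAULT partition receives rows that match no other partition.
CREATE TABLE insurance_policies_p_default
    PARTITION OF insurance_policies_p DEFAULT;

-- 2021 has no dedicated partition, so this row goes to the default one
-- instead of raising an error.
INSERT INTO insurance_policies_p
    (agency, policy_date, policy_type, description, location)
VALUES
    ('Example Agency', '2021-03-15', 'auto', 'Sample policy', 'Virginia');

-- Confirm where the row was stored.
SELECT tableoid::regclass AS partition_name, policy_date
FROM insurance_policies_p
WHERE policy_date >= '2021-01-01';
```

One trade-off to keep in mind: if you later create a dedicated 2021 partition, PostgreSQL will reject it while the default partition still holds 2021 rows, so those rows need to be moved out first.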

List partitioning example

The process for creating a PostgreSQL table partitioned by list is similar to creating one partitioned by range.

It starts the same way: by defining your partitioned table. Let's use the same columns as in the range example. (If you already created insurance_policies_p while following the range example, drop it first or choose a different table name.)

    CREATE TABLE insurance_policies_p (
        agency      text,
        policy_date date NOT NULL,
        policy_type text,
        description text,
        location    text
    ) PARTITION BY LIST (policy_type);
  

As you'll notice, there are only two differences: One, we specify directly that we're partitioning by list rather than by range. Two, we specify policy_type as the criterion for partitioning.

From there, the process to build the partitioned tables is also similar:

    CREATE TABLE insurance_policies_p_auto PARTITION OF insurance_policies_p
        FOR VALUES IN ('auto');
    CREATE TABLE insurance_policies_p_home PARTITION OF insurance_policies_p
        FOR VALUES IN ('home');
    CREATE TABLE insurance_policies_p_life PARTITION OF insurance_policies_p
        FOR VALUES IN ('life');
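
To check the routing, you could insert a couple of rows through the parent table and ask PostgreSQL where it stored them. This sketch assumes the list-partitioned insurance_policies_p table and the three partitions defined above; the sample rows are invented.

```sql
-- Assumes the list-partitioned insurance_policies_p and its
-- auto, home and life partitions already exist.
INSERT INTO insurance_policies_p
    (agency, policy_date, policy_type, description, location)
VALUES
    ('Example Agency', '2019-05-01', 'auto', 'Sample auto policy', 'Texas'),
    ('Example Agency', '2019-06-01', 'life', 'Sample life policy', 'Ohio');

-- Show which partition holds each row.
SELECT tableoid::regclass AS partition_name, policy_type
FROM insurance_policies_p
ORDER BY policy_type;

-- A policy_type that appears in no partition's list (for example 'boat')
-- would be rejected with an error unless a default partition exists:
-- INSERT INTO insurance_policies_p (policy_date, policy_type)
-- VALUES ('2019-07-01', 'boat');
```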
  

Hash partitioning example

We’ll build the initial table the same way as before:

    CREATE TABLE insurance_policies_p_hash (
        agency      text,
        policy_date date NOT NULL,
        policy_type text,
        description text,
        location    text
    ) PARTITION BY HASH (location);
  

Then you use modulus and remainder values to split the table into roughly equal subsets. With a modulus of 3, you need one partition for each possible remainder (0, 1 and 2):

    CREATE TABLE insurance_policies_p_hash_p1 PARTITION OF insurance_policies_p_hash
        FOR VALUES WITH (MODULUS 3, REMAINDER 0);
    CREATE TABLE insurance_policies_p_hash_p2 PARTITION OF insurance_policies_p_hash
        FOR VALUES WITH (MODULUS 3, REMAINDER 1);
    CREATE TABLE insurance_policies_p_hash_p3 PARTITION OF insurance_policies_p_hash
        FOR VALUES WITH (MODULUS 3, REMAINDER 2);
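
A simple way to see how evenly the hash spreads rows is to load some generated data and count rows per partition. This sketch assumes the insurance_policies_p_hash table and its three partitions from above; the generated location values are placeholders, and the exact counts you get will depend on how those values hash.

```sql
-- Assumes insurance_policies_p_hash and its three partitions already exist.
-- Generate 3,000 rows with distinct, made-up location values.
INSERT INTO insurance_policies_p_hash
    (agency, policy_date, policy_type, description, location)
SELECT 'Example Agency',
       DATE '2020-01-01',
       'auto',
       'Generated row',
       'location_' || n
FROM generate_series(1, 3000) AS n;

-- Count rows per partition to see the distribution.
SELECT tableoid::regclass::text AS partition_name,
       count(*)                 AS row_count
FROM insurance_policies_p_hash
GROUP BY 1
ORDER BY 1;
```

If the key column had only a handful of distinct values instead, the counts would likely be far less balanced, since all rows sharing a value always hash to the same partition.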
  

Partition maintenance

Table partitioning in PostgreSQL provides significant value for data maintenance. It efficiently manages large datasets by dividing them into smaller, more manageable partitions. This partitioning allows for faster and more efficient operations such as data loading, backup and index maintenance. 

Additionally, partitioning enables easier data archiving and purging, simplifying data lifecycle management. It also improves query performance, making it an essential technique for handling large and frequently accessed databases.

How to drop partitions

Use the DROP TABLE command to drop a partition. This removes the partition and permanently deletes every row stored in it. Here's an example using one of the partitions we built in the range partitioning example above:

    DROP TABLE insurance_policies_p_2018;
  

How to detach partitions

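To detach a partition from its parent table, use the ALTER TABLE command with the DETACH PARTITION clause. Unlike DROP TABLE, this keeps the data: the detached partition remains in the database as a standalone table.

For example:

    ALTER TABLE insurance_policies_p
        DETACH PARTITION insurance_policies_p_2018;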
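
A detached partition can still be queried on its own and attached again later. The sketch below assumes the range-partitioned insurance_policies_p table from earlier and that insurance_policies_p_2018 has just been detached as shown above.

```sql
-- Assumes insurance_policies_p_2018 was detached from insurance_policies_p.
-- The detached table is now an ordinary table and can be queried directly,
-- for example before archiving it.
SELECT count(*) AS rows_in_detached_table
FROM insurance_policies_p_2018;

-- To bring it back, attach it again with bounds that do not overlap
-- any existing partition.
ALTER TABLE insurance_policies_p
    ATTACH PARTITION insurance_policies_p_2018
    FOR VALUES FROM ('2018-01-01') TO ('2019-01-01');
```

On PostgreSQL 14 and later, DETACH PARTITION also accepts a CONCURRENTLY option that reduces locking on the parent table, at the cost of not being usable inside a transaction block.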
  

Limitations of table partitioning

Table partitioning is a powerful tool in PostgreSQL that can help you manage your databases more efficiently. That said, it does have a few limitations.

  • Overcomplication: Table partitioning can simplify queries and make them run faster. But you can sometimes reach a point where your partitioning gets so complex that it makes troubleshooting more difficult.

  • Data imbalances: Table partitioning doesn't automatically create equal subsets with similar data loads. If you aren't careful, you could end up with some partitions that are significantly larger and more complex.

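  • Query constraints: Queries against a partitioned table don't have to reference the partitioning column, but those that don't can't skip any partitions and must scan all of them. In addition, primary keys and unique constraints on a partitioned table must include the partitioning column, which can limit your flexibility.

  • Lack of compatibility: Some partitioning features aren't available in older PostgreSQL versions. Declarative partitioning was introduced in PostgreSQL 10, and hash partitioning and default partitions arrived in PostgreSQL 11, so be mindful if you're not using a recent release.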
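
To keep an eye on data imbalances, you can compare partition sizes from time to time. The query below is a sketch that assumes a partitioned table named insurance_policies_p (as in the examples above) and PostgreSQL 12 or later, where the `pg_partition_tree` function is available.

```sql
-- Assumes a partitioned table named insurance_policies_p exists.
-- List every leaf partition with its total on-disk size, largest first.
SELECT relid AS partition_name,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM pg_partition_tree('insurance_policies_p')
WHERE isleaf
ORDER BY pg_total_relation_size(relid) DESC;
```

A partition that is consistently much larger than its siblings may be a sign that the partition key or the bounds need rethinking.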

Resources, examples and more PostgreSQL info

For more detailed information and examples on creating partitioned tables in PostgreSQL, refer to the official PostgreSQL documentation. It provides comprehensive and up-to-date information about partitioning and various other features of PostgreSQL.

Make large tables more manageable with partitioning in PostgreSQL

Table partitioning in PostgreSQL has many benefits:

  • It can help you optimize your relational database performance.

  • It can significantly improve the efficiency of your queries, particularly when heavily accessed rows are concentrated in a single partition or a small number of partitions. 

  • It also helps organize data by splitting a large table into smaller subsets based on a common attribute. 

  • Finally, it can help you optimize your storage and even save you money.

Capital One Tech

Stories and ideas on development from the people who build it at Capital One.
