PostgreSQL Multicolumn Indexes

This article describes how to create multicolumn indexes in PostgreSQL.

As the name suggests, a multicolumn index is an index defined on multiple columns of a table. Multi-column index, also known as compound index or composite index.

In PostgreSQL, only the B-tree, GIST, GIN, and BRIN index types support multicolumn indexes. A multicolumn index supports up to 32 columns.

PostgreSQL Multicolumn Index Rules

When defining a multi-column index, you should place frequently used columns in the WHERE clause at the beginning of the column list, and less frequently used columns in subsequent conditions. Otherwise, the PostgreSQL optimizer might not use the index.

For example, the following statement defines an index on the a, b and c columns:

CREATE INDEX index_name
ON table_name(a, b, c);

In the above syntax, the PostgreSQL optimizer considers the use of an index in the following cases:

WHERE a = v1 and b = v2 and c = v3;

or

WHERE a = v1 and b = v2;

or

WHERE a = v1;

However, indexes are not considered in the following cases:

WHERE  c = v3;

or

WHERE b = v2 and c = v3;

PostgreSQL multicolumn index Examples

Let’s demonstrate a PostgreSQL multicolumn index on the customer table in the Sakila sample database.

View information about the customer table:

\d customer
                                             Table "public.customer"
   Column    |            Type             | Collation | Nullable |                    Default
-------------+-----------------------------+-----------+----------+-----------------------------------------------
 customer_id | integer                     |           | not null | nextval('customer_customer_id_seq'::regclass)
 store_id    | smallint                    |           | not null |
 first_name  | character varying(45)       |           | not null |
 last_name   | character varying(45)       |           | not null |
 email       | character varying(50)       |           |          |
 address_id  | smallint                    |           | not null |
 activebool  | boolean                     |           | not null | true
 create_date | date                        |           | not null | 'now'::text::date
 last_update | timestamp without time zone |           |          | now()
 active      | integer                     |           |          |
Indexes:
    "customer_pkey" PRIMARY KEY, btree (customer_id)
    "idx_fk_address_id" btree (address_id)
    "idx_fk_store_id" btree (store_id)
    "idx_last_name" btree (last_name)
...

There are already two indexes in the customer table, in order to demonstrate the multi-column index, let’s drop the index named idx_last_name first:

DROP INDEX idx_last_name;

Create an index on the last_name and first_name columns using the following statement :

CREATE INDEX ON customer (last_name, first_name);

View the customer table:

\d customer
                                             Table "public.customer"
   Column    |            Type             | Collation | Nullable |                    Default
-------------+-----------------------------+-----------+----------+-----------------------------------------------
 customer_id | integer                     |           | not null | nextval('customer_customer_id_seq'::regclass)
 store_id    | smallint                    |           | not null |
 first_name  | character varying(45)       |           | not null |
 last_name   | character varying(45)       |           | not null |
 email       | character varying(50)       |           |          |
 address_id  | smallint                    |           | not null |
 activebool  | boolean                     |           | not null | true
 create_date | date                        |           | not null | 'now'::text::date
 last_update | timestamp without time zone |           |          | now()
 active      | integer                     |           |          |
Indexes:
    "customer_pkey" PRIMARY KEY, btree (customer_id)
    "customer_last_name_first_name_idx" btree (last_name, first_name)
    "idx_fk_address_id" btree (address_id)
    "idx_fk_store_id" btree (store_id)

We find that customer_last_name_first_name_idx indexe is defined on last_name and first_name columns.

Use the EXPLAIN statement to see if the following statement uses an index:

EXPLAIN
SELECT * FROM customer
WHERE last_name = 'A'
AND first_name = 'B';
                                            QUERY PLAN
---------------------------------------------------------------------------------------------------
 Index Scan using customer_last_name_first_name_idx on customer  (cost=0.28..8.29 rows=1 width=70)
   Index Cond: (((last_name)::text = 'A'::text) AND ((first_name)::text = 'B'::text))
(2 rows)

Here, the PostgreSQL optimizer chooses to use the index when both last_name and first_name in the WHERE condition .

Use the EXPLAIN statement to see if the following statement uses an index:

EXPLAIN
SELECT * FROM customer
WHERE last_name = 'A';
                                            QUERY PLAN
---------------------------------------------------------------------------------------------------
 Index Scan using customer_last_name_first_name_idx on customer  (cost=0.28..8.29 rows=1 width=70)
   Index Cond: ((last_name)::text = 'A'::text)
(2 rows)

Here, the PostgreSQL optimizer chooses to use the index when only last_name in the WHERE condition. This is because last_name column is the first column in the index.

Use the EXPLAIN statement to see if the following statement uses an index:

EXPLAIN
SELECT * FROM customer
WHERE first_name = 'B';
                        QUERY PLAN
----------------------------------------------------------
 Seq Scan on customer  (cost=0.00..15.49 rows=1 width=70)
   Filter: ((first_name)::text = 'B'::text)
(2 rows)

Here, the PostgreSQL optimizer does not choose to use the index when only first_name in the WHERE condition. This is because first_name column is not the first column of the index.

Conclusion

This article discusses PostgreSQL multicolumn indexes and how the order of multicolumn indexes affects the PostgreSQL optimizer.

When you define a multicolumn index, you should always consider the business context to determine which columns are frequently used for lookups, and place those columns at the beginning of the column list when defining the index.