How to use pgTAP to automate unit testing in PostgreSQL

Benefits of automated database testing

Are you an agile developer who tests thoroughly to make sure your application is working as expected? Do you automate this process so it becomes iterative and repeatable, such that no change in your application breaks other areas of your code? Capital One's standards are high when it comes to testing. Deployments to production require that all levels of testing are completed and pass in your lower environments. Linting, WhiteSource scanning, unit testing, integration testing: all of these aspects of the application are tested.

Database testing is crucial for verifying the overall functionality of an application, but it is often either done manually or overlooked. This can result in mismanaged code versions and inconsistent output. pgTAP allows you to test everything from the structure of your schema to the logic in views, procedures, functions, rules and triggers.

While the application’s unit-testing framework can be used to test the database, the code can quickly get complicated and, frankly, unreadable at times. pgTAP makes setting up the tests fairly simple: there is no need to set up the database connection, fetch the data and convert it, as you would with an application unit-testing framework. You can use plain SQL and keep database testing separate from the application code. pgTAP provides a collection of assertion functions that can be used to test all of your database objects.
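To get a feel for the assertion style, here is a minimal, hypothetical pgTAP script (the accounts table name is made up; plan(), has_table(), ok() and finish() are pgTAP functions):

    BEGIN;
    SELECT plan(2);
    -- assert that a table named accounts exists
    SELECT has_table('accounts', 'table accounts exists');
    -- assert an arbitrary boolean condition
    SELECT ok(1 + 1 = 2, 'arithmetic still works');
    -- conclude the test and discard any changes
    SELECT * FROM finish();
    ROLLBACK;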

How to automate your unit tests in five steps

The test scripts can be added as standalone SQL scripts or as xUnit-style test functions. They can then be automated using pg_prove, a command-line application.

Sample pgTAP test code

Let’s go through some sample code to see the actual test case creation in action.

Step 1: Install pgTAP and create the pgTAP extension

To install pgTAP, you need to follow these steps:

  • First, clone the pgTAP repository (the commands for these steps are sketched below)

  • Navigate into the repository and build and install pgTAP

  • Connect to your PostgreSQL database and run the following to create the pgTAP extension

    CREATE EXTENSION pgtap;
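For reference, the clone-and-install steps above typically look like the following, assuming you are building pgTAP from its source repository and have make and the PostgreSQL development tools available on your path:

    git clone https://github.com/theory/pgtap.git
    cd pgtap
    make
    make install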
  

Step 2: Create a schema specific to unit testing

    CREATE SCHEMA unit_testing;
  

Step 3: Create test functions/scripts

The goal of this function is to ensure that the table in question exists in the environment, and then that its number of columns matches what is expected. The column count can drift if someone makes a change manually or forgets to execute a script in every environment, sending the environments out of sync. Lastly, it checks the number of non-nullable columns. You can also check whether specific columns are nullable or non-nullable, to ensure that constraints are applied to columns according to business requirements.

    CREATE OR REPLACE FUNCTION unit_testing.test_table(schemaName character varying, tableName character varying, expectedColcnt int, expectedNotNullColcnt int)
    RETURNS SETOF TEXT AS $$
    DECLARE schemaTable character varying;
    BEGIN
    	schemaTable := schemaName || '.' || tableName;

    	-- Check if the table exists
    	RETURN NEXT has_table(schemaName, tableName, 'table ' || schemaTable || ' exists');

    	-- Check if the number of columns in the table matches the expected number of columns
    	RETURN NEXT is((select count(column_name)::int
    		from information_schema.columns
    		where table_name = tableName
    		and table_schema = schemaName),
    		expectedColcnt,
    		'Number of columns match expected number of columns');

    	-- Check if the number of non-nullable columns matches the expected number
    	RETURN NEXT is((select sum(case when is_nullable = 'NO' then 1 else 0 end)::int
    		from information_schema.columns
    		where table_name = tableName
    		and table_schema = schemaName),
    		expectedNotNullColcnt,
    		'Number of non-nullable columns match expected number of columns');
    END;
    $$ LANGUAGE plpgsql;
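For example, if you had a hypothetical table public.test_table with six columns, three of them non-nullable, you would invoke this test as follows (the names and counts are illustrative only):

    SELECT * FROM unit_testing.test_table('public', 'test_table', 6, 3);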
  

This function checks whether the columns contained in the primary key for a given table match what is expected. If a new column is introduced and needs to be added to the table's primary key, this ensures that the column is in fact added and not overlooked.

    CREATE OR REPLACE FUNCTION unit_testing.test_table_PK(schemaName character varying, tableName character varying, PKcolumn character varying)
    RETURNS SETOF TEXT AS $$
    DECLARE schemaTable character varying;
    BEGIN
    	schemaTable := schemaName || '.' || tableName;

    	-- Check the columns contained in the primary key for a given table
    	RETURN NEXT is((select jsonb_object_agg(attname, data_type) @> PKcolumn::jsonb
    		from (
    			SELECT a.attname, format_type(a.atttypid, a.atttypmod) AS data_type
    			FROM pg_index i
    			JOIN pg_attribute a ON a.attrelid = i.indrelid
    			AND a.attnum = ANY(i.indkey)
    			WHERE i.indrelid = schemaTable::regclass
    			AND i.indisprimary
    			) x), true, 'columns are part of primary key');
    END;
    $$ LANGUAGE plpgsql;
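Continuing the hypothetical example, if public.test_table has an integer primary key column id, the expected primary key is passed as a JSON object mapping column name to data type:

    SELECT * FROM unit_testing.test_table_PK('public', 'test_table', '{"id": "integer"}');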
  

Setting up mock inserts

The code below sets up a mock insert into your table, checks that the count of inserted rows matches the expected count, and then calls a user-defined function against the newly inserted value to ensure that the actual output matches the expected output.

The reason we insert a row instead of using existing data is that data is subject to change; this test is not meant to verify whether the data exists, but to check the functionality of the insert and the return value of the function. You can also insert multiple rows using a loop if you want to test a multi-row insert scenario.

    CREATE OR REPLACE FUNCTION unit_testing.setup_test_insert_into_table
    (expectedReturnCount int,
     id_test int,
     start_ts timestamp default current_timestamp,
     result character varying default 'success',
     http_stat_code integer default 200,
     api_key character varying default 'TEST',
     name character varying default 'TEST.EVENTNAME')
    RETURNS SETOF TEXT AS $$
    BEGIN
    	-- Insert a mock record into the table
    	insert into test_table
    		(start_timestamp,
    		 id,
    		 event_name,
    		 event_result,
    		 http_status_code,
    		 api_key)
    	values
    		(start_ts::timestamp with time zone,
    		 id_test::integer,
    		 name::character varying,
    		 result::character varying,
    		 http_stat_code::integer,
    		 api_key::character varying);

    	-- Confirm that the count matches and the insert actually happened
    	RETURN NEXT is((select count(*)::int from test_table where id = id_test),
    		expectedReturnCount,
    		'Should have inserted ' || expectedReturnCount || ' rows');
    END;
    $$ LANGUAGE plpgsql;
  

Once the data is inserted, we know from the underlying function definition what should be returned. For example, a function may apply filters such that, even if we insert one row, it returns zero as output. The goal is to check the functionality of the function end to end, inserting any edge cases that would otherwise give unexpected results if left untested. The inserted rows are rolled back at the end (by the ROLLBACK in the test script, or automatically by runtests()).

    CREATE OR REPLACE FUNCTION unit_testing.setup_test_id_frequency(expectedReturnCount int, id_test int)
    RETURNS SETOF TEXT AS $$
    BEGIN
    	-- Ensure that the actual output matches the expected output
    	RETURN NEXT is(get_id_frequency(id_test), '(' || expectedReturnCount || ')', 'Table insert count and udf counts match: ' || expectedReturnCount);
    END;
    $$ LANGUAGE plpgsql;
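The udf under test, get_id_frequency, is not shown in this article. To make the example self-contained, here is a minimal hypothetical version that counts the rows for a given id and formats the count as record-style text, which is the '(N)' string the comparison above expects:

    -- Hypothetical stand-in for the udf under test: returns the number of rows
    -- for a given id, formatted as record-style text such as '(1)'.
    CREATE OR REPLACE FUNCTION get_id_frequency(p_id int)
    RETURNS text AS $$
    	SELECT '(' || count(*) || ')' FROM test_table WHERE id = p_id;
    $$ LANGUAGE sql;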
  

Step 4: Create the test plan

Now you can create a test for each of the functions above. Below, I explain how to write the test for just one of the test functions; the others follow the same pattern. Create a file called test_udf.sql, for example, and copy the following into it:

    BEGIN;

    -- Declare the number of tests
    SELECT plan(2);

    -- Create test data in test_table and ensure the insert completes
    select unit_testing.setup_test_insert_into_table(1::int, 123::integer, current_timestamp::timestamp,
    	'success'::character varying, 403::integer, 'TEST'::character varying, 'TEST.EVENTNAME'::character varying);

    -- Check udf output and ensure the actual and expected output match
    select unit_testing.setup_test_id_frequency(1, 123);

    -- Conclude the test and discard changes
    SELECT * FROM finish();
    ROLLBACK;
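While developing, you can also run this file directly with psql before wiring it into pg_prove (the database name is a placeholder):

    psql -d [database name] -X -f test_udf.sql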
  

Step 5: Use runtests() as an alternative way to execute

After all the test cases have been created and installed in your database, you can also execute them with runtests(), providing the name of the schema that contains the tests:

    SELECT * FROM runtests('unit_testing'::name);
  

runtests() takes care of rolling back any mock inserts created for testing, and it can run setup and teardown functions as well.

File structure organization using TAP convention

The Test Anything Protocol (TAP) convention is used to name and organize the test files within a project. Numeric prefixes are added to the test file names to dictate the order in which the files should run; if a test depends on a previous test, this sequencing is important. There is typically a directory called “t” (short for “test”) under the root directory, and all the test files live in that directory. A sample setup is shown below.

    my_project/
│
├─ src/
│   ├── function1.sql
│   ├── function2.sql
│   └── function3.sql
│
└─ t/
    ├── 00-test.sql
    ├── 01-another-test.sql
    └── 02-yet-another-test.sql
  

Use pg_prove to execute test scripts

You can automate the execution of pgTAP tests using pg_prove, a command-line tool for running pgTAP tests in bulk with a single command. All the test scripts can be written as SQL scripts and added to a directory, which pg_prove can then use to run every test. You can also add these scripts as xUnit test functions. The pg_prove documentation covers its full description and further usage. To run your code with pg_prove, first navigate to the project root (the directory containing “t”), then run the command below, replacing [database name] with your database name:

    pg_prove -d [database name] t/00-test.sql
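To run every test file in the “t” directory at once, pass a shell glob instead of a single file:

    pg_prove -d [database name] t/*.sql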
  

You can include multiple tests in one file or just a single test, and the output reflects how many tests the file contains. Assuming 00-test.sql consists of five tests, the output below reports the total number of tests run, how many passed, how many failed, and any diagnostic messages or errors encountered.

    t/00-test.sql .. ok
All tests successful.
Files=1, Tests=5,  0 wallclock secs ( 0.01 usr  0.00 sys +  0.02 cusr  0.00 csys =  0.03 CPU)
Result: PASS
  

pg_prove can make executing large numbers of tests easier and more manageable. It also makes integration with a build process or continuous integration pipeline simpler.

Using pgTAP at Capital One

We have tables with triggers that, when we were troubleshooting, turned out not to be firing. pgTAP tests can make this kind of investigation simpler and more consistent, because we can test the edge cases and figure out exactly where the problems are happening.

We can hold publishers accountable and be proactive in identifying formatting issues. We can also make sure that an Avro file still has the same number of fields as expected from one environment to the next, and proactively avoid failures in production. pgTAP allows us to be proactive rather than reactive.

pgTAP wins when it comes to ensuring database validation

In summary, pgTAP provides a very useful suite of database functions that can be used to create tests as psql scripts or as xUnit-style test functions. In addition to the features demonstrated above, pgTAP can test several other capabilities, such as indexed columns, triggers and trigger functions. The pgTAP documentation describes the complete suite of assertion functions.

It allows users to write simple, readable tests that are easy to understand and maintain. pg_prove, in turn, makes automating pgTAP testing simpler, so the tests become repeatable and more reliable.


Vishi Cline, Senior Data Engineer

Vishi completed her bachelor’s degree in computer science from North Carolina State University and her master’s degree in data science from Southern Methodist University. She is an AWS architecture-certified professional with 16+ years of experience in database development, analytics, and building ETL pipelines. At Capital One, she is responsible for leading table design and architecture, providing optimizations, and building data pipelines that allow businesses to meet their SLAs and mitigate fraud.
