Tuning Go’s MaxIdleConns connection pool setting

2020-10-01

Recently we were investigating high load on one of our Postgres database clusters. During peak hours we would nearly cap out on CPU usage for each node in this cluster. We deployed a few optimizations, which helped to varying degrees, though none of them reduced the load to an acceptable amount.

Eventually one of our engineers noticed that Postgres connections appeared to be very short-lived. The following query shows ongoing database connections:

SELECT pid, backend_start
    FROM pg_stat_activity
    WHERE datname = 'your_database_name'
    ORDER BY backend_start;

In Postgres, connections can be used to execute multiple queries over its lifetime. Ideally, queries would reuse same connection. What we saw instead were connections alive for only about 100ms on average. This indicated that connections were being aggressively terminated.

Postgres forks its process for each connection, leading to a relatively high cost associated with establishing new connections. Our hypothesis at this point was that the maximum number of idle connections configured in our application code was too low. In Go, when using database/sql, this setting is MaxIdleConns and set to 2 by default.

SetMaxIdleConns sets the maximum number of connections in the idle connection pool.

The default max idle connections is currently 2.

If the number of concurrent queries exceed MaxIdleConns, new connections will be established for the additional queries, and these connections will be torn down right after.

To validate our hypothesis we started collecting metrics exposed by the database/sql package.

From our metrics we made two primary observations:

  1. There were about 1,000 database connections being established and closed per second. This is much too high.
  2. On average there were about 4 connections in use concurrently.

Based on this data we increased the MaxIdleConns setting from the default 2 to 4, and then 4 to 6 shortly after. Immediately we observed a significant drop in the number of established connections.

Correspondingly, our database load dropped by about 30% after the setting was increased.

The load during peak hours reduced by up to 60%. A significant portion of our database load during peak hours was simply due to an increased number of connections needing to be established.

This ended up being a relatively simple fix, though it did take us a while to stumble upon the initial hypothesis.

The takeaway from this is that Go’s default MaxIdleConns value may be too low for a production environment with a high number of queries per second. For databases like Postgres that use forked processes instead of threads for persistent connections, the issue might manifest as high CPU usage.