Obtaining PostgreSQL cluster logs¶

The PostgreSQL cluster's operating logs are all located in /var/log/syslog on the various nodes. Particular attention should be paid to logs from patroni and etcd. These logs can be displayed continuously using the following command (to be executed as root):

1	`tail -f /var/log/syslog`

You can also display the current logs for these two services in more detail using the commands:

journalctl -fu patroni
journalctl -fu etcd

The patroni service will regularly indicate the current status of the node and specify whether it is LEADER or who is the LEADER.
As for etcd, it will be able to provide information on the status of the cluster.

Example

Here is an example of a log from a leader node:

patroni[1028]: 2023-02-24 17:08:41,198 INFO: no action. I am (PSQL_1), the leader with the lock

Here is an example of a secondary node log:

patroni[753]: 2023-02-24 17:12:51,226 INFO: no action. I am (PSQL_2), a secondary, and following a leader (PSQL_1)

When a node fails, we obtain the following information from etcd:

etcd[691]: lost the TCP streaming connection with peer 2ec65eb059382a64 (stream MsgApp v2 reader)
etcd[691]: lost the TCP streaming connection with peer 2ec65eb059382a64 (stream Message reader)
etcd[691]: failed to dial 2ec65eb059382a64 on stream MsgApp v2 (dial tcp 10.68.250.1:2380: connect: connection refused)
etcd[691]: peer 2ec65eb059382a64 became inactive (message send to peer failed)
etcd[691]: lost the TCP streaming connection with peer 2ec65eb059382a64 (stream MsgApp v2 writer)
etcd[691]: lost the TCP streaming connection with peer 2ec65eb059382a64 (stream Message writer)
etcd[691]: health check for peer 2ec65eb059382a64 could not connect: dial tcp 10.68.250.1:2380: i/o timeout (prober "ROUND_TRIPPER_SNAPSHOT")
etcd[691]: health check for peer 2ec65eb059382a64 could not connect: dial tcp 10.68.250.1:2380: i/o timeout (prober "ROUND_TRIPPER_RAFT_MESSAGE")
etcd[691]: health check for peer 2ec65eb059382a64 could not connect: dial tcp 10.68.250.1:2380: i/o timeout (prober "ROUND_TRIPPER_SNAPSHOT")
etcd[691]: health check for peer 2ec65eb059382a64 could not connect: dial tcp 10.68.250.1:2380: i/o timeout (prober "ROUND_TRIPPER_RAFT_MESSAGE")

The latest logs, those highlighted, loop until the failed node is restored.

The patroni service will also log other information:

patroni[753]: 2023-02-24 17:14:41,221 INFO: no action. I am (PSQL_2), a secondary, and following a leader (PSQL_1)
patroni[753]: 2023-02-24 17:15:13,540 WARNING: Request failed to PSQL-CLUSTER1: GET https://10.68.250.1:8008/patroni (HTTPSConnectionPool(host='10.68.250.1', port=8008): Max retries exceeded with url: /patroni (Caused by ConnectTimeoutError(<urllib3.connection.VerifiedHTTPSConnection object at 0x7fa4e4762978>, 'Connection to 10.68.250.1 timed out. (connect timeout=2)')))
patroni[753]: 2023-02-24 17:15:13,646 INFO: Could not take out TTL lock
patroni[753]: 2023-02-24 17:15:13,654 INFO: following new leader after trying and failing to obtain lock
patroni[753]: 2023-02-24 17:15:13,794 INFO: closed patroni connection to the postgresql cluster
patroni[753]: 2023-02-24 17:15:13,987 INFO: postmaster pid=11704
patroni[753]: 2023-02-24 17:15:13.990 CET [11704] LOG: en écoute sur IPv4, adresse « 0.0.0.0 », port 5432
patroni[753]: 2023-02-24 17:15:13.990 CET [11704] LOG: en écoute sur IPv6, adresse « :: », port 5432
patroni[753]: /var/run/postgresql/:5432 - pas de réponse
patroni[753]: 2023-02-24 17:15:13.995 CET [11704] LOG: écoute sur la socket Unix « /var/run/postgresql/.s.PGSQL.5432 »
patroni[753]: 2023-02-24 17:15:14.006 CET [11704] LOG: redirection des traces vers le processus de récupération des traces
patroni[753]: 2023-02-24 17:15:14.006 CET [11704] ASTUCE : Les prochaines traces apparaîtront dans le répertoire « /var/log/postgresql ».
patroni[753]: /var/run/postgresql/:5432 - acceptation des connexions
patroni[753]: 2023-02-24 17:15:15,033 INFO: Lock owner: PSQL_3; I am PSQL_2
patroni[753]: 2023-02-24 17:15:15,033 INFO: establishing a new patroni connection to the postgres cluster
patroni[753]: 2023-02-24 17:15:15,036 INFO: Local timeline=1 lsn=0/402EE78
patroni[753]: 2023-02-24 17:15:15,066 INFO: master_timeline=2
patroni[753]: 2023-02-24 17:15:15,067 INFO: master: history=1 0/402EE78 no recovery target specified
patroni[753]: 2023-02-24 17:15:15,077 INFO: no action. I am (PSQL_2), a secondary, and following a leader (PSQL_3)

In this example, line 1 indicates that PSQL_2 is a REPLICA of the LEADER PSQL_1. Line 2 is an error message due to the loss of connectivity with the LEADER node, and the last line specifies that PSQL_2 is still a REPLICA, but this time of the LEADER server PSQL_3.