.. meta::
   :description: Maintenance operations for the PostgreSQL High Availability cluster with Micetro by Men&Mice
   :keywords: high availability, database maintenance, failover, PostgreSQL, Micetro

.. _psql-ha-maintenance:

PostgreSQL HA operations
------------------------

Triggering a failover
^^^^^^^^^^^^^^^^^^^^^

To call the function successfully, you need to figure out the formation and group of the group where the failover happens.
The following commands when run on a ``pg_auto_failover`` keeper node provide for the necessary information:

Machine: node-1
"""""""""""""""

Get the variables [formation] and [group] from these commands:

.. code-block:: bash

  su - postgres
  export PATH="$PATH:/usr/pgsql-12/bin"
  # [formation] -> the default value is 'default'
  pg_autoctl config get pg_autoctl.formation --pgdata ./[node-1]
  # [group] -> the default value is '0'
  pg_autoctl config get pg_autoctl.group --pgdata ./[node-1]

Machine: monitor
""""""""""""""""

.. code-block:: bash

  sudo su - postgres
  export PATH="$PATH:/usr/pgsql-12/bin"
  psql -p [port] -d pg_auto_failover
  > select pgautofailover.perform_failover(formation_id => '[formation]', group_id => [group]);

Implementing a controlled switchover
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

It is generally useful to distinguish a controlled switchover from a failover. In a controlled switchover situation it is possible to organize the sequence of events in a way to avoid data loss and lower downtime to a minimum.
In the case of ``pg_auto_failover``, because we use synchronous replication, we don’t face data loss risks when triggering a manual failover. Moreover, our monitor knows the current primary health at the time when the failover is triggered and drives the failover accordingly.
So to trigger a controlled switchover with ``pg_auto_failover`` you can use the same API as for a manual failover above.

Maintenance of a secondary node
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

It is possible to put a secondary node in any group in a MAINTENANCE state so that the Postgres server is not doing synchronous replication anymore and can be taken down for maintenance purposes, such as security kernel upgrades or the like.

Machine: node-1 | node-2
""""""""""""""""""""""""

To enable maintenance we use:

.. code-block:: bash

  pg_autoctl enable maintenance --pgdata ./[node-1 | node-2]

When a standby node is in maintenance, the monitor sets the primary node replication to WAIT_PRIMARY: in this role, the PostgreSQL streaming replication is now asynchronous and the standby PostgreSQL server may be stopped, rebooted, etc.

.. note::
  ``pg_auto_failover`` does not provide support for primary server maintenance.

To disable maintenance we use

.. code-block:: bash

  pg_autoctl disable maintenance --pgdata ./[node-1 | node-2]

Show current state and events
"""""""""""""""""""""""""""""

.. code-block:: bash

  $ pg_autoctl show state --pgdata [monitor | node-1 | node-2]
  $ pg_autoctl show events --pgdata [monitor | node-1 | node-2]

Monitoring ``pg_auto_failover`` in production
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The monitor reports every state change decision to a LISTEN/NOTIFY channel named state. PostgreSQL logs on the monitor are also stored in a table, ``pgautofailover.event``, and broadcast by NOTIFY in the channel log.

Machine: monitor
""""""""""""""""

.. code-block:: bash

  sudo su - postgres
  tail -f ./[monitor]/pg_log/postgresql-[WeekDay].log