Princeton/failover
Eric Loyd 63ae56f867 Testing for 2.1.7 4 years ago
local/usr/local/nagiosxi/html/includes/components/custom-includes/css Somewhere, we switched from using header-gradient.css to just header.css so deleting the -gradient versions 4 years ago
Makefile Updated ZIP file 4 years ago
README.md Slight tweak to readme and now failover works without having a valid sync file (with a manual override) 4 years ago
START Correct hostnames 5 years ago
STOP Added STOP and SYNC for easy one line access 5 years ago
SYNC Added STOP and SYNC for easy one line access 5 years ago
colors.sh Logic implemented to delete old sync files as well as only run on secondary if primary is not available (this avoids overwriting running configs) 5 years ago
failover.sh Slight tweak to readme and now failover works without having a valid sync file (with a manual override) 4 years ago
nagios_startstop.sh new version for 2021 5 years ago
release-2.1.13.zip Testing for 2.1.7 4 years ago
rsync_xi.sh failover.sh can now request that rsync continue without syncing, just to activate Nagios 4 years ago

README.md

Failover for Princeton University

Failover from the primary Nagios XI server to the secondary Nagios XI server is a disaster recovery measure. It keeps a secondary Nagios server nearly up to date so that it can take over monitoring and notifications if the primary becomes unavailable. The primary always monitors when it is able to. The secondary monitors only when manually enabled, and it must be manually disabled when the primary returns to active service.
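The commit messages in the file listing mention that logic should "only run on secondary if primary is not available (this avoids overwriting running configs)" and that failover.sh supports a manual override. A guard of that kind might look like the following hypothetical Bash sketch. It is not the code in failover.sh: the function names, the ping-based probe, and the literal `force` override keyword are all assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical guard before activating monitoring on the secondary.
# Function names and the "force" keyword are illustrative, not from failover.sh.

# Host-level probe only: succeeds if the primary answers one ping within
# 2 seconds. It does not prove that Nagios itself is running there.
primary_reachable() {
    local host="$1"
    ping -c 1 -W 2 "$host" >/dev/null 2>&1
}

# Decide whether the secondary may take over.
#   $1  exit status of the probe (0 means the primary answered)
#   $2  optional "force" to skip the check (manual override)
may_activate_secondary() {
    local probe_status="$1" override="${2:-}"
    if [[ "$override" == "force" ]]; then
        return 0
    fi
    # Refuse while the primary still answers, so the two servers do not
    # monitor and notify at the same time.
    [[ "$probe_status" -ne 0 ]]
}

# Example (host name is a placeholder):
#   primary_reachable primary.example.edu; status=$?
#   if may_activate_secondary "$status" "${1:-}"; then
#       echo "Primary unreachable or override given: activating secondary"
#   else
#       echo "Primary still reachable: not activating" >&2
#   fi
```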

Prerequisites

  • Nagios XI must be installed on both boxes at the same version and with the same underlying directory configuration. Any difference in file locations or major configuration between the two boxes will make failover behave unpredictably, up to and including complete system failure.

  • The sync process deletes files on the secondary that it does not expect to find there, so all work must be performed on the primary. Any work performed on the secondary is overwritten at the next synchronization. This includes SSH keys, because /home/nagios is synced from the primary to the secondary.

  • /home/nagios/bin must exist and contain the files needed for this process. The sync process copies these from the primary to the secondary, so, like all other synced files, they must be modified only on the primary.

    • /home/nagios/bin/failover.sh
    • /home/nagios/bin/nagios_startstop.sh
    • /home/nagios/bin/rsync_xi.sh
  • root on the primary must be able to SSH to the secondary as the nagios user without entering a passphrase. The rsync and database copies are performed over this connection.

  • root on the primary (and root on the secondary) has crontab requirements, which will be detailed separately.

  • nagios on both the primary and the secondary must be able to run rsync as root through sudo without a password:

    NAGIOSXI ALL = NOPASSWD:/usr/bin/rsync *

  • The nagios user on the primary must be able to SSH to the nagios user on the secondary without a passphrase, so an SSH key and a .ssh directory must be set up.

  • If a ramdisk (such as /ramdisk) exists, make sure it is copied as well.

  • Gearman add-ons such as /etc/mod_gearman are NOT copied as part of this procedure. Components of this kind must be set up the same way on both boxes before failover is configured.
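The version requirement in the first prerequisite can be checked before any sync runs. This minimal Bash sketch assumes each box records its Nagios XI version on a `full=` line in /usr/local/nagiosxi/var/xiversion; confirm that path and format on your installation. The function names and host name are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check that both boxes run the same Nagios XI version.
# Assumes the version sits on a "full=" line of an xiversion-style file.

# Read an xiversion-style file on stdin and print the "full=" value.
parse_xi_version() {
    sed -n 's/^full=//p' | head -n 1
}

# Succeed only if both versions are non-empty and identical.
versions_match() {
    [[ -n "$1" && "$1" == "$2" ]]
}

# Example (path is an assumption, host name a placeholder):
#   f=/usr/local/nagiosxi/var/xiversion
#   primary=$(parse_xi_version < "$f")
#   secondary=$(ssh nagios@secondary.example.edu cat "$f" | parse_xi_version)
#   versions_match "$primary" "$secondary" ||
#       echo "Version mismatch: '$primary' vs '$secondary'" >&2
```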
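The deletion behavior described in the second prerequisite is what a one-way rsync mirror with `--delete` does. Assuming the sync works this way (rsync_xi.sh may use different options), the hypothetical helper below shows the effect, and how `--dry-run --itemize-changes` can preview what a run would copy or delete before anything on the secondary changes.

```shell
#!/usr/bin/env bash
# Hypothetical one-way mirror, primary -> secondary.
# -a keeps permissions and timestamps (ownership only when the receiving side
# runs as root); --delete removes destination files that are absent from the
# source, which is why anything created only on the secondary is lost.
mirror_dir() {
    local src="$1" dest="$2" mode="${3:-apply}"
    local -a opts=(-a --delete)
    if [[ "$mode" == "preview" ]]; then
        # List what would be copied or deleted without changing anything.
        opts+=(--dry-run --itemize-changes)
    fi
    # The trailing slash on the source copies its contents, not the directory.
    rsync "${opts[@]}" "${src%/}/" "$dest"
}

# Example (host name is a placeholder):
#   mirror_dir /home/nagios nagios@secondary.example.edu:/home/nagios/ preview
#   mirror_dir /home/nagios nagios@secondary.example.edu:/home/nagios/
```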
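The presence of the three scripts in /home/nagios/bin can be checked on each box with a short loop. This is a hypothetical sketch: treating "needed" as "present and executable" is an assumption, and the directory is a parameter only so the check can be exercised elsewhere.

```shell
#!/usr/bin/env bash
# Hypothetical check that the failover scripts are present and executable.
check_failover_scripts() {
    local bindir="${1:-/home/nagios/bin}"
    local missing=0 f
    for f in failover.sh nagios_startstop.sh rsync_xi.sh; do
        if [[ ! -x "$bindir/$f" ]]; then
            printf 'missing or not executable: %s\n' "$bindir/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example: check locally, then on the secondary (host name is a placeholder):
#   check_failover_scripts &&
#   ssh nagios@secondary.example.edu "$(declare -f check_failover_scripts); check_failover_scripts"
```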
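Both SSH prerequisites (root and nagios on the primary logging in as nagios on the secondary without a passphrase) can be verified with ssh's `BatchMode=yes` option, which makes ssh fail instead of prompting. The function name, key type, and host name below are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical check that key-based SSH works with no prompt.
# BatchMode=yes makes ssh fail instead of asking for a password or passphrase.
check_passwordless_ssh() {
    local target="$1"   # user@host, e.g. nagios@secondary.example.edu
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$target" true
}

# One-time key setup, as each connecting user on the primary (root and nagios).
# Key type and file name are choices, not requirements of this procedure:
#   mkdir -p -m 700 ~/.ssh
#   ssh-keygen -t ed25519 -N '' -f ~/.ssh/id_ed25519
#   ssh-copy-id -i ~/.ssh/id_ed25519.pub nagios@secondary.example.edu
#
# Then, as root and again as nagios on the primary:
#   check_passwordless_ssh nagios@secondary.example.edu && echo "SSH OK"
```

Because /home/nagios is synced from the primary, the secondary's /home/nagios/.ssh/authorized_keys will presumably be replaced by the primary's copy at each sync, so a public key installed only on the secondary may not survive. Adding the keys to the primary's copy as well avoids that.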
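The sudo rule above can be installed as a drop-in file. This hypothetical sketch assumes /etc/sudoers includes the /etc/sudoers.d directory and that the NAGIOSXI User_Alias is already defined on the box (if it is not, substitute the user name nagios). The file name nagios-rsync is a placeholder.

```shell
#!/usr/bin/env bash
# Hypothetical installer for the rsync sudo rule.
# Assumes /etc/sudoers includes /etc/sudoers.d and that the NAGIOSXI
# User_Alias is defined elsewhere; otherwise use the user name "nagios".
# sudo skips included files whose names contain "." or end in "~", so the
# placeholder name "nagios-rsync" avoids both.
install_rsync_sudo_rule() {
    local target="${1:-/etc/sudoers.d/nagios-rsync}"
    local tmp rc
    tmp=$(mktemp) || return 1
    printf '%s\n' 'NAGIOSXI ALL = NOPASSWD:/usr/bin/rsync *' > "$tmp"
    # Sudoers files are conventionally read-only, mode 0440.
    install -m 0440 "$tmp" "$target"
    rc=$?
    rm -f "$tmp"
    return "$rc"
}
```

After installing, `visudo -c` checks the complete sudoers configuration, and running `sudo -n -l /usr/bin/rsync` as nagios should print the rsync path without prompting for a password.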
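For the ramdisk prerequisite, one approach is to build the list of extra sync sources from paths that actually exist, so /ramdisk is included only on boxes that have one. This assumes "copied" means "included in the rsync run"; the helper name and host name are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print only the candidate paths that exist on this box,
# so optional directories such as /ramdisk join the sync only where present.
existing_paths() {
    local p
    for p in "$@"; do
        if [[ -e "$p" ]]; then
            printf '%s\n' "$p"
        fi
    done
}

# Example (host name is a placeholder; destination permissions may need sudo):
#   mapfile -t extra < <(existing_paths /ramdisk)
#   for dir in "${extra[@]}"; do
#       rsync -a --delete "${dir}/" "nagios@secondary.example.edu:${dir}/"
#   done
```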
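Since Gearman add-ons are outside the sync, drift between the two boxes has to be caught some other way. This hypothetical sketch fingerprints a directory's relative file names and contents with sha256sum so the digests can be compared across hosts. Running it remotely through `declare -f` assumes the remote login shell is Bash and that GNU find, sort, and xargs are installed.

```shell
#!/usr/bin/env bash
# Hypothetical drift check for configuration that is NOT synced,
# such as /etc/mod_gearman.

# Print one SHA-256 digest summarizing every regular file under a directory
# (relative paths plus contents). Identical trees give identical digests.
dir_fingerprint() {
    local dir="$1"
    [[ -d "$dir" ]] || return 1
    (
        cd "$dir" || exit 1
        # NUL-separated, C-locale-sorted file list so order is stable across hosts.
        find . -type f -print0 | LC_ALL=C sort -z | xargs -0 -r sha256sum
    ) | sha256sum | cut -d' ' -f1
}

# Example (host name is a placeholder; reading /etc/mod_gearman may need root):
#   local_fp=$(dir_fingerprint /etc/mod_gearman)
#   remote_fp=$(ssh nagios@secondary.example.edu \
#       "$(declare -f dir_fingerprint); dir_fingerprint /etc/mod_gearman")
#   [[ "$local_fp" == "$remote_fp" ]] || echo "mod_gearman config differs" >&2
```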