HA-Lizard ha-cfg help (q to exit) ha-status Displays HA ststus as "true" or "false" ha-enable Enables HA ha-disable Disables HA show-alerts List HA-Lizard alert messages clear-alerts Clears HA-Lizard alert messages remove The “remove” option will erase all global parameters from the XAPI database. This will essentially “clean” the database and should be performed only when removing HA from a pool manually. The provided uninstall script will execute this action by default. log The “log” action will invoke a simple logging utility that displays system logs written by HA. It is a convenient way to watch system behavior and useful in troubleshooting. get The “get” action lists all of the global parameters and their values. This is useful for checking the current configuration and also for retrieving the correct syntax of a parameter to be used in a “set” command set The “set” action is used modify the value of any global parameter. Set requires additional command line arguments as follows: “ha-cfg set ” Where is the name of the parameter to be updated (correct syntax of the parameter can be retrieved by invoking “ha-cfg get”) and is the new value for the given parameter. Check the configuration file documentation for a list of supported values. get-vm-ha The “get-vm-ha” action will list all VMs within a pool and display whether HA is enabled for each VM. Additional information is provided depending on the pool setting context. This tool is useful for quickly determining the HA status of any VM. set-vm-ha The “set-vm-ha” action is used to enable or disable HA for the specified VM name-label. The passed in name-label must exactly match the name of the VM. Value must be set to true or false. Important: If the vm name-label has spaces, it must be wrapped in single or double quotes as follows: ha-cfg set-vm-ha “VM name with spaces” true status The “status” action displays whether HA is enabled or disabled for the pool and allows for the toggling of the status. This should be used to enable/disable HA for the pool as shown below. backup Usage The backup command will create a backup file of the pool configuration settings for ha-lizard. A filename must be specified to declare the filename and path for the backup. restore Usage The restore command will restore a previously backed up configuration file. A filename must be specified. The restore command need only be run on a single host in the pool (any host regardless of master/slave status) Settings are restored and applied to all hosts within the pool. When restoring new values for the monitor timers and monitor triggers, a restart of the ha-lizard service is necessary for a host to read in the new settings. The service is automatically restarted on the host where the restore command is called. It is advised that the service be restarted manually on any other hosts within the pool with “service ha-lizard restart”. restore-default Usage Restores default configuration parameters to pool database. email Usage The email command will sent a test email alert using the configured email settings. This is useful to for testing the configured values to ensure that email alerting is working properly. Sample output is displayed below. email_debug Usage The email_debug command is identical to the email command except that it will produce verbose SMTP log and output it to stdout. This is useful when troubleshooting SMTP problems. ######################## # Global Configuration # ######################## Global configuration parameters are stored in the XAPI database shared by all hosts within a pool. A command-line tool, “ha-cfg”, is provided for making changes to any of the global parameters. To view the current configuration, use “ha-cfg get”. Any global configuration parameter can be updated using the “ha-cfg set” command. For example, to disable logging globally, use “ha-cfg set enable_logging 0” ############################ # Configuration Parameters # ############################ HA Monitor Configuration The HA service is run by a monitoring service which runs continuously. The following Monitor settings are used to configure the behavior of the monitor. The provided installer installs and configures the Monitor with default settings that are acceptable in most cases. Once installed, the Monitor will run continuously as a service. Status can be checked with “service ha-lizard status”. As of version 1.7.7 the default parameter installed will be optimized for a 2-node pool with fast detection and switching of roles (failure detection in < 15 seconds). Some values may need to be changed depending on the environment. Host performance (how long it takes to iterate through HA processes launched by the Monitor) and the size of the pool should be considered when setting these parameters. The Monitor will launch several HA processes in a loop every 15 seconds (MONITOR_DELAY). The default setting of 15 seconds is intended for compact pools of 2-nodes. This value should be increased for larger pools. Generally 45 seconds is adequate for pools of 3-8 hosts. It is a good idea to watch the system logs to ensure that all the HA processes have finished running within the MONITOR_DELAY window. By increasing MONITOR_DELAY, it will take longer to detect a failure. Decreasing MONITOR_DELAY will more quickly detect failures and recover. System performance and preferences should be considered carefully. monitor_max_starts Threshold for when to assume running processes are not responding. Sets how many failed starts to wait before killing any hung processes. Use caution with this setting. It should be set relatively high as a pool recovery procedure will take considerably more time to execute causing the system to trigger logic which attempts to detect hung processes. Processes will generally never hang unless XAPI becomes totally unresponsive. Logic is provided to abort attempts to connect to XAPI during a failure. In this case, local state files with pool state information from the last successful iteration will be used instead. Default = 20 monitor_killall If MAX_MONITOR_STARTS threshold is reached - set whether to kill all ha-lizard processes. Default = 1 1 = yes, 0 = no monitor_delay Delay in seconds between re-spawning ha-lizard. This should be adjusted to the environment. Large pools require more time for each run of ha-lizard. Default = 15 monitor_scanrate ha-lizard will not re-spawn unless all current processes are completed. If there are active processes while attempting to start a new iteration, ha-lizard will wait the number of seconds set here before retrying. Each successive fail will increment a counter (MONITOR_MAX_STARTS) that may trigger KILLALL. Default = 10 xc_field_name Field name used to enable / disable HA for pool. This can also be set within XenCenter management console to enable/disable HA within the pool. To make this field visible within XenCenter, create a custom pool level field with the name set here. The configuration contents will then be visible and alterable within XenCenter. See section on configuring XenCenter for more details. op_mode Set the operating mode for ha-lizard. 1 = manage appliances, 2 = manage virtual machines Default value = 2 Mode 1 uses logic to manage appliances within the pool. By default, all appliances within a pool are managed by HA without requiring any additional settings. This is useful for small pools where most HA management functions can be handled from within XenCenter. Simply add VMs to appliances and those VMs are automatically managed. Any VM that is not part of an appliance is not managed by HA. This technique greatly simplifies HA management as each individual VM does not require any special settings. If some appliances should not be managed by HA, simply add the UUID of the appliances to be skipped in the DISABLED_VAPPS setting. Managing appliances also provides a convenient way to configure startup order and delays when recovering VMs. This can be done as part of the standard appliance configuration settings. Mode 2 provides greater flexibility by providing a mechanism for more granular control over which pool VMs are managed. Appliances are not managed in this mode. By default (when GLOBAL_VM_HA is set to 1), all VMs are automatically managed by HA. If GLOBAL_VM_HA is set to 0, then each VM within the pool must have HA explicitly enabled or disabled. This can be set from the pool GUI with access to custom configuration parameters or via the command line tool ha-cfg. global_vm_ha Set whether to individually enable HA on each VM within the pool (when OP_MODE = 2) 0 = You must individually set ha-lizard-enabled to true/false for each VM within the pool 1 = ALL VMs have HA enabled regardless of setting in GUI/CLI Default value = 1 enable_logging Enable Logging 1=yes, 0=no. Logs are written to /var/log/messages. All log messages are labeled with "ha-lizard" for easy filtering. View/Filter real time logging with: “tail -f /var/log/messages | grep ha-lizard” enable_alerts Enable alerts 1=yes, 0=no. Alerts are set systemwide and can be viewed in xencenter or via CLI. Alert types follow email alerts. disabled_vapps Specify UUID(s) of vapps that do not get automatically started by ha-lizard when OP_MODE=1 (manage appliances) Array is ":" delimited like this (UUID1:UUID2:UUID3) Leave blank if ALL vapps are managed by ha-lizard "DISABLED_VAPPS=()" Only applied when OP_MODE=1 xe_timeout Set the maximum wait time for calls to the xe toolstack to respond. If xe does not respond within XE_TIMEOUT seconds, the xe PID will be killed. This is done to prevent xe calls from hanging in the event of a Master host failure. In the event of a Master failure, xe may hang on requests from pool slaves. This timeout will ensure that fencing the of master host is not prevented. Default setting is 5 seconds which is ample time for a well running pool. Poor performing hosts may need to set this value higher to prevent unintended process terminations during normal operation. xapi_count If a pool member cannot reach a pool peer, XAPI_COUNT is the number of retry attempts when a host failure is detected. If the unresponsive host is recovered before the XAPI_COUNT is reached, attempts to fence and remove the failed host will be ignored. Default XAPI_COUNT = 2 xapi_delay If a pool member cannot reach a pool peer, XAPI_DELAY is the number of seconds to wait in between XAPI_COUNT attempts to contact the unresponsive host. DEFAULT XAPI_DELAY = 10 seconds slave_ha If the Pool Master cannot be reached and all attempts to reach it have been exhausted, set whether the autoselected slave will try to start appliances and/or VMs. (PROMOTE_SLAVE must also be set to 1 for this to work) promote_slave If the pool master cannot be reached - set whether slave should be promoted to pool master (this only affects a single slave: the "autoselect" winner chosen by the former master to recover the pool). More on this topic is available in the Cluster Management section. slave_vm_stat By default, only the pool master will check the status of all VMs managed by this script and attempt to start a VM that is not in the running state. Setting SLAVE_VM_STAT to 1 will cause any pool slaves to also check all VM statuses and attempt to start any VM not in the running state. Default = 0 In a large pool many hosts may attempt to start the same VM the first host to attempt will succeed, others will be safely declined. Enabling may create many unnecessary duplicate processes in the pool. Email Alert Settings - mail_on: Enable/Disable email alerts. 1 = enabled 0 = disabled - mail_subject: Subject Line of email alert - mail_from: The FROM email address used on email alerts - mail_to: the email address to send alerts to - smtp_server: Declare the SMTP server to use for sending email alerts - smtp_port: Declare the port number for the SMTP server - smtp_user: If the SMTP server requires login, set the username - smtp_pass: If the SMTP server requires login with a password, set the password Important: Do not use spaces when setting via the CLI tool. If spaces are required in the subject line, use the override configuration file instead and wrap the string in double quotes like this. “Mail Subject” Fencing Configuration Settings Currently Supported FENCE_METHOD = ILO, XVM, POOL, IRMC ILO is the HP Integrated Light Out management interface XVM is intended for test environments with nested xen instances where pool domos are domus within a single xen host. POOL does not fence failed hosts. It simply allows the forceful removal of a failed host from a pool. IRMC is a Fujitsu BMC similar to HP ILO The name of any custom fencing agent can also be named here. See the fencing section for details on using a custom fencing script. fence_enabled Enable/Disable Fencing 1=enabled 0=disabled fence_file_loc Location to store and look for fencing scripts fence_ha_onfail Select whether to attempt starting of failed host’s VMs on another host if fencing fails 1=enabled 0=disabled fence_method ILO, XVM and POOL supported. "FENCE_METHOD=ILO" - custom fencing scripts can be added and called here fence_passwd Password for fence device (only if required by the fence device) fence_action Supported actions = start, stop, reset fence_reboot_lone_host If Master host cannot see any other pool members (in pools with 3 or more hosts), choose whether to reboot before attempting to fence peers. This may not matter if quorum is used (see FENCE_QUORUM_REQUIRED) 1=enabled 0=disabled *** IMPORTANT *** As of version 1.8.9 - this feature is now supported on pools with 2 hosts. Enabling this feature will cause the Master to self fence when a failure is detected and quorum is not achieved. fence_ipaddress Only used for static fencing device - currently XVM host supported. Can be used for custom fencing scripts that call a static IP such as a power strip. fence_host_forget Will be Deprecated in future release – only use if you know what you are doing. As of version 1.6.41 – this is no longer necessary. Setting this to 0 is recommended. Select whether to forget a fenced host (permanently remove from pool) 1=enabled 0=disabled. fence_min_hosts Do not allow fencing when fewer than this number of hosts are remaining in the pool. Default value = 2 which is optimized for a 2-node pool. fence_quorum_required Select whether pool remaining hosts should have quorum before allowing fencing. 1=enabled 0=disabled fence_use_ip_heuristics Select whether to poll additional IP endpoints (other than the pool hosts) and possibly create an additional vote used to determine quorum for the pool. This can be helpful in a 2 node pool where quorum cannot otherwise be achieved. 1=enabled 0=disabled fence_heuristics_ips Create a list of IP addresses or domain names to check. The list should be delimited by “:” As of version 1.7.7 the default value is set to 8.8.8.8 which is a well-known address. However, live production systems should NOT use the default address. This address should ideally be set to a local IP on the management network which is very close to the management interface of the hosts. Typically this would be the IP address of the switch the hosts are connected to. If more than one IP address is specified then all of the addresses must be successfully reached with ping in order to count as a single vote. mgt_link_loss_tolerance The number of seconds before all VMs are forcefully shutdown on a master that has lost its management link for MGT_LINK_LOSS_TOLERANCE seconds. Default=5 seconds disk_monitor Select whether to enable hourly disk health monitoring which triggers an alert on any non-normal SMART flags. Disk or RAID controller must support SMART for this feature to work. Defualt value = 1 (enabled) enable alerts Enable alerts 1=yes 0=no. Sets pool-wide alerts which can be viewed or managed via management tools (such as xencenter) or CLI. Alerts mirror email alerts