BOSH monitors deployed VMs and the release jobs’ processes running on those VMs through the Health Monitor, with the help of the Agent and Monit.
The Health Monitor is extended by a set of plugins. Each plugin is given an opportunity to act on each heartbeat, so in cases of failure it can notify external services or perform actions against the Director.
The Health Monitor includes the following plugins:
- Event Logger: Logs events to a file
- Resurrector: Recreates VMs that have stopped heartbeating
- Emailer: Sends configurable emails when events are received
- OpenTSDB: Sends events to OpenTSDB
- Graphite: Sends events to Graphite
- PagerDuty: Sends events to PagerDuty.com using their API
- DataDog: Sends events to DataDog.com using their API
- AWS CloudWatch: Sends events to Amazon’s CloudWatch using their API
See Configuring Health Monitor for detailed plugin configuration.
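The plugin model described above can be illustrated with a minimal sketch. It is not the Health Monitor's actual code: the `Event` fields, the `run` method name, and both plugin classes are hypothetical stand-ins chosen only to show how each incoming event is offered to every plugin in turn.

```python
from dataclasses import dataclass, field
from typing import List, Protocol


@dataclass
class Event:
    """Hypothetical event record; real Health Monitor events carry more fields."""
    kind: str          # assumed values: "heartbeat" or "alert"
    deployment: str
    job: str
    detail: str = ""


class Plugin(Protocol):
    """Assumed plugin interface: a single method called once per event."""
    def run(self, event: Event) -> None: ...


@dataclass
class EventLogger:
    """Stand-in for an event-logging plugin; keeps lines in memory instead of a file."""
    lines: List[str] = field(default_factory=list)

    def run(self, event: Event) -> None:
        line = f"{event.kind} {event.deployment}/{event.job} {event.detail}"
        self.lines.append(line.strip())


@dataclass
class AlertCounter:
    """Stand-in for a notifying plugin; it reacts only to alerts."""
    alerts: int = 0

    def run(self, event: Event) -> None:
        if event.kind == "alert":
            self.alerts += 1


def dispatch(event: Event, plugins: List[Plugin]) -> None:
    """Give every plugin an opportunity to act on the event."""
    for plugin in plugins:
        plugin.run(event)
```

In this sketch each plugin decides for itself whether an event is interesting, which mirrors the idea that every plugin sees every heartbeat and alert but only some of them act on it.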
The Resurrector plugin continuously cross-references the VMs expected to be running against the VMs that are sending heartbeats. When the Resurrector does not receive heartbeats from a VM for a certain period of time, it kicks off a task on the Director to try to “resurrect” that VM.
See Automatic repair with Resurrector for details.
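The cross-referencing step can be sketched roughly as follows. This is an illustration under stated assumptions, not the Resurrector's implementation: the function name, the data shapes, and the timeout value are hypothetical, and treating a VM that has never sent a heartbeat as unresponsive is a simplification made for this example.

```python
from typing import Dict, Iterable, List


def find_unresponsive_vms(
    expected_vms: Iterable[str],
    last_heartbeat: Dict[str, float],
    now: float,
    timeout_seconds: float,
) -> List[str]:
    """Return expected VMs whose last heartbeat is older than the timeout.

    Assumption for this sketch: a VM with no recorded heartbeat at all
    is also reported as unresponsive.
    """
    unresponsive = []
    for vm in expected_vms:
        seen = last_heartbeat.get(vm)
        if seen is None or now - seen > timeout_seconds:
            unresponsive.append(vm)
    return unresponsive
```

Each VM returned by such a check would be a candidate for a recreation task on the Director; how the real plugin batches or throttles those tasks is outside the scope of this sketch.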
Release jobs’ processes on each VM are monitored with the help of Monit. Monit continuously checks that the configured release jobs’ processes are present and restarts any that are not found. Process restarts, failures, and similar events are reported to the Agent, which in turn reports them as alerts to the Health Monitor. Each Health Monitor plugin is given an opportunity to act on each alert.
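The monitoring loop described above can be pictured with a small sketch. It does not reflect how Monit or the Agent are implemented: the function name, the `restart` callback, and the alert dictionary layout are all hypothetical and exist only to show the check, restart, and report sequence.

```python
from typing import Callable, Dict, Iterable, List, Set


def monitor_cycle(
    configured: Iterable[str],
    running: Set[str],
    restart: Callable[[str], None],
) -> List[Dict[str, str]]:
    """Run one monitoring pass over the configured processes.

    Any configured process that is not running is restarted, and an
    alert record (hypothetical format) is produced for the Agent to
    forward to the Health Monitor.
    """
    alerts = []
    for name in configured:
        if name not in running:
            restart(name)
            alerts.append({"process": name, "event": "restarted"})
    return alerts
```

For example, calling `monitor_cycle(["nats", "director"], {"nats"}, restart_fn)` would invoke `restart_fn("director")` and return a single alert record for `director`.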
The Agent on each VM sends an alert whenever a user or process attempts to log into the system via SSH. Both successful and failed attempts are recorded.
The Director sends an alert when a deployment starts, completes successfully, or fails with an error.