Monitor

Monitor state diagram

Monitor is a pure-Bash script that leaves the PC watching the result of a command. Unlike watch, Monitor is not about observing the command’s output: it detects the moment the command starts failing and the moment it stops failing, and runs other commands whenever either happens.

Monitor was written to keep an eye on services that fail at random. The problem often lasts only a short while, and it is not easy to take a sample of the system at the exact moment the problem is occurring.

Besides, these situations share some characteristics:

  • We want to know when the failure occurred and when the service came back up, so we know which section of the logs to browse for useful information.
  • We want to take action when a new failure occurs, such as sending an alert or taking a snapshot.
  • We want to tolerate temporary glitches. For example, a single lost ping doesn’t imply an actual loss of service.

Although this type of monitoring can be handled with a while loop in Bash, the loop can quickly become complex, and it is no fun to debug it every time the need comes up. It is also possible to get the while logic wrong and not notice until the next failure happens and we find we didn’t get the system snapshot we expected.
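To make the point concrete, here is a minimal sketch of the kind of hand-rolled loop described above. It is not Monitor’s code: naive_watch and its four arguments are hypothetical names invented for this example. It reports transitions with a timestamp and runs an action on failure, but it has no tolerance at all, so a single failed check already fires the action. Adding that tolerance is where such loops tend to grow complicated.

```shell
#!/usr/bin/env bash
# Hand-rolled sketch, NOT Monitor's code. naive_watch is a hypothetical name.
#   $1 command to check          $2 action to run when the command fails
#   $3 seconds between checks    $4 number of checks (0 = loop forever)
naive_watch() {
    local cmd=$1 on_down=$2 rest=$3 max=$4
    local state=UP new n=0

    while [ "$max" -eq 0 ] || [ "$n" -lt "$max" ]; do
        n=$((n + 1))
        if sh -c "$cmd"; then new=UP; else new=DOWN; fi

        if [ "$new" != "$state" ]; then
            # Print only state changes, with a timestamp for later log digging.
            echo "$new $(date '+%Y-%m-%d %H:%M:%S')"
            # No tolerance: one failed check is enough to trigger the action.
            if [ "$new" = DOWN ]; then sh -c "$on_down"; fi
            state=$new
        fi
        sleep "$rest"
    done
}
```

Under those assumptions, an endless watch similar to the example further down would be started with: naive_watch 'ping localhost -c 1 -w 1 >/dev/null 2>&1' 'iptables-save' 1 0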

With Monitor, the logic is already established and tested. It’s just a matter of indicating the command we want to monitor and the actions to execute on state changes.

The code is available on GitHub: https://github.com/alvarezp/monitor

Example

In a hypothetical problem, we lose communication with our own host and we don’t know why. We suspect that a running process is inserting rules into iptables that block communication.

Let’s use monitor.bash to keep an eye on the result of a ping to localhost and to take a snapshot of the iptables configuration on failure.

# monitor.bash --rest-time 1 --on-down 'iptables-save' 'ping localhost -c 1 -w 1 >/dev/null 2>&1'
       INIT 2015-11-05 15:21:58 PDT lun
         UP 2015-11-05 15:21:58 PDT lun
     ---- the weird process applies iptables -A INPUT -i lo -p icmp -j DROP ----
FALLINGDOWN 2015-11-05 15:22:13 PDT lun
# Generated by iptables-save v1.4.21 on Mon May 11 15:22:17 2015
*filter
:INPUT ACCEPT [74:70608]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [67:5781]
-A INPUT -i lo -p icmp -j DROP
COMMIT
# Completed on Mon May 11 15:22:17 2015
       DOWN 2015-11-05 15:22:17 PDT lun
     ---- the weird process applies iptables -F INPUT ----
RAISINGBACK 2015-11-05 15:22:32 PDT lun
         UP 2015-11-05 15:22:36 PDT lun

The output gives the impression that iptables-save runs right after reaching FALLINGDOWN, but in fact it runs just before entering DOWN.
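The state sequence in the output can be sketched as a small transition function. This is only a reading of the sample run, not Monitor’s source: next_state and replay are hypothetical helpers, the numeric threshold is an assumption, and so are the two "bounce back" transitions (a success during FALLINGDOWN returning to UP, and a failure during RAISINGBACK returning to DOWN). The on-down action is represented by a placeholder line printed on the FALLINGDOWN to DOWN transition, just before DOWN is reported.

```shell
#!/usr/bin/env bash
# Illustrative only: inferred from the sample output, not taken from Monitor.
# next_state prints the state that follows the current one.
#   $1 current state    $2 exit status of the last check (0 = success)
#   $3 checks already spent in the current state
#   $4 confirming checks a transitional state needs (assumed parameter)
next_state() {
    local state=$1 status=$2 streak=$3 threshold=$4
    case $state in
        INIT|UP)
            if [ "$status" -eq 0 ]; then echo UP; else echo FALLINGDOWN; fi ;;
        FALLINGDOWN)
            if [ "$status" -eq 0 ]; then echo UP          # assumed: glitch is forgiven
            elif [ "$streak" -ge "$threshold" ]; then echo DOWN
            else echo FALLINGDOWN; fi ;;
        DOWN)
            if [ "$status" -eq 0 ]; then echo RAISINGBACK; else echo DOWN; fi ;;
        RAISINGBACK)
            if [ "$status" -ne 0 ]; then echo DOWN        # assumed: recovery is cancelled
            elif [ "$streak" -ge "$threshold" ]; then echo UP
            else echo RAISINGBACK; fi ;;
    esac
}

# Replay a list of exit statuses ($2...) with threshold $1, printing each
# state change. A placeholder stands in for the on-down action.
replay() {
    local threshold=$1; shift
    local state=INIT streak=0 new status
    echo "$state"
    for status in "$@"; do
        new=$(next_state "$state" "$status" "$streak" "$threshold")
        if [ "$new" = "$state" ]; then
            streak=$((streak + 1))
        else
            if [ "$state" = FALLINGDOWN ] && [ "$new" = DOWN ]; then
                echo "(on-down action runs here)"
            fi
            echo "$new"
            state=$new
            streak=1
        fi
    done
}
```

With a threshold of 2, replay 2 0 1 1 1 1 0 0 0 prints INIT, UP, FALLINGDOWN, the placeholder, DOWN, RAISINGBACK and UP, one per line, which is the same order as in the run above. A single lost check, as in replay 2 0 1 0, goes INIT, UP, FALLINGDOWN, UP and never triggers the action.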

Now we have timestamps to guide the search through our logs, and some evidence to dig into the problem and diagnose it.

