I haven’t worked out why yet, but this seems to be a common theme: the PHP/FastCGI service dies periodically, which takes my blog down (Nginx does not cope well when the back end goes away). So I need something to fix this. Enter Nagios!
Nagios supports custom event handlers, which can be set up to perform any action you want, such as restarting a service. So we’ll use Nagios to start the service again every time it dies.
First, create a script at /usr/local/lib64/nagios/plugins/eventhandlers/restart-fastcgi and make it executable;
#!/bin/sh
#
# Restarts the php-fpm FastCGI service if it dies
#
# restart-fastcgi $SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEATTEMPT$ $HOSTADDRESS$
case "$1" in
OK)
    ;;
WARNING)
    ;;
UNKNOWN)
    ;;
CRITICAL)
    case "$2" in
    SOFT)
        case "$3" in
        3)
            echo -n "Starting Fast-CGI service (3rd soft critical state)..."
            sudo /sbin/service php-fpm start | /bin/mail -s "[blog.zencoffee.org] FastCGI Restarted" root
            ;;
        esac
        ;;
    HARD)
        echo -n "Starting Fast-CGI service ..."
        sudo /sbin/service php-fpm start | /bin/mail -s "[blog.zencoffee.org] FastCGI Restarted" root
        ;;
    esac
    ;;
esac
exit 0
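To see which state combinations actually trigger a restart, without touching the service, the same case logic can be pulled into a small function. This is only a sketch: `decide` is a hypothetical helper name, not something Nagios provides, and it prints a word instead of calling sudo.

```shell
#!/bin/sh
# Hypothetical dry-run helper mirroring the case logic of restart-fastcgi.
#   $1 = $SERVICESTATE$, $2 = $SERVICESTATETYPE$, $3 = $SERVICEATTEMPT$
decide() {
    case "$1:$2:$3" in
        CRITICAL:SOFT:3) echo restart ;;   # third soft failure
        CRITICAL:HARD:*) echo restart ;;   # hard failure, any attempt number
        *)               echo skip ;;      # everything else is ignored
    esac
}

decide CRITICAL SOFT 1   # prints "skip"
decide CRITICAL SOFT 3   # prints "restart"
decide CRITICAL HARD 4   # prints "restart"
decide OK HARD 1         # prints "skip"
```

Only the third soft failure and the hard failure lead to a start attempt; recoveries and warnings are deliberately ignored.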
Ok, now we’ll need to configure sudoers to allow the nagios user to run ‘service php-fpm start’ without a password. Add this to your sudoers with visudo;
Defaults:nagios !requiretty,visiblepw
Cmnd_Alias NAGIOS_START_PHPFPM = /sbin/service php-fpm start
nagios ALL=(root) NOPASSWD: NAGIOS_START_PHPFPM
Now, we’ll test that we can actually do it. As root, do this;
su - nagios
/usr/local/lib64/nagios/plugins/eventhandlers/restart-fastcgi CRITICAL SOFT 3 127.0.0.1
You should then get an email sent to root containing the output of the start command. Obviously it won’t actually start anything (the service is already running). Check /var/log/secure to confirm that the sudo command worked. If so, great! Now we need to set up Nagios itself to do the restart.
First, we’ll define a command to do the restart (note, I use $USER8$, set in the Nagios resource file, to point to the local event handlers folder);
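One way to do the log check is a small grep wrapper. This is a sketch under assumptions: `recent_sudo` is a hypothetical helper name, and the pattern assumes your sudo log lines contain both the word sudo and the user name, which is the usual format but may differ on your system.

```shell
#!/bin/sh
# Hypothetical helper (not part of sudo or Nagios): print the last few
# sudo log lines that mention a given user.
#   $1 = log file, $2 = user name
recent_sudo() {
    grep "sudo.*$2" "$1" | tail -n 5
}

# /var/log/secure is the log named above; other systems may log elsewhere.
if [ -r /var/log/secure ]; then
    recent_sudo /var/log/secure nagios
fi
```

A line ending in COMMAND=/sbin/service php-fpm start means sudo accepted the call. Running `sudo -l -U nagios` as root is another way to confirm what the nagios user is allowed to run.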
define command{
    command_name    restart-fastcgi
    command_line    $USER8$/restart-fastcgi $SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEATTEMPT$ $HOSTADDRESS$
}
Then we’ll add that event handler to the service check we already have in place for checking our FastCGI service;
define service{
    use                     generic-service
    host_name               yourhostnamehere
    service_description     PHP-FPM Service
    max_check_attempts      4
    event_handler           restart-fastcgi
    flap_detection_enabled  0
    check_command           check_local_procs!0:!1:!RSDT -C php-fpm
}
Don’t forget to restart Nagios; after that, everything should work. Note that max_check_attempts should be at least one more than the attempt number used in the script. The script tries a restart on the third SOFT failure, and you probably don’t want Nagios going to a HARD state and yelling at you about a critical error before it has tried a restart. Then again, you might. Change it as you want.
Now, we can be brave and manually stop the php-fpm service, then watch Nagios to see if it starts it again. It should, after a few minutes. If you want the restart to happen faster, tune the script above (to act on the first soft fail, for example).
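The reason for the extra attempt is that Nagios reports failed checks as SOFT until the attempt number reaches max_check_attempts, at which point the state becomes HARD. A rough sketch of that counting, using a hypothetical `state_types` function that only models the rule and does not talk to Nagios:

```shell
#!/bin/sh
# Hypothetical sketch: list the state type reported on each consecutive
# failed check for a given max_check_attempts value ($1).
state_types() {
    max=$1
    attempt=1
    while [ "$attempt" -le "$max" ]; do
        if [ "$attempt" -lt "$max" ]; then
            echo "attempt $attempt: SOFT"
        else
            echo "attempt $attempt: HARD"   # last attempt flips to HARD
        fi
        attempt=$((attempt + 1))
    done
}

state_types 4
```

With a value of 4 the third failure is still SOFT, so the SOFT branch of the script gets a chance to run. With a value of 3 the third failure is already HARD, and only the HARD branch would ever fire.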
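If you would rather not stare at the Nagios web interface while waiting, a small polling loop can tell you when the process is back. This is a sketch only: `wait_for_proc` is a hypothetical helper, and it assumes pgrep is installed and that the process is named php-fpm, as in the check command above.

```shell
#!/bin/sh
# Hypothetical helper: poll until a process with the given name exists,
# or give up after a number of tries.
#   $1 = process name, $2 = number of tries, $3 = seconds between tries
wait_for_proc() {
    name=$1
    tries=$2
    delay=$3
    while [ "$tries" -gt 0 ]; do
        if pgrep -x "$name" >/dev/null; then
            echo "$name is running"
            return 0
        fi
        tries=$((tries - 1))
        sleep "$delay"
    done
    echo "$name did not come back"
    return 1
}
```

After stopping the service, something like `wait_for_proc php-fpm 60 10` would check every ten seconds for up to ten minutes.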
Good luck!
