Raspberry Pi Temperature Monitoring with CheckMK

The Raspberry Pi running Raspbian has a built-in temperature sensor on the CPU die.  The reading is in thousandths of a degree Celsius, and you can find it at:

/sys/class/thermal/thermal_zone0/temp

CheckMK supports the idea of local checks.  A local check is a simple script that the agent runs on a host; the script does all of the checking and threshold evaluation itself on the client end.  This means you cannot customize the warn/crit thresholds from the CheckMK host end, but local checks are easy to write.

The above simplistic script reads in the CPU temperature of the RPi, and sets a warn threshold of 90% of the throttling temperature with a critical threshold of 100% of the throttling temperature.
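A minimal sketch of a local check along these lines, written in Python, might look like the following.  The 85 C throttling temperature, the service name CPU_Temp and the function name are assumptions made for illustration, so adjust them for your Pi.  Each line a local check prints carries a status (0 OK, 1 WARN, 2 CRIT, 3 UNKNOWN), a service name without spaces, performance data and some free text.

```python
#!/usr/bin/env python3
"""Sketch of a CheckMK local check for the Raspberry Pi CPU temperature."""

SENSOR_PATH = "/sys/class/thermal/thermal_zone0/temp"
# Assumption: throttling temperature in degrees C; adjust for your Pi model.
THROTTLE_C = 85.0


def local_check_line(millidegrees, throttle_c=THROTTLE_C):
    """Return one local-check line: status, service name, perfdata, text."""
    temp_c = millidegrees / 1000.0  # the kernel reports thousandths of a degree
    warn_c = 0.9 * throttle_c       # warn at 90% of the throttling temperature
    crit_c = throttle_c             # crit at 100% of the throttling temperature
    if temp_c >= crit_c:
        status, label = 2, "CRIT"
    elif temp_c >= warn_c:
        status, label = 1, "WARN"
    else:
        status, label = 0, "OK"
    return "%d CPU_Temp temp=%.1f;%.1f;%.1f %s - CPU temperature %.1f C" % (
        status, temp_c, warn_c, crit_c, label, temp_c)


if __name__ == "__main__":
    try:
        with open(SENSOR_PATH) as sensor:
            print(local_check_line(int(sensor.read().strip())))
    except (OSError, ValueError):
        # Status 3 is UNKNOWN; "-" means no performance data.
        print("3 CPU_Temp - UNKNOWN - could not read %s" % SENSOR_PATH)
```

Keeping the threshold logic in a function of its own means it can be tried with made-up readings on any machine, not just on the Pi.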

If you add this into:

/usr/lib/check_mk_agent/local

on your Raspbian install and make it executable, then manually run check_mk_agent, you’ll see the output from the sensor in the <<<local>>> section.  You can then edit the host in CheckMK and add the new service that is automatically inventoried.  I assume here that your CPU die never gets below 0 degrees (should be fairly sensible in most circumstances, I imagine).

Easy!

HWiNFO Sensors in CheckMK

I use HWiNFO64 on my Windows PCs to monitor the various temperature and fan sensors.  I wanted to get this data into CheckMK for monitoring purposes.  Here’s how I did it.

First, in HWiNFO, tag any sensors you want to monitor for the Vista gadget.  This causes HWiNFO to populate registry keys with the relevant data.  You’ll then need to make a custom plugin for CheckMK in C:\Program Files (x86)\checkmk\plugins named “hwinfo64.cmd”, containing the following;

Now do a test from your CheckMK server; you should see the <<<hwinfo64>>> section in the agent output for the host.  Great.  Next we need to write a check in CheckMK to interpret that data.  Make a new check ‘hwinfo64’ in /omd/sites/SITENAME/local/share/check_mk/checks, replacing SITENAME with your OMD site name:

Apologies for the terrible Python; my Python is very weak.  Also note that this assumes that all temperature-type sensors are in Celsius units, and all fan-type sensors are in RPM units.
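To give an idea of the shape such a check takes, here is a rough sketch in the classic check style (inventory function, check function, check_info entry).  Everything specific in it is an assumption: it supposes each agent line ends with a numeric value and a unit (for example "CPU Package 45 C" or "GPU Fan 1200 RPM"), and the default levels, function names and perfdata names are made up for illustration.  The real plugin output may well look different.

```python
# Hypothetical default (warn, crit) levels in Celsius; not from the original check.
DEFAULT_TEMP_LEVELS = (70.0, 80.0)


def parse_hwinfo64(info):
    """Map sensor label -> (value, unit) from whitespace-split agent lines.

    Assumes each line looks like ["CPU", "Package", "45", "C"]:
    label words first, then a number, then a unit.
    """
    sensors = {}
    for line in info:
        if len(line) < 3:
            continue
        try:
            value = float(line[-2])
        except ValueError:
            continue  # skip lines that do not carry a numeric value
        sensors[" ".join(line[:-2])] = (value, line[-1])
    return sensors


def inventory_hwinfo64(info):
    # One service per sensor; None means "use the default levels".
    return [(label, None) for label in sorted(parse_hwinfo64(info))]


def check_hwinfo64(item, params, info):
    sensors = parse_hwinfo64(info)
    if item not in sensors:
        return (3, "UNKNOWN - sensor %s not found in agent output" % item)
    value, unit = sensors[item]
    if unit.upper() == "RPM":
        # Fans are only reported here, with no thresholds applied.
        return (0, "OK - %.0f RPM" % value, [("rpm", value)])
    warn, crit = params or DEFAULT_TEMP_LEVELS
    if value >= crit:
        status, label = 2, "CRIT"
    elif value >= warn:
        status, label = 1, "WARN"
    else:
        status, label = 0, "OK"
    return (status, "%s - %.1f C" % (label, value), [("temp", value, warn, crit)])


try:
    check_info
except NameError:
    check_info = {}  # stand-in so the sketch also runs outside CheckMK

check_info["hwinfo64"] = {
    "inventory_function": inventory_hwinfo64,
    "check_function": check_hwinfo64,
    "service_description": "HWiNFO %s",
    "has_perfdata": True,
}
```

Anything whose unit is not RPM is treated as a Celsius temperature, which mirrors the units assumption noted above.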

Once that’s done, you should be able to add services to your host and the HWiNFO sensors will be automatically inventoried and show up.  They will use some default thresholds.  To customize those thresholds, edit etc/check_mk/main.mk in your OMD site and add something like this:

That will set the warning/crit threshold for CPU temp checks at 70/80 C, and the threshold for GPU checks at 90/100 C, on the machines ‘desktop1’ and ‘desktop2’.  Set as appropriate for your environment.
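Thresholds like the ones just described could be expressed in main.mk roughly as follows.  This is a sketch only: the service description patterns "HWiNFO CPU" and "HWiNFO GPU" are assumptions, and have to match whatever your sensors are actually called once inventoried.

```python
# Fragment for etc/check_mk/main.mk.  Inside main.mk, check_parameters already
# exists, so there you would normally write "check_parameters += [...]".
check_parameters = [
    # ( (warn, crit), [hosts], [service description patterns] )
    ((70, 80), ["desktop1", "desktop2"], ["HWiNFO CPU"]),
    ((90, 100), ["desktop1", "desktop2"], ["HWiNFO GPU"]),
]
```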

Legacy Nagios checks with CheckMK

I’ve recently started converting my old Nagios installs across to using CheckMK.  As part of this, I have a collection of old Nagios checks that I want to be able to use verbatim in CheckMK as legacy checks.  Here’s how you do that.

After you create your site using OMD, go into the site with ‘su - <sitename>’.  Then, edit etc/check_mk/main.mk and add something like this:

legacy_checks = [
  ( ( "check_solar!250!100", "Solar Output", True), [ "inverter" ] ),
]

extra_nagios_conf += r"""
  # 'check_solar' - Checks status of solar array
  # ARG1 = Warning level
  # ARG2 = Critical level
  define command{
    command_name check_solar
    command_line $USER2$/check_solar $ARG1$ $ARG2$
  }
"""

Now, put your script (in this case it’s check_solar) into local/lib/nagios/plugins/, which is where $USER2$ points in an OMD site.  What’s going on here is this:

  • Define a legacy Nagios check that calls the command check_solar with the arguments 250 and 100.  The check has the description Solar Output, is flagged as producing performance data (the True), and is assigned to the host named inverter.
  • Define a chunk of legacy Nagios config that defines the check_solar command itself.

Then, go into your inverter host, edit services, and the manual service should appear.  Save config and you’re done!  Pretty easy.
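For reference, all a plugin like check_solar has to do is follow the usual Nagios plugin convention: print one line of output (with optional performance data after a "|") and exit with 0, 1, 2 or 3 for OK, WARNING, CRITICAL or UNKNOWN.  Here is a minimal sketch in Python.  It assumes, going by the arguments 250 and 100, that a lower output is worse; read_output_watts is a hypothetical placeholder, since how you query your inverter is up to you.

```python
#!/usr/bin/env python3
"""Sketch of a Nagios-style plugin called the same way as check_solar."""
import sys


def read_output_watts():
    """Hypothetical placeholder: replace with code that queries your inverter."""
    raise NotImplementedError("no inverter query implemented in this sketch")


def evaluate(watts, warn, crit):
    """Return (exit_code, output_line); lower output is treated as worse."""
    if watts < crit:
        code, label = 2, "CRITICAL"
    elif watts < warn:
        code, label = 1, "WARNING"
    else:
        code, label = 0, "OK"
    # Text before "|" is the status line, text after it is performance data.
    line = "SOLAR %s - output %.0f W | output=%.0f;%.0f;%.0f" % (
        label, watts, watts, warn, crit)
    return code, line


def main(argv, reader=read_output_watts):
    """Parse warn/crit from argv, take a reading, print the result."""
    try:
        warn, crit = float(argv[1]), float(argv[2])
        watts = reader()
    except (IndexError, ValueError, NotImplementedError) as exc:
        print("SOLAR UNKNOWN - %s" % exc)
        return 3
    code, line = evaluate(watts, warn, crit)
    print(line)
    return code

# When installed as a plugin, end the file with: sys.exit(main(sys.argv))
```

With the command definition above, Nagios would end up calling it as check_solar 250 100, so the two thresholds arrive as the first and second arguments.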