Backup via rsync

Since my Microserver keeled over and died without a good backup in place, I went looking for a quick and easy way to back up the USB key on a regular basis so I could recover easily.

The first thing I did was to run a daily cron job to tar up the whole USB key and shove it onto rust somewhere.  This went OK, but it wasn’t very elegant.
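
If you want the quick-and-dirty version anyway, a line in /etc/crontab along these lines does the job (paths and schedule here are illustrative, not exactly what I ran):

# Nightly tarball of the root filesystem onto the data array at 2am
0 2 * * * root tar --one-file-system -czpf /data/backups/rootfs-$(date +\%Y\%m\%d).tar.gz /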

So, enter the below script.  I got this off a work colleague of mine, and adapted it slightly to suit my purposes…

#!/bin/sh
#
# Script adapted from backup script written by David Monro
#
SCRIPTDIR=/data/backups
BACKUPDIR=/data/backups
SOURCE="/"
COMPLETE=yes
clonemode=no
while getopts "d:c" opt
do
        case $opt in
                d) date=$OPTARG
                ;;
                c) clonemode=yes
                ;;
        esac
done

echo $date
echo $clonemode
if [ "$clonemode" = "yes" ]
then
        SOURCE="$BACKUPDIR/current/"
        COMPLETE=no
fi

mkdir -p $BACKUPDIR/incomplete \
  && cd $BACKUPDIR \
  && rsync -av --numeric-ids --delete \
    --exclude-from=$SCRIPTDIR/excludelist \
    --link-dest=$BACKUPDIR/current/ \
    $SOURCE $BACKUPDIR/incomplete/ \
    || COMPLETE=no
if [ "$COMPLETE" = "yes" ]
then
    date=`date "+%Y%m%d.%H%M%S"`
    echo "completing - moving current link"
    mv $BACKUPDIR/incomplete $BACKUPDIR/$date \
      && rm -f $BACKUPDIR/current \
      && ln -s $date $BACKUPDIR/current
else
    echo "not renaming or linking to \"current\""
fi
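
The script expects an excludelist file in $SCRIPTDIR; the contents aren't shown here, but a minimal sketch would keep the pseudo-filesystems and the backup destination itself out of the copy:

# /data/backups/excludelist - one rsync exclude pattern per line
/proc/*
/sys/*
/dev/*
/tmp/*
/data/backups/*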

What this will do is generate a backup named for the current date and time in the specified location.  The script finds the latest backup (via the "current" symlink), and then generates a new backup hard-linked against the previous one, saving a LOT of disk space.  Each backup then only consumes extra space for the files that have changed since the last one.
After running this for a while (about a month), I've now got a 4.6Gb backups folder, with a 1.7Gb base backup – so a month's worth of daily backups has only taken up double the space of a single backup.  Note however that there have been some inefficiencies around updatedb that have blown out more disk space than should otherwise be the case.
In order to check the size of each backup, just do a "du -sch *" in the folder your backups are in.
A very important safety tip.  Since the backups are built out of hardlinks, do not under any circumstances try to edit files inside them.  Let's say you haven't changed /etc/passwd in a long time.  While it looks like you have 30 copies of it, you actually only have one (ie, one file with many hardlinks).
If you go and edit the /etc/passwd inside one of the backups, you will effectively change it in every backup at once.  So don't do that.  It's safe to flat-out delete a backup – you won't trash anything.  Just don't edit things.
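
You can see the hard-linking in action with ls -li – the inode numbers in the first column will be identical for an unchanged file across backups:

cd /data/backups
ls -li */etc/passwd

If the inode numbers match, it's one physical file on disk, which is exactly why editing it in one backup edits it everywhere.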

HP Microserver – UpdateDB Bloat

A brief discovery…  If you use a script to back up your Microserver to a mount point somewhere in your filesystem, then your updatedb database will keep growing and growing without bound.  This is bad if you’re using a USB key for your root filesystem (I went from writing a 1.5Mb updatedb database once a day to writing a 60Mb one before I caught it).

The solution to this is to edit /etc/updatedb.conf and add the path to where your backups are stored to the PRUNEPATHS option.
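
With backups living in /data/backups as in the script above, the line ends up looking something like this (keep whatever paths are already listed and just append yours):

PRUNEPATHS="/tmp /var/spool /media /data/backups"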

HP Microserver – Suppressing Excess Logging

Just got around to doing this.  There’s a lot of logging that goes on by default on a Linux box, and every log message that gets committed to disk shortens the life of the USB key you’re driving your Microserver with.  As such, suppressing spurious logging is fairly important.

One of the most regular loggers is cron.  Every time cron runs it will write several log messages to /var/log/syslog and to /var/log/auth.log .  To suppress this entirely, create a new file /etc/rsyslog.d/02-suppress-cron.conf and put in this text;

cron.*  ~
:msg,contains,"pam_unix(cron:session)" ~

This will tell rsyslog to suppress all cron messages, and also suppress any other messages which would normally go to auth.log about cron.
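
After dropping that file in place, restart rsyslog so it picks up the new rule:

sudo service rsyslog restart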

ENVI-R Updates, Other Stuff

Whew, been a while.  I’ve been a bit slack with keeping up with my projects lately – too busy with other things.  However, I have a few updates.

I got a solar panel installation done last week, and as such I got a second transmitter for my ENVI-R power monitor.  I discovered a problem with the moving average code in my ENVI-R scripts, so I’ve corrected that and updated the GoogleCode repository.  It will now handle multiple transmitters without problems.

I’ve also gone and made button extensions for the three pushbuttons on the Arduinoven’s front panel.  Manufacture was simple – I used a piece of aquarium-grade rubber tubing, which pushed over the button’s shaft, and then a piece of aluminium rod which went in the tube and was rounded on the end.  Buttons hold well enough, and they won’t fall out.  Now to drill the holes in the front for them, and then cut out a space for the LCD with a coping saw.

In other news, the Microserver is going quite well now that I've rebuilt it onto another USB key.  I'm now using tmpfs for both the MRTG data and /tmp.  Doing so appears to have resolved the minor issues I used to have, where I/O would pause for up to 10 seconds at random intervals.  My thinking here is that when MRTG was doing its five-minutely update, it was queueing up a lot of I/O to the USB key, which blocked any other I/O while it was in progress, causing the pause.  Now that it runs against tmpfs, I'm no longer getting that problem.
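
For reference, the MRTG side of that is just another tmpfs line in /etc/fstab pointed at MRTG's working directory – something like the below, assuming the /var/www/mrtg WorkDir from my mrtg.cfg.  The trade-off is that the accumulated graph history disappears on a reboot.

tmpfs   /var/www/mrtg   tmpfs   defaults        0       0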

I’ve got some ideas for a battery-powered temperature sensor powered by an ATtiny, which would send out temperature data via radio once a minute or so.  More on that when it happens.

HP Microserver – Further Optimizations

Did some more optimizations on my Microserver to improve performance and reduce the write I/O going to the USB flash.

First thing was fixing up the script I used to set the noop scheduler on the USB key.  I didn't like using sdc directly, since the device letter can change.  Instead, I now do this;

find /sys/devices/pci0000:00/0000:00:13.2/usb* -name "scheduler" -exec sh -c "echo noop > {}" \;

This finds all USB devices plugged into that PCI device (ie, all of them), and sets their scheduler to noop.  This has the desired effect without touching the hard disks.
Secondly, I added the following line to /etc/fstab ;

tmpfs   /tmp    tmpfs   defaults  0       0

This causes the /tmp directory to be mounted as a temporary filesystem in RAM, reducing writes to the USB by quite a lot.  Since I have 8Gb of RAM, there isn’t much penalty here (if any).  The good thing about tmpfs is that it only consumes memory for files that are actually written.  It’s better to use noexec,nosuid if you can, but I found that if I used that then scripts that are called as part of apt-get couldn’t run.  And at any rate, the default /tmp doesn’t have those bits enabled.
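
After a reboot, it's worth a quick check that /tmp really is sitting on tmpfs:

mount | grep /tmp
df -h /tmp
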
Next up is to look at how much logging is going on and reduce it.

HP Microserver – Ubuntu 11.04 Setup

On Wednesday, I received my $199 (!!!) HP Microserver.  This little beauty is a low-cost, low-power solution for anyone who wants a tiny NAS or otherwise always-on box.

HP Microserver with AMD N36L

As standard, this unit comes with one 250Gb SATA disk drive, no DVD drive, 1Gb of PC3-10600 unbuffered ECC memory, and an AMD N36L processor.  It also comes equipped with all four drive trays, so it can take up to 4 internal drives.  Memory can be upgraded to 2x4Gb of PC3-10600 in either ECC or non-ECC formats, unbuffered only.  I upgraded it straight away to 8Gb of RAM with two Kingston 4Gb unbuffered PC3-10600 CAS 9 DIMMs.

The N36L is the kind of processor usually found in netbooks or laptops – it's a dual-core 1.3GHz 64-bit processor, and it even comes with AMD's hardware virtualization support (AMD-V).  The motherboard has an integrated onboard USB port, as well as the four USB ports on the front and two on the back.

So, what to do with this thing?  Well, I currently have a box that's running a CompactFlash card for storage, but it has a horrible CPU and hardly any memory (256Mb).  As a result, it's slow.  But it uses very little power.  I want to make this a replacement for that, which means power consumption is at a premium.

As such, my plan was to run a *nix off of a USB key.  I also wanted a good filesystem so I could use the thing as a NAS when I get more physical drives.  I first toyed with the idea of Oracle Solaris 11 Express running ZFS, but I abandoned that because I didn't like Solaris.

Then I had the idea of running Ubuntu Server with ZFS from the ZFS On Linux project.  That worked, but I locked up my Ubuntu 32-bit server install when I copied a file.  I then discovered that ZFS is really memory hungry and you should use a 64-bit OS.  So I swapped to Ubuntu Server 64-bit.  It worked.  But then I found out that ZFS can’t grow a stripe vertically, so if I upgraded to bigger drives I’d be in the poo.  Plus it’s really a bit complex for what I want.

So what I wound up settling on was good ol' MDADM with LVM and EXT4, running on Ubuntu Server 64-bit.

Next up was to install the O/S.  I have a 16Gb HP USB key sitting here, which is reasonably fast for something about 1cm longer than a USB plug.  Running hdparm gives me about 25 Mb/sec out of it, which is acceptable for this purpose.

Installation is straightforward; you basically follow the bouncing ball.  Pull out the 250Gb drive so nothing plays silly buggers, and when you get to the partitioning section you'll have to adjust things significantly.  I used guided partitioning with LVM, but then edited everything.  You'll need to recreate your swap at a more sane size (like 2Gb), and set your root to about 6Gb.  I'd advise against just inflating the root to the size of the volume group, otherwise there's no point in using LVM.  Keep the spare space in reserve.

Now, there’s a few things to keep in mind with a USB key as your boot device.  You don’t want to use a journalling filesystem, since it will kill the USB key in short order.  You also don’t want to use the atime feature, since that will also kill the USB key.  So, change your root partition settings to use ext2 (non-journalling), and in options set the noatime flag.  Other options should be fine.
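
For what it's worth, the resulting root entry in /etc/fstab ends up looking something like this (the UUID is obviously a placeholder):

# / on the USB key - ext2, no atime updates
UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx  /  ext2  noatime,errors=remount-ro  0  1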

After installation, you should be able to boot up off the USB key into a working Ubuntu Server install.  There's a few other optimizations to be made.  Edit your /etc/fstab and add nodiratime to the options for your root.  Then edit /etc/rc.local and add these commands;

echo noop > /sys/block/sdc/queue/scheduler
echo 10 > /proc/sys/vm/swappiness

Be warned.  Your USB key will get the device letter after the last physical drive in the system.  So if you have two physical drives plugged into the array plus the USB key, the USB key will be /dev/sdc .  GRUB and /etc/fstab don't mind, since they use UUIDs, but the above commands do.  Just keep that in mind.

I’ll write up some stuff about setting up MDADM when I get some drives to go in it.  But as it is, the Microserver looks like a pretty decent box for someone who wants a low power, lightweight storage box which can also do other stuff.

ENVI-R – Putting the last pieces together

In my last post about the ENVI-R setup, I discussed setting up publish-envir.pl to parse raw serial data from the ENVI-R and post it out to MQTT channels.  In this post, we'll discuss how to get THTTPD and MRTG going, in order to actually get some useful graphs out there.


Glue Scripts

MRTG can't use data directly from MQTT, so we need a fairly simple Perl script to pull the MQTT data out and hand it to MRTG in a useable format.  An MRTG "packet" must have four and exactly four lines, so our glue script has to haul in data from MQTT and then output those four lines for MRTG to use.
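
If memory serves, the four lines MRTG expects back from an external command are, in order: the value for the first ("incoming") variable, the value for the second ("outgoing") variable, an uptime string, and the name of the target.  Annotated, a response looks roughly like this (values made up):

2416          <- first variable (watts on the main feed, say)
0             <- second variable (can just be zero)
0d 0h 5m      <- uptime string (MRTG just displays this)
Power - Main  <- name of the target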

Have a read of mrtg-envir.pl at my GoogleCode repository.  The code there should be fairly self-explanatory.  The only complex bit is CalculateReading, which is a horrible bit of code that tokenizes and parses your input so you can do things like "return the first sensor subtract the second sensor" with "0.1-0.2" and stuff like that.

It’s important to note that the script will wait until MQTT publishes a message.  This means that if your publish-envir.pl script isn’t running, or the ENVI-R isn’t working or something, then MRTG will also be held up.

THTTPD Setup

Setting up THTTPD is very, very easy.  From Ubuntu, just do this;

sudo aptitude install thttpd
sudo service thttpd start

And that’s about it.  By default, it’ll share out stuff in /var/www, and has CGI turned off.  You don’t need CGI in the short term.  By default it’ll also have chroot enabled and will be reasonably secure.  All it can really do is pass out pages, but it has a tiny memory footprint and is very fast.  And for the basic cron-driven MRTG setup, that’s all you need.

In order to test, just create something like /var/www/index.html and put the text "Hello world" into it, and make sure you can fetch it.
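
From the box itself, something like this does the trick:

echo "Hello world" | sudo tee /var/www/index.html
wget -qO- http://localhost/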

MRTG Setup

MRTG is also very straightforward to set up.  Just sudo aptitude install mrtg to install it.  In the default setup, MRTG will not run as a daemon; it'll run as a cron job executed every 5 minutes.
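
On Ubuntu the package drops the schedule into /etc/cron.d/mrtg; it boils down to something roughly like this (the exact line varies between releases):

*/5 *   * * *   root    env LANG=C /usr/bin/mrtg /etc/mrtg.cfg >/dev/null 2>&1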

A warning about MRTG.  MRTG isn’t very scalable – you’ll want to use rrdtool or cacti if you want something big.  But for something simple, easy to set up, and only for a few samples, MRTG does the job quite nicely.

Have a look at my example mrtg.cfg at my GoogleCode repository.  Drop that into /etc/mrtg.cfg and run the following commands and you’re sorted;

sudo su -
mkdir /var/www/mrtg
indexmaker --output=/var/www/mrtg/index.html /etc/mrtg.cfg
env LANG=C /usr/bin/mrtg /etc/mrtg.cfg

exit

If you now look at the contents of /var/www/mrtg, you should see a number of files, images and the like.  Pop open your web browser, browse to http://<your server>/mrtg and you should see some graphs!

Final Notes

One thing you'll notice is that the publish script accumulates a 5-minute moving average, which lines up nicely with the sample interval for MRTG.  But the averages for the weekly and monthly graphs are calculated by MRTG, and will often miss spikes in usage which you may want to see.  Additionally, the linear vertical scale can make low levels of base load hard to see.  Consider using a logarithmic vertical scale instead.

You’ll also notice in my GoogleCode repository a very simple CGI script for dumping out the instantaneous data from envir-last.  You’ll need to enable CGI on THTTPD for this to work.

All in all, the setup was a good bit of fun and some problem solving, and now I’m collecting some useful electricity data.  Some observations are that careless use of lighting burns a surprising amount of power, and that the aquarium heater for our freshwater turtle was turned up too high and was blowing a lot of power as well.

I’ll need to do some baseload analysis of devices on standby, because I still feel that the base load power use is way too high.

ENVI-R – Data Parser

In my last post about my ENVI-R setup, I talked about how to set up RSMB.  Now we’ll talk about setting up the appropriate scripts so that you can get data parsed from the ENVI-R and have it published to the appropriate channels on the message broker.

Note:  It’s been a long time since I’ve done much Perl coding.  As such, this code is probably horrible.  It works, but yeah, don’t expect a grand example of clean Perl code.

Before you can use this script, there’s a number of things that should have already been set up;

  • The Message Broker should be running.  Read my previous post for setup.
  • You should have your ENVI-R all connected and ready on /dev/ttyUSB0.  Read this post for setup.
  • You’ll need Perl installed (this should be in by default).
  • You’ll need the WebSphere::MQTT::Client CPAN module.  Read this howto for how to install.
  • You will also need the Device::SerialPort, Data::Dumper and Clone modules.  Data::Dumper should already be installed.
  • The Perl script available as publish-envir.pl in my GoogleCode repository.

Right.  After all that, drop that script in /usr/local/bin (or somewhere else fairly sane) and just run it.  It will spit out various bits of output about what's happening by default.  To run it on boot, you can make an init.d script of the same kind of form as the one you made for the broker.  But just run it by hand first.
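
For reference, here's a minimal sketch of such an init.d script – assuming the script lives in /usr/local/bin and you're happy to run it as the same unprivileged broker user (which will need to be in the dialout group to read /dev/ttyUSB0):

#!/bin/bash
### BEGIN INIT INFO
# Provides:          publish-envir
# Required-Start:    $network $local_fs broker
# Required-Stop:     $network $local_fs
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: publish ENVI-R readings to the message broker
### END INIT INFO

# Start the parser in the background as the broker user
su -c "nohup /usr/local/bin/publish-envir.pl >> /dev/null &" broker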

What you will notice is that the script will publish data received from the ENVI-R to three channels; envir-raw, envir-last, and envir-average.  The purpose of those channels is as follows;

envir-raw:  The last raw message received from the ENVI-R.  The only processing done to this is to make sure that the line received is a complete <msg>…</msg> line; that line is then dropped out to the channel as-is.

envir-last:  The parsed values of the last received ENVI-R message.  The timestamp that is generated here is the UNIX timestamp on the host at the time the message was received, NOT the time as far as the ENVI-R is concerned.  Sensors are defined as X.Y where X is the transmitter number (0 is the first) and Y is the channel number (1 through 3).

envir-average:  The calculated 5-minute moving average of the messages received.  The timestamp here is the timestamp on the host at the time the average was sent to the channel, not the ENVI-R's time.  Sensor values are named the same way as in envir-last, and are calculated from the sensors present at the start of the moving-average period.  So if you add a sensor, it won't show up in the average for five minutes.

Due to random variance in the temperature and power readings, it’s strongly advised to use the envir-average channel for normal charting, and to just use envir-last for looking at instantaneous data or making sure everything works.

Once you’ve got it running, swap to another window and run stdoutsub envir-last.   After a few seconds you should see parsed packets coming in from the script in a form you can easily use with MRTG or anything else you want.

If you don’t see anything after a while, check the envir-raw channel, look at what the script is outputting to the console, and finally check the /dev/ttyUSB0 device itself to make sure you’re seeing data.

Assuming it’s all working, we’re nearly there.  All that remains now is THTTPD, MRTG, and the final glue scripts.

ENVI-R – Data Broker Setup

Continuing my previous post on this topic, I realized I needed a message broker in order to permit me to have a daemon polling the ENVI-R for data, processing it, and then making it available in a palatable form for various scripts and for MRTG.  It would have been possible for me to simply have a cron job dump a summary table to disk every few seconds, but since my Linux box uses a CompactFlash card for disk storage, that would quickly kill the flash chips.  I needed something that held the messages in memory only.  That’s where a MQTT message broker steps in.


One of my colleagues at work put me on to MQTT, and in particular he put me onto IBM’s Really Small Message Broker (RSMB).

RSMB is a really tiny implementation of an MQTT-compliant message broker.  When they say tiny, they aren't kidding – the broker is a 78k executable which requires an included 73k library.  (A config file is optional, but a good idea.)

Setup Notes

Given that RSMB is such a tiny thing, setting it up is very easy.  However, I did a few extra steps to make it a little more secure and integrated;

  • Dump broker (the executable) into /usr/local/bin.
  • Do the same with stdoutsub and stdinpub .
  • Copy libmqttv3c.so to /lib as libmqttv3c-1.2.0.so and then symlink libmqttv3c.so to it.  This preserves the usual library naming conventions that ld.so expects, and it should be a bit more maintainable.  Probably not the best way to do things, but it works (there's a command sketch of these steps after this list).
  • Create a new user with no special rights named broker.  After creation, edit /etc/shadow (there’s probably a better way to do this) and change the second field on the broker line to *.  That ensures nobody can log in using the account.
  • Create a config file in /usr/local/etc/broker.cfg (config text follows).
  • Create a new directory in /var/local/broker .  chown that directory to broker:broker.
  • Create a new /etc/init.d/broker script and add it to startup with update-rc.d .
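
For reference, the library, user and directory steps boil down to something like this (run from wherever you unpacked RSMB; using useradd's default locked password gets the same effect as putting a * in /etc/shadow):

# Binaries
sudo cp broker stdoutsub stdinpub /usr/local/bin/
# Library: versioned copy plus a compatibility symlink
sudo cp libmqttv3c.so /lib/libmqttv3c-1.2.0.so
sudo ln -s libmqttv3c-1.2.0.so /lib/libmqttv3c.so
sudo ldconfig
# Unprivileged user and its persistence directory
sudo useradd --system --home /var/local/broker broker
sudo mkdir -p /var/local/broker
sudo chown broker:broker /var/local/broker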

The config file contents are;

# config file for IBM's RSMB (Really Small Message Broker)

port 1883
max_inflight_messages 500
max_queued_messages 3600
persistence_location /var/local/broker/

The init.d script is;

#!/bin/bash
### BEGIN INIT INFO
# Provides:          broker
# Required-Start:    $network $local_fs
# Required-Stop:     $network $local_fs
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Should-Start:      broker
# Should-Stop:       broker
# Short-Description: start really small message broker (broker)
### END INIT INFO

pushd /usr/local/bin
su -c "nohup /usr/local/bin/broker /usr/local/etc/broker.cfg >> /dev/null &" broker
popd
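
Once that's in place, make it executable and register it for startup:

sudo chmod +x /etc/init.d/broker
sudo update-rc.d broker defaults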

Testing the Install

Broker comes with a couple of other binaries, which can be used to publish messages to a channel, and also to view messages on a channel.  Testing broker is quite easy.

  • Start up broker either manually (good idea the first time) or with the init.d script.
  • In one window, run stdoutsub example to start listening on channel "example".
  • In another window, run stdinpub example and then start typing in lines.  You should see each "message" you enter appear on the stdoutsub window.
  • If you do, great.  If you don’t, verify that the ports are all OK and opened on your firewall (if any).

Next Steps

Right, getting the broker up provides the underpinning that gets used by the rest of the software that I'm discussing.

From CPAN, you'll need to go and install WebSphere::MQTT::Client – that's the CPAN module for interfacing with IBM WebSphere's MQTT implementation.  But because it speaks standard MQTT, it'll work just fine with RSMB.  That module will be used fairly heavily in the other scripts.  You'll also need Device::SerialPort and Clone for other parts of the scripts.
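
A quick sketch of pulling those in via the cpan shell (the MQTT client module may need some extra coaxing to build, depending on your setup):

sudo cpan WebSphere::MQTT::Client Device::SerialPort Clone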

Followup posts will be outlining how to set up THTTPD and MRTG, and then tying it all together with Perl and Bash glue.

ENVI-R & MRTG – Overview

I recently had an ENVI-R wireless power monitor installed, and I set it up to record data to an always-on Ubuntu Linux box I have sitting around using MRTG.  The setup required a fair bit of scripting in Perl, Bash, and a couple of extra bits of software.

This post is the first in a series outlining just how I set up the monitor and what was required.

The ENVI-R and accessories

I bought the ENVI-R unit in a pack which included the transmitter, a receiver LCD display, power supply and one current clamp.  I also bought an additional two current clamps and the USB connector cable for the receiver.

The USB connector is actually a specially wired Prolific PL2303 USB to RS232 serial port adapter, with an RJ45 connector on the end.  I don't know the exact pinout, but that's not required.  Anyway, the adapter "just works" with Ubuntu 10.10.

Each transmitter unit can handle up to three current clamps, and one receiver can handle up to ~10 transmitters.  I didn’t want to buy additional transmitters, and there isn’t a huge amount of space in my switchboard, so I just got an additional two current clamps bringing me to three.  That allows for the monitoring of up to three loads.

Clamp Installation

WARNING – Mains voltage can be lethal.  In addition, tampering with your switchboard without an electrician’s license may be illegal, as well as dangerous.  Get an electrician to install the clamps for you.

The clamps themselves are no-contact types, and go around the active wire of whatever feed you want to monitor.  They should be clamped entirely around the wire so that the ferrite core of the clamp encircles the active wire.  The clamp operates by picking up the EM field around the active wire as current passes.

Be aware that the clamp cannot identify the direction that current is flowing.  This isn’t a problem in the case of a house like mine that has no solar power, but if you have solar power you’ll want to install a clamp onto the feed coming from the solar cells, and then put your main power clamp after the location where the solar power feed connects to your main feed.

In my case, my electric hot water is on a different circuit from the main power, so the three clamps were connected like so;

  • Main power:  Clamp connected after main breaker, between breaker and switchboard.  Registers all power going into the house (except hot water)
  • Lights:  Clamp connected after breaker for the lights (I only have one).
  • Hot water:  Clamp connected after breaker for the hot water.

Be aware as well that if you have multiple clamps connected to one transmitter, the ENVI-R assumes you're measuring 3-phase power, and it therefore just adds together all the currents.  In the case of mains + hot water that's OK, but since my "main" clamp is actually registering the sum of lights and everything else, the total will over-read whenever the lights are on.  For that reason, if you're setting up like me, don't trust the display – use the serial data feed.

ENVI-R Receiver Installation

The receiver is pretty straightforward: it just plugs into power, and then the USB serial cable is hooked up.  I'll run through the software to actually interpret the data in a useable format for MRTG later.

The ENVI-R communicates at 57600 baud, 8N1.  If you want to just see the raw output, run these commands from the command line;

$ stty -F /dev/ttyUSB0 speed 57600
$ cat /dev/ttyUSB0

You should see, about once every five seconds, a single line of XML output along these lines (values elided here);

<msg><src>CC128-v1.31</src><dsb>...</dsb><time>...</time><tmpr>...</tmpr><sensor>...</sensor><id>...</id><type>...</type><ch1><watts>...</watts></ch1></msg>

If you do, great.  The ENVI-R is all hooked up.  Note that it also monitors temperature – the reported temperature is that of the receiver unit itself.  Also note that if you don't have a transmitter attached, you won't see any output from the ENVI-R at all.

The Software

For this kind of thing, most people seem to use Cacti.  While Cacti can certainly do more, the box I'm using has very limited memory, so I need to keep the number of running daemons to an absolute minimum.  MRTG can run as a cron job, and doesn't require a database daemon to be running as well.  For the small number of graphs I'm talking about, Cacti is overkill.

In order to pass data to MRTG, I used a message broker written by IBM (Really Small Message Broker), since it does the job and it’s tiny.  Then I wrote a couple of Perl scripts to handle converting the raw ENVI-R data into moving averages for MRTG to slurp up.  So, the software required is;

  • IBM’s RSMB or other MQTT-compatible message broker
  • MRTG for generating the graphs
  • A web server of some type.  I used THTTPD, since it’s very small and fast.
  • Perl.  It’ll come with Ubuntu, but you will need a number of CPAN libraries.

Up Next

If you get to this point, you should have a connected up ENVI-R with a few clamps, and it should be hurling data out to /dev/ttyUSB0 on your Linux recording box.  Fantastic.

Now for the software….