Importing HomeAssistant into Splunk Metrics

James Young · July 26, 2021

I’m a fairly keen HomeAssistant user, and I also use Splunk for log analysis and as a lab for learning about it since my work uses it. I was pretty interested in getting Home Assistant telemetry into Splunk, and in particular into the metric indexes that Splunk can store for efficiency and longer-term graphing of data from all my sensors.

Configure a HTTP Event Collector for Splunk

Following the documentation, create a new HTTP Event Collector on your Splunk instance, giving it a source datatype of something like json_no_timestamp for now. Have all events sent to a new index. This index won’t be doing a whole lot, but we’ll use this to verify everything is working correctly.

While you’re at it, create a new Metrics Index named appropriately, and add it to the indexes for your HEC. Set the default index for your HEC to the event index you created, not the metric index. Record the HEC token, you’ll need that shortly.

Configure HomeAssistant to send metrics

In your HA configuration.yaml, you will need a fragment very much like this (splunk_api_key refers to a secret containing the HEC token you created above).

# Splunk integration
  token: !secret splunk_api_key
  port: 8088
  ssl: true
  verify_ssl: false
  name: "hassio"
      - sensor
      - sensor.time
      - sensor.date_time

This will only include the sensor domain, and will specifically exclude the clock-related HA sensors, since their data is not particularly useful to you, so you shouldn’t bother ingesting it. You may also want to collect binary sensors and such, but I’m much more interested in standard analog sensors, and need to keep my daily ingest down to fit into the Splunk Free license.

Configure Splunk to Transform HA Metrics

You should immediately start seeing events land in your events index. Notably, the format of those JSON messages is not suitable for Splunk to direct ingest as metrics, so we can’t use them. Instead, we will set up a custom transform to add the correct fields to allow that to work.

There are two JSON fields in every packet that will be useful - the domain (which in the above example will always be sensor), and the entity_id, which is a unique field corresponding to the entity_id in HomeAssistant. We will merge these together to make the metric name, and take the value from the JSON value field as the _value.

First, edit $SPLUNK/etc/system/local/props.conf and add the following stanza;

pulldown_type = 1
TRANSFORMS-hassio-metricname = hassio_metric_name_domain,hassio_metric_name_entityid,hassio_metric_name_merge
TRANSFORMS-hassio-metricvalue = hassio_metric_value
category = Metrics

This defines a new sourcetype named hassio_metrics which will have a set of transforms applied to it to do the field extractions we need. Don’t restart Splunk yet.

Then, edit $SPLUNK/etc/system/local/transforms.conf and add the following stanzas;

# Extract the domain
REGEX = "domain": "(\S+)"
FORMAT = metric_domain::$1

# Extract the entity_id
REGEX = "entity_id": "(\S+)"
FORMAT = metric_entityid::$1

# Merge domain, entityid into the metric name
# We only accept metric names in the 'sensor' domain
SOURCE_KEY = fields:metric_domain,metric_entityid
REGEX = (sensor) (\S+)
FORMAT = metric_name::$1.$2

# Extract metric value from a standard sensor
REGEX = "value": (\d+.?\d+)
FORMAT = _value::$1

Now what happens here is a bit funny, so let’s explain. The hassio_metric_name_domain and hassio_metric_name_entityid transforms pull those fields out of the raw JSON (before it’s indexed) and create new fields named metric_domain and metric_entityid respectively. You have to do it here and not use INDEXED_EXTRACTIONS=JSON in props.conf before the indexed extractions happen after your custom transforms and therefore the required fields aren’t available. It would be a lot simpler if they were, but it is what it is.

After those two fields have been generated, we then define a new transform hassio_metric_name_merge which uses the two fields we defined earlier concatenated together (with spaces between) as the source. This simply regexes it to be sure the domain is actually sensor and then creates a new field metric_name with the domain and entity_id concatenated together with a . between them.

And lastly, hassio_metric_value simply pulls out the relevant field from the raw event and inserts it into the _value field.

Restart Splunk, and then change your HEC’s source datatype to hassio_metrics so the transforms you defined get applied to it. Examining your new incoming events, you should see all the newly defined fields in the events, indicating that the transform has worked.

Converting to Metrics

Assuming that the events you are getting through have a correct metric_name and value field, you can now simply edit your HEC, and change your metrics index to be the default index for the collector. You should then start seeing metrics populating into it, which are available in the Analytics tab in Search & Reporting. There’s a good chance you’ll see all sorts of funny metrics coming in you didn’t expect or even know existed.

Great success!

Twitter, Facebook