Saturday, September 28, 2013

A search on the Splunk mug is wrong!!

For those who haven't seen it, the Splunk mug is a neat little piece of practical schwag that contains queries for things ranging from finding happiness to finding Waldo and even tracking a zombie infestation. However, I've discovered an issue with one of the searches.

The first thing to understand, if you don't already, is that the asterisk is a wildcard in Splunk. A neat little trick is that when you combine it with a field, as in field=*, your search will return only events where that field contains a value. This makes it a great little inclusive search, and potentially you won't have to use usenull=f as part of your chart or timechart later in your search to filter out events where the field isn't populated.
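As a rough sketch of the two approaches (the index name web and the field status are made up purely for illustration, so swap in your own), filtering up front with the wildcard might look like this:

```spl
index=web status=*
| timechart count by status
```

while the usenull=f route leaves the base search alone and drops the NULL series at charting time:

```spl
index=web
| timechart count by status usenull=f
```

Both should give you roughly the same chart. The first simply throws out the events with no status value before they ever reach timechart.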

Friday, September 27, 2013

I want more time to play!

I find myself in a somewhat strange place today: because I'm going to be at the Splunk conference next week, I don't have much scheduled that needs to be done (or staged to be done this weekend). This reminds me of a line that has come up a few times as we've been going through the interview and candidate selection process for two open slots in the office. We have all been working way too many hours and want some 'free time' back in our normal routine. I'm not talking about a mental health break or time away from the office so much as a pocket or two of time where we can explore, investigate, and work on the little side projects and quality-of-life things that need to be done. Generally speaking, they aren't hard or long things to do, but they get sidelined because of higher priorities.

So I'm monkeying around with a few things in Splunk and, two rabbit holes later, I come up with a query that, quite frankly, doesn't return a whole lot of hits for me over the last month. What it DOES show is a server that wasn't able to install some config packages I was pushing from my deployment server.

index=_internal source=*metrics.log component="DeploymentMetrics" status="failed" | stats max(_time) as time by hostname event scName appName fqname | convert ctime(time)

This event is created on your deployment server. I'm not sure what fqname stands for exactly, but in my case it was showing me the path the server was trying to install the app to (fully qualified path name is where my mind goes, but that doesn't fit the data). scName is likely the server class name and appName is obviously the app itself; both are references to the contents of your serverclass.conf file. With over 1k agents deployed, the fact that this found issues with only 1 server is pretty cool, I suppose. I will likely bake this into the app I'll never create, re: first paragraph =)
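For a bigger deployment, a per-host rollup of the same events might be easier to scan. This is only a sketch: it uses nothing but the fields from the search above, and failures and last_failure are just labels picked for the output columns.

```spl
index=_internal source=*metrics.log component="DeploymentMetrics" status="failed"
| stats count as failures max(_time) as last_failure by hostname appName
| convert ctime(last_failure)
| sort - failures
```

That puts the hosts with the most failed installs at the top, along with the last time each app failed to deploy.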