Thursday, November 7, 2013

A change in log format for Splunk UF 6.x relative to tracking apps using the Deployment Server

I realized two things yesterday while troubleshooting various Splunk issues. The first relates to having multiple input configs sent to a centralized syslog server. The second relates to changes in the internal 6.x UF logs as they relate to tracking apps that have been installed or removed.

In the first case we have a ton of data coming into a syslog server that outputs to separate folders based on the sending organizational unit. For sanity's sake, and to give a higher level of granularity, I create separate input packages pushed by my DS to the syslog server. This works well and good...until you outsmart yourself. Let's say you have multiple monitor statements in an inputs.conf file, with a number of common settings for each governing where the data goes, whether you are using host_segment, and so on. One way to cut down on duplication is to add, at the top (or wherever, really):

[default]
index = some_index
host_segment = 3

[monitor:///1]

[monitor:///2]

This works great and cuts down on visual clutter, but really only in cases where you are sending a single inputs.conf file to a system. In this case I changed my MO and had multiple [default] stanzas, in separate inputs.conf files, going to the same system, and they stomped on each other. It is obvious in hindsight, of course. Lesson learned.
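For illustration, here is the sort of thing that bit me. The app names and paths below are made up, but imagine two separate apps deployed to the same syslog server:

# org_a_inputs/local/inputs.conf
[default]
index = org_a
host_segment = 3

[monitor:///data/syslog/org_a]

# org_b_inputs/local/inputs.conf
[default]
index = org_b
host_segment = 4

[monitor:///data/syslog/org_b]

Since the [default] stanza is merged globally rather than per-file, only one set of those values wins once both apps land on the box, and one org's data ends up in the wrong index. When multiple input apps go to one system, the safer pattern is to spell the settings out in each monitor stanza:

[monitor:///data/syslog/org_a]
index = org_a
host_segment = 3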

The second thing is that we are starting to have some internal logs from 6.x UFs roll in. I have a query, using the internal logs from the UF, that I use to track when apps are being installed or uninstalled. I have gotten a lot of use out of this thing over the last year.

index=_internal sourcetype=splunkd DeployedApplication ("installing" OR "uninstalling") | rex "WARN\s+DeployedApplication - (?<action>\S+)\sapp\S+\s(?<app>\S+)" | dedup _time host action app | table _time host action app | sort -_time

The 'problem' is that the data in the relevant 6.x logs is slightly different, so the above field extractions don't work. Don't get me wrong - I love the increased granularity! The challenge is coming up with one query that still shows apps being (un)installed across all my forwarders. I monkeyed with adjusting the existing query against the UF internal logs, but have opted instead to just use the logs from the deployment server itself. The resulting query is certainly cleaner.

[this is superseded below]
index=_internal host=your_deployment_server DeploymentMetrics sourcetype=splunkd | table _time hostname event appName status

I am debating whether to keep the event=download events, and at some point I need to figure out why some of those have a status of failed. Note that this query works on a 5.x DS; I don't know how it will change on a 6.x one.
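If I do end up dropping the download events, the tweak should be as simple as excluding them in the base search (assuming event behaves like a normal field on those DeploymentMetrics events, which is what I see on my 5.x DS):

index=_internal host=your_deployment_server DeploymentMetrics sourcetype=splunkd event!=download | table _time hostname event appName status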

[update: 11/20/13] The query above didn't really work well, I think because of issues related to metrics in general. Who knows. At any rate, I came up with a new query that goes back to looking at the internal logs from the forwarders. I also found another event in the 6.x UF logs that is much easier to break out for the purposes of this search. I have also incorporated a visual element to show whether the forwarder is 5.x or 6.x.


index=_internal sourcetype=splunkd deployedapplication (removing OR installing OR uninstalling) NOT "removing app at location" | rex "DeployedApplication - (?<Action>\S+)\sapp(\=|\S+\s)(?<App>\S+)" | eval Action = case(Action="Removing","Removing",Action="Uninstalling","Removing",Action="Installing","Installing",1=1,"Fix me") | rex "(Removing|Installing) app=(?<Version>\S+)" | eval Version = if(isnull(Version),"5x","-= 6x =-") | dedup _time host Action App Version | table _time host Action App Version | sort -_time
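As an aside, if you only care about the most recent action for each app on each forwarder rather than the full history, a small variation on the same search should do it - the stats piece below is just a sketch:

index=_internal sourcetype=splunkd deployedapplication (removing OR installing OR uninstalling) NOT "removing app at location" | rex "DeployedApplication - (?<Action>\S+)\sapp(\=|\S+\s)(?<App>\S+)" | eval Action = case(Action="Removing","Removing",Action="Uninstalling","Removing",Action="Installing","Installing",1=1,"Fix me") | rex "(Removing|Installing) app=(?<Version>\S+)" | eval Version = if(isnull(Version),"5x","-= 6x =-") | stats latest(_time) as _time latest(Action) as Action latest(Version) as Version by host App | sort -_time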

Mark
