Wednesday, December 17, 2014

Greatest Splunk search of all time

Of all of the Splunk searches I've made over the last couple of years, the one I keep coming back to time after time is this:

index=_internal sourcetype=splunkd deployedapplication (removing OR installing OR uninstalling) NOT "removing app at location" | rex "DeployedApplication - (?<Action>\S+)\sapp(\=|\S+\s)(?<App>\S+)" | eval Action = case(Action="Removing" , "Removing" , Action="Uninstalling" , "Removing" , Action="Installing" , "Installing" , 1=1,"Fix me") | rex "(Removing|Installing) app=(?<Version>\S+)" | eval Version = if(isnull(Version),"5x","-= 6x =-") | dedup _time host Action App Version | table _time host Action App Version | sort -_time

This search shows apps being installed or removed at the agent level and is invaluable from an admin perspective. I have it saved to run over a 5-minute real-time window, which is a good fit for us since our agents only check into our deployment server every 3 minutes.
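If you want it sitting in savedsearches.conf rather than just saved from the UI, the real-time window is set with the dispatch times. This is only a minimal sketch - the stanza name is a placeholder, the search value is the query above verbatim, and any alert actions or thresholds would be layered on top of it:

# savedsearches.conf - minimal sketch only; stanza name is a placeholder
[Deployment app install-remove watch]
search = index=_internal sourcetype=splunkd deployedapplication (removing OR installing OR uninstalling) NOT "removing app at location" | rex "DeployedApplication - (?<Action>\S+)\sapp(\=|\S+\s)(?<App>\S+)" | eval Action = case(Action="Removing" , "Removing" , Action="Uninstalling" , "Removing" , Action="Installing" , "Installing" , 1=1,"Fix me") | rex "(Removing|Installing) app=(?<Version>\S+)" | eval Version = if(isnull(Version),"5x","-= 6x =-") | dedup _time host Action App Version | table _time host Action App Version | sort -_time
enableSched = 1
# 5-minute real-time window
dispatch.earliest_time = rt-5m
dispatch.latest_time = rt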

Wednesday, July 16, 2014

So how big ARE Windows Logs?


In my last post I mentioned how I was re-writing a few Windows events to cut down on Splunk license issues. When trying to size log management solutions in the past I've looked for lists or rules of thumb on the size of Windows events but never really found anything, so hopefully someone will find this useful. I ran a query in Splunk to get the average byte count per Windows event ID. If you need to figure out log management licensing, this gives you a rough order of magnitude (ROM) to multiply against a sampling of your event counts (e.g. number of logs on one server over 24 hours * number of similar servers). After the cut you will find a 'csv' listing the Event Viewer log (sourcetype), Event ID (EventCode), and average bytes for that ID. Enjoy. Oh - the average byte count across all of our Windows logs is 630 bytes.
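If you want to reproduce the numbers in your own environment, a query along these lines will produce the same kind of breakdown (a sketch only, not necessarily the exact query I ran; it assumes your Windows events land in an index named wineventlog and use the stock WinEventLog sourcetypes):

index=wineventlog sourcetype=WinEventLog:* | eval bytes=len(_raw) | stats avg(bytes) as avg_bytes count by sourcetype EventCode | eval avg_bytes=round(avg_bytes,0) | sort sourcetype EventCode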

Taming verbose Windows logs in Splunk

As you get into the world of logs you quickly realize how 'heavy' Windows logs are. By that I mean verbose - and in this space verbose means length, and log length translates into increased storage and licensing costs. Many log sources simply say this did that, or this talked to that over this port. Quick and dirty. Windows logs are generally along the lines of "dear reader, I've observed many events in the course of my life and here is something I thought I should bring to your attention. I will go on at length about this, though really only give small pieces of important information with little to no explanation, forcing you to scour the Internet looking for others who have gone through this selfsame issue." I ran a quick search in my Splunk environment and found the average Windows event to be 630 bytes.
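The usual way to trim that boilerplate before it counts against your license is a SEDCMD in props.conf on the indexer or heavy forwarder. This is an illustrative sketch only, not the exact rewrite we use - the regex is a placeholder you would tune to the text you actually want to drop:

# props.conf on the indexer or heavy forwarder - illustrative only
[WinEventLog:Security]
# strip the explanatory paragraph Windows appends to many security events (regex is a placeholder)
SEDCMD-trim_verbose = s/(?s)This event is generated.*$//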

Thursday, July 3, 2014

Splunk User Activity - Apps and Dashboards

At some level time on this blog can be marked by when I'm having a car worked on. Today isn't an exception. While that means the overall frequency is low the upshot is I get to spend my morning working in Chick-fil-A with some good breakfast in mah belly and good music in the background.

A couple of days ago I wanted to get a feel for where users were going and how they were using Splunk. This is one of those things that seems easy, but the further you go down this path the more offshoots there are. At any rate, I wanted to focus on UI navigation, so I knew I would be looking at the web access logs rather than searches performed. There are a couple of screens in SoS, but they weren't working for me. There are also a couple of screens in the search app (in 6.x that is under Activity > System Activity), but those views, while working, didn't display what I was really after.

In looking at the queries behind all of these screens I ended up with the following:

index=_internal source=*web_access.log* uri=*/app/* uri!=*/images/* status=200 | rex field=uri_path "app/(?<app>[^/]+)/(?<view>.+)" | where app!="launcher" AND view!="Home" AND view!="home" AND view!="landing" | table _time host app view user _raw
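If you want a summary rather than the raw click stream, the same base search rolls up nicely - the field choices here are just one example:

index=_internal source=*web_access.log* uri=*/app/* uri!=*/images/* status=200 | rex field=uri_path "app/(?<app>[^/]+)/(?<view>.+)" | where app!="launcher" AND view!="Home" AND view!="home" AND view!="landing" | stats count dc(user) as users by app view | sort -count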

Monday, June 16, 2014

Events of Interest from a Splunk Admin perspective

As our deployment has grown - from basically just me performing administrative duties to adding a second body - and as we've run into a couple of other things, we've wanted increased visibility when certain activities take place. Over time we've merged these into a rather ugly query. It has proved itself a time or three though, so I figured I'd share. In putting something like this into place you need to figure out a few things: how frequently it will run, who it goes to, etc. You might find you want certain aspects to run more frequently than others, which would require more than one query. YMMV.
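To give a flavour of the kind of thing worth watching for, failed logins out of the audit index are one simple example (a sketch only, not the merged query itself, and the threshold is arbitrary):

index=_audit action="login attempt" info=failed | stats count by user | where count > 5 | sort -count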

Monday, May 5, 2014

Splunk DateParserVerbose logs - Part 2

In part 1 of this subject we talked about what Splunk's DateParserVerbose internal logs are, and I gave an example query that at its heart attempts to roll up and summarize timestamp-related issues. In this post I'll present a query that takes the sourcetypes Splunk is having timestamp issues with and displays the relevant props configs. What we've done is throw both queries into the same dashboard to make things easier to work through. I should note a couple of things here. The first is that the foreach command is only available in Splunk 6 (I believe). The second is that the REST endpoint I'm getting the config data from is likely only available in 6.

With that out of the way, here is the gist of the query:
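This is a simplified sketch rather than the full dashboard version - it assumes the configs/conf-props REST endpoint, picks out only a few of the timestamp-related props settings, and leaves out the foreach reshaping mentioned above:

| rest /servicesNS/-/-/configs/conf-props | rename title as sourcetype | table sourcetype TIME_PREFIX TIME_FORMAT MAX_TIMESTAMP_LOOKAHEAD TZ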

Wednesday, April 9, 2014

Detecting OpenSSL version data in Splunk

I won't go into the Heartbleed details as you likely already know them. From a Splunk perspective there are any number of ways to try to get your arms around this issue, but they are highly dependent on the types of data you are collecting. That said, if you are using the Splunk Linux TA with the package script enabled, and/or the Windows TAs with the InstalledApps_Windows script turned on, you could use the following queries to extract the OpenSSL version. You could also combine the queries, but for the purposes of posting them here that would make them harder to read /shrug. Obviously adjust based on changes you've made (i.e. sourcetype names).

Linux
sourcetype=package | multikv | search NAME=openssl | dedup host ARCH | eval HBconcern = case(match(VERSION,"(^0\.\d\.\d|^1\.0\.0)"), "Too Low", match(VERSION,"^A"), "HP (Not familiar)", match(VERSION,"^1\.0\.1[a-f]"), "Potentially Susceptible", match(VERSION,"^1\.0\.1[g-z]"), "Patched", match(VERSION,"^1\.0\.2-beta"), "Potentially Susceptible", 1=1, "fixme") | table host NAME VENDOR GROUP VERSION ARCH HBconcern | sort host

Windows
sourcetype=InstalledApps_Windows DisplayName=openssl | rex  "DisplayName=OpenSSL\s+(?<VERSION>\S+)\s+\((?<ARCH>[^\)]+)"| dedup host ARCH | eval HBconcern = case(match(VERSION,"(^0\.\d\.\d|^1\.0\.0)"), "Too Low", match(VERSION,"^A"), "HP (Not familiar)", match(VERSION,"^1\.0\.1[a-f]"), "Potentially Susceptible", match(VERSION,"^1\.0\.1[g-z]"), "Patched", match(VERSION,"^1\.0\.2-beta"), "Potentially Susceptible", 1=1, "fixme") | table host VERSION ARCH HBconcern | sort host

What I'm not sure about is the regex for 1.0.2-beta, as I haven't actually seen that version installed - I'm guessing it shows up like that.
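Either query also rolls up nicely if you just want overall exposure counts rather than the host-by-host detail - for example, replacing the table and sort at the end with:

| stats count as hosts by HBconcern | sort -hosts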

Sunday, April 6, 2014

Splunk, timestamps, and the DateParserVerbose internal logs - Part 1

Splunk is a pretty powerful piece of software. There are the obvious search and analytic capabilities, but there is some robustness under the covers as well. One of those under-the-covers capabilities is detecting and understanding timestamp data. It's the sort of thing that, as users of the software, we simply accept and generally don't spend a whole lot of time thinking about. From an admin perspective, as you start to put some effort into understanding your deployment and making sure things are working correctly, one of the items to look at is the DateParserVerbose logs. Why, you ask? I've recently had to deal with some timestamp issues. These internal logs generally document problems related to timestamp extraction and can tell you if, for example, logs are being dropped for a variety of timestamp-related reasons. Dropped events are certainly worthy of some of your time! What about logs that aren't being dropped, but where for one reason or another Splunk is assigning a timestamp that isn't correct? In this writeup I will share a query you can use to bring these sorts of events to the surface and distill some quick understanding.
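As a starting point, the raw material lives in the internal index. Something along these lines will surface it and give a rough count per sourcetype - this is only a sketch, and the rex is an assumption about how the offending sourcetype appears in the message text, so it will likely need adjusting for your environment:

index=_internal sourcetype=splunkd component=DateParserVerbose | rex "sourcetype::(?<problem_sourcetype>[^\|\s]+)" | stats count by problem_sourcetype log_level | sort -count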

Saturday, February 22, 2014

Splunk - troubleshooting remote agents with the phonehome logs

An issue popped up the other day that was pretty interesting (from a Splunk admin perspective), so I figured I would share. This will likely be pretty long, but hopefully someone will benefit. We had a number of servers with Splunk universal forwarders stop sending logs, yet in doing a spot check on their firewalls the server owner noticed traffic still going to our Splunk infrastructure backend. What had happened? The answer lies in the UF phone home logs - do you know where they are and how to read them?
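As a teaser, on the deployment server side the phone homes show up in splunkd_access.log as hits against the broker/phonehome endpoint. A rough sketch for checking when each client last phoned home might look like this - it assumes the deployment server forwards its own _internal logs, and the rex is an assumption about how the connection string is formatted, so it may need tweaking for your version:

index=_internal sourcetype=splunkd_access phonehome | rex "phonehome/connection_(?<client_ip>\d+\.\d+\.\d+\.\d+)_" | stats latest(_time) as last_phonehome by client_ip | eval last_phonehome=strftime(last_phonehome,"%Y-%m-%d %H:%M:%S") | sort last_phonehome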