Saturday, September 28, 2013

A search on the Splunk mug is wrong!!

For those who haven't seen it, the Splunk mug is a neat little piece of practical schwag printed with queries for things ranging from finding happiness to finding Waldo and even tracking a zombie infestation. However! I've discovered an issue with one of the searches.

The first thing to understand, if you don't already, is that the asterisk is a wildcard in Splunk. A neat little trick is that when you combine it with a field, as in field=*, your search returns only events where that field contains a value. This makes it a great little inclusive filter, and potentially you won't need usenull=f in a chart or timechart later in your search to filter out events where the field isn't populated.
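
To make that concrete, here is a rough sketch of the two approaches, borrowing source=streets and the name field from the mug search discussed below. Both are only illustrative; I haven't got a streets dataset to run them against. The first search hides the NULL series at chart time, while the second drops events without a name before they ever reach the chart:

```
source=streets | timechart count by name usenull=f
source=streets name=* | timechart count by name
```

For a simple count by name, the two should produce much the same chart. The difference is where the unnamed events get thrown away.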


The search in question proposes to find where the streets have no name, using source=streets NOT name=*. To understand why this search is wrong, you have to understand what the search is asking and what it implies about the dataset. Does every street have a name that is simply not known? Perhaps a data analyst didn't have it and wasn't motivated to ask Siri. In that case we could assume every event in the dataset contains the name field, but the field is empty. The query should then be source=streets name!=*, because != only matches events where the field exists, and what you are really looking for is a field containing null data.

...unless of course we were to ask this at a higher level of thinking: do all streets have names? Can a street exist without a name? If so, the query on the mug would be correct, because what it is really looking for is cases where the name field doesn't exist at all.

So what about dirt roads? I bet there are a lot of those that don't have names and wouldn't show up in either search. I wonder whether the streets source only contains information on paved pedestrian or motorized thoroughfares. We might never know.

At any rate, how can you leverage all of this drivel? The first thing that comes to mind is implementing a CIM, or common information model. What am I talking about? Let's say you have multiple firewalls sending data into Splunk, and they all use different field names (or no field names) for something like the source IP address. One has src, another spells out source_address, etc. If you wanted to search across your dataset for more than just finding a particular IP, you'd like to have just one field name to use, wouldn't you? Once you have decided on a field name (let's say src_ip), you can create a search looking for cases where that field doesn't show up for a particular sourcetype. For example, sourcetype=sonicwall NOT src_ip=*. This query is looking for cases where your CIM field doesn't exist in the returned data. At that point you could go back to your props.conf and add a field alias so that every time Splunk sees the field 'src' in the sonicwall data it also creates a field called src_ip.
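
As a minimal sketch of that alias, assuming the sourcetype really is named sonicwall and the existing field really is src, the props.conf stanza might look something like this. The class name after FIELDALIAS- is just a label I made up; pick whatever fits your naming scheme:

```
# props.conf (sketch; the stanza name assumes your sourcetype is literally "sonicwall")
[sonicwall]
# "normalize_src" is an arbitrary, made-up class name for this alias
FIELDALIAS-normalize_src = src AS src_ip
```

Once the alias is picked up, rerunning sourcetype=sonicwall NOT src_ip=* should only return events that never had a src field to begin with.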

The main point of this exercise was to point out that using NOT field=* vs field!=* might get you results you weren't expecting and to give you a little more knowledge on when to use which based on what you are trying to find. 
