Saturday, February 7, 2015

Gaining visibility into ad-hoc data exports from Splunk

Along the same lines as understanding how your users are using Splunk, and dovetailing into whether users are abusing their access to data in Splunk, it is worth taking a periodic look at what data they might be exporting. By that I mean exporting to a CSV or maybe generating a PDF of a dashboard. Ideally you would like to know, for example, whether this Mark character has exported something, what format it was in, what the search was, and how many records or results were included in the download.
There are a few challenges:


  1. Search result details (result count, events searched, etc.) are in the internal search completion logs, while the search parameters are in the internal search initiation logs.
  2. Those logs are separate from the web logs that indicate someone has performed one of the export actions.
  3. The various Splunk commands you might use to merge all of this data have some limitations that you will need to keep in mind. For example, a subsearch that gets something like search_id and passes it to a parent search is limited by default to a 60s runtime and/or 10k results. A join or append is limited to a 60s runtime and/or 50k records, again by default. Even in a moderately sized deployment, over the course of several days you will have thousands of searches being run once you factor in your users, scheduled content, and internal Splunk processes. I suppose one way to mitigate this is to review the detection query output every day, but that seems a little too frequent to me.
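
To make the correlation problem above concrete, here is a minimal Python sketch of the merge you are ultimately after: three separate log sources tied together on search_id. This is not Splunk code. The record layouts, field names (other than search_id), and sample values are all made up for illustration, and the real internal logs will look different.

```python
from collections import defaultdict

# Hypothetical, simplified records. Real Splunk internal logs use
# different layouts; only the idea of a shared search_id is from the post.
initiation_logs = [
    {"search_id": "1423300000.101", "user": "mark", "search": "search index=web status=500"},
    {"search_id": "1423300000.102", "user": "svc_sched", "search": "search index=os | stats count"},
]
completion_logs = [
    {"search_id": "1423300000.101", "result_count": 1200, "scan_count": 54000},
    {"search_id": "1423300000.102", "result_count": 1, "scan_count": 900},
]
export_logs = [
    {"search_id": "1423300000.101", "user": "mark", "format": "csv"},
]


def correlate_exports(initiations, completions, exports):
    """Attach search text and result counts to each export event."""
    # Fold initiation and completion records into one dict per search_id.
    merged = defaultdict(dict)
    for record in initiations + completions:
        merged[record["search_id"]].update(record)

    # Keep only searches that were actually exported.
    report = []
    for export in exports:
        row = dict(merged.get(export["search_id"], {}))
        row.update(export)
        report.append(row)
    return report


report = correlate_exports(initiation_logs, completion_logs, export_logs)
for row in report:
    print(row)
```

The point of the sketch is the shape of the problem: the export events are the small set, and everything else is lookup data keyed on search_id. The hard part in Splunk is doing that lookup without running into the runtime and result limits described above.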