Thursday, October 29, 2015

Moving toward Splunk's CIM

For those that don't know, Splunk has for some time been moving toward a Common Information Model (CIM). They are using this both as a data normalization effort - what should you name the fields from a particular data source - and as a layer of abstraction placed over your data to indicate what the data IS. In the end this is a worthwhile effort, though the devil is in the details for those of us with 1) older - by whatever definition - Splunk instances with local extractions, 2) larger - again by whatever definition - Splunk instances, and 3) a large - sensing a trend? - number of sourcetypes. Frankly I'm big-time scared of the performance implications of using what was a second- or third-class citizen in Splunk (tags) as my primary means of querying across 1k sourcetypes and 15B logs per day. Martin Mueller had a great talk at .conf15 looking into performance-related topics which brings a lot to that aspect of the discussion (search for his name here for a link to the slides. His name is actually spelled Martin Müeller and a direct link to the pdf is >here<).

Performance concerns aside, the question is: how do you go about a discovery effort to figure out which of your sourcetypes should map to which CIM-based data model? In theory, and depending on the number of sourcetypes you have, you could do this by manually reviewing a list. That might work for some percentage of sourcetypes but perhaps not all. At any rate, some of this is addressable by the newer Splunk commands pivot and datamodel. The challenge with those is that they essentially search across your data using the fields contained within the model on a one-off basis (one data model at a time), and if the fields don't match then there are simply no results. What I wanted was a way to take all of the fields from my data and throw them up against all of the fields in all of the models, with a side of fuzzy string comparison. I *think* I have found a way.
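As a rough illustration of the kind of comparison I'm after, here is a minimal sketch: walk a sample of sourcetypes with map, pull their field names with fieldsummary, and match those names against a lookup of CIM field names. Treat the details as assumptions - index=main is a placeholder, maxsearches is kept deliberately small so the map doesn't hammer the indexers, and cim_fields.csv is a hand-built lookup (field name to data model) you would have to create yourself from the CIM documentation.

| metadata type=sourcetypes index=main
| fields sourcetype
| map maxsearches=20 search="search index=main sourcetype=\"$sourcetype$\" earliest=-60m | fieldsummary | eval sourcetype=\"$sourcetype$\" | fields sourcetype field"
| lookup cim_fields.csv field OUTPUT datamodel
| where isnotnull(datamodel)
| stats values(field) AS candidate_cim_fields BY sourcetype datamodel

The exact match via lookup is only the baseline; the "side of fuzzy string comparison" would need an external script or an app that provides string-distance functions, since stock SPL doesn't have one.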

Friday, October 2, 2015

Taming Verbose Windows Logs - Update

In looking at the Windows firewall logs coming out of the Security event log (mainly 5156) I realized the space in "Program Files" was throwing off the regex. You've got to love the format of Windows logs. Maybe one day we will ingest the XML version - not likely :(. If anyone has ideas on a better regex I am ALL EARS! I tried using newlines, carriage returns, spaces, etc. in front of "Network" (which is two lines below the application name) but wasn't getting the desired results.

This is an update to >this< post.

Transforms
New
(?ms)EventCode=(5156|5152).*?Keywords=(Audit Failure|Audit Success).*?Message=The Windows Filtering Platform (?:has )?([^\.]+).*?Process ID:\s+(\S+).*?Application Name:\s+(System|.+\.exe).*?Direction:\s+(\S+).*?Source Address:\s+(\S+).*?Source Port:\s+(\S+).*?Destination Address:\s+(\S+).*?Destination Port:\s+(\S+).*?Protocol:\s+(\S+).*?Filter Run-Time ID:\s+(\S+).*?Layer Name:\s+(\S+).*?Layer Run-Time ID:\s+(\S+)

Old
(?ms)EventCode=(5156|5152).*?Keywords=(Audit Failure|Audit Success).*?Message=The Windows Filtering Platform (?:has )?([^\.]+).*?Process ID:\s+(\S+).*?Application Name:\s+(\S+).*?Direction:\s+(\S+).*?Source Address:\s+(\S+).*?Source Port:\s+(\S+).*?Destination Address:\s+(\S+).*?Destination Port:\s+(\S+).*?Protocol:\s+(\S+).*?Filter Run-Time

Props Field Extraction
New
^Trimmed Event EventCode=(?<EventCode>5152|5156) (?<Keywords>Audit Success|Audit Failure) (?<Process_ID>\S+) (?<Application_Name>.+) (?<Direction>Outbound|Inbound) (?<Source_Address>\S+) (?<Source_Port>\S+) (?<Destination_Address>\S+) (?<Destination_Port>\S+) (?<Protocol>\S+) (?<Filter_Run_Time_ID>\S+) (?<Layer_Name>\S+) (?<Layer_Run_Time_ID>\S+) (?<TaskCategory>blocked a packet|permitted a connection)

Old
^Trimmed Event EventCode=(?<EventCode>5152|5156) (?<Keywords>Audit Success|Audit Failure) (?<Process_ID>\S+) (?<Application_Name>.+) (?<Direction>\S+) (?<Source_Address>\S+) (?<Source_Port>\S+) (?<Destination_Address>\S+) (?<Destination_Port>\S+) (?<Protocol>\S+) (?<Filter_Run_Time_ID>\S+) (?<Layer_Name>\S+) (?<Layer_Run_Time_ID>\S+) (?<TaskCategory>blocked a packet|permitted a connection)
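One low-risk way to iterate on a candidate pattern before baking it into transforms.conf is to test it at search time with rex against a handful of raw events. A quick sketch along those lines - the index and sourcetype names are assumptions, and the pattern is a shortened version of the one above, so adjust for your environment:

index=wineventlog sourcetype=WinEventLog:Security (EventCode=5156 OR EventCode=5152) earliest=-60m
| head 20
| rex field=_raw "(?ms)EventCode=(?<EventCode>5156|5152).*?Application Name:\s+(?<Application_Name>System|.+\.exe).*?Direction:\s+(?<Direction>\S+).*?Destination Port:\s+(?<Destination_Port>\S+)"
| table EventCode Application_Name Direction Destination_Port

If the named groups come back empty on events where the application lives under Program Files, the pattern still needs work.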

Tuesday, August 4, 2015

Does better information sharing require a security clearance?

From time to time the topic of information sharing comes up in relation to getting security clearances in order to have more open and timely dialog with various government agencies. Having lived in that space for a time, I would agree a clearance would help in having overarching conversations, if only because the culture is one that defaults to needing a clearance to have meaningful dialog. The problem comes when I put on more of an incident responder/cyber defender hat. The TLDR summary is that the information most useful to cyber defenders isn't who has compromised their environments as much as it is the IOCs and methodologies used to gain entry. This is because we aren't defending a strategic point in 3D space. We have to defend our organizations potentially from every computer plugged into an ethernet jack or wifi around the planet. Note this post is about why I think having multiple people in your security group cleared is less important than an adjustment in the classification paradigm. It isn't in response to being notified that my company has been breached.

Not to rehash the 'cyber warfare' conversations post-Aurora, but conflict, by whatever definition, in the 5th domain (cyber) is unlike kinetic conflict occurring in domains 1 through 3 (land, air, sea) and, less so, in space (the 4th). To back up a bit and make gross generalizations, the end state of much of the classified intelligence space is ultimately linked to and focused on attribution (aka who's responsible for X so I can go punch them in the throat). Retribution, though, doesn't happen in the 5th domain - at least at the commercial level. The impasse generally found, then, is at the government information sharing level, where the 'who' is portion-marked with the highest classification level because in that world it is the most important piece of information. That trickles down to the portion marking of the techniques being used and, lastly, specifics like IPs. While the initial response to a breach from management is often "who did this?" followed quickly by "why were we a target?", and the "whys" can and should help shape our defensive strategies/priorities, as a cyber defender at some level I couldn't care less about the answers. Why, you ask? Because that information isn't actionable. I'm more interested in the how as it relates to knowing what I should look for and what needs to be fixed. 'How' in this case ISN'T just which IP addresses were used three months ago that we are only hearing about now. It is the full scope of IOCs.

I fully appreciate that if the federal agencies openly shared IOCs and TTPs, the malicious actors would simply change how they are doing what they are doing. That said, I don't believe the solution to more and better sharing at the rank-and-file, actionable-data level is to get a clearance so we can better operate in the federal space once we are 'read in'. For cyber-based compromise notifications, adjust the portion marking on the classified documents appropriately, allowing companies to better defend and respond /shrug.

Monday, June 29, 2015

Electronically Aided Collisions and InfoSec

Like cruise control for your car, GPS-assisted autopilots and the like for boats can help operators with mundane tasks like holding a course over a long stretch of water. Unlike driving, though, boating comes with additional challenges - keeping track of water depth, the impact of changing weather conditions, the fact that there generally aren't defined 'roads' or travel lanes, etc. The rise in adoption of electronic navigational devices has also seen a rise in what is being termed "Electronically Aided Collisions." These can range from a GPS device malfunction or signal interference that causes the boat to veer off course and run aground, to a momentary (or longer-term) judgement lapse where your attention is off where the boat is going and the general surroundings, to the detriment of your boat - or worse - others'.

While my sail boating father and I were talking about this and he was sharing stories, my mind started drawing parallels to the InfoSec world. I've tried to boil these down to two thoughts.

Saturday, February 7, 2015

Gaining visibility into ad-hoc data exports from Splunk

Along the same lines as understanding how your users are using Splunk - and dovetailing into whether users are abusing their access to data in Splunk - is taking a periodic look at what data they might be exporting. By that I mean exporting to a csv or maybe generating a pdf of a dashboard. Ideally you would like to know, for example, if this Mark character has exported something, what format it was in, what the search was, and how many records or results were included in the download.
There are a couple of challenges:


  1. Search results (result count, events searched, etc.) are in the internal search completion logs, while the search parameters are in the internal search initiation logs.
  2. Those logs are separate from the web logs that indicate someone has performed one of the export actions.
  3. The various Splunk commands you might use to merge all of this data have some limitations you will need to keep in mind. For example, using a subsearch to grab something like a search_id and pass it to a parent search is limited by default to a 60s runtime and/or 10k results. A join or append is limited to a 60s runtime and/or 50k records, again by default. If you have even a moderately sized deployment, over the course of several days you will have thousands of searches being run once you factor in your users, scheduled content, and internal Splunk processes. I suppose one way to mitigate this is to review the detection query output every day, but that seems a little too frequent to me. (A sketch of the kind of query I mean follows this list.)
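With those caveats in mind, here is a minimal sketch of the general shape: find export-looking requests in the web access logs, pull the search id out of the URI, and join against the audit logs for the user and result counts. The export URI patterns differ across Splunk versions, so treat the uri filter here as an assumption to verify against your own _internal and _audit data.

index=_internal sourcetype=splunk_web_access uri="*/search/jobs/*/results*"
| rex field=uri "search/jobs/(?<search_id>[^/]+)/results"
| join type=inner search_id
    [ search index=_audit sourcetype=audittrail action=search info=completed
      | eval search_id=trim(search_id, "'")
      | fields search_id user event_count result_count total_run_time ]
| table _time user search_id uri result_count event_count total_run_time

The join here is subject to the same default 50k/60s subsearch limits called out in item 3, so over a multi-day window you would likely tighten the time range or flip the search around (drive from _audit and bring in the web logs) depending on which side is bigger.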

Saturday, January 31, 2015

Splunk Apps: Forwarder Health

It is long past time I actually wrote a few posts on the Splunk apps I've created. I woke up far too early for a Saturday morning, and in an effort to avoid doing anything around the house I will rationalize this as productivity at a general level and feel I've accomplished much! Who knows - it might be of value to my ... ones ... of readers! =)

Actually it was VERY cool to have a guy come up after my presentation at the 2014 Splunk user conference and mention having read my blog while working with ArcSight and now while working with Splunk (thanks Joe!).

Forwarder Health

So our environment currently has some 2,200+ forwarders, which is certainly not the largest environment out there but is likely much larger than the average. While there are apps like Splunk on Splunk and Fire Brigade to help identify issues with your indexers and search heads, there wasn't anything to help identify issues with forwarders. Admittedly this is a hefty task, as there are innumerable issues a forwarder can have. I wondered, though, if there was a way to generically detect whether an agent was having issues. The sage-like advice from the Verizon breach reports bubbled up in my mind - start by looking at the size of the haystacks. What if you compared the number of internal logs each forwarder was generating to the average across the fleet? A couple of hours later the bones of the app were in place.
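To give a flavor of the idea, here is a bare-bones sketch of that comparison (not the actual searches in the app): count the _internal events each forwarder sends over a window, then flag hosts that sit well away from the fleet average. The time window and the two-standard-deviation threshold are arbitrary starting points.

index=_internal sourcetype=splunkd earliest=-4h
| stats count BY host
| eventstats avg(count) AS fleet_avg stdev(count) AS fleet_stdev
| eval deviation=round((count - fleet_avg) / fleet_stdev, 2)
| where abs(deviation) > 2
| sort - deviation
| table host count fleet_avg deviation

Forwarders that have gone quiet show up with a large negative deviation, and ones that are suddenly chatty (often a sign of something broken and retrying) show up positive; both ends of the list tend to be worth a look.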