Thursday, October 29, 2015

Moving toward Splunk's CIM

For those who don't know, Splunk has for some time been moving toward a Common Information Model (CIM). They are using it both as a data normalization effort - what you should name fields from particular data sources - and as a layer of abstraction placed over your data to indicate what the data IS. In the end this is a worthwhile effort, though the devil is in the details for those of us with 1) older - by whatever definition - Splunk instances with local extractions and 2) larger - again by whatever definition - Splunk instances with 3) a large - sensing a trend? - number of sourcetypes. Frankly, I'm big-time scared of the performance implications of using what was a second- or third-class citizen in Splunk (tags) as my primary means of querying across 1k sourcetypes and 15B logs per day. Martin Mueller gave a great talk at .conf15 looking into performance-y sorts of things, which brings a lot to that aspect of the discussion (search for his name here for a link to the slides. His name is actually spelled Martin Müeller and a direct link to the pdf is >here<).

Performance concerns aside, the question is: how do you go about a discovery effort to figure out which of your sourcetypes should map to which CIM-based data model? In theory, and depending on the number of sourcetypes you have, you could do this by manually reviewing a list. That might work for some percentage of sourcetypes but perhaps not all. At any rate, some of this is addressable by the new Splunk commands: pivot and datamodel. The challenge with those is that they essentially search across your data using the fields contained within the model on a one-off basis (one DM at a time), and if the fields don't match then there simply are no results. What I wanted was a way to take all of the fields from my data and throw them up against all of the fields in all of the models, with a side of fuzzy string comparison. I *think* I have found a way.
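To make the "all fields against all models, with fuzzy matching" idea concrete, here is a minimal sketch in Python, done outside of Splunk. It is not the approach from this post, just an illustration of the comparison itself. The field lists in MODEL_FIELDS are heavily abbreviated and only illustrative, and the names normalize, similarity, rank_models and the 0.6 threshold are all hypothetical choices.

```python
from difflib import SequenceMatcher

# Illustrative, heavily abbreviated field lists (an assumption for this sketch).
# The real CIM data models define many more fields; load your own export here.
MODEL_FIELDS = {
    "Network_Traffic": ["src", "src_port", "dest", "dest_port", "transport", "action"],
    "Authentication": ["user", "src", "dest", "action", "app"],
}


def normalize(name):
    """Lowercase a field name and drop separators, so 'Src_Port' becomes 'srcport'."""
    return name.lower().replace("_", "").replace("-", "").replace(" ", "")


def similarity(a, b):
    """Return a fuzzy similarity between 0 and 1 for two field names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def rank_models(sourcetype_fields, model_fields=MODEL_FIELDS, threshold=0.6):
    """Score each model by how well the sourcetype's fields cover the model's fields."""
    scores = {}
    for model, fields in model_fields.items():
        total = 0.0
        for target in fields:
            # Best fuzzy match for this model field among the sourcetype's fields.
            best = max((similarity(target, f) for f in sourcetype_fields), default=0.0)
            if best >= threshold:  # ignore weak matches entirely
                total += best
        scores[model] = total / len(fields)
    # Highest-scoring model first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    fields = ["Src", "SRC-PORT", "dest", "dst_port", "transport", "action"]
    for model, score in rank_models(fields):
        print(f"{model}: {score:.2f}")
```

A plain character-level ratio like this handles near misses such as dst_port vs dest_port reasonably well, but it scores abbreviations like src against Source_Address poorly, so a real discovery effort would likely also need an alias table for those.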

Friday, October 2, 2015

Taming Verbose Windows Logs - Update

In looking at the Windows firewall logs coming out of the Security event log (mainly 5156), I realized the space in "program files" was throwing off the regex. You've got to love the format of Windows logs. Maybe one day we will ingest the XML version - not likely :(. If anyone has ideas on a better regex I am ALL EARS! I tried using newlines, carriage returns, spaces, etc. in front of "Network" (which is 2 lines down from the application name) but wasn't getting the desired results.

This is an update to >this< post.

Transforms
New
(?ms)EventCode=(5156|5152).*?Keywords=(Audit Failure|Audit Success).*?Message=The Windows Filtering Platform (?:has )?([^\.]+).*?Process ID:\s+(\S+).*?Application Name:\s+(System|.+\.exe).*?Direction:\s+(\S+).*?Source Address:\s+(\S+).*?Source Port:\s+(\S+).*?Destination Address:\s+(\S+).*?Destination Port:\s+(\S+).*?Protocol:\s+(\S+).*?Filter Run-Time ID:\s+(\S+).*?Layer Name:\s+(\S+).*?Layer Run-Time ID:\s+(\S+)

Old
(?ms)EventCode=(5156|5152).*?Keywords=(Audit Failure|Audit Success).*?Message=The Windows Filtering Platform (?:has )?([^\.]+).*?Process ID:\s+(\S+).*?Application Name:\s+(\S+).*?Direction:\s+(\S+).*?Source Address:\s+(\S+).*?Source Port:\s+(\S+).*?Destination Address:\s+(\S+).*?Destination Port:\s+(\S+).*?Protocol:\s+(\S+).*?Filter Run-Time
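The difference between the two Application Name captures can be checked outside of Splunk with a quick Python sketch. Note the assumptions: Python's re module is not the PCRE engine Splunk uses (the constructs here behave the same in both, as far as I know), the SAMPLE event text is made up, and only the Application Name through Direction portion of the regex is reproduced. The ALT pattern is just one possible idea for a "better regex" and is untested against real events: it captures everything up to the end of the line instead of relying on the .exe suffix.

```python
import re

# Hypothetical excerpt of a 5156 event; the values are invented for illustration.
SAMPLE = (
    "Application Information:\n"
    "\tProcess ID:\t\t1234\n"
    "\tApplication Name:\t\\device\\harddiskvolume2\\program files\\example\\app.exe\n"
    "\n"
    "Network Information:\n"
    "\tDirection:\t\tOutbound\n"
)

# Old capture: \S+ stops at the first space, i.e. inside "program files".
OLD = re.compile(r"(?ms)Application Name:\s+(\S+).*?Direction:\s+(\S+)")

# New capture from the post: either the literal "System" or anything ending in .exe.
NEW = re.compile(r"(?ms)Application Name:\s+(System|.+\.exe).*?Direction:\s+(\S+)")

# Possible alternative (an assumption, not from the post): take the rest of the line.
ALT = re.compile(r"(?ms)Application Name:\s+([^\r\n]+).*?Direction:\s+(\S+)")

if __name__ == "__main__":
    for label, pattern in (("old", OLD), ("new", NEW), ("alt", ALT)):
        match = pattern.search(SAMPLE)
        print(label, "->", match.group(1), "|", match.group(2))
```

One thing to keep in mind with the new pattern: because (?s) lets the dot match newlines, the greedy .+\.exe would run on to a later ".exe" if one ever appeared between the application name and "Direction:".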

Props Field Extraction
New
^Trimmed Event EventCode=(?<EventCode>5152|5156) (?<Keywords>Audit Success|Audit Failure) (?<Process_ID>\S+) (?<Application_Name>.+) (?<Direction>Outbound|Inbound) (?<Source_Address>\S+) (?<Source_Port>\S+) (?<Destination_Address>\S+) (?<Destination_Port>\S+) (?<Protocol>\S+) (?<Filter_Run_Time_ID>\S+) (?<Layer_Name>\S+) (?<Layer_Run_Time_ID>\S+) (?<TaskCategory>blocked a packet|permitted a connection)

Old
^Trimmed Event EventCode=(?<EventCode>5152|5156) (?<Keywords>Audit Success|Audit Failure) (?<Process_ID>\S+) (?<Application_Name>.+) (?<Direction>\S+) (?<Source_Address>\S+) (?<Source_Port>\S+) (?<Destination_Address>\S+) (?<Destination_Port>\S+) (?<Protocol>\S+) (?<Filter_Run_Time_ID>\S+) (?<Layer_Name>\S+) (?<Layer_Run_Time_ID>\S+) (?<TaskCategory>blocked a packet|permitted a connection)
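For anyone wanting to sanity-check the new props extraction without a Splunk instance, here is a rough Python equivalent. The assumptions: Python spells named groups (?P<name>...) rather than (?<name>...), the TRIMMED sample line is invented, and its layout is inferred purely from the regex above rather than from a real trimmed event.

```python
import re

# The "New" props regex, rewritten with Python's (?P<name>...) group syntax.
PROPS_NEW = re.compile(
    r"^Trimmed Event EventCode=(?P<EventCode>5152|5156) "
    r"(?P<Keywords>Audit Success|Audit Failure) "
    r"(?P<Process_ID>\S+) "
    r"(?P<Application_Name>.+) "
    r"(?P<Direction>Outbound|Inbound) "
    r"(?P<Source_Address>\S+) (?P<Source_Port>\S+) "
    r"(?P<Destination_Address>\S+) (?P<Destination_Port>\S+) "
    r"(?P<Protocol>\S+) (?P<Filter_Run_Time_ID>\S+) "
    r"(?P<Layer_Name>\S+) (?P<Layer_Run_Time_ID>\S+) "
    r"(?P<TaskCategory>blocked a packet|permitted a connection)"
)

# Hypothetical trimmed event; every value here is made up.
TRIMMED = (
    r"Trimmed Event EventCode=5156 Audit Success 1234 "
    r"\device\harddiskvolume2\program files\example\app.exe "
    r"Outbound 10.0.0.5 49152 10.0.0.9 443 6 67890 Connect 48 "
    r"permitted a connection"
)


def extract(event):
    """Return the named fields as a dict, or None if the event does not match."""
    match = PROPS_NEW.match(event)
    return match.groupdict() if match else None


if __name__ == "__main__":
    for name, value in extract(TRIMMED).items():
        print(f"{name} = {value}")
```

Since Application_Name is a greedy .+ that can contain spaces, the literal Outbound|Inbound right after it gives the regex a fixed landmark to stop at, instead of depending only on counting the space-separated fields that follow.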