Wednesday, July 16, 2014

Taming verbose Windows logs in Splunk

As you get into the world of logs you quickly realize how 'heavy' Windows logs are. By that I mean verbose. In this space verbose = length and log length translates to increased storage and licensing issues. Many log generators simply say this did that or this talked to that over this port. Pretty quick and dirty. Windows logs are generally along the lines of "dear reader, I've observed many events in the course of my life and here is something I thought I should bring to your attention. I will go on at length about this though really only give small pieces of important information with little to no explanation forcing you to scour the Internet looking for others who have gone through this self same issue." I ran a quick search in my Splunk environment and found the average Windows event code to be 630 bytes.


So what to do about this and why am I posting? There is a group sending logs here ... how shall I say this... that gave me the opportunity to explore creative ways to adjust logs coming into Splunk. Folks let me just say turning on object access in Windows can be a a great and appropriate thing to do. However, this is not something you want to do across the board or just willy-nilly as they say. We were getting millions of logs relating back to object auditing, specifically close events (4658). I will say with a Splunk 6.x agent you can blacklist entire events but I was more inspired by something Adrian Hall had posted here. Since these particular logs really only contain 2 pieces of more critical data in the end what I ended up doing was re-writing the event! To do so add the following lines as appropriate

--- Props.conf ---

[WinEventLog:Security]
TRANSFORMS-winsec_events_manipulation = replace4658
EXTRACT-winsec_4658_custom_fields = ^Trimmed Event EventCode=(?<EventCode>4658) Handle_ID=(?<Handle_ID>\S+) Process_ID=(?<Process_ID>\S+)

---- Transforms.conf ---

[replace4658]
REGEX = (?ms)EventCode=(4658).*?Handle ID:\s+(\S+).*?Process ID:\s+(\S+)
DEST_KEY = _raw
FORMAT = Trimmed Event EventCode=$1 Handle_ID=$2 Process_ID=$3

You wouldn't believe the difference from a license volume perspective that made. With that done I took a look at other events I could re-write. Based on things like size and volume I turned my attention to the Windows Filtering Platform. These are logs I have mixed emotions about. They are good to have but often you will have scads and scads of them for the same crap over and over. In particular I tackled 5152 and 5156 (blocked a packet and allowed a connection respectively). Since we already have an appropriate stanza in props I'll just add a bit to the TRANSFORMS line and add a stanza to the actual transforms.conf

--- Props.conf ---

[WinEventLog:Security]
TRANSFORMS-winsec_events_manipulation = replace4658,replace5152_5156
EXTRACT-winsec_4658_custom_fields = ^Trimmed Event EventCode=(?<EventCode>4658) Handle_ID=(?<Handle_ID>\S+) Process_ID=(?<Process_ID>\S+)
EXTRACT-winsec_5152_5156_custom_fields = ^Trimmed Event EventCode=(?<EventCode>5152|5156) (?<Keywords>Audit Success|Audit Failure) (?<Process_ID>\S+) (?<Application_Name>.+) (?<Direction>Outbound|Inbound) (?<Source_Address>\S+) (?<Source_Port>\S+) (?<Destination_Address>\S+) (?<Destination_Port>\S+) (?<Protocol>\S+) (?<Filter_Run_Time_ID>\S+) (?<Layer_Name>\S+) (?<Layer_Run_Time_ID>\S+) (?<TaskCategory>blocked a packet|permitted a connection)

---- Transforms.conf ---

[replace4658]
REGEX = (?ms)EventCode=(4658).*?Handle ID:\s+(\S+).*?Process ID:\s+(\S+)
DEST_KEY = _raw
FORMAT = Trimmed Event EventCode=$1 Handle_ID=$2 Process_ID=$3

[replace5152_5156]
REGEX = (?ms)EventCode=(5156|5152).*?Keywords=(Audit Failure|Audit Success).*?Message=The Windows Filtering Platform (?:has )?([^\.]+).*?Process ID:\s+(\S+).*?Application Name:\s+(System|.+\.exe).*?Direction:\s+(\S+).*?Source Address:\s+(\S+).*?Source Port:\s+(\S+).*?Destination Address:\s+(\S+).*?Destination Port:\s+(\S+).*?Protocol:\s+(\S+).*?Filter Run-Time ID:\s+(\S+).*?Layer Name:\s+(\S+).*?Layer Run-Time ID:\s+(\S+)
DEST_KEY = _raw
FORMAT = Trimmed Event EventCode=$1 $2 $4 $5 $6 $7 $8 $9 $10 $11 $12 $13 $14 $3

This ended up taking those logs, which would run about 700 bytes, down to 150(-). In the end with those 2 changes (3 events) I cut out multiple hundred gigs in license volume.

Now - you may be thing "why not re-write ALL Windows events." I suppose you could if you wanted. I think there is probably better things to do with your time /shrug. I at least have not taken this any further than what I have (well with the exception that I did do some of the event manipulation in Adrian's post). I should also note that one thing I wanted to do was still be able to leverage the automatic lookups within the Splunk_TA_windows app/configs. That is why I specifically left in the "EventCode=X" bit and keywords.

Anyone doing anything similar or have taken a different approach?

-----
10/02/15 Update

The regex above is new as it wasn't accounting for the space in "program files" /sigh at Microsoft. I'm not overly happy with the replacement solution thus far but in the few minutes of testing it appears to work. The old regex is here

(?ms)EventCode=(5156|5152).*?Keywords=(Audit Failure|Audit Success).*?Message=The Windows Filtering Platform (?:has )?([^\.]+).*?Process ID:\s+(\S+).*?Application Name:\s+(\S+).*?Direction:\s+(\S+).*?Source Address:\s+(\S+).*?Source Port:\s+(\S+).*?Destination Address:\s+(\S+).*?Destination Port:\s+(\S+).*?Protocol:\s+(\S+).*?Filter Run-Time

This also requires an update to the props field extraction. I've made the change above. The old is here

^Trimmed Event EventCode=(?<EventCode>5152|5156) (?<Keywords>Audit Success|Audit Failure) (?<Process_ID>\S+) (?<Application_Name>.+) (?<Direction>\S+) (?<Source_Address>\S+) (?<Source_Port>\S+) (?<Destination_Address>\S+) (?<Destination_Port>\S+) (?<Protocol>\S+) (?<Filter_Run_Time_ID>\S+) (?<Layer_Name>\S+) (?<Layer_Run_Time_ID>\S+) (?<TaskCategory>blocked a packet|permitted a connection)


3 comments:

  1. This is all well and good, but in what directory (TA Windows) of which server (UF, Indexer?) do you apply these?

    ReplyDelete
    Replies
    1. Props and transforms.conf files are put on the indexers generally speaking when you have a UF -> Idx data flow. The reason is universal forwarders are 'dumb' in that they simply are pushing events with virtually no processing. In cases where you have a UF -> HF -> Idx you can and sometimes have to push the props/trans to the heavy forwarders as they will do some of the lower level processing (ie setting host name field). In either setup for the examples I've given above they need to live on the indexers.

      That hopefully covers which server they live on. The directory is a different sort of animal based on your environment. If you have a deployment server I would create a TA or config package and push it from the DS. That will land on the idx at $SPLUNK_HOME$/etc/apps (or you could manually put it there I suppose). Another alternative is to put it in the $SPLUNK_HOME$/etc/system/local directory but if you do and you have a DS it won't be able to override the configs.

      Hopefully that helps some!

      Delete
  2. This is still good in 2022! Thank you Mark.

    One thing I will say as well, the EXTRACT lines in props.conf need to exist on your Search Heads.

    ReplyDelete