Showing posts with label Log Management. Show all posts

Wednesday, July 16, 2014

Taming verbose Windows logs in Splunk

As you get into the world of logs you quickly realize how 'heavy' Windows logs are. By that I mean verbose. In this space verbose = length, and log length translates into increased storage and licensing costs. Many log generators simply say this did that, or this talked to that over this port. Quick and dirty. Windows logs, on the other hand, tend to read along the lines of: "Dear reader, I've observed many events in the course of my life and here is something I thought I should bring to your attention. I will go on at length about it, though really only give you small pieces of important information with little to no explanation, forcing you to scour the Internet looking for others who have gone through this selfsame issue." I ran a quick search in my Splunk environment and found the average Windows event to be 630 bytes.
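If you want to run that measurement against your own environment, something along these lines should do it. This is a sketch – the index and sourcetype names are assumptions and will differ per deployment:

```
index=wineventlog sourcetype=WinEventLog* earliest=-24h
| eval event_bytes=len(_raw)
| stats count avg(event_bytes) AS avg_bytes BY sourcetype
| sort - avg_bytes
```

`len(_raw)` measures the raw event as indexed, which is what counts against your storage and license.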

Sunday, April 6, 2014

Splunk, timestamps, and the DateParserVerbose internal logs - Part 1

Splunk is a pretty powerful piece of software. There are the obvious search and analytic capabilities, but there is some robustness under the covers as well. One of those under-the-cover capabilities is detecting and understanding timestamp data. It's the sort of thing that, as users of the software, we simply accept and generally don't spend a whole lot of time thinking about. From an admin perspective, as you start putting effort into understanding your deployment and making sure things are working correctly, one of the items to look at is the DateParserVerbose logs. Why, you ask? I've recently had to deal with some timestamp issues. These internal logs document problems related to timestamp extraction and can tell you if, for example, events are being dropped for a variety of timestamp-related reasons. Dropped events are certainly worthy of some of your time! What about events that aren't being dropped, but where for one reason or another Splunk is assigning a timestamp that isn't correct? In this writeup I will share a query you can use to bring these sorts of events to the surface and distill some quick understanding.
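As a starting point, the raw material lives in Splunk's own internal index. A minimal first look might be something like the following – treat it as a sketch, since field names and message formats can vary by Splunk version:

```
index=_internal sourcetype=splunkd component=DateParserVerbose
| stats count BY log_level, host
| sort - count
```

Even this rough cut will tell you which hosts are generating timestamp warnings and how severe they are, before you dig into the individual messages.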

Thursday, July 26, 2012

A cappella log management

A curious thought hit me the other day in church; hopefully I can articulate it well enough to be understood – even by me. For those that aren't familiar, there is often full congregational singing as part of the worship service. As you might expect this includes musicians playing various instruments, several folks up on the stage singing as part of the choir, and in the olden days a hymnal in your hands with the words. I'm going to murder the terms, but the way I translate it in my head: you have the technical musicians turning the symbols on a page of music into sounds on their instruments, the next layer up are the singers who combine the lyrics with those symbols to know how to sing the words, and then you have a presentation layer where the rest of the congregation can see the words and take their singing cues from the music and choir. More often than not these days those words are presented on a projector screen without the musical score, which works for me since I can't read that bit anyway. At any rate there is a cumulative layering effect that allows folks like me who aren't musically inclined or trained to participate. (Is there a musical OSI equivalent?)

The difference a couple weeks ago was the music was done a cappella and only a small number of the choir was singing. While I could see the words on the projector I was much slower to catch on to the “tune.” I and many others didn’t sing as loudly that day (this is a good thing in my case).

So what's the link, you ask? Going through the process of leaving my job to pursue another, I have been somewhat introspective, thinking about what I could have done differently to explain the need for log management and where it fits into larger discussions of security strategy, incident detection, incident investigation, monitoring, etc. By and large I think the higher the message went up the management chain, the less it was understood. And while I'm somewhat saddened and disappointed by that fact, I'm not going to beat myself up over it. I tried several times and several different ways to communicate the need and why certain paths were better than others, not only for immediate needs but also for where they positioned us 6 months, a year, several years down the road.

If you are reading this blog then you, like me, can look at something like an incident detection use case and, at least at a conceptual level, see most of what is required for the alert to be tripped from a log collection perspective, what needs to happen once it goes off, how that use case fits into the larger picture, etc. – much like my wife can look at a sheet of music and hear the music in her head. The challenge I pose to us all is: are we communicating our message a cappella, so that our audience is perhaps just seeing "the words"? Something to think about the next time you give a presentation, perhaps.

Tuesday, June 26, 2012

Statistical vs. Rule-Based Threat Detection

A number of different discussions have led me to think about the difference between log management and SIEM when it comes to their use in threat detection. Of the many items that could be discussed, what I came to is the difference between statistical and rule-based threat detection.

An oft-used analogy, even referenced in the Verizon DBIR, is the difference between looking at haystacks and needles in haystacks. A statistical detection methodology might be to review the top N of X activity within Y timeframe. The point of this exercise is twofold. The first is simply to look at the "curve" of the numbers involved – the top 5 relative to the top 10 relative to the top 30 – or just a spike in a line graph showing the volume of logs collected. Any of these might indicate something has happened and might be worth diving deep on. The second is to help dial in your tools by whitelisting systems performing normal activity. If you are looking at outbound SMTP traffic sorted by volume per day, you should be able to easily spot your email gateways. Whitelist them, and the next time the report is run the top of the list might contain compromised systems. By and large, most log management systems should be able to accomplish this sort of activity.
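In Splunk terms, that top-N SMTP report might look something like this. The index, field names, and the whitelist lookup are all hypothetical – swap in whatever your firewall data actually calls them:

```
index=firewall dest_port=25 earliest=-24h
| stats sum(bytes_out) AS total_bytes count BY src_ip
| search NOT [| inputlookup known_mail_gateways | fields src_ip]
| sort - total_bytes
| head 20
```

The subsearch strips out the known gateways, so anything left near the top of the list is worth a look.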

At some point, though, you will want to focus in on specific threat activity. Take today's SANS diary update on Run Forest Run, or Sality, Tidserv, whatever. In this case you have specific information and want to receive an alert when one of your internal systems hits an IP, URL, or multi-step pattern of activity. This is your rule-based needle-finding capability. Generally speaking, this requires a rule engine of varying levels of sophistication located within some point solution or product. Statistical detection won't really get you this.
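A rule-based version of the same idea – again with hypothetical names throughout – would match traffic against a list of known-bad indicators and alert on any hit:

```
index=proxy earliest=-60m
| lookup bad_indicators indicator AS dest_host OUTPUT threat_name
| where isnotnull(threat_name)
| stats count values(dest_host) AS matched BY src_ip, threat_name
```

Saved as an alert on a schedule, this fires only when an internal system touches something on the indicator list, which is exactly the needle-finding behavior statistical reports can't give you.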

The challenge is that many point solutions, by design or omission, aren't able to factor in the larger view when reviewing the needles found. MSSPs are notorious (too strong?) for this, but then so are things like IPS. "We saw this and this, so we wrapped it up in a pretty bow for you" …great, but I need more context. The SIEM technology space, in general, was supposed to fill this gap. Not only can you develop specific rules to find needles in the overall stream of log consciousness, but to a greater or lesser extent – depending on vendor/tool/administrator – you can use them in a more statistical way. I think a lot of the "Mah SIEM sux" mentality comes from how you approach your SIEM relative to this overall issue. That is a rabbit hole I don't want to go down, though.

Especially when you first start out, if you dive directly into a rule-based approach you will have a harder time seeing the forest for the trees, and depending on the tool used you will be frustrated that you can't move that lens backward. In other words, dealing with individual infections IS key, but if you are so focused on the individual detections that you lose sight of the bigger picture, you can do yourself a disservice. On the other hand, if you only take a statistical approach and never grow, you are going to miss the needles you need to find.

I would argue there is a direct correlation between shop maturity and the ability to fully leverage rule-based technology. If you are just starting out, I suggest you will see more value with a log management, statistical threat detection methodology. This will allow you to get to know your data – as strange as that might sound – which will in turn allow you to better dial in your rule-based solutions.

Wednesday, February 1, 2012

The unrecognized APT story....

2 hobbits sneaking into Mordor.


Somehow I don't see myself using that analogy in the executive boardroom, though.

Thursday, June 2, 2011

SIEM alerts and an analogy to help people 'get it'

It may just be me, but does anyone else struggle with people in general, and over-arching 'management' in particular, not really getting that when you see a security event from your LM or SIEM, it is just the beginning of a process and not the end? I mean, they say they get it, but you just have this feeling that they really don't. Now I'm not saying this reflects my current management; it is more a generalized observation as you start to bring people into the conversation who might not have been exposed to this sort of technology, or whose exposure is limited to a conversation about needing to monitor something. And there is a level of irony, as some of these folks are in the actual vendor space.

I’m sort of an analogy guy, and this has been one of those things that has plagued me for some years now – what is a good equivalence that helps people get it? I thought of one today that might work, but it still feels rough. Figured I would use this as a sounding board and hopefully get some feedback. Who knows, maybe just putting it down on paper will help refine it.

While driving around in your car you are generally vaguely aware of, but may or may not pay a whole lot of attention to, your gas gauge. Some probably have more of an alert mentality and wait for the gas to get to the last X%, when you get the audible chime indicating you are low. You are now more focused on the situation and 'remediate' the issue by stopping off at a gas station and filling up. I think this is somewhat analogous to what 'management' thinks of when they hear things along the lines of monitoring alerts: that the process is generally closed and ends when you get the alert. Alert > gas station > fixed. The problem is that isn't what we are talking about. SIEM/LM alerts, I think, are more akin to all of the gauges in your car that don't even show until there is an issue – things like low oil pressure, overheating, your service engine light. Any of those coming on means something isn't right somewhere, but unless you pop the hood and/or break out some diagnostic equipment you don't know the severity. It could be a complex, multi-component breakdown, or just something a little out of whack that takes a simple fix. Oh, and don't forget the number of things that can happen to your car that you don't have a gauge for: brakes, suspension, tire alignment, etc. Multiply that picture by the number of endpoint systems in your environment and I think we are getting a little closer to a decent analogy. The point is that the alert often isn't actionable other than letting you know you need to start an investigative process.

What do you think - does it work? Do you have suggestions, tweaks, or better analogies that have worked for you as you try to convey to people that monitoring alerts is more than just sitting around waiting for them to pop in your inbox?

Friday, May 6, 2011

The mortar between your defenses

The other day I read a piece by Andreas M. Antonopoulos on Network World about how to be an effective security buyer. Of course, when it came to finding the article again when I wanted to write this… I couldn't find it. +1 to the Interwebs, though, because Mike Rothman over at Securosis mentioned it in Wednesday's Incite 4 U. Andreas' advice seems to be that when you are buying security tools, don't buy something designed to fulfill a singular function. Instead, go for multi-purpose tools that can cover multiple areas. I think the idea somewhat boils down to killing two birds with one stone – plus it sucks to have to look at one dashboard for each tool you have. Enterprise resource scaling aside, I tend to agree with Mike's take. What really stood out to me was an analogy Andreas used:

Tuesday, May 11, 2010

Monitoring logs and SIEMs

So there are several SIEM articles out there that tackle SIEMs vs. log management, reasons you might want to use one in your environment, and what you might hope to see with one. I'm kind of a simple guy, though, and while I have recently tried to wrap my head around the nuances between monitoring log-management style vs. "just" alerting, and how you might use a SIEM to both ends, I was hit with a rather simplistic realization: the SIEM is already monitoring the logs.

See, I said it was simplistic. Here is what I mean, though. I currently have multiple tickets open with ArcSight support, and having reviewed multiple sets of manager logs, one of the techs was kind enough to open an additional ticket based on some things she saw. I had three takeaways:
1. I should monitor the ArcSight logs. (I'm insightful, aren't I?)
2. If I had been monitoring them, I would have picked up on the warnings/errors/whatever she saw.
3. My manager logs rolled every 2 hours – used to, anyway.

This, to me, makes the manager logs a perfect candidate for insertion into the SIEM. Granted, part of the "perfect" is that the logs are being generated by something designed to suck in logs, so it doesn't seem to make a whole lot of sense to have anything other than itself actually monitor… itself. The other side of this is that while there is value in actually looking at the logs periodically, I think this is also a perfect case for just setting up alerts.

The real takeaway, then, is maybe an adjustment to the argument that SIEMs are an "analyst in a box." While they aren't really an analyst, they are at least some/part/multiple/an FTE who is, or at least is capable of, monitoring logs. The next takeaway is that according to the 2009 Verizon Data Breach Investigations Report we should be looking at application logs anyway. I mean, the OS logs I am pulling in from the server hosting the ESM certainly aren't telling me anything about these "guardoverflow" events that I should apparently search for in my logs in order to fix some data monitors.