Tuesday, June 26, 2012

Statistical vs. Rule-Based Threat Detection

A number of different discussions have led me to think about the difference between log management and SIEM in terms of the role each plays in threat detection. Of the many points that could be discussed, the one I kept coming back to is the difference between statistical and rule-based threat detection.

An oft-used analogy, even referenced in the Verizon DBIR, is the difference between looking at haystacks and looking for needles in haystacks. A statistical detection methodology might be to review the top N of X activity within Y timeframe. The point of this exercise is twofold. The first is simply to look at the “curve” of the numbers involved: the top 5 relative to the top 10 relative to the top 30, or just a spike in a line graph showing the volume of logs collected. Any of these might indicate that something has happened and is worth a deeper dive. The second is to help dial in your tools by whitelisting systems performing normal activity. If you are looking at outbound SMTP traffic sorted by volume over a day, you should be able to easily spot your email gateways. Whitelist them, and the next time the report is run the top of the list might contain compromised systems. By and large, log management systems should be able to accomplish this sort of activity.
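To make the haystack approach concrete, here is a minimal sketch in Python of both halves of that exercise: a top-N report of outbound SMTP volume with a whitelist, and a crude volume-spike check. The record layout `(source_host, dest_port, bytes_sent)`, the host names, and the mean-plus-k-standard-deviations spike threshold are all illustrative assumptions, not how any particular log management product does it.

```python
from collections import Counter
from statistics import mean, pstdev

# Hypothetical one-day flow records: (source_host, dest_port, bytes_sent).
# The record layout and host names are illustrative assumptions only.
RECORDS = [
    ("mail-gw-1", 25, 900_000),
    ("mail-gw-2", 25, 850_000),
    ("ws-042", 25, 120_000),
    ("ws-017", 443, 50_000),
    ("ws-108", 25, 3_000),
]


def top_smtp_talkers(records, whitelist, n=5):
    """Rank hosts by outbound SMTP (TCP/25) bytes, skipping whitelisted hosts."""
    totals = Counter()
    for src, dport, nbytes in records:
        if dport == 25 and src not in whitelist:
            totals[src] += nbytes
    return totals.most_common(n)


def is_spike(history, today, k=3.0):
    """Flag today's log volume if it exceeds the historical mean by k population std devs."""
    return today > mean(history) + k * pstdev(history)


# First pass: the email gateways dominate the top of the list.
print(top_smtp_talkers(RECORDS, whitelist=set()))
# After whitelisting the gateways, whatever rises to the top deserves a closer look.
print(top_smtp_talkers(RECORDS, whitelist={"mail-gw-1", "mail-gw-2"}))
# → [('ws-042', 120000), ('ws-108', 3000)]
```

The whitelist is just a set passed in by the analyst, so each run of the report can be tightened as normal activity is identified.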

At some point, though, you will want to focus on specific threat activity. Take today’s SANS diary update on Run Forest Run, or Sality, Tidserv, whatever. In this case you have specific information and want to receive an alert when one of your internal systems hits an IP, a URL, or a multi-step pattern of activity. This is your rule-based, needle-finding capability. Generally speaking, this requires a rule engine, of varying sophistication, located within some point solution or product. Statistical detection won’t really get you this.
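By contrast, a needle-finding rule keys on specific indicators. The sketch below, again in Python and under stated assumptions, shows two flavors: single-indicator matching (a known-bad IP or URL) and a simple two-step pattern (a hit on one indicator followed by a hit on another within a time window). The `Event` fields, the placeholder indicators (drawn from documentation address ranges and example domains, not real IOCs for any of the malware named above), and the window logic are hypothetical; real rule engines are considerably more expressive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    ts: int        # seconds since epoch
    host: str      # internal source host
    dest_ip: str
    url: str


# Placeholder indicators (documentation ranges / example domains), not real IOCs.
BAD_IPS = {"198.51.100.23"}
BAD_URL_SUBSTRINGS = {"dropper.example"}


def single_indicator_hits(events):
    """Return (host, reason) for every event that matches one indicator."""
    hits = []
    for e in events:
        if e.dest_ip in BAD_IPS:
            hits.append((e.host, f"bad ip {e.dest_ip}"))
        elif any(s in e.url for s in BAD_URL_SUBSTRINGS):
            hits.append((e.host, f"bad url {e.url}"))
    return hits


def sequence_hits(events, first, second, window):
    """Return hosts where an event matching `first` is followed by one
    matching `second` within `window` seconds (a two-step pattern)."""
    last_first = {}  # host -> timestamp of its most recent `first` match
    alerts = []
    for e in sorted(events, key=lambda ev: ev.ts):
        if first(e):
            last_first[e.host] = e.ts
        elif second(e) and e.host in last_first and e.ts - last_first[e.host] <= window:
            alerts.append(e.host)
    return alerts


EVENTS = [
    Event(100, "ws-042", "192.0.2.10", "http://dropper.example/payload.exe"),
    Event(160, "ws-042", "198.51.100.23", "http://198.51.100.23/checkin"),
    Event(200, "ws-017", "192.0.2.11", "http://news.example/"),
    Event(900, "ws-108", "198.51.100.23", "http://198.51.100.23/checkin"),
]

# Two-step rule: a download from the dropper domain, then a check-in to the bad IP within 5 minutes.
print(sequence_hits(
    EVENTS,
    first=lambda e: any(s in e.url for s in BAD_URL_SUBSTRINGS),
    second=lambda e: e.dest_ip in BAD_IPS,
    window=300,
))
# → ['ws-042']
```

Note that `ws-108` trips the single-indicator rule but not the two-step rule, since it never hit the first indicator; that difference in specificity is the point of multi-step patterns.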

The challenge is that many point solutions, by design or omission, can’t factor in the larger view when reviewing the needles they find. MSSPs are notorious (too strong?) for this, but so are things like IPS. “We saw this and this, so we wrapped it up in a pretty bow for you.” … Great, but I need more context. The SIEM technology space, in general, was supposed to fill this gap. Not only can you develop specific rules to find needles in the overall stream of log consciousness, but, to a greater or lesser extent depending on the vendor, tool, and administrator, you can also use them in a more statistical way. I think a lot of the “Mah SIEM sux” mentality comes from how you approach your SIEM relative to this overall issue. That is a rabbit hole I don’t want to go down, though.
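One way to picture using rules “in a more statistical way” is to roll individual rule alerts up into per-host context instead of treating each one in isolation. The following Python sketch is only an illustration of that idea, not a description of any SIEM’s correlation features: it assumes alerts arrive as hypothetical `(host, rule_name)` pairs and ranks hosts by how many distinct rules fired, then by total hits.

```python
from collections import defaultdict

# Hypothetical alert stream: (host, rule_name). Rule names are illustrative only.
ALERTS = [
    ("ws-042", "bad-ip"), ("ws-042", "bad-url"), ("ws-042", "bad-ip"),
    ("ws-108", "bad-ip"),
    ("ws-017", "ips-sig-1234"), ("ws-017", "ips-sig-1234"), ("ws-017", "ips-sig-1234"),
]


def rank_hosts_by_alerts(alerts):
    """Summarize alerts per host as (host, distinct_rules, total_hits),
    sorted by distinct rules, then total hits (both descending), then host name."""
    hits = defaultdict(int)
    rules = defaultdict(set)
    for host, rule in alerts:
        hits[host] += 1
        rules[host].add(rule)
    return sorted(
        ((h, len(rules[h]), hits[h]) for h in hits),
        key=lambda t: (-t[1], -t[2], t[0]),
    )


for host, distinct, total in rank_hosts_by_alerts(ALERTS):
    print(f"{host}: {distinct} distinct rule(s), {total} hit(s)")
```

Ranking on distinct rules first is a design choice: a host tripping several unrelated needles is arguably more interesting than one noisy signature firing repeatedly, though other weightings are just as defensible.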

Especially when you first start out, if you dive directly into a rule-based approach you will have a harder time seeing the forest for the trees and, depending on the tool used, will be frustrated that you can’t pull that lens back. In other words, dealing with individual infections IS key, but if you are so focused on individual detections that you lose sight of the bigger picture, you do yourself a disservice. On the other hand, if you only take a statistical approach and never grow beyond it, you are going to miss the needles you need to find.

I would argue there is a direct correlation between shop maturity and the ability to fully leverage rule-based technology. If you are just starting out, I suggest you will see more value from a log management, statistical threat detection methodology. This will allow you to get to know your data (as strange as that might sound), which in turn will let you better dial in your rule-based solutions.