Assessing and Prioritizing Risk in Your Infrastructure

But now you have yet another hurdle to jump. Those alerts keep coming…and coming…and coming! When these alerts come in, they often sound pretty bad. There are usually multiple events that an analyst has to look into, and they’re occurring in different places on your network. In addition, analysts typically aren’t working with much context about the nature of the attack, the resources involved, and so on. It can be daunting to figure out where to start and what to actually focus on. Should I tackle alert A or alert B? What if I spend an hour on one only to find out it was a false positive, while the other issue wreaks havoc on our systems?!

Event Triage

One common way to try and prioritize this workload is a standard event triage. As alerts are detected, take a cursory look at the information available, assess the severity, and plan to tackle the alerts in order from highest priority to lowest. Assessing and prioritizing risk is part of this process.

For example, if two alerts came in around the same time, one for a port scan attempt and the other for a potential ransomware outbreak, you’d obviously want to work on stopping the ransomware first!

However, there are drawbacks to this approach. For one, it can get very tedious. Although that seems like a minor complaint, staring at potentially hundreds of alerts can be like watching paint dry. It also makes it more likely that something will get missed. If an analyst is just trying to clear out the alerts, or speeds through triage to get to the actual security work, they may inadvertently ignore or gloss over an important event.

In other cases, the prioritization won’t be as clear-cut as in my example above. A more common case would be two port scan attempts happening on different network segments. How do you know which one is more pressing?
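The triage order described in this section can be sketched in a few lines of Python. This is only an illustration: the `Alert` class, the severity names, and their numeric ranks are assumptions made up for the example, not anything taken from Graylog.

```python
from dataclasses import dataclass

# Hypothetical severity ranking; the names and values are illustrative only.
SEVERITY_RANK = {"low": 1, "medium": 2, "high": 3, "critical": 4}


@dataclass
class Alert:
    name: str
    severity: str
    segment: str


def triage(alerts):
    """Return alerts ordered from highest severity to lowest."""
    # sorted() is stable, so alerts with equal severity keep their arrival order.
    return sorted(alerts, key=lambda a: SEVERITY_RANK[a.severity], reverse=True)


alerts = [
    Alert("Port scan", "medium", "office-lan"),
    Alert("Potential ransomware outbreak", "critical", "file-servers"),
    Alert("Port scan", "medium", "dmz"),
]

queue = triage(alerts)
for alert in queue:
    print(f"{alert.severity:<8} {alert.name} ({alert.segment})")
```

The ransomware alert moves to the front of the queue, but the two port scans are tied: severity alone gives no reason to pick one network segment over the other, so they simply stay in arrival order.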

Security Event Risk Scores

When Graylog 6.0 was released, we introduced Risk Scores to help answer these types of questions. These scores are calculated from the information we have about the alert, the severity of the alert that fired, and the importance of the assets involved. If the same attack occurred on two systems, but one of those systems was a user laptop and the other was a database containing your clients’ PII, you’d want to focus on the latter first!
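To make the idea concrete, here is a minimal sketch of how severity and asset importance could combine into a single number. It is not Graylog’s actual formula, which this post does not spell out. The weights, the asset names, and the choice to multiply severity by the most important asset involved are all assumptions for illustration.

```python
# Hypothetical weights; not Graylog's real scale or formula.
SEVERITY_WEIGHT = {"low": 1, "medium": 2, "high": 3, "critical": 4}
ASSET_IMPORTANCE = {"user-laptop": 1, "client-pii-database": 5}


def event_risk_score(severity, assets):
    """Multiply alert severity by the importance of the most important asset involved."""
    # Unknown assets, or alerts with no assets at all, fall back to an importance of 1.
    importance = max((ASSET_IMPORTANCE.get(asset, 1) for asset in assets), default=1)
    return SEVERITY_WEIGHT[severity] * importance


# The same medium-severity attack seen on two different systems.
laptop_scan = event_risk_score("medium", ["user-laptop"])
database_scan = event_risk_score("medium", ["client-pii-database"])
print(laptop_scan, database_scan)
```

With these made-up weights, the alert on the PII database scores higher than the identical alert on the laptop, which matches the order you would want to work them in.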

Here’s how our port scan example would look in Graylog:

[Image: Port Scan Detection]

Because of the asset system and risk scores, we now have more information to assist in our triage: one of these events involves more important assets, so it has a higher risk score and is the one to address first. This is definitely better in terms of event triage, but it does still mean combing through every event.

Risk from an Asset Perspective

With event triage and risk scores, you can definitely prioritize your work and show your analysts where to focus their efforts. But what if we looked at these alerts from a different angle? With Graylog 6.1, we wanted to provide even more context about the environment, so we added features like importing vulnerability scans. But as we looked at this and other contextual enhancements, we noticed that they all centered around the assets themselves. That got us thinking. What if, instead of trying to prioritize the events, we prioritized the assets? With that in mind, we’ve taken the concept of the Security Event Risk Score and applied it to assets as well, so you can assess and prioritize risk by high-value assets. Not surprisingly, we’re calling it the Asset Risk Score!

[Image: Assets With Risk Score]

Risk Score Formula

This risk score uses a slightly different formula: it takes into account all alerts that an asset is involved in (whether it’s a machine or a user). Like the old adage “Where there’s smoke, there’s fire”, a lot of alert activity on a particular machine could indicate an issue, and the risk score of an asset reflects that.
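The first step behind a score like this, collecting alert activity per asset, can be sketched as follows. The alert records, the field names, and the asset identifiers are hypothetical and exist only to show the grouping. They are not Graylog’s data model.

```python
from collections import Counter

# Hypothetical alert records; each lists the assets (machines or users) involved.
alerts = [
    {"title": "Unusual remote login", "assets": ["dev-machine-01", "user:jdoe"]},
    {"title": "Port scan", "assets": ["dev-machine-01"]},
    {"title": "Unusual remote login", "assets": ["webserver-01"]},
]

# Count how many alerts each asset appears in, machines and users alike.
alerts_per_asset = Counter(asset for alert in alerts for asset in alert["assets"])

for asset, count in alerts_per_asset.most_common():
    print(asset, count)
```

An asset that keeps showing up across alerts rises to the top of the list, which is the “smoke” the adage refers to.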

However, that’s not the only advantage to looking at this from an asset perspective. We also allow for additional nuance that a simple event triage might not catch. Let’s say the following 2 events occur:

Unusual remote login activity on a webserver
Unusual remote login activity on an internal development machine

Using event risk scores, one might assume that the webserver is the more important asset, so it has the higher score. But let’s take a step back and look at it from the asset point of view. From there, we get more information that shifts the calculus a bit:

On the webserver:

The remote login alert is the only security alert that’s fired in the last two weeks
The machine was recently updated, and a recent vulnerability scan shows no major issues

On the development machine:

The remote login alert is one of 10 security alerts that have fired in the last two weeks
The machine has not been updated in months, and a recent vulnerability scan shows many issues with high CVSS scores that should be patched or addressed

With this additional context, the development machine should be our focus, since it’s more likely to be compromised than the webserver. Graylog’s Asset Risk Scores will reflect this, and allow you to make those decisions with all of that information in mind. We’ve also made it easy to view all of this information right from the Asset view, so you can use both sets of risk scores in your prioritization:

[Image: Asset Vulnerability Detail View]

[Image: Asset Event View]
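The webserver and development machine comparison above can be sketched as a small scoring function. This is an assumption-heavy illustration, not the real Asset Risk Score: the `Asset` class, the 1 to 5 importance scale, the weights of 2 and 3, and the sample CVSS values are all invented for the example. The one external fact used is that CVSS v3 rates base scores of 7.0 and above as High or Critical.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    name: str
    importance: int      # hypothetical 1-5 scale
    recent_alerts: int   # security alerts fired in the last two weeks
    cvss_scores: list = field(default_factory=list)  # findings from the latest scan


def asset_risk_score(asset, high_cvss=7.0):
    """Combine importance, recent alert activity, and high-severity findings."""
    high_findings = sum(1 for score in asset.cvss_scores if score >= high_cvss)
    # Illustrative weights only: alert activity and unpatched findings add to the score.
    return asset.importance + 2 * asset.recent_alerts + 3 * high_findings


# Sample values invented to mirror the scenario described in the text.
webserver = Asset("webserver", importance=4, recent_alerts=1, cvss_scores=[3.1])
dev_machine = Asset(
    "dev-machine", importance=2, recent_alerts=10, cvss_scores=[9.8, 8.1, 7.5, 4.3]
)

ranked = sorted([webserver, dev_machine], key=asset_risk_score, reverse=True)
for asset in ranked:
    print(asset.name, asset_risk_score(asset))
```

Even though the webserver is given the higher importance here, the development machine ranks first once its alert history and unpatched findings are counted.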

Taking a New Approach for Assessing and Prioritizing Risk

With Graylog 6.1, you can now tackle risk assessment in a variety of ways. With the updated asset view, not only will you get a more comprehensive risk score, but you can continue to do your work on events right from the assets themselves. Instead of taking a siloed approach to the alerts and jumping from case to case, let’s look at the most vulnerable or risky assets and focus on those. By fully investigating an asset, you get a deeper understanding of that machine and its purpose, the traffic it normally sees, and where its vulnerabilities lie. This can allow you to resolve multiple issues at once as you update, reconfigure, or tune the machines based on what you learn.

Focusing on the “asset” rather than the “alert” when assessing and prioritizing risk is a bit of a philosophical change in how we assess and manage risk. With any SIEM tool, a holistic approach to how we view and analyze data is key to better understanding and protecting your environment. Standard event triage is still readily available as an avenue to explore your data. However, we are eager for you to investigate from this new viewpoint as well, and to see how it changes an analyst’s daily routine. We will keep exploring this asset-centric approach as we develop new features and gather feedback from our users. So please try it out and let us know what you think!

Guest Blogger:

Rich Murphy
Director, Product Management (Security)