What are Key SOC Metrics and How to Define Them?

22 October 2022 | by Xavier Bellekens

Effective SOC managers embrace data and use SOC metrics to identify and fix problems, but let’s start at the beginning.

What is a Security Operation Center?

A SOC, or security operation center, is a facility where organizations can monitor and manage their cybersecurity posture. SOCs typically combine technology and human expertise to provide 24/7 monitoring of an organization’s networks and systems. SOCs can be either physical or virtual, and can be staffed by in-house personnel or by third-party providers. SOCs typically use a range of tools and processes to detect and investigate potential security incidents. These may include SIEM (security information and event management) systems, analytics platforms and threat intelligence feeds. SOCs also typically have defined procedures for incident response, which may include measures such as quarantining compromised systems and notifying relevant stakeholders. By providing organizations with visibility of their cybersecurity posture, SOCs can help them to identify and mitigate risks in a timely manner.

What is a security incident?

A cybersecurity incident is any event that jeopardizes the security of an organization’s information systems. This can include everything from a data breach to a denial of service attack. In recent years, cybersecurity incidents have become increasingly common, as organizations have become more reliant on computer networks and the internet. As a result, it is essential for businesses to have robust cybersecurity protocols in place to protect their data and resources. While there is no guaranteed way to prevent all cybersecurity incidents, having strong security measures in place can help to minimize the risk of an attack and the consequent damage.

When an incident occurs, the security operations center is responsible for identifying, assessing, and responding to it. It uses a combination of technology and human expertise to constantly monitor for threats and take action to mitigate them. In addition to responding to incidents, the SOC also works to prevent future attacks by constantly improving its detection capabilities. By staying one step ahead of the attackers, the SOC helps to protect critical infrastructure from the ever-growing threat of cyberattacks.

What are false positives in cybersecurity?

A false positive in the context of cybersecurity is an alert that incorrectly flags a benign file or event as malicious. This can cause businesses and individuals to waste time and resources investigating and responding to non-existent threats. False positives can also create a sense of “alert fatigue” whereby users become numbed to warnings and start ignoring them altogether. Unfortunately, there is no silver bullet for eliminating false positives, but there are some steps that businesses can take to reduce their occurrence. For example, they can fine-tune their security rules and settings, implement better quality control measures, and provide ongoing training for their staff.

On average, 72 to 80% of the security alerts a security operations center receives each day are false positives, largely because of the variety of tools it has to monitor. We’ll discuss later how to increase the proportion of true positives received by the SOC.
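
As a rough illustration of how a SOC might derive this kind of figure for itself, the sketch below computes the share of false positives among triaged alerts. The disposition labels and the sample data are assumptions made for the example; they do not come from any particular SIEM or ticketing tool.

```python
from collections import Counter

def false_positive_share(dispositions):
    """Return the fraction of triaged alerts labelled as false positives."""
    counts = Counter(dispositions)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return counts["false_positive"] / total

# Hypothetical dispositions recorded by analysts after triage.
triaged = ["false_positive"] * 15 + ["true_positive"] * 5

share = false_positive_share(triaged)
print(f"False positive share: {share:.0%}")
```

Tracking this value over time, rather than as a one-off number, is what makes it useful for judging whether tuning efforts are working.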

Why do we need key SOC metrics?

Data is one of the most valuable commodities. However, not all data types have the same value.

As a result, understanding the value of the data we collect is essential. Appropriate telemetry and appropriate data sources can go a long way toward improving response time and overall SOC performance.

Unfortunately, measuring the effectiveness of a SOC can be difficult. This is where SOC metrics come in. By tracking key performance indicators, SOC managers can get a clear picture of the state the SOC is in. This information can then be used to identify weaknesses and implement improvements. In other words, metrics are essential for ensuring that the efforts of the analysts are effective. Without them, it would be very difficult to make informed decisions about how to improve efficiency.

What are the key performance indicators for a SOC?

First, we need to define what the metrics look like, as not all of them are technical. They fall into four categories:

  1. Data and Health
  2. SOC Coverage
  3. Human Performance
  4. Analytics and Incident Handling

[Figure: Plan, Do, Check, Act cycle for the SOC]

These should be revised quarterly in a Plan, Do, Check, Act fashion.

Data and Health

When we think of the data collected, we must answer a number of questions:

  1. When do we receive the most alerts through the SOC?
    A simple temporal statistical analysis can help your SOC manager to identify the total number of alerts received over a specific period of time. Understanding how many events are generated at a particular period can help plan more effectively for the number of analysts on call during that period.

    This can also help to identify outages. For example, if you are expecting a large number of incidents on a given day but the number is low, there might be an ongoing issue that needs to be resolved.
  2. What data are we receiving?
    Understanding where the telemetry comes from, and what type of data it contains, can help increase the proportion of true positives and supports the continuous improvement and classification of events.
  3. How long do events wait at each stage?
    It is important to understand when an event first entered the queue and when that event was first worked on by an analyst.

    Timing should be measured everywhere along the event pipeline. This can help find issues with I) the ingestion and parsing of events, II) the correlation and analysis of events, or III) batched queries and processing times.
  4. How long do events wait based on their criticality?
    Not all events are equal. With events ranging from low to critical, it is imperative to understand this key metric.
    1. Critical events are usually taken care of within 3 minutes.
    2. High events are usually taken care of within 10 minutes.
    3. Medium events are usually taken care of within 30 minutes to an hour.
    4. Low events are usually taken care of within 1 to 2 hours.
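
The questions above can be sketched in a few lines of Python. This is a minimal example under stated assumptions: the event records are hypothetical, and the targets use the upper bound of each range given in the list (3, 10, 60 and 120 minutes). A real SOC would pull the timestamps from its SIEM or case management system instead.

```python
from collections import Counter, defaultdict
from datetime import datetime
from statistics import mean

# Target wait times in minutes, using the upper bound of each range above.
TARGET_MINUTES = {"critical": 3, "high": 10, "medium": 60, "low": 120}

# Hypothetical events: (severity, entered queue, first worked on by an analyst).
EVENTS = [
    ("critical", "2022-10-22T09:00:00", "2022-10-22T09:02:00"),
    ("critical", "2022-10-22T09:30:00", "2022-10-22T09:36:00"),
    ("high", "2022-10-22T10:05:00", "2022-10-22T10:12:00"),
    ("medium", "2022-10-22T10:15:00", "2022-10-22T10:55:00"),
    ("low", "2022-10-22T14:00:00", "2022-10-22T15:30:00"),
]

def parse(timestamp):
    return datetime.fromisoformat(timestamp)

def alerts_per_hour(events):
    """Count how many events entered the queue in each hour of the day."""
    return Counter(parse(queued).hour for _, queued, _ in events)

def mean_wait_minutes(events):
    """Mean time between queue entry and first analyst action, per severity."""
    waits = defaultdict(list)
    for severity, queued, picked_up in events:
        delta = parse(picked_up) - parse(queued)
        waits[severity].append(delta.total_seconds() / 60)
    return {severity: mean(values) for severity, values in waits.items()}

def over_target(mean_waits, targets=TARGET_MINUTES):
    """Severities whose mean wait exceeds the target for that severity."""
    return sorted(s for s, wait in mean_waits.items() if wait > targets[s])

waits = mean_wait_minutes(EVENTS)
print("Alerts per hour:", dict(alerts_per_hour(EVENTS)))
print("Mean wait (minutes):", waits)
print("Over target:", over_target(waits))
```

The hourly counts help with staffing decisions, while the per-severity comparison shows which queues are falling behind their targets.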

SOC Coverage

  1. Key SOC metrics include the coverage of your SOC: for example, how many nodes on the network do you receive data and telemetry from?
  2. How many malicious activities and cyber threats can you map against the MITRE ATT&CK framework?
  3. Is your SOC’s performance tightly linked with your SOC coverage, and could a larger coverage help reduce risk?
  4. How much visibility do you have across the technology and compute stack? Can you see OS (Windows, Linux, etc.), cloud, virtualization, Docker, application and network events? Which ones are missing from your security operations?

You can also identify the number of systems managed by your security operations center and classify them by business function, owner function, type of configuration and team ownership. These techniques can help identify rogue devices on the network and help contextualize some alerts with relevant information, as well as provide a holistic view of the organization.
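
A simple way to approximate these coverage questions is to compare sets. The sketch below assumes you can export an asset inventory, a list of hosts currently sending telemetry, and the ATT&CK technique IDs your detections map to. The host names and the choice of "in scope" techniques are hypothetical.

```python
# Hypothetical asset inventory and the set of hosts seen sending telemetry.
inventory = {"web-01", "web-02", "db-01", "hr-laptop-07", "build-01"}
reporting = {"web-01", "web-02", "db-01", "unknown-5f3a"}

# Hypothetical ATT&CK technique IDs: those considered in scope for the
# organization, and those covered by at least one detection rule.
in_scope = {"T1059", "T1078", "T1486", "T1566"}
detected = {"T1059", "T1566"}

def coverage_report(inventory, reporting):
    """Compare known assets with the hosts that actually send telemetry."""
    covered = inventory & reporting
    return {
        "coverage": len(covered) / len(inventory) if inventory else 0.0,
        "silent": sorted(inventory - reporting),  # known assets, no telemetry
        "rogue": sorted(reporting - inventory),   # telemetry from unknown hosts
    }

def technique_coverage(detected, in_scope):
    """Share of in-scope technique IDs with at least one detection."""
    if not in_scope:
        return 0.0
    return len(detected & in_scope) / len(in_scope)

report = coverage_report(inventory, reporting)
print(f"Node coverage: {report['coverage']:.0%}")
print("No telemetry from:", report["silent"])
print("Possible rogue devices:", report["rogue"])
print(f"Technique coverage: {technique_coverage(detected, in_scope):.0%}")
```

Hosts that report telemetry but are missing from the inventory are only candidates for rogue devices; they may also point to an out-of-date inventory.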

Human Performance

False positives are notorious for creating alert fatigue and SOCs are riddled with irrelevant alerts, constantly putting pressure on the security team, security professionals and analysts.

The SOC performance is tied to the performance of the analysts, hence it’s key to understand the roles of each member of the team within the SOC.

  1. Create a profile of the analysts
    1. When did the analyst join the team, and how long have they been in their current role?
    2. How many alerts have they triaged in the last month?
    3. What percentage of the alerts they escalated were true positives?
    4. What is the mean time taken to close a case?
  2. Create a technical profile for the analysts
    1. How many analysis scripts have been created?
    2. How many detection scripts have been created?
    3. How many lines of code have been committed to the SOC repository in the last month?
    4. What is the rate of successful and failed queries in the last month?
    5. What is the structure of the queries run, and how long does each query take?
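
Several of the profile questions above reduce to simple aggregations over case records. The following sketch assumes a hypothetical `Case` record with an analyst name, escalation and true-positive flags, and a time to close. The field names and sample values are illustrative, not taken from any specific case management tool.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class Case:
    analyst: str
    escalated: bool
    true_positive: bool
    minutes_to_close: float

def analyst_profile(cases, analyst):
    """Summarize triage volume, escalation quality and closing time."""
    own = [c for c in cases if c.analyst == analyst]
    escalated = [c for c in own if c.escalated]
    confirmed = [c for c in escalated if c.true_positive]
    return {
        "alerts_triaged": len(own),
        "true_positive_escalation_pct": (
            100 * len(confirmed) / len(escalated) if escalated else None
        ),
        "mean_minutes_to_close": (
            mean(c.minutes_to_close for c in own) if own else None
        ),
    }

# Hypothetical cases from the last month.
cases = [
    Case("alice", escalated=True, true_positive=True, minutes_to_close=10.0),
    Case("alice", escalated=True, true_positive=False, minutes_to_close=20.0),
    Case("alice", escalated=False, true_positive=False, minutes_to_close=30.0),
    Case("alice", escalated=False, true_positive=False, minutes_to_close=40.0),
    Case("bob", escalated=False, true_positive=False, minutes_to_close=15.0),
]

print(analyst_profile(cases, "alice"))
```

Figures like these are best read alongside the analyst's role and tenure, since a senior analyst handling complex cases will naturally close fewer of them.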

Analytics and Incident Handling

  1. What’s the mean number of alerts we handle per day?
    If the number changes, investigate why. A change in the number of clients, the type of incidents that occurred on the day, or a planned red teaming or purple teaming activity could all explain it.
  2. What is the dwell time of adversaries?
  3. What is the mean time to triage and escalate an event?
  4. What is the mean time to eradicate and recover from a malicious action?
  5. Was the eradication sufficient? Note that this is often measured over a longer period of time.
  6. Have we taken a proactive approach to incidents?
  7. Evaluate the financial cost of every event seen by the incident response team.
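
The time-based questions in this list can be computed from four timestamps per incident. The sketch below is one possible reading, with assumed definitions: dwell time is measured from initial compromise to detection, time to triage from detection to triage, and time to recover from detection to recovery. The incident records are hypothetical.

```python
from datetime import datetime
from statistics import mean

# Hypothetical incident timeline records.
INCIDENTS = [
    {
        "compromised": "2022-10-01T08:00",
        "detected": "2022-10-03T08:00",
        "triaged": "2022-10-03T08:20",
        "recovered": "2022-10-03T14:00",
    },
    {
        "compromised": "2022-10-10T12:00",
        "detected": "2022-10-10T18:00",
        "triaged": "2022-10-10T18:10",
        "recovered": "2022-10-11T00:00",
    },
]

def hours_between(start, end):
    """Elapsed hours between two ISO 8601 timestamps."""
    delta = datetime.fromisoformat(end) - datetime.fromisoformat(start)
    return delta.total_seconds() / 3600

def mean_hours(incidents, start_key, end_key):
    """Mean elapsed hours between two stages across all incidents."""
    return mean(hours_between(i[start_key], i[end_key]) for i in incidents)

dwell = mean_hours(INCIDENTS, "compromised", "detected")
triage = mean_hours(INCIDENTS, "detected", "triaged")
recover = mean_hours(INCIDENTS, "detected", "recovered")

print(f"Mean dwell time: {dwell:.1f} h")
print(f"Mean time to triage: {triage * 60:.0f} min")
print(f"Mean time to recover: {recover:.1f} h")
```

Whichever definitions you choose, keep them fixed from one quarter to the next so the trend stays comparable.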

How can we improve alert quality in the SOC?

Given that the average SOC sees 72 to 80% false positives, how can we improve this key SOC metric, which often leads to alert fatigue?

  1. Improve the telemetry you receive from tooling such as IDS, firewall, EDR, XDR, etc.
  2. Tune your SIEM to the environment. A well-tuned SIEM will make a world of difference and will be money well spent.
  3. Improve the security controls.
  4. Measure everything and collect SOC metrics for every process.
  5. Use cyber deception to obtain 100% true positive alerts.

    Learn how Lupovis deception as a service platform fits into your SOC to deliver higher ROI than your existing security investments.

In conclusion

The quality of your cybersecurity operation center is rooted in your ability to measure and obtain appropriate metrics.

Metrics provide insight into how well your team is performing and what improvements need to be made. Without metric data, it is difficult to tell whether or not your team is accurately detecting and responding to threats. Additionally, SOC metrics can help you identify areas of improvement for future training and development. Without a comprehensive understanding of metrics, it is difficult to optimize the performance of your cybersecurity operation center. By measuring and obtaining appropriate metrics, you can ensure that your team is providing the best possible defense against cyber threats.

