18 Nov Gartner Market Guide for SOAR – Key Points by Phase, Part Three of Four: Respond
In this third installment of a four-part blog series, we highlight key points for the Respond SOAR phase that prospective buyers miss when evaluating, selecting and deploying a platform. As a result, while Respond seems like the most straightforward SOAR phase, many projects fail here by not:
- Recognizing the intra- and inter-enterprise complexities
- Focusing on Process vs. Systems automation, and
- Demonstrating clearly to IT Operations when Cyber Risk exceeds Operational Risk
The Gartner Market Guide for Security Orchestration, Automation & Response (SOAR), published June 27, 2019, by analysts Claudio Neiva, Craig Lawson, Toby Bussa, and Gorka Sadowski, provides a reference model that breaks the SOAR solution scope into four phases: Detect, Triage, Respond and Prioritize.
See “Figure 1. SOAR Overview” below for this reference model.
Figure 1. SOAR Overview
It’s important to recognize in this phase that processes and roles/responsibilities vary, particularly within Security Services Providers (MSPs, MSSPs, and MDRs), where they differ client to client. Within an enterprise, these variances could be across Divisions or Business Units. This variance complicates efforts to leverage best practices unless the SOAR solution makes it easy to mix human process and system automation. For the purposes of this blog series, we’ve been focusing on the Enterprise SOAR use case. While the processes are similar for Security Services Providers, they differ due to multi-tenancy, tool variability, and process variations (e.g., how much automation the Service Provider is entitled and authorized to execute).
Key Point 1: Triage = Reducing Risk; but Respond = Managing Risk.
As discussed in the last blog, the Triage phase is about finding and validating bad stuff faster in order to begin containment and remediation. Respond is about executing the actions necessary to contain (e.g., prevent the spread of malware) and then fully remediate any potential risks that have been validated by Triage. These Respond actions are often a combination of technical and procedural steps. Technical steps include actions such as blocking specific IPs and URLs at the network and endpoint levels, resetting compromised user passwords, removing malicious emails from mailboxes, etc. Procedural actions include notifying Compliance of Incidents involving PII or PHI to determine what, if any, disclosure is required; scheduling and confirming Security Awareness training for individuals who repeatedly fall victim to phishing attacks; or creating an ITSM ticket to permanently restrict malicious domains that were temporarily blocked for containment.
For example, imagine an Incident comprised of several Alerts including failed database server login attempts from locations outside a user’s typical work locations, malware detections on the user’s endpoint, and Data Loss Prevention (DLP) Warning Alerts involving Personally Identifiable Information (PII) in a timeframe just prior to these failed user logins. The Response process may involve:
- Automatically notifying HR, documenting the investigation process within the HR system in the event employee action is recommended, and receiving case number confirmation
- Automatically notifying Compliance about an Incident involving PII, such that the necessary reviews can take place to determine what, if any, disclosure is required and tracking the status/results of that decision-making process
- Automatically placing the User’s endpoint in quarantine, resetting the User’s password within Active Directory, and journaling email in O365 for eDiscovery
The focus shifts in Respond to automating what is under Security Operations’ control, while instrumenting digital processes that include all stakeholders to ensure end-to-end case management and audit-level data capture. Over time, these observations drive learning about how to shrink the elapsed time behind the key metrics, and can lead to breaking down or even removing intra- and inter-company barriers and adopting automation (more on that in Key Point #3). While many suggest applying Machine Learning or AI to this problem, the fact is most enterprises don’t have enough incidents to produce a data set large enough to train meaningful models. In addition, models across enterprises are extremely difficult to normalize due to the high variance among the processes used. As Gartner Analyst Augusto Barros pointed out in his 2019 Security & Risk Management Summit presentation, despite the large volume of playbook templates available from SOAR vendors, most implementations have generated their own unique processes.
The key is that these processes involve the extended enterprise and trigger additional workflows. Those workflows may not be automatable, but the status of their Tasks must still be tracked, timestamped, and captured in audit-level detail to support the overall Response process, capture and report metrics, and provide input to the Incident knowledge base for compiling “Lessons Learned.”
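To make the tracking requirement concrete, here is a minimal Python sketch of an Incident whose Tasks mix automated and human steps, with every status change timestamped in an audit log. The class names, field names, and the "INC-0001" identifier are our own illustrative assumptions, not any particular SOAR platform's data model.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Task:
    name: str
    owner: str            # e.g. "SecOps", "HR", "Compliance" (illustrative)
    automated: bool       # True if the platform can execute the step itself
    status: str = "open"  # "open" until someone (or something) completes it


@dataclass
class Incident:
    incident_id: str
    tasks: list = field(default_factory=list)
    audit_log: list = field(default_factory=list)

    def record(self, task: Task, event: str, actor: str, note: str = "") -> None:
        # Every state change is timestamped in UTC and attributed to an actor.
        self.audit_log.append({
            "incident": self.incident_id,
            "task": task.name,
            "event": event,
            "actor": actor,
            "note": note,
            "at": datetime.now(timezone.utc).isoformat(),
        })

    def add_task(self, task: Task) -> None:
        self.tasks.append(task)
        self.record(task, "created", actor="soar")

    def complete(self, task: Task, actor: str, note: str = "") -> None:
        task.status = "done"
        self.record(task, "completed", actor=actor, note=note)

    def open_tasks(self) -> list:
        return [t.name for t in self.tasks if t.status != "done"]


incident = Incident("INC-0001")  # hypothetical incident ID
quarantine = Task("Quarantine endpoint", owner="SecOps", automated=True)
notify_hr = Task("Notify HR and obtain case number", owner="HR", automated=False)
incident.add_task(quarantine)
incident.add_task(notify_hr)

# Automated step: completed by the platform itself.
incident.complete(quarantine, actor="soar")
print(incident.open_tasks())    # ['Notify HR and obtain case number']

# Human step: stays open until the stakeholder reports back.
incident.complete(notify_hr, actor="hr.analyst", note="HR case number received")
print(len(incident.audit_log))  # 4
```

The point of the sketch is that the human Task is tracked in exactly the same record as the automated one, so the elapsed time of both can be measured later.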
Key Point 2: Respond is More Process Than System Automation.
Unlike the non-linear nature of validating Alert risks and investigating correlated Alerts for escalation to an Incident, the Incident Response processes can be more predictable, and are often governed by the presence of key criteria (e.g., PII, PHI, etc.) such that Playbook templates can be more heavily leveraged, provided they enable the inclusion of intra- and inter-enterprise stakeholders who may not be regular participants in the Security Operations processes or have logins to the SOAR platform.
These processes, as defined in various digital workflows, can be more easily shared and leveraged across a community, preferably using an open-source data model like OSSEM and an open-source orchestration engine like StackStorm that’s not tied to a particular vendor’s proprietary technology, programming language, or orchestration engine. A key element of these response Playbooks is how to involve non-Security and even external parties in the process. Options should include structured email notifications, common workflow apps like PagerDuty, or a private, one-time weblink for receiving input (e.g., Approvals).
The actions executed automatically in this phase often involve generating tickets in an ITSM platform and enabling bi-directional updating and status tracking, because the operational risk of having the SOC make production system changes outside of pre-approved change control windows is perceived as higher than the cyber risk associated with these Incidents (more on this in the next “Key Point”). As a result, Respond Playbooks contain more Process than System automation and involve a broader set of stakeholders.
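As an illustration of the one-time weblink option, the sketch below issues a private approval link for a stakeholder who has no SOAR login and accepts it exactly once. It uses only the Python standard library; the `ApprovalLinks` class, the `soar.example.com` URL, and the task ID are hypothetical, and a real deployment would also need persistent storage and transport security.

```python
import hashlib
import secrets
import time

# Hypothetical base URL; a real deployment would use its own hostname.
BASE_URL = "https://soar.example.com/approve/"


class ApprovalLinks:
    """Issues single-use, expiring approval links (illustrative only)."""

    def __init__(self, ttl_seconds: int = 3600):
        self.ttl_seconds = ttl_seconds
        self._pending = {}  # sha256(token) -> (task_id, expires_at)

    def issue(self, task_id: str) -> str:
        token = secrets.token_urlsafe(32)  # unguessable random token
        digest = hashlib.sha256(token.encode()).hexdigest()
        # Store only the hash, so the store alone cannot recreate the link.
        self._pending[digest] = (task_id, time.time() + self.ttl_seconds)
        return BASE_URL + token

    def redeem(self, url: str):
        """Return the task ID if the link is valid, else None."""
        token = url.rsplit("/", 1)[-1]
        digest = hashlib.sha256(token.encode()).hexdigest()
        entry = self._pending.pop(digest, None)  # pop makes the link single-use
        if entry is None:
            return None
        task_id, expires_at = entry
        if time.time() > expires_at:
            return None
        return task_id


links = ApprovalLinks()
url = links.issue("TASK-42")  # hypothetical task ID
print(links.redeem(url))      # TASK-42
print(links.redeem(url))      # None, because the link was already used
```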
However, regardless of the system-to-human automation mix in these Playbooks, every action must be captured with audit-level details in order to continuously measure effectiveness and recommend improvements. Playbooks must also be flexible enough to tailor key metric definitions to a customer’s preferred approach, because not all firms calculate their metrics the same way. For example, some teams define “Dwell Time” as the time from the originating event to the time a risk is validated, while others might define it as the time when an Alert is ingested into the SOAR queue and either robotic or process automation begins.
See “Figure 2. Mature SecOps Process Model,” which highlights the scope of the typical “Respond” procedures.
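One way to support differing metric definitions is to treat each definition as configuration over the same audit timestamps. The Python sketch below assumes four hypothetical timestamp fields and two named definitions of “Dwell Time”. The second definition is only one possible reading of the ingestion-based variant (measured from the originating event to the start of automation); the sample times are invented for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical timestamps taken from an incident's audit record.
incident = {
    "event_originated": datetime(2019, 11, 18, 9, 0),
    "alert_ingested": datetime(2019, 11, 18, 9, 20),
    "automation_started": datetime(2019, 11, 18, 9, 25),
    "risk_validated": datetime(2019, 11, 18, 10, 30),
}

# Each customer-selectable definition is a (start field, end field) pair.
DWELL_TIME_DEFINITIONS = {
    "origin_to_validation": ("event_originated", "risk_validated"),
    "origin_to_automation": ("event_originated", "automation_started"),
}


def dwell_time(timestamps: dict, definition: str) -> timedelta:
    """Compute Dwell Time under the named definition."""
    start_field, end_field = DWELL_TIME_DEFINITIONS[definition]
    return timestamps[end_field] - timestamps[start_field]


for name in DWELL_TIME_DEFINITIONS:
    print(name, dwell_time(incident, name))
# origin_to_validation 1:30:00
# origin_to_automation 0:25:00
```

Because the raw timestamps are captured once, the same incident can be reported under either definition without re-instrumenting the Playbook.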
Figure 2. Mature SecOps Process Model
Key Point 3: Enable Respond Automation Adoption by Demonstrating Cyber vs. Operational Risk.
The ability to quickly close a risk exposure window associated with cyber threats through automation is often hampered by the need to follow pre-approved change control protocols and procedures. Most enterprises do not enable their SOC or CFC to dynamically make changes to production systems without going through approved change control processes. This is due to the adoption of ITIL-approved procedures, regulatory requirements for “separation of duties,” and, potentially the most addressable, the human element of trust.
In addition, Respond workflow or “Playbook” actions often involve coordinating with other enterprise functions (legal, compliance, IT) as well as external third parties, and the amount of automation possible with these tasks is constrained. In these scenarios, a SOAR solution has less immediate impact on the elapsed time it takes to contain and remediate, given these dependencies.
For example, when investigating a series of URLs associated with a suspected phishing campaign, Security can use automation to block these URLs immediately, and then release the blocks if the investigation determines that they’re benign. The cyber risk if these URLs turn out to be malicious is significant, and blocking them closes the exposure window within seconds of identification, protecting the enterprise from further instances of potential malware infection, recon, and exfiltration. The operational risk of this automation if the URLs turn out to be benign is minimal: essentially delayed email delivery, which most users won’t even notice, since email is by nature an asynchronous means of communication.
Another example could be suspicious user activity in the form of off-hours logins from locations atypical of the user’s behavior. The cyber risk associated with compromised user credentials is significant, particularly if the user has direct access to sensitive IP, PHI, or PII. If the Security team can automate forcing a password reset for the user in question, the risk of further illegitimate access is mitigated, while the operational risk of the inconvenience of asking the user to reset their password is negligible.
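The trade-off in these two examples can be sketched as a simple comparison. The Python below scores each risk as likelihood times impact and recommends automation only when the cyber risk of waiting exceeds the operational risk of acting. The scoring scheme, every number in it, and the third “restart a production database” case are our own illustrative assumptions; each organization would substitute its own risk model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    likelihood: float  # 0.0 to 1.0 (illustrative estimate)
    impact: int        # 1 (negligible) to 5 (severe)

    @property
    def score(self) -> float:
        return self.likelihood * self.impact


def recommend(cyber_if_delayed: Risk, operational_if_automated: Risk) -> str:
    """Automate only when the cyber risk outweighs the operational risk."""
    if cyber_if_delayed.score > operational_if_automated.score:
        return "automate"
    return "route through change control"


# Action -> (cyber risk of waiting, operational risk of acting automatically)
use_cases = {
    "Block suspected phishing URLs": (Risk(0.3, 5), Risk(0.7, 1)),
    "Force user password reset": (Risk(0.4, 5), Risk(1.0, 1)),
    # Hypothetical counter-example, not taken from the discussion above.
    "Restart a production database": (Risk(0.2, 3), Risk(0.9, 4)),
}

for action, (cyber, operational) in use_cases.items():
    print(f"{action}: {recommend(cyber, operational)}")
# Block suspected phishing URLs: automate
# Force user password reset: automate
# Restart a production database: route through change control
```

Even a rough scoring like this gives IT Operations a shared, reviewable basis for deciding which actions Security may execute without a change window.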
Getting IT teams to understand and accept the protocol by which production systems changes would be made automatically by Security requires a culture shift, which starts with information. Reporting containment and remediation metrics that highlight the need to shrink existing risk exposure windows can help IT Operations teams become more comfortable introducing response automation. As you deploy SOAR, it’s critical to highlight use cases where the cyber risk outweighs the operational risk, and gain support for incrementally introducing these automations with the IT Operations teams.
Ultimately, the SOAR platform should be the brains of the process: it should capture all the relevant status changes/updates and detailed audit trails, and function as the system of record for all automated and human actions. It is this system of record that generates the reporting that serves as the catalyst for process changes and Respond automation adoption.
Syncurity believes a more effective Respond phase requires SOAR platforms to go beyond basic system automations to provide more comprehensive process automation supported by workflow, case management and lessons learned/knowledge management. By highlighting the Cyber vs. Operational risk trade-offs and leveraging mature Playbooks that include a mix of human and machine actions, Enterprises and Security Services Providers can better manage risk, further shrink their exposure windows, and engage the extended enterprise and supply chain.
Syncurity also believes these key points are supported by the Gartner SOAR research and Market Guide. For Enterprises and Security Services Providers looking to evaluate SOAR platforms, the “keys” to success in the Respond phase are:
- Triage=Reducing Risk; Respond=Managing Risk
- Focus on holistic Process vs. Systems Automation
- Enable Respond Automation Adoption by Demonstrating Cyber vs. Operational Risk
In the next and final blog of this four-part series, we’ll examine keys to success in the “Prioritize” SOAR phase.
What do you think? Let us know at https://www.syncurity.net
Gartner, Market Guide for Security Orchestration, Automation and Response Solutions, Claudio Neiva, Craig Lawson, Toby Bussa, Gorka Sadowski, 27 June 2019. Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings or other designation. Gartner research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.