About the Tool
Whether it involves proactively and iteratively searching through networks to detect and isolate advanced threats or conducting digital forensics and incident response, cybersecurity work centres on studying security alerts and taking action to investigate, verify, and escalate them. A security operations analyst must triage alerts, collect evidence for investigations, verify the authenticity of collected data, and present the information in a format friendly to downstream consumers. This task is challenging because security alerts are often inscrutable: sophisticated machine learning algorithms raise the alerts, and the analyst has no insight into the implicit rule the algorithm applied. Often an alert is triggered when a machine learning algorithm detects a so-called anomaly, a deviation from normal or expected behaviour. The implicit assumption is that malicious activity exhibits characteristics not observed during regular usage. This assumption is inadequate because not all statistical anomalies constitute security events. For example, an increase in network traffic might be statistically interesting, but from a security point of view it rarely represents an attack. Because the underlying assumption of anomaly detection is too lenient, an operator must deal with many false alarms, alarms that are difficult to rationalise and contextualise.
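To make the gap between "statistically anomalous" and "malicious" concrete, here is a minimal sketch of the kind of implicit rule such a detector applies. The feature (outbound bytes per minute), the baseline distribution, and the z-score threshold are all hypothetical, chosen only to illustrate how a benign traffic spike gets flagged:

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical baseline: outbound bytes/minute for a single workstation.
baseline = rng.normal(2_000, 200, size=1_000)

def is_anomaly(x, sample, threshold=3.0):
    """Flag any point more than `threshold` standard deviations from the
    sample mean -- the implicit rule of a simple statistical detector."""
    z = abs(x - sample.mean()) / sample.std()
    return z > threshold

# A large software update triples traffic: statistically anomalous,
# but operationally benign -- exactly the kind of false alarm an
# analyst must rationalise.
print(is_anomaly(6_000, baseline))   # prints True
print(is_anomaly(2_100, baseline))   # prints False
```

The detector cannot distinguish the software update from data exfiltration; both simply deviate from the baseline, which is why the analyst's context is indispensable.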
Substantial research on explainable anomaly detection for cybersecurity applications has focused on methods that highlight which features of a security event (a data point) contribute most to the classification of the event as an anomaly. For example, prevailing explainable anomaly detection algorithms may tell an operator that the connection duration and the number of operations performed as root during the connection contributed most. We give more information by quantifying precisely how much the attributes of an event would have to change for the event to no longer be considered an anomaly. We achieve this by projecting an anomalous event onto the decision boundary of the underlying machine learning algorithm. Projecting onto the decision boundary allows us to synthesise a new event that is typical yet close to the anomalous one. By comparing the anomalous and synthesised events, one can see how much each aspect of the anomalous event would need to change to make it typical. Such a procedure helps an analyst understand the limits of the implicit rule used by the machine learning algorithm. Armed with this information, they can immediately apply their domain knowledge and judgement to flag false alarms and recalibrate the anomaly detection algorithm.
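One simple way to approximate such a projection is to bisect along the segment between the anomalous event and a typical reference event until the model's decision function changes sign. The sketch below is illustrative only: the two features (connection duration and root operations), the use of scikit-learn's `OneClassSVM` as a stand-in for the underlying detector, and the choice of the training mean as the reference point are all assumptions, not the tool's actual implementation:

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
# Hypothetical feature space: [connection duration (s), root operations].
normal = rng.normal(loc=[30.0, 1.0], scale=[5.0, 0.5], size=(500, 2))
model = OneClassSVM(nu=0.05, gamma="scale").fit(normal)

def project_to_boundary(x_anom, x_ref, model, tol=1e-4):
    """Bisect along the segment from an anomalous point (f < 0) to a
    typical reference point (f > 0) until we land on the decision
    boundary, where the decision function f is approximately zero."""
    lo, hi = 0.0, 1.0          # lo -> x_anom, hi -> x_ref
    for _ in range(100):
        mid = (lo + hi) / 2
        x_mid = x_anom + mid * (x_ref - x_anom)
        if model.decision_function([x_mid])[0] < 0:
            lo = mid           # still anomalous: move toward x_ref
        else:
            hi = mid           # already typical: move toward x_anom
        if hi - lo < tol:
            break
    return x_anom + hi * (x_ref - x_anom)

x_anom = np.array([120.0, 8.0])   # the flagged event
x_ref = normal.mean(axis=0)       # a typical event to project toward
x_boundary = project_to_boundary(x_anom, x_ref, model)
delta = x_boundary - x_anom       # per-attribute change needed to
                                  # make the event typical
```

The vector `delta` is the counterfactual the analyst reads: it states, per attribute, how far the event sits beyond the implicit rule's boundary. Note that a line search toward a single reference point is only an approximation of the closest boundary point.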
We have developed and demonstrated an interactive tool capable of giving unprecedented insight into anomalies that arise in the flow, delivery and cadence of communication between devices on a computer network. Our approach bridges the semantic gap between the implicit rule a machine learning algorithm uses to detect an anomaly and the question of what that anomaly means and implies for the analyst. Answering that question goes to the heart of the difference between finding "abnormal activity" and finding "attacks". When working with an anomaly detection system, it is crucial to understand the operationally relevant activity it can detect and the blind spots the system will necessarily have. Our tool helps an analyst understand what the system is doing.
We give an operator more insight into the implicit rule that a machine learning algorithm used to trigger an alert so that they can apply their domain knowledge and context to assess the warnings properly and prioritise their efforts. We also introduce a rapid feedback loop into the anomaly detection procedure. We believe that one can substantially improve the maintainability and flexibility of an anomaly detection system by allowing a security analyst to report false positives and false negatives directly and by adjusting model parameters based on that feedback.
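The feedback loop described above can be sketched roughly as follows. Everything here is an assumption for illustration: the `FeedbackLoop` class is hypothetical, `OneClassSVM` again stands in for the underlying detector, and halving or doubling the `nu` parameter is one crude heuristic for loosening or tightening the boundary, not the tool's actual recalibration strategy:

```python
import numpy as np
from dataclasses import dataclass, field
from sklearn.svm import OneClassSVM

@dataclass
class FeedbackLoop:
    """Hypothetical sketch: collect analyst reports and retrain when
    they suggest the model is too strict or too lenient."""
    model: OneClassSVM
    data: np.ndarray
    false_positives: list = field(default_factory=list)
    false_negatives: list = field(default_factory=list)

    def report_false_positive(self, event):
        self.false_positives.append(event)

    def report_false_negative(self, event):
        self.false_negatives.append(event)

    def recalibrate(self):
        # Many false positives -> loosen the boundary (smaller nu);
        # many false negatives -> tighten it (larger nu).
        nu = self.model.nu
        if len(self.false_positives) > len(self.false_negatives):
            nu = max(0.01, nu / 2)
        elif len(self.false_negatives) > len(self.false_positives):
            nu = min(0.5, nu * 2)
        self.model = OneClassSVM(nu=nu, gamma="scale").fit(self.data)
        return self.model

data = np.random.default_rng(0).normal(size=(100, 2))
loop = FeedbackLoop(model=OneClassSVM(nu=0.05, gamma="scale").fit(data),
                    data=data)
loop.report_false_positive([5.0, 5.0])   # analyst flags a false alarm
loop.recalibrate()                        # boundary is loosened
```

The point of the sketch is the shape of the loop, report then recalibrate, rather than the specific update rule; in practice the reported events themselves would also inform the retraining.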