For security leaders and practitioners, it seems like developers and IT teams get all the cool toys, and security pros get stuck with the hand-me-downs. Dev was first to cloud, IT followed, and security warily joined in. IT had patch management while security just scanned to see whether patches were missing; and security orchestration, automation, and response was new to security pros, but devs called it “middleware” back in the ’90s. History will repeat itself yet again with a term new to security teams but one familiar to developers: observability.
In the world of developers, observability means understanding what code does in production, how it works, how it fails, and how it’s experienced by end users. Much like a side-channel attack, observability is about inferring the internal state of a system from its external outputs (but for good reasons). It combines logs, metrics, and traces — three key aspects of understanding a complex system: what the output was, whether the application maintains consistency, and what path was taken to get there.
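To make the three signals concrete, here is a minimal Python sketch of one request emitting a log, a metric, and a trace. Everything in it is an assumption made for illustration: the `Telemetry` sink, the `handle_login` function, and the step names are hypothetical, and a real system would use an instrumentation library rather than in-memory lists.

```python
import time
import uuid
from dataclasses import dataclass, field


@dataclass
class Telemetry:
    """Hypothetical in-memory sink for the three signal types."""
    logs: list = field(default_factory=list)      # what the output was
    metrics: dict = field(default_factory=dict)   # aggregate counters over time
    spans: list = field(default_factory=list)     # what path each request took


def handle_login(user: str, password_ok: bool, sink: Telemetry) -> bool:
    """Illustrative request handler that emits all three signals."""
    trace_id = uuid.uuid4().hex  # shared ID that ties the signals together
    start = time.perf_counter()

    # Trace: record the steps this particular request went through.
    path = ["parse_request", "check_credentials"]
    path.append("issue_session" if password_ok else "reject")

    # Log: a discrete record of what happened.
    sink.logs.append(
        {"trace_id": trace_id, "event": "login", "user": user, "success": password_ok}
    )

    # Metric: a counter that shows whether behavior stays consistent over time.
    key = "login.success" if password_ok else "login.failure"
    sink.metrics[key] = sink.metrics.get(key, 0) + 1

    sink.spans.append(
        {"trace_id": trace_id, "path": path, "duration_s": time.perf_counter() - start}
    )
    return password_ok
```

Because the log entry and the span share a `trace_id`, a reviewer can move from "a login failed" to "this is the path that request took" without guessing.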
For security pros, this definition seems deceptively simple and not entirely relevant, because it hides a dramatic shift in how we log, monitor, detect, and respond to bad — and good — things that happen in enterprise environments. Let’s dive into how and why that will happen by examining the issues with telemetry and instrumentation:
- Logs require known problems and states. Someone writes, classifies, and programs device logs. Whether it’s Common Event Format (CEF) or some other logging framework, the categories, text, details, classifications, severity, and anything else included in the log require someone to specify and describe them. Logs therefore require foreknowledge of an event, and they are often wrong because they are built for the broadest possible use case.
- Logs broadcast far too much unimportant information. The real issues and problems get lost in the noise. The ability to decide levels of logging already taints the information sent to security teams. Some log data never gets sent, some gets filtered out, some never gets reviewed, and some actually matters. This contributes to alert fatigue.
- Logs lack context and exist in silos. Every log is sent in a vacuum. Device logs are inherently narcissistic: The hardware and software that send logs know only about themselves, not the wider environment, where they are deployed, or what does and doesn’t matter given those details.
All of this is to say: Logging alone is not enough. Observability, which isn’t quite ready for prime time for security leaders yet but will get there, addresses this differently:
- Observability comes from what happens, not what is known. Inherently, observability features actions and behaviors, not log data. It doesn’t require an exhaustive taxonomy and broad generalizations because it uses states — expected and unexpected — as its source.
- Observability captures attackers’ exploitation techniques. Sure, systems can send crash dumps and tell you when some monitoring tool was turned off. But observability comes from what was done before, during, and after a crash occurred, so developers can trace, debug, and examine exactly what was happening and why it happened. This alone brings a foundational shake-up to how security works today. Instead of getting information and details about why a system might have rebooted, you understand the underlying details of what was happening that caused it to crash.
- Observability eliminates silos. Modern environments with DevOps, microservices, and API-based integrations isolate functions to make it easier to troubleshoot and recover. But they place enormous value on machine telemetry and environmental factors present during execution. Data about an environment can aggregate, even if specific functions remain isolated. And, as we’ve mentioned before, microservice architectures lend themselves well to microsegmentation, one of the elements of Zero Trust.
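As a rough illustration of the last two points, the Python sketch below groups events from isolated sources by a shared trace ID and then pulls out what happened before a crash. The event fields (`ts`, `source`, `trace_id`, `event`), the source names, and the sample data are all hypothetical, chosen only to show the idea.

```python
from collections import defaultdict


def correlate(events):
    """Group events from separate sources by trace_id, ordered by timestamp."""
    timelines = defaultdict(list)
    for event in events:
        timelines[event["trace_id"]].append(event)
    for timeline in timelines.values():
        timeline.sort(key=lambda e: e["ts"])
    return dict(timelines)


def before_crash(timeline):
    """Return the events that precede the first crash in a timeline."""
    for index, event in enumerate(timeline):
        if event["event"] == "crash":
            return timeline[:index]
    return []  # no crash recorded for this trace


# Hypothetical events, each emitted by a component that knows only about itself.
SAMPLE_EVENTS = [
    {"ts": 3, "source": "host-agent", "trace_id": "t1", "event": "crash"},
    {"ts": 1, "source": "api-gateway", "trace_id": "t1", "event": "request_received"},
    {"ts": 2, "source": "auth-service", "trace_id": "t1", "event": "oversized_token_parsed"},
    {"ts": 1, "source": "api-gateway", "trace_id": "t2", "event": "request_received"},
]

if __name__ == "__main__":
    timelines = correlate(SAMPLE_EVENTS)
    for event in before_crash(timelines["t1"]):
        print(event["ts"], event["source"], event["event"])
```

Each source on its own reports an unremarkable event; only the combined, ordered timeline shows which request and which parsing step led up to the crash.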
What should security leaders do right now?
Keep your eyes open. You’ll start to see observability emerge at events and from vendors. It will become a talking point, perhaps even a new market segment. Given the skepticism all security pros possess, we expect plenty of people to dismiss this one. Don’t.
These capabilities stand to become one of the major shifts this industry encounters, with downstream effects on application security, security operations, extended detection and response, managed detection and response, security analytics, and your overall situational awareness.
Stay tuned as we produce more content on this topic. Allie and Sandy will be writing about application security and the security operations center (SOC). Jeff and Allie will be talking about SOC roles and responsibilities and developing a detection charter for enterprise security teams in the near future.