Investment Notes

When the Cloud Becomes the Threat Surface: Backing Orbit Security

Martijn Hoekstra

Cloud workload security has a signal problem that is different in kind from the one facing traditional endpoint or network security. In an on-premises environment, the expected behaviour of a server is relatively stable: the running processes are known, the network connections are predictable, the filesystem changes during normal operation are bounded. Anomaly detection in this context is tractable — baseline normal, alert on deviation. In a dynamic cloud environment running containerised workloads on orchestration platforms like Kubernetes, the baseline is not stable. Containers are ephemeral by design: they start, run a bounded workload, and terminate. The same Kubernetes cluster might run thousands of container instances per day, each with a short and highly variable lifecycle. Defining "normal" at the process level is meaningless because the processes are different for each workload type, and the combination of workloads running at any given time changes continuously.

The detection engineering challenge this creates is that the useful signal in cloud workload monitoring is not at the process or filesystem layer — it is at the API and control plane layer. An attacker who has compromised a workload in a cloud environment will attempt to escalate privilege by querying the instance metadata service (IMDS) for attached IAM role credentials, or by making API calls to the cloud provider's control plane to enumerate available resources, create new access credentials, or modify security group rules. These actions leave traces in cloud-provider audit logs — AWS CloudTrail, Azure Activity Log, GCP Cloud Audit Logs — but correlating them into an attack narrative requires understanding the expected API call patterns for each workload type and flagging deviations that are consistent with lateral movement or privilege escalation intent, not just high-volume or unusual-time calls.
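To make the control-plane point concrete, here is a minimal sketch of flagging audit-log events against the calls a workload type is expected to make. It assumes events have been reduced to dictionaries carrying the CloudTrail record fields `eventSource` and `eventName`. The workload names, the `EXPECTED_CALLS` table, and the category labels are hypothetical, chosen for illustration rather than taken from any vendor's implementation.

```python
# Control-plane calls commonly associated with enumeration, credential
# creation, or security group changes. The category labels are illustrative.
SENSITIVE_CALLS = {
    ("iam.amazonaws.com", "CreateAccessKey"): "credential_creation",
    ("iam.amazonaws.com", "ListRoles"): "enumeration",
    ("ec2.amazonaws.com", "DescribeInstances"): "enumeration",
    ("ec2.amazonaws.com", "AuthorizeSecurityGroupIngress"): "security_group_change",
}

# Hypothetical per-workload expectations: which calls each workload type
# legitimately makes during normal operation.
EXPECTED_CALLS = {
    "web-frontend": {("s3.amazonaws.com", "GetObject")},
    "inventory-scanner": {("ec2.amazonaws.com", "DescribeInstances")},
}


def flag_events(workload, events):
    """Return (eventName, category) for sensitive calls the workload is not expected to make."""
    expected = EXPECTED_CALLS.get(workload, set())
    findings = []
    for event in events:
        call = (event["eventSource"], event["eventName"])
        if call in expected:
            continue  # normal for this workload type, even if sensitive in general
        category = SENSITIVE_CALLS.get(call)
        if category is not None:
            findings.append((event["eventName"], category))
    return findings


sample_events = [
    {"eventSource": "s3.amazonaws.com", "eventName": "GetObject"},
    {"eventSource": "ec2.amazonaws.com", "eventName": "DescribeInstances"},
    {"eventSource": "iam.amazonaws.com", "eventName": "CreateAccessKey"},
]
frontend_findings = flag_events("web-frontend", sample_events)
scanner_findings = flag_events("inventory-scanner", sample_events[:2])
```

The same `DescribeInstances` call is flagged for the web frontend and ignored for the inventory scanner. The call is judged against what the workload type is expected to do, not by volume or time of day.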

When we evaluated Orbit Security in 2024, the core technical question was how their anomaly detection model handled the baseline stability problem in dynamic cloud environments. Their approach used workload-identity-anchored behavioural profiling rather than instance-level or container-level profiling. Containers are ephemeral but workload types are stable (the same microservice runs thousands of container instances over its lifetime), so you can build a stable behavioural baseline at the workload identity level and flag deviations at the instance level against that baseline. An API call to query IAM credentials from a container running a stateless web frontend is anomalous regardless of whether it is the forty-second or the four-hundred-and-second instance of that workload. The detection logic does not require having seen the specific container before; it requires understanding what that workload type should be doing.
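The idea of anchoring the baseline to workload identity can be sketched as follows. This is an illustrative toy under stated assumptions, not Orbit's model: the class name, the frequency-based rule, and the `min_observations` and `rare_fraction` parameters are all hypothetical. The point it shows is that the container ID is accepted but deliberately never used as a key.

```python
from collections import Counter


class WorkloadBaseline:
    """Toy behavioural baseline keyed by workload identity, not by container."""

    def __init__(self, min_observations=50, rare_fraction=0.01):
        self.min_observations = min_observations  # hypothetical warm-up size
        self.rare_fraction = rare_fraction        # hypothetical rarity cut-off
        self._counts = {}                         # workload_id -> Counter of API calls

    def observe(self, workload_id, container_id, api_call):
        # container_id is ignored on purpose: instances are ephemeral,
        # so all of them contribute to one shared per-workload profile.
        self._counts.setdefault(workload_id, Counter())[api_call] += 1

    def is_anomalous(self, workload_id, container_id, api_call):
        """True or False once enough data exists; None while the baseline is too thin."""
        counts = self._counts.get(workload_id)
        if counts is None:
            return None
        total = sum(counts.values())
        if total < self.min_observations:
            return None
        return counts[api_call] / total < self.rare_fraction


baseline = WorkloadBaseline()
for i in range(100):
    # One hundred different short-lived containers of the same workload type.
    baseline.observe("web-frontend", f"container-{i}", "s3:GetObject")

# A container never seen before is still judged against the workload profile.
credential_call = baseline.is_anomalous("web-frontend", "container-402", "iam:CreateAccessKey")
routine_call = baseline.is_anomalous("web-frontend", "container-402", "s3:GetObject")
```

Here `credential_call` is `True` and `routine_call` is `False`, even though `container-402` contributed nothing to the baseline.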

We should be honest about a limitation in cloud workload anomaly detection that no platform has fully solved: distinguishing legitimate operational complexity from attack behaviour. A cloud operations team responding to a production incident at 3am will perform API calls that look anomalous by any behavioural baseline: querying unusual resources, modifying configurations, accessing credentials outside normal patterns. These are operationally legitimate actions under incident response conditions. The detection systems that produce the most value are those that can consume operational context (scheduled maintenance windows, active incident response tickets, change management records) and apply that context to adjust detection thresholds appropriately. This requires integrations with ITSM platforms and operational tooling, an area in which cloud security platforms have historically underinvested because they treated the detection layer as separate from the operational context layer. Orbit's roadmap toward contextual detection, in which operational metadata is used to interpret anomalies rather than scoring them in isolation, is technically sound, and it is the right direction for cloud workload security platforms aiming to reduce alert fatigue while maintaining detection coverage.
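A minimal sketch of what applying operational context to a detection threshold might look like is given below. Everything in it is an assumption made for illustration: the `OperationalContext` fields, the score scale from 0 to 1, and the specific threshold values. In practice the context would come from ITSM and change management integrations rather than being passed in by hand.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class OperationalContext:
    # Hypothetical inputs that an ITSM or change management integration would supply.
    maintenance_windows: list   # list of (start, end) datetime pairs
    incident_responders: set    # principals attached to an open incident ticket


def effective_threshold(base, principal, when, ctx, relaxed=0.95):
    """Raise the alert threshold when operational context explains unusual activity."""
    in_window = any(start <= when < end for start, end in ctx.maintenance_windows)
    on_incident = principal in ctx.incident_responders
    if in_window or on_incident:
        return max(base, relaxed)
    return base


def should_alert(score, principal, when, ctx, base=0.7):
    # The threshold is relaxed, not removed: very high scores still alert.
    return score >= effective_threshold(base, principal, when, ctx)


ctx = OperationalContext(
    maintenance_windows=[(datetime(2024, 5, 1, 22, 0), datetime(2024, 5, 1, 23, 0))],
    incident_responders={"oncall-engineer"},
)
three_am = datetime(2024, 5, 2, 3, 0)

responder_alert = should_alert(0.80, "oncall-engineer", three_am, ctx)
other_alert = should_alert(0.80, "web-frontend-role", three_am, ctx)
extreme_alert = should_alert(0.99, "oncall-engineer", three_am, ctx)
```

The same anomaly score of 0.80 is suppressed for the on-call responder and raised for an unrelated principal, while a score of 0.99 alerts regardless. Raising the threshold rather than suppressing alerts outright is one way to cut alert fatigue without giving an attacker a blind spot during incidents.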