Causely's Capabilities That Power Causal Analysis
Shmuel Kliger
January 15, 2025
Our Causal Reasoning Platform is a model-driven, purpose-built AI system delivering multiple analytics built on a common data model.
There are 13 tenets of Causely that are required to continuously assure application reliability and performance. With Causely, teams automate root cause analysis, prevent SLO violations, and gain transparency and organizational alignment.
Out-of-the-box Causal Models
The Causal Reasoning Platform is driven by Causal Models. Causely is delivered with built-in Causal Models that capture the root causes that can occur in cloud-native environments. These Causal Models enable Causely to automatically pinpoint root causes out-of-the-box as soon as it is deployed in an environment.
There are a few important details to highlight about these Causal Models:
- They capture potential root causes in a broad range of entities including applications, databases, caches, messaging, load balancers, DNS, compute, storage, and more.
- They describe how the root causes will propagate across the entire environment and what symptoms may be observed when each of the root causes occurs.
- They are completely independent from any specific environment and are applicable to any cloud-native application environment.
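To make the idea of an environment-independent model concrete, here is a minimal sketch. It assumes a hypothetical encoding in which a Causal Model lists, for each entity type, the root causes that type can have and how its symptoms propagate to the entities that call it. The names (`CAUSAL_MODEL`, `instantiate`, the entity types and symptom names) are illustrative and are not Causely's actual model format. Probabilities are omitted for brevity.

```python
# Hypothetical, environment-independent causal model. Nothing in it names a
# concrete service; it only describes entity *types*.
CAUSAL_MODEL = {
    "database": {
        # root cause -> symptom it produces on the same entity
        "root_causes": {"disk_full": "write_errors"},
        # symptom on this entity -> symptom it causes on each caller
        "propagates_to_callers": {"write_errors": "error_rate_high"},
    },
    "service": {
        "root_causes": {"cpu_throttled": "latency_high"},
        "propagates_to_callers": {
            "error_rate_high": "error_rate_high",
            "latency_high": "latency_high",
        },
    },
}


def instantiate(model, entities, calls):
    """Bind the generic model to one environment.

    entities: entity name -> entity type
    calls: (caller, callee) pairs
    Returns causality edges as (cause, effect) pairs of "entity:event" names.
    """
    edges = []
    for name, etype in entities.items():
        for root_cause, symptom in model[etype]["root_causes"].items():
            edges.append((f"{name}:{root_cause}", f"{name}:{symptom}"))
    for caller, callee in calls:
        rules = model[entities[callee]]["propagates_to_callers"]
        for symptom, caused in rules.items():
            edges.append((f"{callee}:{symptom}", f"{caller}:{caused}"))
    return edges


# The same model applied to one small, hypothetical environment.
entities = {"checkout": "service", "orders": "service", "orders-db": "database"}
calls = [("checkout", "orders"), ("orders", "orders-db")]
for cause, effect in instantiate(CAUSAL_MODEL, entities, calls):
    print(cause, "->", effect)
```

Pointing `instantiate` at a different set of entities and calls reuses the model unchanged, which is the property the last bullet describes.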
Out-of-the-box Attribute Dependency Models
Causely is delivered with built-in Attribute Dependency Models that extend the Causal Models to capture the dependencies between attributes across entities and the constraints those attributes must satisfy. These Attribute Dependency Models enable Causely to automatically correlate performance trends across the entire environment, to determine the desired state of the environment (a state where all applications meet their objectives while satisfying the constraints they have to operate within), and to identify the actions that keep the environment in that state.
There are a few important details to highlight about these Attribute Dependency Models:
- They can capture attribute dependencies in a broad range of entities including services, applications, databases, caches, messaging, load balancers, DNS, compute, storage, and more.
- They describe the functions that relate the attributes to one another; more importantly, these functions can be learned.
- They describe the desired state in terms of the applications' goals and the constraints they have to operate within.
- They are completely independent from any specific environment and are applicable to any cloud-native application environment.
Automatic Topology Discovery
Cloud-native environments are a tangled web of applications and services layered over complex and dynamic infrastructure. Causely automatically discovers all the entities in the environment including the applications, services, databases, caches, messaging, load balancers, compute, storage, etc., as well as how they all relate to each other.
For each discovered entity, Causely automatically discovers its:
- Connectivity - the entities it is connected to and the entities it is communicating with horizontally
- Layering - the entities it is vertically layered over or underlying
- Composition - what the entity itself is composed of
Causely automatically stitches all of these relationships together to generate a Topology Graph, which is a clear dependency map of the entire environment. This Topology Graph updates continuously in real time, accurately representing the current state of the environment at all times.
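A minimal sketch of how the three relationship kinds could be stitched into one graph, assuming each discovered relationship arrives as a hypothetical (source, kind, target) triple. The `TopologyGraph` class and the entity names are illustrative only, not Causely's API; the `remove` method stands in for the continuous updates described above.

```python
from collections import defaultdict

# Relationship kinds from the text; the string labels are illustrative.
CONNECTIVITY = "connectivity"  # horizontal: connected to / communicates with
LAYERING = "layering"          # vertical: layered over an underlying entity
COMPOSITION = "composition"    # what the entity is composed of


class TopologyGraph:
    """Hypothetical dependency map keyed by entity name."""

    def __init__(self):
        # entity -> relationship kind -> set of related entities
        self.edges = defaultdict(lambda: defaultdict(set))

    def add(self, source, kind, target):
        self.edges[source][kind].add(target)

    def remove(self, source, kind, target):
        # Continuous discovery also retracts relationships that disappear.
        self.edges[source][kind].discard(target)

    def related(self, entity, kind):
        return sorted(self.edges[entity][kind])


topo = TopologyGraph()
topo.add("checkout", CONNECTIVITY, "payments")
topo.add("checkout", CONNECTIVITY, "orders-db")
topo.add("checkout", LAYERING, "pod-checkout-1")
topo.add("pod-checkout-1", LAYERING, "node-a")
topo.add("orders-db", COMPOSITION, "orders-db-primary")

print(topo.related("checkout", CONNECTIVITY))  # ['orders-db', 'payments']
```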
Automatic Causality Mapping Generation
Using the out-of-the-box Causal Models and the Topology Graph as described above, Causely automatically generates a causal mapping between all the possible root causes and the symptoms each of them may cause, along with the probability that each symptom would be observed when the root cause occurs.
Causely automatically generates two data structures to capture the causality mapping:
- A Causality Graph is a directed acyclic graph (DAG), where the nodes are root causes and symptoms and the edges represent the causality, i.e., an edge from node A to node B means that A may cause B. The edges are labeled with the probability of the causality.
- A Codebook is a table where the columns represent the root causes and the rows represent the symptoms. Each column is a vector of probabilities defining a unique signature of the root cause.
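The relationship between the two structures can be sketched by deriving a Codebook from a small Causality Graph. The graph below is hypothetical, and the rule for combining edge probabilities (a symptom's probability is that of its single most probable causal path) is an assumption chosen for simplicity, not Causely's documented method.

```python
from collections import defaultdict

# Hypothetical causality graph: (cause, effect, probability cause produces effect).
EDGES = [
    ("db_disk_full", "db_write_errors", 0.9),
    ("db_write_errors", "orders_error_rate_high", 0.8),
    ("orders_error_rate_high", "checkout_error_rate_high", 0.7),
    ("cache_eviction_storm", "db_load_high", 0.6),
    ("db_load_high", "orders_latency_high", 0.9),
    ("orders_latency_high", "checkout_latency_high", 0.8),
]
ROOT_CAUSES = ["db_disk_full", "cache_eviction_storm"]


def build_codebook(edges, root_causes):
    """Return (symptom rows, {root cause: column of probabilities})."""
    children = defaultdict(list)
    nodes = set()
    for cause, effect, p in edges:
        children[cause].append((effect, p))
        nodes.update((cause, effect))
    symptoms = sorted(nodes - set(root_causes))

    def walk(node, prob, best):
        # Assumption: keep the most probable path to each symptom
        # (product of edge probabilities along that path).
        for child, p in children[node]:
            if prob * p > best.get(child, 0.0):
                best[child] = prob * p
                walk(child, prob * p, best)

    codebook = {}
    for rc in root_causes:
        best = {}
        walk(rc, 1.0, best)
        codebook[rc] = [round(best.get(s, 0.0), 3) for s in symptoms]
    return symptoms, codebook


symptoms, codebook = build_codebook(EDGES, ROOT_CAUSES)
print(f"{'symptom':26}" + "".join(f"{rc:>22}" for rc in ROOT_CAUSES))
for i, s in enumerate(symptoms):
    print(f"{s:26}" + "".join(f"{codebook[rc][i]:>22}" for rc in ROOT_CAUSES))
```

Each column printed is one root cause's signature; because the two root causes reach different symptoms, their columns differ.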
Automatic Attribute Dependency Graph Generation
Using the out-of-the-box Attribute Dependency Models and the Topology Graph as described above, Causely automatically generates an Attribute Dependency Graph.
The Attribute Dependency Graph is a directed acyclic graph (DAG) where:
- The nodes are attributes.
- The edges represent a dependency between the attributes, i.e., an edge from attribute A to attribute B means that the value of B is a function of the value of A.
- The edges are labeled with the functions. The functions can be defined in the Attribute Dependency Model or can be learned.
- A subset of the nodes might be decorated with a constraint the attribute must satisfy.
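The structure above can be sketched as a small DAG whose edges carry functions and whose nodes may carry constraints. Everything here is hypothetical: the attribute names, the hand-written linear functions (which the text notes could instead be learned), and the CPU constraint. The sketch uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+) to evaluate attributes in dependency order.

```python
from graphlib import TopologicalSorter

# Hypothetical attribute dependency graph: derived attribute ->
# (attributes it depends on, function computing it from them).
DEPENDENCIES = {
    "orders.request_rate": (["checkout.request_rate"], lambda c: 1.5 * c),
    "db.query_rate": (["orders.request_rate"], lambda o: 3.0 * o),
    "db.cpu_util": (["db.query_rate"], lambda q: q / 2000.0),
}

# Constraints decorate a subset of the nodes.
CONSTRAINTS = {"db.cpu_util": lambda v: v <= 0.8}


def evaluate(inputs):
    """Propagate input attribute values through the DAG in dependency order."""
    graph = {node: set(deps) for node, (deps, _) in DEPENDENCIES.items()}
    values = dict(inputs)
    for node in TopologicalSorter(graph).static_order():
        if node in DEPENDENCIES:
            deps, fn = DEPENDENCIES[node]
            values[node] = fn(*(values[d] for d in deps))
    return values


def violations(values):
    """Names of constrained attributes whose constraint is not satisfied."""
    return sorted(a for a, ok in CONSTRAINTS.items() if not ok(values[a]))


state = evaluate({"checkout.request_rate": 200.0})
print(state["db.cpu_util"], violations(state))  # 0.45 []
```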
Contextual Presentation
Results are presented intuitively in the Causely UI, enabling users to see the root causes, related symptoms, and service impacts, and to initiate remedial actions. The results can also be sent to external systems to alert the teams responsible for remediating root cause problems, to notify teams whose services are impacted, and to initiate incident response workflows.
The Models, the automated topology discovery, and the automatic generation of the Causality Mapping and the Attribute Dependency Graph empower multiple analytics that together deliver an autonomous application reliability system that continuously assures application reliability and performance.
Root Cause Analysis
Causely uses the Codebook described above to automatically pinpoint root causes based on observed symptoms in real time. No configuration is required for Causely to immediately pinpoint a broad set of root causes (100+), ranging from application malfunctions to service congestion to infrastructure bottlenecks.
In any given environment, there can be tens of thousands of different root causes that may cause hundreds of thousands of symptoms. Causely prevents SLO violations by disentangling this mess, pinpointing the root causes that put your SLOs at risk, and driving remediation actions before SLOs are violated. For example, Causely proactively pinpoints whether a software update changes the performance behavior of dependent services before those services are impacted.
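Pinpointing a root cause from a codebook amounts to finding the signature that best explains the observed symptoms. The text does not say how Causely scores signatures, so this minimal sketch assumes one common choice: a log-likelihood that treats symptoms as independent given the root cause, with a small clamp (`EPS`, a hypothetical parameter) so a missing or spurious symptom lowers a score instead of ruling a root cause out. The codebook values are invented for illustration.

```python
import math

# Hypothetical codebook: each root cause has a column (its signature) giving
# the probability that it produces each symptom, in this row order.
SYMPTOMS = ["db_write_errors", "orders_error_rate_high", "db_load_high", "orders_latency_high"]
CODEBOOK = {
    "db_disk_full":         [0.9, 0.7, 0.0, 0.0],
    "cache_eviction_storm": [0.0, 0.0, 0.6, 0.5],
    "orders_cpu_throttled": [0.0, 0.3, 0.0, 0.9],
}
EPS = 0.01  # clamps probabilities so log() stays finite and noise is tolerated


def log_likelihood(signature, observed):
    """Log-probability of the observed pattern under one signature,
    assuming symptoms are independent given the root cause."""
    score = 0.0
    for p, seen in zip(signature, observed):
        p = min(max(p, EPS), 1 - EPS)
        score += math.log(p if seen else 1 - p)
    return score


def rank_root_causes(observed):
    """Root causes ordered from most to least likely to explain `observed`."""
    scores = {rc: log_likelihood(sig, observed) for rc, sig in CODEBOOK.items()}
    return sorted(scores, key=scores.get, reverse=True)


# orders latency is high but no database symptom is seen.
print(rank_root_causes([False, False, False, True]))
# ['orders_cpu_throttled', 'cache_eviction_storm', 'db_disk_full']
```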
Performance Analysis
Causely uses the Attribute Dependency Graph and Causality Graph to analyze microservices performance bottleneck propagation by automatically learning, based on your data:
- The correlation between the loads on services, i.e., how a change in the load of one service cascades to and impacts the loads on other services;
- The correlation between service latencies, i.e., how the latency of one service cascades to and impacts the latencies of other services; and
- The likelihood a service or resource bottleneck may cause performance degradations on dependent services.
Constraints Analysis
Causely uses the Attribute Dependency Graph decorated with performance goals like throughput and latency, and capacity or cost constraints, to automatically compute the desired state of the environment and to figure out what actions need to be taken to assure the goals are accomplished while satisfying the defined constraints.
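A toy version of "compute the desired state" is to pick the cheapest replica counts that meet each service's latency goal within a cost budget. All values are hypothetical, and the latency model (service time divided by one minus utilization, an M/M/1-style approximation with load split evenly across replicas) is an assumption, as is the brute-force search.

```python
import itertools

# Hypothetical services: offered load (req/s), per-replica capacity (req/s),
# base service time (ms), latency goal (ms), and cost per replica.
SERVICES = {
    "checkout": {"load": 300, "capacity": 100, "service_ms": 20, "goal_ms": 60, "cost": 1.0},
    "orders": {"load": 450, "capacity": 150, "service_ms": 10, "goal_ms": 30, "cost": 2.0},
}
BUDGET = 20.0  # hypothetical cost constraint


def latency_ms(svc, replicas):
    """Assumed queueing-style model: latency grows as utilization nears 1."""
    util = svc["load"] / (replicas * svc["capacity"])
    return float("inf") if util >= 1 else svc["service_ms"] / (1 - util)


def desired_state(budget=BUDGET, max_replicas=10):
    """Cheapest replica plan meeting every latency goal within the budget.

    Exhaustive search is used only for clarity; it grows exponentially
    with the number of services. Returns None if no plan fits.
    """
    names = list(SERVICES)
    best = None
    for counts in itertools.product(range(1, max_replicas + 1), repeat=len(names)):
        plan = dict(zip(names, counts))
        cost = sum(SERVICES[n]["cost"] * plan[n] for n in names)
        meets_goals = all(
            latency_ms(SERVICES[n], plan[n]) <= SERVICES[n]["goal_ms"] for n in names
        )
        if meets_goals and cost <= budget and (best is None or cost < best[1]):
            best = (plan, cost)
    return best


print(desired_state())  # ({'checkout': 5, 'orders': 5}, 15.0)
```

Lowering the budget below 15 in this example makes the goals and the constraint incompatible, and the function returns None.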
Prevention Analysis
Causely uses the Causality Graph and the Attribute Dependency Graph to enable prevention analysis. Teams are empowered to analyze the potential impacts or problems of changes.
Teams can ask "what if" questions to:
- Understand which services may be degraded if a potential problem were to occur
- Understand the impact a planned change may have on services
In doing so, teams can support planning of service/architecture changes, maintenance activities, and service resiliency improvements, and assure that none of these cause unexpected outages that may dramatically impact the business.
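The first kind of "what if" question can be sketched as a breadth-first walk of the Causality Graph from a hypothetical problem, collecting every entity whose symptoms it may trigger. The graph and the "entity:event" naming are illustrative assumptions; the second kind of question (impact of a planned change) would instead re-evaluate the Attribute Dependency Graph with the changed values.

```python
from collections import deque

# Hypothetical causality graph: cause -> effects it may produce.
CAUSES = {
    "orders-db:disk_full": ["orders-db:write_errors"],
    "orders-db:write_errors": ["orders:error_rate_high"],
    "orders:error_rate_high": ["checkout:error_rate_high", "returns:error_rate_high"],
    "cache:eviction_storm": ["orders-db:load_high"],
    "orders-db:load_high": ["orders:latency_high"],
    "orders:latency_high": ["checkout:latency_high"],
}


def what_if(problem):
    """Return the entities that may be degraded if `problem` were to occur."""
    seen, queue = set(), deque([problem])
    while queue:
        node = queue.popleft()
        for effect in CAUSES.get(node, []):
            if effect not in seen:
                seen.add(effect)
                queue.append(effect)
    # Node names are "entity:event"; report the affected entities.
    return sorted({node.split(":")[0] for node in seen})


print(what_if("orders-db:disk_full"))  # ['checkout', 'orders', 'orders-db', 'returns']
```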
Predictive Analysis
Causely uses machine learning (ML) together with the Causality Graph and the Attribute Dependency Graph for predictive analysis. Causely uses:
- ML to analyze the performance behavior of a small subset of attributes, e.g., the loads of some services, to predict their trends.
- The Attribute Dependency Graph and the predicted trends to predict the state of the environment, i.e., the state of all the attributes.
- The Causality Graph and the predicted future state to pinpoint potential bottlenecks and suggest actions that may prevent them.
In doing so, Causely pinpoints the actions required to prevent future degradations, SLO violations, or constraint violations.
Service Impact Analysis
Causely uses the Causality Graph to automatically analyze the impact of the root causes on SLOs, prioritizing the root causes based on the SLOs that are violated and those that are at risk. Causely automatically defines standard SLOs (based on latency and error rate) and uses machine learning to improve its anomaly detection over time. However, SLO definitions that already exist in another system can easily be incorporated in place of Causely’s default settings.
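A minimal sketch of the prioritization step, assuming hypothetical latency and error-rate SLOs, a hypothetical rule that an SLO is "at risk" once its metric reaches 80% of target, and a precomputed mapping from each root cause to the SLOs its symptoms reach in the Causality Graph. Root causes are ranked by the number of violated SLOs first and at-risk SLOs second.

```python
# Hypothetical SLOs of the two standard kinds: latency and error rate.
# Each maps to (service, metric, target).
SLOS = {
    "checkout-latency": ("checkout", "p99_latency_ms", 500),
    "checkout-errors": ("checkout", "error_rate", 0.01),
    "orders-latency": ("orders", "p99_latency_ms", 300),
}
AT_RISK_FRACTION = 0.8  # assumption: "at risk" once a metric reaches 80% of target


def slo_status(metrics):
    """Classify each SLO as 'violated' or 'at_risk'; healthy SLOs are omitted."""
    status = {}
    for slo, (service, metric, target) in SLOS.items():
        value = metrics[service][metric]
        if value > target:
            status[slo] = "violated"
        elif value >= AT_RISK_FRACTION * target:
            status[slo] = "at_risk"
    return status


def prioritize(impacts, status):
    """impacts: root cause -> SLOs its symptoms reach in the causality graph.
    Violated SLOs outrank at-risk ones."""
    def key(root_cause):
        states = [status.get(slo) for slo in impacts[root_cause]]
        return (states.count("violated"), states.count("at_risk"))
    return sorted(impacts, key=key, reverse=True)


metrics = {
    "checkout": {"p99_latency_ms": 450, "error_rate": 0.02},
    "orders": {"p99_latency_ms": 120, "error_rate": 0.0},
}
impacts = {
    "cache:eviction_storm": ["checkout-latency", "orders-latency"],
    "orders-db:disk_full": ["checkout-errors"],
    "returns:memory_leak": [],
}
print(prioritize(impacts, slo_status(metrics)))
# ['orders-db:disk_full', 'cache:eviction_storm', 'returns:memory_leak']
```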
Postmortem Analysis
Causely uses the Causality Graph to save the relevant context of prior incidents to enable postmortem analysis. Causely saves the root cause, the Causality Graph of the root cause, the symptoms in that graph, and the relevant attribute trends. Teams can review prior incidents and see clear explanations of why they occurred and what their effects were, simplifying the postmortem process and enabling actions to be taken to avoid recurrences.
See Causely for Yourself!
Book a meeting with the Causely team and let us show you how to transform the state of escalations and cross-organizational collaboration in cloud-native environments, or start your free trial now.