The concept is clear: Knowing the context of your data leads to better assessment and analysis of data risk.
Why Data Risk Analysis
A growing number of high-profile data breaches and the emergence of complex government regulations, such as GDPR and CCPA, have made data security a top priority for organizations that operate in multi-cloud environments. The explosion of data volume and variety that goes hand in hand with increasingly data-driven organizations has raised the priority even more.
It’s therefore critically important to accurately assess data risk.
Making it Context-Based
Data risk analysis in a cloud environment begins with an assessment of system and resource configurations. Misconfigured systems and resources create gaps in infrastructure security and open loopholes that attackers can exploit to gain access to the environment.
The likelihood and severity of attacks vary with factors inherent in the data, such as the amount and type of data involved. They also vary with factors surrounding the data, such as the users and identities that have access to it. Together, these factors make up the data context.
Next-generation data risk analysis takes into account not just the identification of misconfigurations but also the data context. This leads to a more accurate assessment of likely attacks and the potential loss associated with them. This next-generation analysis is termed contextual data integration or context-based data risk analysis.
A Holistic View
Cloud Security Posture Management (CSPM) assesses security in cloud environments by implementing configuration-based controls. Although CSPM tools are good at identifying gaps and misconfigurations in cloud infrastructure, they cannot identify the severity or cost to the company associated with data loss.
This is where DSPM (Data Security Posture Management) comes into play. DSPM combines misconfiguration and access governance with data context to provide a holistic view of risk in cloud and hybrid environments.
Data Attack Paths
Common misconfigurations in cloud resources like inadequate network security or encryption can leave important data vulnerable to unauthorized access. When these misconfigured resources also have access to sensitive data, a data attack path exists.
Preventing such attacks requires discovering all data stores and data types and assessing their sensitivity and relationships to other cloud resources.
Attack Path Detection Using Data Context
Normalyze, a pioneer in DSPM, supports identifying over 30 data stores and classifies the data types, along with any PII, PHI, or other sensitive data, present in them. Supported data stores include S3 buckets, RDS instances, Azure Blob containers, Azure PostgreSQL, and GCP Cloud SQL.
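As a rough illustration of what classifying sensitive data involves, the sketch below scans text records for two PII patterns using regular expressions. It is not Normalyze's classifier: the pattern set, the function names, and the sample records are all assumptions made for illustration, and production classifiers rely on far more robust detection and validation.

```python
import re
from collections import Counter

# Hypothetical, deliberately simple patterns. Real classifiers validate
# matches and cover many more data types.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def classify_record(text):
    """Return the set of PII types whose pattern matches the text."""
    return {name for name, pattern in PII_PATTERNS.items() if pattern.search(text)}


def summarize_store(records):
    """Count how many records in a data store contain each PII type."""
    counts = Counter()
    for record in records:
        counts.update(classify_record(record))
    return dict(counts)


# Made-up sample records standing in for the contents of one data store.
sample_records = [
    "order 1182 shipped to jane.doe@example.com",
    "applicant SSN 123-45-6789, contact j.smith@example.org",
    "no personal data in this log line",
]
summary = summarize_store(sample_records)
print(summary)
```

A per-store summary of this kind is one form the "data context" can take: it records which stores hold sensitive data and how much of it.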
The Normalyze Cloud Platform then uses patented attack path detection to build an attack graph to represent and highlight the links between resources and your sensitive data. An attack graph is a visual representation of all possible attack paths against a network.
The attack graph enables a comprehensive analysis of both direct and indirect exposure risks. With this powerful representation of your data context, you gain valuable insights into the security of your sensitive data, ensuring enhanced protection and compliance.
Attack graphs help identify potential attacks on a network or infrastructure. How reliably an attack path can be located depends on the context added to the graph: with too little context, detections are either noisy or missed. Information such as detected vulnerabilities or exploits on nodes adds context to the paths and helps identify attack paths with greater accuracy.
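To make the idea concrete, here is a minimal sketch of an attack graph stored as an adjacency list, with a breadth-first search for paths from the internet to sensitive data stores. It does not represent Normalyze's patented detection. The node names, the meaning assigned to an edge ("can reach or access"), and the `find_attack_paths` function are assumptions made for illustration.

```python
from collections import deque

# Hypothetical environment: an edge means "can reach or access".
EDGES = {
    "internet": ["ec2-web"],
    "ec2-web": ["role-app"],
    "role-app": ["s3-customer-data", "s3-public-assets"],
    "ec2-batch": ["role-batch"],
    "role-batch": ["s3-customer-data"],
}

# Data context: which stores were classified as holding sensitive data.
SENSITIVE = {"s3-customer-data"}


def find_attack_paths(edges, source, sensitive):
    """Breadth-first search for cycle-free paths from source to sensitive nodes."""
    paths = []
    queue = deque([[source]])
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node in sensitive:
            paths.append(path)
            continue
        for neighbor in edges.get(node, []):
            if neighbor not in path:  # skip nodes already on this path
                queue.append(path + [neighbor])
    return paths


paths = find_attack_paths(EDGES, "internet", SENSITIVE)
for path in paths:
    print(" -> ".join(path))
```

In this toy graph only one path is reported. The batch instance also reaches the sensitive bucket but is not reachable from the internet, and the public-assets bucket is reachable but not sensitive. Without the sensitivity label, the second case would show up as noise.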
A Focus on Buckets
S3 buckets are one of the most popular cloud storage options available for storing large amounts of data. In 2023, about 5TB of data and 6 million records were leaked due to misconfigured and exposed buckets. About 45% of these leaks were from S3 buckets.
S3 buckets can be left publicly exposed if bucket settings are misconfigured or if other misconfigured resources have access privileges to the buckets.
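One bucket-level setting that can be checked is S3 Block Public Access, which consists of four boolean flags. The sketch below reports which flags are not enabled in a configuration dictionary shaped like the `PublicAccessBlockConfiguration` that AWS returns. The function name and sample configurations are assumptions for illustration. A missing flag only means public access is not blocked by this setting, not that the bucket is actually public.

```python
# The four flags of an S3 PublicAccessBlockConfiguration.
PAB_FLAGS = (
    "BlockPublicAcls",
    "IgnorePublicAcls",
    "BlockPublicPolicy",
    "RestrictPublicBuckets",
)


def missing_public_access_blocks(config):
    """Return the Block Public Access flags that are not set to True."""
    return [flag for flag in PAB_FLAGS if not config.get(flag, False)]


# Hypothetical configurations for two buckets.
locked_down = {flag: True for flag in PAB_FLAGS}
partially_open = {
    "BlockPublicAcls": True,
    "IgnorePublicAcls": True,
    "BlockPublicPolicy": False,
    "RestrictPublicBuckets": False,
}

print(missing_public_access_blocks(locked_down))
print(missing_public_access_blocks(partially_open))
```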
In the following sections, we illustrate a few scenarios where Normalyze uses data context to identify data attack paths to S3 buckets. These examples show how context-based data risk analysis connects the dots between cloud resource misconfigurations and sensitive data exposure for better security decision making.
Scenario 1: Publicly Exposed Resource with Access to an S3 Bucket Containing Sensitive Data
An EC2 instance can be assigned an IAM role that grants access to an S3 bucket. Even if the S3 bucket has a secure configuration, it can be susceptible to attacks if the EC2 instance is publicly exposed.
If an attacker gains unauthorized access to a publicly exposed EC2 instance, then the attacker may connect to the EC2 instance, assume the instance profile’s role, and then access the data store.
The diagram below illustrates this scenario for an EC2 instance with access to data in an S3 bucket.
Figure 1: Demonstration of sensitive data exposure via a publicly accessible EC2 instance.
Identifying this potential attack path is valuable when considering the Principle of Least Privilege in the cloud environment. It also helps prioritize which occurrences of this issue should be fixed first, since the risk to the organization depends on the sensitivity of the data in the S3 bucket.
One well-known example is the 2019 Capital One breach, which affected the accounts and credit card applications of over 100 million Capital One customers. The attacker was able to gain access to a misconfigured AWS data store and fetch the records stored in it.
Scenario 2: Misconfigured Publicly Exposed Resource with Access to an S3 Bucket Containing Sensitive Data
Adding to the complexity, many public EC2 instances use an older version of the Instance Metadata Service (IMDSv1). A known behavior in IMDSv1 makes it possible for an attacker to steal IAM credentials from the EC2 instance without even getting code execution on it, by abusing existing applications running on the host. If an adversary can exploit a common vulnerability such as a server-side request forgery (SSRF) or XML external entity (XXE) flaw, an application running on the host can then be coerced into retrieving those IAM credentials.
If this EC2 instance has access to an S3 bucket with sensitive data, this vulnerability can be exploited to steal or access data stored on the bucket.
Note that the attack path is more likely to be exploited if the EC2 instance is running IMDSv1 (the vulnerable version). In other words, the misconfiguration increases the risk of the attack path, and security teams should prioritize it accordingly.
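A check for this misconfiguration can be sketched as follows. The function inspects a dictionary shaped like the response of the EC2 `describe_instances` API, where `MetadataOptions.HttpTokens` is `"optional"` when IMDSv1 is still permitted and `"required"` when IMDSv2 is enforced. The function name and the sample response are assumptions for illustration, and two simplifications are made: missing fields are treated as permissive, and the presence of a public IP address stands in for "publicly exposed", which ignores security groups and network ACLs.

```python
def public_instances_allowing_imdsv1(response):
    """Return IDs of instances with a public IP that still accept IMDSv1 requests."""
    flagged = []
    for reservation in response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            options = instance.get("MetadataOptions", {})
            # Assumption: treat missing fields as the permissive setting.
            endpoint_on = options.get("HttpEndpoint", "enabled") == "enabled"
            tokens_optional = options.get("HttpTokens", "optional") == "optional"
            # Simplification: a public IP is used as a proxy for public exposure.
            has_public_ip = "PublicIpAddress" in instance
            if endpoint_on and tokens_optional and has_public_ip:
                flagged.append(instance["InstanceId"])
    return flagged


# Hypothetical response with three instances.
sample_response = {
    "Reservations": [
        {
            "Instances": [
                {
                    "InstanceId": "i-aaaa",
                    "PublicIpAddress": "203.0.113.10",
                    "MetadataOptions": {"HttpTokens": "optional", "HttpEndpoint": "enabled"},
                },
                {
                    "InstanceId": "i-bbbb",
                    "PublicIpAddress": "203.0.113.11",
                    "MetadataOptions": {"HttpTokens": "required", "HttpEndpoint": "enabled"},
                },
                {
                    "InstanceId": "i-cccc",
                    "MetadataOptions": {"HttpTokens": "optional", "HttpEndpoint": "enabled"},
                },
            ]
        }
    ]
}

flagged = public_instances_allowing_imdsv1(sample_response)
print(flagged)
```

Only the first instance is flagged: the second enforces IMDSv2, and the third has no public IP address.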
The following image shows a control in the Normalyze Cloud Platform which assesses this misconfiguration and data sensitivity at the same time and shows where it was discovered. The image also shows important Compliance tags associated with this control and detailed remediation steps that can be shared with operational teams.
Figure 2: Details of risk detection for “Publicly exposed EC2 Instance without IMDSv2 enabled has access to sensitive S3 Data” in Normalyze platform.
Scenario 3: Misconfigured Publicly Exposed Resource with a Known Vulnerability and Access to an S3 Bucket Containing Sensitive Data
In the previous scenario, we identified an attack path to an S3 bucket via a publicly exposed EC2 instance with a cloud-level misconfiguration (IMDSv1 in this case). In this example, we explain how this path becomes more viable for attack if the EC2 instance also has a package-level vulnerability.
Figure 3 below shows this use case from the Normalyze platform: a misconfigured resource (that is, a publicly exposed EC2 instance running IMDSv1) with a known package-level vulnerability, specifically CVE-2016-1585, “mount rules grant excessive permissions,” in the AppArmor package. The package-level vulnerability increases the likelihood of a successful attack, because the EC2 instance now also has a critical vulnerability that can be exploited.
The risk analysis shown in Figure 3 is only possible because the attack graph stores the relationships between the data (and its attributes) and all of the resources and identities with access to it. That’s what puts “context-based” into the data risk analysis.
Figure 3: Details of risk detection for “Publicly Exposed EC2 Instance without IMDSv2, contains critical severity vulnerability & can access sensitive S3 Data” in Normalyze platform.
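The way context compounds across the three scenarios can be sketched with a toy scoring function that multiplies a likelihood term (exposure, IMDSv1, vulnerability severity) by an impact term (data sensitivity). This is not Normalyze's scoring model. The `Finding` fields, the weights, and the sample findings are all assumptions chosen only to show how context changes prioritization.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    resource: str
    publicly_exposed: bool
    imdsv1_allowed: bool
    max_vuln_severity: str  # "none", "high", or "critical"
    data_sensitivity: str   # "low", "medium", or "high"


# Hypothetical weights; a real model would be calibrated, not hand-picked.
EXPOSURE_WEIGHT = 2
IMDSV1_WEIGHT = 1
VULN_WEIGHTS = {"none": 0, "high": 1, "critical": 2}
IMPACT_WEIGHTS = {"low": 1, "medium": 2, "high": 3}


def risk_score(finding):
    """Likelihood (exposure + misconfiguration + vulnerability) times impact."""
    likelihood = (
        EXPOSURE_WEIGHT * finding.publicly_exposed
        + IMDSV1_WEIGHT * finding.imdsv1_allowed
        + VULN_WEIGHTS[finding.max_vuln_severity]
    )
    return likelihood * IMPACT_WEIGHTS[finding.data_sensitivity]


findings = [
    Finding("scenario-1", True, False, "none", "high"),
    Finding("scenario-2", True, True, "none", "high"),
    Finding("scenario-3", True, True, "critical", "high"),
    Finding("scenario-3-low-data", True, True, "critical", "low"),
]

ranked = sorted(findings, key=risk_score, reverse=True)
for f in ranked:
    print(f.resource, risk_score(f))
```

The last finding has the same misconfiguration and vulnerability as Scenario 3 but reaches only low-sensitivity data, so it ranks lowest. A configuration-only view would not make that distinction.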
Risk Remediation for Faster MTTR
In any risk analysis process, identifying a risk is followed by remediating and containing it. Reducing mean time to remediate (MTTR) is one of the main focus areas for Normalyze. Once the risk has been identified, the Normalyze Cloud Platform provides remediation steps that can be followed in the environment as preventive measures.
Figure 4 below shows the remediation steps identified by Normalyze for resolving the risk. This guided remediation covers three ways of mitigating the risk in the infrastructure: through the cloud console, through the command line, or through infrastructure as code (IaC). Cloud and data security teams can follow the steps for any of the three modes to resolve the risk in their infrastructure.
Figure 4: Details of remediation for the risk “Publicly Exposed EC2 Instance without IMDSv2, contains high severity vulnerability & can access sensitive S3 Data” in Normalyze platform.
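As an example of what a scripted remediation for the IMDSv1 misconfiguration might look like, the sketch below wraps the EC2 `modify_instance_metadata_options` call, which can set `HttpTokens` to `"required"` to enforce IMDSv2. It is not the remediation text generated by the Normalyze platform. The `enforce_imdsv2` helper, the `DryRunClient` stand-in, and the instance ID are assumptions for illustration; the stand-in records calls instead of contacting AWS so the example runs without credentials.

```python
def enforce_imdsv2(ec2_client, instance_id):
    """Require session tokens (IMDSv2) for metadata requests on one instance."""
    return ec2_client.modify_instance_metadata_options(
        InstanceId=instance_id,
        HttpTokens="required",
        HttpEndpoint="enabled",
    )


class DryRunClient:
    """Hypothetical stand-in that records calls instead of contacting AWS."""

    def __init__(self):
        self.calls = []

    def modify_instance_metadata_options(self, **kwargs):
        self.calls.append(kwargs)
        return {"InstanceId": kwargs["InstanceId"]}


client = DryRunClient()
for instance_id in ["i-0123456789abcdef0"]:  # made-up instance ID
    enforce_imdsv2(client, instance_id)

print(client.calls)
```

Against a real account, a boto3 EC2 client created with `boto3.client("ec2")` could be passed in place of `DryRunClient()`. Applications on the instance that still make IMDSv1 requests will fail once tokens are required, so it is worth confirming IMDSv2 compatibility before applying the change.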
A Technology with Many Applications
DSPM tools with context-based data risk analysis, like Normalyze, address the limitations of CSPM by offering comprehensive insights into misconfigurations and access governance while focusing on data protection. By bridging the gap between resource vulnerabilities and sensitive data exposure, DSPM enables organizations to prioritize risk mitigation effectively. DSPM not only enhances threat detection but also streamlines response efforts, minimizing mean time to remediate (MTTR) and bolstering overall cybersecurity resilience.
Context-based data risk analysis also yields valuable insights in scenarios beyond the ones detailed above. Examples include identifying over-provisioned (and therefore risky) users, identifying whether data is being moved out of a region in violation of GDPR, and finding hidden vulnerabilities in complex environments where many microservices access data.
In a landscape where data breaches and complex regulations are on the rise, DSPM tools with context-based data risk analysis serve as essential safeguards, ensuring the security, compliance, and reputation of businesses in multi-cloud environments.