This ESG Technical Validation documents evaluation of security features and controls for Google Cloud Data Analytics. We evaluated how five aspects of security—encryption, data loss prevention (DLP), identity and access management (IAM), protection from data exfiltration, and logging/access transparency—contribute to increasing security and transparency for data analytics projects using Google Cloud Platform.
Organizations are leveraging cloud infrastructure to a great extent for their data analytics projects, and are moving more, and more sensitive, data to the cloud. According to ESG research, most organizations (92%) use or are planning to use public cloud storage for their analytics data sets. More than one-third (36%) of organizations store more than 50% of their analytics data sets in the cloud. In just two years, the number of organizations doing so is projected to almost double, with more than two-thirds (68%) expecting to use the cloud for their storage of analytics data sets (see Figure 1).
Organizations using cloud data analytics increasingly trust cloud service providers (CSPs) with sensitive analytics data sets—those containing personally identifiable information (PII), financial information, company intellectual property, etc. ESG research reveals that 95% of organizations are storing some sensitive analytics data in the cloud. More than one-quarter (27%) said that more than half of their data currently stored in the cloud is sensitive, and the percentage expecting to store more than half of their sensitive data in the cloud in just two years is 49%.
While security concerns about cloud providers were once a major impediment to cloud adoption and use, 73% of organizations now view cloud security to be on par with or more mature than on-premises security as it relates to data analytics projects, demonstrating that CSPs have become trusted partners. However, organizations implementing cloud-based data analytics programs still have valid security concerns, as 58% suspect or have experienced a security incident that has affected their cloud-resident data.
As a result, organizations are looking to CSPs to provide robust native data security controls to address their challenges: data discovery, data loss prevention, and real-time response capabilities. They also face a learning curve as they look to apply security controls to analytics workloads in the cloud. Those who reported suffering a security breach were four times more likely to state that they need to (better) learn the CSP’s native controls. This need increases as organizations adopt multiple clouds: six out of ten organizations with sensitive data in multiple clouds express the need to learn each CSP’s native data security controls.
Adopting a defense-in-depth strategy enables organizations to make better use of cloud data analytics solutions. Organizations with more than five security controls in place are 2.2x more confident in their ability to discover and classify all their cloud-resident data. These organizations predict they will double the amount of cloud-resident analytics data sets in just 2.8 years (compared to more than 6.5 years for those with fewer security controls).
Security by default, encryption, and a provider’s security-conscious culture are top-of-mind for protecting cloud-resident analytics data sets. Six out of ten organizations said that security by default with encryption was among the top three most important security attributes, and 21% put security by default as the most important.
Securing Google Cloud’s Data Analytics Suite
Google designed its data analytics suite, part of Google Cloud, to enable organizations to capture, process, store, analyze, visualize, and interact with large data sets in the cloud. The products use a fully managed, serverless approach, and can remove operational complexities, enabling businesses to increase the speed, efficiency, and accuracy of data-driven decision making.
Organizations can implement analytics programs on Google Cloud for business intelligence, data warehousing/data lakes, streaming data analytics, marketing analytics, IoT, machine learning and cloud AI, and more. The benefits of developing a Google Cloud-based analytics program include:
- Accelerate time to insights using cloud-scale, serverless, integrated end-to-end data analytics services.
- Focus on developing analytics programs while Google focuses on developing and maintaining the infrastructure.
- Manage and analyze at scale using tools designed to ingest, process, store, and analyze gigabyte- to petabyte-sized analysis data sets.
- Increase performance, efficiency, and value of integrating open source tools such as Apache Kafka, Spark, Flink, Airflow, and others.
- Leverage built-in advanced capabilities for machine learning or geospatial analytics or take advantage of the deep integrations with Google Cloud AI platform to incorporate predictive analytics into applications.
- Ensure security and provide high availability access to applications, data, and analytics programs.
Google Cloud Security
Cloud security is a shared responsibility, requiring collaboration between the CSP and the customer. Google Cloud is responsible for securing the underlying infrastructure foundation, and customers are responsible for securing applications and workloads such as data analytics.
Google places security at the core of the design and implementation of Google Cloud, and security permeates every aspect of the platform including the data centers, hardware, software, and company culture. Infrastructure security measures include:
- Security at the core of technology—Google practices defense-in-depth, and conceived, designed, and built Google Cloud to operate securely, developing custom-designed servers, networking equipment, and a proprietary operating system (OS), and deploying infrastructure in geographically distributed data centers. Google’s server OS is based on a stripped-down and hardened version of Linux, and Google continuously monitors systems for application binary modifications.
- Secure data centers—Google employs a layered security model with multiple safeguards to protect physical data centers, and maintains access logs, activity records, and video recordings should an incident occur.
- Global IP network—Google’s global data network consists of Google-owned fiber, public fiber, and undersea cables, enabling worldwide tight control over security, availability, and latency. Employing multiple layers of defense, Google only allows authorized services and protocols that meet security requirements; anything else is automatically dropped. Google enforces network segregation using industry-standard firewalls and access control lists (ACLs), and all traffic is routed through custom Google front end (GFE) servers to detect and stop malicious requests and distributed denial-of-service (DDoS) attacks. Google routinely examines logs to reveal any exploitation of programming errors.
- Securing data in transit—GFE servers use strong encryption protocols such as TLS to secure the connections between customer devices and Google’s web services and APIs. Google Cloud provides customers with additional transport encryption options, including Google Cloud VPN for establishing IPSec virtual private networks.
- Data access and restrictions—Google Cloud logically isolates each customer’s data from that of other customers and users, even when data is stored on the same physical server. Google employee access rights and levels are based on their job function and role, using the concepts of least-privilege and need-to-know to match access privileges to defined responsibilities. Google monitors and audits employee access with dedicated security, privacy, and internal audit teams, and audit logs are provided to customers through Access Transparency for Google Cloud.
- Dedicated security team—Google’s dedicated security team employs some of the world’s foremost experts in information, application, and network security, and the team is tasked with maintaining the company’s defense systems, developing security review processes, building security infrastructure, and implementing Google’s security policies.
- Monitoring—Google focuses security monitoring on information gathered from internal network traffic, employee actions on systems, and outside knowledge of vulnerabilities. Security monitoring uses a combination of open source and commercial tools for traffic capture and parsing, and Google’s network analysis is supplemented by examining system logs to identify unusual behavior, such as attempted access of customer data. Automated search alerts ensure Google security engineers are informed about security incidents that might affect the infrastructure.
- Incident management—Google has developed a rigorous incident management process structured around the NIST guidance on handling incidents (NIST SP 800–61) for any security events that may affect the confidentiality, integrity, or availability of systems or data. Key staff are trained in forensics, evidence handling, and the proper use of third-party and proprietary security, data collection, and incident investigation tools and techniques.
- Vulnerability management—Google actively scans for security threats using commercially available and custom tools, intensive automated and manual penetration efforts, quality assurance processes, software security reviews, and external audits. Google uses a variety of methods to prevent, detect, and eliminate malware.
- Independent third-party certifications and regulatory compliance—Google continually ensures Google Cloud is audited and certified by numerous third parties and complies with numerous standards and regulations. Google publishes a list of current certifications and other compliance resources for customer review.
Google has developed a suite of security measures designed to simplify the effort for customers to secure their applications, workloads, and data. Specific measures relevant to securing analytics workloads include:
- Encryption—Google encrypts all data by default, both in transit and at rest, and provides comprehensive encryption key management, enabling the use of customer-supplied encryption keys stored in the cloud and on-premises.
- DLP—Organizations can automatically discover, classify, mask, and redact sensitive data with more than 120 detectors to identify patterns and formats such as credit card and bank account numbers, and personal identification information.
- Identity and access management (IAM)—Google protects data, services, and applications with fine-grained access control and visibility, enabling administrators to authorize what actions can be taken on specific resources by which people or services. Context-aware access provides authorization based on contextual factors such as IP address, device security status, resource type, and access date and time.
- Measures to limit data exfiltration—Virtual private clouds (VPCs) enable administrators to define security perimeters for sensitive data, mitigating data exfiltration risks. VPCs can span Google Cloud and the organization’s on-premises infrastructure, keeping sensitive data private in a hybrid environment. Google VPCs facilitate context-aware access to data, applications, and Google Cloud services.
- Access transparency—Google Cloud maintains data access, system event, and admin activity logs for every cloud project, folder, and organization, enabling administrators to have visibility into data access and to identify who did what, where, and when. Logging extends to Google access, enabling organizations to know when a Google employee accessed the organization’s data.
ESG Technical Validation
ESG’s evaluation of security for Google Cloud Data Analytics was designed to demonstrate how Google increases data and application security with always-on data encryption, sensitive data identification, classification, and masking, granular and context-aware access controls, data exfiltration mitigation, and administrative access transparency.
The publicity surrounding data exposure and breaches caused by misconfigured cloud services, malicious activities, or both has driven increased awareness of, and demand for, data encryption. Of those organizations that suspected or suffered data loss, 30% said that one of the primary contributing factors was that the data was not encrypted.
Organizations desire to encrypt data both at rest and in transit to ensure that sensitive information is protected in the case of exposure or exfiltration. Indeed, 44% of organizations ranked encryption as one of the most effective controls to protect cloud-resident analytics data sets, the most cited response.
More than half (57%) of organizations currently encrypt their cloud-resident analytics data sets with CSP-provided encryption. Within the next 12-18 months, most organizations plan to encrypt their data, and 96% will be using CSP-provided encryption.
Although encryption is perceived as effective, configuring and managing encryption of cloud-resident analytics data sets presents a significant challenge according to 29% of organizations. Twenty-seven percent said they are challenged by encryption key management, and 23% struggle to retain custodianship of encryption keys.
Thus, organizations may postpone or forgo configuring encryption for data analytics programs. To avoid the risk of data loss from this approach, organizations are seeking solutions that provide security, including encryption, as the default. Sixty percent of organizations ranked security and encryption by default as one of the top three most important attributes for cloud-resident data protection, and 21% ranked security and encryption by default as the top priority.
Google Cloud Encryption
With security as a core principle driving Google Cloud architecture, Google seeks to enable security by default. All Google Cloud data is encrypted at rest, and there is no mechanism to disable encryption.
Google Cloud divides data into chunks and encrypts each chunk with an individual data encryption key (DEK). Google Cloud distributes encrypted chunks across the storage infrastructure for reliability and data protection. The DEKs are separately encrypted by a key encryption key (KEK) that is stored in a key management system (KMS).
To prevent unauthorized access to data, Google Cloud assigns a unique identifier to each data chunk, and uses access control lists (ACLs) to ensure that only authorized services can decrypt data at that point in time.
The Google Cloud backup process further encrypts data with its own data encryption key, while a separate DEK is used to encrypt all metadata in backups. As with storage, the backup DEKs are encrypted with a key encryption key that is stored in the key management system.
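The chunk, DEK, and KEK relationship described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the 64-byte chunk size, and the record layout are hypothetical, and the hash-based XOR keystream is a toy stand-in for a real authenticated cipher, not the cipher Google Cloud uses.

```python
import hashlib
import secrets

def toy_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256 counter keystream; the same call encrypts and decrypts.

    Toy stand-in for a real authenticated cipher. Do not use for actual data.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + nonce + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)

def encrypt_object(data: bytes, kek: bytes, chunk_size: int = 64):
    """Split data into chunks, encrypt each with its own DEK, and wrap each DEK with the KEK."""
    records = []
    for start in range(0, len(data), chunk_size):
        dek = secrets.token_bytes(32)       # a fresh DEK for every chunk
        chunk_id = secrets.token_bytes(8)   # unique identifier for the chunk
        records.append({
            "chunk_id": chunk_id,
            "ciphertext": toy_cipher(dek, b"", data[start:start + chunk_size]),
            "wrapped_dek": toy_cipher(kek, chunk_id, dek),  # only the wrapped DEK is stored
        })
    return records

def decrypt_object(records, kek: bytes) -> bytes:
    """Unwrap each DEK with the KEK, then decrypt the matching chunk."""
    plaintext = bytearray()
    for rec in records:
        dek = toy_cipher(kek, rec["chunk_id"], rec["wrapped_dek"])
        plaintext.extend(toy_cipher(dek, b"", rec["ciphertext"]))
    return bytes(plaintext)
```

In this sketch, a reader who holds the stored records but not the KEK cannot recover any DEK, which is why access to the KEK in the key management system governs access to the data.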
Google Cloud provides two key management strategies: Google Cloud can store and manage keys on the organization’s behalf; alternatively, organizations can use Google Cloud’s Cloud KMS to manage keys. Organizations using Cloud KMS benefit from:
- Scalability—Cloud KMS has the ability to manage millions of cryptographic keys.
- Access control—Cloud KMS is coupled with Google Cloud’s Cloud IAM, enabling organizations to control access to keys, and thus to the data the keys protect.
- Monitoring—Cloud KMS is coupled with Cloud Audit Logging and the use of each key is logged for every operation that requires a key, enabling organizations to audit all accesses to keys and data.
- Flexibility—Through Cloud KMS, organizations can select and configure encryption algorithms and parameters, including specifying symmetric or asymmetric encryption keys.
- Certificate signing—Cloud KMS can act as a certification authority, and administrators can sign certificates with both RSA and elliptic curve keys of various lengths.
- Integration—Cloud KMS is integrated with the entire Google Cloud solution, including Google Kubernetes Engine (GKE). Cloud KMS provides API access, enabling organizations to automate and orchestrate the encryption process.
- Separation of duties—Cloud KMS partitions key management duties from encryption/decryption duties, increasing security.
- Policies—Organizations can set policies for periodic key rotation, whereby keys are changed, limiting the scope of data accessible with any single key version. Cloud KMS incorporates a built-in 24-hour delay for key destruction, preventing data inaccessibility caused by human error.
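The rotation and delayed-destruction policies in the last item can be modeled with a small state machine. The sketch below is a simplified model, not the Cloud KMS API: the class names, state labels, and methods are assumptions made for illustration, and only the 24-hour delay comes from the text above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional

DESTROY_DELAY = timedelta(hours=24)  # delay before a scheduled destruction takes effect

@dataclass
class KeyVersion:
    number: int
    created: datetime
    state: str = "ENABLED"  # illustrative states: ENABLED, DESTROY_SCHEDULED, DESTROYED
    destroy_at: Optional[datetime] = None

@dataclass
class Key:
    rotation_period: timedelta
    versions: list = field(default_factory=list)

    def primary(self) -> KeyVersion:
        return self.versions[-1]

    def rotate_if_due(self, now: datetime) -> KeyVersion:
        """Create a new primary version once the rotation period has elapsed."""
        if not self.versions or now - self.primary().created >= self.rotation_period:
            self.versions.append(KeyVersion(len(self.versions) + 1, now))
        return self.primary()

    def schedule_destroy(self, number: int, now: datetime) -> None:
        version = self.versions[number - 1]
        version.state, version.destroy_at = "DESTROY_SCHEDULED", now + DESTROY_DELAY

    def restore(self, number: int, now: datetime) -> bool:
        """Undo a scheduled destruction if the delay has not yet expired."""
        version = self.versions[number - 1]
        if version.state == "DESTROY_SCHEDULED" and now < version.destroy_at:
            version.state, version.destroy_at = "ENABLED", None
            return True
        return False

    def tick(self, now: datetime) -> None:
        """Destroy every version whose delay has expired."""
        for version in self.versions:
            if version.state == "DESTROY_SCHEDULED" and now >= version.destroy_at:
                version.state = "DESTROYED"
```

The delay gives an administrator a window to call `restore` after an accidental destroy request. Once `tick` runs past the deadline, the version can no longer be recovered in this model.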
Cloud KMS can also interact with Cloud HSM, Google’s cloud-based FIPS 140-2 Level 3 certified hardware security module that can host encryption keys and perform cryptographic operations. Cloud KMS can also interact with external key management systems, whereby Cloud KMS sends DEKs to be encrypted or decrypted using the external KMS without ever accessing the external key, enabling organizations to have complete control of cryptographic keys.
ESG started by logging in to Google Cloud and selecting Security/Cryptographic Keys from the Google Cloud console, which brought up the Cloud KMS dashboard. As shown in Figure 5, the dashboard provides a filterable list of all managed key rings. A key ring is a collection of keys stored in the same location, analogous to a physical key ring. Users can operate on multiple key rings simultaneously by selecting the key rings, and then selecting the operation from the three-dot menu on the right.
ESG created a new key ring and key by selecting CREATE KEY RING from the top menu, and Cloud KMS displayed the key creation window, as shown in Figure 6. We entered the key ring name, chose a key storage location, and selected Create to create the key ring. Next, we entered a name for the new key and selected the cryptographic parameters. We chose to create and store the key in Cloud HSM to meet our security requirements. We could specify additional parameters such as key rotation period or labels. We then selected Create to create the key.
Next, we edited permissions to use a key. From the Cloud KMS dashboard, we clicked on the demo-euw2 key ring, which brought up a list of keys in the key ring. As shown in Figure 7, the key ring contained only one key, named hsm-key. We selected hsm-key and clicked on the top right option SHOW INFO PANEL. Cloud KMS overlaid the right side of the window with an information panel providing tabs for permissions, activities, and labels. The permissions panel displayed a filterable list of permissions. Clicking on the trashcan icon revokes the associated permission, while clicking on the pencil icon enables the user to edit the permission.
Next, ESG configured Cloud KMS to use an externally managed key. Google focused on architectural simplicity when designing the externally managed key integration. Encryption and decryption follow the same process: data is encrypted with a DEK, and the DEK is itself encrypted with a KEK. All Google Cloud services send DEK encryption and decryption requests to Cloud KMS, which can either store and manage the KEK or send the DEK to an external KMS for encryption and decryption using a KEK created, stored, and managed in the external KMS. When using an external KMS, the KEK is never directly accessed by any Google Cloud service.
Likewise, Google focused on user interface simplicity, and added an externally managed key option to the key creation process.
ESG followed the same process to create a new key, and during the create process we configured the key type as Externally managed key, as shown in Figure 8.
The Google Cloud key creation process created a service account (email@example.com) specifically for this externally managed key. Next, we went to the external KMS system, created a key, and authorized the service account to access the key. The external KMS created a uniform resource identifier (URI) to identify and access the key, and we specified that URI in the Cloud KMS key creation window. We then clicked CREATE to create the key.
To verify the URI, we selected the key from the list of keys, and chose View key URI.
Next, we verified that Google Cloud was using the externally managed key. We successfully made a query using BigQuery where the data was encrypted using the externally managed key. Next, using the external KMS, we temporarily disabled the external key, and attempted the same query. This time, Google Cloud displayed an access denied message. As shown in Figure 9, the message included the external key URI, the service account used to access the key, and the complete error message provided by the external KMS. Including all relevant information in one access denied message enables organizations to quickly and easily troubleshoot external KMS issues.
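The externally managed key flow that ESG exercised, including the access denied behavior when the external key is disabled, can be modeled as follows. This is a hedged sketch, not the Cloud EKM protocol: the class names, the `transform` method, the example URI and service account, and the XOR-based wrapping are all hypothetical stand-ins chosen to keep the example self-contained.

```python
import hashlib
import secrets

class AccessDenied(Exception):
    pass

class ExternalKMS:
    """Hypothetical external key manager. The KEK never leaves this object."""

    def __init__(self):
        self._keys = {}

    def create_key(self, uri, allowed_account):
        self._keys[uri] = {
            "kek": secrets.token_bytes(32),
            "enabled": True,
            "allowed": {allowed_account},
        }

    def set_enabled(self, uri, enabled):
        self._keys[uri]["enabled"] = enabled

    def transform(self, uri, account, blob):
        """Wrap or unwrap a 32-byte DEK (XOR is its own inverse in this toy)."""
        entry = self._keys[uri]
        if account not in entry["allowed"]:
            raise AccessDenied("account is not authorized for this key")
        if not entry["enabled"]:
            raise AccessDenied("key is disabled")
        pad = hashlib.sha256(entry["kek"]).digest()
        return bytes(b ^ p for b, p in zip(blob, pad))

class CloudKMS:
    """Stand-in for the cloud-side KMS: it forwards DEKs and never sees the KEK."""

    def __init__(self, external, uri, service_account):
        self.external, self.uri, self.service_account = external, uri, service_account

    def _call(self, blob):
        try:
            return self.external.transform(self.uri, self.service_account, blob)
        except AccessDenied as err:
            # Surface the URI, the service account, and the external error together.
            raise AccessDenied(
                f"uri={self.uri} account={self.service_account} external_error={err}"
            ) from err

    def wrap_dek(self, dek):
        return self._call(dek)

    def unwrap_dek(self, wrapped):
        return self._call(wrapped)
```

Disabling the key in `ExternalKMS` makes every later `unwrap_dek` call fail, so data protected by that DEK becomes unreadable without any change on the cloud side.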
Why This Matters
Encryption is a key component of defense-in-depth strategies, capable of protecting data even when malicious actors gain access to data stored in the cloud. However, configuring, encrypting, and managing the myriad of encryption keys presents significant challenges that are exacerbated in hybrid multi-cloud environments.
ESG validated that because Google Cloud data is always encrypted, Google Cloud simplifies encryption. Using Cloud KMS, we were able to create our own encryption keys with just a few steps, leveraging either software encryption or cloud-based HSMs. Cloud KMS supports the entire key lifecycle of creation, activation, suspension, deactivation, destruction, key rotation, and the handling of compromised keys, simplifying key management. Search and filtering features accelerated finding specific keys from a large inventory, and fine-grained key access control policies enabled us to use encryption keys to control data access.
Cloud KMS supports storing and managing encryption keys outside Google Cloud infrastructure using the Cloud External Key Manager (EKM) service. By changing one parameter in the Cloud KMS key creation process and pointing to a URI, we could use our own encryption key stored and managed outside of Google Cloud. This enabled us to centralize key management across our entire on-premises and cloud infrastructures, unifying data and encryption policies.
We found that Google Cloud’s encryption facilities simplified and accelerated our ability to use encryption to protect our cloud-resident data, regardless of the size and complexity of the environment.
Data Loss Prevention
Twenty-seven percent of organizations said that discovering and classifying data subject to regulatory requirements was one of their most significant data security challenges for protecting their cloud-resident analytics data. When asked about top cloud-resident analytics data loss concerns, 28% worried about the sharing of sensitive data with unauthorized employees and 25% worried about the sharing of sensitive data with external parties. These concerns are justified, as 30% said that they believed that data exposure from data misclassification was one of the factors that most contributed to suspected or actual data loss.
When asked which are the most effective capabilities or controls to protect their organization’s cloud-resident analytics data, 35% of respondents said data loss prevention (DLP), ranking DLP third after data encryption and data loss detection and response. Fifty-six percent of organizations are currently using DLP and 35% plan to deploy DLP in the next 12-18 months to protect their cloud-resident analytics data.
Google Cloud DLP
Google designed Cloud Data Loss Prevention (Cloud DLP) to automatically discover and redact sensitive data throughout all cloud-resident data and integrate with services like BigQuery, Data Catalog, Pub/Sub, Dataflow, and others. As of the time of this report, Cloud DLP includes more than 120 predefined sensitive data detectors, and organizations can apply rules to mask, tokenize, transform, or redact sensitive data. The benefits of Cloud DLP include:
- Flexible classification—Using contextual data, detectors can identify patterns, formats, and checksums, like credit card numbers, names, social security numbers, personal identifier numbers, phone numbers, and Google Cloud credentials. Users can define their own custom detectors, including custom dictionaries, and can tune detector sensitivity with likelihood scores.
- Flexible redaction—Users can partially or fully mask or redact sensitive data. Transforming sensitive data with format-preserving encryption or tokens enables users to perform complex correlation analyses without exposing the sensitive data.
- Discovery—Organizations can scan, discover, classify, and report on data stored in Google Cloud, including cloud storage, BigQuery, and Cloud Datastore. An API for scanning static and streaming data enables organizations to discover sensitive data in applications, workloads, and other data sources.
- Scale and automation—Classification and de-identification templates, job triggers, actions, and Pub/Sub notifications enable large-scale automation of DLP.
- DLP analysis—Classification results can be sent to BigQuery for analyses. Cloud DLP can measure statistical properties such as k-anonymity and l-diversity, enabling organizations to better understand and protect data privacy.
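To make the k-anonymity and l-diversity measures in the last item concrete, the sketch below computes both for a small in-memory table. The function names, column names, and sample rows are hypothetical, and the code shows the standard definitions rather than how Cloud DLP computes them internally.

```python
from collections import Counter

def k_anonymity(rows, quasi_identifiers):
    """Return k: the size of the smallest group of rows sharing the same quasi-identifier values."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values()) if groups else 0

def l_diversity(rows, quasi_identifiers, sensitive):
    """Return l: the smallest number of distinct sensitive values within any such group."""
    groups = {}
    for row in rows:
        group_key = tuple(row[q] for q in quasi_identifiers)
        groups.setdefault(group_key, set()).add(row[sensitive])
    return min(len(values) for values in groups.values()) if groups else 0

# Hypothetical, already-generalized records
rows = [
    {"zip": "940**", "age": "30-39", "diagnosis": "flu"},
    {"zip": "940**", "age": "30-39", "diagnosis": "asthma"},
    {"zip": "940**", "age": "40-49", "diagnosis": "flu"},
    {"zip": "940**", "age": "40-49", "diagnosis": "flu"},
]
```

With these rows, every combination of zip and age appears at least twice, so the table is 2-anonymous. The 40-49 group contains only one diagnosis, however, so its l-diversity is 1: anyone known to be in that group is revealed to have that diagnosis.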
From the Google Cloud console, ESG selected Security, then Data Loss Prevention, which brought up the Cloud DLP console, as shown in Figure 10. The console displays a list of DLP jobs and last run status information. Clicking on the three-dot menu on the right enables a user to take action, such as cancel or rerun the DLP job.
Next, we clicked on the Trigger ID inspect-website1, which brought up the history for that job. At the top of the screen was a time chart displaying the job history. The bottom of the screen included a filterable list of every run for the DLP job.
We clicked on the first line for Job ID 5264108929869963980, begun on Oct 12, 2019, which displayed the results, shown in Figure 11. The filterable, tabular results show the sensitive data type, description, total found during the scan of data, and the percent of total sensitive items found. Cloud DLP provided a VIEW FINDINGS IN BIGQUERY button, which enabled us to export and analyze the results data using BigQuery.
Next, we created a new DLP job. From the Cloud DLP dashboard, we selected Create job or job trigger and Cloud DLP displayed the job creation window. We specified the job name, data storage type, and location of data. We could enter advanced parameters to filter out files or file paths.
We could also specify whether to scan the entire data set or to sample only a portion of the data. Sampling can be used when the user has advanced knowledge about the data, such as the fact that sensitive data is only stored at the beginning of the file. Sampling can reduce the workload and accelerate time to results.
Next, we specified the InfoTypes, the type of sensitive data, such as credit card numbers or Google Cloud credentials, and the confidence threshold or likelihood that the scanned data matches a pattern. We could also specify a template type or define custom InfoTypes.
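As a rough illustration of how a detector can combine a pattern, a checksum, and nearby context into a likelihood, the sketch below scans text for candidate credit card numbers. The regular expression, the four-level likelihood scale, the 20-character context window, and the function names are assumptions made for this example. They are not Cloud DLP's detectors or its API.

```python
import re

# Candidate: 13 to 19 digits, optionally separated by single spaces or hyphens
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")
LIKELIHOODS = ["UNLIKELY", "POSSIBLE", "LIKELY", "VERY_LIKELY"]  # simplified scale

def luhn_valid(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right and sum the results."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0

def find_card_numbers(text: str, min_likelihood: str = "LIKELY"):
    """Return (match, likelihood) pairs at or above the requested likelihood."""
    findings = []
    for match in CARD_PATTERN.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if not luhn_valid(digits):
            likelihood = "UNLIKELY"      # right shape, wrong checksum
        elif "card" in text[max(0, match.start() - 20):match.start()].lower():
            likelihood = "VERY_LIKELY"   # checksum passes and context agrees
        else:
            likelihood = "LIKELY"
        if LIKELIHOODS.index(likelihood) >= LIKELIHOODS.index(min_likelihood):
            findings.append((match.group(), likelihood))
    return findings
```

Raising `min_likelihood` trades recall for precision: a 16-digit order number that fails the checksum is reported only when the threshold is lowered to `UNLIKELY`.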
Finally, we set the schedule for the job and the action to take upon job completion: publish to Pub/Sub, store in BigQuery, publish to Google Cloud Security Command Center, or notify by email. After specifying all parameters, we selected Create to create the DLP job.
During DLP job creation, we were also given options for sensitive data transformation. As shown in Figure 13, options include:
- Replace with InfoType—the sensitive data is replaced with the data type. For example, “Jennifer Nyong’o” was replaced with “[PERSON_NAME]”, completely redacting the sensitive data.
- Partial or Full Masking—some or all of the sensitive data is replaced with special characters such as * or #. For example, the SSN 555-44-1111 was replaced with ***-**-1111. Partial masking enables data owners to verify the data is correct while preventing unauthorized users from getting complete access to sensitive data.
- Tokenization—the sensitive data is replaced with a token. The same token is used for all occurrences of a unique sensitive data instance. For example, all occurrences of “Jennifer Nyong’o” are replaced with the 44-character long token AQ2iW3KdRQlgZWJYFNBWzU3JDcKNJbwoFXrRMBf/4PQ+. Data scientists can correlate information using the token without having access to the raw sensitive data.
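The three transformation options can be approximated in plain Python to show what each one preserves. The helper names are hypothetical, and the keyed-hash tokenization is an assumed technique chosen because it is deterministic. It is not presented as the transformation Cloud DLP applies.

```python
import base64
import hashlib
import hmac

def replace_with_infotype(text: str, value: str, info_type: str) -> str:
    """Replace every occurrence of the sensitive value with its type label."""
    return text.replace(value, f"[{info_type}]")

def mask_digits(value: str, keep_last: int = 4, mask_char: str = "*") -> str:
    """Mask all digits except the last keep_last; separators are left in place."""
    total_digits = sum(char.isdigit() for char in value)
    seen = 0
    masked = []
    for char in value:
        if char.isdigit():
            seen += 1
            masked.append(char if seen > total_digits - keep_last else mask_char)
        else:
            masked.append(char)
    return "".join(masked)

def tokenize(value: str, secret_key: bytes) -> str:
    """Deterministic token: the same input and key always give the same token."""
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")
```

Because `tokenize` is deterministic for a given key, two records that contain the same name receive the same token and can be joined or counted, while the name itself cannot be read from the token.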
Why This Matters
It’s unavoidable—cloud-resident data will include sensitive data, and organizations need to ensure that unauthorized parties cannot access this sensitive data.
ESG validated that Google Cloud DLP automatically discovers and redacts sensitive data in cloud-resident data sets. We were able to create DLP jobs to search for sensitive data using built-in and custom data detectors. Once found, we could partially or fully mask the sensitive data. We could also tokenize the data, replacing sensitive data with a token or random value. Tokenization enabled us to run analyses and correlate information using the token without having access to the raw sensitive data.
Job creation, scheduling, and management were quick and easy, taking just a few steps. We could configure email notification of job results or publish to Pub/Sub, which enabled us to automate actions based on job results. We found that Cloud DLP simplified the task of identifying, classifying, and redacting sensitive information, ensuring privacy and preventing unauthorized access.
Unauthorized access to data is worrying for many cybersecurity teams. When asked which causes of cloud-resident data loss concerned them most, 27% cited employees not properly applying access and permission controls when sharing data, and 26% cited the misuse of a privileged account by an inside employee.
A major contributing factor to organizational concern is the difficulty of correctly configuring ACLs. Thirty percent said that managing access permissions represented one of their most significant data security challenges for protecting cloud-resident analytics data sets. As a result, 35% said that data exposure from misuse of access or permission controls was one of the factors that most contributed to their suspected or actual data loss, one of the three most cited responses (along with data exposure from remote users and attackers masquerading as employees via stolen credentials).
Google Cloud IAM
Google Cloud identity and access management (Cloud IAM) provides organizations with fine-grained access control and enables organizations to centrally manage internal and external access to cloud resources. Google designed Cloud IAM on the core concept of policy. A policy defines who can do what to which thing. Policies comprise permissions that grant access to resources.
Identities define the who in the policy and can be either a Google Cloud user account or a Google Cloud service account. Both types of accounts have a username and some sort of credential. Permissions define the what in the policy and include actions such as read, write, and delete. Resources define the which thing in the policy and include Google Cloud projects, storage buckets, and Pub/Sub topics.
Identities can be bundled into groups, and permissions can be bundled into roles. Cloud IAM includes predefined primitive roles such as owner, editor, and viewer, and users can define their own roles. IAM policies are hierarchically inherited from the organization level to the project level to the resource level to individual resource instances.
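The who, what, and which-thing model, together with groups, roles, and hierarchical inheritance, can be expressed as a short access check. The data layout below is a simplified assumption for illustration. The resource names, identities, and the `is_allowed` function are hypothetical and do not mirror the Cloud IAM policy format.

```python
# Roles bundle permissions (the "what")
ROLES = {
    "viewer": {"read"},
    "editor": {"read", "write"},
    "owner": {"read", "write", "delete"},
}

# Groups bundle identities (the "who")
GROUPS = {"group:auditors": {"user:lee"}}

# Resource hierarchy (the "which thing"): child -> parent
PARENT = {
    "project/analytics": "org/acme",
    "bucket/raw-data": "project/analytics",
}

# Policy bindings attached at each level: resource -> {principal: role}
BINDINGS = {
    "org/acme": {"group:auditors": "viewer"},
    "project/analytics": {"user:dana": "editor"},
    "bucket/raw-data": {"serviceAccount:etl": "owner"},
}

def is_allowed(identity: str, permission: str, resource: str) -> bool:
    """Check the resource and each ancestor; a grant at any level is inherited downward."""
    principals = {identity} | {g for g, members in GROUPS.items() if identity in members}
    node = resource
    while node is not None:
        for principal, role in BINDINGS.get(node, {}).items():
            if principal in principals and permission in ROLES[role]:
                return True
        node = PARENT.get(node)
    return False
```

In this model, `user:dana` can write to `bucket/raw-data` because the editor role granted on the project is inherited by the bucket, while the owner role granted to `serviceAccount:etl` on the bucket gives it nothing on the project above.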
Google Cloud applies machine learning techniques to further improve security, helping security professionals identify and remove unwarranted access to Google Cloud resources. The IAM recommender analyzes usage over 60-90 days and recommends changes to permissions. For example, if a user hasn’t used the write access permission to a Cloud Storage bucket, IAM recommender suggests removing write permissions to the bucket. IAM recommender analyzes user and service accounts, and suggested changes of permissions are designed to reduce over-provisioned access to Google Cloud resources, which reduces the attack surface area and increases security.
Organizations leveraging Cloud IAM benefit from:
- Flexibility—In addition to the traditional owner, editor, and viewer roles, organizations can develop their own roles with fine-grained permissions, as well as leverage predefined roles for new services, such as the publisher and subscriber roles for the Cloud Pub/Sub service.
- Apply principle of least privilege—Organizations can implement fine-grained permissions, e.g., at the column-level in BigQuery, ensuring that members have only the permissions they actually need. IAM recommender applies machine learning techniques to guide administrators in defining appropriate access levels based on historical usage, reducing the attack surface area.
From the Google Cloud project console, ESG selected IAM and the Cloud IAM dashboard was displayed, as shown in Figure 14. The default MEMBERS tab displays a filterable list of all members of the project and their associated roles. The ROLES tab displays a filterable list of all roles and members that have been assigned to each role. Administrators can edit a member’s permissions or edit the role by selecting the pencil icon on the right.
Next, from the MEMBERS tab, ESG clicked on Add to add a new member to the project and assign a role. We entered the member name and then selected the appropriate roles from the filterable list of roles. We assigned the storage admin role to member firstname.lastname@example.org. Then we clicked on Save to add the member to the project.
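The console step above amounts to adding a member to a role binding in the project's policy. A minimal sketch of that change on a policy held as a dictionary follows; it assumes the role identifier `roles/storage.admin` for the storage admin role and the `user:` member prefix, and the helper name `add_member` is hypothetical.

```python
import copy


def add_member(policy, role, member):
    """Return a copy of `policy` in which `member` holds `role`."""
    updated = copy.deepcopy(policy)
    bindings = updated.setdefault("bindings", [])
    for binding in bindings:
        if binding["role"] == role:
            # Reuse the existing binding and avoid duplicate members.
            if member not in binding["members"]:
                binding["members"].append(member)
            return updated
    bindings.append({"role": role, "members": [member]})
    return updated
```

Working on a copy keeps the original policy intact, which mirrors the way a change takes effect in the console only after Save is clicked.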
From the ROLES tab, ESG clicked on the custom-defined role _ACME_Custom_Role, and then the pencil icon to display and edit the role permissions. Cloud IAM displayed a filterable list of permissions granted for this role. We filtered the list by Kubernetes and selected Kubernetes Engine Admin, clicked on the three-dot menu on the right, and from the options, selected disable to disable this permission.
We next reviewed the IAM recommendations for permissions associated with the project. From the IAM dashboard, we scrolled the list of the members to the right and reviewed the column labeled Permissions in use, as shown in Figure 16. This column displays the number of permissions used and the total granted. The IAM recommender has determined that one member has used 6 out of 296 permissions in the last 60-90 days. The lightbulb in the column indicates that IAM recommender has suggested changes of permissions for the member.
We clicked on the lightbulb to review the suggested changes, which are displayed in a “diff” format, with the original list of permissions on the left, and the suggested changes on the right. Permissions to be removed are noted with a red line number and a minus in the left margin. We clicked on Apply to immediately apply the recommended changes.
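A comparable "diff" view of current versus suggested permissions can be produced with Python's standard `difflib` module. The permission lists here are hypothetical examples, and the output format is that of a unified diff rather than the side-by-side layout of the console.

```python
import difflib


def permission_diff(current, suggested):
    """Return unified-diff lines; removed permissions are prefixed with '-'."""
    return list(
        difflib.unified_diff(
            sorted(current),
            sorted(suggested),
            fromfile="current",
            tofile="suggested",
            lineterm="",
        )
    )
```

Printing the returned lines one per row gives a reviewer a compact record of exactly which permissions a recommendation would remove.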
Why This Matters
Applying the principle of least privilege access—giving users the bare minimum permissions they need to perform their work—is critical in helping to prevent inadvertent or malicious unauthorized access to data. However, properly configuring access and permission controls can be challenging, and only 22% of organizations indicated there was no need for improving how they define and apply cloud-resident data access policies.
ESG validated that Cloud IAM provides fine-grained access control over Google Cloud resources. Configuring access was simple and quick, involving selecting a resource, then adding a member, and selecting the access rights for that member. We could use default roles of owner, editor, and viewer, and define our own custom roles with a specific list of rights. To simplify further, we could create groups of members and apply the same access policies to the entire group.
Preventing Data Exfiltration
One-quarter (25%) of organizations expressed concern about the risk of cloud-resident analytics data loss due to targeted penetration attacks, 24% were concerned about the complexity of protecting the data lifecycle, and 18% were concerned about misconfigured cloud services.
The level of concern is driven by experience, as 35% said that they suffered suspected or actual data loss from misuse of access or permission controls, while 28% attributed loss to misuse of sanctioned cloud services and 22% said the proximate cause was misconfigured object storage accounts.
To combat data loss, 30% of organizations are seeking to segment the systems hosting cloud-resident data analytics and the users who have access to those systems.
Google Cloud VPC Service Controls
Google designed Google Cloud Virtual Private Cloud Service Controls (VPC-SC) to prevent data exfiltration—the unauthorized copying, transfer, or retrieval of data, typically due to misconfigured access control, stolen credentials, compromised code, or malicious insiders.
With VPC-SC, enterprise security teams can define fine-grained perimeter controls and enforce that security posture across numerous Google Cloud services and projects. Users have the flexibility to create, update, and delete resources within service perimeters so they can easily scale their security controls. VPC-SC benefits include:
- Virtual perimeter—VPC-SC can define a security perimeter around Google Cloud resources including Cloud Storage buckets, Bigtable instances, and BigQuery data sets.
- Protect hybrid environments—VPC-SC combined with Private Google Access can extend the security perimeter from the cloud to on-premises environments, enabling organizations to protect the private communications of hybrid environments.
- Mitigate data exfiltration—VPC-SC security perimeters can help protect against data exposure due to misconfigured access controls or malicious actors attempting to access sensitive data.
- Context-aware security—VPC-SC enable organizations to create granular context-aware access control policies to protect services and sensitive data.
- Defense-in-depth—VPC-SC can be part of a defense-in-depth strategy, helping organizations to apply the principle of least privilege access.
- Centralized security—VPC-SC enable organizations to define and enforce security policies across numerous Google Cloud services and projects.
From the Google Cloud console, ESG selected Security, then VPC Service Controls, which brought up the list of currently configured service perimeters. We selected NEW PERIMETER, and started to configure the perimeter, as shown in Figure 17. We entered a name for the perimeter, then clicked on ADD PROJECTS to select projects to be protected. Only selected projects will be allowed to make Google Cloud API calls. Next, we selected ADD SERVICES to configure which services and API calls will be allowed inside the perimeter. We then clicked on SAVE to save and immediately deploy the perimeter.
Simulating an attacker, we used a command line interface, and attempted to use rsync to copy data from the vpcsc-secure-bucket to the attacker-bucket. As shown in Figure 18, before we created the perimeter, rsync successfully copied two files. After we deployed the perimeter, rsync failed with a 403 Request violates VPC Service Controls error message.
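The before-and-after behavior of the copy attempt can be modeled with a toy perimeter check. This is a conceptual sketch, not how VPC-SC is implemented; the class and function names and the project identifiers are assumptions, while `storage.googleapis.com` is the service name for Cloud Storage.

```python
from dataclasses import dataclass, field


class PerimeterViolation(Exception):
    """Raised when a request would cross the perimeter boundary."""


@dataclass
class ServicePerimeter:
    projects: set = field(default_factory=set)
    restricted_services: set = field(default_factory=set)


def copy_objects(perimeter, service, source_project, dest_project, objects):
    """Return the copied objects, or raise if the copy crosses the perimeter."""
    if perimeter is not None and service in perimeter.restricted_services:
        source_inside = source_project in perimeter.projects
        dest_inside = dest_project in perimeter.projects
        # A copy is refused when exactly one side sits inside the perimeter.
        if source_inside != dest_inside:
            raise PerimeterViolation("403 Request violates VPC Service Controls")
    return list(objects)
```

With `perimeter=None` the copy succeeds, matching the first rsync run; once the source project is placed inside a perimeter that restricts the storage service, the same call raises the 403-style error.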
Why This Matters
The ever-increasing volume and sophistication of attacks and concomitant risk of data loss is driving awareness of the need to protect cloud-resident data via network and user segmentation.
ESG validated that Google Cloud VPC Service Controls can be used to define a security perimeter around cloud-resident data and Google Cloud resources. While it has been possible to create perimeters around IP-based cloud resources such as virtual machines, VPC-SC extend the concept of segmentation to managed cloud services such as storage buckets and data warehouses. Configuring VPC-SC on Google Cloud was simple and quick, and we could restrict which services could access resources inside the perimeter. VPC-SC enabled us to apply the principle of least privilege access, ensuring we enabled only the minimum set of services necessary. We could also use VPC-SC to implement components of a zero-trust environment, where access is denied by default, and enabled only when all entities have been authenticated and authorized.
Cloud Audit Logs and Access Transparency
Half (50%) of organizations consider active monitoring of their most sensitive data use to be one of the highest priorities for protecting their cloud-resident analytics data sets, and 41% say the same of actively monitoring user access. Further, 30% said that having an audit trail of the CSP’s employee access to the infrastructure hosting their cloud-resident analytics data was one of the most effective means of securing the data, while 18% valued user behavior analytics (UBA) and 14% valued an audit trail of user access.
Thus, almost three-quarters (74%) of organizations are planning to deploy or have deployed UBA or monitoring solutions as a means to secure their cloud-resident analytics data.
However, organizations are not satisfied with current capabilities, as just 22% said that there was no need for improvement of audit trails of accounts used to access analytics data sets. While 30% of organizations said that lack of visibility into the movement of data was a contributing factor in suspected or actual loss of cloud-related analytics data, 37% of organizations said they are challenged by attempting to identify when data is being accessed by unauthorized users, the second highest challenge for protecting cloud-resident analytics data sets. Fourteen percent said that an additional challenge was insufficient access audit trails and logs.
Google Cloud Audit Logs
Cloud Logging provides real-time log management and analysis facilities, enabling organizations to store, search, analyze, monitor, and alert on log data and events from Google Cloud.
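Google Cloud maintains three audit logs for each Google Cloud project, folder, and organization: Admin Activity, Data Access, and System Event. API calls or other actions that modify the configuration or metadata of resources are logged in the Admin Activity audit log. API calls that create, modify, or read user-provided resource data, and API calls that read the configuration or metadata of resources, are logged in the Data Access audit log. Actions that Google systems take to modify resource configuration, rather than actions driven directly by a user, are logged in the System Event audit log.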
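Each audit log is identified by its log name, so an entry can be classified by inspecting that name. The sketch below assumes the URL-encoded `cloudaudit.googleapis.com` log identifiers (`activity`, `data_access`, and `system_event`); the helper name and the example project are hypothetical.

```python
from urllib.parse import unquote

# Log identifiers (after URL-decoding) mapped to the audit log they denote.
AUDIT_LOG_KINDS = {
    "cloudaudit.googleapis.com/activity": "Admin Activity",
    "cloudaudit.googleapis.com/data_access": "Data Access",
    "cloudaudit.googleapis.com/system_event": "System Event",
}


def audit_log_kind(log_name):
    """Map a full log name to its audit log kind, or None if it is not an audit log."""
    _, separator, log_id = log_name.partition("/logs/")
    if not separator:
        return None
    return AUDIT_LOG_KINDS.get(unquote(log_id))
```

For example, `projects/example-project/logs/cloudaudit.googleapis.com%2Factivity` maps to the Admin Activity log.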
Access transparency logs contain entries for Google support and engineering access to an organization’s Google Cloud resources. Very few Google employees can access an organization’s data. Physical access to Google Cloud facilities is severely restricted and requires real-time authorization using an employee badge. All facility access is logged and audited by Google.
Google has automated most common support tasks to limit the need for Google employees to log in to VMs or otherwise access an organization’s data. Internal access and dashboards use Google Cloud’s DLP tools to automatically redact PII and other sensitive information.
For the rare case where a Google employee needs access to an organization’s data, Google Cloud employs fine-grained access controls to limit access, and the following requirements must be met before access is granted:
- Access must happen via a Google-owned and managed device; personal machines are prevented via context-aware access controls.
- The Google employee must be identified and authorized using a phishing-resistant hardware token.
- The Google employee must provide a valid and current support ticket identifier.
- The organization must grant approval.
Once all conditions have been met, Google Cloud generates a cryptographic identity that allows access and logs every action taken by the Google employee in the access transparency log.
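The four requirements behave as a logical AND: access proceeds only when none of them is unmet. The sketch below is a conceptual model of that gate, not Google's internal tooling; every class, field, and function name is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SupportAccessRequest:
    managed_device: bool
    hardware_token_verified: bool
    support_ticket_id: Optional[str]
    customer_approved: bool


def unmet_requirements(request):
    """List the requirements that the request does not yet satisfy."""
    checks = [
        (request.managed_device, "Google-owned and managed device"),
        (request.hardware_token_verified, "hardware token authentication"),
        (bool(request.support_ticket_id), "valid support ticket identifier"),
        (request.customer_approved, "organization approval"),
    ]
    return [label for satisfied, label in checks if not satisfied]


def access_granted(request):
    """Access is granted only when every requirement is met."""
    return not unmet_requirements(request)
```

Returning the list of unmet requirements, rather than a bare boolean, also gives an auditor the reason a request was refused.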
Access Approval requests, when combined with Access Transparency logs, can be used to audit an end-to-end chain from support ticket to access request to approval, to eventual access.
The benefits of log monitoring include:
- Increased security—store, search, analyze, monitor, and alert on log data and events in real time to ensure no unauthorized access by users or Google employees has occurred.
- Customization—write any custom log entry from any source.
- Alerting—set alerts on any log event or custom log metric.
- Automation and orchestration—trigger actions based on log events or custom log metrics.
From the Google Cloud console, ESG selected logging, which brought up the dashboard, as shown in Figure 19. The dashboard displays a filterable list of all log events, and we could filter based on the log type, log level, time frame, and Google Cloud resource. We could also use freeform text and regular expressions to search and filter the logs.
We expanded a log entry by clicking on the triangle on the left. Log entries are stored and displayed as JSON objects, making log entries both human- and machine-readable. We noted that the log captured that Google Cloud granted permission for email@example.com to take the action storage.buckets.create.
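Because entries are JSON objects, the fields noted above can be pulled out with a few lines of standard-library code. The sample entry below is a trimmed, hypothetical example that assumes the audit log layout of `protoPayload.authenticationInfo.principalEmail`, `protoPayload.methodName`, and `protoPayload.authorizationInfo`; a real entry carries many more fields.

```python
import json

# Trimmed, hypothetical audit log entry for illustration.
SAMPLE_ENTRY = """
{
  "logName": "projects/example-project/logs/cloudaudit.googleapis.com%2Factivity",
  "protoPayload": {
    "methodName": "storage.buckets.create",
    "authenticationInfo": {"principalEmail": "email@example.com"},
    "authorizationInfo": [
      {"permission": "storage.buckets.create", "granted": true}
    ]
  }
}
"""


def summarize(entry):
    """Extract who acted, which method was called, and which permissions were granted."""
    payload = entry.get("protoPayload", {})
    principal = payload.get("authenticationInfo", {}).get("principalEmail")
    granted = [
        info["permission"]
        for info in payload.get("authorizationInfo", [])
        if info.get("granted")
    ]
    return {
        "principal": principal,
        "method": payload.get("methodName"),
        "granted": granted,
    }
```

Calling `summarize(json.loads(SAMPLE_ENTRY))` yields the principal, the method, and the list of granted permissions for that entry.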
We also noted that data access logging did not capture data read requests. Most data accesses are read requests and logging every read request produces many log entries and consumes a significant amount of storage. Thus, logging read operations is an optional feature that must be enabled by the organization.
Google Cloud retains the Admin Activity log entries for 400 days, and all other log entries for 30 days. Organizations that need to retain log data for longer, or that need it for archiving and auditing, can export it. Log data can be exported directly into BigQuery for analysis and can be sent to external SIEM (security information and event management) systems or other log analysis tools.
ESG selected CREATE EXPORT from the top of the screen. The export panel was displayed on the right, as shown in Figure 20. We clicked on the pulldown icon at the right of the filter bar and selected convert to advanced filter, which enabled us to see and modify the filter details and then export the filtered data. On the right, we named the export and configured the destination or sink. Google Cloud supports directly exporting to BigQuery, Cloud Storage, and Cloud Pub/Sub. We selected Cloud Storage and specified a storage bucket, then clicked Create Sink, which saved the filtered log entries in the storage bucket.
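An export (sink) is essentially a name, a filter, and a destination. The sketch below assembles such a definition as a plain dictionary, assuming the destination string formats used for Cloud Storage, BigQuery, and Cloud Pub/Sub sinks; the helper names, the sink name, and the bucket name are hypothetical, and nothing here calls a Google Cloud API.

```python
def sink_destination(kind, name, project=None):
    """Build the destination string for one of the three supported sink types."""
    if kind == "storage":
        return f"storage.googleapis.com/{name}"
    if kind == "bigquery":
        return f"bigquery.googleapis.com/projects/{project}/datasets/{name}"
    if kind == "pubsub":
        return f"pubsub.googleapis.com/projects/{project}/topics/{name}"
    raise ValueError(f"unsupported sink type: {kind}")


def make_sink(sink_name, log_filter, destination):
    """Describe a sink as a dictionary with its name, filter, and destination."""
    return {"name": sink_name, "filter": log_filter, "destination": destination}
```

For instance, `make_sink("archive-sink", 'severity>=WARNING', sink_destination("storage", "example-log-archive"))` describes an export of warning-and-above entries to a storage bucket.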
Organizations can use log exporting via Cloud Pub/Sub to drive automation and orchestration. Following the same steps to export a log, ESG created a new export, filtering logs for compute.instances.insert, which is the creation of a new VM. As shown in Figure 21, we configured the export to sink to Cloud Pub/Sub. Every time a new log entry is generated matching the filter, the log entry is published under the topic auditLogNewInstance.
From the console, we selected Cloud Functions, and created a new cloud function called addLabelToNewVMs. We specified a trigger type of Cloud Pub/Sub, and selected auditLogNewInstance as the topic. We used a Google supplied function that added a label to the VM based on the log entry information contained in the pub/sub message.
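ESG used a Google-supplied function for this step; the code below is not that function but a hedged sketch of the same pattern. It assumes a Pub/Sub-triggered background function receiving `(event, context)` with the log entry base64-encoded in `event["data"]`, and it returns the label it would apply instead of calling the Compute Engine API. The label key `created-by` and the helper names are hypothetical.

```python
import base64
import json
import re


def sanitize_label_value(text):
    """Lowercase and replace disallowed characters; label values are capped at 63 chars."""
    return re.sub(r"[^a-z0-9_-]", "-", text.lower())[:63]


def label_for_new_vm(event, context=None):
    """Decode the exported log entry and derive a label for the newly created VM."""
    entry = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    payload = entry.get("protoPayload", {})
    # Ignore anything that is not a VM creation entry.
    if not payload.get("methodName", "").endswith("compute.instances.insert"):
        return None
    creator = payload.get("authenticationInfo", {}).get("principalEmail", "unknown")
    return {
        "instance": payload.get("resourceName", ""),
        "labels": {"created-by": sanitize_label_value(creator)},
    }
```

In a deployed function, the returned dictionary would be handed to whatever client code sets labels on the instance.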
Organizations can create alerts based on log entries, and the alerts can trigger actions to automate activity. ESG’s first step in creating an alert was to create a log metric. After customizing the filter for the action storage.buckets.delete on the bucket very-important-bucket, ESG selected CREATE METRIC from the top of the window, as shown in Figure 22. We entered a name, specified the metric type as a counter, and then selected CREATE METRIC.
Next, we switched to the Google Operations console (formerly known as the Stackdriver console), and selected alerting, then added a policy to create the alert based on the metric we had defined. We selected Add Condition, and from the list of available metrics, selected our metric, veryImportantBucketIsGone, set the alert to be triggered when the metric value is above zero, and saved the condition. We then configured an email address to receive notifications when the alert is triggered, provided a name for the alert, and saved the new alert policy.
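The two pieces of this workflow, a counter metric over matching log entries and an alert that fires when the count exceeds zero, can be modeled in a few lines. This is a conceptual sketch rather than the Cloud Logging or alerting implementation; it assumes entries carry `protoPayload.methodName` and `resource.labels.bucket_name`, and the function names are hypothetical.

```python
def matches(entry, method, bucket):
    """True if the entry records `method` being called on `bucket`."""
    payload = entry.get("protoPayload", {})
    labels = entry.get("resource", {}).get("labels", {})
    return payload.get("methodName") == method and labels.get("bucket_name") == bucket


def counter_metric(entries, method, bucket):
    """Count the log entries that match the filter."""
    return sum(1 for entry in entries if matches(entry, method, bucket))


def alert_fires(metric_value, threshold=0):
    """The alert condition: the metric value is above the threshold."""
    return metric_value > threshold
```

With the filter set to `storage.buckets.delete` on `very-important-bucket`, a single matching entry is enough to raise the count to one and trigger the alert.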
As the last step, we reviewed an alert email generated by the new alert policy.
The access transparency logs contain entries for access to an organization’s Google Cloud resources made by Google employees. Google employees can access an organization’s resources only when approved, and the internal Google tools require a valid support case before granting access, preventing random employees from accessing sensitive data.
An organization can configure Google Cloud to require the organization’s approval before the Google employee is granted access. As shown in Figure 23, the Google Cloud support tool requires the employee to provide a valid business justification to the organization when requesting access.
From the console, we selected Security, then Access approval, which displayed the list of pending Google employee access requests. We noted the case number was the same as the case number provided by Google Cloud support. We selected the request and then selected APPROVE to approve the request, enabling the support technician to access our Google Cloud resources.
Next, we went back to the logging dashboard and filtered the logs for access_transparency, as shown in Figure 24. We noted that the log contained every access made by the support technician, and each log entry contained the employee identifier and the support case number. We also noted that the approval request and our approval response were included in the access transparency log.
Why This Matters
Monitoring and auditing data use may be the most mundane part of cybersecurity, requiring the least amount of technology. However, monitoring data access by both an organization’s employees and the cloud service provider can help prevent data loss and reveal malicious activity, and ESG research respondents reported it as one of the highest priorities for protecting their cloud-resident data.
ESG validated that Google Cloud Audit Logs enable organizations to monitor data. Using the Google Cloud Logging console, we could rapidly filter the logs, searching any access type by any account. We could readily identify data access as well as administrative actions such as VM create or delete. We were able to export logs for archival and forensics use. Exporting logs filtered for specific activities enabled us to automate actions based on logged activity, enhancing productivity and increasing security. Logs contained both customer admin and provider (Google) admin activity.
ESG validated the simplicity of configuring access approvals, ensuring that Google employees could only access our resources and data after our approval. We noted that access requests contained the Google support case number, and that all accesses were recorded in the access transparency log. Each log entry contained the case number, the employee identifier, and the action taken. We found that the access transparency log can alleviate concerns regarding Google access to an organization’s data and resources.
The Bigger Truth
Cloud infrastructures are becoming a preferred platform for storing data, and 91% of organizations feel their cloud-resident sensitive analytics data sets are sufficiently secured. Employing more security controls increases confidence: those with many security controls are 2.2x more likely than those with fewer controls to definitively feel their sensitive data is secure. Similarly, organizations with many security controls will double the amount of sensitive analytics data stored in the cloud in ~2.8 years (compared to ~6.5 years for those with fewer security controls).
ESG validated that five aspects of Google Cloud Platform—encryption, DLP, IAM, protection from data exfiltration, and access transparency—contribute to increased security without increasing the cybersecurity management workload. Our evaluation revealed:
- Google Cloud encrypted data by default. Creating software- or HSM-based encryption keys, or using external keys, was fast and easy. Google Cloud’s encryption capabilities enable organizations to control access and protect their sensitive cloud-resident data regardless of the size and complexity of their environment.
- Cloud DLP simplified and automated the discovery, classification, and redaction of sensitive cloud-resident data, ensuring privacy and preventing unauthorized data access. Tokenization enabled complex correlations and analyses while protecting sensitive data.
- Cloud IAM provided fine-grained access control over Google Cloud resources and cloud-resident data, and the IAM recommender applied ML and behavioral analytics to suggest access changes, helping organizations apply the principle of least privilege access. Custom roles and user groups simplified the management of security policies and can help large organizations with diverse access requirements.
- Google Cloud VPC Service Controls enabled us to define a security perimeter around cloud-resident data and Google Cloud resources, mitigating the risk of data exfiltration. VPC Service Controls can be part of a defense-in-depth strategy, enabling organizations to apply the principle of least privilege access, and to implement zero-trust security models.
- Google Cloud Audit Logs recorded every data access and administrative action, enabling organizations to monitor access. Exporting features enabled automation and orchestration of activities based upon logged actions. Google access transparency logs combined with the Google access approval process ensured that organizations are cognizant of any Google access to the organization’s data or resources.
Organizations needing to protect their sensitive data should thoroughly investigate the efficacy, functionality, and operational capabilities of the cybersecurity controls before purchasing or deploying any cloud-based solutions.
Security is a core principle driving the design and development of Google Cloud, and security is prevalent in every Google Cloud service. Google has developed a suite of integrated security controls that simplify securing an organization’s cloud-resident data and can operate in complex hybrid multi-cloud environments. If your organization is looking to streamline the effort to protect your cloud-resident data, then ESG believes that you should consider how Google Cloud’s encryption, DLP, IAM, VPC Service Controls, and logging can efficiently and effectively help mitigate risk and defend your cloud-resident critical assets.
1. Source: ESG Research Survey, Securing Cloud-resident Data Analytics Data Sets, conducted on behalf of Google, December 2019. All ESG research references and charts in this technical validation have been taken from this survey.
ESG Technical Validations
The goal of ESG Technical Validations is to educate IT professionals about information technology solutions for companies of all types and sizes. ESG Technical Validations are not meant to replace the evaluation process that should be conducted before making purchasing decisions, but rather to provide insight into these emerging technologies. Our objectives are to explore some of the more valuable features and functions of IT solutions, show how they can be used to solve real customer problems, and identify any areas needing improvement. The ESG Validation Team’s expert third-party perspective is based on our own hands-on testing as well as on interviews with customers who use these products in production environments.