Professional-Cloud-DevOps-Engineer Sample Questions Answers

Questions 4

Your company runs applications in Google Kubernetes Engine (GKE). Several applications rely on ephemeral volumes. You noticed some applications were unstable due to the DiskPressure node condition on the worker nodes. You need to identify which Pods are causing the issue, but you do not have execute access to workloads and nodes. What should you do?

Options:

Check the node/ephemeral_storage/used_bytes metric by using Metrics Explorer.

Check the metric by using Metrics Explorer.

Locate all the Pods with emptyDir volumes. Use the df -h command to measure volume disk usage.

Locate all the Pods with emptyDir volumes. Use the du -sh * command to measure volume disk usage.

Questions 5

You currently store the virtual machine (VM) utilization logs in Stackdriver. You need to provide an easy-to-share interactive VM utilization dashboard that is updated in real time and contains information aggregated on a quarterly basis. You want to use Google Cloud Platform solutions. What should you do?

Options:

1. Export VM utilization logs from Stackdriver to BigQuery. 2. Create a dashboard in Data Studio. 3. Share the dashboard with your stakeholders.

1. Export VM utilization logs from Stackdriver to Cloud Pub/Sub. 2. From Cloud Pub/Sub, send the logs to a Security Information and Event Management (SIEM) system. 3. Build the dashboards in the SIEM system and share with your stakeholders.

1. Export VM utilization logs from Stackdriver to BigQuery. 2. From BigQuery, export the logs to a CSV file. 3. Import the CSV file into Google Sheets. 4. Build a dashboard in Google Sheets and share it with your stakeholders.

1. Export VM utilization logs from Stackdriver to a Cloud Storage bucket. 2. Enable the Cloud Storage API to pull the logs programmatically. 3. Build a custom data visualization application. 4. Display the pulled logs in a custom dashboard.

Questions 6

You are developing reusable infrastructure as code modules. Each module contains integration tests that launch the module in a test project. You are using GitHub for source control. You need to continuously test your feature branch and ensure that all code is tested before changes are accepted. You need to implement a solution to automate the integration tests. What should you do?

Options:

Use a Jenkins server for CI/CD pipelines. Periodically run all tests in the feature branch.

Use Cloud Build to run the tests. Trigger all tests to run after a pull request is merged.

Ask the pull request reviewers to run the integration tests before approving the code.

Use Cloud Build to run tests in a specific folder. Trigger Cloud Build for every GitHub pull request.

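Answer: D (Use Cloud Build to run tests in a specific folder. Trigger Cloud Build for every GitHub pull request.)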

Explanation:

Cloud Build is a service that executes your builds on Google Cloud Platform infrastructure. Cloud Build can import source code from Google Cloud Storage, Cloud Source Repositories, GitHub, or Bitbucket, execute a build to your specifications, and produce artifacts such as Docker containers or Java archives. Cloud Build can also run integration tests as part of your build steps.

You can use Cloud Build to run tests in a specific folder by specifying the path to the folder in the dir field of your build step. For example, if you have a folder named tests that contains your integration tests, you can use the following build step to run them:

steps:
- name: 'gcr.io/cloud-builders/go'
  args: ['test', '-v']
  dir: 'tests'

You can use Cloud Build to trigger builds for every GitHub pull request by using the Cloud Build GitHub app. The app allows you to automatically build on Git pushes and pull requests and view your build results on GitHub and the Google Cloud console. You can configure the app to run builds on specific branches, tags, or paths. For example, if you want to run builds on pull requests that target the master branch, you can use the following trigger configuration:

includedFiles:
- '**'
name: 'pull-request-trigger'
github:
  name: 'my-repo'
  owner: 'my-org'
  pullRequest:
    branch: '^master$'
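The branch value in the trigger configuration above is a regular expression, and the ^ and $ anchors are what limit it to the master branch alone. The sketch below illustrates that with Python's re module. Cloud Build evaluates these patterns with RE2 syntax, so Python is used here only as a stand-in; for a simple anchored pattern like this one the two behave the same. The function name trigger_fires and the sample branch names are hypothetical.

```python
import re

# Pattern copied from the example trigger configuration above.
BRANCH_PATTERN = r"^master$"


def trigger_fires(target_branch: str, pattern: str = BRANCH_PATTERN) -> bool:
    """Return True if a pull request targeting `target_branch` matches the pattern.

    Illustration only: this mimics the branch filter, it does not call Cloud Build.
    """
    return re.search(pattern, target_branch) is not None


if __name__ == "__main__":
    # Hypothetical branch names, used to show the effect of the anchors.
    for branch in ["master", "master-hotfix", "feature/master", "main"]:
        print(f"{branch!r}: {trigger_fires(branch)}")
```

Without the anchors, a pattern of plain master would also match branches such as master-hotfix, so pull requests against those branches would start builds as well.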

Using Cloud Build to run tests in a specific folder and trigger builds for every GitHub pull request is a good way to continuously test your feature branch and ensure that all code is tested before changes are accepted. This way, you can catch any errors or bugs early and prevent them from affecting the main branch.

Using a Jenkins server for CI/CD pipelines is not a bad option, but it would require more setup and maintenance than using Cloud Build, which is fully managed by Google Cloud. Periodically running all tests in the feature branch is not as efficient as running tests for every pull request, as it may delay the feedback loop and increase the risk of conflicts or failures.

Using Cloud Build to run the tests after a pull request is merged is not a good practice, as it may introduce errors or bugs into the main branch that could have been prevented by testing before merging.

Asking the pull request reviewers to run the integration tests before approving the code is not a reliable way of ensuring code quality, as it depends on human intervention and may be prone to errors or oversights.

Questions 7

You are creating a CI/CD pipeline in Cloud Build to build an application container image. The application code is stored in GitHub. Your company requires that production image builds are only run against the main branch and that the change control team approves all pushes to the main branch. You want the image build to be as automated as possible. What should you do?

Choose 2 answers

Options:

Create a trigger on the Cloud Build job. Set the repository event setting to Pull request.

Add the owners file to the Included files filter on the trigger.

Create a trigger on the Cloud Build job. Set the repository event setting to Push to a branch.

Configure a branch protection rule for the main branch on the repository.

Enable the Approval option on the trigger.

Questions 8

Your company runs services by using multiple globally distributed Google Kubernetes Engine (GKE) clusters. Your operations team has set up workload monitoring that uses Prometheus-based tooling for metrics, alerts, and generating dashboards. This setup does not provide a method to view metrics globally across all clusters. You need to implement a scalable solution to support global Prometheus querying and minimize management overhead. What should you do?

Options:

Configure Prometheus cross-service federation for centralized data access

Configure workload metrics within Cloud Operations for GKE

Configure Prometheus hierarchical federation for centralized data access

Configure Google Cloud Managed Service for Prometheus

Questions 9

You support an application running on GCP and want to configure SMS notifications to your team for the most critical alerts in Stackdriver Monitoring. You have already identified the alerting policies you want to configure this for. What should you do?

Options:

Download and configure a third-party integration between Stackdriver Monitoring and an SMS gateway. Ensure that your team members add their SMS/phone numbers to the external tool.

Select the Webhook notifications option for each alerting policy, and configure it to use a third-party integration tool. Ensure that your team members add their SMS/phone numbers to the external tool.

Ensure that your team members set their SMS/phone numbers in their Stackdriver Profile. Select the SMS notification option for each alerting policy and then select the appropriate SMS/phone numbers from the list.

Configure a Slack notification for each alerting policy. Set up a Slack-to-SMS integration to send SMS messages when Slack messages are received. Ensure that your team members add their SMS/phone numbers to the external integration.

Questions 10

Your application images are built using Cloud Build and pushed to Google Container Registry (GCR). You want to be able to specify a particular version of your application for deployment based on the release version tagged in source control. What should you do when you push the image?

Options:

Reference the image digest in the source control tag.

Supply the source control tag as a parameter within the image name.

Use Cloud Build to include the release version tag in the application image.

Use GCR digest versioning to match the image to the tag in source control.

Questions 11

You encounter a large number of outages in the production systems you support. You receive alerts for all the outages that wake you up at night. The alerts are due to unhealthy systems that are automatically restarted within a minute. You want to set up a process that would prevent staff burnout while following Site Reliability Engineering practices. What should you do?

Options:

Eliminate unactionable alerts.

Create an incident report for each of the alerts.

Distribute the alerts to engineers in different time zones.

Redefine the related Service Level Objective so that the error budget is not exhausted.

Questions 12

You are monitoring a service that uses n2-standard-2 Compute Engine instances that serve large files. Users have reported that downloads are slow. Your Cloud Monitoring dashboard shows that your VMs are running at peak network throughput. You want to improve the network throughput performance. What should you do?

Options:

Deploy a Cloud NAT gateway and attach the gateway to the subnet of the VMs.

Add additional network interface controllers (NICs) to your VMs.

Change the machine type for your VMs to n2-standard-8.

Deploy the Ops Agent to export additional monitoring metrics.

Questions 13

Your organization is using Helm to package containerized applications. Your applications reference both public and private charts. Your security team flagged that using a public Helm repository as a dependency is a risk. You want to manage all charts uniformly, with native access control and VPC Service Controls. What should you do?

Options:

Store public and private charts in OCI format by using Artifact Registry

Store public and private charts by using GitHub Enterprise with Google Workspace as the identity provider

Store public and private charts by using a Git repository. Configure Cloud Build to synchronize the contents of the repository into a Cloud Storage bucket. Connect Helm to the bucket by using https://[bucket].storage.googleapis.com/[helmchart] as the Helm repository

Configure a Helm chart repository server to run in Google Kubernetes Engine (GKE) with Cloud Storage bucket as the storage backend

Questions 14

Your company allows teams to self-manage Google Cloud projects, including project-level Identity and Access Management (IAM). You are concerned that the team responsible for the Shared VPC project might accidentally delete the project, so a lien has been placed on the project. You need to design a solution to restrict Shared VPC project deletion to those with the resourcemanager.projects.updateLiens permission at the organization level. What should you do?

Options:

Enable VPC Service Controls for the container.googleapis.com API service.

Revoke the resourcemanager.projects.updateLiens permission from all users associated with the project.

Enable the compute.restrictXpnProjectLienRemoval organization policy constraint.

Instruct teams to only perform IAM permission management as code with Terraform.

Answer: C (Enable the compute.restrictXpnProjectLienRemoval organization policy constraint.)

Explanation:

Comprehensive and Detailed Explanation From General Google Cloud IAM and Organization Policy Knowledge:

The core requirement is to prevent accidental deletion of a Shared VPC host project, even by project owners, by ensuring that only users with a specific permission at the organization level can remove the lien that protects the project.

A lien that restricts the resourcemanager.projects.delete operation has already been placed on the project. This prevents its deletion. The challenge is to prevent the removal of this lien by project-level administrators.

The permission required to modify or remove a lien is resourcemanager.projects.updateLiens, as stated in the question.

Option A (Enable VPC Service Controls for the container.googleapis.com API service): VPC Service Controls are for data exfiltration prevention by creating service perimeters. They do not directly control IAM permissions for lien management or project deletion.

Option B (Revoke the resourcemanager.projects.updateLiens permission from all users associated with the project): While this would prevent project-level users from removing the lien, it doesn't enforce the requirement that only users with this permission at the organization level can remove it. A project owner could potentially re-grant themselves this permission at the project level if not otherwise restricted. The goal is a stronger, centrally enforced restriction.

Option C (Enable the compute.restrictXpnProjectLienRemoval organization policy constraint): This is specifically designed for the scenario described. Organization Policies allow centralized control over resource configurations across the organization.

The compute.restrictXpnProjectLienRemoval constraint, when enforced (set to True), restricts the removal of liens on Shared VPC host projects. Only users who have the resourcemanager.projects.updateLiens permission granted at the organization level can then remove such liens. This prevents project owners or other project-level principals from removing the lien unless they also have this permission at the organization level.

Option D (Instruct teams to only perform IAM permission management as code with Terraform): While Infrastructure as Code (IaC) is a good practice for managing IAM, it's an operational guideline and doesn't technically enforce the restriction on lien removal. A user with sufficient project-level IAM permissions could still manually remove the lien via the console or gcloud if not prevented by an organization policy.

Therefore, enabling the compute.restrictXpnProjectLienRemoval organization policy is the direct and most effective way to meet the requirement.
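The rule described above can be summarized as a small decision function. This is a toy model only, assuming the behavior stated in the explanation; it does not call any Google Cloud API, and the Principal class, its fields, and the function name are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Principal:
    """Hypothetical caller; the flags record where updateLiens is granted."""
    name: str
    update_liens_on_project: bool = False
    update_liens_on_org: bool = False


def can_remove_host_project_lien(principal: Principal, constraint_enforced: bool) -> bool:
    """Toy model of lien removal on a Shared VPC host project.

    Assumption: with compute.restrictXpnProjectLienRemoval enforced, only an
    organization-level grant of resourcemanager.projects.updateLiens counts.
    """
    if constraint_enforced:
        return principal.update_liens_on_org
    # Without the constraint, a grant at either level is sufficient,
    # because organization-level grants are inherited by the project.
    return principal.update_liens_on_project or principal.update_liens_on_org


if __name__ == "__main__":
    owner = Principal("project-owner", update_liens_on_project=True)
    org_admin = Principal("org-admin", update_liens_on_org=True)
    for enforced in (False, True):
        for who in (owner, org_admin):
            allowed = can_remove_host_project_lien(who, enforced)
            print(f"enforced={enforced} {who.name}: {allowed}")
```

The case that matters is the project owner with the constraint enforced: the project-level grant no longer suffices, which is exactly the restriction the question asks for.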

Reference (Based on Google Cloud Organization Policy and Shared VPC documentation):

Google Cloud documentation on Resource Manager Liens: https://cloud.google.com/resource-manager/docs/project-liens

Google Cloud documentation on Organization Policy Constraints: https://cloud.google.com/resource-manager/docs/organization-policy/org-policy-constraints

Specifically, the compute.restrictXpnProjectLienRemoval constraint: when it is enforced, a lien on a Shared VPC host project can only be removed by users who hold the resourcemanager.projects.updateLiens permission at the organization level. This constraint ensures that the protection afforded by the lien on a critical Shared VPC host project cannot be easily circumvented at the project level.

Questions 15

Your application’s performance in Google Cloud has degraded since the last release. You suspect that downstream dependencies might be causing some requests to take longer to complete. You need to investigate the issue with your application to determine the cause. What should you do?

Options:

Configure Cloud Trace in your application.

Configure Error Reporting in your application.

Configure Cloud Profiler in your application.

Configure Google Cloud Managed Service for Prometheus in your application.

Questions 16

You need to introduce postmortems into your organization during the holiday shopping season. You are expecting your web application to receive a large volume of traffic in a short period. You need to prepare your application for potential failures during the event. What should you do?

Choose 2 answers

Options:

Monitor latency of your services for average percentile latency.

Review your increased capacity requirements and plan for the required quota management.

Create alerts in Cloud Monitoring for all common failures that your application experiences.

Ensure that relevant system metrics are being captured with Cloud Monitoring and create alerts at levels of interest.

Configure Anthos Service Mesh on the application to identify issues on the topology map.

Questions 17

Your company runs an ecommerce website built with JVM-based applications and microservice architecture in Google Kubernetes Engine (GKE). The application load increases during the day and decreases during the night. Your operations team has configured the application to run enough Pods to handle the evening peak load. You want to automate scaling by only running enough Pods and nodes for the load. What should you do?

Options:

Configure the Vertical Pod Autoscaler but keep the node pool size static

Configure the Vertical Pod Autoscaler and enable the cluster autoscaler

Configure the Horizontal Pod Autoscaler but keep the node pool size static

Configure the Horizontal Pod Autoscaler and enable the cluster autoscaler

Questions 18

Your company uses a CI/CD pipeline with Cloud Build and Artifact Registry to deploy container images to Google Kubernetes Engine (GKE). Images are tagged with the latest commit hash and promoted to production after successful testing in the development and pre-production environments. A recent production deployment caused the application to fail due to untested integration functionality, requiring a disruptive manual rollback. During the rollback, you noticed many old and unused container images accumulating in Artifact Registry. You need to improve rollout and rollback management and clean up the old container images. What should you do?

Options:

Adopt Cloud Deploy for managing deployments, and schedule a Cloud Build job for container image cleanup.

Deploy Cloud Service Mesh across the GKE clusters, and manually clean up Artifact Registry images.

Adopt Cloud Deploy for managing deployments, and implement an Artifact Registry cleanup policy.

Set up a rollback pipeline in Cloud Build, and implement an Artifact Registry cleanup policy.

Questions 19

Your company runs services by using Google Kubernetes Engine (GKE). The GKE clusters in the development environment run applications with verbose logging enabled. Developers view logs by using the kubectl logs command and do not use Cloud Logging. Applications do not have a uniform logging structure defined. You need to minimize the costs associated with application logging while still collecting GKE operational logs. What should you do?

Options:

Run the gcloud container clusters update --logging=SYSTEM command for the development cluster.

Run the gcloud container clusters update --logging=WORKLOAD command for the development cluster.

Run the gcloud logging sinks update _Default --disabled command in the project associated with the development environment.

Add the severity >= DEBUG resource.type = "k8s_container" exclusion filter to the _Default logging sink in the project associated with the development environment.

Questions 20

You support a popular mobile game application deployed on Google Kubernetes Engine (GKE) across several Google Cloud regions. Each region has multiple Kubernetes clusters. You receive a report that none of the users in a specific region can connect to the application. You want to resolve the incident while following Site Reliability Engineering practices. What should you do first?

Options:

Reroute the user traffic from the affected region to other regions that don’t report issues.

Use Stackdriver Monitoring to check for a spike in CPU or memory usage for the affected region.

Add an extra node pool that consists of high memory and high CPU machine type instances to the cluster.

Use Stackdriver Logging to filter on the clusters in the affected region, and inspect error messages in the logs.

Questions 21

You are ready to deploy a new feature of a web-based application to production. You want to use Google Kubernetes Engine (GKE) to perform a phased rollout to half of the web server pods.

What should you do?

Options:

Use a partitioned rolling update.

Use Node taints with NoExecute.

Use a replica set in the deployment specification.

Use a stateful set with parallel pod management policy.

Questions 22

Your company follows Site Reliability Engineering practices. You are the Incident Commander for a new, customer-impacting incident. You need to immediately assign two incident management roles to assist you in an effective incident response. What roles should you assign?

Choose 2 answers

Options:

Operations Lead

Engineering Lead

Communications Lead

Customer Impact Assessor

External Customer Communications Lead

Questions 23

You have an application running in Google Kubernetes Engine. The application invokes multiple services per request but responds too slowly. You need to identify which downstream service or services are causing the delay. What should you do?

Options:

Analyze VPC flow logs along the path of the request.

Investigate the Liveness and Readiness probes for each service.

Create a Dataflow pipeline to analyze service metrics in real time.

Use a distributed tracing framework such as OpenTelemetry or Stackdriver Trace.

Questions 24

You recently noticed that one of your services has exceeded the error budget for the current rolling window period. Your company's product team is about to launch a new feature. You want to follow Site Reliability Engineering (SRE) practices.

What should you do?

Options:

Notify the team that their error budget is used up. Negotiate with the team for a launch freeze or tolerate a slightly worse user experience.

Look through other metrics related to the product and find SLOs with remaining error budget. Reallocate the error budgets and allow the feature launch.

Escalate the situation and request additional error budget.

Notify the team about the lack of error budget and ensure that all their tests are successful so the launch will not further risk the error budget.

Questions 25

You manage an application that runs in Google Kubernetes Engine (GKE) and uses the blue/green deployment methodology. Extracts of the Kubernetes manifests are shown below:

The Deployment app-green was updated to use the new version of the application. During post-deployment monitoring, you notice that the majority of user requests are failing. You did not observe this behavior in the testing environment. You need to mitigate the incident impact on users and enable the developers to troubleshoot the issue. What should you do?

Options:

Update the Deployment app-blue to use the new version of the application.

Update the Deployment app-green to use the previous version of the application.

Change the selector on the Service app-svc to app: my-app.

Change the selector on the Service app-svc to app: my-app, version: blue.

Questions 26

You want to share a Cloud Monitoring custom dashboard with a partner team. What should you do?

Options:

Provide the partner team with the dashboard URL to enable the partner team to create a copy of the dashboard.

Export the metrics to BigQuery. Use Looker Studio to create a dashboard, and share the dashboard with the partner team.

Copy the Monitoring Query Language (MQL) query from the dashboard, and send the MQL query to the partner team.

Download the JSON definition of the dashboard, and send the JSON file to the partner team.

Questions 27

You support a Node.js application running on Google Kubernetes Engine (GKE) in production. The application makes several HTTP requests to dependent applications. You want to anticipate which dependent applications might cause performance issues. What should you do?

Options:

Instrument all applications with Stackdriver Profiler.

Instrument all applications with Stackdriver Trace and review inter-service HTTP requests.

Use Stackdriver Debugger to review the execution of logic within each application to instrument all applications.

Modify the Node.js application to log HTTP request and response times to dependent applications. Use Stackdriver Logging to find dependent applications that are performing poorly.

Questions 28

You have migrated an e-commerce application to Google Cloud Platform (GCP). You want to prepare the application for the upcoming busy season. What should you do first to prepare for the busy season?

Options:

Load test the application to profile its performance for scaling.

Enable AutoScaling on the production clusters, in case there is growth.

Pre-provision double the compute power used last season, expecting growth.

Create a runbook on inflating the disaster recovery (DR) environment if there is growth.

Questions 29

You are using Terraform to manage infrastructure as code within a CI/CD pipeline. You notice that multiple copies of the entire infrastructure stack exist in your Google Cloud project, and a new copy is created each time a change to the existing infrastructure is made. You need to optimize your cloud spend by ensuring that only a single instance of your infrastructure stack exists at a time. You want to follow Google-recommended practices. What should you do?

Options:

Create a new pipeline to delete old infrastructure stacks when they are no longer needed

Confirm that the pipeline is storing and retrieving the terraform.tfstate file from Cloud Storage with the Terraform gcs backend

Verify that the pipeline is storing and retrieving the terraform.tfstate file from source control

Update the pipeline to remove any existing infrastructure before you apply the latest configuration

Questions 30

Your team of Infrastructure DevOps Engineers is growing, and you are starting to use Terraform to manage infrastructure. You need a way to implement code versioning and to share code with other team members. What should you do?

Options:

Store the Terraform code in a version-control system. Establish procedures for pushing new versions and merging with the master.

Store the Terraform code in a network shared folder with child folders for each version release. Ensure that everyone works on different files.

Store the Terraform code in a Cloud Storage bucket using object versioning. Give access to the bucket to every team member so they can download the files.

Store the Terraform code in a shared Google Drive folder so it syncs automatically to every team member’s computer. Organize files with a naming convention that identifies each new version.

Questions 31

You use Cloud Build to build and deploy your application. You want to securely incorporate database credentials and other application secrets into the build pipeline. You also want to minimize the development effort. What should you do?

Options:

Create a Cloud Storage bucket and use the built-in encryption at rest. Store the secrets in the bucket and grant Cloud Build access to the bucket.

Encrypt the secrets and store them in the application repository. Store a decryption key in a separate repository and grant Cloud Build access to the repository.

Use client-side encryption to encrypt the secrets and store them in a Cloud Storage bucket. Store a decryption key in the bucket and grant Cloud Build access to the bucket.

Use Cloud Key Management Service (Cloud KMS) to encrypt the secrets and include them in your Cloud Build deployment configuration. Grant Cloud Build access to the KeyRing.

Questions 32

You built a serverless application by using Cloud Run and deployed the application to your production environment. You want to identify the resource utilization of the application for cost optimization. What should you do?

Options:

Use Cloud Trace with distributed tracing to monitor the resource utilization of the application

Use Cloud Profiler with Ops Agent to monitor the CPU and memory utilization of the application

Use Cloud Monitoring to monitor the container CPU and memory utilization of the application

Use Cloud Ops to create logs-based metrics to monitor the resource utilization of the application

Questions 33

You have an application deployed to Cloud Run. A new version of the application has recently been deployed using the canary deployment strategy. Your Site Reliability Engineering (SRE) teammate informs you that an SLO has been exceeded for this application. You need to make the application healthy as quickly as possible. What should you do first?

Options:

Configure traffic splitting to send 100% of the traffic to the latest revision.

Configure traffic splitting to send 100% of the traffic to the previous revision.

Create a new revision using the last known good version of the application.

Identify the cause of the latency by using Cloud Trace.

Questions 34

Your company operates in a highly regulated domain that requires you to store all organization logs for seven years. You want to minimize logging infrastructure complexity by using managed services. You need to avoid any future loss of log capture or stored logs due to misconfiguration or human error. What should you do?

Options:

Use Cloud Logging to configure an aggregated sink at the organization level to export all logs into a BigQuery dataset

Use Cloud Logging to configure an aggregated sink at the organization level to export all logs into Cloud Storage with a seven-year retention policy and Bucket Lock

Use Cloud Logging to configure an export sink at each project level to export all logs into a BigQuery dataset

Use Cloud Logging to configure an export sink at each project level to export all logs into Cloud Storage with a seven-year retention policy and Bucket Lock

Questions 35

You need to reduce the cost of virtual machines (VMs) for your organization. After reviewing different options, you decide to leverage preemptible VM instances. Which application is suitable for preemptible VMs?

Options:

A scalable in-memory caching system

The organization's public-facing website

A distributed, eventually consistent NoSQL database cluster with sufficient quorum

A GPU-accelerated video rendering platform that retrieves and stores videos in a storage bucket

Questions 36

You manage your company's primary revenue-generating application. You have an error budget policy in place that freezes production deployments when the application is close to breaching its SLO. A number of issues have recently occurred, and the application has exhausted its error budget. You need to deploy a new release to the application that includes a feature urgently required by your largest customer. You have been told that the release has passed all unit tests. What should you do?

Options:

Start the deployment of the feature immediately.

Delay the deployment of the feature until the error budget is replenished.

Re-run the unit tests, and start the deployment of the feature if the tests pass.

Deploy the feature to a subset of users, and gradually roll out to all users if there are no errors reported.

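Answer: D (Deploy the feature to a subset of users, and gradually roll out to all users if there are no errors reported.)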

Explanation:

Comprehensive and Detailed Explanation From SRE Principles:

This scenario presents a classic SRE conflict: maintaining reliability (as dictated by the exhausted error budget and deployment freeze) versus delivering an urgent business requirement. The error budget policy is there for a reason – to protect users from further instability.

A. Start the deployment of the feature immediately: This directly violates the established error budget policy and the deployment freeze. While the feature is urgent, deploying without caution when the system is already unstable (as indicated by the exhausted error budget) is highly risky and could exacerbate existing problems or introduce new ones, further impacting revenue and customer trust.

B. Delay the deployment of the feature until the error budget is replenished: This strictly adheres to the policy but might not be acceptable given the "urgently required by your largest customer" clause. SRE principles allow for reasoned exceptions and risk management, not just blind adherence if the business context is compelling enough and risks are managed.

C. Re-run the unit tests, and start the deployment of the feature if the tests pass: Unit tests are foundational but insufficient to guarantee a complex application will perform reliably in production, especially when the system is already indicating instability (exhausted error budget). Passing unit tests doesn't negate the risk signaled by the depleted error budget.

D. Deploy the feature to a subset of users, and gradually roll out to all users if there are no errors reported: This is the most balanced SRE approach in this situation. It acknowledges the urgency while attempting to mitigate risk:

Risk Mitigation: A canary release (deploying to a small subset of users) limits the potential negative impact if the new feature introduces new errors or worsens existing instability.

Observation: It allows for careful monitoring of the new release in the production environment with real users.

Data-Driven Decision: The decision to proceed with a wider rollout is based on observed behavior ("if there are no errors reported"), not just assumptions.

Controlled Rollout: A gradual rollout allows for quick rollback if issues arise.

While an exhausted error budget signals a deployment freeze, critical business needs can sometimes necessitate a carefully managed exception. A canary release is a standard SRE technique for deploying changes with reduced risk, making it the most appropriate course of action when faced with such conflicting priorities. The team would also need to communicate clearly about the risks and the rationale for this exception. It's implied that this urgent feature might also fix existing issues or is critical enough to warrant the carefully managed risk.

Reference (Based on SRE principles from Google's SRE books and general practices):

Error Budgets: "The SRE Book" (Site Reliability Engineering: How Google Runs Production Systems) discusses error budgets and deployment freezes. An exhausted error budget typically means no more risky changes until reliability improves.

Canary Releases: This is a fundamental practice for safely deploying new versions. It's about testing in production with a small percentage of traffic.

Managing Risk: SRE is about managing risk, not eliminating it entirely. In situations like this, a calculated risk with strong mitigation (canary, monitoring, rollback plan) can be justified for critical business needs. The decision involves weighing the risk of deploying against the risk of not deploying the urgent feature.

Option D represents a pragmatic SRE approach to navigate this difficult situation by minimizing the blast radius of the change.
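The gradual rollout described in option D can be sketched as a simple gate that only widens exposure while the observed error rate stays acceptable. This is a minimal illustration, not a production deployment tool: the stage percentages, the error threshold, and the observeErrorRate callback (a stand-in for real monitoring) are all assumptions.

```javascript
// Hypothetical rollout stages (percent of users) and error threshold.
const STAGES = [1, 5, 25, 50, 100];
const MAX_ERROR_RATE = 0.001;

// Walks through the stages in order and stops at the first stage whose
// observed error rate exceeds the threshold. `observeErrorRate` is a
// placeholder for a query against a real monitoring system.
function planRollout(observeErrorRate, stages = STAGES, maxErrorRate = MAX_ERROR_RATE) {
  let lastHealthy = 0;
  for (const percent of stages) {
    const errorRate = observeErrorRate(percent);
    if (errorRate > maxErrorRate) {
      // Limit the blast radius: stop widening and signal a rollback.
      return { action: 'rollback', failedAt: percent, lastHealthy };
    }
    lastHealthy = percent;
  }
  return { action: 'complete', failedAt: null, lastHealthy };
}

// Example: errors appear once 25% of users receive the release.
const result = planRollout((percent) => (percent >= 25 ? 0.02 : 0.0002));
console.log(result);
```

In this example the rollout stops at the 25% stage and reports 5% as the last healthy stage, which is the point of the canary: the decision to proceed is driven by observed behavior at each step.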

Questions 37

Your Cloud Run application writes unstructured logs as text strings to Cloud Logging. You want to convert the unstructured logs to JSON-based structured logs. What should you do?

Options:

Install a Fluent Bit sidecar container, and use a JSON parser.

Install the log agent in the Cloud Run container image, and use the log agent to forward logs to Cloud Logging.

Configure the log agent to convert log text payload to JSON payload.

Modify the application to use Cloud Logging software development kit (SDK), and send log entries with a jsonPayload field.

Answer:

D

Explanation:

The correct answer is D. Modify the application to use Cloud Logging software development kit (SDK), and send log entries with a jsonPayload field.

Cloud Logging SDKs are libraries that allow you to write structured logs from your Cloud Run application. You can use the SDKs to create log entries with a jsonPayload field, which contains a JSON object with the properties of your log entry. The jsonPayload field allows you to use advanced features of Cloud Logging, such as filtering, querying, and exporting logs based on the properties of your log entry [1].

To use Cloud Logging SDKs, you need to install the SDK for your programming language, and then use the SDK methods to create and send log entries to Cloud Logging. For example, if you are using Node.js, you can use the following code to write a structured log entry with a jsonPayload field [2]:

// Imports the Google Cloud client library
const {Logging} = require('@google-cloud/logging');

// Creates a client
const logging = new Logging();

// Selects the log to write to
const log = logging.log('my-log');

async function writeStructuredLog() {
  // The data to write to the log
  const text = 'Hello, world!';

  // Entry metadata: set the Cloud Run service name and revision as labels
  const metadata = {
    labels: {
      service_name: process.env.K_SERVICE || 'unknown',
      revision_name: process.env.K_REVISION || 'unknown',
    },
  };

  // The entry data: an object passed here is sent as the jsonPayload
  const payload = {
    message: text,
    timestamp: new Date().toISOString(),
  };

  // Prepares a log entry
  const entry = log.entry(metadata, payload);

  // Writes the log entry
  await log.write(entry);
  console.log(`Logged: ${text}`);
}

writeStructuredLog().catch(console.error);

Using Cloud Logging SDKs is the best way to convert unstructured logs to structured logs, as it provides more flexibility and control over the format and content of your log entries.
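As a complementary sketch, the Cloud Run structured-logging documentation cited above also describes writing a single line of JSON to stdout or stderr, which is then parsed into the jsonPayload field, with fields such as severity and message treated specially. The helper name formatLogLine and the component field below are assumptions for illustration only.

```javascript
// Builds one line of JSON suitable for stdout. `severity` and `message` follow
// the structured-logging convention; any extra keys (such as the hypothetical
// `component` below) become additional properties of the payload.
function formatLogLine(message, severity = 'INFO', fields = {}) {
  return JSON.stringify({ severity, message, ...fields });
}

const line = formatLogLine('Hello, world!', 'NOTICE', { component: 'demo' });
console.log(line);
```

This variant avoids a client library dependency, at the cost of less control over the entry metadata than the SDK provides.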

Using a Fluent Bit sidecar container is not a good option, as it adds complexity and overhead to your Cloud Run application. Fluent Bit is a lightweight log processor and forwarder that can be used to collect and parse logs from various sources and send them to different destinations [3]. However, Cloud Run does not support sidecar containers, so you would need to run Fluent Bit as part of your main container image. This would require modifying your Dockerfile and configuring Fluent Bit to read logs from supported locations and parse them as JSON. This is more cumbersome and less reliable than using Cloud Logging SDKs.

Using the log agent in the Cloud Run container image is not possible, as the log agent is not supported on Cloud Run. The log agent is a service that runs on Compute Engine or Google Kubernetes Engine instances and collects logs from various applications and system components. However, Cloud Run does not allow you to install or run any agents on its underlying infrastructure, as it is a fully managed service that abstracts away the details of the underlying platform.

Configuring the log agent to convert the log text payload to a JSON payload is not possible for the same reason: the log agent is not supported on Cloud Run, so there is no agent configuration to change.

References:

1. Writing structured logs | Cloud Run Documentation | Google Cloud

2. Write structured logs | Cloud Run Documentation | Google Cloud

3. Fluent Bit - Fast and Lightweight Log Processor & Forwarder

Logging Best Practices for Serverless Applications - Google Codelabs

About the logging agent | Cloud Logging Documentation | Google Cloud

Cloud Run FAQ | Google Cloud

Questions 38

You are using Stackdriver to monitor applications hosted on Google Cloud Platform (GCP). You recently deployed a new application, but its logs are not appearing on the Stackdriver dashboard.

You need to troubleshoot the issue. What should you do?

Options:

Confirm that the Stackdriver agent has been installed in the hosting virtual machine.

Confirm that your account has the proper permissions to use the Stackdriver dashboard.

Confirm that port 25 has been opened in the firewall to allow messages through to Stackdriver.

Confirm that the application is using the required client library and the service account key has proper permissions.

Questions 39

You need to run a business-critical workload on a fixed set of Compute Engine instances for several months. The workload is stable with the exact amount of resources allocated to it. You want to lower the costs for this workload without any performance implications. What should you do?

Options:

Purchase Committed Use Discounts.

Migrate the instances to a Managed Instance Group.

Convert the instances to preemptible virtual machines.

Create an Unmanaged Instance Group for the instances used to run the workload.

Questions 40

You support a user-facing web application. When analyzing the application's error budget over the previous six months, you notice that the application never consumed more than 5% of its error budget. You hold an SLO review with business stakeholders and confirm that the SLO is set appropriately. You want your application's reliability to more closely reflect its SLO. What steps can you take to further that goal while balancing velocity, reliability, and business needs?

Choose 2 answers

Options:

Add more serving capacity to all of your application's zones

Implement and measure all other available SLIs for the application

Announce planned downtime to consume more error budget and ensure that users are not depending on a tighter SLO

Have more frequent or potentially risky application releases

Tighten the SLO to match the application's observed reliability

Questions 41

Your company stores a large volume of infrequently used data in Cloud Storage. The projects in your company's CustomerService folder access Cloud Storage frequently, but store very little data. You want to enable Data Access audit logging across the company to identify data usage patterns. You need to exclude the CustomerService folder projects from Data Access audit logging. What should you do?

Options:

Enable Data Access audit logging for Cloud Storage for all projects and folders, and configure exempted principals to include users of the CustomerService folder.

Enable Data Access audit logging for Cloud Storage at the organization level, with no additional configuration.

Enable Data Access audit logging for Cloud Storage at the organization level, and configure exempted principals to include users of the CustomerService folder.

Enable Data Access audit logging for Cloud Storage for all projects and folders other than the CustomerService folder.

Questions 42

You are working with a government agency that requires you to archive application logs for seven years. You need to configure Stackdriver to export and store the logs while minimizing costs of storage. What should you do?

Options:

Create a Cloud Storage bucket and develop your application to send logs directly to the bucket.

Develop an App Engine application that pulls the logs from Stackdriver and saves them in BigQuery.

Create an export in Stackdriver and configure Cloud Pub/Sub to store logs in permanent storage for seven years.

Create a sink in Stackdriver, name it, create a bucket on Cloud Storage for storing archived logs, and then select the bucket as the log export destination.

Questions 43

Your company is migrating its production systems to Google Cloud. You need to implement site reliability engineering (SRE) practices during the migration to minimize customer impact from potential future incidents. Which two SRE practices should you implement?

Choose 2 answers

Options:

Ensure that full autonomy and permissions are only granted to the on-call team.

Automate common tasks to analyze key impact information and intelligently suggest mitigating actions for the on-call team.

Ensure that all teams can modify the production environment to resolve issues.

Create an alerting mechanism for your SRE team based on your system's internal behavior.

Create up-to-date playbooks with instructions for debugging and mitigating issues.

Answer:

B, E

Explanation:

Comprehensive and Detailed Explanation From General SRE Principles and Google Cloud Knowledge:

Site Reliability Engineering (SRE) emphasizes reliability, automation, and a data-driven approach to operations. The goal is to minimize the "time to detect" (TTD) and "time to resolve" (TTR) for incidents.

Option A (Ensure that full autonomy and permissions are only granted to the on-call team): While the on-call team needs appropriate permissions to act decisively during an incident, granting full autonomy and only to them can be a bottleneck and goes against the principle of least privilege if not carefully scoped. Broader teams might need specific, controlled access for their responsibilities. SRE encourages empowering teams but within a structured framework.

Option B (Automate common tasks to analyze key impact information and intelligently suggest mitigating actions for the on-call team): This is a core SRE practice. Automation reduces toil, speeds up response, and ensures consistency. Analyzing impact and suggesting mitigations helps the on-call team resolve issues faster and more effectively.

Option C (Ensure that all teams can modify the production environment to resolve issues): This is generally a bad practice and against SRE principles of controlled changes and reducing the blast radius of errors. Production changes should be managed, audited, and ideally automated, not open to modification by all teams, as this increases the risk of unintended incidents.

Option D (Create an alerting mechanism for your SRE team based on your system's internal behavior): While alerting is crucial, SRE emphasizes alerting on symptoms that affect users (Service Level Objectives - SLOs) rather than just internal behavior or causes. Alerting solely on internal behavior can lead to alert fatigue and may not correlate directly with user impact. Good alerting focuses on user-facing impact first.

Option E (Create up-to-date playbooks with instructions for debugging and mitigating issues): Playbooks (or runbooks) are essential in SRE. They document known issues, troubleshooting steps, and mitigation procedures. Keeping them up-to-date ensures that on-call engineers can respond to incidents quickly and consistently, even for less common issues, thereby minimizing customer impact.

Therefore, automating incident response tasks (B) and maintaining clear, actionable playbooks (E) are two key SRE practices to implement for minimizing customer impact.

Reference (Based on SRE principles):

The SRE books by Google (e.g., "Site Reliability Engineering: How Google Runs Production Systems") heavily emphasize automation to reduce toil and the importance of playbooks for incident management.

Google Cloud SRE solutions: https://cloud.google.com/sre

Specifically, regarding playbooks and automation:

"Playbooks should be living documents, updated regularly as systems change and new incidents provide new lessons."

"SREs aim to automate repetitive tasks (toil) to free up time for engineering projects that improve reliability."
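To make options B and E concrete, the following sketch shows how an automated helper might look up a playbook entry for an alert and hand the on-call engineer the impact query and suggested mitigations. The alert names, the playbook contents, and the suggestMitigation function are hypothetical and only illustrate the idea.

```javascript
// Hypothetical playbook entries keyed by alert name; a real team would
// maintain these alongside the service and keep them up to date.
const PLAYBOOKS = {
  HighErrorRate: {
    impactQuery: 'ratio of failed requests over the last 10 minutes',
    mitigations: ['Roll back the latest release', 'Shift traffic to a healthy region'],
  },
  HighLatency: {
    impactQuery: '95th percentile latency over the last 10 minutes',
    mitigations: ['Scale out the serving tier', 'Disable non-critical features'],
  },
};

// Returns a short briefing for the on-call engineer, or an escalation hint
// when no playbook entry exists for the alert.
function suggestMitigation(alertName, playbooks = PLAYBOOKS) {
  if (!Object.prototype.hasOwnProperty.call(playbooks, alertName)) {
    return { alertName, known: false, mitigations: ['Escalate and write a new playbook entry'] };
  }
  const entry = playbooks[alertName];
  return { alertName, known: true, impactQuery: entry.impactQuery, mitigations: entry.mitigations };
}

console.log(suggestMitigation('HighErrorRate'));
```

An unknown alert returns an escalation hint rather than failing, which reflects the point that playbooks are living documents extended after each new incident.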

Questions 44

Your team is running microservices in Google Kubernetes Engine (GKE). You want to detect consumption of an error budget to protect customers and define release policies. What should you do?

Options:

Create SLIs from metrics. Enable Alert Policies if the services do not pass.

Use the metrics from Anthos Service Mesh to measure the health of the microservices.

Create an SLO. Create an Alert Policy on select_slo_burn_rate.

Create an SLO and configure uptime checks for your services. Enable Alert Policies if the services do not pass.

Questions 45

Your company's security team needs to have read-only access to Data Access audit logs in the _Required bucket. You want to provide your security team with the necessary permissions following the principle of least privilege and Google-recommended practices. What should you do?

Options:

Assign the roles/logging.viewer role to each member of the security team.

Assign the roles/logging.viewer role to a group with all the security team members.

Assign the roles/logging.privateLogViewer role to each member of the security team.

Assign the roles/logging.privateLogViewer role to a group with all the security team members.

Questions 46

You use Terraform to manage an application deployed to a Google Cloud environment. The application runs on instances deployed by a managed instance group. The Terraform code is deployed by using a CI/CD pipeline. When you change the machine type on the instance template used by the managed instance group, the pipeline fails at the terraform apply stage with the following error message

You need to update the instance template and minimize disruption to the application and the number of pipeline runs. What should you do?

Options:

Delete the managed instance group, and recreate it after updating the instance template.

Add a new instance template, update the managed instance group to use the new instance template, and delete the old instance template.

Remove the managed instance group from the Terraform state file, update the instance template, and reimport the managed instance group.

Set the create_before_destroy meta-argument to true in the lifecycle block on the instance template.

Questions 47

You are the Site Reliability Engineer responsible for managing your company's data services and products. You regularly navigate operational challenges, such as unpredictable data volume and high cost, with your company's data ingestion processes. You recently learned that a new data ingestion product will be developed in Google Cloud. You need to collaborate with the product development team to provide operational input on the new product. What should you do?

Options:

Deploy the prototype product in a test environment, run a load test, and share the results with the product development team.

When the initial product version passes the quality assurance phase and compliance assessments, deploy the product to a staging environment. Share error logs and performance metrics with the product development team.

When the new product is used by at least one internal customer in production, share error logs and monitoring metrics with the product development team.

Review the design of the product with the product development team to provide feedback early in the design phase.

Questions 48

You use Spinnaker to deploy your application and have created a canary deployment stage in the pipeline. Your application has an in-memory cache that loads objects at start time. You want to automate the comparison of the canary version against the production version. How should you configure the canary analysis?

Options:

Compare the canary with a new deployment of the current production version.

Compare the canary with a new deployment of the previous production version.

Compare the canary with the existing deployment of the current production version.

Compare the canary with the average performance of a sliding window of previous production versions.

Questions 49

You manage an application that is writing logs to Stackdriver Logging. You need to give some team members the ability to export logs. What should you do?

Options:

Grant the team members the IAM role of logging.configWriter on Cloud IAM.

Configure Access Context Manager to allow only these members to export logs.

Create and grant a custom IAM role with the permissions logging.sinks.list and logging.sink.get.

Create an Organizational Policy in Cloud IAM to allow only these members to create log exports.

Questions 50

You support a high-traffic web application with a microservice architecture. The home page of the application displays multiple widgets containing content such as the current weather, stock prices, and news headlines. The main serving thread makes a call to a dedicated microservice for each widget and then lays out the homepage for the user. The microservices occasionally fail; when that happens, the serving thread serves the homepage with some missing content. Users of the application are unhappy if this degraded mode occurs too frequently, but they would rather have some content served instead of no content at all. You want to set a Service Level Objective (SLO) to ensure that the user experience does not degrade too much. What Service Level Indicator (SLI) should you use to measure this?

Options:

A quality SLI: the ratio of non-degraded responses to total responses

An availability SLI: the ratio of healthy microservices to the total number of microservices

A freshness SLI: the proportion of widgets that have been updated within the last 10 minutes

A latency SLI: the ratio of microservice calls that complete in under 100 ms to the total number of microservice calls

Questions 51

Your development team has created a new version of their service’s API. You need to deploy the new versions of the API with the least disruption to third-party developers and end users of third-party installed applications. What should you do?

Options:

Introduce the new version of the API. Announce deprecation of the old version of the API. Deprecate the old version of the API. Contact remaining users of the old API. Provide best effort support to users of the old API. Turn down the old version of the API.

Announce deprecation of the old version of the API. Introduce the new version of the API. Contact remaining users on the old API. Deprecate the old version of the API. Turn down the old version of the API. Provide best effort support to users of the old API.

Announce deprecation of the old version of the API. Contact remaining users on the old API. Introduce the new version of the API. Deprecate the old version of the API. Provide best effort support to users of the old API. Turn down the old version of the API.

Introduce the new version of the API. Contact remaining users of the old API. Announce deprecation of the old version of the API. Deprecate the old version of the API. Turn down the old version of the API. Provide best effort support to users of the old API.

Questions 52

Your application artifacts are being built and deployed via a CI/CD pipeline. You want the CI/CD pipeline to securely access application secrets. You also want to more easily rotate secrets in case of a security breach. What should you do?

Options:

Prompt developers for secrets at build time. Instruct developers to not store secrets at rest.

Store secrets in a separate configuration file on Git. Provide select developers with access to the configuration file.

Store secrets in Cloud Storage encrypted with a key from Cloud KMS. Provide the CI/CD pipeline with access to Cloud KMS via IAM.

Encrypt the secrets and store them in the source code repository. Store a decryption key in a separate repository and grant your pipeline access to it

Questions 53

You need to define Service Level Objectives (SLOs) for a high-traffic multi-region web application. Customers expect the application to always be available and have fast response times. Customers are currently happy with the application performance and availability. Based on current measurement, you observe that the 90th percentile of latency is 120ms and the 95th percentile of latency is 275ms over a 28-day window. What latency SLO would you recommend to the team to publish?

Options:

90th percentile – 100 ms; 95th percentile – 250 ms

90th percentile – 120 ms; 95th percentile – 275 ms

90th percentile – 150 ms; 95th percentile – 300 ms

90th percentile – 250 ms; 95th percentile – 400 ms

Questions 54

Your company operates in a highly regulated domain. Your security team requires that only trusted container images can be deployed to Google Kubernetes Engine (GKE). You need to implement a solution that meets the requirements of the security team, while minimizing management overhead. What should you do?

Options:

Grant the roles/artifactregistry.writer role to the Cloud Build service account. Confirm that no employee has Artifact Registry write permission.

Use Cloud Run to write and deploy a custom validator. Enable an Eventarc trigger to perform validations when new images are uploaded.

Configure Kritis to run in your GKE clusters to enforce deploy-time security policies.

Configure Binary Authorization in your GKE clusters to enforce deploy-time security policies.

Questions 55

You use Google Cloud Managed Service for Prometheus with managed collection to gather metrics from your service running on Google Kubernetes Engine (GKE). After deploying the service, there is no metric data appearing in Cloud Monitoring, and you have not encountered any error messages. You need to troubleshoot this issue. What should you do?

Options:

Determine if your service has exceeded its quota for writes to the Cloud Monitoring API.

Check if the Grafana service is installed on your GKE cluster.

Confirm that your service has the monitoring.servicesViewer IAM role.

Verify that your PodMonitoring configuration references a valid port.

Questions 56

You need to define SLOs for a high-traffic web application. Customers are currently happy with the application performance and availability. Based on current measurement, the 90th percentile of latency is 160 ms and the 95th percentile of latency is 300 ms over a 28-day window. What latency SLO should you publish?

Options:

90th percentile - 150 ms; 95th percentile - 290 ms

90th percentile - 160 ms; 95th percentile - 300 ms

90th percentile - 190 ms; 95th percentile - 330 ms

90th percentile - 300 ms; 95th percentile - 450 ms

Questions 57

You support a service that recently had an outage. The outage was caused by a new release that exhausted the service memory resources. You rolled back the release successfully to mitigate the impact on users. You are now in charge of the post-mortem for the outage. You want to follow Site Reliability Engineering practices when developing the post-mortem. What should you do?

Options:

Focus on developing new features rather than avoiding the outages from recurring.

Focus on identifying the contributing causes of the incident rather than the individual responsible for the cause.

Plan individual meetings with all the engineers involved. Determine who approved and pushed the new release to production.

Use the Git history to find the related code commit. Prevent the engineer who made that commit from working on production services.

Questions 58

You are designing a new multi-tenant Google Kubernetes Engine (GKE) cluster for a customer. Your customer is concerned with the risks associated with long-lived credentials use. The customer requires that each GKE workload has the minimum Identity and Access Management (IAM) permissions set following the principle of least privilege (PoLP). You need to design an IAM impersonation solution while following Google-recommended practices. What should you do?

Options:

1. Create a Google service account. 2. Create a Kubernetes service account in a Workload Identity-enabled cluster. 3. Link the Google service account with the Kubernetes service account by using the roles/iam.workloadIdentityUser role and iam.gke.io/gcp-service-account annotation. 4. Map the Kubernetes service account to the workload. 5. Repeat for each workload.

1. Create a Google service account. 2. Create a node pool, and set the Google service account as the default identity. 3. Ensure that workloads can only run on the designated node pool by using node selectors, taints, and tolerations. 4. Repeat for each workload.

1. Create a Google service account. 2. Create a service account key for the Google service account. 3. Create a Kubernetes secret with a service account key. 4. Ensure that workload mounts the secret and set the GOOGLE_APPLICATION_CREDENTIALS environment variable to point at the mount path. 5. Repeat for each workload.

1. Create a Google service account. 2. Create a node pool without taints, and set the Google service account as the default identity. 3. Grant IAM permissions to the Google service account.

Exam Code: Professional-Cloud-DevOps-Engineer

Exam Name: Google Cloud Certified - Professional Cloud DevOps Engineer Exam

Last Update: Oct 18, 2025

Questions: 194
