Databricks-Certified-Data-Engineer-Associate Sample Questions Answers

Questions 4

A data engineer needs to create a table in Databricks using data from their organization's existing SQLite database. They run the following command:

CREATE TABLE jdbc_customer360
USING ____
OPTIONS (
url "jdbc:sqlite:/customers.db", dbtable "customer360"
)

Which line of code fills in the above blank to successfully complete the task?

Options:

autoloader

org.apache.spark.sql.jdbc

sqlite

org.apache.spark.sql.sqlite

Questions 5

Which of the following commands can be used to write data into a Delta table while avoiding the writing of duplicate records?

Options:

DROP

IGNORE

MERGE

APPEND

INSERT
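
As background on the MERGE option, the following is a rough pure-Python sketch of insert-only merge semantics: rows are matched on a key, and a source row is written only when no row with that key already exists. The function name merge_insert_only, the sample rows, and the id key are assumptions made for illustration. This is not Delta Lake code, and skipping repeated keys within the incoming batch is a simplification of the sketch.

```python
def merge_insert_only(target, source, key):
    """Insert source rows whose key is not already present in target."""
    seen_keys = {row[key] for row in target}
    merged = list(target)
    for row in source:
        if row[key] not in seen_keys:
            # Comparable to WHEN NOT MATCHED THEN INSERT: write the new row.
            merged.append(row)
            seen_keys.add(row[key])
        # A matched key is skipped, so no duplicate record is written.
    return merged


# Hypothetical sample data: id 2 already exists in the target.
existing = [{"id": 1, "name": "Ana"}, {"id": 2, "name": "Ben"}]
incoming = [{"id": 2, "name": "Ben"}, {"id": 3, "name": "Cy"}]

result = merge_insert_only(existing, incoming, key="id")
print([row["id"] for row in result])  # [1, 2, 3]
```

A plain append of the same incoming batch would have produced four rows, with id 2 appearing twice.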

Questions 6

A new data engineering team named team has been assigned to an ELT project. The new data engineering team will need full privileges on the table sales to fully manage the project.

Which command can be used to grant full permissions on the table to the new data engineering team?

Options:

GRANT ALL PRIVILEGES ON TABLE sales TO team;

GRANT SELECT ON TABLE sales TO team;

GRANT SELECT CREATE MODIFY ON TABLE sales TO team;

GRANT ALL PRIVILEGES ON TABLE team TO sales;

Questions 7

A data engineer wants to create a new table containing the names of customers who live in France.

They have written the following command:

CREATE TABLE customersInFrance
_____ AS
SELECT id,
firstName,
lastName
FROM customerLocations
WHERE country = 'FRANCE';

A senior data engineer mentions that it is organization policy to include a table property indicating that the new table includes personally identifiable information (PII).

Which line of code fills in the above blank to successfully complete the task?

Options:

COMMENT "Contains PII"

PII

"COMMENT PII"

TBLPROPERTIES PII

Questions 8

In order for Structured Streaming to reliably track the exact progress of the processing so that it can handle any kind of failure by restarting and/or reprocessing, which of the following two approaches is used by Spark to record the offset range of the data being processed in each trigger?

Options:

Checkpointing and Write-ahead Logs

Structured Streaming cannot record the offset range of the data being processed in each trigger.

Replayable Sources and Idempotent Sinks

Write-ahead Logs and Idempotent Sinks

Checkpointing and Idempotent Sinks

Questions 9

A data engineer needs to use a Delta table as part of a data pipeline, but they do not know if they have the appropriate permissions.

In which of the following locations can the data engineer review their permissions on the table?

Options:

Databricks Filesystem

Jobs

Dashboards

Repos

Data Explorer

Questions 10

A data engineer only wants to execute the final block of a Python program if the Python variable day_of_week is equal to 1 and the Python variable review_period is True.

Which of the following control flow statements should the data engineer use to begin this conditionally executed code block?

Options:

if day_of_week = 1 and review_period:

if day_of_week = 1 and review_period = "True":

if day_of_week == 1 and review_period == "True":

if day_of_week == 1 and review_period:

if day_of_week = 1 & review_period: = "True":
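
For reference, a minimal sketch of how Python treats conditions like the ones listed above. Only day_of_week and review_period come from the question; the helper names should_run_final_block and is_valid_python and the sample values are assumptions for illustration. In Python, == compares values, a single = is an assignment and is not allowed inside an if condition, and a boolean variable can be tested directly.

```python
def should_run_final_block(day_of_week, review_period):
    """Return True only when day_of_week equals 1 and review_period is truthy."""
    # `==` compares values; a boolean needs no comparison at all.
    if day_of_week == 1 and review_period:
        return True
    return False


def is_valid_python(source):
    """Report whether a snippet compiles, without executing it."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False


# A single `=` inside an `if` condition does not compile.
single_equals_ok = is_valid_python("if day_of_week = 1 and review_period:\n    pass")
double_equals_ok = is_valid_python("if day_of_week == 1 and review_period:\n    pass")

# Comparing the boolean True with the string "True" is a different test.
bool_equals_string = (True == "True")

print(should_run_final_block(1, True))   # True
print(should_run_final_block(1, False))  # False
print(single_equals_ok)                  # False
print(double_equals_ok)                  # True
print(bool_equals_string)                # False
```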

Questions 11

A data engineer has a Job that has a complex run schedule, and they want to transfer that schedule to other Jobs.

Rather than manually selecting each value in the scheduling form in Databricks, which of the following tools can the data engineer use to represent and submit the schedule programmatically?

Options:

pyspark.sql.types.DateType

datetime

pyspark.sql.types.TimestampType

Cron syntax

There is no way to represent and submit this information programmatically

Questions 12

Which of the following code blocks will remove the rows where the value in column age is greater than 25 from the existing Delta table my_table and save the updated table?

Options:

SELECT * FROM my_table WHERE age > 25;

UPDATE my_table WHERE age > 25;

DELETE FROM my_table WHERE age > 25;

UPDATE my_table WHERE age <= 25;

DELETE FROM my_table WHERE age <= 25;
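
To see how these statement shapes behave, here is a small sketch that uses an in-memory SQLite database as a stand-in for the Delta table. That substitution is an assumption made so the example runs anywhere; the name column and the sample rows are hypothetical. It shows only the general SQL behavior: SELECT reads rows without changing the table, while DELETE FROM ... WHERE removes the rows that match the predicate.

```python
import sqlite3

# In-memory SQLite table standing in for the Delta table my_table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [("a", 20), ("b", 25), ("c", 30), ("d", 41)],
)

# A SELECT returns matching rows but leaves the table unchanged.
selected = conn.execute("SELECT age FROM my_table WHERE age > 25").fetchall()
count_after_select = conn.execute("SELECT COUNT(*) FROM my_table").fetchone()[0]

# A DELETE removes the rows that satisfy the predicate.
conn.execute("DELETE FROM my_table WHERE age > 25")
conn.commit()
remaining_ages = [
    row[0] for row in conn.execute("SELECT age FROM my_table ORDER BY age")
]

print(count_after_select)  # 4
print(remaining_ages)      # [20, 25]
```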

Questions 13

A new data engineering team named team has been assigned to an ELT project. The new data engineering team will need full privileges on the table sales to fully manage the project.

Which of the following commands can be used to grant full permissions on the table to the new data engineering team?

Options:

GRANT ALL PRIVILEGES ON TABLE sales TO team;

GRANT SELECT CREATE MODIFY ON TABLE sales TO team;

GRANT SELECT ON TABLE sales TO team;

GRANT USAGE ON TABLE sales TO team;

GRANT ALL PRIVILEGES ON TABLE team TO sales;

Questions 14

A data engineer wants to create a relational object by pulling data from two tables. The relational object does not need to be used by other data engineers in other sessions. In order to save on storage costs, the data engineer wants to avoid copying and storing physical data.

Which of the following relational objects should the data engineer create?

Options:

Spark SQL Table

View

Database

Temporary view

Delta Table
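
The difference between a stored table and a session-scoped view can be sketched with SQLite, used here purely as an assumed stand-in for Spark SQL. The table names, the view name customer_orders, and the sample rows are hypothetical. A temporary view stores only a query definition, not a copy of the data, and it is visible only to the connection (session) that created it.

```python
import os
import sqlite3
import tempfile

# A file-backed database so that two separate sessions can open it.
db_path = os.path.join(tempfile.mkdtemp(), "example.db")

session_a = sqlite3.connect(db_path)
session_a.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
session_a.execute("CREATE TABLE orders (customer_id INTEGER, total REAL)")
session_a.execute("INSERT INTO customers VALUES (1, 'Ana')")
session_a.execute("INSERT INTO orders VALUES (1, 9.5)")
session_a.commit()

# The temporary view joins the two tables without storing any rows itself.
session_a.execute(
    """
    CREATE TEMP VIEW customer_orders AS
    SELECT c.name, o.total
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    """
)
rows_in_a = session_a.execute("SELECT * FROM customer_orders").fetchall()

# A second session sees the tables but not the temporary view.
session_b = sqlite3.connect(db_path)
try:
    session_b.execute("SELECT * FROM customer_orders").fetchall()
    visible_in_b = True
except sqlite3.OperationalError:
    visible_in_b = False

print(rows_in_a)     # [('Ana', 9.5)]
print(visible_in_b)  # False

session_a.close()
session_b.close()
```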

Questions 15

Which of the following statements regarding the relationship between Silver tables and Bronze tables is always true?

Options:

Silver tables contain a less refined, less clean view of data than Bronze data.

Silver tables contain aggregates while Bronze data is unaggregated.

Silver tables contain more data than Bronze tables.

Silver tables contain a more refined and cleaner view of data than Bronze tables.

Silver tables contain less data than Bronze tables.

Questions 16

A data engineer has configured a Structured Streaming job to read from a table, manipulate the data, and then perform a streaming write into a new table.

The code block used by the data engineer is below:

If the data engineer only wants the query to execute a micro-batch to process data every 5 seconds, which of the following lines of code should the data engineer use to fill in the blank?

Options:

trigger("5 seconds")

trigger()

trigger(once="5 seconds")

trigger(processingTime="5 seconds")

trigger(continuous="5 seconds")

Questions 17

Which of the following data lakehouse features results in improved data quality over a traditional data lake?

Options:

A data lakehouse provides storage solutions for structured and unstructured data.

A data lakehouse supports ACID-compliant transactions.

A data lakehouse allows the use of SQL queries to examine data.

A data lakehouse stores data in open formats.

A data lakehouse enables machine learning and artificial intelligence workloads.

Questions 18

A data engineering team has noticed that their Databricks SQL queries are running too slowly when they are submitted to a non-running SQL endpoint. The data engineering team wants this issue to be resolved.

Which of the following approaches can the team use to reduce the time it takes to return results in this scenario?

Options:

They can turn on the Serverless feature for the SQL endpoint and change the Spot Instance Policy to "Reliability Optimized."

They can turn on the Auto Stop feature for the SQL endpoint.

They can increase the cluster size of the SQL endpoint.

They can turn on the Serverless feature for the SQL endpoint.

They can increase the maximum bound of the SQL endpoint's scaling range.

Questions 19

A data engineer has realized that they made a mistake when making a daily update to a table. They need to use Delta time travel to restore the table to a version that is 3 days old. However, when the data engineer attempts to time travel to the older version, they are unable to restore the data because the data files have been deleted.

Which of the following explains why the data files are no longer present?

Options:

The VACUUM command was run on the table

The TIME TRAVEL command was run on the table

The DELETE HISTORY command was run on the table

The OPTIMIZE command was run on the table

The HISTORY command was run on the table

Questions 20

A data engineer runs a statement every day to copy the previous day’s sales into the table transactions. Each day’s sales are in their own file in the location "/transactions/raw".

Today, the data engineer runs the following command to complete this task:

After running the command today, the data engineer notices that the number of records in table transactions has not changed.

Which of the following describes why the statement might not have copied any new records into the table?

Options:

The format of the files to be copied was not included with the FORMAT_OPTIONS keyword.

The names of the files to be copied were not included with the FILES keyword.

The previous day’s file has already been copied into the table.

The PARQUET file format does not support COPY INTO.

The COPY INTO statement requires the table to be refreshed to view the copied rows.

Questions 21

Which of the following describes the storage organization of a Delta table?

Options:

Delta tables are stored in a single file that contains data, history, metadata, and other attributes.

Delta tables store their data in a single file and all metadata in a collection of files in a separate location.

Delta tables are stored in a collection of files that contain data, history, metadata, and other attributes.

Delta tables are stored in a collection of files that contain only the data stored within the table.

Delta tables are stored in a single file that contains only the data stored within the table.

Questions 22

Which of the following is stored in the Databricks customer's cloud account?

Options:

Databricks web application

Cluster management metadata

Repos

Data

Notebooks

Questions 23

A data engineer has a Python variable table_name that they would like to use in a SQL query. They want to construct a Python code block that will run the query using table_name.

They have the following incomplete code block:

____(f"SELECT customer_id, spend FROM {table_name}")

Which of the following can be used to fill in the blank to successfully complete the task?

Options:

spark.delta.sql

spark.delta.table

spark.table

dbutils.sql

spark.sql
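
As a small illustration of the f-string in the code block above, this sketch shows the query text that Python builds before anything is sent to Spark. The table name customer_spend is a hypothetical value chosen for the example.

```python
# Hypothetical table name; in practice it would come from earlier code.
table_name = "customer_spend"

# The f-string substitutes the variable's value into the SQL text.
query = f"SELECT customer_id, spend FROM {table_name}"
print(query)  # SELECT customer_id, spend FROM customer_spend

# In a Databricks notebook, where a SparkSession named `spark` is
# predefined, a query string like this can be run with spark.sql(query),
# which returns a DataFrame. That call is left out here so the sketch
# runs without Spark installed.
```

Because the table name is interpolated as plain text, it should come from a trusted source rather than from unchecked user input.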

Questions 24

Which tool is used by Auto Loader to process data incrementally?

Options:

Spark Structured Streaming

Unity Catalog

Checkpointing

Databricks SQL

Questions 25

A dataset has been defined using Delta Live Tables and includes an expectations clause:

CONSTRAINT valid_timestamp EXPECT (timestamp > '2020-01-01') ON VIOLATION DROP ROW

What is the expected behavior when a batch of data containing data that violates these constraints is processed?

Options:

Records that violate the expectation are dropped from the target dataset and loaded into a quarantine table.

Records that violate the expectation are added to the target dataset and flagged as invalid in a field added to the target dataset.

Records that violate the expectation are dropped from the target dataset and recorded as invalid in the event log.

Records that violate the expectation are added to the target dataset and recorded as invalid in the event log.

Records that violate the expectation cause the job to fail.
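
The effect of an ON VIOLATION DROP ROW expectation can be approximated in plain Python. This is an emulation under stated assumptions, not Delta Live Tables code: the function expect_or_drop, the metrics dictionary (standing in for the data quality metrics a pipeline records in its event log), and the sample batch are all hypothetical.

```python
from datetime import datetime

# Cutoff taken from the constraint: timestamp > '2020-01-01'.
CUTOFF = datetime(2020, 1, 1)


def expect_or_drop(records, predicate):
    """Keep records that satisfy the predicate and count the ones dropped."""
    kept = [record for record in records if predicate(record)]
    metrics = {
        "passed_records": len(kept),
        "dropped_records": len(records) - len(kept),
    }
    return kept, metrics


# Hypothetical batch: the record with id 2 violates the constraint.
batch = [
    {"id": 1, "timestamp": datetime(2021, 6, 1)},
    {"id": 2, "timestamp": datetime(2019, 12, 31)},
    {"id": 3, "timestamp": datetime(2020, 1, 2)},
]

target_rows, quality_metrics = expect_or_drop(
    batch, lambda record: record["timestamp"] > CUTOFF
)
print([record["id"] for record in target_rows])  # [1, 3]
print(quality_metrics)  # {'passed_records': 2, 'dropped_records': 1}
```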

Questions 26

A data organization leader is upset about the data analysis team’s reports being different from the data engineering team’s reports. The leader believes the siloed nature of their organization’s data engineering and data analysis architectures is to blame.

Which of the following describes how a data lakehouse could alleviate this issue?

Options:

Both teams would autoscale their work as data size evolves

Both teams would use the same source of truth for their work

Both teams would reorganize to report to the same department

Both teams would be able to collaborate on projects in real-time

Both teams would respond more quickly to ad-hoc requests

Questions 27

A data engineer needs access to a table new_table, but they do not have the correct permissions. They can ask the table owner for permission, but they do not know who the table owner is.

Which of the following approaches can be used to identify the owner of new_table?

Options:

Review the Permissions tab in the table's page in Data Explorer

All of these options can be used to identify the owner of the table

Review the Owner field in the table's page in Data Explorer

Review the Owner field in the table's page in the cloud storage solution

There is no way to identify the owner of the table

Questions 28

A data engineering team has two tables. The first table march_transactions is a collection of all retail transactions in the month of March. The second table april_transactions is a collection of all retail transactions in the month of April. There are no duplicate records between the tables.

Which of the following commands should be run to create a new table all_transactions that contains all records from march_transactions and april_transactions without duplicate records?